[U-Boot] [PATCH v7] socfpga: Add socfpga preloader signing to mkimage

Sat Mar 8 17:51:23 CET 2014

On Thu, Mar 06, 2014 at 15:40 +1300, Charles Manning wrote:
> 
> [ ... ]
> Unfortunately the CRC used in this boot ROM is not the same as the
> Adler CRC in lib/crc32.c. Indeed the Adler code is not technically a
> CRC but is more correctly described as a checksum.

I don't quite get why you say that the zlib implementation is not
a CRC.  IIUC all of the CRC-32 methods follow a common algorithm,
and just happen to differ in their polynomials or initial and
final masks.  So they result in different CRC values for the same
data stream, yet all of them are CRCs.

But strictly speaking this remark is not specific to this patch
submission, and might be omitted.  The only relevant thing is
that the zlib CRC-32 approach does not result in the value that
the SoC's ROM loader expects.

> Thus, the appropriate CRC generator is added to lib/ as crc32_alt.c.

What does "alt" stand for, and can you provide an appropriate
description with the commit log (or just use a better name)?  As
previous communication suggests, it's certainly not "alternative"
-- as there just are too many alternatives available to mark one
of them as _the_ alternative.  It's neither "apple" where you
appear to have found the matching polynomial first.  If it's
"Altera", I'd suggest to use either "altr" as this is what
elsewhere is used for Altera (the stock ticker), or plain
"altera" instead of something this short and ambiguous.  Or
"bzlib" since this is the most recent implementation that you are
using.

> Signed-off-by: Charles Manning <cdhmanning at gmail.com>
> ---
> 
> Changes for v3:
>  - Fix some coding style issues.
>  - Move from a standalone tool to the mkimgae framework.
> 
> Changes for v4:
>  - Fix more coding style issues.
>  - Fix typos in Makefile.
>  - Rebase on master (previous version was not on master, but on a 
>    working socfpga branch).
> 
> Changes for v5:
>  - Fix more coding style issues.
>  - Add some more comments.
>  - Remove some unused defines.
>  - Move the local CRC32 code into lib/crc32_alt.c.
> 
> Changes for v6:
>  - Fix more coding style issues.
>  - Rejig socfpgaimage_vrec_header() function so that it has one return 
>    path and does stricter size checks.
> 
> Changes for v7:
>  - Use bzlib's crc table instead of adding another one.
>  - Use existing code and a packed structure for header marshalling.

One of the reasons I got tired of providing feedback is that this
change log is rather terse.  Several identical "fix coding style"
phrases are another way of telling readers that they should
figure out for themselves what might have changed (especially
when newer versions have functional changes that are not
announced as such), and whether feedback was considered or got
ignored.  Seeing several iterations which suffer from the same
issues that have been identified multiple versions before isn't
helpful either when asking yourself whether to spend more time
and getting ignored another time.

> --- /dev/null
> +++ b/include/bzlib_crc32.h
> @@ -0,0 +1,17 @@
> +/*
> + * Copyright (C) 2014 Charles Manning <cdhmanning at gmail.com>
> + *
> + * SPDX-License-Identifier:	GPL-2.0+
> + *
> + * Note that the CRC is **not** the zlib/Adler crc32 in crc32.c.
> + * It is the CRC-32 used in bzip2, ethernet and elsewhere.
> + */
> +
> +#ifndef __BZLIB_CRC32_H__
> +#define __BZLIB_CRC32_H__
> +
> +#include <stdint.h>
> +
> +uint32_t bzlib_crc32(uint32_t crc, const void *_buf, int length);
> +
> +#endif

and

> --- /dev/null
> +++ b/lib/bzlib_crc32.c
> @@ -0,0 +1,26 @@
> +/*
> + * Copyright (C) 2014 Charles Manning <cdhmanning at gmail.com>
> + *
> + * SPDX-License-Identifier:	GPL-2.0+
> + *
> + * This provides a CRC-32 using the tables in bzlib_crctable.c
> + */
> +
> +#include <bzlib_crc32.h>
> +#include "bzlib_private.h"
> +
> +uint32_t bzlib_crc32(uint32_t crc, const void *_buf, int length)
> +{
> +	const uint8_t *buf = _buf;
> +
> +	crc ^= ~0;
> +
> +	while (length--) {
> +		crc = (crc << 8) ^ BZ2_crc32Table[((crc >> 24) ^ *buf) & 0xff];
> +		buf++;
> +	}
> +
> +	crc ^= ~0;
> +
> +	return crc;
> +}

Since you already include the private bzlib header for the
BZ2_crc32Table[] declaration, you might as well use the
BZ_INITIALISE_CRC(), BZ_FINALISE_CRC(), and BZ_UPDATE_CRC()
macros declared from there (right next to the table's
declaration), instead of re-inventing how the CRC gets
calculated.

You probably should determine whether you want to provide one
routine to do the complete calculate over a given byte stream,
without any "previous" CRC value -- this is what your current use
case is.  Or whether you want to provide three primitives to
initialize, update, and finalize a CRC (which is not used yet).
Or whether you want to provide the three primitives plus one
convenience wrapper.

> --- a/tools/Makefile
> +++ b/tools/Makefile
> @@ -70,6 +70,8 @@ RSA_OBJS-$(CONFIG_FIT_SIGNATURE) := rsa-sign.o
>  # common objs for dumpimage and mkimage
>  dumpimage-mkimage-objs := aisimage.o \
>  			$(FIT_SIG_OBJS-y) \
> +			bzlib_crc32.o \
> +			bzlib_crctable.o \
>  			crc32.o \
>  			default_image.o \
>  			fit_image.o \
> @@ -85,6 +87,7 @@ dumpimage-mkimage-objs := aisimage.o \
>  			os_support.o \
>  			pblimage.o \
>  			sha1.o \
> +			socfpgaimage.o \
>  			ublimage.o \
>  			$(LIBFDT_OBJS) \
>  			$(RSA_OBJS-y)
> diff --git a/tools/bzlib_crc32.c b/tools/bzlib_crc32.c
> new file mode 100644
> index 0000000..b7f7aa5
> --- /dev/null
> +++ b/tools/bzlib_crc32.c
> @@ -0,0 +1 @@
> +#include "../lib/bzlib_crc32.c"
> diff --git a/tools/bzlib_crctable.c b/tools/bzlib_crctable.c
> new file mode 100644
> index 0000000..53d38ef
> --- /dev/null
> +++ b/tools/bzlib_crctable.c
> @@ -0,0 +1 @@
> +#include "../bzlib_crctable.c"
> diff --git a/tools/bzlib_private.h b/tools/bzlib_private.h
> new file mode 100644
> index 0000000..cfb74be
> --- /dev/null
> +++ b/tools/bzlib_private.h
> @@ -0,0 +1 @@
> +#include "lib/bzlib_private.h"

Now this is incredibly ugly.  You are duplicating lib/ stuff in
tools/ and build it there as well?  Or not at all build it in
lib/ where the code actually resides?  It's only a side note that
#include for "*.c" is problematic as some compilers do
semi-intelligent stuff when the filename does not end in ".h" and
then break compilation, so this hack should neither get spread
nor suggested for use.

Is there any (real, technical) reason why the bzip stuff (the
CRC-32 calculation that has been made available separately)
cannot get built and used as a library, and the tools/
application just gets linked against it as one would expect?

> --- /dev/null
> +++ b/tools/socfpgaimage.c
> @@ -0,0 +1,255 @@
> +/*
> + * Copyright (C) 2014 Charles Manning <cdhmanning at gmail.com>
> + *
> + * SPDX-License-Identifier:	GPL-2.0+
> + *
> + * Reference doc http://www.altera.com.cn/literature/hb/cyclone-v/cv_5400A.pdf
> + * Note this doc is not entirely accurate. Of particular interest to us is the
> + * "header" length field being in U32s and not bytes.

Can you rephrase this?  The current form suggests that you cannot
trust the Altera documentation, while you fail to state what
exactly you have identified as being wrong (whether this one
length spec is all the problems, or whether there are more).

Looking at "Loading the Preloader" on page A-7 I see that it says
the "Program length" is in bytes, while the "CRC" description on
page A-8 mentions "Program Length * 4".  So the Handbook is
inconsistent, and available data suggests that "length in units
of 32bit words" is correct.

The polynomial cited there BTW translates to 0x04c11db7, which
may be more specific than "what is used in ethernet and
elsewhere", and might be useful to have in comments and commit
logs.  The initial value and final XOR and "no reflection" are
the other attributes of this specific CRC-32 method.

> + *
> + * "Header" is a structure of the following format.
> + * this is positioned at 0x40.
> + *
> + * Endian is LSB.
> + *
> + * Offset   Length   Usage
> + * -----------------------
> + *   0x40        4   Validation word 0x31305341
> + *   0x44        1   Version (whatever, zero is fine)
> + *   0x45        1   Flags   (unused, zero is fine)
> + *   0x46        2   Length  (in units of u32, including the end checksum).
> + *   0x48        2   Zero
> + *   0x4A        2   Checksum over the heder. NB Not CRC32

This is wrong (and is not what the Handbook Appendix says).  The
length is 32bits in width.  It just happens that in little endian
representation and with a maximum of 60KB of payload data, the
upper 16bits naturally remain zero.  Still the documentation and
implementation should work with a 32bit length.

I agree that the term "checksum" might be confusing, because it's
a mere sum of the preceeding bytes, and "by coincidence" the sum
is used to check plausibility of the header.  I wouldn't either
know how to best put this into a short comment.  From personal
experience I can tell that "sum" alone would have helped a lot.

> + *
> + * At the end of the code we have a 32-bit CRC checksum over whole binary
> + * excluding the CRC.

This probably should say "CRC over the all payload data from
offset zero up to but not including the position of the CRC".
Depending on whether the 64KB padding is considered part of the
binary or some "afterwards thing", this might help.

Background information:  The SPL is supposed to be up to 60KB in
size (the upper 4KB are for the ROM loader's internal use, and
for data shared among the preloader and the bootloader).  The SPL
memory image has a gap at 0x40 (after the vectors, and before the
"useful code") where the preloader header/signature gets put
into.  After the signature application, a CRC-32 is calculated
over the complete data and gets appended to the data (the length
spec in the header includes the CRC location).  Then the image
gets padded to full 64KB and gets duplicated four times (to
increase robustness by means of redundancy, assuming boot media
might have read errors).

Your submission's motivation is to initially stamp a preloader
that's the fresh result of a compilation.  I can see that it
would be useful and desirable to apply the tool to existing
binaries as well, i.e. throw a 64KB binary at the tool to
re-create the signature.

> + *
> + * Note that the CRC used here is **not** the zlib/Adler crc32. It is the
> + * CRC-32 used in bzip2, ethernet and elsewhere.

If you want to be most specific, cite the parameters of the
CRC-32 used here.  See the above comments and
http://opensource.apple.com/source/CommonCrypto/CommonCrypto-60049/libcn/crc32-bzip2.c
and maybe http://www.ross.net/crc/download/crc_v3.txt

BTW I'm still under the impression that zlib uses the same
polynomial as well as identical initial values and final XOR
masks (see lib/crc32.c).  So where is the difference between the
zlib and the bzlib CRC-32 parameters?

  $ perl -e 'use Compress::Zlib; printf "%x\n", crc32("123456789");'
  cbf43926

> + *
> + * The image is padded out to 64k, because that is what is
> + * typically used to write the image to the boot medium.
> + */
> +
> +#include <bzlib_crc32.h>
> +#include "imagetool.h"
> +#include <image.h>
> +
> +#define HEADER_OFFSET	0x40
> +#define HEADER_SIZE	0xC
> +#define VALIDATION_WORD	0x31305341
> +#define PADDED_SIZE	0x10000
> +
> +/* To allow for adding CRC, the max input size is a bit smaller. */
> +#define MAX_INPUT_SIZE	(PADDED_SIZE - sizeof(uint32_t))

See above, it's not a blocker for the initial motivation, but
might be useful to run the tool on existing 64KB blobs as well.
But then it's hard to determine the "right" length in reliable
ways, and the benefit may not be as big as I think it is, hmm...
So OK, forget this idea, running the tool on fresh compiler
output of at most 60KB size is quite appropriate an assumption.

> +
> +static uint8_t buffer[PADDED_SIZE];
> +
> +static struct {
> +	uint32_t validation;
> +	uint8_t  version;
> +	uint8_t  flags;
> +	uint16_t length_u32;
> +	uint16_t zero;
> +	uint16_t checksum;
> +} __attribute__((packed)) header;

You are aware of the "packed" and "alignment" discussion threads
here, aren't you?  I considered the "serializer" accessors of
previous patch iterations most appropriate (except for their
names), and feel this approach with a struct is inferior
(especially because it doesn't cover the CRC later).  But others
need not agree.

Having a "length_u32" field which is of 'uint16_t' data type
doesn't look right.

> +
> +/*
> + * The header checksum is just a very simple checksum over
> + * the header area.
> + * There is still a crc32 over the whole lot.
> + */
> +static uint16_t hdr_checksum(const uint8_t *buf, int len)
> +{
> +	uint16_t ret = 0;
> +	int i;
> +
> +	for (i = 0; i < len; i++) {
> +		ret += (((uint16_t) *buf) & 0xff);
> +		buf++;
> +	}
> +	return ret;
> +}
> +
> +

What is the typecast good for?  Why are there still whitespace
issues?  Can you either fix them or tell why this is OK?  I.e.
can you _somehow_ respond to feedback you receive?  Ignoring
review comments makes you recive less feedback later.

The usual C language style comments apply:  Please don't mix
declarations and instructions.  You can eliminate variables.  Use
names which better reflect what is happening.  Something like
this is more idiomatic:

static uint16_t header_sum(const uint8_t *buf, int len)
{
	uint16_t sum;

	sum = 0;
	while (len-- > 0) {
		sum += *buf++;
	}
	return sum;
}

> +static void build_header(uint8_t *buf,
> +			  uint8_t version,
> +			  uint8_t flags,
> +			  uint16_t length_bytes)
> +{
> +	header.validation = htole32(VALIDATION_WORD);
> +	header.version = version;
> +	header.flags = flags;
> +	header.length_u32 = htole16(length_bytes/4);
> +	header.zero = 0;
> +	header.checksum = htole16(hdr_checksum((const uint8_t *)&header, 10));
> +
> +	memcpy(buf, &header, sizeof(header));
> +}

I'd suggest to s/4/sizeof(uint32_t)/

Can you rephrase the "10" in terms of "header size minus the
width of its sum"?  Very similar to the CRC which covers "all of
the data except the CRC field".

And there may be a problem with "direct" assignments to a
"packed" struct.  Since you fill in a local struct and memcpy to
the image then, you might as well use "normal serializers".

> +/* Sign the buffer and return the signed buffer size */
> +static int sign_buffer(uint8_t *buf,
> +			uint8_t version, uint8_t flags,
> +			int len, int pad_64k)
> +{
> +	uint32_t calc_crc;
> +
> +	/* Align the length up */
> +	len = (len + 3) & (~3);

Is there nothing you can re-use?  If not can you introduce
something generic to roundup() for user space tools to use?  Can
you rephrase the condition such that it becomes evident you are
doing ROUNDUP(len, sizeof(uint32_t)) and avoid the magic "3" that
has no such meaning for readers?

> +
> +	/* Build header, adding 4 bytes to length to hold the CRC32. */
> +	build_header(buf + HEADER_OFFSET,  version, flags, len + 4);
> +
> +	/* Calculate and apply the CRC */
> +	calc_crc = bzlib_crc32(0, buf, len);
> +
> +	*((uint32_t *)(buf + len)) = htole32(calc_crc);

This is what I was talking about above.  It may be better to use
a serializer to write both header fields as well as the CRC.

> +
> +	if (!pad_64k)
> +		return len + 4;
> +
> +	return PADDED_SIZE;
> +}

> +static void socfpgaimage_print_header(const void *ptr)
> +{
> +	if (verify_buffer(ptr) == 0)
> +		printf("Looks like a sane SOCFPGA preloader\n");
> +	else
> +		printf("Not a sane SOCFPGA preloader\n");
> +}

Would it be useful to print a little more information than "it's
a preloader" without any further details?  I'm not familiar with
the context this routine gets called in, what do users expect?

virtually yours
Gerhard Sittig
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-0 Fax: +49-8142-66989-80  Email: office at denx.de