[U-Boot] [PATCH v7] socfpga: Add socfpga preloader signing to mkimage

Mon Mar 10 21:50:42 CET 2014

Dear Gerhard

Thank you for your further comments and clarifications, may I press you for a 
few more?

On Tuesday 11 March 2014 08:36:24 Gerhard Sittig wrote:
> [ Cc: to Masahiro for the tools/ "vs" lib/ build support part ]
>
> On Mon, Mar 10, 2014 at 16:04 +1300, Charles Manning wrote:
> > On Sunday 09 March 2014 05:51:23 Gerhard Sittig wrote:
> > > On Thu, Mar 06, 2014 at 15:40 +1300, Charles Manning wrote:
> > > > --- /dev/null
> > > > +++ b/lib/bzlib_crc32.c
> > > > @@ -0,0 +1,26 @@
> > > > +/*
> > > > + * Copyright (C) 2014 Charles Manning <cdhmanning at gmail.com>
> > > > + *
> > > > + * SPDX-License-Identifier:	GPL-2.0+
> > > > + *
> > > > + * This provides a CRC-32 using the tables in bzlib_crctable.c
> > > > + */
> > > > +
> > > > +#include <bzlib_crc32.h>
> > > > +#include "bzlib_private.h"
> > > > +
> > > > +uint32_t bzlib_crc32(uint32_t crc, const void *_buf, int length)
> > > > +{
> > > > +	const uint8_t *buf = _buf;
> > > > +
> > > > +	crc ^= ~0;
> > > > +
> > > > +	while (length--) {
> > > > +		crc = (crc << 8) ^ BZ2_crc32Table[((crc >> 24) ^ *buf) & 0xff];
> > > > +		buf++;
> > > > +	}
> > > > +
> > > > +	crc ^= ~0;
> > > > +
> > > > +	return crc;
> > > > +}
> > >
> > > Since you already include the private bzlib header for the
> > > BZ2_crc32Table[] declaration, you might as well use the
> > > BZ_INITIALISE_CRC(), BZ_FINALISE_CRC(), and BZ_UPDATE_CRC()
> > > macros declared from there (right next to the table's
> > > declaration), instead of re-inventing how the CRC gets
> > > calculated.
> >
> > If you think that makes it more clear I shall do that. I don't think
> > though that it really is any clearer.
>
> Admittedly your wrapper is thin.  Yet it duplicates what the
> bzlib header already provides.  And does so in different phrases
> (i.e. textually does not match the original, so probably gets
> missed upon searches).
>
> It's more a matter of consistency to re-use the code as well as
> the table of the bzip2 implementation, instead of re-inventing
> both or using parts of it only.

It does not bug me to use either way. I shall use that.

>
> > > You probably should determine whether you want to provide one
> > > routine to do the complete calculate over a given byte stream,
> > > without any "previous" CRC value -- this is what your current use
> > > case is.  Or whether you want to provide three primitives to
> > > initialize, update, and finalize a CRC (which is not used yet).
> > > Or whether you want to provide the three primitives plus one
> > > convenience wrapper.
> >
> > I try to use a single "familiar" function, like crc32() does.
> >
> > You can start with 0, then use it with the results.
> >
> > eg
> >
> > crc = crc32(0, first_buff,...);
> > crc = crc32(crc, second_bufer,...);
>
> This only works "by coincidence" because the final XOR and the
> initial value happen to be complimentary and only by chance the
> value of zero "works".  So it's not a general pattern, but a
> specific feature of this very implementation.
>
> Given that your current application only determines CRC values
> for complete buffers and nothing is done piecemeal, I'd suggest
> to provide the one convenience wrapper routine which does not
> take a previous partial result.
>
> Should other use cases later require incremental calculation,
> they can introduce and use the three primitives to open, iterate
> and close the calculation.  This better reflects what's
> happening.  Or am I being overly picky?

Perhaps picky, but I do not mind losing this.

The issue is that these functions are often used on long sequences one buffer 
at a time (eg. doing a crc on 100k, but getting the data in 1k blocks). 

This implementation does not need the multi-buffer support, but when I 
refactor the crcs in mkimage (as I have undertaken to do - even though it is 
of no direct utility to me at all), there might be a need to use 
multi-buffer.

For now I will just provide a single buffer version, and will widen it up if 
need be later.

>
> > > > --- a/tools/Makefile
> > > > +++ b/tools/Makefile
> > > > @@ -70,6 +70,8 @@ RSA_OBJS-$(CONFIG_FIT_SIGNATURE) := rsa-sign.o
> > > >  # common objs for dumpimage and mkimage
> > > >  dumpimage-mkimage-objs := aisimage.o \
> > > >  			$(FIT_SIG_OBJS-y) \
> > > > +			bzlib_crc32.o \
> > > > +			bzlib_crctable.o \
> > > >  			crc32.o \
> > > >  			default_image.o \
> > > >  			fit_image.o \
> > > > @@ -85,6 +87,7 @@ dumpimage-mkimage-objs := aisimage.o \
> > > >  			os_support.o \
> > > >  			pblimage.o \
> > > >  			sha1.o \
> > > > +			socfpgaimage.o \
> > > >  			ublimage.o \
> > > >  			$(LIBFDT_OBJS) \
> > > >  			$(RSA_OBJS-y)
> > > > diff --git a/tools/bzlib_crc32.c b/tools/bzlib_crc32.c
> > > > new file mode 100644
> > > > index 0000000..b7f7aa5
> > > > --- /dev/null
> > > > +++ b/tools/bzlib_crc32.c
> > > > @@ -0,0 +1 @@
> > > > +#include "../lib/bzlib_crc32.c"
> > > > diff --git a/tools/bzlib_crctable.c b/tools/bzlib_crctable.c
> > > > new file mode 100644
> > > > index 0000000..53d38ef
> > > > --- /dev/null
> > > > +++ b/tools/bzlib_crctable.c
> > > > @@ -0,0 +1 @@
> > > > +#include "../bzlib_crctable.c"
> > > > diff --git a/tools/bzlib_private.h b/tools/bzlib_private.h
> > > > new file mode 100644
> > > > index 0000000..cfb74be
> > > > --- /dev/null
> > > > +++ b/tools/bzlib_private.h
> > > > @@ -0,0 +1 @@
> > > > +#include "lib/bzlib_private.h"
> > >
> > > Now this is incredibly ugly.  You are duplicating lib/ stuff in
> > > tools/ and build it there as well?  Or not at all build it in
> > > lib/ where the code actually resides?
> > >
> > > It's only a side note that
> > > #include for "*.c" is problematic as some compilers do
> > > semi-intelligent stuff when the filename does not end in ".h" and
> > > then break compilation, so this hack should neither get spread
> > > nor suggested for use.
> >
> > I absolutely agree this is very, very, ugly.
> >
> > I was just following the current way of doing things. This is how
> > crc32.c, sha1.c, md5.c, fdt.c and many others are handled.
> >
> > I expect that this is worth a refactorisation effort, but I think that
> > should be a separate exercise from doing this signer.
> >
> > > Is there any (real, technical) reason why the bzip stuff (the
> > > CRC-32 calculation that has been made available separately)
> > > cannot get built and used as a library, and the tools/
> > > application just gets linked against it as one would expect?
> >
> > From my limited understanding, lib/ is not built for the host. It is only
> > built for the target. That is why we have all these ugly C files:
> > crc32.c, sha1.c,... in tools/.
>
> With Kbuild, is the '#include "../otherdir/source.c" trick still
> needed?  Would a list of object files with relative path specs
> work, or can object files in one output directory result from
> compile units taken out of several source directories?

I lack this level of knowledge of Kbuild so sorry I do not know.

>
> Can the tools/ build for the host environment use a library that
> gets built from lib/ sources?
>
> That an existing implementation uses similar hacks need not mean
> that it must be OK, or cannot get improved. :)

Absolutely agree. The current way is very, very ugly.

However it is surely better to do the refactoring as a separate step from 
adding a new function.

There is not much point in doing some lib/*c includes "the right way" and 
leaving the other stuff "the ugly way". Doing things two ways must surely be 
even worse than consistency.

>
> > > > --- /dev/null
> > > > +++ b/tools/socfpgaimage.c
> > > > @@ -0,0 +1,255 @@
> > > > [ ... ]
> > > >
> > > > + *
> > > > + * "Header" is a structure of the following format.
> > > > + * this is positioned at 0x40.
> > > > + *
> > > > + * Endian is LSB.
> > > > + *
> > > > + * Offset   Length   Usage
> > > > + * -----------------------
> > > > + *   0x40        4   Validation word 0x31305341
> > > > + *   0x44        1   Version (whatever, zero is fine)
> > > > + *   0x45        1   Flags   (unused, zero is fine)
> > > > + *   0x46        2   Length  (in units of u32, including the end
> > > > checksum). + *   0x48        2   Zero
> > > > + *   0x4A        2   Checksum over the heder. NB Not CRC32
> > >
> > > This is wrong (and is not what the Handbook Appendix says).  The
> > > length is 32bits in width.  It just happens that in little endian
> > > representation and with a maximum of 60KB of payload data, the
> > > upper 16bits naturally remain zero.  Still the documentation and
> > > implementation should work with a 32bit length.
> >
> > The implementation surely only has to work with the ROM. There is no
> > point in it working for arbitrary cases.
>
> Still it's a deviation from the documentation and obfuscates
> what's happening.  Think of the next person to read and
> manipulate the code, and find a "32bits length spec" in a 16bit
> field.  That's unexpected, and not necessary.  (or is it?)
>
> I still feel that the "length" spec is both, a length in units of
> 32bit words, as well as a 32bit value.  This is what Altera
> documents in Appendix A.

It does not really bother me either way. I assume the boot ROM will actually 
be reading this as a u32. Might as well just make it one.

Frequently what you get in the app notes is highly different from what the 
boot ROM actually does. In this case there is one certain error, but some 
devices, like the MX53 have all sorts of undocumented flags that need magic 
fields... Bottom line though is that when we're dealing with some hidden 
code, what matters most is getting the actual bytes right.

I will change this to a u32 wide field and drop the zero field.

>
> > > I agree that the term "checksum" might be confusing, because it's
> > > a mere sum of the preceeding bytes, and "by coincidence" the sum
> > > is used to check plausibility of the header.  I wouldn't either
> > > know how to best put this into a short comment.  From personal
> > > experience I can tell that "sum" alone would have helped a lot.
> >
> > What is a checksum but a sum that is used for checking? The term checksum
> > on its own does not imply any particular algorithm. If you think sum is
> > better, I shall use that.
>
> Well, the term "checksum" usually is associated with CRCs, in
> contrast to mere sums.  At least that's what triggered in my
> head.  But I'm happy to learn how others feel about it.  I may be
> wrong, and am happy to improve where possible.

[Academic sidebar:
Both CRCs and checksums are used for checking. The terms are sometimes used 
interchangeably and can often be very confusing.

By my understanding:
* A "true CRC" is something that you add to a the end of a bitstring so that 
doing the crc calculation over the original string + crc gives you zero. That 
is a very useful property when you are using hardware to do CRC validation 
(eg. a CAN controller) and you will often see them associated with 
telephony/networking specs
* A "checksum" just gives you two numbers (received and calculated) that you 
compare.

You can, of course, use a CRC as a checksum (and that is often how software is 
written). You cannot, however, use any checksum as a CRC because they do not 
always have that property.

I might be wrong here, but it is my understanding that the crc32() function in 
zlib lacks the CRC property mentioned above. From my understanding, it can 
only be used as a checksum.

end of sidebar]

>
> > > Background information:  The SPL is supposed to be up to 60KB in
> > > size (the upper 4KB are for the ROM loader's internal use, and
> > > for data shared among the preloader and the bootloader).  The SPL
> > > memory image has a gap at 0x40 (after the vectors, and before the
> > > "useful code") where the preloader header/signature gets put
> > > into.  After the signature application, a CRC-32 is calculated
> > > over the complete data and gets appended to the data (the length
> > > spec in the header includes the CRC location).  Then the image
> > > gets padded to full 64KB and gets duplicated four times (to
> > > increase robustness by means of redundancy, assuming boot media
> > > might have read errors).
> >
> > Is 60K a hard limitation for SPL? I know that people are considering SPLs
> > of larger size. Yeah... I know... what happened to 4k bootloaders?
>
> The 60K value is neither a limitation in the U-Boot project nor
> specific to the SPL approach.  It's the limit of the SoC's ROM
> loader (OCRAM minus reserved data area).
>
> The above "the SPL is supposed to be up to 60KB in size" is a
> specific statement for the Altera SoCFPGA preloader case.  I
> thought you should have recognized that number, having worked on
> the preloader for some time. :)

Ok, sorry, I had misunderstood what you were saying.

>
> > > Your submission's motivation is to initially stamp a preloader
> > > that's the fresh result of a compilation.  I can see that it
> > > would be useful and desirable to apply the tool to existing
> > > binaries as well, i.e. throw a 64KB binary at the tool to
> > > re-create the signature.
> >
> > I am confused by what you are saying here. There is only one point to
> > this: sign a preloader so that a SocFPGA boot ROM will use it. There are
> > no other use cases that make any sense.
> >
> > Or put another way... would I run the omap signer on an arbitrary binary?
> > No. Same argument here.
> >
> > By "signing existing binaries" do you mean existing preloaders? If so,
> > people can to that with mkimage -T socfpgaimage ... (once this addition
> > is in place).
>
> Upon re-consideration the idea turned out to be not of much
> benefit.  I said that, can't tell whether I should have gone and
> removed all remarks about it instead.  Just forget I said
> something. :)
>
> > > > + *
> > > > + * Note that the CRC used here is **not** the zlib/Adler crc32. It
> > > > is the + * CRC-32 used in bzip2, ethernet and elsewhere.
> > >
> > > If you want to be most specific, cite the parameters of the
> > > CRC-32 used here.  See the above comments and
> > > http://opensource.apple.com/source/CommonCrypto/CommonCrypto-60049/libc
> > >n/cr c32-bzip2.c and maybe http://www.ross.net/crc/download/crc_v3.txt
> > >
> > > BTW I'm still under the impression that zlib uses the same
> > > polynomial as well as identical initial values and final XOR
> > > masks (see lib/crc32.c).  So where is the difference between the
> > > zlib and the bzlib CRC-32 parameters?
> > >
> > >   $ perl -e 'use Compress::Zlib; printf "%x\n", crc32("123456789");'
> > >   cbf43926
> >
> > From my understanding zlib crc32() == Adler.
> >
> > Please compare the tables in lib/crc32.c and lib/bzlib_crc32.c. They are
> > not the same.
> >
> >  I have run many tests.
>
> I do believe that they are not the same.  Yet available data
> suggests that both use identical polynomial, although resulting
> in different values for the same input.  So something else must
> be different.  That's what I said.  No conflict here.  I wasn't
> saying that zlib must be used, I was wondering why there would be
> a difference.

Even with the same polynomial, and almost the identical implementation, you 
can get different results if the bytes are shifted through lsb or msb first.

>
> > > I considered the "serializer" accessors of
> > > previous patch iterations most appropriate (except for their
> > > names), and feel this approach with a struct is inferior
> > > (especially because it doesn't cover the CRC later).  But others
> > > need not agree.
> >
> > I too prefer the older serialisers but was told I could not use my own.
> > Yes, there are some in target space: put_aligned_le16() etc, but from my
> > understanding it is unhealthy to use target space functions from host
> > space (mkimage etc).
>
> There may be responses of different severity, there are several
> grades from questions to recommendations to strong suggestions to
> total NAKs.
>
> I understood that you were asked to look into whether there is
> something you can re-use, while experience suggests that there
> should be something along those lines.
>
> Still I guess that the suggestion was not a hard and fast rule.
> Investigating and telling "does not apply" is as valid a response
> to these requests as changing the submission -- if there is
> necessity or reason.  We need not shoehorn code into a specific
> approach which does not apply or does not fit the specific
> purpose.  This kind of feedback is mostly about trying to blend
> in and remaining aware of when something is a workaround (at
> least I got that much).  Code should not only work, but it's as
> important that it's readable.

Ok, what I suggest is that I go back to serialisers.  For now I will just have 
my own private ones. In a future refactoring exercise these can be pulled 
into a file to stare with others.

Does that sound reasonable?

>
> > > > +
> > > > +/*
> > > > + * The header checksum is just a very simple checksum over
> > > > + * the header area.
> > > > + * There is still a crc32 over the whole lot.
> > > > + */
> > > > +static uint16_t hdr_checksum(const uint8_t *buf, int len)
> > > > +{
> > > > +	uint16_t ret = 0;
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < len; i++) {
> > > > +		ret += (((uint16_t) *buf) & 0xff);
> > > > +		buf++;
> > > > +	}
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +
> > >
> > > What is the typecast good for?  Why are there still whitespace
> > > issues?  Can you either fix them or tell why this is OK?  I.e.
> > > can you _somehow_ respond to feedback you receive?  Ignoring
> > > review comments makes you recive less feedback later.
> >
> > Ok, I will do away with the cast. It does not serve any function.
> >
> > I am unsure what whitespace issue do you mean? Does checkpatch not catch
> > things like this?
>
> You see the multiple empty lines after the function, don't you?
> Even if not strictly required, you might want to use "--strict"
> with checkpatch as well.  It may sound picky, but this kind of
> stuff is not only cosmetics, but in addition suggests how much
> attention is put to details, and how much care is taken in
> iterations of a submission.

Thank you for that clarification as well as teaching me that checkpatch has 
a --strict option.

>
> > > The usual C language style comments apply:  Please don't mix
> > > declarations and instructions.
> >
> > What do you mean by "declaration and instructions"? Do you mean the
> > initial value mixing?
> >
> > > You can eliminate variables.  Use
> > > names which better reflect what is happening.  Something like
> > > this is more idiomatic:
> > >
> > > static uint16_t header_sum(const uint8_t *buf, int len)
> > > {
> > > 	uint16_t sum;
> > >
> > > 	sum = 0;
> > > 	while (len-- > 0) {
> > > 		sum += *buf++;
> > > 	}
> > > 	return sum;
> > > }
> >
> > Ok I will do that.
>
> Maybe this helps:  "uint16_t sum;" is a declaration.  "sum = 0;"
> is an instruction.  "uint16_t sum = 0;" is a declaration that is
> combined with an instruction.  It's useful to a certain amount,
> but can turn into obfuscation when used excessively.  It saves
> almost nothing (just typing the variable name once), but can be
> costly in terms of bugs and maintenance.  In the specific case
> above, pre-setting the sum and adding the items unnecessarily is
> several lines apart with other declarations between them.  It's
> easy to miss that there is more code involved than one might
> perceive at first glance.  The effect gets stronger with code
> that is not as trivial.

Thanks for that clarification. I will simplify.

>
> > > > +/* Sign the buffer and return the signed buffer size */
> > > > +static int sign_buffer(uint8_t *buf,
> > > > +			uint8_t version, uint8_t flags,
> > > > +			int len, int pad_64k)
> > > > +{
> > > > +	uint32_t calc_crc;
> > > > +
> > > > +	/* Align the length up */
> > > > +	len = (len + 3) & (~3);
> > >
> > > Is there nothing you can re-use?  If not can you introduce
> > > something generic to roundup() for user space tools to use?
> >
> > Sure I define something for reuse...
> >
> > Can you suggest where I might put it?
> >
> > > Can
> > > you rephrase the condition such that it becomes evident you are
> > > doing ROUNDUP(len, sizeof(uint32_t)) and avoid the magic "3" that
> > > has no such meaning for readers?
>
> U-Boot apparently has a roundup() macro, both in the bootloader
> as well as in user space (mxsimage.c uses it).  Did you look it
> up?  `git grep -i roundup`

roundup() is a macro in msximage.h which is a private header file for 
msximage.c.

It is surely an ugly dependency for socfpgaimage.c to include msximage.h just 
for a simple one-line macro.

That would make more sense to refactor into some common header. As I said 
before, refactoring should probably be a separate exercise.

There is a lot that can be refactored in mkimage:
* some macros.
* serialisation.
* crc calcs (there appear to be 3 used in socfpgaimage.c mxsimage.c and 
pblimage.c that are the same or just differ by an inversion at the end or the 
beginning).
* fix up that lib/*c include mess.
* fix the Makefile which is a mix of Kbuild and ifdef.

The danger of refactoring is that it is easy to break things. It was my 
intention to add this socfpga signer in a way that just "clips on" and does 
not fiddle with other code. I think, as a general principle, it is a good 
idea to do refactoring and feature adding as seperate exercises.

>
> > > > +
> > > > +	if (!pad_64k)
> > > > +		return len + 4;
> > > > +
> > > > +	return PADDED_SIZE;
> > > > +}
> > > >
> > > > +static void socfpgaimage_print_header(const void *ptr)
> > > > +{
> > > > +	if (verify_buffer(ptr) == 0)
> > > > +		printf("Looks like a sane SOCFPGA preloader\n");
> > > > +	else
> > > > +		printf("Not a sane SOCFPGA preloader\n");
> > > > +}
> > >
> > > Would it be useful to print a little more information than "it's
> > > a preloader" without any further details?  I'm not familiar with
> > > the context this routine gets called in, what do users expect?
> >
> > It gets called at the end of making an image.
> >
> > Ok, I will print out some numbers too. They are probably of little use to
> > anyone, but they look "technical" :-).
>
> At least the size might be nice to have.  Checksums less so.
> It's a judgement call.  Haven't checked what other formats do
> (except having seen U-Boot's native format several hundred
> times).  Just had to ask.
>
> That the print routine gets called at the end of image creation
> does not mean that it's the only use, BTW.  Somebody may want to
> run -l interactively or from tools/scripts.

Thanks, I will do that.

Regards

Charles