[PATCH v5 00/11] spl: Use common function for loading/parsing images

Tom Rini trini at konsulko.com
Thu Oct 19 18:07:20 CEST 2023


On Wed, Oct 18, 2023 at 01:33:20PM -0400, Sean Anderson wrote:
> On 8/3/23 00:41, Heinrich Schuchardt wrote:
> > On 8/3/23 03:31, Tom Rini wrote:
> > > On Wed, Aug 02, 2023 at 09:13:21PM -0400, Sean Anderson wrote:
> > > > On 8/2/23 16:11, Tom Rini wrote:
> > > > > On Mon, Jul 31, 2023 at 06:42:52PM -0400, Sean Anderson wrote:
> > > > > 
> > > > > > This series adds support for loading all image types (Legacy, FIT (with
> > > > > > and without LOAD_FIT_FULL), and i.MX) to the MMC, SPI, NOR, NET, FAT,
> > > > > > and EXT load methods. It does this by introducing a helper function
> > > > > > which handles the minutiae of invoking the proper parsing function, and
> > > > > > reading the rest of the image.
> > > > > > 
> > > > > > Hopefully, this will make it easier for load methods to support all
> > > > > > image types that U-Boot supports, without having undocumented
> > > > > > unsupported image types. I applied this to several loaders which were
> > > > > > invoking spl_load_simple_fit and/or spl_parse_image_header, but I did
> > > > > > not use it with others (e.g. DFU/RAM) which had complications in the
> > > > > > mix.
> > > > > > 
> > > > > > In order to meet size requirements for some boards, the first two
> > > > > > patches of this series are general size-reduction changes. The ffs/fls
> > > > > > change in particular should reduce code size across the board. The
> > > > > > malloc change also has the potential to reduce code size. I've left it
> > > > > > disabled by default, but maybe we can reverse that in the future.
> > > > > > 
> > > > > > Here's some bloat-o-meter for j7200_evm_a72_defconfig with ext4 support
> > > > > > enabed:
> > > > > > 
> > > > > add/remove: 2/1 grow/shrink: 1/5 up/down: 444/-584 (-140)
> > > > > > Function                                     old     new   delta
> > > > > > spl_load                                       -     216    +216
> > > > > spl_simple_read                                -     184    +184
> > > > > > spl_fit_read                                  60     104     +44
> > > > > > file_fat_read                                 40       -     -40
> > > > > > spl_nor_load_image                           120      76     -44
> > > > > > spl_load_image_ext                           364     296     -68
> > > > > spl_load_image_fat                           320     200    -120
> > > > > > spl_spi_load_image                           304     176    -128
> > > > > > spl_mmc_load                                 716     532    -184
> > > > > > Total: Before=233618, After=233478, chg -0.06%
> > > > > > 
> > > > > > For most boards with a few load methods, this series should break even.
> > > > > > However, in the worst case this series will add around 100 bytes.
> > > > > > 
> > > > > > I have only tested a few loaders. Please try booting your favorite board
> > > > > > with NOR/SPI flash or SPI falcon mode.
> > > > > 
> > > > > > On sama5d27_wlsom1_ek_mmc_defconfig, 100 bytes is too much, so this
> > > > > > series depends on [1] to fit everything in. CI run at [2].
> > > > > > 
> > > > > > [1] https://lore.kernel.org/u-boot/20230731223327.109865-1-sean.anderson@seco.com/
> > > > > > [2] https://source.denx.de/u-boot/custodians/u-boot-clk/-/pipelines/17116
> > > > > 
> > > > > I've boot-tested this on my lab and a handful of SD+FAT or SD+raw boot
> > > > > devices, and it's all worked.  I also did my usual world build
> > > > > before/after to check out the size changes and well, I don't know.  The
> > > > > size wins come on platforms where there's both FAT+EXT support.  And
> > > > > then on mips with say mt7620_rfb I wonder if we're loosing
> > > > > functionality.
> > > > 
> > > > Yes we are. I'll address this in v6.
> > > > 
> > > > > But most platforms grow a bit, especially if they're just
> > > > > a single load type (which is the most common case). Maybe this problem
> > > > > needs to be approached another way? I've put the whole log over at
> > > > > https://gist.github.com/trini/e6772b2134e0eb44393364ea98729e06 and maybe
> > > > > something will jump out to you.
> > > > 
> > > > The fls/ffs patch [1] reduces the growth by around 80 bytes on ARM
> > > > platforms. I sent it separately for ease of review, but it really should
> > > > be applied before this series, since a good portion of the growth is due
> > > > to that one function call. It seems like MIPS also has this instruction
> > > > (and Linux uses it), so I can try putting together a patch for it as
> > > > well.
> > > > 
> > > > In the future, I would like to convert bl_len to bl_shift or something,
> > > > which would remove the need for fls (and going from bl_shift to bl_len
> > > > is just a shift). spl_load_info is used in a lot of places, so I wanted
> > > > to do that in a follow-up. On arches with efficient fls implementations
> > > > the difference is only around 3 instructions or so.
> > > > 
> > > > But fundamentally some of the problem is that the logic is being moved
> > > > into multiple translation units, so the compiler can't see enough to
> > > > make optimizations. For example, all filesystems use a bl_len of 1, so
> > > > all the multiplications/roundings/etc. can be eliminated if the compiler
> > > > can see that. For example, on arm64 copying spl_load into spl_fat.c (as
> > > > fat_load) and making it static saves us 100 bytes:
> > > > 
> > > > add/remove: 3/1 grow/shrink: 0/1 up/down: 324/-92 (232)
> > > > Function                                     old     new   delta
> > > > spl_load                                       -     160    +160
> > > > spl_simple_read                                -     104    +104
> > > > spl_fit_read                                   -      60     +60
> > > > file_fat_read                                 40       -     -40
> > > > spl_load_image_fat                           252     200     -52
> > > > Total: Before=66833, After=67065, chg +0.35%
> > > > 
> > > > add/remove: 2/1 grow/shrink: 1/0 up/down: 172/-40 (132)
> > > > Function                                     old     new   delta
> > > > spl_simple_read                                -     104    +104
> > > > spl_fit_read                                   -      60     +60
> > > > spl_load_image_fat                           252     260      +8
> > > > file_fat_read                                 40       -     -40
> > > > Total: Before=66833, After=66965, chg +0.20%
> > > > 
> > > > and even then, by inspection spl_simple_read isn't really taking
> > > > advantage of bl_len=1. Surprisingly, after enabling LTO this board is
> > > > +400 bytes (compared to without this series). It's just hard to beat
> > > > handwritten routines with a generic solution.
> > > > 
> > > > I would really like to consolidate this stuff, since every load method
> > > > ends up only supporting a few image types and having almost identical
> > > > implementations for what they do support.
> > > > 
> > > > --Sean
> > > > 
> > > > [1] https://lore.kernel.org/u-boot/20230731212734.7256-1-sean.anderson@seco.com/
> > > 
> > > Thanks for digging more, lets keep going in this direction then.
> > > 
> > 
> > Do we need to pass sectors to the info->load() functions?
> > 
> > Only nand and mmc use bl_len != 1. Can we move the conversion to sectors
> > and the alignment checks to functions called by info->load()?
> > 
> > I would assume that this can avoid part of the code size increase.
> 
> Unfortunately, while the conversion to sectors and alignment can be put into ->load,
> MMC (and ROMAPI) can only read in block-sized chunks. This means that the buffer must
> be a multiple of the block size (otherwise we will go over the end), and there must be
> a block's worth of space in front of the buffer as well (which we use when we have a
> non-block-aligned offset).
> 
> These situations generally only matter when reading into a malloc'd buffer, since there
> is data to overwrite on either end of the buffer. With things like SYS_TEXT_BASE, there
> is nothing to overwrite on either end. From examining the various loaders, malloc'd buffers
> are only used when loading the image header (which may include the full image for FITs).
> In that situation, under the beginning of the buffer is not an issue, since the load method
> should have supplied an offset which is a multiple of the block size. Thus, we only need
> to solve the problem of writing past the end of the buffer.
> 
> From my perspective, there are three approaches we could take:
> 
> - Keep bl_len so that callers of ->read can allocate a sufficiently-large buffer.
> - Use a bounce-buffer. This uses around double the memory and will also increase code size.
> - Complete most of the read as normal, but read the final block into a separate buffer which
>   is then copied into the passed-in buffer. This requires two reads, which is likely slower
>   than one. It will also increase code size.
> 
> I think the first approach could be improved by optionally removing bl_len when there are
> no load methods which need it. This will let the compiler optimize out any alignment if it
> is unnecessary, which it cannot currently do (unless LTO is enabled). I am going to try using
> this approach for v6.

This sounds good, and LTO is / can be an option for SPL for platforms
that need to regain space.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: not available
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20231019/7e156aa8/attachment.sig>


More information about the U-Boot mailing list