[PATCH v5 00/11] spl: Use common function for loading/parsing images

Sean Anderson seanga2 at gmail.com
Thu Aug 3 03:13:21 CEST 2023


On 8/2/23 16:11, Tom Rini wrote:
> On Mon, Jul 31, 2023 at 06:42:52PM -0400, Sean Anderson wrote:
> 
>> This series adds support for loading all image types (Legacy, FIT (with
>> and without LOAD_FIT_FULL), and i.MX) to the MMC, SPI, NOR, NET, FAT,
>> and EXT load methods. It does this by introducing a helper function
>> which handles the minutiae of invoking the proper parsing function, and
>> reading the rest of the image.
>>
>> Hopefully, this will make it easier for load methods to support all
>> image types that U-Boot supports, without having undocumented
>> unsupported image types. I applied this to several loaders which were
>> invoking spl_load_simple_fit and/or spl_parse_image_header, but I did
>> not use it with others (e.g. DFU/RAM) which had complications in the
>> mix.
>>
>> In order to meet size requirements for some boards, the first two
>> patches of this series are general size-reduction changes. The ffs/fls
>> change in particular should reduce code size across the board. The
>> malloc change also has the potential to reduce code size. I've left it
>> disabled by default, but maybe we can reverse that in the future.
>>
>> Here's some bloat-o-meter for j7200_evm_a72_defconfig with ext4 support
>> enabed:
>>
> add/remove: 2/1 grow/shrink: 1/5 up/down: 444/-584 (-140)
>> Function                                     old     new   delta
>> spl_load                                       -     216    +216
> spl_simple_read                                -     184    +184
>> spl_fit_read                                  60     104     +44
>> file_fat_read                                 40       -     -40
>> spl_nor_load_image                           120      76     -44
>> spl_load_image_ext                           364     296     -68
> spl_load_image_fat                           320     200    -120
>> spl_spi_load_image                           304     176    -128
>> spl_mmc_load                                 716     532    -184
>> Total: Before=233618, After=233478, chg -0.06%
>>
>> For most boards with a few load methods, this series should break even.
>> However, in the worst case this series will add around 100 bytes.
>>
>> I have only tested a few loaders. Please try booting your favorite board
>> with NOR/SPI flash or SPI falcon mode.
>
>> On sama5d27_wlsom1_ek_mmc_defconfig, 100 bytes is too much, so this
>> series depends on [1] to fit everything in. CI run at [2].
>>
>> [1] https://lore.kernel.org/u-boot/20230731223327.109865-1-sean.anderson@seco.com/
>> [2] https://source.denx.de/u-boot/custodians/u-boot-clk/-/pipelines/17116
> 
> I've boot-tested this on my lab and a handful of SD+FAT or SD+raw boot
> devices, and it's all worked.  I also did my usual world build
> before/after to check out the size changes and well, I don't know.  The
> size wins come on platforms where there's both FAT+EXT support.  And
> then on mips with say mt7620_rfb I wonder if we're loosing
> functionality.

Yes we are. I'll address this in v6.

> But most platforms grow a bit, especially if they're just
> a single load type (which is the most common case). Maybe this problem
> needs to be approached another way? I've put the whole log over at
> https://gist.github.com/trini/e6772b2134e0eb44393364ea98729e06 and maybe
> something will jump out to you.

The fls/ffs patch [1] reduces the growth by around 80 bytes on ARM
platforms. I sent it separately for ease of review, but it really should
be applied before this series, since a good portion of the growth is due
to that one function call. It seems like MIPS also has this instruction
(and Linux uses it), so I can try putting together a patch for it as
well.

In the future, I would like to convert bl_len to bl_shift or something,
which would remove the need for fls (and going from bl_shift to bl_len
is just a shift). spl_load_info is used in a lot of places, so I wanted
to do that in a follow-up. On arches with efficient fls implementations
the difference is only around 3 instructions or so.

But fundamentally some of the problem is that the logic is being moved
into multiple translation units, so the compiler can't see enough to
make optimizations. For example, all filesystems use a bl_len of 1, so
all the multiplications/roundings/etc. can be eliminated if the compiler
can see that. For example, on arm64 copying spl_load into spl_fat.c (as
fat_load) and making it static saves us 100 bytes:

add/remove: 3/1 grow/shrink: 0/1 up/down: 324/-92 (232)
Function                                     old     new   delta
spl_load                                       -     160    +160
spl_simple_read                                -     104    +104
spl_fit_read                                   -      60     +60
file_fat_read                                 40       -     -40
spl_load_image_fat                           252     200     -52
Total: Before=66833, After=67065, chg +0.35%

add/remove: 2/1 grow/shrink: 1/0 up/down: 172/-40 (132)
Function                                     old     new   delta
spl_simple_read                                -     104    +104
spl_fit_read                                   -      60     +60
spl_load_image_fat                           252     260      +8
file_fat_read                                 40       -     -40
Total: Before=66833, After=66965, chg +0.20%

and even then, by inspection spl_simple_read isn't really taking
advantage of bl_len=1. Surprisingly, after enabling LTO this board is
+400 bytes (compared to without this series). It's just hard to beat
handwritten routines with a generic solution.

I would really like to consolidate this stuff, since every load method
ends up only supporting a few image types and having almost identical
implementations for what they do support.

--Sean

[1] https://lore.kernel.org/u-boot/20230731212734.7256-1-sean.anderson@seco.com/


More information about the U-Boot mailing list