[PATCH v5 00/11] spl: Use common function for loading/parsing images

Thu Aug 3 06:41:51 CEST 2023

On 8/3/23 03:31, Tom Rini wrote:
> On Wed, Aug 02, 2023 at 09:13:21PM -0400, Sean Anderson wrote:
>> On 8/2/23 16:11, Tom Rini wrote:
>>> On Mon, Jul 31, 2023 at 06:42:52PM -0400, Sean Anderson wrote:
>>>
>>>> This series adds support for loading all image types (Legacy, FIT (with
>>>> and without LOAD_FIT_FULL), and i.MX) to the MMC, SPI, NOR, NET, FAT,
>>>> and EXT load methods. It does this by introducing a helper function
>>>> which handles the minutiae of invoking the proper parsing function, and
>>>> reading the rest of the image.
>>>>
>>>> Hopefully, this will make it easier for load methods to support all
>>>> image types that U-Boot supports, without having undocumented
>>>> unsupported image types. I applied this to several loaders which were
>>>> invoking spl_load_simple_fit and/or spl_parse_image_header, but I did
>>>> not use it with others (e.g. DFU/RAM) which had complications in the
>>>> mix.
>>>>
>>>> In order to meet size requirements for some boards, the first two
>>>> patches of this series are general size-reduction changes. The ffs/fls
>>>> change in particular should reduce code size across the board. The
>>>> malloc change also has the potential to reduce code size. I've left it
>>>> disabled by default, but maybe we can reverse that in the future.
>>>>
>>>> Here's some bloat-o-meter for j7200_evm_a72_defconfig with ext4 support
>>>> enabed:
>>>>
>>> add/remove: 2/1 grow/shrink: 1/5 up/down: 444/-584 (-140)
>>>> Function                                     old     new   delta
>>>> spl_load                                       -     216    +216
>>> spl_simple_read                                -     184    +184
>>>> spl_fit_read                                  60     104     +44
>>>> file_fat_read                                 40       -     -40
>>>> spl_nor_load_image                           120      76     -44
>>>> spl_load_image_ext                           364     296     -68
>>> spl_load_image_fat                           320     200    -120
>>>> spl_spi_load_image                           304     176    -128
>>>> spl_mmc_load                                 716     532    -184
>>>> Total: Before=233618, After=233478, chg -0.06%
>>>>
>>>> For most boards with a few load methods, this series should break even.
>>>> However, in the worst case this series will add around 100 bytes.
>>>>
>>>> I have only tested a few loaders. Please try booting your favorite board
>>>> with NOR/SPI flash or SPI falcon mode.
>>>
>>>> On sama5d27_wlsom1_ek_mmc_defconfig, 100 bytes is too much, so this
>>>> series depends on [1] to fit everything in. CI run at [2].
>>>>
>>>> [1] https://lore.kernel.org/u-boot/20230731223327.109865-1-sean.anderson@seco.com/
>>>> [2] https://source.denx.de/u-boot/custodians/u-boot-clk/-/pipelines/17116
>>>
>>> I've boot-tested this on my lab and a handful of SD+FAT or SD+raw boot
>>> devices, and it's all worked.  I also did my usual world build
>>> before/after to check out the size changes and well, I don't know.  The
>>> size wins come on platforms where there's both FAT+EXT support.  And
>>> then on mips with say mt7620_rfb I wonder if we're loosing
>>> functionality.
>>
>> Yes we are. I'll address this in v6.
>>
>>> But most platforms grow a bit, especially if they're just
>>> a single load type (which is the most common case). Maybe this problem
>>> needs to be approached another way? I've put the whole log over at
>>> https://gist.github.com/trini/e6772b2134e0eb44393364ea98729e06 and maybe
>>> something will jump out to you.
>>
>> The fls/ffs patch [1] reduces the growth by around 80 bytes on ARM
>> platforms. I sent it separately for ease of review, but it really should
>> be applied before this series, since a good portion of the growth is due
>> to that one function call. It seems like MIPS also has this instruction
>> (and Linux uses it), so I can try putting together a patch for it as
>> well.
>>
>> In the future, I would like to convert bl_len to bl_shift or something,
>> which would remove the need for fls (and going from bl_shift to bl_len
>> is just a shift). spl_load_info is used in a lot of places, so I wanted
>> to do that in a follow-up. On arches with efficient fls implementations
>> the difference is only around 3 instructions or so.
>>
>> But fundamentally some of the problem is that the logic is being moved
>> into multiple translation units, so the compiler can't see enough to
>> make optimizations. For example, all filesystems use a bl_len of 1, so
>> all the multiplications/roundings/etc. can be eliminated if the compiler
>> can see that. For example, on arm64 copying spl_load into spl_fat.c (as
>> fat_load) and making it static saves us 100 bytes:
>>
>> add/remove: 3/1 grow/shrink: 0/1 up/down: 324/-92 (232)
>> Function                                     old     new   delta
>> spl_load                                       -     160    +160
>> spl_simple_read                                -     104    +104
>> spl_fit_read                                   -      60     +60
>> file_fat_read                                 40       -     -40
>> spl_load_image_fat                           252     200     -52
>> Total: Before=66833, After=67065, chg +0.35%
>>
>> add/remove: 2/1 grow/shrink: 1/0 up/down: 172/-40 (132)
>> Function                                     old     new   delta
>> spl_simple_read                                -     104    +104
>> spl_fit_read                                   -      60     +60
>> spl_load_image_fat                           252     260      +8
>> file_fat_read                                 40       -     -40
>> Total: Before=66833, After=66965, chg +0.20%
>>
>> and even then, by inspection spl_simple_read isn't really taking
>> advantage of bl_len=1. Surprisingly, after enabling LTO this board is
>> +400 bytes (compared to without this series). It's just hard to beat
>> handwritten routines with a generic solution.
>>
>> I would really like to consolidate this stuff, since every load method
>> ends up only supporting a few image types and having almost identical
>> implementations for what they do support.
>>
>> --Sean
>>
>> [1] https://lore.kernel.org/u-boot/20230731212734.7256-1-sean.anderson@seco.com/
>
> Thanks for digging more, lets keep going in this direction then.
>

Do we need to pass sectors to the info->load() functions?

Only nand and mmc use bl_len != 1. Can we move the conversion to sectors
and the alignment checks to functions called by info->load()?

I would assume that this can avoid part of the code size increase.

Best regards

Heinrich