[PATCH v2 2/7] common: binman: Calling initr_binman() when BINMAN_FDT
Michal Simek
michal.simek at amd.com
Tue Dec 10 13:40:50 CET 2024

Hi Simon,

On 12/9/24 20:27, Simon Glass wrote:
> Hi Michal,
>
> On Mon, 9 Dec 2024 at 11:34, Michal Simek <michal.simek at amd.com> wrote:
>>
>>
>>
>> On 12/9/24 16:47, Simon Glass wrote:
>>> Hi,
>>>
>>> On Mon, 9 Dec 2024 at 08:32, Tom Rini <trini at konsulko.com> wrote:
>>>>
>>>> On Mon, Dec 09, 2024 at 04:26:15PM +0100, Michal Simek wrote:
>>>>>
>>>>>
>>>>> On 12/6/24 20:20, Simon Glass wrote:
>>>>>> On Fri, 1 Nov 2024 at 03:18, Michal Simek <michal.simek at amd.com> wrote:
>>>>>>>
>>>>>>> Calling an empty function when BINMAN_FDT is adding +64B for nothing, which
>>>>>>> does not help on size-sensitive configurations such as Xilinx mini configurations.
>>>>>>>
>>>>>>> Signed-off-by: Michal Simek <michal.simek at amd.com>
>>>>>>> ---
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - new patch
>>>>>>>
>>>>>>> From my perspective there is no reason to call an empty function. It just
>>>>>>> increases the footprint for nothing and we are not far from that limit now.
>>>>>>>
>>>>>>> ---
>>>>>>> common/board_r.c | 7 +++----
>>>>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> Reviewed-by: Simon Glass <sjg at chromium.org>
>>>>>>
>>>>>> This is a bit odd, though. Do you have LTO enabled?
>>>>>>
>>>>>
>>>>> Yes, LTO is enabled, and there are other candidates like this.
>>>>> Is LTO able to fix function arrays which call empty functions?
>>>>>
>>>>> (without this patch)
>>>>>
>>>>> 00000000fffc0eb4 <initr_of_live>:
>>>>> fffc0eb4: 52800000 mov w0, #0x0 // #0
>>>>> fffc0eb8: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0ebc <initr_dm_devices>:
>>>>> fffc0ebc: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ec0: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0ec4 <initr_bootstage>:
>>>>> fffc0ec4: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ec8: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0ecc <power_init_board>:
>>>>> fffc0ecc: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ed0: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0ed4 <initr_announce>:
>>>>> fffc0ed4: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ed8: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0edc <initr_binman>:
>>>>> fffc0edc: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ee0: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0ee4 <initr_status_led>:
>>>>> fffc0ee4: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ee8: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0eec <initr_boot_led_blink>:
>>>>> fffc0eec: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ef0: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0ef4 <initr_boot_led_on>:
>>>>> fffc0ef4: 52800000 mov w0, #0x0 // #0
>>>>> fffc0ef8: d65f03c0 ret
>>>>>
>>>>> 00000000fffc0efc <initr_lmb>:
>>>>> fffc0efc: 52800000 mov w0, #0x0 // #0
>>>>> fffc0f00: d65f03c0 ret
>>>>
>>>> No, but maybe Simon would prefer if we marked all of the could-be-empty
>>>> functions as __maybe_unused and did:
>>>> CONFIG_IS_ENABLED(BINMAN_FDT, initr_binman),
>>>> etc in the list instead?
>>>
>>> Yes that looks better.
>>
>> But then we are talking about using a macro inside the array, at best together
>> with #ifdefs. Or maybe I am not seeing what you are suggesting.
>>
>>>
>>> Michal, see also [1] in case you can work out why it 'stopped
>>> working'. I could have sworn inlining the function was a win when it
>>> was applied, but no amount of toolchain juggling could make it be a
>>> win when I came back to it later.
>>
>> Are you saying that it worked in the past?
>
> I wasn't able to verify that post facto, but I believe I do remember
> checking it at the time. If you read the original commit message:
>
> 47870afab92 initcall: Move to inline function
>
> The board_r init function was complaining that we are looping through
> an array, calling all our tiny init stubs sequentially via indirect
> function calls (which can't be speculated, so they are slow).
>
> The solution to that is pretty easy though. All we need to do is inline
> the function that loops through the functions and the compiler will
> automatically convert almost all indirect calls into direct inlined code.
>
> With this patch, the overall code size drops (by 40 bytes on riscv64)
> and boot time should become measurably faster for every target.
>
> Signed-off-by: Alexander Graf <agraf at suse.de>
>
> Despite this hopeful sentiment, I seriously doubt any improvement in boot time.

I am not able to reproduce this observation on arm64 or riscv64.
The loop is not unrolled even when the -funroll-all-loops flag is passed.
Maybe a different toolchain is needed to see this behavior.
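
For reference, here is a minimal standalone sketch (not the actual board_r.c code) of the two ideas discussed in this thread: dropping a disabled entry from the init array instead of calling an empty stub, and walking the array from an inline helper. DEMO_BINMAN_FDT, IF_BINMAN_FDT() and run_init_sequence() are hypothetical names made up for illustration; the thread suggests CONFIG_IS_ENABLED() and __maybe_unused for the real thing.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig symbol; 0 means "feature disabled". */
#define DEMO_BINMAN_FDT 0

/* Emits "entry," when the feature is enabled and nothing otherwise. */
#if DEMO_BINMAN_FDT
#define IF_BINMAN_FDT(entry) entry,
#else
#define IF_BINMAN_FDT(entry)
#endif

typedef int (*init_fnc_t)(void);

static int calls; /* number of init functions that actually ran */

static int initr_announce(void) { calls++; return 0; }
static int initr_lmb(void) { calls++; return 0; }

/* May end up unreferenced, so tell GCC/Clang not to warn about it. */
static int __attribute__((unused)) initr_binman(void) { calls++; return 0; }

/* The disabled entry takes no slot in the table at all. */
static const init_fnc_t init_sequence[] = {
	initr_announce,
	IF_BINMAN_FDT(initr_binman)
	initr_lmb,
	NULL, /* terminator */
};

/* Inline helper that walks the table and stops at the first error. */
static inline int run_init_sequence(const init_fnc_t *seq)
{
	for (; *seq; seq++) {
		int ret = (*seq)();

		if (ret)
			return ret;
	}
	return 0;
}
```

With DEMO_BINMAN_FDT set to 0 the table holds two functions plus the terminator, so no stub needs to be referenced or called. Whether the compiler then turns the loop into direct calls is a separate question and depends on the toolchain.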

Thanks,
Michal