STM32MP1 boot slow
Marek Vasut
marex at denx.de
Wed Mar 25 17:09:48 CET 2020
On 3/25/20 4:57 PM, Patrick DELAUNAY wrote:
> Hi,
Hi,
>> From: Marek Vasut <marex at denx.de>
>> Sent: mercredi 25 mars 2020 00:39
>>
>> Hi,
>>
>> I was looking at the STM32MP1 boot time and I noticed it takes about 2 seconds
>> to get to U-Boot.
>
> Thanks for the feedback.
>
> To be clear, the SPL is not the ST priority, as we have many limitations (mainly in
> power management) for the SPL boot chain (stm32mp15_basic_defconfig):
> Rom code => SPL => U-Boot
>
> The recommended boot chain for STM32MP1 is Rom code => TF-A => U-Boot
> (stm32mp15_trusted_defconfig).
I don't want to use TF-A because it's problematic at best.
However, the issues I listed here are also present in U-Boot, so this
comment is irrelevant anyway.
>> One problem is the insane I2C timing calculation in the stm32f7 i2c driver, which is
>> almost a mallocator and CPU stress test and takes about 1 second to complete in
>> SPL -- we need some simpler replacement for that; perhaps the one in the DWC I2C
>> driver would do?
>
> Our first idea for managing these I2C settings (prescaler/timing values) was to set them
> in the device tree, but that binding was refused, so the function stm32_i2c_choose_solution()
> computes the best settings for any input clock and I2C frequency (it is called on each probe).
>
> But it is a brute-force, non-optimal approach: it tries all the solutions to find the best one.
> The performance problem of this loop (code shared between the Linux / U-Boot / TF-A drivers)
> had already been seen and checked on the ST side in the TF-A context.
>
> We tried to improve the algorithm, without success; in the end the performance issue
> was solved by enabling the dcache in TF-A before executing this loop.
That's not a solution but a workaround.
> But as the data cache is not enabled in SPL, this loop has terrible performance there.
>
> We need to dig into this topic again from the U-Boot point of view
> (SPL and also U-Boot proper, before and after relocation).
>
> I have also shared this issue with the ST owner of this code.
>
> For information, I added some traces and measured the same code on the DK2 board:
> - 440ms in SPL (dcache OFF)
> - 36ms in U-Boot (dcache ON)
Still, this is a workaround.
The calculation should be simplified. And why do you even need all those
memory allocations in there?
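
To illustrate what I mean by a simpler calculation, here is a rough
standalone sketch. It is neither the DWC driver code nor the stm32f7
driver code; the struct and function names are made up. It assumes the
usual TIMINGR model where one prescaled tick is (PRESC + 1) kernel clock
cycles and the SCL low/high periods are (SCLL + 1) and (SCLH + 1) ticks.
It deliberately ignores SCLDEL/SDADEL, the analog/digital filters,
synchronization delays and rise/fall times, so the real bus frequency
would come out somewhat lower. It picks the smallest prescaler that fits,
needs no allocation and at most 16 loop iterations.

```c
#include <stdint.h>

/* Hypothetical container for the three TIMINGR fields computed here. */
struct i2c_timing_sketch {
	uint8_t presc;	/* prescaler field, 0..15 */
	uint8_t scll;	/* SCL low period in prescaled ticks, minus 1 */
	uint8_t sclh;	/* SCL high period in prescaled ticks, minus 1 */
};

/*
 * Simplified model (assumption): SCL period = (SCLL + 1 + SCLH + 1) ticks,
 * one tick = (PRESC + 1) / clk_hz. Returns 0 on success, -1 if no setting
 * fits the register field widths.
 */
static int i2c_timing_sketch_compute(uint32_t clk_hz, uint32_t bus_hz,
				     struct i2c_timing_sketch *t)
{
	uint32_t total, ticks, high, low, presc;

	if (!clk_hz || !bus_hz || !t)
		return -1;

	/* Kernel clock cycles per SCL period, rounded up (never too fast) */
	total = (clk_hz + bus_hz - 1) / bus_hz;

	for (presc = 0; presc < 16; presc++) {
		/* Prescaled ticks per SCL period, rounded up */
		ticks = (total + presc) / (presc + 1);
		if (ticks < 3)
			return -1;	/* kernel clock too slow for this bus speed */
		if (ticks > 512)
			continue;	/* SCLL + SCLH cannot hold that many ticks */

		/* Even split up to 100 kHz, roughly 2/3 low time above that */
		high = (bus_hz > 100000) ? ticks / 3 : ticks / 2;
		low = ticks - high;
		if (low > 256)
			continue;	/* low period overflows the 8-bit field */

		t->presc = (uint8_t)presc;
		t->scll = (uint8_t)(low - 1);
		t->sclh = (uint8_t)(high - 1);
		return 0;
	}

	return -1;
}
```

With an assumed 64 MHz kernel clock and a 100 kHz bus, this yields
PRESC=1, SCLL=159, SCLH=159, i.e. a 10 us SCL period in this model.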
>> Another item I found is that, in U-Boot, initf_dm() takes about half a second and so
>> does serial_init(). I didn't dig into it to find out why, but I suspect it has to do with
>> the massive amount of UCLASSes the DM has to traverse OR with the CPU being
>> slow at that point, as the clock driver didn't get probed just yet.
>>
>> Thoughts ?
>
> Yes, it is the first parsing of the device tree, and it is really slow; the time is directly
> linked to the device tree size and libfdt.
>
> It is also because it is done before relocation (before the dcache is enabled).
>
> Measurement on DK2 = 649ms
>
> It is another topic on my TODO list.
>
> I want to explore livetree activation to reduce the DT parsing time.
>
> And also enable the dcache in the pre-relocation stage
> (and potentially also in SPL, as was done in http://patchwork.ozlabs.org/patch/699899/)
>
> Another solution (workaround?) is to reduce the U-Boot device tree (remove all the nodes not used in
> U-Boot from the SoC file stm32mp157.dtsi, or use /omit-if-no-ref/ for pin control nodes).
>
> See the bootstage report on DK2; we have dm_f = 648ms
>
> STM32MP> bootstage report
> Timer summary in microseconds (12 records):
>        Mark    Elapsed  Stage
>           0          0  reset
>     195,613    195,613  SPL
>     837,867    642,254  end SPL
>     840,117      2,250  board_init_f
>   2,739,639  1,899,522  board_init_r
>   3,066,815    327,176  id=64
>   3,103,377     36,562  id=65
>   3,104,078        701  main_loop
>   3,142,171     38,093  id=175
>
> Accumulated time:
>                 38,124  dm_spl
>                 41,956  dm_r
>                648,861  dm_f
>
> For information, the time is spent in
> dm_extended_scan_fdt
> => dm_scan_fdt(blob, pre_reloc_only);
>
> This time is reduced (by a few milliseconds)
> with http://patchwork.ozlabs.org/patch/1240117/
>
> But only enabling the data cache before relocation should really improve this part.
For this one, I think we indeed have no better option than the dcache.
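
To put the dm_f number in perspective, the Elapsed column of the
bootstage report quoted above can be recomputed from the Mark values with
a few lines of host-side C. This is only a sketch, not U-Boot code: the
struct and helper names are made up, and the numbers are copied from the
report.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of the quoted report: stage name and absolute mark in us. */
struct stage_sketch {
	const char *name;
	uint32_t mark_us;
};

/* Marks copied from the DK2 bootstage report quoted above */
static const struct stage_sketch stages[] = {
	{ "reset",        0 },
	{ "SPL",          195613 },
	{ "end SPL",      837867 },
	{ "board_init_f", 840117 },
	{ "board_init_r", 2739639 },
	{ "id=64",        3066815 },
	{ "id=65",        3103377 },
	{ "main_loop",    3104078 },
	{ "id=175",       3142171 },
};

/* Elapsed time of entry i: its mark minus the previous entry's mark */
static uint32_t elapsed_us(size_t i)
{
	return i ? stages[i].mark_us - stages[i - 1].mark_us : 0;
}

/* Integer percentage of part within whole, rounded down */
static unsigned int percent_of(uint32_t part, uint32_t whole)
{
	return (unsigned int)((uint64_t)part * 100 / whole);
}
```

For example, elapsed_us(4) gives the 1,899,522 us between the
board_init_f and board_init_r marks, and percent_of(648861, elapsed_us(4))
gives 34. So, assuming dm_f is accumulated entirely inside that interval,
the DT scan is about a third of the pre-relocation time.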
Thanks