STM32MP1 boot slow

Marek Vasut marex at denx.de
Wed Mar 25 17:09:48 CET 2020


On 3/25/20 4:57 PM, Patrick DELAUNAY wrote:
> Hi,

Hi,

>> From: Marek Vasut <marex at denx.de>
>> Sent: mercredi 25 mars 2020 00:39
>>
>> Hi,
>>
>> I was looking at the STM32MP1 boot time and I noticed it takes about 2 seconds
>> to get to U-Boot.
> 
> Thanks for the feedback.
> 
> To be clear, the SPL is not ST's priority, as we have many limitations (mainly on
> power management) for the SPL boot chain (stm32mp15_basic_defconfig):
> ROM code => SPL => U-Boot
> 
> The recommended boot chain for STM32MP1 is ROM code => TF-A => U-Boot
> (stm32mp15_trusted_defconfig).

I don't want to use TF-A because it's problematic at best.

However, these issues I listed here are present also in U-Boot, so this
comment is irrelevant anyway.

>> One problem is the insane I2C timing calculation in stm32f7 i2c driver, which is
>> almost a mallocator and CPU stress test and takes about 1 second to complete in
>> SPL -- we need some simpler replacement for that, possibly the one in DWC I2C
>> driver might do?
> 
> Our first idea for managing these I2C settings (prescaler/timings) was to put the values
> in the device tree, but this binding was refused, so the function stm32_i2c_choose_solution()
> computes the best settings for any input clock and I2C frequency (it is called on each probe).
> 
> But it is a brute-force, non-optimal solution: it tries all the solutions to find the best one.
> The performance problem of this loop (code shared between the Linux / U-Boot / TF-A drivers)
> had already been seen and checked on the ST side in the TF-A context.
> 
> We tried to improve the solution, without success, and in the end the performance issue
> was solved by activating the dcache in TF-A before executing this loop.

That's not a solution but a workaround.

> But as the data cache is not activated in SPL, this loop has terrible performance.
> 
> We need to dig into this topic again from the U-Boot point of view
> (in SPL and also in U-Boot, before and after relocation).
> 
> And I have shared this issue with the ST owner of this code.
> 
> For information, I added some traces and measured the same code on the DK2 board:
> - 440ms in SPL (dcache OFF)
> - 36ms in U-Boot (dcache ON)

Still, this is a workaround.

The calculation should be simplified. And why do you even need all those
memory allocations in there ?
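
As a rough illustration of what "simplified" could mean here, below is a
minimal sketch of a closed-form calculation with no allocations: it picks the
smallest prescaler for which the SCL low/high counts fit their 8-bit fields.
This is not the algorithm of the stm32f7 or DWC I2C drivers. The names
i2c_simple_timing() and struct i2c_timing are hypothetical, a 50% duty cycle
is assumed, and SCLDEL/SDADEL as well as filter and synchronisation delays
are ignored, so it is not a drop-in replacement.

```c
#include <stdint.h>

/* Hypothetical result container; fields mirror PRESC/SCLL/SCLH of TIMINGR. */
struct i2c_timing {
	uint8_t presc;	/* 4-bit prescaler, 0..15 */
	uint8_t scll;	/* SCL low period minus one, 0..255 */
	uint8_t sclh;	/* SCL high period minus one, 0..255 */
};

/*
 * Assumption: SCL period ~= (SCLL + 1 + SCLH + 1) * (PRESC + 1) / clk_hz,
 * i.e. filter and synchronisation delays are neglected.
 * Returns 0 on success, -1 if no prescaler gives counts that fit.
 */
static int i2c_simple_timing(uint32_t clk_hz, uint32_t bus_hz,
			     struct i2c_timing *t)
{
	uint32_t presc;

	if (!clk_hz || !bus_hz || bus_hz > clk_hz)
		return -1;

	for (presc = 0; presc < 16; presc++) {
		/* Prescaled ticks per SCL period, rounded up so the bus
		 * never runs faster than requested. */
		uint64_t d = (uint64_t)(presc + 1) * bus_hz;
		uint32_t ticks = (uint32_t)((clk_hz + d - 1) / d);
		uint32_t low = (ticks + 1) / 2;	/* odd tick goes to low */
		uint32_t high = ticks - low;

		if (low < 1 || high < 1 || low > 256 || high > 256)
			continue;

		t->presc = (uint8_t)presc;
		t->scll = (uint8_t)(low - 1);
		t->sclh = (uint8_t)(high - 1);
		return 0;
	}

	return -1;
}
```

With an assumed 64 MHz input clock and a 100 kHz bus this loop exits on its
second iteration, so the cost does not depend on whether the dcache is on.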

>> Another item I found is that, in U-Boot, initf_dm() takes about half a second and so
>> does serial_init(). I didn't dig into it to find out why, but I suspect it has to do with
>> the massive amount of UCLASSes the DM has to traverse OR with the CPU being
>> slow at that point, as the clock driver didn't get probed just yet.
>>
>> Thoughts ?
> 
> Yes, it is the first parsing of the device tree, and it is really slow... directly linked to the device
> tree size and libfdt.
> 
> And because it is done before relocation (before dcache enable).
> 
> Measurement on DK2 = 649ms
> 
> It is another topic on my TODO list.
> 
> I want to explore livetree activation to reduce the DT parsing time.
>  
> And also activate the dcache in the pre-relocation stage
> (and potentially also in SPL as it was done in http://patchwork.ozlabs.org/patch/699899/)
> 
> Another solution (workaround ?) is to reduce the U-Boot device tree (remove all the nodes not used in
> U-Boot in the SoC file stm32mp157.dtsi or use /omit-if-no-ref/ for pin control nodes).
> 
> See the bootstage report on DK2, we have dm_f = 648ms
> 
> STM32MP> bootstage report
> Timer summary in microseconds (12 records):
>        Mark    Elapsed  Stage
>           0          0  reset
>     195,613    195,613  SPL
>     837,867    642,254  end SPL
>     840,117      2,250  board_init_f
>   2,739,639  1,899,522  board_init_r
>   3,066,815    327,176  id=64
>   3,103,377     36,562  id=65
>   3,104,078        701  main_loop
>   3,142,171     38,093  id=175
> 
> Accumulated time:
>                 38,124  dm_spl
>                 41,956  dm_r
>                648,861  dm_f
> 
> For information, the time is spent in
> 	dm_extended_scan_fdt
> 	=> dm_scan_fdt(blob, pre_reloc_only);
> 
> This time is reduced (by a few milliseconds)
> with http://patchwork.ozlabs.org/patch/1240117/
> 
> But only data cache activation before relocation should really improve this part.

For this one, I think we have no better options than the Dcache indeed.
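
To put the quoted bootstage report in perspective, the Elapsed column is just
the difference between consecutive marks, and dm_f can be compared against
the interval between the board_init_f and board_init_r marks. The sketch
below only redoes that arithmetic on the numbers from the report; the helper
names elapsed_us() and percent_of() are made up for illustration, and it
assumes dm_f is accounted entirely inside that interval.

```c
#include <stddef.h>
#include <stdint.h>

/* "Mark" column of the quoted bootstage report, in microseconds. */
static const uint32_t marks_us[] = {
	0,		/* reset */
	195613,		/* SPL */
	837867,		/* end SPL */
	840117,		/* board_init_f */
	2739639,	/* board_init_r */
	3066815,	/* id=64 */
	3103377,	/* id=65 */
	3104078,	/* main_loop */
	3142171,	/* id=175 */
};

/* Elapsed time of record i: difference to the previous mark. */
static uint32_t elapsed_us(size_t i)
{
	return i ? marks_us[i] - marks_us[i - 1] : 0;
}

/* Integer percentage of 'part' within 'whole', rounded down. */
static uint32_t percent_of(uint32_t part, uint32_t whole)
{
	return whole ? (uint32_t)((uint64_t)part * 100 / whole) : 0;
}
```

Here elapsed_us(4) gives the 1,899,522 us between the board_init_f and
board_init_r marks, and percent_of(648861, elapsed_us(4)) comes out at 34, so
dm_f alone is about a third of the pre-relocation time.
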
Thanks

