STM32MP1 boot slow

Patrick DELAUNAY patrick.delaunay at st.com
Wed Mar 25 16:57:50 CET 2020


Hi,

> From: Marek Vasut <marex at denx.de>
> Sent: mercredi 25 mars 2020 00:39
> 
> Hi,
> 
> I was looking at the STM32MP1 boot time and I noticed it takes about 2 seconds
> to get to U-Boot.

Thanks for the feedback.

To be clear, the SPL is not the ST priority, as we have many limitations (mainly on
power management) for the SPL boot chain (stm32mp15_basic_defconfig):
Rom code => SPL => U-Boot

The recommended boot chain for STM32MP1 is Rom code => TF-A => U-Boot
(stm32mp15_trusted_defconfig).

> One problem is the insane I2C timing calculation in stm32f7 i2c driver, which is
> almost a mallocator and CPU stress test and takes about 1 second to complete in
> SPL -- we need some simpler replacement for that, possibly the one in DWC I2C
> driver might do?

Our first idea for managing these I2C settings (prescaler/timing settings) was to put the values
in the device tree, but that binding was refused, so the function stm32_i2c_choose_solution()
computes the best settings for any input clock and I2C frequency (it is called on each probe).

But it is a brute-force and non-optimal solution: it tries every candidate to find the best one.
The performance problem of this loop (code shared between the Linux / U-Boot / TF-A drivers)
had already been seen and checked on the ST side in the TF-A context.
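
To illustrate why an exhaustive search of this kind is costly, here is a minimal
sketch in C. It is not the code of stm32_i2c_choose_solution(): the structure, the
function name choose_timing() and the timing model are assumptions made for the
example, and it ignores SCLDEL/SDADEL, rise/fall times and the filters that a real
driver has to take into account.

```c
#include <stdint.h>

/* Hypothetical, simplified result record; not the driver's real structure. */
struct timing_solution {
	uint8_t presc;      /* clock prescaler, 0..15 */
	uint8_t scll;       /* SCL low period,  0..255 */
	uint8_t sclh;       /* SCL high period, 0..255 */
	uint64_t error_ns;  /* |achieved SCL period - target SCL period| */
};

/*
 * Try every (presc, scll, sclh) combination and keep the one whose SCL
 * period is closest to the target. Simplified model (an assumption):
 *   period = (presc + 1) * ((scll + 1) + (sclh + 1)) * clock period
 * Returns 0 on success, -1 on invalid input.
 */
int choose_timing(uint32_t clk_hz, uint32_t bus_hz, struct timing_solution *best)
{
	if (!best || clk_hz == 0 || bus_hz == 0 || clk_hz > 1000000000U)
		return -1;

	uint64_t clk_ns = 1000000000U / clk_hz;    /* integer ns, rounded down */
	uint64_t target_ns = 1000000000U / bus_hz; /* integer ns, rounded down */

	best->presc = 0;
	best->scll = 0;
	best->sclh = 0;
	best->error_ns = UINT64_MAX;

	for (uint32_t presc = 0; presc < 16; presc++) {
		for (uint32_t scll = 0; scll < 256; scll++) {
			for (uint32_t sclh = 0; sclh < 256; sclh++) {
				uint64_t period = (uint64_t)(presc + 1) *
						  (scll + sclh + 2) * clk_ns;
				uint64_t err = period > target_ns ?
					       period - target_ns : target_ns - period;

				/* Strict comparison: the first best match is kept. */
				if (err < best->error_ns) {
					best->presc = (uint8_t)presc;
					best->scll = (uint8_t)scll;
					best->sclh = (uint8_t)sclh;
					best->error_ns = err;
				}
			}
		}
	}

	return 0;
}
```

In this sketch the inner body runs 16 * 256 * 256 = 1,048,576 times per call, so
with the data cache off every iteration pays for uncached memory accesses.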

We tried to improve the algorithm, without success; in the end the performance issue
was solved by activating the dcache in TF-A before executing this loop.

But as in SPL the data cache is not activated, this loop has terrible performance.

We need to dig into this topic again from the U-Boot point of view
(SPL and also U-Boot proper, before and after relocation).

I have also shared this issue with the ST owner of this code.

For information, I added some traces and measured the same code on the DK2 board:
- 440ms in SPL (dcache OFF)
- 36ms in U-Boot (dcache ON)

> Another item I found is that, in U-Boot, initf_dm() takes about half a second and so
> does serial_init(). I didn't dig into it to find out why, but I suspect it has to do with
> the massive amount of UCLASSes the DM has to traverse OR with the CPU being
> slow at that point, as the clock driver didn't get probed just yet.
>
> Thoughts ?

Yes, it is the first parsing of the device tree, and it is really slow... It is directly
linked to the device tree size and to libfdt.

It is also slow because it is done before relocation (before the dcache is enabled).

Measurement on DK2 = 649ms

It is another topic on my TODO list.

I want to explore livetree activation to reduce the DT parsing time.
 
And also to activate the dcache in the pre-relocation stage
(and potentially also in SPL, as was done in http://patchwork.ozlabs.org/patch/699899/)

Another solution (workaround ?) is to reduce the U-Boot device tree (remove all the nodes not used in
U-Boot from the SoC file stm32mp157.dtsi, or use /omit-if-no-ref/ for pinctrl nodes).
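
As an illustration of the /omit-if-no-ref/ approach, a pinctrl node can be marked
so that dtc drops it when no other node references its label. The fragment below is
only a sketch: the label example_pins_a, the node name and the pin chosen are
hypothetical, and it assumes the STM32_PINMUX macro from the stm32 pinfunc
dt-bindings header is already included by the SoC files.

```dts
&pinctrl {
	/* Hypothetical node: dropped by dtc unless a phandle references example_pins_a */
	/omit-if-no-ref/
	example_pins_a: example-0 {
		pins {
			pinmux = <STM32_PINMUX('A', 0, ANALOG)>;
		};
	};
};
```

Nodes removed this way no longer have to be walked by libfdt during the
pre-relocation scan.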

See the bootstage report on DK2: we have dm_f = 649ms (648,861 us)

STM32MP> bootstage report
Timer summary in microseconds (12 records):
       Mark    Elapsed  Stage
          0          0  reset
    195,613    195,613  SPL
    837,867    642,254  end SPL
    840,117      2,250  board_init_f
  2,739,639  1,899,522  board_init_r
  3,066,815    327,176  id=64
  3,103,377     36,562  id=65
  3,104,078        701  main_loop
  3,142,171     38,093  id=175

Accumulated time:
                38,124  dm_spl
                41,956  dm_r
               648,861  dm_f

For information, the time is spent in
	dm_extended_scan_fdt
	=> dm_scan_fdt(blob, pre_reloc_only);

This time is reduced (by a few milliseconds)
with http://patchwork.ozlabs.org/patch/1240117/

But only data cache activation before relocation should really improve this part.

> 
> --
> Best regards,
> Marek Vasut

Regards
Patrick

