STM32MP1 boot slow

Marek Vasut marex at denx.de
Thu Mar 26 17:27:50 CET 2020


On 3/26/20 5:19 PM, Simon Glass wrote:
> Hi Patrick,

Hi,

> On Wed, 25 Mar 2020 at 09:57, Patrick DELAUNAY <patrick.delaunay at st.com> wrote:
>>
>> Hi,
>>
>>> From: Marek Vasut <marex at denx.de>
>>> Sent: mercredi 25 mars 2020 00:39
>>>
>>> Hi,
>>>
>>> I was looking at the STM32MP1 boot time and I noticed it takes about 2 seconds
>>> to get to U-Boot.
>>
>> Thanks for the feedback.
>>
>> To be clear, SPL is not ST's priority, as we have many limitations (mainly in
>> power management) with the SPL boot chain (stm32mp15_basic_defconfig):
>> Rom code => SPL => U-Boot
>>
>> The recommended boot chain for STM32MP1 is Rom code => TF-A => U-Boot
>> (stm32mp15_trusted_defconfig).
>>
>>> One problem is the insane I2C timing calculation in stm32f7 i2c driver, which is
>>> almost a mallocator and CPU stress test and takes about 1 second to complete in
>>> SPL -- we need some simpler replacement for that, possibly the one in DWC I2C
>>> driver might do?
>>
>> Our first idea for managing these I2C settings (prescaler/timing settings) was to set the values
>> in the device tree, but that binding was refused, so the function stm32_i2c_choose_solution()
> 
> Was the binding refused in linux? Could we add something
> U-Boot-specific then? I think having 'early' timings, etc. is very
> handy. We are doing this on x86.
> 
> Of course it has traditionally been impossible to convince Linux
> people to add this sort of thing. Still, I think we should do it. Our
> U-Boot-specific files allow this.

Or reuse the DWC I2C driver timing calculation, which is real simple,
fast, and should be accurate enough.
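
To illustrate what a direct calculation can look like, here is a minimal
sketch. It is not the DWC driver's actual code: the struct and function
names are hypothetical, and it only converts the minimum SCL high/low times
from the I2C-bus specification into input-clock cycles. It ignores rise/fall
times and analog filter delays, which a real driver would have to add.

```c
#include <stdint.h>

/* Minimum SCL high/low times in ns, taken from the I2C-bus specification. */
struct i2c_scl_spec {
	uint32_t t_high_ns;
	uint32_t t_low_ns;
};

/* Hypothetical names, for illustration only. */
static const struct i2c_scl_spec scl_standard = { 4000, 4700 }; /* 100 kHz */
static const struct i2c_scl_spec scl_fast     = {  600, 1300 }; /* 400 kHz */

struct i2c_scl_counts {
	uint32_t hcnt;	/* input-clock cycles SCL stays high */
	uint32_t lcnt;	/* input-clock cycles SCL stays low */
};

/* Convert a duration in ns to clock cycles, rounding up. */
static uint32_t ns_to_cycles(uint32_t clk_hz, uint32_t ns)
{
	return (uint32_t)(((uint64_t)clk_hz * ns + 999999999ULL) /
			  1000000000ULL);
}

/* One multiplication and one division per value: no search loop, no malloc. */
static struct i2c_scl_counts i2c_calc_scl(uint32_t clk_hz,
					  const struct i2c_scl_spec *spec)
{
	struct i2c_scl_counts c;

	c.hcnt = ns_to_cycles(clk_hz, spec->t_high_ns);
	c.lcnt = ns_to_cycles(clk_hz, spec->t_low_ns);
	return c;
}
```

Rounding up keeps the generated high/low phases at or above the specified
minimums, at the cost of a slightly slower bus.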

>> provides the best settings for any input clock and I2C frequency (it is called on each probe).
>>
>> But it is a brute-force, non-optimal approach: it tries every solution to find the best one.
>> The performance problem of this loop (code shared between the Linux, U-Boot and TF-A drivers)
>> had already been seen and checked on the ST side in the TF-A context.
> 
> We should be able to calculate it, like with dw-i2c.

Yes

>> We tried to improve the solution, without success; in the end the performance issue
>> was solved by enabling the dcache in TF-A before executing this loop.
> 
> I would like to see patches to enable the cache. We did this some
> years ago in a Chromebook and it made a big difference. It is not that
> hard.

ACK. Why did the Chromebook patches never make it upstream?

>> But since the data cache is not enabled in SPL, this loop performs terribly there.
>>
>> We need to dig into this topic again from the U-Boot point of view
>> (in SPL and also in U-Boot, before and after relocation).
>>
>> I have also shared this issue with the ST owner of this code.
>>
>> For information, I added some traces and measured the same code on the DK2 board:
>> - 440ms in SPL (dcache OFF)
>> - 36ms in U-Boot (dcache ON)
>>
>>> Another item I found is that, in U-Boot, initf_dm() takes about half a second and so
>>> does serial_init(). I didn't dig into it to find out why, but I suspect it has to do with
>>> the massive amount of UCLASSes the DM has to traverse OR with the CPU being
>>> slow at that point, as the clock driver didn't get probed just yet.
>>>
>>> Thoughts ?
>>
>> Yes, it is the first parse of the device tree, and it is really slow; it is directly linked to
>> the device tree size and libfdt.
> 
> I wonder if we can improve this. There was a change to how the drivers
> were bound (changing the ordering). We could perhaps revert that for
> SPL.

Link ?

[...]

-- 
Best regards,
Marek Vasut

