i.MX8MP SPL failures due to memory corruption/overflow?

Frieder Schrempf frieder.schrempf at kontron.de
Thu Mar 16 11:05:10 CET 2023


Hi Emanuele,

On 15.03.23 22:25, Emanuele Ghidoli wrote:
> On 15/03/2023 16:24, Frieder Schrempf wrote:
>> On 15.03.23 15:42, Frieder Schrempf wrote:
>>> On 15.03.23 15:17, Michael Nazzareno Trimarchi wrote:
>>>> Hi
>>>>
>>>> On Wed, Mar 15, 2023 at 3:13 PM Frieder Schrempf
>>>> <frieder.schrempf at kontron.de> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to bring up a new board based on the i.MX8MP and I have an
>>>>> issue I'm hoping someone can help solve.
>>>>>
>>>>> I'm seeing failures in the early SPL code, usually in the DDR
>>>>> initialization. Often they look like:
>>>>>
>>>>>    U-Boot SPL 2023.04-rc3 (Mar 07 2023 - 14:32:34 +0000)
>>>>>    Training FAILED
>>>>>    Failed to initialize DDR RAM!
>>>>>    ### ERROR ### Please RESET the board ###
>>>>>
>>>>> But sometimes ddr_init() doesn't even return an error, and only the
>>>>> subsequent get_ram_size() call, which probes the memory, fails.
>>>>>
>>>>
>>>> In my experience, you have run out of space in the CPU's internal
>>>> memory, which means the stack overlaps with the code. Changing a
>>>> printf moves things around a bit, so the problem is always there, but
>>>> whether it shows depends on what gets overwritten.
>>>
>>> Thanks for your reply. That's exactly what I'm thinking, too.
>>>
>>>>
>>>>> The strange thing is that the issues appear or disappear
>>>>> deterministically on the binary level. This means I sometimes get a
>>>>> U-Boot binary which runs just fine in 100% of cases. Then I change for
>>>>> example one of the following:
>>>>>
>>>>> * Adding a single printf() somewhere in the board's spl.c
>>>>> * Using the same binary but booting from SD card instead of USB loader
>>>>> * Using the same source but switching from the OS cross compiler to the
>>>>>   one from Yocto/OE
>>>>>
>>>>> And afterwards I get a 100% failure rate with an error like the one
>>>>> described above.
>>>>>
>>>>> My suspicion is that there is some memory corruption/conflict. My SPL
>>>>> is quite large and I wonder if it exceeds some limit.
>>>>>
>>>>> SPL is loaded to 0x920000 and CONFIG_SPL_STACK is set to 0x960000,
>>>>> which leaves 256 KiB in between for the SPL. But all i.MX8MP boards
>>>>> seem to set CONFIG_SPL_MAX_SIZE=0x26000 (152 KiB) for some reason. My
>>>>> u-boot-spl-ddr.bin currently has around 193 KiB, but I don't get any
>>>>> warning about exceeding SPL_MAX_SIZE.
>>>>>
>>>>> My questions:
>>>>>
>>>>> * Why is CONFIG_SPL_MAX_SIZE set to 152 KiB?
>>>
>>> I guess the space between the end of the SPL code and the SPL stack is
>>> meant for the DDR firmware, which would explain why I get failures once
>>> the SPL exceeds 152 KiB.
>>
>> Still, that doesn't really make sense to me at the moment: since
>> u-boot-spl-ddr.bin already contains the DDR firmware, it should be fine
>> for it to exceed 152 KiB. My u-boot-spl.bin (without DDR firmware) is
>> only 135 KiB.
>>
>> Sorry for spamming you by thinking out loud... ;)
>>
>>>
>>> Now I also understand the reason why the power init code was implemented
>>> using legacy non-DM drivers in other i.MX8MP boards. I probably also
>>> need to do this to save some space.
>>>
>>>>> * Why is there no warning in my case?
>>>
>>> Still, I fail to see why there isn't any error or where the size check
>>> is even implemented.
>>>
>>>>> * Any other ideas or pointers?
>>>>>
>>>>> Thanks for your help!
>>>>>
>>>>> Best regards
>>>>> Frieder
>>
> 
> Hello,
> I ran into a similar problem.
> 
> Some hints:
> - commit 5004901efb3b ("board_init: Do not reserve MALLOC_F area on stack
>   if non-zero MALLOC_F_ADDR"), but you should already have it
> - Reduce SPL_SYS_MALLOC_F_LEN (i.e. set it to something other than its
>   default value). Normally that area is not used much. The stack starts
>   below the heap area and, if I remember correctly, the start address of
>   the heap area depends on this config. And its default value is equal
>   to SYS_MALLOC_F_LEN, which is normally large.
> 
> Rasmus's suggestions are valuable. I used a rather similar approach to
> find that the stack / gd (global data) was overlapping the DDR firmware /
> cfg.

Thanks a lot for the additional pointers. I do have commit 5004901efb3b,
but I hadn't looked at MALLOC_F_ADDR before. It seems some i.MX8MP boards
use it to place the malloc area in the separate OCRAM_S (0x184000)
instead of OCRAM, which is interesting and another option I didn't know
about.

Thanks
Frieder


More information about the U-Boot mailing list