i.MX8MP SPL failures due to memory corruption/overflow?

Frieder Schrempf frieder.schrempf at kontron.de
Thu Mar 16 11:00:16 CET 2023


Hi Rasmus,

On 15.03.23 16:59, Rasmus Villemoes wrote:
> On 15/03/2023 16.24, Frieder Schrempf wrote:
>> On 15.03.23 15:42, Frieder Schrempf wrote:
>>> On 15.03.23 15:17, Michael Nazzareno Trimarchi wrote:
>>>> Hi
>>>>
>>>> On Wed, Mar 15, 2023 at 3:13 PM Frieder Schrempf
>>>> <frieder.schrempf at kontron.de> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to bring up a new board based on the i.MX8MP and I have an
>>>>> issue I'm hoping someone can help me solve.
>>>>>
>>>>> I'm seeing failures in the early SPL code, usually in the DDR
>>>>> initialization. Often they look like:
>>>>>
>>>>>   U-Boot SPL 2023.04-rc3 (Mar 07 2023 - 14:32:34 +0000)
>>>>>   Training FAILED
>>>>>   Failed to initialize DDR RAM!
>>>>>   ### ERROR ### Please RESET the board ###
>>>>>
>>>>> But sometimes ddr_init() doesn't even return an error, and only the
>>>>> subsequent get_ram_size() call, which probes the memory, fails.
>>>>>
>>>>
>>>> In my experience this means you are running out of space in the CPU's
>>>> internal memory, so the stack overlaps with the code. Adding a printf
>>>> shifts things a bit. The problem is always there, but the symptom
>>>> depends on what gets overwritten.
>>>
>>> Thanks for your reply. That's exactly what I'm thinking, too.
>>>
>>>>
>>>>> The strange thing is that the issues appear or disappear
>>>>> deterministically on the binary level. This means I sometimes get a
>>>>> U-Boot binary which runs just fine in 100% of cases. Then I change for
>>>>> example one of the following:
>>>>>
>>>>> * Adding a single printf() somewhere in the board's spl.c
>>>>> * Using the same binary but booting from SD card instead of USB loader
>>>>> * Using the same source but switching from the OS cross compiler to the
>>>>> one from Yocto/OE
>>>>>
>>>>> And afterwards I get 100% failure rate with an error as described above.
>>>>>
>>>>> My suspicion is that there is some memory corruption/conflict. My SPL is
>>>>> quite large and I wonder if it exceeds some limit.
>>>>>
>>>>> SPL is loaded to 0x920000 and CONFIG_SPL_STACK is set to 0x960000, which
>>>>> leaves 256 KiB in between for the SPL. But all i.MX8MP boards seem to
>>>>> set CONFIG_SPL_MAX_SIZE=0x26000 (152 KiB) for some reason. My
>>>>> u-boot-spl-ddr.bin is currently around 193 KiB in size, but I don't
>>>>> get any warning about exceeding SPL_MAX_SIZE.
> 
> I also ran into this problem a while back, but that was back when the
> ddr firmware files were padded to 16K and 32K each to make the magic
> offset computations work; now that binman symbols are used, they only
> take up as much space as they actually use (give or take some 4-byte
> padding perhaps), and I no longer need the debug code I put in place in
> our 2022.07 branch.
> 
> Remember that from the stack, the initial (and in SPL only) malloc arena
> is carved out, and if you haven't adjusted SPL_SYS_MALLOC_F_LEN, you
> probably have that set to the default SYS_MALLOC_F_LEN, which in turn
> (on imx8m) defaults to 0x10000 aka 64KiB. So that could easily explain
> why you collide with the firmware.

Ok, that's something I missed before and it provides a good explanation
for my problems.
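
To put rough numbers on it, here is a small host-side sketch of the budget
using the values from this thread. It assumes the malloc arena sits directly
below CONFIG_SPL_STACK and that the whole u-boot-spl-ddr.bin is placed at the
load address. It ignores global data and the stack's own usage, so the real
headroom is smaller still. The function name spl_headroom() is made up for
illustration and is not part of U-Boot.

```c
#include <stdio.h>

#define SPL_LOAD_ADDR  0x920000UL      /* SPL load address (from this thread) */
#define SPL_STACK_TOP  0x960000UL      /* CONFIG_SPL_STACK (from this thread) */
#define MALLOC_F_LEN   0x10000UL       /* default SYS_MALLOC_F_LEN on imx8m */
#define SPL_IMAGE_SIZE (193UL * 1024)  /* approximate size of u-boot-spl-ddr.bin */

/*
 * Bytes between the end of the image and the bottom of the malloc arena.
 * A negative result means the two regions overlap.
 * Assumption: the arena is carved out directly below stack_top.
 */
static long spl_headroom(unsigned long load, unsigned long stack_top,
                         unsigned long malloc_len, unsigned long image_size)
{
    unsigned long arena_bottom = stack_top - malloc_len;
    unsigned long image_end = load + image_size;

    return (long)arena_bottom - (long)image_end;
}

/* Print the result for the values defined above. */
static void spl_headroom_report(void)
{
    printf("headroom: %ld bytes\n",
           spl_headroom(SPL_LOAD_ADDR, SPL_STACK_TOP,
                        MALLOC_F_LEN, SPL_IMAGE_SIZE));
}
```

With these values the arena bottom is at 0x950000, which leaves 192 KiB above
the load address for a roughly 193 KiB image, so the headroom comes out
negative (about 1 KiB of overlap) before the stack has used a single byte.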

> 
> Maybe you can use the debug code I added to our copy of spl.c; I also
> include most of my commit-message-for-future-me. But just something as
> simple as
> 
>   int dummy;
>   printf("stack is around %p\n", &dummy);
> 
> can be quite valuable.

Thanks for all the valuable information and explanations. This helps a
lot. As a first step, I disabled some DM drivers in SPL and now use legacy
implementations for the PMIC, GPIO, etc., just as other i.MX8MP boards
do. This seems to shrink the SPL enough to avoid collisions.

But I will also try to optimize SPL_SYS_MALLOC_F_LEN now that I know its
role.
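
For anyone who wants to check how much of the gap actually stays untouched,
one generic option besides printing a stack address is to paint the region
with a known pattern early on and count the surviving words later. The sketch
below is only an illustration of that idea. It is not the debug code
mentioned above, and the helper names and the fill pattern are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define PAINT_PATTERN 0xdeadbeefU /* arbitrary fill value (assumption) */

/* Fill a not-yet-used region with the pattern. */
static void region_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = PAINT_PATTERN;
}

/*
 * Count the words at the low end of the region that still hold the pattern.
 * A stack growing downwards overwrites the high end first, so this is the
 * number of words that were never reached.
 */
static size_t region_untouched_words(const uint32_t *base, size_t words)
{
    size_t i = 0;

    while (i < words && base[i] == PAINT_PATTERN)
        i++;
    return i;
}
```

On the target, base would have to point at memory that is known to be free
at the time of painting, which depends on the board's layout and is left out
here.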

Thanks
Frieder

