RPi4 U-Boot freeze

Stefan Agner stefan at agner.ch
Sun Sep 20 11:09:23 CEST 2020


On 2020-09-19 23:20, Sean Anderson wrote:
> On 9/19/20 7:55 AM, Stefan Agner wrote:
>> On 2020-09-14 10:15, Matthias Brugger wrote:
>>> On 10/09/2020 23:12, Stefan Agner wrote:
>>>> On 2020-09-07 16:36, Peter Robinson wrote:
>>>>>> Any thoughts on this issue?
>>>>>
>>>>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>>>>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>>>>> changed quite a bit since then I suspect.
>>>>>
>>>>
>>>> Hi Peter,
>>>>
>>>> It's a stable release and we support a couple of devices with the same
>>>> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
>>>>
>>>> We are on 2020.07 on development branch, and it does work fine there. So
>>>> I thought it can't be that hard, just bisect and backport whatever fixes
>>>> it... Unfortunately, it seems that there is no particular commit which
>>>> fixes it (the bisect ended up in a random unrelated commit, and it seems
>>>> that the issue appears/disappears depending on alignment/size...).
>>>>
>>>> I also did applied pretty much every RPi4 related commit made after
>>>> 2020.01 up until master back to 2020.01, no success either.
>>>>
>>>
>>> Which version of the Raspberry Pi firmware did you use?
>>> Unfortunately changes in the FW breaks stuff on U-Boot from time to time.
>>>
>>
>> Ok, I am now able to reproduce the issue on master as well as 2020.07
>> with standard rpi_4_32b_defconfig, but I still need to have parts of a
>> change which seems to trigger the issue in. From what I can tell, the
>> change *really* should not lead to a freeze. The change is just
>> accessing global variables from the data section... (see below).
>>
>> To me it still seems as if relocation somehow did not work correctly in
>> one way or another.
>>
>> Are there maybe restrictions in U-Boot when the data section can be
>> accessed? E.g. is it not legal to access the data section from the
>> serial driver?
> 
> One thing bit me recently, and might be relevant here. Because
> putc_retry is initialized to zero, it is located in bss and not data.
> In U-Boot, bss is not accessable before relocation. The serial driver is
> one of the devices which U-Boot needs before relocating, so setting
> putc_retry may overwrite data in the device tree. To get around this,
> you could try adding __attribute__((section(".data"))) to that variable.

Thanks a lot for that hint, I did not know that bss is not accessible
before relocation.

That indeed seems to be the problem! Actually just removing the
initialization is fine here, as global variables are initialized to zero
anyways (from what it seems also in U-Boot's implementation).

I wonder if some compile time warning/check could be introduced, but I
guess there is not really a way to know what code is used before
relocation.

Anyways, thanks a lot, sorry for the noise. We can consider this solved.

--
Stefan



> 
> --Sean
> 
>>
>>
>> diff --git a/drivers/serial/serial_bcm283x_mu.c
>> b/drivers/serial/serial_bcm283x_mu.c
>> index 8a4af87eb6..74de6801ab 100644
>> --- a/drivers/serial/serial_bcm283x_mu.c
>> +++ b/drivers/serial/serial_bcm283x_mu.c
>> @@ -50,7 +50,8 @@ struct bcm283x_mu_regs {
>>  struct bcm283x_mu_priv {
>>         struct bcm283x_mu_regs *regs;
>>  };
>> -
>> +static char *fs_argv[15];
>> +static uint32_t putc_retry = 0;
>>  static int bcm283x_mu_serial_getc(struct udevice *dev);
>>
>>  static int bcm283x_mu_serial_setbrg(struct udevice *dev, int baudrate)
>> @@ -95,6 +96,8 @@ static int bcm283x_mu_serial_putc(struct udevice *dev,
>> const char data)
>>         struct bcm283x_mu_priv *priv = dev_get_priv(dev);
>>         struct bcm283x_mu_regs *regs = priv->regs;
>>
>> +       putc_retry++;
>> +
>>         /* Wait until there is space in the FIFO */
>>         if (!(readl(&regs->lsr) & BCM283X_MU_LSR_TX_EMPTY))
>>                 return -EAGAIN;
>> @@ -162,6 +165,10 @@ static int bcm283x_mu_serial_probe(struct udevice
>> *dev)
>>         struct bcm283x_mu_priv *priv = dev_get_priv(dev);
>>         fdt_addr_t addr;
>>
>> +       /* Make sure compiler does not optimize out this fs_argv
>> instance */
>> +       if (fs_argv[0])
>> +               fs_argv[0] = "test";
>> +
>>         /* Don't spawn the device if it's not muxed */
>>         if (!bcm283x_is_serial_muxed())
>>                 return -ENODEV;
>>
>> Most curious of all, it seems that the name (!!!) of the variable
>> fs_argv matters! I am not sure if that changes order of variables in
>> data section or something. I can also reproduce the issue with two
>> compilers (GCC 8.3 and GCC 9.2), so a compiler error seems somewhat
>> unlikely...
>>
>> Any ideas? I am a bit out of idea how to debug this (I guess JTAG/gdb
>> might help, but I don't have such a setup).
>>
>> FWIW, I plan to just drop the change which seems to at least partially
>> cause the isssue
>> (https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/board/raspberrypi/patches/uboot/0002-avoid-block-uart-write.patch).
>> Still I think there is something wrong which will show itself someday in
>> a certain configuration.
>>
>> --
>> Stefan
>>
>>
>>> Regards,
>>> Mathias
>>>
>>>> I fear that the problem in fact is still in master, but only appears if
>>>> certain things align a certain way... That is why I thought I bring it
>>>> up, to see if anybody else has noticed something along this lines. We do
>>>> have a rather trimmed down configuration, which might make the problem
>>>> appear more (fit in a D cache...).
>>>>
>>>> I probably will just disable cached around relocation for 2020.01 and
>>>> see if it resurfaces on development branch.
>>>>
>>>> --
>>>> Stefan
>>>>
>>>>
>>>>>> Just to be sure, I did some memory testing on the 2GB module, but no
>>>>>> issues found.
>>>>>>
>>>>>> I still somehow suspected that something else might be wrong with my
>>>>>> hardware, so I bought a new RPi4 (this time with 4GB of RAM) but the
>>>>>> very same with that:
>>>>>>
>>>>>> U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)
>>>>>>
>>>>>> DRAM:  3.9 GiB
>>>>>> <freeze>
>>>>>>
>>>>>> I still think there is something wrong with caching. From what I
>>>>>> understand caches are enabled by the RPi (4) firmware. Is it safe to run
>>>>>> 32-bit ARM U-Boot with enabled caches?
>>>>>>
>>>>>> --
>>>>>> Stefan
>>>>>>
>>>>>> On 2020-08-23 19:06, Stefan Agner wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I noticed a quite common freeze when running 32-bit U-Boot 2020.01
>>>>>>> (rpi_4_32b_defconfig) on a 2GB RPi4 model:
>>>>>>>
>>>>>>> U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)
>>>>>>>
>>>>>>> DRAM:  1.9 GiB
>>>>>>> <freeze, no more output>
>>>>>>>
>>>>>>> This happens fairly often, I would say 4 out of 5 boot tries. However,
>>>>>>> if it boots, everything seems to run fine.
>>>>>>>
>>>>>>> The issue seems to go away when using 2020.04 or any newer release,
>>>>>>> however, when trying to find the actual patch fixing the issue using git
>>>>>>> bisect I ended up with a MMC merge request which really seems unrelated
>>>>>>> (36bdcf7f3b). It seems that the problem is quite evasive and disappears
>>>>>>> if certain structure are aligned differently etc.
>>>>>>>
>>>>>>> Enabling initcall debugging showed that U-Boot crashes right after
>>>>>>> relocation:
>>>>>>>
>>>>>>> ...
>>>>>>> initcall: 00016f2c
>>>>>>>
>>>>>>> RAM Configuration:
>>>>>>> Bank #0: 0 948 MiB
>>>>>>> Bank #1: 40000000 1 GiB
>>>>>>> Bank #2: 0 0 Bytes
>>>>>>> Bank #3: 0 0 Bytes
>>>>>>>
>>>>>>> DRAM:  1.9 GiB
>>>>>>> initcall: 00016bb8
>>>>>>> New Stack Pointer is: 3af6d9e0
>>>>>>> initcall: 00016da4
>>>>>>> initcall: 00016ef0
>>>>>>> initcall: 00016ef8
>>>>>>> initcall: 00016d38
>>>>>>> Relocation Offset is: 3b375000
>>>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>>>> initcall: 00016ec8 [clear_bss]
>>>>>>> initcall: 0004465c [display_options?? only appears sometimes]
>>>>>>> <freeze>
>>>>>>>
>>>>>>> I realized when using CONFIG_SYS_(I|D)CACHE_OFF=y the problem seems to
>>>>>>> disappear. But to be 100% certain that it is cache related, I used my
>>>>>>> original configuration (which is known to "reliably" freeze), and
>>>>>>> replaced 00016ec8 with 00008688 manually in the binary, essentially
>>>>>>> swapping out function pointers in "init_sequence_f" [00008688 is
>>>>>>> cleanup_before_linux, which flushes and disables I-cache/D-cache]. And
>>>>>>> indeed, that hacked up binary does boot reliably every time:
>>>>>>>
>>>>>>> ...
>>>>>>> initcall: 00016f2c
>>>>>>>
>>>>>>> RAM Configuration:
>>>>>>> Bank #0: 0 948 MiB
>>>>>>> Bank #1: 40000000 1 GiB
>>>>>>> Bank #2: 0 0 Bytes
>>>>>>> Bank #3: 0 0 Bytes
>>>>>>>
>>>>>>> DRAM:  1.9 GiB
>>>>>>> initcall: 00016bb8
>>>>>>> New Stack Pointer is: 3af6d9e0
>>>>>>> initcall: 00016da4
>>>>>>> initcall: 00016ef0
>>>>>>> initcall: 00016ef8
>>>>>>> initcall: 00016d38
>>>>>>> Relocation Offset is: 3b375000
>>>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>>>> initcall: 00008688
>>>>>>> initcall: 3b38c10c
>>>>>>> initcall: 3b38c114
>>>>>>> initcall: 000172e0 (relocated to 3b38c2e0)
>>>>>>> initcall: 0001712c (relocated to 3b38c12c)
>>>>>>> ...
>>>>>>>
>>>>>>>  From what I understand on RPi4 caches are enabled when entering U-Boot.
>>>>>>> I was wondering if the relocation code really can handle that?
>>>>>>>
>>>>>>> --
>>>>>>> Stefan
>>>>


More information about the U-Boot mailing list