RPi4 U-Boot freeze
Stefan Agner
stefan at agner.ch
Sat Sep 19 13:55:36 CEST 2020
On 2020-09-14 10:15, Matthias Brugger wrote:
> On 10/09/2020 23:12, Stefan Agner wrote:
>> On 2020-09-07 16:36, Peter Robinson wrote:
>>>> Any thoughts on this issue?
>>>
>>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>>> changed quite a bit since then I suspect.
>>>
>>
>> Hi Peter,
>>
>> It's a stable release and we support a couple of devices with the same
>> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
>>
>> We are on 2020.07 on development branch, and it does work fine there. So
>> I thought it can't be that hard, just bisect and backport whatever fixes
>> it... Unfortunately, it seems that there is no particular commit which
>> fixes it (the bisect ended up in a random unrelated commit, and it seems
>> that the issue appears/disappears depending on alignment/size...).
>>
>> I also did applied pretty much every RPi4 related commit made after
>> 2020.01 up until master back to 2020.01, no success either.
>>
>
> Which version of the Raspberry Pi firmware did you use?
> Unfortunately changes in the FW breaks stuff on U-Boot from time to time.
>
Ok, I am now able to reproduce the issue on master as well as 2020.07
with standard rpi_4_32b_defconfig, but I still need to have parts of a
change which seems to trigger the issue in. From what I can tell, the
change *really* should not lead to a freeze. The change is just
accessing global variables from the data section... (see below).
To me it still seems as if relocation somehow did not work correctly in
one way or another.
Are there maybe restrictions in U-Boot when the data section can be
accessed? E.g. is it not legal to access the data section from the
serial driver?
diff --git a/drivers/serial/serial_bcm283x_mu.c
b/drivers/serial/serial_bcm283x_mu.c
index 8a4af87eb6..74de6801ab 100644
--- a/drivers/serial/serial_bcm283x_mu.c
+++ b/drivers/serial/serial_bcm283x_mu.c
@@ -50,7 +50,8 @@ struct bcm283x_mu_regs {
struct bcm283x_mu_priv {
struct bcm283x_mu_regs *regs;
};
-
+static char *fs_argv[15];
+static uint32_t putc_retry = 0;
static int bcm283x_mu_serial_getc(struct udevice *dev);
static int bcm283x_mu_serial_setbrg(struct udevice *dev, int baudrate)
@@ -95,6 +96,8 @@ static int bcm283x_mu_serial_putc(struct udevice *dev,
const char data)
struct bcm283x_mu_priv *priv = dev_get_priv(dev);
struct bcm283x_mu_regs *regs = priv->regs;
+ putc_retry++;
+
/* Wait until there is space in the FIFO */
if (!(readl(®s->lsr) & BCM283X_MU_LSR_TX_EMPTY))
return -EAGAIN;
@@ -162,6 +165,10 @@ static int bcm283x_mu_serial_probe(struct udevice
*dev)
struct bcm283x_mu_priv *priv = dev_get_priv(dev);
fdt_addr_t addr;
+ /* Make sure compiler does not optimize out this fs_argv
instance */
+ if (fs_argv[0])
+ fs_argv[0] = "test";
+
/* Don't spawn the device if it's not muxed */
if (!bcm283x_is_serial_muxed())
return -ENODEV;
Most curious of all, it seems that the name (!!!) of the variable
fs_argv matters! I am not sure if that changes order of variables in
data section or something. I can also reproduce the issue with two
compilers (GCC 8.3 and GCC 9.2), so a compiler error seems somewhat
unlikely...
Any ideas? I am a bit out of idea how to debug this (I guess JTAG/gdb
might help, but I don't have such a setup).
FWIW, I plan to just drop the change which seems to at least partially
cause the isssue
(https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/board/raspberrypi/patches/uboot/0002-avoid-block-uart-write.patch).
Still I think there is something wrong which will show itself someday in
a certain configuration.
--
Stefan
> Regards,
> Mathias
>
>> I fear that the problem in fact is still in master, but only appears if
>> certain things align a certain way... That is why I thought I bring it
>> up, to see if anybody else has noticed something along this lines. We do
>> have a rather trimmed down configuration, which might make the problem
>> appear more (fit in a D cache...).
>>
>> I probably will just disable cached around relocation for 2020.01 and
>> see if it resurfaces on development branch.
>>
>> --
>> Stefan
>>
>>
>>>> Just to be sure, I did some memory testing on the 2GB module, but no
>>>> issues found.
>>>>
>>>> I still somehow suspected that something else might be wrong with my
>>>> hardware, so I bought a new RPi4 (this time with 4GB of RAM) but the
>>>> very same with that:
>>>>
>>>> U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)
>>>>
>>>> DRAM: 3.9 GiB
>>>> <freeze>
>>>>
>>>> I still think there is something wrong with caching. From what I
>>>> understand caches are enabled by the RPi (4) firmware. Is it safe to run
>>>> 32-bit ARM U-Boot with enabled caches?
>>>>
>>>> --
>>>> Stefan
>>>>
>>>> On 2020-08-23 19:06, Stefan Agner wrote:
>>>>> Hi,
>>>>>
>>>>> I noticed a quite common freeze when running 32-bit U-Boot 2020.01
>>>>> (rpi_4_32b_defconfig) on a 2GB RPi4 model:
>>>>>
>>>>> U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)
>>>>>
>>>>> DRAM: 1.9 GiB
>>>>> <freeze, no more output>
>>>>>
>>>>> This happens fairly often, I would say 4 out of 5 boot tries. However,
>>>>> if it boots, everything seems to run fine.
>>>>>
>>>>> The issue seems to go away when using 2020.04 or any newer release,
>>>>> however, when trying to find the actual patch fixing the issue using git
>>>>> bisect I ended up with a MMC merge request which really seems unrelated
>>>>> (36bdcf7f3b). It seems that the problem is quite evasive and disappears
>>>>> if certain structure are aligned differently etc.
>>>>>
>>>>> Enabling initcall debugging showed that U-Boot crashes right after
>>>>> relocation:
>>>>>
>>>>> ...
>>>>> initcall: 00016f2c
>>>>>
>>>>> RAM Configuration:
>>>>> Bank #0: 0 948 MiB
>>>>> Bank #1: 40000000 1 GiB
>>>>> Bank #2: 0 0 Bytes
>>>>> Bank #3: 0 0 Bytes
>>>>>
>>>>> DRAM: 1.9 GiB
>>>>> initcall: 00016bb8
>>>>> New Stack Pointer is: 3af6d9e0
>>>>> initcall: 00016da4
>>>>> initcall: 00016ef0
>>>>> initcall: 00016ef8
>>>>> initcall: 00016d38
>>>>> Relocation Offset is: 3b375000
>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>> initcall: 00016ec8 [clear_bss]
>>>>> initcall: 0004465c [display_options?? only appears sometimes]
>>>>> <freeze>
>>>>>
>>>>> I realized when using CONFIG_SYS_(I|D)CACHE_OFF=y the problem seems to
>>>>> disappear. But to be 100% certain that it is cache related, I used my
>>>>> original configuration (which is known to "reliably" freeze), and
>>>>> replaced 00016ec8 with 00008688 manually in the binary, essentially
>>>>> swapping out function pointers in "init_sequence_f" [00008688 is
>>>>> cleanup_before_linux, which flushes and disables I-cache/D-cache]. And
>>>>> indeed, that hacked up binary does boot reliably every time:
>>>>>
>>>>> ...
>>>>> initcall: 00016f2c
>>>>>
>>>>> RAM Configuration:
>>>>> Bank #0: 0 948 MiB
>>>>> Bank #1: 40000000 1 GiB
>>>>> Bank #2: 0 0 Bytes
>>>>> Bank #3: 0 0 Bytes
>>>>>
>>>>> DRAM: 1.9 GiB
>>>>> initcall: 00016bb8
>>>>> New Stack Pointer is: 3af6d9e0
>>>>> initcall: 00016da4
>>>>> initcall: 00016ef0
>>>>> initcall: 00016ef8
>>>>> initcall: 00016d38
>>>>> Relocation Offset is: 3b375000
>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>> initcall: 00008688
>>>>> initcall: 3b38c10c
>>>>> initcall: 3b38c114
>>>>> initcall: 000172e0 (relocated to 3b38c2e0)
>>>>> initcall: 0001712c (relocated to 3b38c12c)
>>>>> ...
>>>>>
>>>>> From what I understand on RPi4 caches are enabled when entering U-Boot.
>>>>> I was wondering if the relocation code really can handle that?
>>>>>
>>>>> --
>>>>> Stefan
>>
More information about the U-Boot
mailing list