[U-Boot] arm: socfpga gen5: warm reboot reliability

Marek Vasut marex at denx.de
Wed Nov 14 23:06:49 UTC 2018


On 11/14/2018 10:04 PM, Simon Goldschmidt wrote:
> On 14.11.2018 21:51, Marek Vasut wrote:
>> On 11/14/2018 09:30 PM, Simon Goldschmidt wrote:
>>> Hi,
>> Hi,
>>
>>> [This whole description is not qspi specific but qspi happens to be my
>>> boot source. It should be the same when booting from mmc or nand.]
>>>
>>> On my socfgpa gen5 board, I was a little surprised that a soft reboot
>>> did not start the SPL from qspi after updating it from a running Linux
>>> (via swupdate writing to qspi).
>>>
>>> When debugging this, I found that this is because SPL and U-Boot set the
>>> magic value 0xae9efebc to the sysmgr_regs->romcodegrp_warmramgrp_enable
>>> register, which tells the boot ROM that it can jump to SPL in OnChip RAM
>>> on next warm reboot.
>>>
>>> Now this is a problem for me specifically because I want to run the SPL
>>> stored in qspi after updating it. I don't know if I'm the only one
>>> wanting to do that, but I could implement this for my own board by just
>>> resetting the magic value in the warmramgrp_enable register.
>> Yes, that's Alteras' workaround for broken systems which do not connect
>> SPI NOR reset and thus leave SPI NOR in undefined state (ie. 4B mode).
> 
> I remember being somewhat shocked to see some of the SPI NOR chips don't
> even have a reset input...

Yep, which is sad. Those probably need some sort of switch on the Vcc,
although I'd rather go for one which has a reset line.

>> When the bootrom tries to load from it, it fails and the system hangs.
>>
>> You can do cold reset instead of warm reset to force the bootrom to
>> reload the preloader from SPI NOR, it's another bit in the reset
>> register.
>>
>>> What surprised me more is that the boot ROM provides the ability to
>>> check the CRC of SPL, but it doesn't check the CRC since the related
>>> registers tellt it to not do that (to enable this, SPL would have to
>>> write its length + CRC to sysmgr registers). I would have suspected this
>>> bad design since maybe we were rebooted by the watchdog because some
>>> driver wrote bad data somewhere, so it might well be the SPL in OnChip
>>> RAM is not intact any more.
>> IIRC the BootROM only checks CRC of the header, no ?
> 
> That would have been good, but from reading the documentation ([1]), it
> seems to be different: it's a offset/length range that is CRC checked.
> And if the 'length' register is 0, "the Boot ROM won't perform CRC
> calculation and CRC check to avoid overhead caused by CRC validation."

We should definitely do the CRC check, unless it has side effects :-)

>>> Now should we:
>>> a) calculate and set a CRC for SPL's text / rodata segments (what about
>>> initialized rwdata?) or
>>> b) set the warmramgrp_enable register to always load SPL from boot
>>> source or
>>> c) make it configurable?
>> a) would be ideal, if it really is possible and works :)
> 
> Well, for someone in the need of updating SPL and using the updated SPL,
> b) would be best, but you can always just reset the magic after such an
> update to ensure the updated SPL is loaded.

Can't you invalidate the OCRAM (memset it to 0) and do cold-reset after
such an update ?

> I guess I'll try the CRC mechanism.
> 
>>
>>> Writing the code for any of the 3 choices is easy, so I wanted to get an
>>> opinion first before sending a patch. Personally, I'd favour always
>>> loading SPL from boot source, but the comment (in misc_gen5.c
>>> arch_early_init_r()) suggests I'd break ancient kernels with that (I
>>> assume the problem is the problem I have with 4.9: Linux sets qspi into
>>> 4-byte mode and the boot rom can then not load SPL since it uses 3-byte
>>> mode without switching mode in the qspi chip).
>> You'd break broken boards some more , that's all. It has nothing to do
>> with kernel version.
> 
> I thought newer kernels use 4-byte opcodes that don't set the chip into
> 4-byte mode, thus not breaking the boot ROM. That would make it
> depending on the kernel version. But I haven't found the time to prove
> this theory, yet.

It depends on the chip. Some chips don't have dedicated 4byte opcodes.
But even page program, which is interrupted in the middle of it, can
cause the bootrom to fail to boot, since the SPI NOR may not respond.
Worse yet, it may store some of the communication as actual data in the
SPI NOR, thus corrupting the content.

-- 
Best regards,
Marek Vasut


More information about the U-Boot mailing list