[U-Boot] [PATCH 01/21] Define new system_restart() and emergency_restart()
Wolfgang Denk
wd at denx.de
Mon Mar 14 21:38:08 CET 2011
Dear "Moffett, Kyle D",
In message <613C8F89-3CE5-4C28-A48E-D5C3E8143A4C at boeing.com> you wrote:
>
> On our boards, when the "reset" button is pressed in hardware, both
> processor modules on the board and all the attached hardware reset at
> the same time.
OK. So a sane design would provide a way for either of the processors
to do the same, for example by toggling some GPIO or similar.
> If just *one* of the 2 CPUs triggers the reset then only *some* of
> the attached hardware will be properly reset due to a hardware
> errata, and as a result the board will sometimes hang or corrupt DMA
> transfers to the SSDs shortly after reset.
...
> Yes, it's a royal pain, but we're stuck with this hardware for the
> time being, and if the board can't communicate then it might as well
> hang() anyways.
Do you agree that this is a highly board-specific problem (I would
call it a hardware bug, but I don't insist you agree on that term),
and that, while there is a need for you to work around such behaviour,
there is little or no reason to do this, or anything like that, in
common code?
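One common way to keep such a board-specific workaround out of common
code is a weak default that a board file can override at link time.
A minimal sketch; board_reset(), common_reset() and plain_reset_done
are hypothetical names, and setting a flag stands in for actually
resetting the CPU:

```c
#include <stdio.h>

/* Set by the default hook; stands in for actually resetting the CPU. */
static int plain_reset_done;

/*
 * Hypothetical hook. The generic default simply resets this CPU.
 * A board with special needs (e.g. one that must coordinate the reset
 * with a second CPU, or hang if it cannot) provides a non-weak
 * board_reset() in its own board file, and the linker uses that
 * definition instead of this one.
 */
void __attribute__((weak)) board_reset(void)
{
	plain_reset_done = 1;
}

/* Hypothetical common entry point: the same path for reset and panic. */
void common_reset(void)
{
	puts("resetting ...");
	board_reset();
}
```

With this split, the common code has a single reset path and needs no
knowledge of which boards have special reset requirements; that
knowledge stays in the affected board's directory.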
> > And if there are more things that could be done to provide a "better"
> > reset, then why should we not always do these?
>
> If the board is in a panic() state it may well have still-running DMA
> transfers (such as USB URBs), or be in the middle of writing to
> FLASH.
The same (at least having USB or other drivers still being enabled,
and USB writing its SOF counters to RAM) can happen for any call to
the reset() function. I see no reason to assume that conditions for
performing a reset would be any better or worse in the panic case.
> Performing a jump to early-boot code which is only ever tested when
> everything is OK and devices are properly initialized is a great way
> to cause data corruption.
If there is a software way to prevent such issues, then these steps
should always be performed.
> I know for a fact that our boards would rather hang forever than try
> to reset without cooperation from the other CPU.
As mentioned above, this is a board specific issue that should not
influence common code design.
> >> While I was going through the hooks I noticed that several of them were
> >> explicitly NOT safe if the board was in the middle of a panic() for whatever
> >
> > Can you please provide some specific examples? I don't understand what
> > you are talking about.
>
> Ok, using the ppmc7xx board as an example:
>
> /* Disable and invalidate cache */
> icache_disable();
> dcache_disable();
>
> /* Jump to cold reset point (in RAM) */
> _start();
>
> /* Should never get here */
> while(1)
> ;
>
> This board uses the EEPRO100 driver, which appears to set up
> statically allocated TX and RX rings which the device performs DMA
> to/from.
>
> If this board starts receiving packets and then panic()s, it will
> disable address translation and immediately re-relocate U-Boot into
> RAM, then zero the BSS. If the network card tries to receive a packet
> after BSS is zeroed, it will read a packet buffer address of
> (probably) 0x0 from the RX ring and promptly overwrite part of
> U-Boot's memory at that address.
Agreed. So this should be fixed. One clean way to fix it would be to
help improve the driver model for U-Boot (read: create one) and
make sure drivers get deinitialized in such a case.
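A minimal, self-contained simulation of the hazard described above and
of the kind of fix suggested here. Everything in it is hypothetical:
RAM is a byte array, the EEPRO100 RX ring is reduced to a single
descriptor byte living in "BSS", and dev_register()/dev_remove_all()
stand in for a driver model that deinitializes devices before reset;
they are not existing U-Boot functions.

```c
#include <stdint.h>
#include <string.h>

#define RAM_SIZE   64
#define CODE_START 0   /* pretend the U-Boot image lives at the bottom of RAM */
#define CODE_LEN   16
#define BSS_START  32  /* pretend the RX descriptor lives in BSS from here */

static uint8_t ram[RAM_SIZE];

/* Simulated NIC with a one-entry RX ring; the descriptor holds a buffer offset. */
struct sim_nic {
	int rx_enabled;
	uint8_t *rx_desc;  /* points into ram[] (BSS) */
};

/* Simulated DMA: write the packet wherever the descriptor points. */
static void nic_receive(struct sim_nic *nic, const uint8_t *pkt, size_t len)
{
	size_t off;

	if (!nic->rx_enabled)
		return;            /* receiver stopped: no DMA happens */
	off = *nic->rx_desc;
	if (off + len <= RAM_SIZE)
		memcpy(&ram[off], pkt, len);
}

/* Hypothetical registry of "remove" callbacks, i.e. a tiny driver model. */
typedef void (*remove_fn)(void *priv);
struct dev_entry { remove_fn remove; void *priv; };

#define MAX_DEVS 8
static struct dev_entry devs[MAX_DEVS];
static int ndevs;

static int dev_register(remove_fn remove, void *priv)
{
	if (ndevs == MAX_DEVS)
		return -1;
	devs[ndevs].remove = remove;
	devs[ndevs].priv = priv;
	ndevs++;
	return 0;
}

/* Quiesce all devices, most recently registered first. */
static void dev_remove_all(void)
{
	while (ndevs > 0) {
		ndevs--;
		devs[ndevs].remove(devs[ndevs].priv);
	}
}

static void nic_remove(void *priv)
{
	((struct sim_nic *)priv)->rx_enabled = 0;  /* stop RX DMA */
}

/* Simulated jump to _start: optionally quiesce devices, then zero BSS. */
static void restart(int quiesce)
{
	if (quiesce)
		dev_remove_all();
	memset(&ram[BSS_START], 0, RAM_SIZE - BSS_START);
}

/* Returns 1 if the simulated U-Boot image was clobbered, else 0. */
int run_scenario(int quiesce)
{
	static const uint8_t pkt[8] = {
		0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA
	};
	struct sim_nic nic;

	memset(ram, 0, sizeof(ram));
	memset(&ram[CODE_START], 0x5A, CODE_LEN);  /* the "U-Boot image" */
	ram[BSS_START] = 48;                       /* valid RX buffer offset */
	nic.rx_enabled = 1;
	nic.rx_desc = &ram[BSS_START];
	ndevs = 0;
	dev_register(nic_remove, &nic);

	restart(quiesce);                     /* panic -> reset path */
	nic_receive(&nic, pkt, sizeof(pkt));  /* packet arrives after BSS is zeroed */
	return ram[CODE_START] != 0x5A;
}
```

run_scenario(0) models the current behaviour: the zeroed descriptor
sends the packet to offset 0 and overwrites the image. run_scenario(1)
runs the remove callbacks first, so the NIC stops DMA and the image
survives, on every reset path rather than only on panic.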
> Since the panic() path is so infrequently used and tested, it's
> better to be safe and hang() on the boards which do not have a
> reliable hardware-level reset than it is to cause undefined behavior
> or potentially corrupt data.
I disagree. Instead of adding somewhat obscure alternate code paths
(which get tested even less frequently) we should focus on fixing
such problems where we run into them.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Microsoft Multitasking:
several applications can crash at the same time.