[U-Boot] [PATCH 2/4] env_nand.c: support falling back to redundant env when writing

Thu Dec 20 22:41:37 CET 2012

On 12/20/2012 03:28:39 PM, Phil Sutter wrote:
> On Tue, Dec 11, 2012 at 05:12:32PM -0600, Scott Wood wrote:
> > Erase blocks are larger than write pages, yes.  I've never heard erase
> > blocks called "pages" or write pages called "blocks" -- but my main
> > point is that the unit of erasing and the unit of badness are the same.
> 
> Ah, OK. Please excuse my humble nomenclature, I never cared enough to
> sort out what is called what. Of course, this is not the best basis for
> a discussion about these things.
> 
> But getting back to the topic: The assumption of blocks getting bad, not
> pages within a block means that for any kind of bad block prevention,
> multiple blocks need to be used. Although I'm honestly speaking not
> really sure why this needs to be like that. Maybe the bad page marking
> would disappear when erasing the block it belongs to?

Yes, it would disappear.  This is why erase operations skip bad blocks,
unless the scrub option is used.
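
To illustrate the skip-versus-scrub behaviour, here is a minimal sketch
against a made-up in-memory model.  The struct toy_nand and the function
toy_erase() are hypothetical names for this example only; this is not
U-Boot's NAND code, and it assumes a bad-block marker is simply a flag
that an erase clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a NAND chip, one flag pair per erase block. */
struct toy_nand {
	size_t nblocks;
	bool *bad;	/* bad[i]: block i carries a bad-block marker */
	bool *erased;	/* erased[i]: block i has been erased */
};

/*
 * Erase blocks [first, first + count).
 * A marked-bad block is skipped unless scrub is set.  With scrub set the
 * block is erased anyway, which also wipes its bad-block marker.
 * Returns the number of blocks actually erased.
 */
size_t toy_erase(struct toy_nand *n, size_t first, size_t count, bool scrub)
{
	size_t done = 0;

	assert(first <= n->nblocks && count <= n->nblocks - first);

	for (size_t i = first; i < first + count; i++) {
		if (n->bad[i] && !scrub)
			continue;	/* preserve the marker */
		n->bad[i] = false;	/* erasing destroys the marker */
		n->erased[i] = true;
		done++;
	}
	return done;
}
```

In this model a normal erase leaves the marker of a bad block intact,
while a scrubbing erase returns the block to the "good" state, losing the
information that it was ever bad.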

> > > > The block to hold the environment is stored in the OOB of block zero,
> > > > which is usually guaranteed to not be bad.
> > >
> > > Erase or write block? Note that every write block has its own OOB.
> >
> > "block" means "erase block".
> >
> > Every write page has its own OOB, but it is erase blocks that are
> > marked bad.  Typically the block can be marked bad in either the first
> > or the second page of the erase block.
> 
> Interesting. I had the impression of pages being marked bad and the
> block's badness being taken from whether it contains bad pages. Probably
> the 'nand markbad' command tricked me.

Do you mean the lack of error checking if you pass a non-block-aligned  
offset into "nand markbad"?
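
For reference, the kind of check I have in mind would look roughly like
the sketch below.  The function names is_block_aligned() and
block_start() are hypothetical, not existing U-Boot helpers, and the
sketch assumes the erase block size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if offset lies on an erase-block boundary.
 * Assumes erasesize is a non-zero power of two.
 */
bool is_block_aligned(uint64_t offset, uint32_t erasesize)
{
	assert(erasesize != 0 && (erasesize & (erasesize - 1)) == 0);
	return (offset & ((uint64_t)erasesize - 1)) == 0;
}

/*
 * Return the start offset of the erase block containing offset.
 * Assumes erasesize is a non-zero power of two.
 */
uint64_t block_start(uint64_t offset, uint32_t erasesize)
{
	assert(erasesize != 0 && (erasesize & (erasesize - 1)) == 0);
	return offset & ~((uint64_t)erasesize - 1);
}
```

With a 128 KiB erase block (0x20000), offset 0x20800 is not aligned and
belongs to the block starting at 0x20000.  A command could either reject
such an offset or round it down before marking the block.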

> > > So that assumes that any block initially identified 'good' will ever
> > > turn 'bad' later on?
> >
> > We don't currently have any mechanism for that to happen with the
> > environment -- which could be another good reason to have real
> > redundancy that doesn't get crippled from day one by having one copy
> > land on a factory bad block.  Of course, that requires someone to
> > implement support for redundant environment combined with
> > CONFIG_ENV_OFFSET_OOB.
> 
> Well, as long as CONFIG_ENV_OFFSET_REDUND supported falling back to the
> other copy in case of error there would be a working system in three of
> four cases instead of only one.

I'm not sure what you mean here -- where do "three", "four", and "one"  
come from?

> > Maybe a better option is to implement support for storing the
> > environment in ubi, although usually if your environment is in NAND
> > that means your U-Boot image is in NAND, so you have the same problem
> > there.  Maybe you could have an SPL that contains ubi support, that
> > fits in the guaranteed-good first block.
> >
> > Do you have any data on how often a block might go bad that wasn't
> > factory-bad, to what extent reads versus writes matter, and whether
> > there is anything special about block zero beyond not being factory-bad?
> 
> No, sadly not. I'd guess this information depends on what hardware being
> used specifically. But I suppose block zero being prone to becoming
> worn just like any other block, although it not being erased as often
> should help a lot.
> 
> Assuming a certain number of erase cycles after each block is worn out
> and given the fact that CONFIG_ENV_OFFSET_REDUND has always both blocks
> written (unless power failure occurs), they would turn bad at the same
> time and therefore rendering the environment useless with or without
> fallback. :)

That depends on whether the specified number of erase cycles per block  
is a minimum for any block not marked factory-bad, or whether some  
fraction of non-factory-bad blocks may fail early.

-Scott