[U-Boot] [PATCH 2/4] env_nand.c: support falling back to redundant env when writing

Phil Sutter phil.sutter at viprinet.com
Fri Dec 21 11:34:03 CET 2012


On Thu, Dec 20, 2012 at 03:41:37PM -0600, Scott Wood wrote:
> On 12/20/2012 03:28:39 PM, Phil Sutter wrote:
> > On Tue, Dec 11, 2012 at 05:12:32PM -0600, Scott Wood wrote:
> > > Erase blocks are larger than write pages, yes.  I've never heard erase
> > > blocks called "pages" or write pages called "blocks" -- but my main
> > > point is that the unit of erasing and the unit of badness are the same.
> > 
> > Ah, OK. Please excuse my humble nomenclature, I never cared enough to
> > sort out what is called what. Of course, this is not the best basis for
> > a discussion about these things.
> > 
> > But getting back to the topic: The assumption that blocks go bad, not
> > pages within a block, means that for any kind of bad block prevention,
> > multiple blocks need to be used. Although, honestly speaking, I'm not
> > really sure why this needs to be like that. Maybe the bad page marking
> > would disappear when erasing the block it belongs to?
> 
> Yes, it would disappear.  This is why erase operations skip bad blocks,
> unless the scrub option is used.

Which apparently prevents the good pages in a block with a bad page
from being used, doesn't it?

> > > > > The block to hold the environment is stored in the OOB of block
> > > > > zero, which is usually guaranteed to not be bad.
> > > >
> > > > Erase or write block? Note that every write block has its own OOB.
> > >
> > > "block" means "erase block".
> > >
> > > Every write page has its own OOB, but it is erase blocks that are
> > > marked bad.  Typically the block can be marked bad in either the first
> > > or the second page of the erase block.
> > 
> > Interesting. I had the impression of pages being marked bad and the
> > block's badness being taken from whether it contains bad pages. Probably
> > the 'nand markbad' command tricked me.
> 
> Do you mean the lack of error checking if you pass a non-block-aligned  
> offset into "nand markbad"?

I think the bigger "problem" is 'nand markbad' updating the bad block
table on the fly. So no real bad block detection occurs, as far as I
can tell.
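
As for the missing alignment check: since badness is a property of whole
erase blocks, a test along the following lines could reject offsets that
do not start an erase block. This is only a sketch; the 128 KiB erase
block size and the helper names are assumptions for illustration, not
taken from the U-Boot sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed erase block size for illustration (128 KiB, a power of two);
 * real code would take this from the chip description. */
#define ERASE_BLOCK_SIZE 0x20000ULL

/* Hypothetical helper: true if 'off' is the first byte of an erase block. */
static bool offset_is_block_aligned(uint64_t off)
{
	return (off & (ERASE_BLOCK_SIZE - 1)) == 0;
}

/* Hypothetical helper: start offset of the erase block containing 'off'. */
static uint64_t containing_block_start(uint64_t off)
{
	return off & ~(ERASE_BLOCK_SIZE - 1);
}
```

With these assumptions, an offset such as 0x40800 is not block-aligned
and lies inside the erase block starting at 0x40000.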

> > > > So that assumes that no block initially identified 'good' will ever
> > > > turn 'bad' later on?
> > >
> > > We don't currently have any mechanism for that to happen with the
> > > environment -- which could be another good reason to have real
> > > redundancy that doesn't get crippled from day one by having one copy
> > > land on a factory bad block.  Of course, that requires someone to
> > > implement support for redundant environment combined with
> > > CONFIG_ENV_OFFSET_OOB.
> > 
> > Well, as long as CONFIG_ENV_OFFSET_REDUND supported falling back to the
> > other copy in case of error there would be a working system in three of
> > four cases instead of only one.
> 
> I'm not sure what you mean here -- where do "three", "four", and "one"  
> come from?

Just some quantitative approach: given the environment residing at block
A and its redundant copy at block B, four situations may occur: both
blocks good, block A bad, block B bad, or both blocks bad. Upstream would
fail in all cases except both blocks being good. My patch would turn that
into failing only if both blocks are bad. So working in three of four
cases instead of in only one of four.
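
To spell the counting out, here is a small C sketch. It is only a toy
model of the four situations, not code from env_nand.c, and all function
names are made up for illustration.

```c
#include <stdbool.h>

/* Toy model: can the environment be saved, given which of the two
 * blocks (A = primary, B = redundant copy) are bad? */

/* Upstream behaviour as described above: saving only works when both
 * blocks are good. */
static bool save_ok_upstream(bool a_bad, bool b_bad)
{
	return !a_bad && !b_bad;
}

/* With fallback: saving works as long as at least one block is good. */
static bool save_ok_fallback(bool a_bad, bool b_bad)
{
	return !a_bad || !b_bad;
}

/* Count the working cases among the four good/bad combinations. */
static int count_working(bool (*save_ok)(bool, bool))
{
	int working = 0;

	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			if (save_ok(a != 0, b != 0))
				working++;

	return working;
}
```

Calling count_working() with save_ok_upstream gives 1, with
save_ok_fallback it gives 3. Note that this treats the four situations
as equally weighted, which says nothing about how likely each one is.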

> > > Maybe a better option is to implement support for storing the
> > > environment in ubi, although usually if your environment is in NAND
> > > that means your U-Boot image is in NAND, so you have the same problem
> > > there.  Maybe you could have an SPL that contains ubi support, that
> > > fits in the guaranteed-good first block.
> > >
> > > Do you have any data on how often a block might go bad that wasn't
> > > factory-bad, to what extent reads versus writes matter, and whether
> > > there is anything special about block zero beyond not being
> > > factory-bad?
> > 
> > No, sadly not. I'd guess this information depends on the specific
> > hardware being used. But I suppose block zero is prone to becoming
> > worn just like any other block, although it not being erased as often
> > should help a lot.
> > 
> > Assuming a certain number of erase cycles after which each block is worn
> > out, and given the fact that CONFIG_ENV_OFFSET_REDUND always has both
> > blocks written (unless power failure occurs), they would turn bad at the
> > same time, therefore rendering the environment useless with or without
> > fallback. :)
> 
> That depends on whether the specified number of erase cycles per block  
> is a minimum for any block not marked factory-bad, or whether some  
> fraction of non-factory-bad blocks may fail early.

Sure. Also I'm not sure how "wear" happens, i.e. whether blocks gradually
get worse or their probability of failure increases from erase to erase.
Although the latter case would make it hard to guarantee a certain number
of erase cycles.

Best wishes, Phil

-- 
Viprinet Europe GmbH
Mainzer Str. 43
55411 Bingen am Rhein
Germany

Phone/Zentrale:               +49 6721 49030-0
Direct line/Durchwahl:        +49 6721 49030-134
Fax:                          +49 6721 49030-109

phil.sutter at viprinet.com
http://www.viprinet.com

Registered office/Sitz der Gesellschaft: Bingen am Rhein, Germany
Commercial register/Handelsregister: Amtsgericht Mainz HRB44090
CEO/Geschäftsführer: Simon Kissel

