[U-Boot] [PATCH v6 2/8] lpc32xx: mtd: nand: add MLC NAND controller

Mon Mar 23 09:45:16 CET 2015

Bonjour Scott,

Le Fri, 20 Mar 2015 17:41:11 -0500, Scott Wood
<scottwood at freescale.com> a écrit :

> On Fri, 2015-03-20 at 10:35 +0100, Albert ARIBAUD wrote:
> > Hi Scott,
> > 
> > Le Thu, 19 Mar 2015 16:39:42 -0500, Scott Wood
> > <scottwood at freescale.com> a écrit :
> > 
> > > On Wed, 2015-03-18 at 10:04 +0100, Albert ARIBAUD (3ADEV) wrote:
> > > > +int nand_spl_load_image(uint32_t offs, unsigned int size, void *dst)
> > > > +{
> > > > +	int block_good;
> > > 
> > > bool?
> > > 
> > > > +	struct lpc32xx_oob oob;
> > > > +	unsigned int page, left;
> > > > +
> > > > +	/* if starting mid-block consider block good */
> > > > +	block_good = 1;
> > > > +
> > > > +	for (page = offs / BYTES_PER_PAGE,
> > > > +	     left = DIV_ROUND_UP(size, BYTES_PER_PAGE);
> > > > +	     left && (page < PAGES_PER_CHIP_MAX); page++) {
> > > > +		/* read inband and oob page data anyway */
> > > > +		int res = read_single_page(dst, page, &oob);
> > > > +		/* return error if a good block's page was not readable */
> > > > +		if ((res < 0) && block_good)
> > > > +			return -1;
> > > 
> > > Why even try to read it if it's been marked bad?
> > 
> > Actually, at this point, we may not know yet if the block is good or
> > bad, since we might be reading a page of it for the first time. Will
> > fix, see below.
> > 
> > > > +		/* if first page in a block, update 'block good' status */
> > > 
> > > The non-SPL driver appears to be checking the last two pages in the
> > > block, not the first page.
> > 
> > Thanks for pointing out this discrepancy.
> > 
> > I took the first page option here after checking the NAND's datasheet,
> > which says that for factory bad block marking at least, while all pages
> > in a bad block /should/ bear the bad block marker, only the first one is
> > /guaranteed/ to have it. The driver might have inherited the 'last page'
> > option from the previous NAND chip used, or from a mistake of my own.
> 
> Are there any plans for this controller to be used with different chips?

Not that I know; but in the same vein, the current LPC32XX maintainer
did not consider there were any plans for a few devices in it, and then
came my patch series :) so I cannot rule out that someday a new LPC32XX
board pops up in U-Boot with another NAND device.

> > BTW, is there a standard way to ask a NAND chip which page(s) in a bad
> > block should be checked?
> 
> Yes and no:
> http://lists.infradead.org/pipermail/linux-mtd/2011-November/038419.html

... right.

How about this: the driver expects a driver-specific configuration
option which tells it which page(s) in a block, and which byte in a
page's OOB area, should be scanned for bad block markers, and "my"
board provides a value for said option.

This way, when the driver is used with a new NAND chip in another
board, it can be configured for this board's specific case.  

> > > > +		if ((page % PAGES_PER_BLOCK) == 0) {
> > > > +			/* assume block is good */
> > > > +			block_good = 1;
> > > > +			/* if page could not be read, assume block is bad */
> > > > +			if (res < 0)
> > > > +				block_good = 0;
> > > 
> > > No.  A block is to be skipped if and only if it has a bad block marker.
> > > ECC errors should not be silently ignored just because they happen on
> > > the first page of a block.  If the writer of this data didn't skip the
> > > block, then skipping it when reading will result in unusable data
> > > regardless of the underlying ECC problem.
> > 
> > You're correct of course.
> > 
> > However, I do want the SPL chainloading to be as resilient as
> > possible, and we have established that it will have to read some part
> > of a bad block -- possibly resulting in a read failure -- before
> > knowing that the block is bad, which means it may have to finally
> > ignore the read failure.
> 
> FWIW, the eLBC/IFC SPL functions have the same problem regarding halting
> on an ECC error before checking the bad block status.  Instead it should
> return an error code, and let the caller halt after it's sure it's not
> marked bad.

Maybe we could generalize an SPL chainloading common/spl/spl.c routine
and have boards /SoCs only provide the hardware-specific page/OOB read
function(s) ? Honestly, that would be way beyond my scope for this
patch series.

> > I guess I could wait to declare the block good or bad until I could
> > read at least one if its pages (actually, one of its pages' OOB data)
> > properly, only bail out if the OOB says the block is supposed to be
> > good.
> >
> > Now, if none of the block's pages could be read, this still prevents us
> > from knowing whether these read failures are catastrophic.
> >
> > If the block was actually bad and skipped, then the resulting image
> > might be intact, will pass the checksum test, and will run; if the
> > block was actually good, SPL will detect it and not boot the corrupt
> > image -- except if the completely unreadable good block was the first
> > one, which holds the signature and checksum, in which case SPL will
> > fall back to considering a raw, unsigned, unverifiable image, and will
> > jump into corrupt code.
> > 
> > This may happen; but I think the probability of having a completely
> > unreadable sector is very low, and the probability of this sector being
> > the first one of the image is very low too [1].
> > 
> > Which leads me to two possible courses of action:
> > 
> > - consider the risk even though the probabilities are very low, thus
> >   consider any totally unreadable sector as good, and bail out in this
> >   case, or
> 
> I was thinking instead to distinguish between a hard failure where you
> got no data from the NAND (e.g. a timeout), versus an ECC failure.  In
> the later case you can still look in the OOB for a bad block marker.
>
> > - add an option to SPL that prevents common/spl/spl.c from falling back
> >   to running the image as raw if no signature is found, and consider
> >   any totally unreadable sector as bad and therefore ignore the read
> >   errors. Either the sector was actually good, and SPL will catch the
> >   image as corrupt or missing a signature, or the sector was actually
> >   bad, and the image will run as expected.
> 
> ...but this seems reasonable as well (it wouldn't work for eLBC/IFC
> since they need to support tiny SPLs that can only handle raw images).

This leads me to a half-OT question: so those SPL, while too tiny to
handle non-raw images, still do include the whole common/spl/spl.c
and thus contain code which will never apply to them? Then maybe we
should have /two/ options, one enabling raw images, and one enabling
'non-raw' images (and at least one enabled for any SPL), so that each
SPL could choose what code it wants in. Again, I am not offering to do
that in this patch series.

> If you want to do this, just put a comment in explaining why you're
> skipping in that situation.

Ok -- I will go with the 'option' method then, and add a (lengthy, I'm
afraid) comment in common/spl/spl.c explaining that skipping the raw
image handling is required when chainloading from NAND and the NAND
driver considers totally unreadable sectors bad, in order not to
confuse a missing image header with a raw image.

> -Scott

Thanks for the feedback!

Cordialement,
Albert ARIBAUD
3ADEV