[U-Boot] [PATCH v6 00/18] GPT over MTD

Fri May 19 16:58:32 UTC 2017

Thanks Maxime for the feedback,

> -----Original Message-----
> From: Maxime Ripard [mailto:maxime.ripard at free-electrons.com]
> Sent: Tuesday 16 May 2017 11:29
> 
> On Fri, May 12, 2017 at 04:09:08PM +0000, Patrick DELAUNAY wrote:
> > Hi Maxime
> >
> > > From: Maxime Ripard [mailto:maxime.ripard at free-electrons.com]
> > > Sent: Thursday 11 May 2017 16:46
> > >
> > > On Thu, May 11, 2017 at 09:19:16AM +0000, Patrick DELAUNAY wrote:
> > > > Hi Maxime,
> > > >
> > > > > From: Maxime Ripard [mailto:maxime.ripard at free-electrons.com]
> > > > > Sent: Thursday 11 May 2017 10:20
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, May 11, 2017 at 09:51:50AM +0200, Patrick Delaunay wrote:
> > > > > > > I have a request to support GPT over MTD to have the MTD
> > > > > > > information without U-Boot environment
> > > > > > > (CONFIG_ENV_IS_NOWHERE is another requirement of my project,
> > > > > > > to manage several board configurations with the same
> > > > > > > defconfig; boot from NAND or NOR or SDCARD).
> > > > >
> > > > > What would happen if you have a bad block in the middle of the
> > > > > primary or secondary GPT headers (or both)?
> > > > >
> > > > > Maxime
> > > > >
> > > >
> > > > All bad blocks are skipped....
> > > > => the primary GPT header is located at the beginning of the first
> > > >    good block
> > > > => the backup GPT header is located at the end of the last good
> > > >    block
> > > >
> > > > And gpt create will fail if the erase block command (for primary
> > > > or backup GPT) produces a new bad block.
> > >
> > > Right, but what happens if your block becomes bad or too corrupted
> > > after it's been written.
> > >
> > > You mention in your Drawbacks section that if you erase the block
> > > and is now detected to be bad, u-boot will have to act upon it. But
> > > that can happen outside of U-Boot as well, or not directly to this
> > > block, by reads or writes disturb... In this case, your GPT header
> > > is gone, and you have no way to recover from it.
> >
> > Yes, I know that this is the main issue with my proposal: the management
> > of NAND bad blocks.  But it is the same for all the binaries in the boot
> > stage (SPL / U-Boot / U-Boot env) in NAND.
> 
> U-Boot environment can be stored in UBI, and iirc U-Boot too. And usually
> the SPLs can be stored at multiple offsets to reduce the risks.

So: multiple copies of the first boot stage (SPL or other) to reduce the risk,
and the first boot stage needs to handle all the NAND constraints.

Then for U-Boot SPL, the simplest is to manage the next stage in a UBIFS volume.
And for any other first boot stage, multiple copies to avoid all NAND issues (bad block / read disturb).

I agree that this architecture simplifies the NAND management for SPL and the next boot stage,
and allows improvements for SPL (Falcon mode).

> 
> > From my point of view, erase should be done only when the GPT header
> > needs to be updated:
> >
> > => for first flashing / complete update of NAND
> > => for the refresh of the GPT header
> >
> > So the NAND block becomes bad only in these 2 cases.
> >
> > If a read or write disturb occurs for a GPT block, it should be detected
> > by NAND ECC => when an unrecoverable error occurs for one boot, the
> > backup GPT header should be used.
> >
> > PS: Perhaps something needs to be done in U-Boot to refresh the primary
> > GPT header from the backup header information.
> 
> Like I was saying, the issue really isn't in U-Boot itself, but when U-Boot isn't
> there anymore to deal with those issues.

Yes, in this case the product is dead....

> > The expected strategy is to read the boot partitions (all of them:
> > GPT header, SPL and U-Boot) periodically outside of U-Boot and, if ECC
> > read errors are too numerous, refresh them:
> > read the partition and write it again in RAW mode (skip bad blocks)
> >
> > So if the partitioning is correctly managed (reserving a tank of
> > good blocks for partition refresh), the GPT can be refreshed in raw mode
> > without breaking the other partitions.
> >
> > My idea is to prepare a partitioning with a tank of good blocks,
> > 3 in this example:
> >
> > 0  => MBR + primary GPT
> > 1 => tank block
> > 2 => tank block
> > 3 => tank block
> > --------------------- MTD1
> > 4 => SPL
> > 5 => tank block
> > 6 => tank block
> > 7 => tank block
> > ----------------------MTD2
> > 8 => U-Boot  1/3
> > 9 => U-Boot  2/3
> > 10 => U-Boot  3/3
> > 11 => tank block
> > 12 => tank block
> > 13 => tank block
> > ----------------------MTD3
> > 14 => UBI for kernel
> > ......
> > N-8 => last usable LBA
> > ----------------------
> > N-7 => tank block
> > N-6 => tank block
> > N-5 => tank block
> > N-4 => backup GPT
> > N-3 => BBT (marked as bad)
> > N-2 => BBT (marked as bad)
> > N-1 => BBT (marked as bad)
> > N => BBT (marked as bad)
> >
> > Blocks 0 and N-4 will be refreshed when ECC errors reach the threshold
> > for any partition => refresh = read + erase + write (skip bad blocks)
> > => if the erase command detects a bad block, the next tank block is used.
> >
> > 0  => Bad block (NEW)
> > 1 => MBR + primary GPT
> > 2 => tank block
> > 3 => tank block
> >
> > Same for the backup GPT, but in the reverse direction
> >
> > N-7 => tank block
> > N-6 => tank block
> > N-5 => backup GPT
> > N-4 => bad block (NEW)
> >
> > The number of tank blocks needs to be chosen correctly (to be defined
> > with NAND information)
> 
> This really feels like you're trying to reinvent the UBI-partitions wheel, but
> yeah, that would make it a bit more robust.

Agree: more robust but still not perfect,
and many things to manage (number of good blocks in the tank reservoir) to guarantee a number of boots.
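
To make the tank scheme quoted above concrete, here is a minimal sketch in
plain C of the refresh step (read + erase + write, skipping bad blocks). It is
only an in-memory model, not U-Boot code: struct region, gpt_find(),
gpt_refresh() and the erase_fails table are hypothetical names made up for
illustration, and no real MTD/NAND API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one state per erase block of a small region. */
enum blk_state { BLK_FREE, BLK_GPT, BLK_BAD };

struct region {
	enum blk_state *blk;	/* GPT block + tank blocks */
	size_t count;		/* number of blocks in the region */
	bool forward;		/* true: primary GPT (scan upward),
				 * false: backup GPT (scan downward) */
};

static int scan_step(const struct region *r)
{
	return r->forward ? 1 : -1;
}

static int scan_first(const struct region *r)
{
	return r->forward ? 0 : (int)r->count - 1;
}

/* Return the index of the block currently holding the GPT, or -1. */
int gpt_find(const struct region *r)
{
	assert(r && r->blk);
	for (int i = scan_first(r); i >= 0 && i < (int)r->count;
	     i += scan_step(r))
		if (r->blk[i] == BLK_GPT)
			return i;
	return -1;
}

/*
 * Refresh the GPT: erase the current block and rewrite it. If the erase
 * fails (assumption: modeled by erase_fails[i]), the block is marked bad
 * and the next good tank block in scan direction is tried.
 * Returns the new GPT block index, or -1 when the tank is exhausted.
 */
int gpt_refresh(struct region *r, const bool *erase_fails)
{
	int cur = gpt_find(r);

	assert(erase_fails);
	if (cur < 0)
		return -1;
	/* The "read" step is not modeled: the header is assumed in RAM. */
	for (int i = cur; i >= 0 && i < (int)r->count; i += scan_step(r)) {
		if (r->blk[i] == BLK_BAD)
			continue;		/* skip known bad block */
		if (erase_fails[i]) {
			r->blk[i] = BLK_BAD;	/* new bad block */
			continue;
		}
		r->blk[i] = BLK_GPT;		/* "write" succeeded */
		return i;
	}
	return -1;
}
```

With a 4-block region (GPT block plus 3 tank blocks), a failed erase on block 0
moves the primary GPT to block 1, as in the quoted example; with forward set to
false, the backup GPT moves from the last block of its region to the one before
it. A return value of -1 means the tank is empty, which is the case the tank
size has to be dimensioned against.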

> > PS: it is the same for SPL and U-Boot partition
> >
> > Do you know a better solution to handle the read disturb issue on boot
> > partitions than refresh?
> 
> Unfortunately, I don't, but the biggest concern I have is that you have no way
> to tell that your system is basically in self-destruct mode outside of U-Boot.
> And that's an issue because:
>   A) U-Boot will be there for (hopefully) less than a second
>   B) A user getting a brand new system will have no idea that they
>      need to install something else so that their entire system does
>      not fall apart.
>
> Maxime
>

I understand the issue, and I agree with the analysis.... after some internal discussion.

In fact, today I am less and less confident in the strategy proposed by my project for NAND,
because I have seen all the impacts since I wrote the first RFC and started to test it....

I think I need to cross-check my initial requirements on my side for the boot-from-NAND scenario
(because the project will use SPL but also another primary bootloader).

Since the beginning, GPT was a strong request, to have common behavior for block devices and MTD devices,
but it now seems too difficult to guarantee boot with the NAND constraints.

With your feedback, I will try to drop this requirement....
I will check this point internally and push to switch to UBIFS.

Patrick