[U-Boot] [PATCH v1 0/2] mtd: nand: omap: booting from NAND using u-boot

Brian Norris computersforpeace at gmail.com
Wed Feb 12 00:40:40 CET 2014


Hi Pekon,

On Tue, Jan 28, 2014 at 07:42:09AM +0000, Pekon Gupta wrote:
> >From: Brian Norris
> >
> >On Fri, Dec 13, 2013 at 02:42:56PM +0530, Pekon Gupta wrote:
> >> As there were parallel set of patches running between u-boot and kernel.
> >
> >I don't know what patches you're talking about.
> >
> Following patch-sets touched ecc.layout while cleaning up OMAP NAND driver.
> u-boot: http://lists.denx.de/pipermail/u-boot/2013-November/167551.html
> kernel: http://lists.infradead.org/pipermail/linux-mtd/2013-October/049414.html
> 
> >> hence, some patch-sets caused regression for OMAP3x platforms when booting
> >
> >"Hence" is not the right word. The previous sentence doesn't imply that
> >there were regressions.
> >
> >> using u-boot specifically for ecc-schemes (like BCH4_SW).
> >
> >Huh? What does "specifically for ecc-schemes" mean? You mean that only
> >some schemes had regressions? Can you be more specific? What are these
> >regressions?
> >
> It was reported by Enric Balletbo Serra <eballetbo at iseebcn.com> [2] that
> when using 'OMAP_ECC_BCH8_CODE_HW_DETECTION_SW' ecc-scheme
> there was a mismatch in ECC layout between mainline u-boot and kernel.
> This caused boot failure when kernel tried to mount UBIFS with image
> flashed from u-boot.
> 
> This was missed while testing earlier patch-sets because only OMAP3 boards
> use above ecc-scheme as OMAP3 SoC do not have ELM hardware engine.
> All the later platforms use 'OMAP_ECC_BCH8_CODE_HW' ecc-scheme.
> Both ecc-scheme are based on same algorithm of BCH8 ECC code except that:
> (1) 'OMAP_ECC_BCH8_CODE_HW_DETECTION_SW' uses /lib/bch.c software
> library for ecc-correction.
> (2) 'OMAP_ECC_BCH8_CODE_HW' uses ELM hardware engine for ecc-correction.
> 
> This regression affects following ecc-schemes, as ecc.layout logic is just
> copy-paste of each-other.
> - 'OMAP_ECC_BCH4_CODE_HW_DETECTION_SW' and
> - 'OMAP_ECC_BCH8_CODE_HW_DETECTION_SW'

OK

> >The rest of this cover letter (and most of the patch descriptions)
> >describes the configurations and testing (which is all very good, don't
> >get me wrong), with a lot of focus on things I don't follow (namely,
> >U-Boot), but you omit many of the important details, like
> >
> > * What is the regression? I honestly don't see a description of the
> >   actual regression. (I can read the code to intuit that, but what's
> >   the point of your pages of description if it doesn't mention this?)
> >   Please give some logical description of the problem
> >
> [Patch 1/2] of this series actually simplification of 'ecclayout->oobfree->offset'
> It does not fix the regression, but it's important to clean it, before any new
> feature additions.

By my reading, it is actually wrong, and therefore not just a clean up.
Can you address my comment there about your indexing into the eccpos[]
array?

Did you actually check (e.g., print out) the resulting ecclayout before
and after your cleanup to make sure it's exactly the same? I'm looking
at the oobfree[] values, which looked wrong to me.

> [Patch 2/2] Actually fixes the regression for affected ecc-schemes. But
> "this patch also touches other ecc-schemes as the fix required
>  refactoring common code, into ecc-scheme specific code."
> 
> 
> > * What are the effects of the regression? ECC bytes in the wrong place,
> >   so you see correction failures/corruption? JFFS2 failures? (The
> >   "oobfree" changes, for instance, should only affect something like
> >   JFFS2.)
> >
> As mentioned above [PATCH 1/2] which touches 'ecclayout->oobfree->offset'
> is a clean-up not affecting regression, but it should be taken in before
> any further feature or ecc-scheme changes.

That's fine; a fixup patch can come just before a bugfix, if that helps
simplify the bufix.

> Now, what is your preference?
> (a)  Take [PATCH 1/2] ecc.layout clean-up separately.
> (b) Take ecc.layout as part of this series, but may be with different patch title.

I think we'll keep these patches together, but they both need to have
better descriptions to tell what's actually going on.

> > * What Linux commit caused the regression?
> > * Should this patch set be marked for -stable? And for what kernel
> >   release? (the previous question would help answer this)
> >
> This regression was caused in 
> 	commit b491da7233d5dc1a24d46ca1ad0209900329c5d0
> 	 mtd: nand: omap: clean-up ecc layout for BCH ecc schemes
> So, this should be applicable from 3.13.x+
> 
> 
> >> Hence this patch series fixes those regressions, and tests complete
> >[...] snip
> >
> >I'm preparing a 3.14 pull request soon, and since you seem committed to
> >fixing and properly testing a known regression here, I'd like to see
> >this go in. But given the late timing and the unanswered questions, I
> >think it's unlikely to go in -rc1. Perhaps I can send a later pull
> >request if we can get it together properly.
> >
> Yes, if the above details are sufficient, then I'll:
> (1) re-word [PATCH 1/2] commit log & title to mention that it's a clean-up
> (2) Add above description and details of regression into commit log of [PATCH 2/2]
> (3) Include your other comments on individual patches.

That's a good plan. But please, before sending another version, can you
at least address my comment on patch 1 about the eccpos[] indexing you
are using? It doesn't look right to me.

Also, can you please print out your full layout (eccpos, eccbytes,
oobavail, and oobfree) and check it?

> >So I'd like to first of all get a few answers to my questions, and then
> >I think we might want to refresh this patch series with a little better
> >commit messages and perhaps a separate documentation file (not
> >necessarily in the same patch series, as the fix is more urgent).
> >
> Yes, I plan to get some documentation in kernel docs. But for now
> I do have a 'un-polished' page for OMAP NAND driver on TI wiki [3].
> Please let me know if you would like something similar for kernel docs.

Thanks for the link. Some portion of that doc is probably useful to
include in Linux eventually.

> I'm happy that you are scrutinizing each patch and their commit logs wisely.
> Thanks much for that. This helps me and probably everyone else to tighten-up
> so that only good quality patch-sets go in. This eventually will make the
> sub-system and the drivers more stable for long-term.
> 
> Also, if you don't mind, I'll still continue to cross-post this particular
> Patch-series on both u-boot and kernel mailing lists just to maintain
> continuity, as I know there are lot of OMAP3 users which are following
> only one of the two lists.. (but I'm cutting down on the CC list).
> I'll avoid cross posting in future.

Cross-posting is OK with me if it's made a little more clear who is
targeted. The restrictive U-Boot mailing list policy (which rejects
non-member / too-large-CC-list email) diminishes the usefulness of
cross-posting too.

But most of all, using "u-boot" in the subject of the cover letter email
does not clearly convey that this is a Linux patch!

> >Brian
> >
> >   [1] For instance, rather than saying "the ECC layout doesn't match
> >   U-Boot", you should probably describe what the intended ECC layout
> >   should look like and show that Linux does not match it. Perhaps it's
> >   time for some better documentation, like Ezequiel stashed under
> >   Documentation/mtd/nand/ for pxa3xx. You could also take some cues
> >   from Huang's comments on ECC layouts, etc., in gpmi-nand.c.
> 
> 
> [2] http://lists.denx.de/pipermail/u-boot/2013-December/168440.html
> [3] https://processors.wiki.ti.com/index.php/Linux_Core_NAND_User%27s_Guide (under construction)

I look forward to you addressing my last concerns and sending a good v2.
Thanks!

Brian


More information about the U-Boot mailing list