[U-Boot-Users] RFC on davinci Nand support changes

ksi at koi8.net ksi at koi8.net
Thu Sep 27 00:33:03 CEST 2007


On Tue, 25 Sep 2007, Troy Kisky wrote:

> ksi at koi8.net wrote:
>> On Mon, 24 Sep 2007, Troy Kisky wrote:
>>
>> First of all, one should always remember rule number one -- if it ain't
>> broke, don't fix it.
>>
>> Leaving technical details for later, let's start with a generic,
>> philosophical question first -- what are you trying to achieve with that
>> "fix"? What is your goal, why fix working code? What advantages would
>> your fix give us versus the existing working and tested code? I can see
>> none, and problems are aplenty -- you wanna go against what the silicon
>> designers did, against the Linux kernel and against common sense.

> How can I go against silicon designers? I think you give me far too much
> credit for ingenuity.

You are trying to do something that goes against silicon designers'
intentions.

>> Now for the technical side.
>>
>> First of all, erased blocks do _NOT_ contain ECC errors. They are
> _ERASED_
>> i.e. _EVERYTHING_ is set to all ones. Erased blocks simply do not have
> ECC.
>> And ECC code just skips erased blocks. One can _NOT_ "fix" OOB bytes
> for
>> erased blocks because it will make them _NOT_ erased thus making those
>> blocks dirty and unsuitable for writing any data into them.
>
> Well, the software ecc algorithm is designed to produce an ecc of ffffff
> for an erased block. I was merely following that philosophy. It also
> leads to more concise code without special cases.

The hardware algorithm does exactly the same.
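
For readers following along: the reason neither algorithm flags an erased
page is that erased flash reads back as all ones, data and OOB alike, so
there is simply no ECC to verify. A driver typically detects that case and
skips correction, roughly like this (a sketch, not the actual davinci_nand
code; the helper name is illustrative):

#include <stdint.h>

/* Sketch: report whether a data region and its ECC bytes are erased. */
static int ecc_region_is_erased(const uint8_t *data, int len,
				const uint8_t *ecc, int ecc_len)
{
	int i;

	/* Erased flash reads back as all ones... */
	for (i = 0; i < len; i++)
		if (data[i] != 0xff)
			return 0;
	/* ...and so do its OOB/ECC bytes -- there is no ECC to check. */
	for (i = 0; i < ecc_len; i++)
		if (ecc[i] != 0xff)
			return 0;
	return 1;	/* erased: skip correction entirely */
}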

>> "Much easier to read function" is not a reason for a change to working
>> code.

> You say the code is working. I say it doesn't work. It is far easier for
> someone to prove an error than to prove correctness. Have you ever tried
> to force an ecc error into the last ecc block of a large page nand chip?
> Does the current code correctly detect and fix it? I have tried, and my
> code does fix it. But ignoring that, easier to read usually also means
> less buggy and more efficient. The current code is far from obvious. It
> counts 12 bits of difference, which can lead to "fixing" an ecc error
> which, in reality, was unfixable. My approach of XORing the high and low
> 12 bits and comparing to fff is much safer, as well as more efficient.

Again, you are trying to reinvent the wheel. The ECC algorithm is described
in detail in the TI documentation, and MontaVista implemented it verbatim.

>> Aside from the fact that easiness is in the eye of the beholder, the
>> existing code

> In this case, I don't think you'll find many who will argue that the old
> code looks better. Do you argue that yourself?

This code is a verbatim implementation of what is described in TI
documentation. It's not supposed to look nice, it's supposed to work.

>> is taken almost verbatim from the MontaVista Linux kernel that everybody
>> uses, with a bugfix for an unaligned access that made the original code
>> hang U-Boot (why it works in the kernel is left as an exercise for the
>> reader.) And this has been done _DELIBERATELY_ to make U-Boot compatible
>> with the kernel.
>
> I agree, linux must change if hardware ecc is being used. I said as much
> in the initial email. Is anyone using davinci hardware ecc under
> linux????
> Just curious.

I did. One must use some kind of ECC. And there is absolutely no reason to
attach a horse to a car, it's much better without.

>> Now you propose some "fix" that 1.) breaks that compatibility, thus
>> forcing everybody not only to change working U-Boot code but also to
>> mess with a working kernel for no apparent reason; 2.) makes it
>> different from the kernel source, thus splitting the code base.

> I am more than willing to fix the linux kernel as well. And I'd probably
> agree that should happen first.

Why do you want to fix something that is not broken? Don't you have anything
better to spend your time on? And why do you think the TI guys who designed
that silicon are deceiving us all with a bad algorithm in their
documentation? And why do you think most HWECC implementations in different
chips from different vendors include the same algorithm in their
documentation almost verbatim? Are they all stupid, or have you discovered a
conspiracy?

>> Also I can see no reason for using any other chip select for NAND with
>> DaVinci. If one boots it off of NAND it _MUST_ be on CS2 because it is
>> hardcoded in the DaVinci ROM bootcode. If one has NAND there is no
>> reason to have NOR flash in the system, ergo nothing's going to CS2, so
>> why not just put NAND on that chip select according to what the silicon
>> design suggests? The only reason I can see is using _SEVERAL_ NAND chips
>> to get bigger memory. But that is a totally different question that
>> should be addressed in the board-specific portion, so it does not
>> pertain to the 3 boards currently supported; none of them have more than
>> one NAND chip and none have it on anything but CS2. And frankly I don't
>> see a reason for having more than one such chip with current NAND chip
>> densities.
>>
>> Bus width from the EM_WIDTH pin is also unnecessary; that is what the
>> NAND Device ID is for.

> True, but then an error message is printed. I wish there was a callback
> to set the width after the ID is obtained. Maybe there is, but I didn't
> see it.

>> As for EM_WAIT, there is a special pin dedicated exactly to this purpose
>> on DaVinci, so I can see no reason for not using it and using some
>> precious GPIO instead. And anyway, if one wants to do it despite common
>> sense, this belongs in the board-specific portion, not in generic NAND
>> code.

> It is ifdefed out if you don't use it, so it won't cause a larger image.
> EM_WAIT is used for more than just NAND chips. If you want to free
> EM_WAIT for another purpose, something must take its place. I.e. hard
> drive, boot from NOR, and NAND chip used concurrently. Although I'm not
> sure that would work, I see no reason for this code to make that policy
> decision.

You can NOT use ATA and NAND/NOR together on DaVinci. There is also
absolutely no reason to add a NOR flash to a board that already has NAND.
And all those READY/WAIT outputs are always open-drain (or open-collector),
so they work perfectly well wired-OR. That is how they are always designed.

>> Hardware ECC in DaVinci is _NOT_ flexible, it only works with 512-byte
>> blocks. That is why large page NAND ECC is done in 4 steps.

> I can find no documentation that says that. And if I did, I would still
> try it. I cannot think of a logical reason as to why it wouldn't work.
> But my question isn't "will it work". My question is: is it beneficial?

Have you tried to read the TI documentation on DaVinci?
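
For context, the 512-byte restriction is why a large-page read runs the
hardware engine four times, once per 512-byte region, roughly like this (a
sketch assuming a 2048-byte page; the davinci_ecc_*() and nand_read_buf()
helpers are illustrative stand-ins, not the actual driver API):

#include <stdint.h>

#define ECC_BLOCK	512	/* HW engine covers 512 bytes at a time */
#define PAGE_SIZE	2048	/* large-page NAND */
#define ECC_BYTES	3	/* ECC bytes stored per 512-byte block */

static int read_large_page(uint8_t *buf, const uint8_t *oob_ecc)
{
	uint32_t calc;
	int i;

	for (i = 0; i < PAGE_SIZE / ECC_BLOCK; i++) {
		davinci_ecc_start();		/* arm the HW engine */
		nand_read_buf(buf + i * ECC_BLOCK, ECC_BLOCK);
		calc = davinci_ecc_read();	/* latch this block's ECC */
		if (davinci_ecc_correct(buf + i * ECC_BLOCK,
					oob_ecc + i * ECC_BYTES, calc))
			return -1;	/* uncorrectable error in block i */
	}
	return 0;
}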

>> There is also absolutely no sane reason to forfeit perfectly working
>> hardware ECC in favor of software ECC, which is a kludge for those
>> systems that don't have the appropriate hardware. It brings additional
>> code with all its penalties -- bigger size, slower speed etc. Why
>> emulate something we do have in hardware? What's wrong with it?

> You're forgetting its biggest advantage: more eccs on smaller groups
> means more single bit ecc errors can be corrected before giving up, and
> a longer NAND flash life. If you read the ecc from the hardware after
> 256 bytes instead of 512, it should just return the current value. It
> will be a little slower, but should not require a complete software
> implementation. It will require double the hardware ecc register reads,
> and double the comparisons. Thanks, this is exactly the discussion I
> wanted to start.

It won't prolong NAND life. The first thing one does upon encountering an
error is copy the data to a different block and mark the entire faulted
block as bad. The existing ECC is good enough to detect an error and, in
most cases, correct it at least once while the data is being moved to a
spare block.

And you probably don't know that those OOB bytes are a precious resource.
Some guys really do use them, and every single byte counts. If you take
some bytes away from them, their data won't fit in the remaining space,
thus making some extremely useful software unusable with nothing gained in
return.
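
To make the cost concrete: a small-page device has only 16 OOB bytes per
page, so halving the ECC block size doubles the ECC's share of them. In
terms of the kernel's nand_ecclayout structure that looks roughly like
this (the offsets are illustrative, not any board's real layout):

#include <linux/mtd/mtd.h>

/* 512-byte ECC blocks: 3 ECC bytes per small page, 10 bytes left free. */
static struct nand_ecclayout ecc_512 = {
	.eccbytes = 3,
	.eccpos   = { 0, 1, 2 },
	.oobfree  = { { .offset = 6, .length = 10 } },
};

/* 256-byte ECC blocks: 6 ECC bytes per page -- 3 more OOB bytes gone. */
static struct nand_ecclayout ecc_256 = {
	.eccbytes = 6,
	.eccpos   = { 0, 1, 2, 3, 4, 5 },
	.oobfree  = { { .offset = 8, .length = 8 } },
};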

>> And no, DaVinci hardware ECC does _NOT_ overwrite factory bad block
>> markers.
>> Neither in small page NAND devices nor in large page ones. That is
> even
>> true
>> for NAND initial boot loader that does not use true ECC, just raw
> hardware
>> syndrome that is only used for checking if data is correct, not for
>> correction. The ECC part TI guys managed to implement properly unlike
> some
>> other parts of silicon that are either buggy or braindead...
>>
>> So it is definite NACK for something like this from your's truly KSI
> who
>> did
>> initial DaVinci port.

> I appreciate the effort you went to in this response. I hope more people
> will also look at it. I hope you will copy the segment that I have
> ifdefed to force an ecc error into your code, and let us know if the
> current implementation does indeed work.

---
******************************************************************
*  KSI at home    KOI8 Net  < >  The impossible we do immediately.  *
*  Las Vegas   NV, USA   < >  Miracles require 24-hour notice.   *
******************************************************************



