[U-Boot-Users] RFC on davinci Nand support changes

Tue Sep 25 21:22:51 CEST 2007

ksi at koi8.net wrote:
> On Mon, 24 Sep 2007, Troy Kisky wrote:
> 
> First of all, one should always remember rule number one -- if it ain't
> broke, don't fix it.
> 
> Leaving techical details for later let's start with generic, philosophical
> question first -- what are you trying to achieve with that "fix"? What is
> your goal, why fix a working code? What advantages would your fix give us
> versus existing working and tested code? I can see none and problems are
> aplenty -- you wanna go against what silicon designers did, against the
> Linux kernel and against common sense.
How can I go against silicon designers? I think you give me far too much
credit for ingenuity.
> 
> Now for the technical side.
> 
> First of all, erased blocks do _NOT_ contain ECC errors. They are _ERASED_
> i.e. _EVERYTHING_ is set to all ones. Erased blocks simply do not have ECC.
> And ECC code just skips erased blocks. One can _NOT_ "fix" OOB bytes for
> erased blocks because it will make them _NOT_ erased thus making those
> blocks dirty and unsuitable for writing any data into them.

Well, the software ecc algorithm is designed to produce a ecc of ffffff for an
erased block. I was merely following that philosophy. It also leads to more concise
code without special cases.

> 
> "Much easier to read function" is not a reason for a change to working 
> code.
You say, the code is working. I say it doesn't. It is far easier for someone
to prove an error, than to prove correctness. Have you ever tried to force
an ecc error into the last ecc block of a large page nand chip? Does the current
code correctly detect and fix it? I have tried, and my code does fix it.
But ignoring that, easier to read usually also means less buggy and more efficient.
The current code is far from obvious. It counts for 12 bits of differences,
which can lead to fixing an ecc error which, in reality, was unfixable.
My approach of XORING the high and low 12 bits and comparing to fff is much
safer, as well as more efficient.

> Less for the fact that easiness is in the eye of beholder, the existing 
> code
In this case, I don't think you'll find many that will argue that the old
code looks better. Do you argue that yourself?

> is taken almost verbatim from MontaVista Linux kernel that everybody use
> with a bugfix for unaligned access that made original code hang the U-Boot
> (why it works in the kernel is left as an exercize for the reader.) And 
> this
> has been done _DELIBERATELY_ to make U-Boot compatible with the kernel.

I agree, linux must change if hardware ecc is being used. I said as much
in the initial email. Is anyone using davinci hardware ecc under linux????
Just curious.

> Now
> you propose some "fix" that 1.) breakes that compatibility thus forcing
> everybody not only change working U-Boot code but also mess with a working
> kernel for no apparent reason; 2.) makes it different from the kernel 
> source
> thus splitting the code base.
I am more than willing to fix the linux kernel as well. And I'd probably agree
that should happen first.

> 
> Also I can see no reason for using any other chip select for NAND with
> DaVinci. If one boots it off of NAND it _MUST_ be on CS2 because it is
> hardcoded in DaVinci ROM bootcode. If one has NAND there is no reason to
> have NOR flash in the system ergo nothing's going to CS2 so why just don't
> put NAND on that chip select according to what silicon design suggests? The
> only reason I can see is using _SEVERAL_ NAND chips to get a bigger memory.
> But that is totally different question that should be addressed in
> board-specific portion so it does not pertain to those 3 boards currently
> supported, none of them have more than one NAND chip and none have it on
> anything but CS2. And frankly I don't see a reason for having more than one
> such chip with current NAND chip densities.
> 
> Bus width from EM_WIDTH pin is also unnecessary, that is what NAND 
> Device ID
> for.
True, but then an error message is printed. I wish there was a callback
to set the width after the ID is obtained. Maybe there is, but I didn't see it.

> 
> As for EM_WAIT, there is a special pin dedicated exactly for this 
> purpose on
> DaVinci so I can see no reason for not using it and using some precious 
> GPIO
> instead. And anyways, if one wants to do it despite common sense, this
> belongs to board-specific portion, not to generic NAND code.
It is ifdefed out if you don't use it, so it won't cause a larger image.
EM_WAIT is used for more than just NAND chips. If you want to free EM_WAIT
for another purpose, something must take it's place. I.e. harddrive, boot from
NOR, and NAND chip used concurrently. Although I'm not sure that would work,
I see no reason for this code to make that policy decision.

> 
> Hardware ECC in DaVinci is _NOT_ flexible, it only works with 512-byte
> blocks. That is why large page NAND ECC is done in 4 steps.
I can find no documentation that says that. And if I did, I would still try it.
I cannot think of a logical reason as to why it wouldn't work. But my question
isn't will it work. My question is is it beneficial.

> 
> There is also absolutely no sane reason to forfeit perfectly working
> hardware ECC in spite of software ECC that is a kludge for those systems
> that don't have appropriate hardware. It brings additional code with all 
> its
> penalties -- bigger size, slower speed etc. Why emulate something we do 
> have
> in hardware? What's wrong with it?
Your forgetting its biggest advantage, more eccs on smaller groups, means
more single bit ecc errors can be corrected before giving up and a longer NAND
flash life. If you read the ecc from the hardware after 256 bytes, instead of 512,
it should just return the current value. It will be a little slower, but should not
require a complete software implementation. It will require double the hardware ecc
register reads, and double the comparisons. Thanks, this is exactly the discussion
I wanted to start.

> 
> And no, DaVinci hardware ECC does _NOT_ overwrite factory bad block 
> markers.
> Neither in small page NAND devices nor in large page ones. That is even 
> true
> for NAND initial boot loader that does not use true ECC, just raw hardware
> syndrome that is only used for checking if data is correct, not for
> correction. The ECC part TI guys managed to implement properly unlike some
> other parts of silicon that are either buggy or braindead...
> 
> So it is definite NACK for something like this from your's truly KSI who 
> did
> initial DaVinci port.
> 
I appreciate the effort you went to in this response. I hope more people will
also look at it. I hope you will copy the segment that I have ifdefed to force
an ecc error into your code, and let us know if the current implementation does
indeed work.

Sincerely,
Troy Kisky