[U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
Marek Vasut
marex at denx.de
Tue Jun 26 22:39:08 CEST 2012
Dear Scott Wood,
> On 06/25/2012 08:33 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> >
> >> On 06/25/2012 06:37 PM, Marek Vasut wrote:
> >>> Dear Scott Wood,
> >>>
> >>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>>>> but that involves a lot of copying and therefore degrades performance
> >>>>> rapidly. Therefore disallow this possibility of unaligned load
> >>>>> address altogether if data cache is on.
> >>>>
> >>>> How about use the bounce buffer only if the address is misaligned?
> >>>
> >>> Not happening, bounce buffer is bullshit,
> >>
> >> Hacking up the common frontend with a new limitation because you can't
> >> be bothered to fix your drivers is bullshit.
> >
> > The drivers are not broken, they have hardware limitations.
>
> They're broken because they ignore those limitations.
>
> > And checking for
> > those has to be done as early as possible.
>
> Why?
Why keep the overhead?
> > And it's not a new common frontend!
>
> No, it's a compatibility-breaking change to the existing common frontend.
Well, those are corner cases; it's not like people will start hitting it en masse.
I agree it should somehow be platform- or even CPU-specific.
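For illustration only (this is not code from the patch; the function name and the cache-line constant are made up, and real U-Boot code would use ARCH_DMA_MINALIGN for the alignment requirement), such an early alignment check could look like:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for the sketch; U-Boot drivers would use the
 * platform's ARCH_DMA_MINALIGN instead of a hard-coded value. */
#define CACHE_LINE_SIZE 64

/* Return nonzero if [addr, addr + len) is cache-line aligned, i.e. a
 * DMA-capable driver can use the buffer directly with the data cache on
 * without risking cache-line sharing with adjacent data. */
static int dma_buf_aligned(uintptr_t addr, size_t len)
{
	return (addr % CACHE_LINE_SIZE == 0) && (len % CACHE_LINE_SIZE == 0);
}
```

Whether this check lives in the common frontend or in each driver is exactly what is being debated here.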
> >>> It's like driving a car in the wrong lane. Sure, you can do it, but
> >>> it'll eventually have some consequences. And using a bounce buffer is
> >>> like driving a tank in the wrong lane ...
> >>
> >> Using a bounce buffer is like parking your car before going into the
> >> building, rather than insisting the building's hallways be paved.
> >
> > The other is obviously faster, more comfortable and lets you carry more
> > stuff at once.
>
> Then you end up needing buildings to be many times as large to give
> every cubicle an adjacent parking spot, maneuvering room, etc. You'll
> be breathing fumes all day, and it'll be a lot less comfortable to get
> even across the hallway without using a car, etc. Communication between
> coworkers would be limited to horns and obscene gestures. :-)
Ok, this has gone waaay out of hand here :-)
> > And if you drive a truck, you can dump a lot of payload instead of
> > carrying it back and forth from the building. That's why there's a
> > special garage for trucks possibly with cargo elevators etc.
>
> Yes, it's called targeted optimization rather than premature optimization.
>
> >>>> The
> >>>> corrective action a user has to take is the same as with this patch,
> >>>> except for an additional option of living with the slight performance
> >>>> penalty.
> >>>
> >>> Slight is very weak word here.
> >>
> >> Prove me wrong with benchmarks.
> >
> > Well, copying data back and forth is tremendous overhead. You don't need
> > a benchmark to calculate something like this:
> >
> > 133MHz SDRAM (pumped) gives you what ... 133 Mb/s throughput
>
> You're saying you get only a little more bandwidth from memory than
> you'd get from a 100 Mb/s ethernet port? Come on. Data buses are not
> one bit wide.
Good point; it was late and I forgot to factor that in.
> And how fast can you pull data out of a NAND chip, even with DMA?
>
> > (now if it's DDR, dual/quad pumped, that doesn't give you any more
> > advantage
>
> So such things were implemented for fun?
>
> > since you have to: send address, read the data, send address, write the
> > data ...
>
> What about bursts? I'm pretty sure you don't have to send the address
> separately for every single byte.
If you do memcpy? You only have registers; sure, you can optimize it, but on
Intel, for example, you don't have many of those.
> > this is expensive ... without data cache on, even more so)
>
> Why do we care about "without data cache"? You don't need the bounce
> buffer in that case.
Correct, it's expensive in both cases though.
> > Now consider you do it via really dumb memcpy, what happens:
> It looks like ARM U-Boot has an optimized memcpy.
>
> > 1) You need to read the data into register
> > 1a) Send address
> > 1b) Read the data into register
> > 2) You need to write the data to a new location
> > 2a) Send address
> > 2b) Write the data into the memory
> >
> > In the meantime, you get some refresh cycles etc. Now, if you take read
> > and write in 1 time unit and "send address" in 0.5 time unit (this gives
> > total 3 time units per one loop) and consider you're not doing sustained
> > read/write, you should see you'll be able to copy at speed of about
> > 133/3 ~= 40Mb/s
> >
> > If you want to load 3MB kernel at 40Mb/s onto an unaligned address via
> > DMA, the DMA will deploy it via sustained write, that'll be at 10MB/s,
> > therefore in 300ms. But the subsequent copy will take another 600ms.
>
> On a p5020ds, using NAND hardware that doesn't do DMA at all, I'm able
> to load a 3MiB image from NAND in around 300-400 ms. This is with using
> memcpy_fromio() on an uncached hardware buffer.
Blazing ... almost half a second. But everyone these days wants a faster boot;
without the memcpy, it can go down to 100 ms or even less! And this kind of
limitation is not something that'd inconvenience anyone.
> Again, I'm not saying that bounce buffers are always negligible overhead
> -- just that I doubt NAND is fast enough that it makes a huge difference
> in this specific case.
It does make a difference. But correct, thinking about it -- implementing a
generic bounce buffer for cache-incoherent hardware might be a better way to go.
> >>>> How often does this actually happen? How much does it
> >>>> actually slow things down compared to the speed of the NAND chip?
> >>>
> >>> If the user is dumb, always. But if you tell the user how to milk the
> >>> most of the hardware, he'll be happier.
> >>
> >> So, if you use bounce buffers conditionally (based on whether the
> >> address is misaligned), there's no impact except to "dumb" users, and
> >> for those users they would merely get a performance degradation rather
> >> than breakage. How is this "bullshit"?
> >
> > Correct, but users will complain if they get a subpar performance.
>
> If you expend the minimal effort required to make the bounce buffer
> usage conditional on the address actually being misaligned, the only
> users that will see subpar performance are those who would see breakage
> with your approach. Users will complain if they see breakage even more
> than when the see subpar performance.
>
> If the issue is educating the user to avoid the performance hit
> (regardless of magnitude), and you care enough, have the driver print a
> warning (not error) message the first time it needs to use a bounce buffer.
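A sketch of what Scott is suggesting (all names here are hypothetical, the DMA transfer is stubbed out, and the cache-line constant stands in for ARCH_DMA_MINALIGN): bounce only when the destination is actually misaligned, and warn once rather than fail:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LINE_SIZE 64	/* assumed; U-Boot would use ARCH_DMA_MINALIGN */

/* Statically aligned bounce buffer and a one-shot warning flag. */
static char bounce[4096] __attribute__((aligned(CACHE_LINE_SIZE)));
static int warned;

/* Stand-in for the driver's real DMA transfer routine; here it just
 * fills the buffer with a recognizable pattern. */
static void dma_read(void *dst, size_t len)
{
	memset(dst, 0xab, len);
}

/* Read 'len' bytes into 'buf': use the buffer directly when aligned,
 * otherwise go through the bounce buffer and warn the user (once)
 * about the performance hit instead of refusing the transfer. */
static void nand_read_buf(void *buf, size_t len)
{
	if ((uintptr_t)buf % CACHE_LINE_SIZE) {
		if (!warned) {
			printf("WARNING: unaligned buffer %p, "
			       "using bounce buffer (slower)\n", buf);
			warned = 1;
		}
		dma_read(bounce, len);
		memcpy(buf, bounce, len);
	} else {
		dma_read(buf, len);
	}
}
```

Aligned callers pay nothing; misaligned callers get a warning and the copy penalty instead of breakage.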
OK, on second thought, you're right. Let's do it the same way we did it with
MMC.
> -Scott
Best regards,
Marek Vasut