[U-Boot] [PATCH 0/5] dcache support for Raspberry Pi 1

Albert ARIBAUD albert.u.boot at aribaud.net
Thu Jul 9 00:47:10 CEST 2015


Hello Alexander,

On Wed, 08 Jul 2015 20:15:42 +0200, Alexander Stein
<alexanders83 at web.de> wrote:
> Hello Albert,
> 
> On Monday 06 July 2015, 23:26:41 wrote Albert ARIBAUD:
> > On Mon, 06 Jul 2015 20:24:31 +0200, Alexander Stein
> > <alexanders83 at web.de> wrote:
> > > Hello Albert,
> > > 
> > > On Monday 06 July 2015, 09:39:40 wrote Albert ARIBAUD:
> > > > On Sat,  4 Jul 2015 11:48:39 +0200, Alexander Stein
> > > > <alexanders83 at web.de> wrote:
> > > > 
> > > > > dcache support increases the MMC read performance on RPI 1 from 5.4 MiB/s to
> > > > > 12.3 MiB/s. It doesn't seem to have any effect on RPI 2 though. I just get
> > > > > error messages about non-cacheline aligned addresses upon invalidation.
> > > > 
> > > > Could it be that the code needed to support dcache is not the same for
> > > > rpi_2's bcm2836 as it is for rpi's bcm2835?
> > > 
> > > Sure, bcm2835 is an armv6 while bcm2836 is an armv7.
> > > 
> > > > Anyway: if code properly handles unaligned addresses then it should not
> > > > throw an error message about it. Can you look into why the error is
> > > > thrown?
> > > 
> > > Apparently it does not handle non-cacheline aligned addresses transparently or silently.
> > > 
> > > Here is the part of the code:
> > > > static void v7_dcache_inval_range(u32 start, u32 stop, u32 line_len)
> > > > {
> > > > 	/*
> > > > 	 * If start address is not aligned to cache-line do not
> > > > 	 * invalidate the first cache-line
> > > > 	 */
> > > > 	if (start & (line_len - 1)) {
> > > > 		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
> > > > 			__func__, start);
> > > > 		/* move to next cache line */
> > > > 		start = (start + line_len - 1) & ~(line_len - 1);
> > > > 	}
> > > 
> > > I don't know (a) why the cache invalidation is only done from the next cache line and (b) why this can't be done transparently without printing an error.
> > > But currently I'm not keen on fiddling with armv7 caches.
> > 
> > Well, I can see why.
> > 
> > Let's assume we're invalidating the second half of a cache line because
> > that's where a buffer starts which we want to force-read from external
> > memory because some device filled this buffer with important data.
> > 
> > Now, most probably the compiler and linker will have used the addresses
> > before our buffer to map some variables which may be unrelated to the
> > buffer.
> > 
> > At the time we're told to invalidate the buffer, these variables may be
> > modified in-cache but not yet written out to external memory. If we
> > invalidate the first cache line, then we erase these modifications --
> > they're lost.
> > 
> > Now, this is an unsolvable problem -- we can't flush these variables
> > before invalidating, because then we would flush the whole cache line,
> > which would overwrite and trash the buffer in external memory.
> > 
> > So anyway, we're doomed; there is nothing we can do -- hence the ERROR
> > message. From then on, we can either just give up and go hang(), or we
> > can try to save whatever can be, skip the half cache line and start
> > invalidating at the next boundary.
> > 
> > (same goes for the last address: it has to be at the end of a cache
> > line, or else we can neither invalidate nor flush.)
> 
> Thanks for this detailed explanation. I agree this is really a problem.
> But how should this be handled? The Raspberry Pi messagebox handling sends a message which might have more or less arbitrary length.
> I think it should be possible to ensure in every case that a message starts at the beginning of a cacheline. But the end might be
> at different positions with different messages sent. You must flush your data so that the firmware actually sees it, and you must invalidate to eventually read the answer data, which is located at the same position.
> I guess I might just not have hit the problem you described on my board (yet).

True, each message might have a different, non-cacheline-aligned
size, but there is always a point in time when the buffer for that
message is allocated, and the size of the buffer can be greater than the
size of the message. Here, we can compute the /buffer/ size by rounding
up the /message/ size to a multiple of the cache line size.

Then, if we make sure that the buffer starts at a cache line boundary
(using the 'aligned' attribute or calling memalign), it follows that the
buffer also ends at a cache line boundary. IOW, the buffer occupies
whole cache lines only, and cannot share a cache line with some
other, unknown, variable.

Therefore, invalidating (resp. flushing) the buffer will always
invalidate (resp. flush) the whole message and nothing other than the
message. It will invalidate (resp. flush) a few more bytes, but these
are unused so there is no risk.
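
As a rough illustration of the rounding and alignment described above,
here is a minimal sketch in plain C11, not actual U-Boot code. The names
CACHE_LINE_SIZE, round_up, alloc_msg_buffer and range_is_line_aligned are
hypothetical, and the 32-byte line size is only an assumption; the real
value depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size in bytes; the real value depends on the CPU. */
#define CACHE_LINE_SIZE 32u

/* Round size up to the next multiple of align (a power of two). */
static size_t round_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: allocate a buffer for a message of msg_size bytes
 * so that the buffer starts on a cache line boundary and occupies whole
 * cache lines only. The padded size is returned through buf_size.
 */
static void *alloc_msg_buffer(size_t msg_size, size_t *buf_size)
{
	*buf_size = round_up(msg_size, CACHE_LINE_SIZE);
	return aligned_alloc(CACHE_LINE_SIZE, *buf_size);
}

/* Return 1 if [start, start + size) begins and ends on line boundaries. */
static int range_is_line_aligned(const void *start, size_t size)
{
	uintptr_t s = (uintptr_t)start;

	return ((s | (s + size)) & (CACHE_LINE_SIZE - 1)) == 0;
}
```

A caller would then flush or invalidate the range from the returned
pointer up to the pointer plus *buf_size, never just the message length,
so that both ends of the range fall on cache line boundaries.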

Note that I've considered the cache line here, but buffers which are
copied to/from DDR by some device using DMA might have even stricter
alignment constraints. This just changes what value we should
align the start address and size of such buffers to.
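
To make the DMA remark concrete, the sketch below picks the stricter of
the two constraints and pads the size accordingly. It is plain C with
hypothetical helper names (is_pow2, buffer_alignment, padded_size), and it
assumes both the cache line size and the DMA alignment are powers of two,
in which case the larger value is automatically a multiple of the smaller.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if x is a non-zero power of two. */
static int is_pow2(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: choose the alignment for a DMA buffer as the
 * stricter of the cache line size and the device's DMA alignment.
 * Both are assumed to be powers of two.
 */
static size_t buffer_alignment(size_t cache_line, size_t dma_align)
{
	assert(is_pow2(cache_line) && is_pow2(dma_align));
	return cache_line > dma_align ? cache_line : dma_align;
}

/* Round the message size up to a multiple of the chosen alignment. */
static size_t padded_size(size_t msg_size, size_t align)
{
	assert(is_pow2(align));
	return (msg_size + align - 1) & ~(align - 1);
}
```

For example, with an assumed 32-byte cache line and a device that needs
64-byte alignment, a 50-byte message would get a 64-byte buffer aligned
to 64 bytes.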

> Best regards,
> Alexander

Amicalement,
-- 
Albert.

