[U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

Fri Apr 25 10:48:41 CEST 2014

Hi Ian,

On Thu, 2014-04-24 at 20:14 +0100, Ian Campbell wrote:
> On Thu, 2014-04-24 at 17:41 +0000, Alexey Brodkin wrote:
> 
> > 1. Don't invalidate "sizeof(struct dmamacdescr)" but only
> > "roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN))".
> 
> OK. (Although given the realities of the real world values of
> ARCH_DMA_MINALIGN on every arch and the sizes of the structs & fields
> involved this isn't actually buying you anything at all)

Well this particular structure is of size sizeof(uint32_t) * 4 = 16
bytes. And I may suppose that cache lines could be shorter than 16 bytes
even though it could be pretty rare situation. So definitely not a big
deal.

But since we're dealing with macros here all mentioned calculations will
be done by pre-processor and execution performance won't be affected.

> > 2. In the following lines implements rounding as well:
> 
> Will fix as well.
> 
> > 3. Check carefully if there're other instances of probably unaligned
> > cache operations.

I thought a bit more about this situation and now I'm not that sure if
we need to align addresses we pass to cache invalidate/flush functions.

Because IMHO drivers shouldn't care about specifics of particular
platform or architecture. Otherwise we'll need to patch each and every
driver only for cache invalidate/flush functions. I looked how these
functions are used in other drivers and see that in most of cases no
additional alignment precautions were implemented. People just pass
start and end addresses.

In its turn platform and architecture provides cache invalidate/flush
functions implement its functionality depending on hardware specifics.

For example on architectures that may only flush/invalidate with
granularity of 1 cache line cache invalidate/flush functions make sure
to start processing from the start of the cache line to which start
address falls and end processing when cache line where end address falls
is processed.

I may assume that there're architectures that automatically understand
from which cache line to start and at which line to stop processing.

But if your architecture requires cache line aligned addresses to be
used for start/end addresses you may look for examples in ARC
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cache.c),, MIPS (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cpu.c), SH (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c),

and what's interesting even implementation you use have semi-proper
start/end addresses handling -
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c

Here's your invalidation procedure:
============
/* invalidate memory from start to stop-1 */
void v7_outer_cache_inval_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * If start address is not aligned to cache-line do not
	 * invalidate the first cache-line
	 */
	if (start & (line_size - 1)) {
		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
			__func__, start);
		/* move to next cache line */
		start = (start + line_size - 1) & ~(line_size - 1);
	}

	/*
	 * If stop address is not aligned to cache-line do not
	 * invalidate the last cache-line
	 */
	if (stop & (line_size - 1)) {
		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
			__func__, stop);
		/* align to the beginning of this cache line */
		stop &= ~(line_size - 1);
	}

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_inv_line_pa);

	pl310_cache_sync();
}
============

1. I don't understand why start from the next cache line if start
address is not aligned to cache line boundary? I'd say that you want to
invalidate cache line that contains unaligned start address. Otherwise
first bytes won't be invalidated, right?

2. Why do we throw _error_ message. I may understand if you emit
_warning_ message in case of debug build (with DEBUG defined). Well in
current implementation (see 1) it could be error because behavior is
really dangerous. But if you start from correct cache line only warning
in debug mode makes sense (IMHO).

3. Stop/end address in contrast might need to be extended depending on
HW implementation (see above comment).

And here's your flush procedure:
===========
void v7_outer_cache_flush_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * Align to the beginning of cache-line - this ensures that
	 * the first 5 bits are 0 as required by PL310 TRM
	 */
	start &= ~(line_size - 1);

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_clean_inv_line_pa);

	pl310_cache_sync();
}
===========

Which looks very correct to me. I'm wondering if there was a reason to
have so different implementation of functions that do very similar
things.

So at this point I would ask you to modify cache invalidate function for
your architecture. This way you prevent mentioned issues with other
drivers.

> I'm not seeing any others, in practice or by eye-balling the code.
> 
> > 4. Why don't you squeeze all 3 patches in 1 and name it like "fix
> > alignment issues with caches on some platforms"? Basically with all 3
> > patches you fix one and only issue and application of any one of those 3
> > patches doesn't solve your problem, right?
> 
> These are the issues as I discovered them one by one. I can fold them if
> you like but doing them separately will aid bisection if one of them
> turns out to be wrong in some way. As you prefer.

Keeping in mind things written above I'd say that patches 2 & 3 are not
needed at all, while patch 1 makes perfect sense and fixes an obvious
issue.

Regards,
Alexey