[U-Boot] [PATCH v5 02/19] usb: dwc2: Use separate input and output buffers

BrĂ¼ns, Stefan Stefan.Bruens at rwth-aachen.de
Mon Apr 3 14:26:36 UTC 2017


On Montag, 3. April 2017 01:23:17 CEST you wrote:
> Hi Stefan,
> 
> On 2 April 2017 at 15:34, Stefan Bruens <stefan.bruens at rwth-aachen.de> 
wrote:
> > On Sonntag, 2. April 2017 17:43:38 CEST Simon Glass wrote:
> >> Hi Stefan,
> >> 
> >> On 2 April 2017 at 07:10, Stefan Bruens <stefan.bruens at rwth-aachen.de>
> > 
> > wrote:
> >> > On Sonntag, 2. April 2017 05:01:41 CEST Marek Vasut wrote:
> >> >> On 04/02/2017 01:40 AM, Simon Glass wrote:
> >> >> > Hi Marek,
> >> >> > 
> >> >> > On 1 April 2017 at 14:15, Marek Vasut <marex at denx.de> wrote:
> >> >> >> On 04/01/2017 08:05 PM, Simon Glass wrote:
> >> >> >>> On Raspberry Pi 2 and 3 a problem was noticed when enabling driver
> >> >> >>> model
> >> >> >>> for USB: the cache invalidate after an incoming transfer does not
> >> >> >>> seem
> >> >> >>> to
> >> >> >>> work correctly.
> >> >> >>> 
> >> >> >>> This may be a problem with the underlying caching implementation
> >> >> >>> on
> >> >> >>> armv7
> >> >> >>> and armv8 but this seems very unlikely. As a work-around, use
> >> >> >>> separate
> >> >> >>> buffers for input and output. This ensures that the input buffer
> >> >> >>> will
> >> >> >>> not
> >> >> >>> hold dirty cache data.
> >> >> >> 
> >> >> >> What do you think of this patch:
> >> >> >> [U-Boot] usb: dwc2: invalidate the dcache before starting the DMA
> >> >> > 
> >> >> > Yes that matches what I did as a hack. I didn't realise that the DMA
> >> >> > would go through the cache. Thanks for the pointer.
> >> >> 
> >> >> DMA should not go through the cache. I have yet to review that patch,
> >> >> but IMO it's relevant to this problem you observe.
> >> > 
> >> > DMA transfers not going through the cache is probably the problem here:
> >> > 
> >> > Assume we have the aligned_buffer at address 0xdead0000
> >> > 
> >> > 1. The cpu writes to address 0xdead0002. This is fine, as it is the
> >> > current
> >> > owner of the address. The cacheline is marked dirty.
> >> > 2. The cpu no longer needs the corresponding address range, and it is
> >> > reallocated (i.e. freed and then allocated from dwc2) or reused (i.e.
> >> > formerly out buffer, now in buffer).
> >> > 3. The CPU starts the DMA transfer
> >> > 4. The DMA transfer writes to e.g. 0xdead0000-0xdead0200 in memory.
> >> > 5. The CPU fetches an address aliasing with 0xdead0000. The dirty cache
> >> > line is evicted, and the 0xdead0000-0xdead0040 memory contents are
> >> > overwritten.
> >> 
> >> This is the part I don't understand. This should be an invalidate, not
> >> a clean and invalidate, so there should be not memory write.
> >> 
> >> Also if the CPU fetches from cached 0xdead0000 without an invalidate,
> >> it will not cause a cash clean. It will simple read the data from the
> >> cache and ignore what the DMA wrote.
> > 
> > The CPU does not fetch 0xdead0000, but from an address *aliasing* with
> > 0xdead000. As 0xdead0000 is *dirty* (we have neither flushed (clears dirty
> > bit) or invalidated (implicitly clears dirty for the address)), the cache
> > controller has to write out the 0xdead0000 cache line to memory.
> 
> That doesn't make any sense to me. Can you explain it a bit more?
> 
> If the CPU fetches from a cache-alias of 0xdead0000, say 0xa11a5000
> then I expect the cache line would contain the data for that alias,
> not for 0xdead0000.

The important part is the dirty flag in the 0xdead0000 cacheline. By reading 
an aliasing address, you are causing eviction of the current cache line 
contents, and writing back its contents to memory. Reading of an address may 
cause write of a different address. You can see it as an dcache_flush_range 
done by the cache controller.

I think you are assuming a write-through cache here, which leads to your 
confusion.

> So a later invalidate of 0xdead0000 should at most
> clean the cache line and write to memory at 0xa11a5000. If it were to
> write cached data intended for 0xa11a5000 to memory at 0xdead0000,
> then surely this would be a bug?

After the cache line for 0xdead0000 has been evicted, any flush/invalidate 
operations are noops for that address.

> Therefore I cannot see the situation where the CPU should write to
> 0xdead0000 when that address is invalidated.

It is not the invalidation which causes the write, but eviction from the 
cache.

 > >> On armv8 we appear not to suppose invalidate in the code, so it makes
> >> sense for rpi_3.
> >> 
> >> But for rpi_2 which seems to do a proper invalidate, I still don't see
> >> the problem.
> > 
> > Which part of the code is different between rpi2 and rpi3? The dwc2 code
> > is
> > identical, is the memory invalidated in some other place?
> 
> It is the invalidate_dcache_range() function.

Thats for pointing that out, for anyone not having read the code:

ARMv7 has different operations for flush_dcache_range and 
invalidate_dcache_range, the former does a writeback of dirty cache lines and 
sets the invalid tags for the corresponding cache lines, the latter only does 
the invalidate part.

ARMv8 does a writeback for both operations. I assume thats what you call 
"improper".

The important part is, calling flush multiple times in a row is *exactly* the 
same thing as calling flush once. Calling flush instead of invalidate is the 
same thing *if* the dirty flag is not set, as the writeback part is skipped in 
that case.
 
> >> > Obviously, the dirty cache line from (1.) has to be cleared at the
> >> > beginning of (3.), as Eddys patch does.
> >> 
> >> But I still don't understand why we have to clean instead of just
> >> invalidate?
> > 
> > The patch by Eddie Cai just does an invalidate_dcache_range on the
> > transfer
> > buffer, nothing else. Where do you see a "clean" (whatever that refers
> > to)?
> 
> In the armv8  invalidate_dcache_range() function.

The writeback does not happen, as the cacheline is not dirty. It should not 
even be in the cache after invalidate has been called once.

We have to make sure the buffers address range is not in the cache prior to 
starting the DMA. We can either use invalidate_dcache_range or 
flush_dcache_range to guarantee this. Which one we use does not matter here, 
although invalidate only is typically a little bit more lightweight.

Kind regards,

Stefan



More information about the U-Boot mailing list