[U-Boot] [PATCH v5 02/19] usb: dwc2: Use separate input and output buffers

Simon Glass sjg at chromium.org
Mon Apr 3 15:38:40 UTC 2017


Hi Stefan,

On 3 April 2017 at 08:26, BrĂ¼ns, Stefan <Stefan.Bruens at rwth-aachen.de> wrote:
> On Montag, 3. April 2017 01:23:17 CEST you wrote:
>> Hi Stefan,
>>
>> On 2 April 2017 at 15:34, Stefan Bruens <stefan.bruens at rwth-aachen.de>
> wrote:
>> > On Sonntag, 2. April 2017 17:43:38 CEST Simon Glass wrote:
>> >> Hi Stefan,
>> >>
>> >> On 2 April 2017 at 07:10, Stefan Bruens <stefan.bruens at rwth-aachen.de>
>> >
>> > wrote:
>> >> > On Sonntag, 2. April 2017 05:01:41 CEST Marek Vasut wrote:
>> >> >> On 04/02/2017 01:40 AM, Simon Glass wrote:
>> >> >> > Hi Marek,
>> >> >> >
>> >> >> > On 1 April 2017 at 14:15, Marek Vasut <marex at denx.de> wrote:
>> >> >> >> On 04/01/2017 08:05 PM, Simon Glass wrote:
>> >> >> >>> On Raspberry Pi 2 and 3 a problem was noticed when enabling driver
>> >> >> >>> model
>> >> >> >>> for USB: the cache invalidate after an incoming transfer does not
>> >> >> >>> seem
>> >> >> >>> to
>> >> >> >>> work correctly.
>> >> >> >>>
>> >> >> >>> This may be a problem with the underlying caching implementation
>> >> >> >>> on
>> >> >> >>> armv7
>> >> >> >>> and armv8 but this seems very unlikely. As a work-around, use
>> >> >> >>> separate
>> >> >> >>> buffers for input and output. This ensures that the input buffer
>> >> >> >>> will
>> >> >> >>> not
>> >> >> >>> hold dirty cache data.
>> >> >> >>
>> >> >> >> What do you think of this patch:
>> >> >> >> [U-Boot] usb: dwc2: invalidate the dcache before starting the DMA
>> >> >> >
>> >> >> > Yes that matches what I did as a hack. I didn't realise that the DMA
>> >> >> > would go through the cache. Thanks for the pointer.
>> >> >>
>> >> >> DMA should not go through the cache. I have yet to review that patch,
>> >> >> but IMO it's relevant to this problem you observe.
>> >> >
>> >> > DMA transfers not going through the cache is probably the problem here:
>> >> >
>> >> > Assume we have the aligned_buffer at address 0xdead0000
>> >> >
>> >> > 1. The cpu writes to address 0xdead0002. This is fine, as it is the
>> >> > current
>> >> > owner of the address. The cacheline is marked dirty.
>> >> > 2. The cpu no longer needs the corresponding address range, and it is
>> >> > reallocated (i.e. freed and then allocated from dwc2) or reused (i.e.
>> >> > formerly out buffer, now in buffer).
>> >> > 3. The CPU starts the DMA transfer
>> >> > 4. The DMA transfer writes to e.g. 0xdead0000-0xdead0200 in memory.
>> >> > 5. The CPU fetches an address aliasing with 0xdead0000. The dirty cache
>> >> > line is evicted, and the 0xdead0000-0xdead0040 memory contents are
>> >> > overwritten.
>> >>
>> >> This is the part I don't understand. This should be an invalidate, not
>> >> a clean and invalidate, so there should be not memory write.
>> >>
>> >> Also if the CPU fetches from cached 0xdead0000 without an invalidate,
>> >> it will not cause a cash clean. It will simple read the data from the
>> >> cache and ignore what the DMA wrote.
>> >
>> > The CPU does not fetch 0xdead0000, but from an address *aliasing* with
>> > 0xdead000. As 0xdead0000 is *dirty* (we have neither flushed (clears dirty
>> > bit) or invalidated (implicitly clears dirty for the address)), the cache
>> > controller has to write out the 0xdead0000 cache line to memory.
>>
>> That doesn't make any sense to me. Can you explain it a bit more?
>>
>> If the CPU fetches from a cache-alias of 0xdead0000, say 0xa11a5000
>> then I expect the cache line would contain the data for that alias,
>> not for 0xdead0000.
>
> The important part is the dirty flag in the 0xdead0000 cacheline. By reading
> an aliasing address, you are causing eviction of the current cache line
> contents, and writing back its contents to memory. Reading of an address may
> cause write of a different address. You can see it as an dcache_flush_range
> done by the cache controller.

OK so I think you are saying that reading from 0xa11a5000 causes dirty
data to be flushed from the cache to 0xdead0000. Makes perfect sense.
But why is there dirty data at 0xdead0000?

- If we did a write last time, then it would have been dirty until we
flushed the cache line, which we did before telling the DMA to start
up.

- If we did a read last time, then it is clean. We have read the data,
but not changed it.

What am I missing?

>
> I think you are assuming a write-through cache here, which leads to your
> confusion.

No that's a separate issue.

>
>> So a later invalidate of 0xdead0000 should at most
>> clean the cache line and write to memory at 0xa11a5000. If it were to
>> write cached data intended for 0xa11a5000 to memory at 0xdead0000,
>> then surely this would be a bug?
>
> After the cache line for 0xdead0000 has been evicted, any flush/invalidate
> operations are noops for that address.

OK good, so that's not the problem.

>
>> Therefore I cannot see the situation where the CPU should write to
>> 0xdead0000 when that address is invalidated.
>
> It is not the invalidation which causes the write, but eviction from the
> cache.
>
>  > >> On armv8 we appear not to suppose invalidate in the code, so it makes
>> >> sense for rpi_3.
>> >>
>> >> But for rpi_2 which seems to do a proper invalidate, I still don't see
>> >> the problem.
>> >
>> > Which part of the code is different between rpi2 and rpi3? The dwc2 code
>> > is
>> > identical, is the memory invalidated in some other place?
>>
>> It is the invalidate_dcache_range() function.
>
> Thats for pointing that out, for anyone not having read the code:
>
> ARMv7 has different operations for flush_dcache_range and
> invalidate_dcache_range, the former does a writeback of dirty cache lines and
> sets the invalid tags for the corresponding cache lines, the latter only does
> the invalidate part.
>
> ARMv8 does a writeback for both operations. I assume thats what you call
> "improper".

Yes, I believe it is wrong. Linux has a proper invalidate, why not U-Boot?

>
> The important part is, calling flush multiple times in a row is *exactly* the
> same thing as calling flush once. Calling flush instead of invalidate is the
> same thing *if* the dirty flag is not set, as the writeback part is skipped in
> that case.

Yes indeed.

>
>> >> > Obviously, the dirty cache line from (1.) has to be cleared at the
>> >> > beginning of (3.), as Eddys patch does.
>> >>
>> >> But I still don't understand why we have to clean instead of just
>> >> invalidate?
>> >
>> > The patch by Eddie Cai just does an invalidate_dcache_range on the
>> > transfer
>> > buffer, nothing else. Where do you see a "clean" (whatever that refers
>> > to)?
>>
>> In the armv8  invalidate_dcache_range() function.
>
> The writeback does not happen, as the cacheline is not dirty. It should not
> even be in the cache after invalidate has been called once.
>
> We have to make sure the buffers address range is not in the cache prior to
> starting the DMA. We can either use invalidate_dcache_range or
> flush_dcache_range to guarantee this. Which one we use does not matter here,
> although invalidate only is typically a little bit more lightweight.

Yes. Just to restate my assertion. It should be possible to:

- have some dirty data in the cache
- start up DMA
- invalidate that data
- read it

in that order. It should not be necessary to move the invalidate to
before the DMA start-up, right?

Regards,
Simon


More information about the U-Boot mailing list