[U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues

Wed Aug 20 21:12:20 CEST 2014

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding <treding at nvidia.com>
>
> This series attempts to fix a long-standing problem in the rtl8169 driver
> (though the same problem may exist in other drivers as well). Let me first
> explain what exactly the issue is:
>
> The rtl8169 driver provides a set of RX and TX descriptors for the device to
> use. Once they're set up, the device is told about their location so that it
> can fetch the descriptors using DMA. The device will also write packet state
> back into these descriptors using DMA. For this to work properly, whenever a
> driver needs to access these descriptors it needs to invalidate the D-cache
> line(s) associated with them. Similarly when changes to the descriptor have
> been made by the driver, the cache lines need to be flushed to make sure the
> changes are visible to the device.
>
> The descriptors are 16 bytes in size. This causes problems when used on CPUs
> that have a cache-line size that is larger than 16 bytes. One example is the
> NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
> fit into a single cache-line. So whenever the driver flushes a cache-line it
> has the potential to discard changes made to another descriptor by the DMA
> device. One typical symptom is that large transfers over TFTP will often not
> complete and hang somewhere midway because the device marked a packet as
> received, but the driver then flushed the cache line and overwrote that
> update, so the packet was lost.
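
To make the arithmetic concrete, here is a minimal sketch of how the
descriptors map onto cache lines. It assumes the ring base is
cache-line aligned; the helper names are made up for illustration and
are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_SIZE   16u	/* rtl8169 descriptor size, as stated above */
#define CACHE_LINE  64u	/* Tegra124 D-cache line size, as stated above */

/* The sketch only makes sense if descriptors tile a cache line evenly. */
static_assert(CACHE_LINE % DESC_SIZE == 0, "descriptors must tile a line");

/* Number of descriptors that fit into one cache line. */
static size_t descs_per_cache_line(void)
{
	return CACHE_LINE / DESC_SIZE;
}

/* Cache line index (relative to an aligned ring base) of descriptor i. */
static size_t desc_cache_line(size_t index)
{
	return (index * DESC_SIZE) / CACHE_LINE;
}

/* Nonzero if flushing descriptor a also writes back descriptor b. */
static int descs_share_line(size_t a, size_t b)
{
	return desc_cache_line(a) == desc_cache_line(b);
}
```

With these numbers descriptors 0 through 3 land in the same line, so a
flush for descriptor 0 can write stale data over whatever the device
just stored in descriptors 1 through 3.
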
>
> Since the descriptors need to be consecutive in memory, I don't see a way to
> fix this other than to use uncached memory. Therefore the solution proposed
> in this patch series is to introduce a mechanism in U-Boot to allow a driver
> to allocate from a pool of uncached memory. Currently an implementation is
> provided only for ARM v7. The idea is that a region (of user-definable size)
> immediately below (taking into account architecture-specific alignment
> restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
> can use the new noncached_alloc() function to allocate a chunk of memory
> from this pool dynamically for buffers that it can't or doesn't want to do
> any explicit cache maintenance on, yet that need to be shared with DMA devices.
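
For readers who want a mental model of such a pool, below is a rough
sketch of a bump allocator. It is not the code from the series: the
pool here is an ordinary static array standing in for the region that
would be mapped uncacheable, and sketch_noncached_alloc() and its
(size, align) parameters are hypothetical names chosen for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the uncached region. In U-Boot this would be
 * a range mapped uncacheable in the MMU, not a plain array. */
static _Alignas(64) uint8_t pool[4096];
static size_t pool_next;	/* offset of the first free byte in the pool */

/* Hand out 'size' bytes aligned to 'align' (a power of two), or NULL if
 * the pool is exhausted. Memory is never returned to the pool. */
static void *sketch_noncached_alloc(size_t size, size_t align)
{
	uintptr_t base = (uintptr_t)pool;
	uintptr_t start;
	size_t offset;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the next free address up to the requested alignment. */
	start = (base + pool_next + align - 1) & ~(uintptr_t)(align - 1);
	offset = start - base;

	if (offset > sizeof(pool) || size > sizeof(pool) - offset)
		return NULL;

	pool_next = offset + size;
	return (void *)start;
}
```

A driver would call this once per descriptor ring at probe time. There
is deliberately no free() counterpart in the sketch, which keeps the
bookkeeping down to a single offset.
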
>
> Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
> issues in the ARM v7 cache code and patch 2 uses more future-proof types for
> the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
> for debugging purposes. It will print out the region used by malloc() when
> DEBUG is enabled. This can be useful to see where the malloc() region is in
> the memory map (compared to the noncached region introduced in a later patch
> for example).
>
> Patch 4 implements the noncached API for ARM v7. It obtains the start of the
> malloc() area and places the noncached region immediately below it so that
> noncached_alloc() can allocate from it. During boot, the noncached area will
> be set up immediately after malloc() has been initialized.
>
> Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
> which should be plenty (it's also the minimum on ARM v7 because it matches
> the MMU section size and therefore the granularity at which U-Boot can set
> the cacheable attributes).

If LPAE were to be enabled, the minimum would be 2 MiB, but I suppose we
can deal with that if/when the time comes.
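
To illustrate how the section size feeds into the placement, here is a
rough sketch of the arithmetic. It assumes the region ends at the
malloc() start rounded down to a section boundary and that the
requested size is rounded up to whole sections; the function names and
the example address are made up and not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M 0x100000u	/* ARM v7 short-descriptor section size */
#define SZ_2M 0x200000u	/* block size with LPAE */

/* Round 'size' up to a whole number of sections. */
static uint64_t round_up_section(uint64_t size, uint64_t section)
{
	assert(section != 0);
	return (size + section - 1) / section * section;
}

/* Start of a noncached region of 'size' bytes placed below malloc_start,
 * with both ends aligned to the section size. */
static uint64_t noncached_start(uint64_t malloc_start, uint64_t size,
				uint64_t section)
{
	uint64_t end;

	assert(section != 0);
	end = malloc_start / section * section;	/* align down */
	return end - round_up_section(size, section);
}
```

For a malloc() area starting at 0x8ff12345, a 1 MiB request gives
0x8fe00000 with 1 MiB sections and 0x8fc00000 with 2 MiB blocks, since
the request itself grows to 2 MiB in the latter case.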