[U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations

Tue Mar 17 15:57:21 CET 2015

On 17/03/15 03:04, Stephen Warren wrote:
> It would be nice though if someone from the RPi Foundation could comment
> on the exact effect of the upper bus address bits, and why 0xc would
> work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
> (enabled, disabled) interacts with the GPU cache enable in any way, e.g.
> burst vs. non-burst transactions on the bus or something? That's about
> the only reason I can see for the RPi Foundation kernel working with 0x4
> bus addresses on both chips, but U-Boot needing something different on
> RPi2...
>
> Dom, for reference, see:
> http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
> http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

First, remember that 2835 is a large GPU with a small ARM attached. On some platforms the ARM is not even used.
The GPU boots first and may wake the ARM. The GPU is the centre of the universe, and the ARM has to fit in.

Okay, I'll try to explain what goes on. Here are my definitions of some terms:

bus address: a VideoCore/GPU address. The lower 30 bits define the 1G of addressable memory. The top two bits define the caching alias.
physical address: an ARM-side address given to the VC MMU. This is a 30-bit address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use bus addresses. The ARM uses physical addresses.

VC MMU: A coarse MMU used by the ARM for accessing GPU memory. Each page is 16M and there are 64 pages. This maps 30 bits of physical address to 32 bits of bus address.
The setup of the VC MMU is handled by the GPU and by default the mapping is:
2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus addresses 0x40000000-0x5fffffff. The next page maps physical addresses 0x20000000-0x20ffffff to bus addresses 0x7e000000-0x7effffff.
2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus addresses 0xc0000000-0xfeffffff. The next page maps physical addresses 0x3f000000-0x3fffffff to bus addresses 0x7e000000-0x7effffff.

Bus address 0x7exxxxxx contains the peripherals.
Note: the top 16M of SDRAM is not visible to the ARM due to the mapping of the peripherals. The GPU and GPU peripherals (DMA) can see it as they use bus addresses.
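
As a rough illustration of the default mapping above, here is a minimal C sketch that models the VC MMU as a 64-entry table of 16M pages. The names (vc_mmu_default_map, vc_mmu_translate, VC_MMU_UNMAPPED) are made up for this example and are not hardware registers or U-Boot APIs. Pages beyond the ones listed above are simply marked unmapped here, since their setup is not described.

```c
#include <assert.h>
#include <stdint.h>

#define VC_MMU_PAGE_SHIFT 24u          /* 16M pages */
#define VC_MMU_NUM_PAGES  64u          /* 64 * 16M = 1G of physical space */
#define VC_MMU_UNMAPPED   0xffffffffu  /* hypothetical marker, not a hardware value */

/* Fill `table` with the default mapping described above. */
static void vc_mmu_default_map(uint32_t table[VC_MMU_NUM_PAGES], int is_2836)
{
    uint32_t alias = is_2836 ? 0xc0000000u : 0x40000000u;
    uint32_t sdram_pages = is_2836 ? 63u : 32u;
    uint32_t i;

    for (i = 0; i < VC_MMU_NUM_PAGES; i++)
        table[i] = VC_MMU_UNMAPPED;

    /* SDRAM pages go through the chosen cache alias. */
    for (i = 0; i < sdram_pages; i++)
        table[i] = alias | (i << VC_MMU_PAGE_SHIFT);

    /* The next page is the peripheral window at bus 0x7e000000. */
    table[sdram_pages] = 0x7e000000u;
}

/* Translate a 30-bit ARM physical address to a 32-bit bus address. */
static uint32_t vc_mmu_translate(const uint32_t table[VC_MMU_NUM_PAGES],
                                 uint32_t phys)
{
    uint32_t page;

    assert(phys < (1u << 30));
    page = table[phys >> VC_MMU_PAGE_SHIFT];
    if (page == VC_MMU_UNMAPPED)
        return VC_MMU_UNMAPPED;
    return page | (phys & 0x00ffffffu);
}
```

With the 2835 table, physical 0x20200000 lands in the peripheral page and becomes bus 0x7e200000; with the 2836 table the same bus address is reached from physical 0x3f200000.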

The bus address cache alias bits are:

 From the VideoCore processor:
0x0 L1 and L2 cache allocating and coherent
0x4 L1 non-allocating, but coherent. L2 allocating and coherent
0x8 L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

 From the GPU peripherals (note: all peripherals bypass the L1 cache. The ARM sees this view once its accesses have gone through the VC MMU):
0x0 Do not use
0x4 L1 non-allocating, and incoherent. L2 allocating and coherent.
0x8 L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
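
To make the alias bits concrete, here is a small C sketch that splits a bus address into its alias (top two bits) and its 30-bit offset, and describes the alias as seen from the GPU peripherals per the table above. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Top nibble of the bus address with only the two alias bits kept:
 * yields 0x0, 0x4, 0x8 or 0xc. */
static unsigned vc_bus_alias(uint32_t bus)
{
    return (bus >> 28) & 0xcu;
}

/* The lower 30 bits address the 1G of memory. */
static uint32_t vc_bus_offset(uint32_t bus)
{
    return bus & 0x3fffffffu;
}

/* Cache behaviour of an alias as seen from the GPU peripherals. */
static const char *vc_alias_periph_view(unsigned alias)
{
    assert((alias & ~0xcu) == 0);
    switch (alias) {
    case 0x4: return "L2 allocating and coherent";
    case 0x8: return "L2 non-allocating, but coherent";
    case 0xc: return "SDRAM alias, cache bypassed";
    default:  return "do not use";
    }
}
```

For example, 0x40001000 and 0xc0001000 have the same offset (0x1000) but different aliases (0x4 and 0xc).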

In general, as long as the VideoCore processor and the GPU peripherals use the same alias, everything works out. Mixing aliases requires flushing/invalidating for coherency and is generally avoided.

So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128K L2 cache. The GPU's L2 cache is accessible from the ARM but it's not particularly close (i.e. not very fast).
However mapping through the L2 allocating alias (0x4) was shown to be beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a 512K integrated/fast L2 cache. Additionally, going through the smaller/slower GPU L2 is bad for performance.
So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

So, what does this mean? In general if you don't use GPU peripherals or communicate with the GPU, you only care about physical addresses and it makes no difference what bus address is actually being used.
The ARM just sees 1G of physical space that is always coherent. No flushing of GPU L2 cache is ever required. No need to know about aliases.

However if you do want to use GPU bus mastering peripherals (like DMA), or you communicate with the GPU (e.g. using the mailbox interface) you do need to distinguish physical and bus addresses, and you must use the correct alias.

So, on 2835 you convert from physical to bus address with
   bus_address = 0x40000000 | physical_address;
And on 2836 you convert from physical to bus address with
   bus_address = 0xC0000000 | physical_address;

(Note: you can get these offsets from device tree. See: https://github.com/raspberrypi/userland/commit/3b81b91c18ff19f97033e146a9f3262ca631f0e9#diff-c65a4fe18bb33aed0fc9536339f06b80R168)
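
A minimal C sketch of the two conversions above, plus the reverse direction. The enum and function names are hypothetical and are not taken from the U-Boot patch under discussion. It applies to SDRAM addresses only; peripheral registers go through the separate 0x7e000000 page.

```c
#include <assert.h>
#include <stdint.h>

enum bcm_soc { BCM2835, BCM2836 };  /* hypothetical selector for this sketch */

/* SDRAM alias the VC MMU uses by default on each chip. */
static uint32_t bcm_sdram_alias(enum bcm_soc soc)
{
    return soc == BCM2836 ? 0xc0000000u : 0x40000000u;
}

/* ARM physical address -> bus address for GPU/DMA use. */
static uint32_t phys_to_bus(enum bcm_soc soc, uint32_t phys)
{
    assert(phys < (1u << 30));  /* physical space is 30 bits */
    return bcm_sdram_alias(soc) | phys;
}

/* Bus address of SDRAM -> ARM physical address (drop the alias bits). */
static uint32_t bus_to_phys(uint32_t bus)
{
    return bus & ~0xc0000000u;
}
```

So a buffer at physical 0x00100000 would be handed to the GPU as 0x40100000 on 2835 and as 0xc0100000 on 2836.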

So, when using GPU DMA, the addresses used for SCB, SA (source address), DA (dest address) must never use the 0x0 alias. They should be bus addresses and therefore use the 0x4 or 0xc alias.
However the difference between a 0x0 alias and a 0x4 alias is small. Using 0x0 is wrong, may be incoherent, and may trigger exceptions on the GPU. But you may get away with it.
The difference between a 0x0 alias and a 0xC alias is much larger. There is now 128K of incoherent data you may hit. You are less likely to get away with getting this wrong.
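
As a debugging aid, one could sanity-check DMA addresses before handing them to the hardware. The C sketch below is only an illustration: struct dma_addrs is a simplified stand-in, not the real control block layout, and the rule it applies (accept the peripheral window at 0x7exxxxxx, otherwise require the chip's SDRAM alias) is my reading of the explanation above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical holder for the two addresses of a transfer. */
struct dma_addrs {
    uint32_t src;  /* SA, source bus address */
    uint32_t dst;  /* DA, destination bus address */
};

/* Return 1 if `addr` looks like a usable bus address, 0 otherwise.
 * `sdram_alias` is 0x40000000 on 2835 or 0xc0000000 on 2836. */
static int dma_bus_addr_ok(uint32_t addr, uint32_t sdram_alias)
{
    /* Peripheral registers live at bus 0x7e000000-0x7effffff. */
    if (addr >= 0x7e000000u && addr <= 0x7effffffu)
        return 1;
    /* Anything else must carry the expected SDRAM alias, never 0x0. */
    return (addr & 0xc0000000u) == sdram_alias;
}

static int dma_addrs_ok(const struct dma_addrs *a, uint32_t sdram_alias)
{
    assert(a != 0);
    return dma_bus_addr_ok(a->src, sdram_alias) &&
           dma_bus_addr_ok(a->dst, sdram_alias);
}
```

A raw physical address such as 0x00100000 fails this check on both chips, which is the mistake described above.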

So, I don't believe there is any issue with:
>ARM cache status (enabled, disabled) interacts with the GPU cache enable in any way, e.g. burst vs. non-burst transactions on the bus or something

but I would guess there may be a current bug/misunderstanding in U-Boot on Pi1 that happens to be more fatal on Pi2.