32-bit DMA limit for devices (and drivers)

Jernej Škrabec jernej.skrabec at siol.net
Fri Apr 30 18:31:55 CEST 2021


Hi!

On Friday, 30 April 2021 at 15:34:28 CEST, Andre Przywara wrote:
> On Fri, 30 Apr 2021 14:02:52 +0200 (CEST)
> Mark Kettenis <mark.kettenis at xs4all.nl> wrote:
> 
> Hi Mark,
> 
> thanks for the reply!
> 
> (CC:ing Alex and Heinrich for the UEFI questions below)
> 
> > > Date: Fri, 30 Apr 2021 12:21:21 +0100
> > > From: Andre Przywara <andre.przywara at arm.com>
> > > 
> > > Hi,
> > > 
> > > We now see the first Allwinner devices [1] having DRAM located above
> > > 4GB in address space (4GB DRAM starting at 1GB). After one fix[2]
> > > this works somewhat fine, but the sun8i-emac network device is still
> > > limited to 32-bit DMA addresses. With U-Boot relocating itself (plus
> > > stack and heap) to the end of DRAM, it now runs completely beyond 4GB
> > > on those machines, so buffers no longer get plain 32-bit
> > > addresses.
> > > In Linux we handle this easily by just keeping the default DMA
> > > mask at 32 bits, and letting the DMA framework deal with the nasty
> > > details.
> > > 
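To make the address arithmetic concrete: with 4GB of DRAM starting at 1GB, the bank ends at 5GB, so anything relocated to the end of DRAM sits above the 4GB line. A minimal sketch of the check a driver would need, assuming hypothetical helper names (`dram_end`, `dma32_reachable`) that are not part of U-Boot:

```c
#include <stdbool.h>
#include <stdint.h>

/* First address a 32-bit DMA engine cannot reach (4 GiB). */
#define DMA32_LIMIT 0x100000000ULL

/* Exclusive end address of a DRAM bank (hypothetical helper). */
uint64_t dram_end(uint64_t base, uint64_t size)
{
	return base + size;
}

/* True if the buffer [addr, addr + len) lies entirely below 4 GiB. */
bool dma32_reachable(uint64_t addr, uint64_t len)
{
	if (len == 0 || len > DMA32_LIMIT)
		return false;
	/* Written this way to avoid overflow in addr + len. */
	return addr <= DMA32_LIMIT - len;
}
```

With a base of 0x40000000 and a size of 0x100000000, `dram_end()` gives 0x140000000, and a 2 KiB buffer placed near that end fails the `dma32_reachable()` check.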
> > > I was wondering how this should be handled in U-Boot? The
> > > straightforward solution would be:
> > > - Let the driver allocate the RX and TX buffers separately, placing them
> > >   below 4GB in the address space (using lmb_reserve(), I guess?)
> > > - Use those RX buffers and hand the addresses back to the upper layers.
> > > - We already copy TX packets, so this would also be covered, in this
> > >   situation. Other drivers might need to introduce copying.
> > 
> > What you describe here is called a bounce buffer approach.  I believe
> > Linux developers also refer to this as swiotlb.
> 
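The TX side of the bounce buffer idea can be sketched as follows. This is only an illustration under stated assumptions: `tx_bounce` stands in for a buffer the driver has placed below 4GB (here it is just a static array), and `tx_bounce_prepare` is a hypothetical name, not an existing U-Boot function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_BUF_SIZE 2048

/* Stand-in for a driver-owned buffer located below 4 GiB (assumption). */
static uint8_t tx_bounce[TX_BUF_SIZE];

/*
 * Copy a packet from wherever the caller keeps it into the bounce
 * buffer. Returns the pointer whose address would be programmed into
 * the DMA engine, or NULL if the packet is empty or does not fit.
 */
const uint8_t *tx_bounce_prepare(const void *packet, size_t len)
{
	if (packet == NULL || len == 0 || len > TX_BUF_SIZE)
		return NULL;

	memcpy(tx_bounce, packet, len);
	return tx_bounce;
}
```

Because the device only ever sees the address of `tx_bounce`, the caller's packet may live anywhere in memory, at the cost of one copy per transmitted packet.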
> Yes, but it's not entirely the same as bounce buffering in Linux,
> since we allocate the buffers ourselves, in the driver, so we have full
> control over it. The problem I face is that malloc() works on the heap
> (which is high), or we use the automatic priv_alloc mechanism, which
> uses the heap as well, IIUC.
> 
> > > This sounds like a common problem, so I was wondering if there is a
> > > more generic solution to this? Maybe there are already platforms or
> > > devices affected? Or should the whole heap and stack be moved below 4GB
> > > (if this is easily possible)?
> > > In our case we make the buffers part of our priv struct, so should
> > > there be an option to let the priv_auto allocation come from below 4GB?
> > > 
> > > Grateful for any input on this!
> > 
> > I looked into this a bit when I was trying to figure out what to do on
> > Apple M1 systems where I have a somewhat related issue.  These systems
> > have an IOMMU that can't be bypassed.  Since I don't want to add IOMMU
> > infrastructure to U-Boot, I set up the IOMMU to map a fixed block of
> > physical memory and make sure that all allocations of memory come from
> > that block of memory.  In this case this is fairly easy to achieve.
> > U-Boot allocates memory from the top of usable memory, so as long as I
> > let the IOMMU map that high memory, things work.  U-Boot doesn't need
> > a lot of memory, so a block of 512MB is more than sufficient.
> 
> I'd rather not play around with the visible memory size (see below).
> And while technically there is a (scatter/gather) IOMMU in the SoC, it
> would be too big guns for that small problem.

The IOMMU is connected only to the video-related cores, so it's not an option here.

Best regards,
Jernej

> 
> > In your case this means that as long as you set the top of usable
> > memory to an address < 4G, U-Boot itself should be fine and no bounce
> > buffers are needed.  You have to make sure the addresses in the U-Boot
> > environment for loading things like the kernel and the FDT are set to
> > an address < 4G as well.
> > 
> > For EFI things are different though.  You want to expose all physical
> > memory in the EFI memory map.
> 
> Not only for UEFI, since U-Boot populates the DT memory node even for
> booti/bootm, in arch/arm/lib/bootm-fdt.c:arch_fixup_fdt().
> So limiting the memory is not an option, since this would be passed on
> to the OS.
> 
> > This means that an EFI application
> > (such as an OS loader) may pick memory > 4G and use it to do I/O.
> 
> I think we should be safe here, as the driver has full control over the
> buffers: For TX we copy already, to use "fire-and-forget", so we
> just start the DMA and return. And for RX U-Boot network drivers
> return the buffer address, so it's our own buffer again. So wherever
> higher layers put the packets, we should be good (given our own buffers
> are).
> 
> 
> So I guess my question boils down to: How can I best allocate buffers
> from "low" memory? And do those buffer carveouts make it into the UEFI
> memory map, as reserved regions? Or can UEFI differentiate between
> boot services and runtime services allocations? The buffers would be
> needed during boot services, for the UEFI network protocol. But later
> on they can be abandoned.
> 
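One way to picture a "low" allocation is a top-down carve-out clamped at 4GB. The sketch below is plain address arithmetic, not U-Boot's lmb API; `low_carveout` and its parameters are hypothetical, and it returns 0 on failure, so it assumes address 0 is never a valid result.

```c
#include <stdint.h>

/* First address a 32-bit DMA engine cannot reach (4 GiB). */
#define DMA32_LIMIT 0x100000000ULL

/*
 * Pick the highest aligned address inside the DRAM bank
 * [bank_base, bank_base + bank_size) such that a buffer of len bytes
 * ends at or below 4 GiB. align must be a non-zero power of two.
 * Returns 0 if no such address exists.
 */
uint64_t low_carveout(uint64_t bank_base, uint64_t bank_size,
		      uint64_t len, uint64_t align)
{
	uint64_t limit = bank_base + bank_size;
	uint64_t addr;

	if (len == 0 || align == 0 || (align & (align - 1)) != 0)
		return 0;

	/* Clamp the usable end of the bank to the 32-bit boundary. */
	if (limit > DMA32_LIMIT)
		limit = DMA32_LIMIT;
	if (limit <= bank_base || len > limit - bank_base)
		return 0;

	/* Round down to the requested alignment. */
	addr = (limit - len) & ~(align - 1);
	if (addr < bank_base)
		return 0;

	return addr;
}
```

For the bank discussed in this thread (base 0x40000000, size 0x100000000), a 64 KiB buffer with 4 KiB alignment would land at 0xffff0000. Whether such a region then shows up in the UEFI memory map depends on how it is actually reserved, which this sketch does not address.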
> > For this purpose U-Boot already implements bounce buffers.  See the
> > CONFIG_EFI_LOADER_BOUNCE_BUFFER option.
> 
> Interesting, thanks, I will have a look at that. Maybe that contains
> some useful traces to other code.
> 
> Cheers,
> Andre





