QEMU NUMA and U-Boot

AKASHI Takahiro takahiro.akashi at linaro.org
Wed Jul 7 12:16:15 CEST 2021


On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
> On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog at linaro.org> wrote:
> 
> > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk at gmx.de>
> > wrote:
> > >
> > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
> > xypron.glpk at gmx.de>:
> > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> > > ><takahiro.akashi at linaro.org>:
> > > >>François,
> > > >>
> > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> > > >>> > description when using Qemu 5.2 NUMA configuration to adapt memory
> > > >>map
> > > >>> > (kernel_addr_r...):
> > > >>> >
> > > >>> >         -smp 4 \
> > > >>> >           -m 8G,slots=2,maxmem=16G \
> > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > > >>> >
> > > >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
> > > >>bootefi.
> > > >>> >
> > > >>> > fdt addr 0x13ede6de0; fdt print
> > > >>> >
> > > >>> > Displays fdt while I think it should not.
> > > >>> >
> > > >>> > If I load the kernel at dram.start, the load works but not boot
> > > >>> >
> > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > > >>> >
> > > >>> >
> > > >>> > DRAM:4 GiB
> > > >>> >
> > > >>> > Flash: 64 MiB
> > > >>> >
> > > >>> > Loading Environment from Flash... OK
> > > >>> >
> > > >>> > In:pl011 at 9000000
> > > >>> >
> > > >>> > Out: pl011 at 9000000
> > > >>> >
> > > >>> > Err: pl011 at 9000000
> > > >>> >
> > > >>> > Net: eth0: virtio-net#32
> > > >>> >
> > > >>> > Hit any key to stop autoboot:0
> > > >>> >
> > > >>> > =>
> > > >>> >
> > > >>> > => bdinfo
> > > >>> >
> > > >>> > boot_params = 0x0000000000000000
> > > >>> >
> > > >>> > DRAM bank = 0x0000000000000000
> > > >>> >
> > > >>> > -> start= 0x0000000140000000
> > > >>> >
> > > >>> > -> size = 0x0000000100000000
> > > >>> >
> > > >>> > flashstart= 0x0000000000000000
> > > >>> >
> > > >>> > flashsize = 0x0000000004000000
> > > >>> >
> > > >>> > flashoffset = 0x00000000000bc990
> > > >>> >
> > > >>> > baudrate= 115200 bps
> > > >>> >
> > > >>> > relocaddr = 0x000000013ff27000
> > > >>> >
> > > >>> > reloc off = 0x000000013ff27000
> > > >>> >
> > > >>> > Build = 64-bit
> > > >>> >
> > > >>> > current eth = virtio-net#32
> > > >>> >
> > > >>> > ethaddr = 52:52:52:52:52:52
> > > >>> >
> > > >>> > IP addr = <NULL>
> > > >>> >
> > > >>> > fdt_blob= 0x000000013ede6de0
> > > >>> >
> > > >>> > new_fdt = 0x000000013ede6de0
> > > >>> >
> > > >>> > fdt_size= 0x0000000000100000
> > > >>> >
> > > >>> > lmb_dump_all:
> > > >>> >
> > > >>> > memory.cnt= 0x1
> > > >>> >
> > > >>> > memory.reg[0x0].base = 0x140000000
> > > >>> >
> > > >>> > .size = 0x100000000
> > > >>> >
> > > >>> >
> > > >>> > reserved.cnt= 0x0
> > > >>> >
> > > >>> > arch_number = 0x0000000000000000
> > > >>> >
> > > >>> > TLB addr= 0x000000013fff0000
> > > >>> >
> > > >>> > irq_sp= 0x000000013ede6dd0
> > > >>> >
> > > >>> > sp start= 0x000000013ede6dd0
> > > >>> >
> > > >>> > Early malloc usage: 3a8 / 2000
> > > >>> >
> > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > > >>> >
> > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > > >>> >
> > > >>> > => bootefi0x140000000 0x13ede6dd0
> > > >>> >
> > > >>> > ERROR: Failed to register WaitForKey event
> > > >>> >
> > > >>> > Setting OsIndications failed
> > > >>> >
> > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > > >>> >
> > > >>> >
> > > >>> > I think there is a need to calculate memory map based on previous
> > > >>> > firmware (TFA, QEMU can be considered as previous frimware)
> > > >>information
> > > >>> > (DT or blob_list).
> > > >>> >
> > > >>> > What do you think ?
> > > >>> >
> > > >>> > Cheers
> > > >>> >
> > > >>> > FF
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > François-Frédéric Ozog | /Director Business Development/
> > > >>> > T: +33.67221.6485
> > > >>> > francois.ozog at linaro.org <mailto:francois.ozog at linaro.org>
> > > >>| Skype: ffozog
> > > >>> >
> > > >>> >
> > > >>>
> > > >>> The kernel load address is hard coded here:
> > > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> > > >>>
> > > >>> bdinfo shows:
> > > >>> DRAM start = 0x140000000
> > > >>> DRAM size  = 0x100000000
> > > >>>
> > > >>> fdt addr $fdt_addr
> > > >>> fdt printf
> > > >>>
> > > >>> shows two memory areas. One at 40000000, one at 140000000.
> > > >>
> > > >>(This shows that U-Boot receives a correct memory map via dtb.)
> > > >>
> > > >>Is this a NUMA machine, isn't it? Why should we care of which
> > > >>memory region be used here? Please note that this is a virtual
> > > >machine,
> > > >>there is no practical difference between two regions.
> > > >>
> > > >>The root problem is that U-Boot did not recognize there were two
> > > >>memory regions. We can fix this issue in either way:
> > > >>
> > > >>1)
> > > >>diff --git a/configs/qemu_arm64_defconfig
> > > >>b/configs/qemu_arm64_defconfig
> > > >>index f6e586627a8e..b70ffae8bf6e 100644
> > > >>--- a/configs/qemu_arm64_defconfig
> > > >>+++ b/configs/qemu_arm64_defconfig
> > > >>@@ -1,7 +1,7 @@
> > > >> CONFIG_ARM=y
> > > >> CONFIG_POSITION_INDEPENDENT=y
> > > >> CONFIG_ARCH_QEMU=y
> > > >>-CONFIG_NR_DRAM_BANKS=1
> > > >>+CONFIG_NR_DRAM_BANKS=2
> > > >> CONFIG_ENV_SIZE=0x40000
> > > >> CONFIG_ENV_SECT_SIZE=0x40000
> > > >> CONFIG_AHCI=y
> > > >>
> > > >>2)
> > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > > >>index 4b097fb588ed..4067ea2dead6 100644
> > > >>--- a/lib/fdtdec.c
> > > >>+++ b/lib/fdtdec.c
> > > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> > > >>                return -EINVAL;
> > > >>        }
> > > >>
> > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > > >>+       for (bank = 0; ; bank++) {
> > > >>                ret = ofnode_read_resource(mem, reg++, &res);
> > > >>                if (ret < 0) {
> > > >>                        reg = 0;
> > > >>
> > > >>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
> > > >>
> > > >>
> > > >>(2) seems much better, but I don't know why we had to use
> > > >>CONFIG_NR_DRAM_BANKS here.
> > > >>
> >
> > 2) alone does not work as other places in the code refer to
> > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
> > bdinfo seems now correct:
> >
> => bdinfo
> > boot_params = 0x0000000000000000
> > DRAM bank   = 0x0000000000000000
> > -> start    = 0x0000000140000000
> > -> size     = 0x0000000100000000
> > DRAM bank   = 0x0000000000000001
> > -> start    = 0x0000000040000000
> > -> size     = 0x0000000100000000
> > flashstart  = 0x0000000000000000
> > flashsize   = 0x0000000004000000
> > flashoffset = 0x00000000000bcb88
> > baudrate    = 115200 bps
> > relocaddr   = 0x000000013ff27000
> > reloc off   = 0x000000013ff27000
> > Build       = 64-bit
> > current eth = virtio-net#32
> > ethaddr     = 52:52:52:52:52:52
> > IP addr     = <NULL>
> > fdt_blob    = 0x000000013ede6cf0
> > new_fdt     = 0x000000013ede6cf0
> > fdt_size    = 0x0000000000100000
> > lmb_dump_all:
> >     memory.cnt   = 0x1
> >     memory.reg[0x0].base   = 0x40000000
> >   .size   = 0x200000000
> >     reserved.cnt   = 0x1
> >     reserved.reg[0x0].base = 0x13ede58f0
> >     .size = 0x121a710
> > arch_number = 0x0000000000000000
> > TLB addr    = 0x000000013fff0000
> > irq_sp      = 0x000000013ede6ce0
> > sp start    = 0x000000013ede6ce0
> > Early malloc usage: 3a8 / 2000
> >
> > May I suggest you propose a combined patch Akashi-san? If we assume
> > NUMA systems to be tested up to 8 nodes to mimic real existing
> > enterprise hardware and up to 4 memory slots (say for memory hot
> > plugging tests) what about a default value of 32? Alternatively, we
> > could set this value to a much higher one if the costs are negligible.
> >
> >
> > Well, lets not rush as there are other twists:
> 
> the 4G bank in node 1 is marked BootServicesData in the UEFI GetMemoryMap
> which I assume is not the case. EDK2 reports it as ConventionalMemory.
> 
> The root cause seem to be gd->ramtop not being setup properly.
> 
> Further analysis shows that the DT passed to the booted EFI payload does
> not seem to be correct:
> 
> DT fragment passed to U-Boot
> 
> memory at 140000000 {
> numa-node-id = <0x00000001>;
> reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
> device_type = "memory";
> };
> memory at 40000000 {
> numa-node-id = <0x00000000>;
> reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
> device_type = "memory";
> };
> 
> DT passed to payload (as per my debug code):
> 
> memory at 140000000: memory
> 
>     numa-node-id 1
> 
>     reg (len= 32)
> 
>          140000000 100000000
> 
>          40000000 100000000
> 
> memory at 40000000: memory
> 
>     numa-node-id 0
> 
>     reg (len= 16)
> 
>          40000000 100000000
> 
> I am investigating this further...

You should check the logic of fdt_fixup_memory_banks()
which is called this way:
  efi_dt_fixup()
    image_setup_libfdt()
      arch_fixup_fdt()
        fdt_fixup_memory_banks()

What it does is to put *all* the memory regions unconditionally as
a single "reg" array into the *first-detected* "memory" node, which is
"memory at 140000000" in this case.
It means that this function doesn't respect NUMA configuration.

-Takahiro Akashi


> > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> > > >>should be replaced with a variable for it.
> > > >>
> > > >>> Your use case is well beyond the typical U-Boot usage. So I guess it
> > > >>> will be up to Linaro to provide the necessary patches:
> > > >>>
> > > >>> * determine the active CPU
> > > >>> * determine the RAM assigned to the active CPU according
> > > >>>   to the numa-node-id in the device-tree
> > > >>> * make sure that U-Boot only uses the memory of the active CPU
> > > >>>   internally
> > > >>> * make sure that the UEFI memory map contains a compliant
> > > >description
> > > >>> * possibly, dynamically set up the environment variables
> > > >>>
> > > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> > > >>
> > > >>For (1), we'd better have a different config, or increase
> > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > > >
> > > >Is the system configured such that each CPU can access the others CPU's
> > > >RAM when entering U-Boot?
> > > >
> > > >Best regards
> > > >
> > > >Heinrich
> > > >
> > >
> > > At least the comments for this patch sound as if on a physical system
> > cross NUMA node memory access is only available after full SMP
> > initialization:
> > >
> > >
> > https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> > >
> > > QEMU may be less restrictive.
> > >
> > > QEMU allows the node distance to be 255 indicating that cross node
> > access is infeasible.
> > >
> > > Best regards
> > >
> > > Heinrich
> > >
> > > >>
> > > >>-Takahiro Akashi
> > > >>
> > > >>
> > > >>> Best regards
> > > >>>
> > > >>> Heinrich
> > >
> >
> >
> > --
> > François-Frédéric Ozog | Director Business Development
> > T: +33.67221.6485
> > francois.ozog at linaro.org | Skype: ffozog
> >
> 
> 
> -- 
> François-Frédéric Ozog | *Director Business Development*
> T: +33.67221.6485
> francois.ozog at linaro.org | Skype: ffozog


More information about the U-Boot mailing list