QEMU NUMA and U-Boot

François Ozog francois.ozog at linaro.org
Wed Jul 7 11:37:19 CEST 2021


On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog at linaro.org> wrote:

> On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk at gmx.de>
> wrote:
> >
> > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
> xypron.glpk at gmx.de>:
> > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> > ><takahiro.akashi at linaro.org>:
> > >>François,
> > >>
> > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> > >>> > description when using Qemu 5.2 NUMA configuration to adapt memory
> > >>map
> > >>> > (kernel_addr_r...):
> > >>> >
> > >>> >         -smp 4 \
> > >>> >           -m 8G,slots=2,maxmem=16G \
> > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > >>> >
> > >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
> > >>bootefi.
> > >>> >
> > >>> > fdt addr 0x13ede6de0; fdt print
> > >>> >
> > >>> > Displays fdt while I think it should not.
> > >>> >
> > >>> > If I load the kernel at dram.start, the load works but not boot
> > >>> >
> > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > >>> >
> > >>> >
> > >>> > DRAM:4 GiB
> > >>> >
> > >>> > Flash: 64 MiB
> > >>> >
> > >>> > Loading Environment from Flash... OK
> > >>> >
> > >>> > In:pl011 at 9000000
> > >>> >
> > >>> > Out: pl011 at 9000000
> > >>> >
> > >>> > Err: pl011 at 9000000
> > >>> >
> > >>> > Net: eth0: virtio-net#32
> > >>> >
> > >>> > Hit any key to stop autoboot:0
> > >>> >
> > >>> > =>
> > >>> >
> > >>> > => bdinfo
> > >>> >
> > >>> > boot_params = 0x0000000000000000
> > >>> >
> > >>> > DRAM bank = 0x0000000000000000
> > >>> >
> > >>> > -> start= 0x0000000140000000
> > >>> >
> > >>> > -> size = 0x0000000100000000
> > >>> >
> > >>> > flashstart= 0x0000000000000000
> > >>> >
> > >>> > flashsize = 0x0000000004000000
> > >>> >
> > >>> > flashoffset = 0x00000000000bc990
> > >>> >
> > >>> > baudrate= 115200 bps
> > >>> >
> > >>> > relocaddr = 0x000000013ff27000
> > >>> >
> > >>> > reloc off = 0x000000013ff27000
> > >>> >
> > >>> > Build = 64-bit
> > >>> >
> > >>> > current eth = virtio-net#32
> > >>> >
> > >>> > ethaddr = 52:52:52:52:52:52
> > >>> >
> > >>> > IP addr = <NULL>
> > >>> >
> > >>> > fdt_blob= 0x000000013ede6de0
> > >>> >
> > >>> > new_fdt = 0x000000013ede6de0
> > >>> >
> > >>> > fdt_size= 0x0000000000100000
> > >>> >
> > >>> > lmb_dump_all:
> > >>> >
> > >>> > memory.cnt= 0x1
> > >>> >
> > >>> > memory.reg[0x0].base = 0x140000000
> > >>> >
> > >>> > .size = 0x100000000
> > >>> >
> > >>> >
> > >>> > reserved.cnt= 0x0
> > >>> >
> > >>> > arch_number = 0x0000000000000000
> > >>> >
> > >>> > TLB addr= 0x000000013fff0000
> > >>> >
> > >>> > irq_sp= 0x000000013ede6dd0
> > >>> >
> > >>> > sp start= 0x000000013ede6dd0
> > >>> >
> > >>> > Early malloc usage: 3a8 / 2000
> > >>> >
> > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > >>> >
> > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > >>> >
> > >>> > => bootefi0x140000000 0x13ede6dd0
> > >>> >
> > >>> > ERROR: Failed to register WaitForKey event
> > >>> >
> > >>> > Setting OsIndications failed
> > >>> >
> > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > >>> >
> > >>> >
> > >>> > I think there is a need to calculate memory map based on previous
> > >>> > firmware (TFA, QEMU can be considered as previous frimware)
> > >>information
> > >>> > (DT or blob_list).
> > >>> >
> > >>> > What do you think ?
> > >>> >
> > >>> > Cheers
> > >>> >
> > >>> > FF
> > >>> >
> > >>> > --
> > >>> >
> > >>> > François-Frédéric Ozog | /Director Business Development/
> > >>> > T: +33.67221.6485
> > >>> > francois.ozog at linaro.org <mailto:francois.ozog at linaro.org>
> > >>| Skype: ffozog
> > >>> >
> > >>> >
> > >>>
> > >>> The kernel load address is hard coded here:
> > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> > >>>
> > >>> bdinfo shows:
> > >>> DRAM start = 0x140000000
> > >>> DRAM size  = 0x100000000
> > >>>
> > >>> fdt addr $fdt_addr
> > >>> fdt printf
> > >>>
> > >>> shows two memory areas. One at 40000000, one at 140000000.
> > >>
> > >>(This shows that U-Boot receives a correct memory map via dtb.)
> > >>
> > >>Is this a NUMA machine, isn't it? Why should we care of which
> > >>memory region be used here? Please note that this is a virtual
> > >machine,
> > >>there is no practical difference between two regions.
> > >>
> > >>The root problem is that U-Boot did not recognize there were two
> > >>memory regions. We can fix this issue in either way:
> > >>
> > >>1)
> > >>diff --git a/configs/qemu_arm64_defconfig
> > >>b/configs/qemu_arm64_defconfig
> > >>index f6e586627a8e..b70ffae8bf6e 100644
> > >>--- a/configs/qemu_arm64_defconfig
> > >>+++ b/configs/qemu_arm64_defconfig
> > >>@@ -1,7 +1,7 @@
> > >> CONFIG_ARM=y
> > >> CONFIG_POSITION_INDEPENDENT=y
> > >> CONFIG_ARCH_QEMU=y
> > >>-CONFIG_NR_DRAM_BANKS=1
> > >>+CONFIG_NR_DRAM_BANKS=2
> > >> CONFIG_ENV_SIZE=0x40000
> > >> CONFIG_ENV_SECT_SIZE=0x40000
> > >> CONFIG_AHCI=y
> > >>
> > >>2)
> > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > >>index 4b097fb588ed..4067ea2dead6 100644
> > >>--- a/lib/fdtdec.c
> > >>+++ b/lib/fdtdec.c
> > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> > >>                return -EINVAL;
> > >>        }
> > >>
> > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > >>+       for (bank = 0; ; bank++) {
> > >>                ret = ofnode_read_resource(mem, reg++, &res);
> > >>                if (ret < 0) {
> > >>                        reg = 0;
> > >>
> > >>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
> > >>
> > >>
> > >>(2) seems much better, but I don't know why we had to use
> > >>CONFIG_NR_DRAM_BANKS here.
> > >>
>
> 2) alone does not work as other places in the code refer to
> CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
> bdinfo seems now correct:
>
=> bdinfo
> boot_params = 0x0000000000000000
> DRAM bank   = 0x0000000000000000
> -> start    = 0x0000000140000000
> -> size     = 0x0000000100000000
> DRAM bank   = 0x0000000000000001
> -> start    = 0x0000000040000000
> -> size     = 0x0000000100000000
> flashstart  = 0x0000000000000000
> flashsize   = 0x0000000004000000
> flashoffset = 0x00000000000bcb88
> baudrate    = 115200 bps
> relocaddr   = 0x000000013ff27000
> reloc off   = 0x000000013ff27000
> Build       = 64-bit
> current eth = virtio-net#32
> ethaddr     = 52:52:52:52:52:52
> IP addr     = <NULL>
> fdt_blob    = 0x000000013ede6cf0
> new_fdt     = 0x000000013ede6cf0
> fdt_size    = 0x0000000000100000
> lmb_dump_all:
>     memory.cnt   = 0x1
>     memory.reg[0x0].base   = 0x40000000
>   .size   = 0x200000000
>     reserved.cnt   = 0x1
>     reserved.reg[0x0].base = 0x13ede58f0
>     .size = 0x121a710
> arch_number = 0x0000000000000000
> TLB addr    = 0x000000013fff0000
> irq_sp      = 0x000000013ede6ce0
> sp start    = 0x000000013ede6ce0
> Early malloc usage: 3a8 / 2000
>
> May I suggest you propose a combined patch Akashi-san? If we assume
> NUMA systems to be tested up to 8 nodes to mimic real existing
> enterprise hardware and up to 4 memory slots (say for memory hot
> plugging tests) what about a default value of 32? Alternatively, we
> could set this value to a much higher one if the costs are negligible.
>
>
> Well, lets not rush as there are other twists:

the 4G bank in node 1 is marked BootServicesData in the UEFI GetMemoryMap
which I assume is not the case. EDK2 reports it as ConventionalMemory.

The root cause seem to be gd->ramtop not being setup properly.

Further analysis shows that the DT passed to the booted EFI payload does
not seem to be correct:

DT fragment passed to U-Boot

memory at 140000000 {
numa-node-id = <0x00000001>;
reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
device_type = "memory";
};
memory at 40000000 {
numa-node-id = <0x00000000>;
reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
device_type = "memory";
};

DT passed to payload (as per my debug code):

memory at 140000000: memory

    numa-node-id 1

    reg (len= 32)

         140000000 100000000

         40000000 100000000

memory at 40000000: memory

    numa-node-id 0

    reg (len= 16)

         40000000 100000000

I am investigating this further...

> >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> > >>should be replaced with a variable for it.
> > >>
> > >>> Your use case is well beyond the typical U-Boot usage. So I guess it
> > >>> will be up to Linaro to provide the necessary patches:
> > >>>
> > >>> * determine the active CPU
> > >>> * determine the RAM assigned to the active CPU according
> > >>>   to the numa-node-id in the device-tree
> > >>> * make sure that U-Boot only uses the memory of the active CPU
> > >>>   internally
> > >>> * make sure that the UEFI memory map contains a compliant
> > >description
> > >>> * possibly, dynamically set up the environment variables
> > >>>
> > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> > >>
> > >>For (1), we'd better have a different config, or increase
> > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > >
> > >Is the system configured such that each CPU can access the others CPU's
> > >RAM when entering U-Boot?
> > >
> > >Best regards
> > >
> > >Heinrich
> > >
> >
> > At least the comments for this patch sound as if on a physical system
> cross NUMA node memory access is only available after full SMP
> initialization:
> >
> >
> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> >
> > QEMU may be less restrictive.
> >
> > QEMU allows the node distance to be 255 indicating that cross node
> access is infeasible.
> >
> > Best regards
> >
> > Heinrich
> >
> > >>
> > >>-Takahiro Akashi
> > >>
> > >>
> > >>> Best regards
> > >>>
> > >>> Heinrich
> >
>
>
> --
> François-Frédéric Ozog | Director Business Development
> T: +33.67221.6485
> francois.ozog at linaro.org | Skype: ffozog
>


-- 
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog at linaro.org | Skype: ffozog


More information about the U-Boot mailing list