[PATCH] riscv: set the width of the physical address/size data type based on arch

Wed May 7 11:41:38 CEST 2025

On Wed, 7 May 2025 at 13:19, Sughosh Ganu <sughosh.ganu at linaro.org> wrote:
>
> On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt
> <heinrich.schuchardt at canonical.com> wrote:
> >
> >
> >
> > Sughosh Ganu <sughosh.ganu at linaro.org> schrieb am Di., 6. Mai 2025, 12:50:
> >>
> >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt
> >> <heinrich.schuchardt at canonical.com> wrote:
> >> >
> >> > On 5/6/25 11:24, Sughosh Ganu wrote:
> >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set
> >> > > the width of the phys_{addr,size}_t data types based on the register
> >> > > size of the architecture.
> >> > >
> >> > > Currently, even the 32-bit RiscV platforms have a 64-bit
> >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit
> >> > > platforms, where the upper 32-bits of the variables of these types
> >> > > can have junk data, and that can cause all kinds of side-effects.
> >> >
> >> > How could it be that the upper 32-bit have junk data?
> >> >
> >> > When we convert from a shorter variable the compiler should fill the
> >> > upper bits with zero.
> >>
> >> That does not seem to be happening. The efi_fit test fails on the
> >> qemu-riscv32 platform, when attempting to boot the OS from the FIT
> >> image.
> >>
> >> These are the values of the base address that I see in the
> >> _lmb_alloc_addr() function.
> >>
> >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1
> >
> >
> > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun.
>
> I was able to hook up gdb and re-create the issue. What I observe is
> that when the lmb_allocate_mem() function is called, the base address
> parameter, which is 64-bits, shows a value with the upper 32-bits not
> zeroed out. So, this looks like a compiler issue, where the upper
> 32-bits are not being zeroed out. Fwiw, this shows up with the
> compiler being used in the CI environment, as well as the one that I
> am using.

Thinking a bit on this, I don't think this is a compiler issue. The
problem is that we are using the ulong type in some places(especially
in the boot* commands) for storing the address values, while we use
phys_addr_t in other places. And because this is a pointer being
passed across functions, when the data-type that the pointer is
pointing to changes from a 32-bit to 64-bit value, the upper 32-bits
get considered. So the issue is that we use ulong in some places, and
phys_addr_t in others for storing the addresses.

But I think that the solution for this(at least for now) is to set
phys_addr_t based on the underlying architecture. In the long run,
there needs to be an audit of the usage of ulong for storing
addresses, and that needs to be changed to phys_addr_t.

-sughosh

>
> /toolchains/riscv32-linux/bin/riscv32-linux-gcc --version
> riscv32-linux-gcc (GCC) 13.2.0
>
> /toolchains/riscv32-linux/bin/riscv32-linux-ld --version
> GNU ld (GNU Binutils) 2.41
>
> Given that LPAE is not being used on riscv32 platforms in U-Boot, I
> think this patch should be considered. Thanks.
>
> -sughosh
>
> >
> > If we don't correct the root cause, it will hit us again.
> >
> > Best regards
> >
> > Heinrich
> >
> >>
> >> The actual value that gets passed is 0x802000bc. The upper 32-bits do
> >> not get zeroed out. Which causes the _lmb_alloc_addr() to return an
> >> error. You can check my branch [1], where I have put a temporary
> >> commit to print the values that cause the issue.
> >>
> >> -sughosh
> >>
> >> [1] - https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2
> >>
> >> >
> >> > >
> >> > > This was discovered on the qemu Riscv 32-bit platform  when the return
> >> > > value of an LMB API was checked, and some LMB allocation that ought
> >> > > not to have failed, was failing. The upper 32-bits of the address
> >> > > variable contained garbage, resulting in failures.
> >> > >
> >> > > Signed-off-by: Sughosh Ganu <sughosh.ganu at linaro.org>
> >> > > ---
> >> > >
> >> > > Note:
> >> > > Although the LMB API cleanup series depends on this patch, I am
> >> > > sending it separately so that it gets noticed by the RiscV
> >> > > maintainers. Sometimes a patch may not get the required attention when
> >> > > sent as part of another seemingly unrelated series.
> >> > >
> >> > >
> >> > >   arch/riscv/include/asm/types.h | 9 +++++++--
> >> > >   1 file changed, 7 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h
> >> > > index 49f7a5d6b3a..45d806c83eb 100644
> >> > > --- a/arch/riscv/include/asm/types.h
> >> > > +++ b/arch/riscv/include/asm/types.h
> >> > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t;
> >> > >   typedef u32 dma_addr_t;
> >> > >   #endif
> >> > >
> >> > > -typedef unsigned long long phys_addr_t;
> >> > > -typedef unsigned long long phys_size_t;
> >> > > +#ifdef CONFIG_PHYS_64BIT
> >> > > +typedef u64 phys_addr_t;
> >> > > +typedef u64 phys_size_t;
> >> > > +#else
> >> > > +typedef u32 phys_addr_t;
> >> > > +typedef u32 phys_size_t;
> >> >
> >> > This matches what has been done for ARM.
> >> >
> >> > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit")
> >> > changed the definition to 64bit phys addr_t to support the 34bit
> >> > physical addresses (similar to LPAE on arm32).
> >> >
> >> > I don't think that we need 34bit support currently. But I would prefer
> >> > if we could find the root cause why the upper 32bit gets messed up as
> >> > this might point to a generic problem.
> >> >
> >> > Best regards
> >> >
> >> > Heinrich
> >> >
> >> > > +#endif
> >> > >
> >> > >   #endif /* __KERNEL__ */
> >> > >
> >> >