[PATCH] riscv: set the width of the physical address/size data type based on arch

Wed May 7 20:37:37 CEST 2025

On Wed, 7 May 2025 at 21:18, Tom Rini <trini at konsulko.com> wrote:
>
> On Wed, May 07, 2025 at 03:11:38PM +0530, Sughosh Ganu wrote:
> > On Wed, 7 May 2025 at 13:19, Sughosh Ganu <sughosh.ganu at linaro.org> wrote:
> > >
> > > On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt
> > > <heinrich.schuchardt at canonical.com> wrote:
> > > >
> > > >
> > > >
> > > > Sughosh Ganu <sughosh.ganu at linaro.org> schrieb am Di., 6. Mai 2025, 12:50:
> > > >>
> > > >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt
> > > >> <heinrich.schuchardt at canonical.com> wrote:
> > > >> >
> > > >> > On 5/6/25 11:24, Sughosh Ganu wrote:
> > > >> > > U-Boot has support for both the 32-bit and 64-bit RISC-V platforms. Set
> > > >> > > the width of the phys_{addr,size}_t data types based on the register
> > > >> > > size of the architecture.
> > > >> > >
> > > >> > > Currently, even the 32-bit RISC-V platforms have 64-bit
> > > >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit
> > > >> > > platforms, where the upper 32 bits of the variables of these types
> > > >> > > can have junk data, and that can cause all kinds of side effects.
> > > >> >
> > > >> > How could it be that the upper 32 bits have junk data?
> > > >> >
> > > >> > When we convert from a shorter variable the compiler should fill the
> > > >> > upper bits with zero.
> > > >>
> > > >> That does not seem to be happening. The efi_fit test fails on the
> > > >> qemu-riscv32 platform, when attempting to boot the OS from the FIT
> > > >> image.
> > > >>
> > > >> These are the values of the base address that I see in the
> > > >> _lmb_alloc_addr() function.
> > > >>
> > > >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1
> > > >
> > > >
> > > > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun.
> > >
> > > I was able to hook up gdb and re-create the issue. What I observe is
> > > that when the lmb_allocate_mem() function is called, the base address
> > > parameter, which is 64 bits wide, shows a value whose upper 32 bits
> > > are not zeroed out. So this looks like a compiler issue. Fwiw, this
> > > shows up with the compiler being used in the CI environment, as well
> > > as the one that I am using.
> >
> > Thinking a bit on this, I don't think this is a compiler issue. The
> > problem is that we are using the ulong type in some places (especially
> > in the boot* commands) for storing the address values, while we use
> > phys_addr_t in other places. And because this is a pointer being
> > passed across functions, when the data type that the pointer is
> > pointing to changes from a 32-bit to a 64-bit value, the upper 32 bits
> > get considered. So the issue is that we use ulong in some places, and
> > phys_addr_t in others for storing the addresses.
> >
> > But I think that the solution for this (at least for now) is to set
> > phys_addr_t based on the underlying architecture. In the long run,
> > there needs to be an audit of the usage of ulong for storing
> > addresses, and that needs to be changed to phys_addr_t.
>
> Thanks for digging into this more. I agree with what you're saying here
> for both the short and long term.

Heinrich and I had a discussion on IRC about this, and for the short
term, it was decided to instead copy the ulong values into a local
variable of type phys_addr_t before calling the lmb API. This approach
will also work for now. Heinrich is of the opinion that it would be
better not to make the change to the riscv32 file, as the maintainers
think it appropriate to use u64 for phys_addr_t. I will be making this
change as part of the upcoming version of my lmb API series. I will be
on leave for the next week, and will send the v2 once I am back. Thanks.
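
For illustration, a minimal sketch of the mismatch and of the local-copy
workaround, written so that it builds on any host. The names ulong32_t,
demo_alloc(), read_as_64() and alloc_via_local() are hypothetical
stand-ins and not the actual lmb API; ulong32_t models ulong on a 32-bit
target, and the two constants are the halves of the base value from the
log above. The 8-byte read is simulated with memcpy() over two adjacent
32-bit slots, which is only an assumption about what a callee ends up
seeing when it is handed the address of a 32-bit object.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t ulong32_t;   /* stand-in for ulong on a 32-bit target */
typedef uint64_t phys_addr_t; /* 64 bits wide, as discussed for riscv32 */

/*
 * Hypothetical callee that takes the address by pointer. For the demo
 * it rejects any address that does not fit in 32 bits.
 */
static int demo_alloc(phys_addr_t *addr, phys_addr_t size)
{
	(void)size;
	return (*addr >> 32) ? -1 : 0;
}

/*
 * Simulate a callee reading 8 bytes from the location of a 4-byte
 * variable: slots[0] is the real address, slots[1] is whatever happens
 * to sit next to it in memory.
 */
static phys_addr_t read_as_64(const ulong32_t slots[2])
{
	phys_addr_t v;

	memcpy(&v, slots, sizeof(v));
	return v;
}

/*
 * Workaround sketch: copy the 32-bit value into a local phys_addr_t
 * first. The assignment zero-extends, so the upper 32 bits are clean.
 */
static int alloc_via_local(ulong32_t load_addr, phys_addr_t size)
{
	phys_addr_t addr = load_addr;

	return demo_alloc(&addr, size);
}
```

With slots = { 0x802000bc, 0x1a1c0e00 }, the value returned by
read_as_64() has non-zero upper 32 bits on either byte order, so
demo_alloc() rejects it, while alloc_via_local(0x802000bc, 0x50b1)
passes a clean 64-bit value and succeeds.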

-sughosh

>
> --
> Tom