[PATCH] arm64: Add support for bigger u-boot when CONFIG_POSITION_INDEPENDENT=y

Wed Sep 2 17:25:15 CEST 2020

On Wed, Sep 02, 2020 at 04:18:48PM +0100, André Przywara wrote:
> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
> > On Wed, Sep 02, 2020 at 03:43:08PM +0100, André Przywara wrote:
> >> On 02/09/2020 12:15, Michal Simek wrote:
> 
> Hi,
> 
> >>
> >>> From: "Edgar E. Iglesias" <edgar.iglesias at xilinx.com>
> >>>
> >>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
> >>> compilation error is shown:
> >>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
> >>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
> >>> defined in .bss_start section in u-boot.
> >>>
> >>> It is caused by the adr instruction, which can only compute byte
> >>> addresses within +-1MB of the current PC.
> >>> Because U-Boot is bigger than 1MB, the calculation fails.
> >>>
> >>> The patch uses an adrp/add pair instead. adrp shifts a signed 21-bit
> >>> immediate left by 12 bits (4k page) and adds it to the value of the
> >>> program counter with the bottom 12 bits cleared to zero. The add
> >>> instruction then supplies the lower 12 bits, i.e. the offset within the
> >>> 4k page. Together the two instructions cover a signed 33-bit (+-4GB)
> >>> offset, which should be more than enough to cover the whole U-Boot size.
> >>>
> >>> Signed-off-by: Edgar E. Iglesias <edgar.iglesias at xilinx.com>
> >>> Signed-off-by: Michal Simek <michal.simek at xilinx.com>
> >>
> >> It's a bit scary that you need more than 1MB, but indeed what you do
> >> below is the canonical pattern to get the full range of PC relative
> >> addressing (this is used heavily in Trusted Firmware, for instance).
> >>
> >> The only thing to keep in mind is that this assumes that the load
> >> address of the binary is 4K aligned, so that the low 12 bits of the
> >> symbol stay the same. I wonder if we should enforce this somehow? But
> >> the load address is not controlled by the build process (the whole
> >> purpose of PIE), so that's not doable just in the build system?
> > 
> > There shouldn't be any need for 4K alignment. Could you elaborate on
> > why you think there is?
> 
> That seems to be slightly tricky, and I tried to get some confirmation,
> but here goes my reasoning. Maybe you can confirm this:
> 
> - adrp takes the relative offset, but only of the upper 20 bits (because
> that's all we can encode). It clears the lower 12 bits of the register.
> - the "add" is not PC relative anymore, so it just takes the lower 12
> bits of the "absolute" linker symbol.

I was under the impression that this would use a PC-relative lower 12-bit
relocation, but you are correct. I disassembled the result:

  40:   91000042        add     x2, x2, #0x0
                        40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
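
For reference, a minimal Python sketch of what the relocations involved here
compute at link time, following the formulas in the AArch64 ELF ABI
(Page(S+A) - Page(P) for the adrp, the low 12 bits of S+A for the add). The
helper names are made up for illustration, and the addend A is assumed to be 0:

```python
PAGE_MASK = 0xFFF

def page(addr):
    """Clear the low 12 bits, i.e. round down to the 4K page."""
    return addr & ~PAGE_MASK

def adrp_imm(place, symbol):
    """R_AARCH64_ADR_PREL_PG_HI21: page delta between symbol and place, in pages."""
    return (page(symbol) - page(place)) >> 12

def add_lo12(symbol):
    """R_AARCH64_ADD_ABS_LO12_NC: low 12 bits of the absolute (link-time) address."""
    return symbol & PAGE_MASK

def fits_adr(place, symbol):
    """R_AARCH64_ADR_PREL_LO21: a plain adr only reaches a signed 21-bit byte offset."""
    return -(1 << 20) <= symbol - place < (1 << 20)
```

Only adrp_imm() depends on the place P; add_lo12() looks at the link-time
symbol address alone, which is what the ABS in the relocation name says.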

> So this assumes that the lower 12 bits of the actual address in memory
> and the lower 12 bits of the linker's view match.
> An example:
> 00024: adrp x0, SYMBOL
> 00028: add  x0, x0, :lo12:SYMBOL
> 
> SYMBOL:
> 42058: ...
> 
> The toolchain will generate:
> 	adrp x0, #0x42; add x0, x0, #0x058
> 
> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
> 0x80042858.
> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
> The add will just add 0x58, so you end up with x0 being 0x80042058,
> which is not the right address.
> 
> Does this make sense?

Yes, it makes sense.
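
The example above can be replayed with a small Python model. This is only a
sketch: run_adrp_add() is a hypothetical helper that mimics the two
instructions, with the immediates fixed at link time as in the example
(0x42 and 0x058).

```python
def run_adrp_add(pc, adrp_imm, add_imm):
    """Model 'adrp x0, SYMBOL; add x0, x0, :lo12:SYMBOL' executing at runtime pc."""
    x0 = (pc & ~0xFFF) + (adrp_imm << 12)   # adrp: PC page plus page delta
    return x0 + add_imm                     # add: link-time low 12 bits, not PC-relative

LINK_PC, LINK_SYMBOL = 0x00024, 0x42058     # link-time addresses from the example
ADRP_IMM, ADD_IMM = 0x42, 0x058             # immediates fixed by the toolchain

def check(load_addr):
    """Return (computed, actual) symbol address for a given load address."""
    computed = run_adrp_add(load_addr + LINK_PC, ADRP_IMM, ADD_IMM)
    return computed, load_addr + LINK_SYMBOL

for load_addr in (0x80000000, 0x80000800):
    computed, actual = check(load_addr)
    print(f"load {load_addr:#x}: computed {computed:#x}, actual {actual:#x}")
```

With a load address of 0x80000000 the two values agree; with 0x80000800 the
computed address comes out 0x800 below the real one.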

> 
> > Perhaps the commit message is a little confusing. The toolchain will
> > compute the pc-relative offset from this particular location to the
> > symbol and apply the relocations accordingly.
> 
> Yes, but the PC relative offset applies only to the upper 20 bits,
> because it's only adrp that has PC relative semantics.
> 
> 
> >>
> >> Shall we at least document this? I guess typical load addresses are
> >> actually quite well aligned, so it might not be an issue in practice.
> >>

Yes, probably worth documenting and perhaps an early bail-out if it's not
the case...
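
A rough sketch of the condition such a bail-out would test, written in Python
for brevity (the real check would have to live in early startup code, and
load_offset_ok() is a hypothetical name): the difference between the runtime
and the link-time address must be a multiple of 4K.

```python
def load_offset_ok(link_addr, runtime_addr):
    """True if the load offset preserves the low 12 bits of every symbol."""
    return ((runtime_addr - link_addr) & 0xFFF) == 0
```

If the image is linked at address 0, this reduces to checking that the low
12 bits of the runtime address of the image start are zero.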

Thanks,
Edgar