[U-Boot] Relocation size penalty calculation

Graeme Russ graeme.russ at gmail.com
Sun Oct 11 12:47:19 CEST 2009


On Sun, Oct 11, 2009 at 2:38 AM, Joakim Tjernlund
<joakim.tjernlund at transmode.se> wrote:
> Graeme Russ <graeme.russ at gmail.com> wrote on 10/10/2009 13:21:10:
>>
>> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
>> <joakim.tjernlund at transmode.se> wrote:
>> >
>> >
>> > Graeme Russ <graeme.russ at gmail.com> wrote on 10/10/2009 12:38:19:
>> >>
>> >> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
>> >> <joakim.tjernlund at transmode.se> wrote:
>> >> >
>> >> >
>> >> > Graeme Russ <graeme.russ at gmail.com> wrote on 10/10/2009 10:46:52:
>> >> >>
>> >> >> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>> >> >> <joakim.tjernlund at transmode.se> wrote:
>> >> >> > Graeme Russ <graeme.russ at gmail.com> wrote on 10/10/2009 06:43:52:
>> >> >> >>
>> >> >> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> >> >> >> <joakim.tjernlund at transmode.se> wrote:
>> >> >> >> >>
>> >> >> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> >> >> >> <jwilliamcampbell at comcast.net> wrote:
>> >> >> >> >> > Graeme Russ wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> >> >> >> <jwilliamcampbell at comcast.net> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >>>
>> >> >> >> >> >>> Graeme Russ wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >> >> >> >>>> the results (fixed width font will help - it's space, not tab,
>> >> >> >> >> >>>> formatted):
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Section             non-reloc     reloc
>> >> >> >> >> >>>> ---------------------------------------
>> >> >> >> >> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >> >> >> >>>> .rodata              00005bad  000059d0
>> >> >> >> >> >>>> .interp              n/a       00000013
>> >> >> >> >> >>>> .dynstr              n/a       00000648
>> >> >> >> >> >>>> .hash                n/a       00000428
>> >> >> >> >> >>>> .eh_frame            00003268  000034fc
>> >> >> >> >> >>>> .data                00000a6c  000001dc
>> >> >> >> >> >>>> .data.rel            n/a       00000098
>> >> >> >> >> >>>> .data.rel.ro.local   n/a       00000178
>> >> >> >> >> >>>> .data.rel.local      n/a       000007e4
>> >> >> >> >> >>>> .got                 00000000  000001f0
>> >> >> >> >> >>>> .got.plt             n/a       0000000c
>> >> >> >> >> >>>> .rel.got             n/a       000003e0
>> >> >> >> >> >>>> .rel.dyn             n/a       00001228
>> >> >> >> >> >>>> .dynsym              n/a       00000850
>> >> >> >> >> >>>> .dynamic             n/a       00000080
>> >> >> >> >> >>>> .u_boot_cmd          000003c0  000003c0
>> >> >> >> >> >>>> .bss                 00001a34  00001a34
>> >> >> >> >> >>>> .realmode            00000166  00000166
>> >> >> >> >> >>>> .bios                0000053e  0000053e
>> >> >> >> >> >>>> =======================================
>> >> >> >> >> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> It's more than a 16% increase in size!!!
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >> >> >> >>>>
>> >> >> >> >> >>>>
>> >> >> >> >> >>>
>> >> >> >> >> >>> Hi Graeme,
>> >> >> >> >> >>>     I would be interested in a third option (column), the x86 build with
>> >> >> >> >> >>> just -mrelocatable but NOT -fpic. It will not be definitive because
>> >> >> >> >> >>> there will be extra code that references the GOT and missing code to
>> >> >> >> >> >>> do some of the relocation, but it would still be interesting.
>> >> >> >> >> >>>
>> >> >> >> >> >>
>> >> >> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > Hi Graeme,
>> >> >> >> >> >          You are unfortunately correct. However, I wonder if we can get
>> >> >> >> >> > essentially the same result by executing the final ld step with the
>> >> >> >> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> >> >> >> > that we would want to strip out, but if it works, it could give all
>> >> >> >> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> >> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> >> >> >> ld -pie creates enough information to perform relocation on the x86
>> >> >> >> >> platform
>> >> >> >> >
>> >> >> >> > Try -fvisibility=hidden
>> >> >> >>
>> >> >> >> Thanks - Shaved another 2539 bytes off the binary
>> >> >> >>
>> >> >> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> >> >> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>> >> >> >>
>> >> >> >> Total saving of 15.6k
>> >> >> >
>> >> >> > Great, so now you are back at just a few percent added I guess?
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> Not really - The .eh_frame saving applies to both relocated and non
>> >> >> relocated builds
>> >> >
>> >> > OK, so you didn't use PIC before at all?
>> >> >
>> >> > Anyway I think you can do more. Using -Bsymbolic you should get
>> >> > away with RELATIVE relocs only and be able to skip a lot of segments above.
>> >> > Have a look at uClibc ldso/ldso/dl-startup.c
>> >> >
>> >> >
>> >>
>> >> My build options thus far are:
>> >>
>> >> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> >> PLATFORM_LDFLAGS += -pie
>> >>
>> >> -fpic / -pic make no difference
>> >
>> > not on x86, on ppc it is a big difference.
>> >
>> >>
>> >> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>> >> change the size of any other section
>> >>
>> >> Pulling apart the relocation sections, it seems that all relocations are
>> >> already RELATIVE even without -Bsymbolic
>> >
>> > Ah, that is because you built an exe with -pie
>> > Then you should be able to drop everything but the RELATIVE
>> > from the linking, or almost in any case.
>> >
>> >  Jocke
>> >
>> >
>>
>> Hmm, so it seems I may have hit the limit. I tried:
>>
>> PLATFORM_LDFLAGS += -r --emit-relocs
>>
>> but there is not enough information left to complete the relocation. It
>> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
>> to find the actual bytes that need modifying (it also seems to mess with
>> the size of the stripped binary for some reason)
>>
>> Looks like I'll have to proceed with my original plan - a bit bloated,
>> but it works
>
> Relocation costs :(
>
> I am not sure why you need .got.plt, it should be empty,
> what is in it?
> Same with dynsym, what is in it?
>
> Memory fails me, but since u-boot is a freestanding app I think
> these two might not be needed. Perhaps there are weak unresolved
> syms in there?
>
>     Jocke
>
>

Well, I'm in the middle of a pretty intense analysis of what is going on.

Compile flags are:

PLATFORM_RELFLAGS += -fpic -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pic -Bsymbolic

So far I have found that the only sections that change as a result of a
change in TEXT_BASE are:
    .text
    .rodata
    .data.rel
    .got
    .got.plt
    .rel.text
    .rel.got
    .rel.dyn
    .dynsym
    .dynamic
    .u_boot_cmd

Changes in .text are either covered by .rel.text (see below) or are a
result of CONFIG_SYS_MONITOR_BASE being equal to TEXT_BASE (used in
cfi_flash.c)

Changes in .rodata are a result of version_string changing for each
compile

  .rel.text
    - Contains a list of pointers into .got
    - All entries are R_386_RELATIVE
    - All entries (8 of) are in cpu/i386/start.o
    - cpu/i386/start.o only used during initial bootstrap - not needed
      after execution starts in RAM
    - Can be safely discarded

  .rel.got
    - Contains a list of pointers into .got
    - All entries are R_386_RELATIVE
    - Not all entries change with TEXT_BASE. Some entries are symbols
      exported from the linker script (in particular section size
      exports) while the others are in the somewhat 'special' BIOS and
      Real Mode sections which are located in a fixed RAM location (these
      sections are used for real-mode trampolining into Linux by providing
      a limited PC 'BIOS')
    - All entries that are not linked to TEXT_BASE are easily identified
      because they are 'located' below TEXT_BASE (specifically between
      0x00000000 and 0x00001A34)
    - This section is not needed in the final binary - Direct processing
      of .got will achieve the required end result

  .rel.dyn
    - Contains a list of pointers into .data.rel and .u_boot_cmd
    - Like .rel.got, not all entries in .data.rel need relocating. Again,
      like .rel.got, these are easily identified
    - This section is not needed

Another 5.5k saved
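
A minimal sketch of what "direct processing of .got" could look like,
assuming 32-bit GOT slots and the rule described above (anything below
the link-time TEXT_BASE is left alone). The function name and its
parameters are hypothetical and not taken from the U-Boot tree:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add `delta` to every 32-bit GOT slot whose value
 * lies at or above the link-time base address. Slots below `link_base`
 * (linker-script size exports, fixed-address BIOS / real-mode symbols)
 * are assumed not to need relocation and are skipped.
 * Returns the number of slots that were patched.
 */
size_t relocate_got(uint32_t *got, size_t count,
                    uint32_t link_base, uint32_t delta)
{
    size_t patched = 0;

    for (size_t i = 0; i < count; i++) {
        if (got[i] >= link_base) {
            got[i] += delta;    /* unsigned wrap-around is well defined */
            patched++;
        }
    }
    return patched;
}
```

The same loop shape would apply to the pointers in .data.rel and
.u_boot_cmd. A real implementation would probably also want an upper
bound (end of the image) so that unrelated large constants are not
patched by accident.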

So, all that is left are .dynsym and .dynamic ...
  .dynsym
    - Contains 70 entries (16 bytes each, 1120 bytes)
    - 44 entries mimic those entries in .got which are not relocated
    - 21 entries are the remaining symbols exported from the linker
      script
    - 4 entries are labels defined in inline asm and used in C
    - 1 entry is a NULL entry

  .dynamic
    - 0x88 bytes (136 decimal: 0x11 entries of 8 bytes each)
    - Array of Elf32_Dyn
    - typedef struct {
          Elf32_Sword     d_tag;
          union {
              Elf32_Word  d_val;
              Elf32_Addr  d_ptr;
          } d_un;
      } Elf32_Dyn;
    - 0x11 entries
      [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
      [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
      [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
      [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
      [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
      [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
      [06] 0x00000015, 0x00000000 DT_DEBUG, ???
      [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
      [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
      [09] 0x00000013, 0x00000008 DT_RELENT, ???
      [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
      [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
      [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
      [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
      [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
      [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
      [10] 0x00000000, 0x00000000 DT_NULL, End of Array
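
For reference, the tags in the dump above can be decoded mechanically.
The sketch below uses local copies of the ELF32 types (so it builds
without <elf.h>) and the d_tag values from the ELF specification plus
the GNU extension DT_RELCOUNT. The helper names dyn_tag_name() and
dyn_count() are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the ELF32 types so the sketch builds without <elf.h>. */
typedef int32_t  Elf32_Sword;
typedef uint32_t Elf32_Word;
typedef uint32_t Elf32_Addr;

typedef struct {
    Elf32_Sword d_tag;
    union {
        Elf32_Word d_val;
        Elf32_Addr d_ptr;
    } d_un;
} Elf32_Dyn;

/* Map a d_tag value to its symbolic name (only the tags seen above). */
const char *dyn_tag_name(Elf32_Sword tag)
{
    switch (tag) {
    case 0x00:       return "DT_NULL";     /* end of array */
    case 0x04:       return "DT_HASH";     /* address of symbol hash table */
    case 0x05:       return "DT_STRTAB";   /* address of string table */
    case 0x06:       return "DT_SYMTAB";   /* address of symbol table */
    case 0x0A:       return "DT_STRSZ";    /* string table size in bytes */
    case 0x0B:       return "DT_SYMENT";   /* size of one symbol entry */
    case 0x10:       return "DT_SYMBOLIC"; /* set by -Bsymbolic */
    case 0x11:       return "DT_REL";      /* address of Rel table */
    case 0x12:       return "DT_RELSZ";    /* total Rel table size in bytes */
    case 0x13:       return "DT_RELENT";   /* size of one Rel entry */
    case 0x15:       return "DT_DEBUG";    /* reserved for debuggers */
    case 0x16:       return "DT_TEXTREL";  /* relocs may touch read-only text */
    case 0x6FFFFFFA: return "DT_RELCOUNT"; /* GNU: number of RELATIVE relocs */
    default:         return "unknown";
    }
}

/* Count entries up to and including the first DT_NULL, bounded by max. */
size_t dyn_count(const Elf32_Dyn *dyn, size_t max)
{
    size_t n = 0;

    while (n < max) {
        if (dyn[n++].d_tag == 0)
            break;
    }
    return n;
}
```

Read this way, DT_SYMENT = 0x10 matches the 16-byte .dynsym entries, and
DT_RELSZ / DT_RELENT = 0x14D8 / 8 works out to 667 relocation entries.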

I think some more investigation into the need for .dynsym and .dynamic is
still required...
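
As a cross-check on the .dynsym numbers, here is the standard ELF32
symbol entry layout written out with fixed-width types. The struct is a
local copy for illustration and dynsym_bytes() is a hypothetical helper:

```c
#include <stdint.h>

/* ELF32 symbol table entry: 16 bytes on any usual ABI. */
typedef struct {
    uint32_t      st_name;   /* offset of the name in .dynstr */
    uint32_t      st_value;  /* symbol value (address) */
    uint32_t      st_size;   /* size of the object, if known */
    unsigned char st_info;   /* binding and type */
    unsigned char st_other;  /* visibility */
    uint16_t      st_shndx;  /* index of the defining section */
} Elf32_Sym;

/* Size in bytes of a .dynsym table holding `entries` symbols. */
unsigned long dynsym_bytes(unsigned long entries)
{
    return entries * (unsigned long)sizeof(Elf32_Sym);
}
```

With 44 + 21 + 4 + 1 = 70 entries this gives the 1120 bytes quoted above.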

Regards,

Graeme

