[U-Boot] Relocation size penalty calculation

Wed Oct 14 17:35:44 CEST 2009

Joakim Tjernlund wrote:
> "J. William Campbell" <jwilliamcampbell at comcast.net> wrote on 14/10/2009 01:48:52:
>   
>> Joakim Tjernlund wrote:
>>     
>>> Graeme Russ <graeme.russ at gmail.com> wrote on 13/10/2009 22:06:56:
>>>
>>>
>>>       
>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>> <joakim.tjernlund at transmode.se> wrote:
>>>>
>>>>         
>>>>> Graeme Russ <graeme.russ at gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>
>>>>>           
>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund at transmode.se> wrote:
>>>>>>
>>>>>>             
>>>>>>> Graeme Russ <graeme.russ at gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>
>>>>>>>               
>>>>>> [Massive Snip :)]
>>>>>>
>>>>>>
>>>>>>             
>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>>   .dynsym
>>>>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>>     - 44 entries mimic those entries in .got which are not relocated
>>>>>>>>     - 21 entries are the remaining symbols exported from the linker
>>>>>>>>       script
>>>>>>>>     - 4 entries are labels defined in inline asm and used in C
>>>>>>>>
>>>>>>>>                 
>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>> generates for a function/variable and mimic these.
>>>>>>>
>>>>>>>               
>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>
>>>>>>             
>>>>> :)
>>>>>
>>>>>           
>>>>>>>>     - 1 entry is a NULL entry
>>>>>>>>
>>>>>>>>   .dynamic
>>>>>>>>     - 88 bytes
>>>>>>>>     - Array of Elf32_Dyn
>>>>>>>>     - typedef struct {
>>>>>>>>           Elf32_Sword     d_tag;
>>>>>>>>           union {
>>>>>>>>               Elf32_Word  d_val;
>>>>>>>>               Elf32_Addr  d_ptr;
>>>>>>>>           } d_un;
>>>>>>>>       } Elf32_Dyn;
>>>>>>>>     - 11 entries
>>>>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>
>>>>>>>>                 
>>>>>>> How big DT_REL is
>>>>>>>
>>>>>>>               
>>>>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>
>>>>>>>>                 
>>>>>>> hmm, cannot remember :)
>>>>>>>
>>>>>>>               
>>>>>> How big an entry in DT_REL is
>>>>>>
>>>>>>             
>>>>> Right, how could I forget :)
>>>>>
>>>>>           
>>>>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>
>>>>>>>>                 
>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>> TEXTREL is commonly caused by asm code that isn't truly PIC, so the
>>>>>>> relocation code has to modify the .text segment to adjust for relocation.
>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>>>> the culprit.
>>>>>>>
>>>>>>>
>>>>>>>               
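A relocator only needs the DT_REL, DT_RELSZ and DT_RELENT entries discussed above to find its table. The following is a minimal C sketch that assumes the Elf32_Dyn layout quoted above. That layout is redefined locally as the hypothetical `Dyn32`, so the sketch does not depend on a host `<elf.h>`. The names `scan_dynamic`, `rel_count` and `struct rel_info` are illustrative and do not come from U-Boot.

```c
#include <stdint.h>
#include <stddef.h>

/* Local mirror of Elf32_Dyn as quoted in the thread. */
typedef struct {
    int32_t d_tag;
    union {
        uint32_t d_val;
        uint32_t d_ptr;
    } d_un;
} Dyn32;

/* Tag values from the ELF specification; they match the dump above. */
enum {
    TAG_NULL    = 0x00,
    TAG_REL     = 0x11,
    TAG_RELSZ   = 0x12,
    TAG_RELENT  = 0x13,
    TAG_TEXTREL = 0x16
};

struct rel_info {
    uint32_t rel_addr;  /* DT_REL: address of the relocation table */
    uint32_t rel_size;  /* DT_RELSZ: total size of the table in bytes */
    uint32_t rel_ent;   /* DT_RELENT: size of one entry in bytes */
    int      textrel;   /* 1 if DT_TEXTREL is present */
};

/* Scan at most n entries, stopping early at DT_NULL. */
static struct rel_info scan_dynamic(const Dyn32 *dyn, size_t n)
{
    struct rel_info ri = {0, 0, 0, 0};
    for (size_t i = 0; i < n && dyn[i].d_tag != TAG_NULL; i++) {
        switch (dyn[i].d_tag) {
        case TAG_REL:     ri.rel_addr = dyn[i].d_un.d_ptr; break;
        case TAG_RELSZ:   ri.rel_size = dyn[i].d_un.d_val; break;
        case TAG_RELENT:  ri.rel_ent  = dyn[i].d_un.d_val; break;
        case TAG_TEXTREL: ri.textrel  = 1; break;
        default:          break;
        }
    }
    return ri;
}

/* Number of entries = DT_RELSZ / DT_RELENT (0 if DT_RELENT is missing). */
static uint32_t rel_count(const struct rel_info *ri)
{
    return ri->rel_ent ? ri->rel_size / ri->rel_ent : 0;
}
```

With the dump above as input, DT_RELSZ = 0x14D8 (5336) bytes divided by DT_RELENT = 8 gives 667 REL entries. The DT_TEXTREL flag is reported as set.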
>>>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>>>> return address when calling show_boot_progress in the very early stages of
>>>>>> initialisation prior to the stack becoming available. The x86 does not
>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>> execution address' is to 'call' to the next instruction and pop the return
>>>>>> address off the stack.
>>>>>>
>>>>>>             
>>>>> hmm, same as ppc, but that in itself should not cause a TEXTREL, should it?
>>>>> Ahh, the 'call' is absolute, not relative? There is probably some way around
>>>>> it, but it is not important ATM.
>>>>>
>>>>> Evil idea: skip -fpic et al. and add the full reloc procedure
>>>>> to relocate by rewriting directly in the TEXT segment. Then you save space
>>>>> but you need more relocation code, something like dl_do_reloc from
>>>>> uClibc. I wonder how much extra code that would be? Not too much, I think.
>>>>>
>>>>>
>>>>>           
>>>> With the following flags
>>>>
>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>
>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>> this might mean I need the symbol table in the binary in order to resolve
>>>> them.
>>>>
>>>>         
>
> BTW, how many relocs do you get compared with -fPIC? I suspect you get more
> now, but hopefully not that many more.
>
>   
>>> Possibly, but I think you only need to add an offset to all those
>>> relocs.
>>>
>>>       
>> Almost right. The relocations specify a symbol value that needs to be
>> added to the data in memory to relocate the reference. The symbol values
>> involved should be the start of the text section for program references,
>> the start of the uninitialized data section for bss references, and the
>> start of the data section for initialized data and constants. So there
>> are about four symbols whose value you need to keep. Take a look at
>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>> already looked at); it tells you what to do with R_386_PC32 and
>> R_386_32 relocations. Hopefully objcopy with --strip-unneeded
>> will remove all the symbols you don't actually need, but I don't know
>> that for sure. Note also that you can change the section flags of a
>> section marked noload to load.
>>     
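The relocation formulas in that document make the "just add an offset" suggestion precise. Here S is the symbol value, A the addend, and P the address of the field being patched. Suppose the symbol moves by Δ_S and the field moves by Δ_P:

```latex
\begin{aligned}
\text{R\_386\_32:}\quad   v_{32}   &= S + A\\
\text{R\_386\_PC32:}\quad v_{PC32} &= S + A - P\\
v'_{32}   &= (S + \Delta_S) + A = v_{32} + \Delta_S\\
v'_{PC32} &= (S + \Delta_S) + A - (P + \Delta_P) = v_{PC32} + \Delta_S - \Delta_P
\end{aligned}
```

With a single offset for the whole image (Δ_S = Δ_P = Δ), R_386_32 fields simply gain Δ. R_386_PC32 fields whose targets lie inside the image need no change at all. A PC-relative reference changes only when text and data move by different amounts, and then by Δ_S - Δ_P.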
>
> I still think you can get away with just ADDING an offset. The image is linked to a
> specific address and then you move the whole image to a new address. Therefore
> you should be able to read the current address, add the offset, and write back the new address.
>
> Normally one does what you describe, but here we know that the whole image has moved, so
> we don't have to calculate the new address from scratch.
>   
If the addresses of the bss, text, and data segments change by the same
value, I think you are correct. However, if the text and data/bss
segments are moved by different offsets, the value added to each
relocated word depends on which segments the reference and its target
live in. One reason to retain this capability would be to allow the
U-Boot image to execute in place in NOR flash while relocating the
read-write storage once memory has been sized. Having different
relocation factors is not much worse than just one, and it may be just
as easy to get working initially as a single relocation constant.
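The following is a minimal C sketch of relocating with separate per-segment offsets. It assumes a hypothetical pre-processed relocation list in which each entry already records which segment its target symbol lives in. In a real ELF file, that information comes from the symbol referenced by r_info. The names `struct seg`, `struct reloc`, `place_delta` and `apply_relocs` are illustrative and do not come from U-Boot. When every segment has the same delta, the sketch reduces to adding that delta to R_386_32 fields and leaving R_386_PC32 fields alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* i386 relocation types (ELF i386 psABI). */
#define R_386_32   1   /* S + A     */
#define R_386_PC32 2   /* S + A - P */

/* One relocatable region (e.g. text, data, bss) and how far it moves. */
struct seg {
    uint32_t link_start;  /* link-time start address */
    uint32_t link_end;    /* link-time end address (exclusive) */
    int32_t  delta;       /* run-time address minus link-time address */
};

/* Hypothetical pre-processed relocation: the target segment is known. */
struct reloc {
    uint32_t offset;  /* link-time address of the 32-bit field */
    uint8_t  type;    /* R_386_32 or R_386_PC32 */
    uint8_t  target;  /* index into segs[] of the referenced symbol */
};

/* Delta of the segment containing link-time address addr (0 if none). */
static int32_t place_delta(const struct seg *segs, size_t nsegs, uint32_t addr)
{
    for (size_t i = 0; i < nsegs; i++)
        if (addr >= segs[i].link_start && addr < segs[i].link_end)
            return segs[i].delta;
    return 0;
}

/* Patch a copy of the image whose byte 0 corresponds to link_base.
 * memcpy avoids unaligned 32-bit accesses; values are host-endian. */
static void apply_relocs(uint8_t *image, uint32_t link_base,
                         const struct seg *segs, size_t nsegs,
                         const struct reloc *rels, size_t nrels)
{
    for (size_t i = 0; i < nrels; i++) {
        const struct reloc *r = &rels[i];
        assert(r->target < nsegs);
        uint8_t *field = image + (r->offset - link_base);
        uint32_t v;
        memcpy(&v, field, sizeof v);
        uint32_t ds = (uint32_t)segs[r->target].delta;
        if (r->type == R_386_32) {
            v += ds;  /* only the symbol moved relative to absolute 0 */
        } else if (r->type == R_386_PC32) {
            /* symbol moved by ds, place moved by its own segment delta */
            v += ds - (uint32_t)place_delta(segs, nsegs, r->offset);
        }
        memcpy(field, &v, sizeof v);
    }
}
```

For example, with text moved by 0x100 and data moved by 0x200, an absolute pointer to data gains 0x200. A PC-relative reference from text to data gains only 0x100.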

FWIW, the "ultimate" solution to minimum relocation size is a
post-processing step that creates "several" arrays of relocation offsets
as two-byte quantities. This reduces the cost of each relocation entry
to just a bit more than two bytes (plus a small overhead for the array
sizes, the MSB values and selecting which relocation offset to apply).
This is much less than the ELF version of the same relocations, because
we do not need to retain as much information, and ELF doesn't worry much
about size. This may pacify users for whom the flash size of the image
is critical, at the expense of an extra link step. Of course, getting
things to work with "standard ELF" is the most important step, and
probably enough for most people.
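The packed format is only outlined above. One possible layout, which is purely an assumption, groups offsets that share their upper 16 bits and their relocation offset (delta) behind a three-halfword header. `pack_relocs` and `apply_packed` are hypothetical names, and this sketch covers only the "add a delta to an absolute 32-bit word" case.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical packed layout (not an existing U-Boot format), as uint16_t:
 *   count, msb, sel, lo[0] .. lo[count-1]   repeated once per group
 *   0                                       terminator
 * Each entry patches the 32-bit word at image offset (msb << 16 | lo)
 * by adding deltas[sel].
 */

/* Pack n image-relative offsets with their delta selectors. Consecutive
 * entries sharing msb and selector form one group, so input sorted by
 * (selector, offset) packs best. Returns halfwords written, 0 if cap is
 * too small. */
static size_t pack_relocs(const uint32_t *offs, const uint16_t *sels,
                          size_t n, uint16_t *out, size_t cap)
{
    size_t w = 0, i = 0;
    while (i < n) {
        uint16_t msb = (uint16_t)(offs[i] >> 16), sel = sels[i];
        size_t j = i;
        while (j < n && (offs[j] >> 16) == msb && sels[j] == sel && j - i < 0xFFFF)
            j++;
        if (w + 3 + (j - i) + 1 > cap)  /* group plus terminator must fit */
            return 0;
        out[w++] = (uint16_t)(j - i);
        out[w++] = msb;
        out[w++] = sel;
        for (; i < j; i++)
            out[w++] = (uint16_t)(offs[i] & 0xFFFF);
    }
    if (w + 1 > cap)
        return 0;
    out[w++] = 0;
    return w;
}

/* Walk the packed table and add the selected delta to every listed word. */
static void apply_packed(uint8_t *image, const uint16_t *tab, const int32_t *deltas)
{
    uint16_t count;
    while ((count = *tab++) != 0) {
        uint32_t hi = (uint32_t)tab[0] << 16;
        uint32_t delta = (uint32_t)deltas[tab[1]];
        tab += 2;
        for (uint16_t k = 0; k < count; k++) {
            uint32_t v, off = hi | tab[k];
            memcpy(&v, image + off, sizeof v);
            v += delta;  /* unsigned wrap-around also handles negative deltas */
            memcpy(image + off, &v, sizeof v);
        }
        tab += count;
    }
}
```

Three relocations split across two selectors pack into 10 halfwords (20 bytes), most of it header overhead. For long runs the per-entry cost approaches 2 bytes, compared with 8 bytes for an Elf32_Rel entry.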

I am also interested in the number of additional relocations generated
without -fpic. I suspect that on the 386 it can be substantial. However,
for every new reloc generated, a .got reference load will probably be
eliminated. This should result in a shorter text segment to balance the
increased relocation segment. Adding the -fno-jump-tables gcc option may
also help a bit.
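One way to get the relocation counts asked about above is to histogram the relocation types. The input can be taken from the relocation sections kept by --emit-relocs, or read off the listing from readelf -r. The following is a minimal sketch. `Rel32` mirrors Elf32_Rel, and `count_rel_types` and `fixups_for_single_delta` are hypothetical helpers. The second helper assumes every relocation target lies inside the image, so that under a single whole-image offset only R_386_32 fields change.

```c
#include <stdint.h>
#include <stddef.h>

/* Same layout as Elf32_Rel: 8 bytes, matching DT_RELENT in the dump. */
typedef struct {
    uint32_t r_offset;  /* address of the field to patch */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
} Rel32;

#define R_386_32   1
#define R_386_PC32 2

/* Histogram of relocation types; the type is the low byte of r_info
 * (what <elf.h> calls ELF32_R_TYPE). */
static void count_rel_types(const Rel32 *rel, size_t n, size_t counts[256])
{
    for (size_t t = 0; t < 256; t++)
        counts[t] = 0;
    for (size_t i = 0; i < n; i++)
        counts[rel[i].r_info & 0xFFu]++;
}

/* Fields that must be patched when the whole image moves by one delta,
 * assuming all targets are inside the image: PC-relative ones stay valid. */
static size_t fixups_for_single_delta(const size_t counts[256])
{
    return counts[R_386_32];
}
```

Running this over a -fpic build and a non-pic build of the same tree would show how many relocations the switch actually adds.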

Bill Campbell
>        Jocke