[U-Boot] [RFC PATCH 0/7] reboard: Introduce generic relocation feature

Wed Dec 7 03:56:30 CET 2011

Hi Simon

Sorry for the late review on this - I've only just now had a good
chance to look at it...

On Wed, Dec 7, 2011 at 12:56 PM, Simon Glass <sjg at chromium.org> wrote:
> Hi Tom,
>
> On Mon, Nov 28, 2011 at 3:45 PM, Tom Rini <tom.rini at gmail.com> wrote:
>> On Mon, Nov 21, 2011 at 4:57 PM, Simon Glass <sjg at chromium.org> wrote:
>>> This is the second patch series aiming to unify the various board.c files
>>> in each architecture into a single one. This series creates a libboard
>>> library and implements relocation in it. It then moves ARM over to use
>>> this framework, as an example.
>>>
>>> On ARM the relocation code is duplicated for each CPU yet it
>>> is the same. We can bring this up to the arch level. But since (I believe)
>>> Elf relocation is basically the same process for all archs, there is no
>>> reason not to bring it up to the generic level.
>>>
>>> This series establishes a new libboard library in the board/ subdir and
>>> puts some relocation code in it. Each architecture which uses this
>>> framework needs to provide a function called arch_elf_relocate_entry()
>>> which processes a single relocation entry. If there is concern about
>>> calling a function for all 2000-odd relocations then I can change this.

Ugh, that overhead is a killer - The relocation can be done entirely in a
tight loop with a switch statement in your reloc_elf() function. But I
think reloc_elf() should be weak - x86 relocation can be done in a few
lines of assembler which is smaller and faster than a C implementation
will ever be

>>> For ARM, a new arch/arm/lib/proc.S file is created, which holds generic
>>> ARM assembler code (things that cannot be written in C and are common
>>> functions used by all ARM CPUs). This helps reduce duplication. Interrupt
>>> handling code and perhaps even some startup code can move there later.
>>>
>>> It may be useful for other architectures to have a similar file.
>>>
>>> This series moves ARM over to use this framework. Overall this means that
>>> two new files are required 'early' in boot: board/reloc.c and
>>> arch/arm/lib/proc.S.  This is tricky mainly due to SPL. I believe that
>>> we may need to adjust link scripts to put these two files early in the
>>> link scripts also. But I am not sure about this and can't actually find
>>> a problem as yet. I would much prefer to solve this with a new section
>>> name like .text.early if we can.
>>>
>>> (I should really cc all arch maintainers but I think in that case I get
>>> an error from the list server. Not sure what the limit is.)
>>>
>>> Comments please...

The whole sequence of the series seems a little bit 'random' to me - I
think patches 2 and 3 can be squahed and the dead code removal in patch 7
should be done in the same patch(es) that create the dead code.

It looks like reloc_make_copy(), reloc_clear_bss() and reloc_elf() have
been poached from x86 - The copy and clear loops are quite inefficient. I
have patches at home which change these to use memcpy and memset

>>
>> Seems like a good idea.  Got some size's before and after?  Maybe some
>> boot times too (as you mention 2000-odd function calls during
>> relocation) ?
>>
>> --
>> Tom
>
> Yes I have this info - will package it up tomorrow along with a new
> series to deal with review comments.

As far as I understand it, the long-term plan is to unify the board init
sequence across all architectures. I can see three approaches:

 1) Tweak each arch until they are all the same and then 'throw the switch'
 2) Pick one or two arches and 'throw the switch' on them and migrate the
    others later
 3) Pull out common bits and pieced and deal with the left-overs later

It looks like you've opted for number 3. The problem I see in this approach
is that you are relying on the other arches to follow your lead which is, at
best, a 50/50 bet.

Now I look at board_init_f() and board_init_r() in ARM and x86 and think
'why on earth isn't most of than in the init loops?'. The boot sequence
for ARM and x86 should be reducible to (all arches are generally pretty
close to this, so it should not introduce too much pain when they decide
to switch)

 - Low level init (reset vector etc)
 - Setup temporary stack and gd storage (typically in CPU cache)
 + init_f loop <-- SDRAM initialised
 - Move gd to RAM
 - Setup new stack in SDRAM
 - Turn on caching (cannot be done while gd and stack in cache!!!)
 + Calculate relocation address
 + Copy to RAM
 + Do relocation fixups
 + Clear BSS
 - Jump to RAM
 + init_r loop
 + main_loop()

+ = common code
- = arch, SoC, or board specific

Note: Typically the boot sequence currently has 'Turn on caching' after jump
to RAM - I have patches in the works which change this for x86

So my vote would be #2 - Unify the ARM and x86 and bring any code that can
be inside the init loops in and then switch both over to a generic init
implementation. Other arches can be switched later

Regards,

Graeme