[U-Boot] [RFC PATCH 0/7] reboard: Introduce generic relocation feature

Wed Dec 7 04:36:02 CET 2011

Hi Simon,

On Wed, Dec 7, 2011 at 2:25 PM, Simon Glass <sjg at chromium.org> wrote:
> Hi Graeme,
>
> On Tue, Dec 6, 2011 at 6:56 PM, Graeme Russ <graeme.russ at gmail.com> wrote:
>> Hi Simon
>>
>> Sorry for the late review on this - I've only just now had a good
>> chance to look at it...
>>
>
> Thanks for the comments.
>
>> On Wed, Dec 7, 2011 at 12:56 PM, Simon Glass <sjg at chromium.org> wrote:
>>> Hi Tom,
>>>
>>> On Mon, Nov 28, 2011 at 3:45 PM, Tom Rini <tom.rini at gmail.com> wrote:
>>>> On Mon, Nov 21, 2011 at 4:57 PM, Simon Glass <sjg at chromium.org> wrote:
>>>>> This is the second patch series aiming to unify the various board.c files
>>>>> in each architecture into a single one. This series creates a libboard
>>>>> library and implements relocation in it. It then moves ARM over to use
>>>>> this framework, as an example.
>>>>>
>>>>> On ARM the relocation code is duplicated for each CPU yet it
>>>>> is the same. We can bring this up to the arch level. But since (I believe)
>>>>> Elf relocation is basically the same process for all archs, there is no
>>>>> reason not to bring it up to the generic level.
>>>>>
>>>>> This series establishes a new libboard library in the board/ subdir and
>>>>> puts some relocation code in it. Each architecture which uses this
>>>>> framework needs to provide a function called arch_elf_relocate_entry()
>>>>> which processes a single relocation entry. If there is concern about
>>>>> calling a function for all 2000-odd relocations then I can change this.
>>
>> Ugh, that overhead is a killer - The relocation can be done entirely in a
>> tight loop with a switch statement in your reloc_elf() function. But I
>> think reloc_elf() should be weak - x86 relocation can be done in a few
>> lines of assembler which is smaller and faster than a C implementation
>> will ever be
>
> I will benchmark it tomorrow. If we move the for loop into the
> architecture-specific code then it goes away. Just seemed to me that
> the less duplicated code we have the better. We could use an inline
> function but I'm not keen on that.

This may be one case were performance outweighs the desire to be generic.
The nice thing is we can provide a generic function which will work for all
arches (albiet a little big and slow) - This will help when switching an
arch over as there is, at least, a default implementation much like all the
string functions (off topic, but why aren't they weak instead of #defined)

>>>>> For ARM, a new arch/arm/lib/proc.S file is created, which holds generic
>>>>> ARM assembler code (things that cannot be written in C and are common
>>>>> functions used by all ARM CPUs). This helps reduce duplication. Interrupt
>>>>> handling code and perhaps even some startup code can move there later.
>>>>>
>>>>> It may be useful for other architectures to have a similar file.
>>>>>
>>>>> This series moves ARM over to use this framework. Overall this means that
>>>>> two new files are required 'early' in boot: board/reloc.c and
>>>>> arch/arm/lib/proc.S.  This is tricky mainly due to SPL. I believe that
>>>>> we may need to adjust link scripts to put these two files early in the
>>>>> link scripts also. But I am not sure about this and can't actually find
>>>>> a problem as yet. I would much prefer to solve this with a new section
>>>>> name like .text.early if we can.
>>>>>
>>>>> (I should really cc all arch maintainers but I think in that case I get
>>>>> an error from the list server. Not sure what the limit is.)
>>>>>
>>>>> Comments please...
>>
>> The whole sequence of the series seems a little bit 'random' to me - I
>> think patches 2 and 3 can be squahed and the dead code removal in patch 7
>> should be done in the same patch(es) that create the dead code.
>>
> Yes I can clean this up.
>
>> It looks like reloc_make_copy(), reloc_clear_bss() and reloc_elf() have
>> been poached from x86 - The copy and clear loops are quite inefficient. I
>> have patches at home which change these to use memcpy and memset
>
> Yes they have - in fact I have a series here which unifies board.c
> completely for ARM and x86. But I need to take this in baby steps as I
> said.
>
>>
>>>>
>>>> Seems like a good idea.  Got some size's before and after?  Maybe some
>>>> boot times too (as you mention 2000-odd function calls during
>>>> relocation) ?
>>>>
>>>> --
>>>> Tom
>>>
>>> Yes I have this info - will package it up tomorrow along with a new
>>> series to deal with review comments.
>>
>> As far as I understand it, the long-term plan is to unify the board init
>> sequence across all architectures. I can see three approaches:
>>
>>  1) Tweak each arch until they are all the same and then 'throw the switch'
>>  2) Pick one or two arches and 'throw the switch' on them and migrate the
>>    others later
>>  3) Pull out common bits and pieced and deal with the left-overs later
>>
>> It looks like you've opted for number 3. The problem I see in this approach
>> is that you are relying on the other arches to follow your lead which is, at
>> best, a 50/50 bet.
>
> Sort of. Actually I have gone for option 2. But rather than send a
> series which switches over ARM and then x86 (about 40 patches) I have
> decided to do it in steps. Relocation is the first step.

OK, sounds like a plan

>> Now I look at board_init_f() and board_init_r() in ARM and x86 and think
>> 'why on earth isn't most of than in the init loops?'. The boot sequence
>> for ARM and x86 should be reducible to (all arches are generally pretty
>> close to this, so it should not introduce too much pain when they decide
>> to switch)
>
> Yes my generic board.c consists of a couple of init loops. I can email
> the patch if you like but it is not ready for the mailing list.

Hmm, I don't think I have the time right now

>>  - Low level init (reset vector etc)
>>  - Setup temporary stack and gd storage (typically in CPU cache)
>>  + init_f loop <-- SDRAM initialised
>>  - Move gd to RAM
>>  - Setup new stack in SDRAM
>>  - Turn on caching (cannot be done while gd and stack in cache!!!)
>>  + Calculate relocation address
>>  + Copy to RAM
>>  + Do relocation fixups
>>  + Clear BSS
>>  - Jump to RAM
>>  + init_r loop
>>  + main_loop()
>>
>> + = common code
>> - = arch, SoC, or board specific
>>
>> Note: Typically the boot sequence currently has 'Turn on caching' after jump
>> to RAM - I have patches in the works which change this for x86
>>
>> So my vote would be #2 - Unify the ARM and x86 and bring any code that can
>> be inside the init loops in and then switch both over to a generic init
>> implementation. Other arches can be switched later
>
> Agreed, but I can't send a series that big, so I am doing it in stages.

Excellent - I don't have a huge amoun of spare time, but if you let me know
what you need, I can probably give x86 a little prod here and there to
align it with what you are doing. I already have a few patches, and Gabe has
been doing a bit as well in order to make the coreboot port integrate more
seamlessly.

Regards,

Graeme