[U-Boot] [PATCH 0/57] RFC: Move arch-specific global data into its own structure

Thu Dec 6 01:33:34 CET 2012

Hi,

On Tue, Dec 4, 2012 at 5:14 PM, Graeme Russ <graeme.russ at gmail.com> wrote:
> Hi Wolfgang,
>
> On Wed, Dec 5, 2012 at 6:25 AM, Wolfgang Denk <wd at denx.de> wrote:
>> Dear Simon Glass,
>>
>> In message <CAPnjgZ2KVHV6JCvOjiQBrXFCfHMeWfEfj9bLHFw_Qyf5_7dj8Q at mail.gmail.com> you wrote:
>>>
>>> > To be honest, I think gd should only be a temporary structure used to
>>> > carry specific data through the initialisation process up to the point
>>> > BSS becomes available. With the 'early malloc' patches in the
>>> > pipeline, it might even be possible to malloc the gd structure early
>>> > and then when BSS is available, copy the data into the final global
>>> > data structure in BSS. I think that would be complicated by functions
>>> > that need to use gd both before and after BSS becomes available.
>>>
>>> I mostly agree, but that sounds like an exercise in removing fields
>>> from the gd one by one in the source code. The bit I am not sure of is
>>> whether it is useful for gd to hang around post relocation to provide
>>> access to the data that was decided on early in boot (after all, the
>>> position in memory of gd changes post relocation, so why maintain two
>>> structures for the same info?).
>>
>> Sure.  If you look back how this developed, then initially there was
>> only struct bd_info.  Then it turned out that it costs too much of
>> code size (and performance, actually) to pass around the same struct
>> as parameter to about each and every function, so I invented GD - with
>> the intention to drop it as soon as writable global data becomes
>> available, i.e. after relocation.  I even think the first versions
>> worked that way.  Only later was that code optimized because it
>> seemed easier to keep this struct and be able to use the same code
>> before and after relocation.  And open Pandora's box was...
>
> Yes, the old 'cost versus complexity' problem. Seriously, take a look at
> arch/x86/lib/board.c, it's nice and clean and gives a good view of how we
> can move forward.
>
> For starters, the functions listed in init_sequence_f and init_sequence_f_r
> never need to be copied into RAM (there are functions they call that may
> need to be though). Like the Linux kernel, these can be moved into a
> dedicated linker section and not copied (and their relocation entries can
> be skipped as well). For x86, there are not a lot of functions in these
> two lists. Maybe these can have 'gd' passed to them
>
> init_sequence_r is the big list so passing 'gd' to each of these will
> result in massive code bloat. But by this stage, we have BSS, so global
> data is writable and there is no need to pass gd.
>
> BSS is actually available during the processing of init_sequence_f_r,
> so in theory it would be possible to copy data from gd (used during
> init_sequence_f) into BSS during the processing of init_sequence_f_r
>
> All that would be left is dealing with the (handful?) of functions that
> are called from both init_sequence_f and init_sequence_r (I doubt any
> common functions will be called during init_sequence_f_r). One option
> may be to pass a pointer to gd to these functions. If it is NULL, use
> the variable in BSS, otherwise use gd.

Sounds reasonable to me.
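
As a rough sketch of that last option (not taken from any patch in this
series), a function shared between init_sequence_f and init_sequence_r
could take a gd pointer and fall back to BSS when it is NULL. The
trimmed struct, the baudrate field and both function names below are
made up for illustration only:

```c
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-in for the real global data. */
struct global_data {
	unsigned long baudrate;
};

/* Hypothetical post-relocation copy of the value, living in BSS. */
static unsigned long bss_baudrate;

/*
 * Called once BSS is writable (for example somewhere in
 * init_sequence_f_r) to copy the early value out of gd.
 */
void save_gd_to_bss(const struct global_data *gd)
{
	bss_baudrate = gd->baudrate;
}

/*
 * Shared by pre- and post-relocation callers: a non-NULL gd means
 * "read from gd", NULL means "BSS is available, read from there".
 */
unsigned long get_baudrate(const struct global_data *gd)
{
	return gd ? gd->baudrate : bss_baudrate;
}
```

Callers in init_sequence_f would pass their gd pointer, while callers in
init_sequence_r would simply pass NULL.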

I modified buildman to summarise image sizes for each architecture.
Here are the code size results:

       x86: (3 boards)     text -26.7
   sandbox: (1 boards)     text +64.0   bss +96.0
      m68k: (50 boards)    text +1.5
   powerpc: (621 boards)   text +2.4    data +0.0
        sh: (20 boards)    text +14.4
microblaze: (1 boards)     text -24.0   bss -8.0
       arm: (283 boards)   spl/u-boot-spl:text -0.2   text -21.5   spl/u-boot-spl:data +4.8   bss +0.5
     nds32: (3 boards)     text -8.0

The numbers indicate the average number of bytes increase(+) or
decrease(-) with this series applied, for each element of the image
size. So for example, powerpc text increases by an average of 2.4
bytes, ARM text reduces by an average of 21.5 bytes. ARM spl data
increases by an average of 4.8 bytes.
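
For anyone wanting to check the arithmetic, here is a minimal sketch of
how such a per-architecture average can be computed. This is not
buildman's actual code; the function name and the idea of passing two
arrays of per-board section sizes are assumptions made for the example:

```c
#include <stddef.h>

/*
 * Mean change in bytes across n boards: (after[i] - before[i]) summed
 * over all boards, divided by n. A positive result means the section
 * grew on average, a negative one means it shrank.
 */
double average_delta(const long *before, const long *after, size_t n)
{
	long total = 0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		total += after[i] - before[i];
	return (double)total / (double)n;
}
```

Because this is a plain mean, a small average can still hide a large
change on an individual board.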

To me the differences look minor.

Regards,
Simon

>
> Regards,
>
> Graeme