[U-Boot] U-Boot proper(not SPL) relocate option

Simon Glass sjg at chromium.org
Mon Nov 27 17:13:09 UTC 2017


(Tom - any thoughts about a more expansive cc list on this?)

Hi Masahiro,

On 26 November 2017 at 07:16, Masahiro Yamada
<yamada.masahiro at socionext.com> wrote:
> 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg at chromium.org>:
>> Hi Philipp,
>>
>> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> <philipp.tomsich at theobroma-systems.com> wrote:
>>> Hi,
>>>
>>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg at chromium.org> wrote:
>>>>
>>>> +Tom, Masahiro, Philipp
>>>>
>>>> Hi,
>>>>
>>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd at denx.de> wrote:
>>>>> Dear Kever Yang,
>>>>>
>>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8 at rock-chips.com> you wrote:
>>>>>>
>>>>>> I can understand this feature, we always do dram_init_banks() first,
>>>>>> then we relocate to a 'known' area, so there will be no risk in accessing memory.
>>>>>> I believe there must be some historical reason for some kind of device,
>>>>>> the relocate feature is a wonderful idea for it.
>>>>>
>>>>> This is actually not so much a feature needed to support some
>>>>> specific device (in this case much simpler approaches would be
>>>>> possible), but to support a whole set of features.  Unfortunately
>>>>> these appear to get forgotten / ignored over time.
>>>>>
>>>>>>     many other SoCs should be similar.
>>>>>> - Without relocate we can save many steps, some of our customers really
>>>>>>     care much about the boot time duration.
>>>>>>     * no need to relocate everything
>>>>>>     * no need to copy all the code
>>>>>>     * no need to init the driver more than once
>>>>>
>>>>> Please have a look at the README, section "Memory Management".
>>>>> The relocation is not done to any _fixed_ address, but the address
>>>>> is actually computed at runtime, depending on a number of features
>>>>> enabled (at least this is how it used to be - apparently little of
>>>>> this is tested on a regular basis, so I would not be surprised if
>>>>> things are broken today).
>>>>>
>>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>>> even across a reset / warm boot.
>>>>>
>>>>> This was used for example for:
>>>>>
>>>>> - pRAM (Protected RAM) which could be used to store all kind of data
>>>>>  (for example, using a pramfs [Protected and Persistent RAM
>>>>>  Filesystem]) that could be kept across reboots of the OS.
>>>>>
>>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>>  to initialize the video memory just once (in U-Boot) and then
>>>>>  share it, maybe even across reboots.  In particular, this would allow
>>>>>  for a very early splash screen that gets passed (flicker free) to
>>>>>  Linux until some Linux GUI takes over (much more difficult today).
>>>>>
>>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>>  buffer mechanism, so you could share it between U-Boot and Linux.
>>>>>  This allows for example to
>>>>>  * read the Linux kernel panic messages after reset in U-Boot; this
>>>>>    is very useful when you bring up a new system and Linux crashes
>>>>>    before it can display the log buffer on the console
>>>>>  * pass U-Boot POST results on to Linux, so the application code
>>>>>    can read and process these
>>>>>  * process the system log of the previous run (especially after a
>>>>>    panic) in Linux after it rebooted.
>>>>>
>>>>> etc.
>>>>>
>>>>> There are a number of such features which require reserving room at
>>>>> the top of RAM, the size of which is calculated at runtime, often
>>>>> depending on user settable environment data.
>>>>>
>>>>> All this cannot be done without relocation to a (dynamically
>>>>> computed) target address.
>>>>>
>>>>>
>>>>> Yes, the code could be simpler and faster without that - but then,
>>>>> you cut off a number of features.
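
To make the "computed at runtime" point above concrete, here is a
minimal sketch in C of carving regions off the top of RAM and taking
whatever is left as the relocation target. This is only an illustration
under stated assumptions: struct reserve_plan, carve() and
plan_top_of_ram() are hypothetical names, the order of the regions and
the 4 KiB / 64 KiB alignments are arbitrary choices for the example, and
none of it is taken from the actual board_init_f() sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result layout for this sketch; not a U-Boot structure. */
struct reserve_plan {
	uint64_t pram_base;	/* protected RAM */
	uint64_t logbuf_base;	/* shared log buffer */
	uint64_t fb_base;	/* shared frame buffer */
	uint64_t reloc_addr;	/* where the image would be relocated to */
};

/* Take 'size' bytes below *top, aligned down to 'align' (a power of two). */
static int carve(uint64_t *top, uint64_t floor, uint64_t size,
		 uint64_t align, uint64_t *out)
{
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);

	if (size > *top - floor)
		return -1;		/* does not fit in what is left */

	addr = (*top - size) & ~(align - 1);
	if (addr < floor)
		return -1;		/* alignment pushed us below RAM base */

	*top = addr;
	*out = addr;
	return 0;
}

/*
 * Walk down from the top of RAM. The sizes are plain parameters here;
 * the point is that they are only known at runtime, so the relocation
 * address cannot be a build-time constant.
 * Assumes ram_base + ram_size does not overflow 64 bits.
 */
int plan_top_of_ram(uint64_t ram_base, uint64_t ram_size,
		    uint64_t pram_size, uint64_t logbuf_size,
		    uint64_t fb_size, uint64_t image_size,
		    struct reserve_plan *plan)
{
	uint64_t top = ram_base + ram_size;

	if (carve(&top, ram_base, pram_size, 4096, &plan->pram_base) ||
	    carve(&top, ram_base, logbuf_size, 4096, &plan->logbuf_base) ||
	    carve(&top, ram_base, fb_size, 4096, &plan->fb_base) ||
	    carve(&top, ram_base, image_size, 65536, &plan->reloc_addr))
		return -1;

	return 0;
}
```

With 256 MiB of RAM at 0x80000000, a 1 MiB pRAM, a 16 KiB log buffer, a
3 MiB frame buffer and a 512 KiB image, this puts the image at
0x8FB70000. Change any of the sizes and the image address moves with
them.
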
>>>>
>>>> I would be interested in seeing benchmarks showing the cost of
>>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>>> and it was some years ago. The time was pretty small provided the
>>>> cache was on for the memory copies associated with relocation itself.
>>>> Something like 10-20ms but I don't have the numbers handy.
>>>>
>>>> I think it is useful to be able to allocate memory in board_init_f()
>>>> for use by U-Boot for things like the display and the malloc() region.
>>>>
>>>> Options we might consider:
>>>>
>>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>>> used when U-Boot runs as an EFI app
>>>>
>>>> 2. Rather than throwing away the old malloc() region, keep it around
>>>> so existing allocated blocks work. Then the new malloc() region would be
>>>> used for future allocations. We could perhaps ignore free() calls in
>>>> that region
>>>>
>>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>>> I think. E.g. we could init serial and timer before relocation and
>>>> leave them inited after relocation. We could just init the
>>>> 'additional' devices not done before relocation.
>>>>
>>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>>> suspect it would just be a pain though, since SPL might use memory
>>>> that U-Boot wants.
>>>>
>>>> 3. We could turn on the cache earlier. This removes most of the
>>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>>> redone in U-Boot which has more memory available. If SPL is not used,
>>>> we could turn on the cache before relocation.
>>>
>>> Both turning on the cache and initialising the clocking could be of benefit
>>> to boot-time.
>>>
>>> However, the biggest possible gain will come from utilising Falcon mode
>>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>>> assumes that the drivers involved are fully optimised, so loading up the
>>> OS image does not take longer than necessary.
>>
>> I'd like to see numbers on that. From my experience, loading and
>> running U-Boot does not take very long...
>>
>>>
>>>> 4. Rather than reserving memory in board_init_f() we could have it
>>>> call malloc() from the expanded region. We could then perhaps
>>>> move this reserve/allocate code into particular drivers or
>>>> subsystems, and drop a good chunk of the init sequence. We would need
>>>> to have a larger malloc() region than is currently the case.
>>>>
>>>> There are still some arch-specific bits in board_init_f() which make
>>>> these sorts of changes a bit tricky to support generically. IMO it
>>>> would be best to move to 'generic relocation' written in C, where all
>>>> archs work basically the same way, before attempting any of the above.
>>>>
>>>> Still, I can see some benefits and even some simplifications.
>>>>
>>>> Regards,
>>>> Simon
>>>
>
>
>
> This discussion should have happened.
> U-Boot boot sequence is crazily inefficient.
>
>
>
> When we talk about "relocation", two things are happening.
>
>  [1] U-Boot proper copies itself to the very end of DRAM
>  [2] Fix-up the global symbols
>
> In my opinion, only [2] is useful.
>
>
> SPL initializes the DRAM, so it knows the base and size of DRAM.
> SPL should be able to load the U-Boot proper to the final destination.
> So, [1] is unnecessary.
>
>
> [2] is necessary because SPL may load the U-Boot proper
> to a different place than CONFIG_SYS_TEXT_BASE.
> This feature is useful for platforms
> whose DRAM base/size is only known at run-time.
> (Of course, it should be user-configurable by CONFIG_RELOCATE
> or something.)
>
> Moreover, board_init_f() is unneeded -
> everything in board_init_f() is already done by SPL.
> Multiple-time DM initialization is really inefficient and ugly.
>
>
> The following is how the ideal boot loader would work.
>
>
> Requirement for U-Boot proper:
> U-Boot never changes the location by itself.
> So, SPL or a vendor loader must load U-Boot proper
> to the final destination directly.
> (You can load it to the very end of DRAM if you like,
> but the actual place does not matter here.)
>
>
> Boot sequence of U-Boot proper:
> If CONFIG_RELOCATE (or something) is enabled,
> it fixes the global symbols at the very beginning
> of the boot.
> (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>
> That's it.  Proceed to the rest of init code.
> (= board_init_r)
> board_init_f() is unnecessary.
>
> This should work for recent platforms.

Yes that sounds reasonable to me.

We could do the symbol fixup/relocation in SPL after loading U-Boot,
although that would probably push us to using ELF format for U-Boot
which is a bit limited.

Still I think the biggest performance improvement comes from turning
on the cache in SPL. So the above is a simplification, not really a
speed-up.
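
For reference, the fix-up step ([2] above) on its own is small. Here is
a minimal sketch in C, assuming a 32-bit image and a table of
'relative' style entries, where each entry names a word that holds a
link-time address and needs the load/link delta added to it. It is not
the code from any arch's relocate_code: struct reloc_entry,
RELOC_RELATIVE and fixup_relative() are hypothetical names made up for
this example, and the type code is not a real ELF relocation number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation record; not an actual ELF or U-Boot layout. */
struct reloc_entry {
	uint32_t offset;	/* link-time address of the word to patch */
	uint32_t type;		/* only RELOC_RELATIVE is handled here */
};

#define RELOC_RELATIVE 1u	/* made-up type code for this sketch */

/*
 * Patch an image that was linked to run at link_base but will run at
 * load_base. 'image' points at the bytes of the image itself.
 * Returns the number of words patched, or -1 if an entry points
 * outside the image.
 */
int fixup_relative(uint8_t *image, size_t image_size,
		   uint32_t link_base, uint32_t load_base,
		   const struct reloc_entry *tab, size_t count)
{
	/* Unsigned wrap-around is intended: works in either direction. */
	uint32_t delta = load_base - link_base;
	int patched = 0;

	assert(image != NULL && tab != NULL);

	for (size_t i = 0; i < count; i++) {
		uint32_t off, word;

		if (tab[i].type != RELOC_RELATIVE)
			continue;	/* other types are skipped in this sketch */

		if (tab[i].offset < link_base)
			return -1;
		off = tab[i].offset - link_base;
		if (off > image_size || image_size - off < sizeof(word))
			return -1;

		/* memcpy avoids unaligned access and aliasing problems */
		memcpy(&word, image + off, sizeof(word));
		word += delta;
		memcpy(image + off, &word, sizeof(word));
		patched++;
	}

	return patched;
}
```

Nothing here depends on who runs it, so in principle the same loop could
run in SPL right after loading the image, or at the very start of
U-Boot proper. The loader does need access to the relocation table,
which is where the question of image format comes in.
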

>
>
>
> We should think about old platforms that boot from a NOR flash or something.
> There are two solutions:
>  - execute-in-place: run the code in the flash directly
>  - use SPL (common/spl/spl-nor.c) if you want to run
>    it from RAM

This seems like a big regression in functionality. For example for x86
32-bit we currently don't have an SPL (we do for 64-bit). So I think
this means that everything would be forced to have an SPL?

I am wondering who else we should cc on this discussion?

Regards,
Simon

