[U-Boot] U-Boot proper (not SPL) relocate option

Sun Nov 26 13:44:34 UTC 2017

> On 26 Nov 2017, at 12:38, Simon Glass <sjg at chromium.org> wrote:
> 
> Hi Philipp,
> 
> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
> <philipp.tomsich at theobroma-systems.com> wrote:
>> Hi,
>> 
>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg at chromium.org> wrote:
>>> 
>>> +Tom, Masahiro, Philipp
>>> 
>>> Hi,
>>> 
>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd at denx.de> wrote:
>>>> Dear Kever Yang,
>>>> 
>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344fd8 at rock-chips.com> you wrote:
>>>>> 
>>>>> I can understand this feature, we always do dram_init_banks() first,
>>>>> then we relocate to a 'known' area, so there will be no risk in accessing memory.
>>>>> I believe there must be some historical reason for some kind of device,
>>>>> the relocate feature is a wonderful idea for it.
>>>> 
>>>> This is actually not so much a feature needed to support some
>>>> specific device (in this case much simpler approaches would be
>>>> possible), but to support a whole set of features.  Unfortunately
>>>> these appear to get forgotten / ignored over time.
>>>> 
>>>>>    many other SoCs should be similar.
>>>>> - Without relocation we can save many steps; some of our customers really
>>>>>    care a lot about boot time.
>>>>>    * no need to relocate everything
>>>>>    * no need to copy all the code
>>>>>    * no need to init the driver more than once
>>>> 
>>>> Please have a look at the README, section "Memory Management".
>>>> The relocation is not done to any _fixed_ address, but the address
>>>> is actually computed at runtime, depending on a number of features
>>>> enabled (at least this is how it used to be - apparently little of
>>>> this is tested on a regular basis, so I would not be surprised if
>>>> things are broken today).
>>>> 
>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>> even across a reset / warm boot.
>>>> 
>>>> This was used for example for:
>>>> 
>>>> - pRAM (Protected RAM) which could be used to store all kinds of data
>>>> (for example, using a pramfs [Protected and Persistent RAM
>>>> Filesystem]) that could be kept across reboots of the OS.
>>>> 
>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>> to initialize the video memory just once (in U-Boot) and then
>>>> share it, maybe even across reboots.  In particular, this would allow
>>>> for a very early splash screen that gets passed (flicker free) to
>>>> Linux until some Linux GUI takes over (much more difficult today).
>>>> 
>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>> buffer mechanism, so you could share it between U-Boot and Linux.
>>>> This makes it possible, for example, to
>>>> * read the Linux kernel panic messages after reset in U-Boot; this
>>>>   is very useful when you bring up a new system and Linux crashes
>>>>   before it can display the log buffer on the console
>>>> * pass U-Boot POST results on to Linux, so the application code
>>>>   can read and process these
>>>> * process the system log of the previous run (especially after a
>>>>   panic) in Linux after it rebooted.
>>>> 
>>>> etc.
>>>> 
>>>> There are a number of such features which require reserving room at
>>>> the top of RAM, the size of which is calculated at runtime, often
>>>> depending on user settable environment data.
>>>> 
>>>> All this cannot be done without relocation to a (dynamically
>>>> computed) target address.
>>>> 
>>>> 
>>>> Yes, the code could be simpler and faster without that - but then,
>>>> you cut off a number of features.
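
To make the "computed at runtime" point concrete, here is a minimal
sketch in C of carving reservations off the top of RAM and deriving the
relocation target from what is left. This is not U-Boot's actual code:
struct reserve_sizes, align_down() and compute_reloc_addr() are
hypothetical names, the 4 KiB alignment and the order of the
reservations are assumptions, and in practice the sizes would come from
configuration and user-settable environment data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reservation sizes, all in bytes. In a real system these
 * would be determined at run time (config options, environment). */
struct reserve_sizes {
	uint64_t pram;     /* protected RAM kept across resets */
	uint64_t logbuf;   /* log buffer shared with the OS */
	uint64_t framebuf; /* shared frame buffer / video memory */
	uint64_t uboot;    /* U-Boot code + data + bss */
};

/* Round addr down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/* Walk down from the top of RAM, reserving each area in turn.
 * The address left after the last reservation is where the
 * relocated image would start. */
static uint64_t compute_reloc_addr(uint64_t ram_top,
				   const struct reserve_sizes *r)
{
	const uint64_t page = 4096; /* assumed alignment */
	uint64_t addr = ram_top;

	addr = align_down(addr - r->pram, page);
	addr = align_down(addr - r->logbuf, page);
	addr = align_down(addr - r->framebuf, page);
	addr = align_down(addr - r->uboot, page);
	return addr;
}
```

With RAM ending at 0xC0000000 and made-up sizes of 1 MiB pRAM, 16 KiB
log buffer, 3 MiB frame buffer and 512 KiB for U-Boot, this yields
0xBFB7C000. Change any one size and the target address moves, which is
why it cannot be fixed at link time.
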
>>> 
>>> I would be interested in seeing benchmarks showing the cost of
>>> relocation in terms of boot time. Last time I did this was on Exynos 5
>>> and it was some years ago. The time was pretty small provided the
>>> cache was on for the memory copies associated with relocation itself.
>>> Something like 10-20ms but I don't have the numbers handy.
>>> 
>>> I think it is useful to be able to allocate memory in board_init_f()
>>> for use by U-Boot for things like the display and the malloc() region.
>>> 
>>> Options we might consider:
>>> 
>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>> flag used when U-Boot runs as an EFI app.
>>> 
>>> 2. Rather than throwing away the old malloc() region, keep it around
>>> so existing allocated blocks work. Then the new malloc() region would be
>>> used for future allocations. We could perhaps ignore free() calls in
>>> that region.
>>> 
>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>> I think. E.g. we could init serial and timer before relocation and
>>> leave them inited after relocation. We could just init the
>>> 'additional' devices not done before relocation.
>>> 
>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>> suspect it would just be a pain though, since SPL might use memory
>>> that U-Boot wants.
>>> 
>>> 3. We could turn on the cache earlier. This removes most of the
>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>> redone in U-Boot which has more memory available. If SPL is not used,
>>> we could turn on the cache before relocation.
>> 
>> Both turning on the cache and initialising the clocking could be of benefit
>> to boot-time.
>> 
>> However, the biggest possible gain will come from utilising Falcon mode
>> to skip the full U-Boot stage and directly boot into the OS from SPL.  This
>> assumes that the drivers involved are fully optimised, so loading up the
>> OS image does not take longer than necessary.
> 
> I'd like to see numbers on that. From my experience, loading and
> running U-Boot does not take very long…

I was referring to the OS images, not to U-Boot itself.
While U-Boot will be less than 512KB, a typical kernel image will be a handful
of MB… plus there may be a few MB of ramdisk to accompany it.
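
As a rough back-of-the-envelope sketch of why the image size matters:
load time scales with size divided by the effective throughput of the
boot medium. The helper below is hypothetical (load_time_us() is not a
U-Boot function), and any throughput figure plugged into it is an
assumption to be replaced by a measurement on the actual board.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated transfer time in microseconds for `bytes` of image data at
 * an effective throughput of `bytes_per_sec`. Integer division, so the
 * result is truncated. Fine for image sizes in the MB range; very large
 * sizes would overflow the intermediate product. */
static uint64_t load_time_us(uint64_t bytes, uint64_t bytes_per_sec)
{
	assert(bytes_per_sec != 0);
	return bytes * 1000000ULL / bytes_per_sec;
}
```

For example, at an assumed 20 MB/s, load_time_us(512 * 1024, 20000000)
gives about 26 ms for a 512KB U-Boot, while an 8 MiB kernel at the same
rate comes out at about 419 ms, before counting any ramdisk.
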

>> 
>>> 4. Rather than reserving memory in board_init_f() we could have it
>>> call malloc() from the expanded region. We could then perhaps
>>> move this reserve/allocate code into particular drivers or
>>> subsystems, and drop a good chunk of the init sequence. We would need
>>> to have a larger malloc() region than is currently the case.
>>> 
>>> There are still some arch-specific bits in board_init_f() which make
>>> these sorts of changes a bit tricky to support generically. IMO it
>>> would be best to move to 'generic relocation' written in C, where all
>>> archs work basically the same way, before attempting any of the above.
>>> 
>>> Still, I can see some benefits and even some simplifications.
>>> 
>>> Regards,
>>> Simon
>> 
> 
> Regards,
> Simon