[U-Boot] U-Boot proper(not SPL) relocate option

Peter Robinson pbrobinson at gmail.com
Tue Nov 28 11:30:19 UTC 2017


>> (Tom - any thoughts about a more expansive cc list on this?)
>>
>> Hi Masahiro,
>>
>> On 26 November 2017 at 07:16, Masahiro Yamada
>> <yamada.masahiro at socionext.com> wrote:
>> > 2017-11-26 20:38 GMT+09:00 Simon Glass <sjg at chromium.org>:
>> >> Hi Philipp,
>> >>
>> >> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> >> <philipp.tomsich at theobroma-systems.com> wrote:
>> >>> Hi,
>> >>>
>> >>>> On 25 Nov 2017, at 23:34, Simon Glass <sjg at chromium.org> wrote:
>> >>>>
>> >>>> +Tom, Masahiro, Philipp
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> On 22 November 2017 at 03:27, Wolfgang Denk <wd at denx.de> wrote:
>> >>>>> Dear Kever Yang,
>> >>>>>
>> >>>>> In message
>> >>>>> <fd0bb500-80c4-f317-cc18-f7aaf1344fd8 at rock-chips.com> you
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> I can understand this feature: we always do dram_init_banks()
>> >>>>>> first, then relocate to a 'known' area, so there is no risk in
>> >>>>>> accessing memory. I believe there must be some historical
>> >>>>>> reason tied to some kind of device, and relocation is a
>> >>>>>> wonderful idea for that.
>> >>>>>
>> >>>>> This is actually not so much a feature needed to support some
>> >>>>> specific device (in this case much simpler approaches would be
>> >>>>> possible), but to support a whole set of features.
>> >>>>> Unfortunately these appear to get forgotten / ignored over time.
>> >>>>>
>> >>>>>>     many other SoCs should be similar.
>> >>>>>> - Without relocation we can save many steps; some of our
>> >>>>>> customers really care about boot time.
>> >>>>>>     * no need to relocate everything
>> >>>>>>     * no need to copy all the code
>> >>>>>>     * no need to init the driver more than once
>> >>>>>
>> >>>>> Please have a look at the README, section "Memory Management".
>> >>>>> The relocation is not done to any _fixed_ address, but the
>> >>>>> address is actually computed at runtime, depending on a number
>> >>>>> of features enabled (at least this is how it used to be -
>> >>>>> apparently little of this is tested on a regular basis, so I
>> >>>>> would not be surprised if things are broken today).
>> >>>>>
>> >>>>> The basic idea was to reserve areas of memory at the top of RAM,
>> >>>>> that would not be initialized / modified by U-Boot and Linux,
>> >>>>> not even across a reset / warm boot.
>> >>>>>
>> >>>>> This was used for example for:
>> >>>>>
>> >>>>> - pRAM (Protected RAM) which could be used to store all kinds of
>> >>>>> data (for example, using a pramfs [Protected and Persistent RAM
>> >>>>>  Filesystem]) that could be kept across reboots of the OS.
>> >>>>>
>> >>>>> - shared frame buffer / video memory. U-Boot and Linux would be
>> >>>>> able to initialize the video memory just once (in U-Boot) and
>> >>>>> then share it, maybe even across reboots.  In particular, this
>> >>>>> would allow for a very early splash screen that gets passed
>> >>>>> (flicker free) to Linux until some Linux GUI takes over (much
>> >>>>> more difficult today).
>> >>>>>
>> >>>>> - shared log buffer: U-Boot and Linux used to use the same
>> >>>>> syslog buffer mechanism, so you could share it between U-Boot
>> >>>>> and Linux. This makes it possible, for example, to
>> >>>>>  * read the Linux kernel panic messages after reset in U-Boot;
>> >>>>> this is very useful when you bring up a new system and Linux
>> >>>>> crashes before it can display the log buffer on the console
>> >>>>>  * pass U-Boot POST results on to Linux, so the application code
>> >>>>>    can read and process these
>> >>>>>  * process the system log of the previous run (especially after
>> >>>>> a panic) in Linux after it rebooted.
>> >>>>>
>> >>>>> etc.
>> >>>>>
>> >>>>> There are a number of such features which require reserving
>> >>>>> room at the top of RAM, the size of which is calculated at
>> >>>>> runtime, often depending on user-settable environment data.
>> >>>>>
>> >>>>> All this cannot be done without relocation to a (dynamically
>> >>>>> computed) target address.
>> >>>>>
>> >>>>>
>> >>>>> Yes, the code could be simpler and faster without that - but
>> >>>>> then, you cut off a number of features.
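The runtime address computation described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual U-Boot code: struct top_reserve, compute_reloc_addr() and the 4 KiB
alignment are hypothetical names and choices made for the example. The
point it shows is that the target address depends on sizes that are only
known at runtime (for instance from environment data), so it cannot be a
fixed link-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the areas reserved at the top of RAM. */
struct top_reserve {
	uint32_t pram_kib;    /* protected RAM size in KiB, e.g. from the environment */
	uint32_t logbuf_size; /* shared log buffer size in bytes */
	uint32_t fb_size;     /* frame buffer size in bytes */
};

/* Round addr down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Walk down from the top of RAM, carving out each reserved area, and
 * return the address the image would be relocated to.
 */
static uint64_t compute_reloc_addr(uint64_t ram_top,
				   const struct top_reserve *res,
				   uint64_t image_size)
{
	uint64_t addr = ram_top;

	addr -= (uint64_t)res->pram_kib * 1024;        /* pRAM stays at the very top */
	addr -= res->logbuf_size;                      /* then the shared log buffer */
	addr = align_down(addr - res->fb_size, 4096);  /* then the frame buffer */
	addr = align_down(addr - image_size, 4096);    /* finally the image itself */
	return addr;
}
```

With RAM ending at 0x90000000, changing the pRAM size from 1024 KiB to
2048 KiB moves the computed address down by 1 MiB, which is why the
image has to be position-adjustable.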
>> >>>>
>> >>>> I would be interested in seeing benchmarks showing the cost of
>> >>>> relocation in terms of boot time. Last time I did this was on
>> >>>> Exynos 5 and it was some years ago. The time was pretty small
>> >>>> provided the cache was on for the memory copies associated with
>> >>>> relocation itself. Something like 10-20ms but I don't have the
>> >>>> numbers handy.
>> >>>>
>> >>>> I think it is useful to be able to allocate memory in
>> >>>> board_init_f() for use by U-Boot for things like the display and
>> >>>> the malloc() region.
>> >>>>
>> >>>> Options we might consider:
>> >>>>
>> >>>> 1. Don't relocate the code and data. Thus we could avoid the
>> >>>> copy and relocation cost. This is already supported with the
>> >>>> GD_FLG_SKIP_RELOC used when U-Boot runs as an EFI app
>> >>>>
>> >>>> 2. Rather than throwing away the old malloc() region, keep it
>> >>>> around so existing allocated blocks work. Then new malloc()
>> >>>> region would be used for future allocations. We could perhaps
>> >>>> ignore free() calls in that region
>> >>>>
>> >>>> 2a. This would allow us to avoid re-init of driver model in most
>> >>>> cases I think. E.g. we could init serial and timer before
>> >>>> relocation and leave them inited after relocation. We could just
>> >>>> init the 'additional' devices not done before relocation.
>> >>>>
>> >>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>> >>>> suspect it would just be a pain though, since SPL might use
>> >>>> memory that U-Boot wants.
>> >>>>
>> >>>> 3. We could turn on the cache earlier. This removes most of the
>> >>>> boot-time penalty. Ideally this should be turned on in SPL and
>> >>>> perhaps redone in U-Boot which has more memory available. If SPL
>> >>>> is not used, we could turn on the cache before relocation.
>> >>>
>> >>> Both turning on the cache and initialising the clocking could be
>> >>> of benefit to boot-time.
>> >>>
>> >>> However, the biggest possible gain will come from utilising
>> >>> Falcon mode to skip the full U-Boot stage and directly boot into
>> >>> the OS from SPL.  This assumes that the drivers involved are
>> >>> fully optimised, so loading up the OS image does not take longer
>> >>> than necessary.
>> >>
>> >> I'd like to see numbers on that. From my experience, loading and
>> >> running U-Boot does not take very long...
>> >>
>> >>>
>> >>>> 4. Rather than reserving memory in board_init_f() we could
>> >>>> have it call malloc() from the expanded region. We could then
>> >>>> perhaps move this reserve/allocate code into particular
>> >>>> drivers or subsystems, and drop a good chunk of the init
>> >>>> sequence. We would need to have a larger malloc() region than is
>> >>>> currently the case.
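Option 2 above (keep the pre-relocation malloc() region alive and ignore
free() calls into it) could look something like the sketch below. It is a
minimal illustration only: struct old_region, in_old_region() and
free_after_reloc() are hypothetical names, and the standard C free()
stands in for whatever allocator is used after relocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of the malloc() region used before relocation. */
struct old_region {
	uintptr_t start;
	size_t size;
};

/* True if ptr points into the preserved pre-relocation region. */
static bool in_old_region(const struct old_region *old, const void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;

	assert(old != NULL);
	return p >= old->start && p - old->start < old->size;
}

/*
 * free() wrapper for use after relocation. Blocks from the old region
 * are left alone so that existing allocations keep working; everything
 * else goes to the normal allocator. Returns true if the block was
 * actually released.
 */
static bool free_after_reloc(const struct old_region *old, void *ptr)
{
	if (ptr == NULL || in_old_region(old, ptr))
		return false;
	free(ptr);
	return true;
}
```

The cost of this approach is that memory in the old region is never
reclaimed, which is probably acceptable when that region is small.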
>> >>>>
>> >>>> There are still some arch-specific bits in board_init_f() which
>> >>>> make these sorts of changes a bit tricky to support generically.
>> >>>> IMO it would be best to move to 'generic relocation' written in
>> >>>> C, where all archs work basically the same way, before
>> >>>> attempting any of the above.
>> >>>>
>> >>>> Still, I can see some benefits and even some simplifications.
>> >>>>
>> >>>> Regards,
>> >>>> Simon
>> >>>
>> >
>> >
>> >
>> > This discussion should have happened.
>> > U-Boot boot sequence is crazily inefficient.
>> >
>> >
>> >
>> > When we talk about "relocation", two things are happening.
>> >
>> >  [1] U-Boot proper copies itself to the very end of DRAM
>> >  [2] Fix-up the global symbols
>> >
>> > In my opinion, only [2] is useful.
>> >
>> >
>> > SPL initializes the DRAM, so it knows the base and size of DRAM.
>> > SPL should be able to load the U-Boot proper to the final
>> > destination. So, [1] is unnecessary.
>> >
>> >
>> > [2] is necessary because SPL may load the U-Boot proper
>> > to a different place than CONFIG_SYS_TEXT_BASE.
>> > This feature is useful for platforms
>> > whose DRAM base/size is only known at run-time.
>> > (Of course, it should be user-configurable by CONFIG_RELOCATE
>> > or something.)
>> >
>> > Moreover, board_init_f() is unneeded -
>> > everything in board_init_f() is already done by SPL.
>> > Multiple-time DM initialization is really inefficient and ugly.
>> >
>> >
>> > The following is how the ideal boot loader would work.
>> >
>> >
>> > Requirement for U-Boot proper:
>> > U-Boot never changes the location by itself.
>> > So, SPL or a vendor loader must load U-Boot proper
>> > to the final destination directly.
>> > (You can load it to the very end of DRAM if you like,
>> > but the actual place does not matter here.)
>> >
>> >
>> > Boot sequence of U-Boot proper:
>> > If CONFIG_RELOCATE (or something) is enabled,
>> > it fixes the global symbols at the very beginning
>> > of the boot.
>> > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>> >
>> > That's it.  Proceed to the rest of init code.
>> > (= board_init_r)
>> > board_init_f() is unnecessary.
>> >
>> > This should work for recent platforms.
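The "fix the global symbols" step without a copy can be sketched as below.
This is a simplified model, assuming a table that lists where absolute
addresses are stored in the image; struct reloc_entry and fixup_image()
are hypothetical and do not correspond to any particular architecture's
relocation format. Each recorded word gets the difference between the
actual load address and the link address (CONFIG_SYS_TEXT_BASE) added
to it, in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation record: where an absolute address is stored. */
struct reloc_entry {
	uint32_t offset; /* byte offset from the start of the image */
};

/*
 * Add (load_addr - link_addr) to every absolute address listed in the
 * table. Returns the number of entries patched, or -1 if an entry
 * falls outside the image.
 */
static int fixup_image(uint8_t *image, size_t image_size,
		       const struct reloc_entry *tab, size_t count,
		       uint32_t link_addr, uint32_t load_addr)
{
	uint32_t delta = load_addr - link_addr; /* unsigned wrap handles both directions */
	size_t i;

	assert(image != NULL);
	for (i = 0; i < count; i++) {
		uint32_t val;

		if (tab[i].offset > image_size ||
		    image_size - tab[i].offset < sizeof(val))
			return -1;
		/* memcpy avoids unaligned accesses on the stored word */
		memcpy(&val, image + tab[i].offset, sizeof(val));
		val += delta;
		memcpy(image + tab[i].offset, &val, sizeof(val));
	}
	return (int)count;
}
```

If the loader already placed the image at its link address, delta is
zero and the pass leaves the image unchanged.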
>>
>> Yes that sounds reasonable to me.
>>
>> We could do the symbol fixup/relocation in SPL after loading U-Boot,
>> although that would probably push us to using ELF format for U-Boot
>> which is a bit limited.
>>
>> Still I think the biggest performance improvement comes from turning
>> on the cache in SPL. So the above is a simplification, not really a
>> speed-up.
>>
>> >
>> >
>> >
>> > We should think about old platforms that boot from a NOR flash or
>> > something. There are two solutions:
>> >  - execute-in-place: run the code in the flash directly
>> >  - use SPL (common/spl/spl-nor.c) if you want to run
>> >    it from RAM
>>
>> This seems like a big regression in functionality. For example for x86
>> 32-bit we currently don't have an SPL (we do for 64-bit). So I think
>> this means that everything would be forced to have an SPL?
>>
>> I am wondering who else we should cc on this discussion?
>
> Not all boards use SPL. There are some targets which use an FBL (SPL
> counterpart) from the vendor and only U-Boot proper. A good example is
> the Odroid XU3.

Some aarch64 boards, like the Jetson TX series and Dragonboard, chain-load
U-Boot from some other loader. I don't believe things like the QEMU
targets use SPL either.

