[U-Boot] early_malloc outline

Wed Aug 1 04:57:35 CEST 2012

Hi Tomas,

On 08/01/2012 01:30 AM, Tomas Hlavacek wrote:
> Hello all!
> 
> In u-boot-dm mailinglist we had a discussion about implementation of
> early_malloc (not only) for U-Boot Driver Model. The intention is to
> have a simple malloc() function in the early stage of init before
> relocation and before RAM is up and running. There was an experimental
> patch that added the early heap to GD structure.
> 
> In the following discussion Graeme Russ pointed out that there is a
> pre-console buffer which does the similar thing. And we should not
> explode GD by adding the early heap (which is going to be few hundreds
> of bytes long) into it. He suggested to create an independent area
> locked in cache lines for early heap in order to allow split GD and
> early heap into more non-contiguous blocks.

More specifically, we must not assume that we have a single, contiguous
region of memory capable of holding the pre-relocation early stack,
pre-relocation global data, pre-console buffer, and early (pre-relocation)
heap.

Forget about 'locked cache lines' - that is only important when considering
when to call enable_caches(). To cover the generic case (which covers all
architectures) we simply need to keep in mind that enable_caches() can only
be called _after_ the early heap has been moved to the final (SDRAM) heap.
Therefore, we must keep in mind that any code which moves the early heap
into the final heap runs with caches off and is going to be
performance-hindered.

> Pavel Hermann said that we would have to copy data twice (first before
> the RAM is up and running and caches are still off and second after
> RAM and dlmalloc is initialized).

I think I understand why now - the idea is to blind-copy the early heap
into SDRAM, enable caches and then process the early heap into the final
heap. This _may_ provide a performance bonus in _some_ (most) cases.

> Marek Vasut said (earlier in the discussion) that we do not need to
> care about few hundred of bytes, especially after copying them into
> RAM. And Wolfgang Denk resisted. He also pointed out that there are

And so do I - it sets a very bad precedent and it is simply not how an
embedded developer should think. This is not 1980s Wall Street - Greed is
NOT good.

> other possibilities where early memory may be allocated -
> on-chip-memory, external SRAM and others and these should be kept in
> mind including existing size restrictions.
> 
> (I apologize for eventual misinterpretation and I am sorry that we do
> not have a link to the u-boot-dm mailinglist archive nor GMANE. But I
> can eventually Fwd. needed pieces of the discussion.)

OK, let's forget about Driver Model here - it is no longer relevant to the
discussion at hand.

> We would like to hear opinions on the early_malloc idea to find a
> broadly acceptable solution.
> 
> Can/should we use some existing mechanism? Or would it be considered a
> viable option to choose different beginning address for early heap,
> use it (in architecture-specific way) and keep the pointer to the
> beginning in GD. Then copy the early heap to memory before caches are
> flushed and in case of DM copy again data from early heap to new
> destinations that has been obtained through malloc() when it is
> initialized?

OK, I'm going to go out on a long and thin limb here (i.e. look out for
daft ideas) and say that all we need before relocation and final heap
initialisation is an early stack and an early heap (no global data or
pre-console buffer as they are currently implemented). What! I hear you
say :)

Well, why can't we put global data and the pre-console buffer _on_ the early
heap? Yes, it will be a bit tricky as there is some very early code (in
assembler) that reads/writes to/from GD, but if GD is placed at the top of
the heap, its members can still be directly referenced.

And now we can have some fun with an early version of brk() / sbrk()
whereby if early malloc fails, a call to early_sbrk() will give us more
early heap which _may_ be in a memory region which is non-contiguous with
the existing early heap.

E.g.:

+-----------------------+ \
|                       | |
|      Early Stack      | |
|                       | |
+-----------------------+ |
|     Early Heap A      | |
| +-------------------+ | |
| | Early Global Data | | |
| +-------------------+ | > Locked Cache Lines
| |    Early Data A   | | |
| +-------------------+ | |
| |    Early Data B   | | |
| +-------------------+ | |
| |    Early Data C   | | |
+ +-------------------+ + |
|                       | |
|      Unused Bytes     | |
|                       | |
+-----------------------+ /

+-----------------------+ \
|     Early Heap B      | |
| +-------------------+ | |
| |    Early Data D   | | |
| +-------------------+ | |
| |    Early Data E   | | |
| +-------------------+ | |
| |    Early Data F   | | > SRAM
| +-------------------+ | |
| |    Early Data G   | | |
+ +-------------------+ + |
|                       | |
| Free Early Heap Space | |
|                       | |
+-----------------------+ /

Now what we can have is an 'early heap info' structure at the start of each
early heap space:

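struct early_heap_info {
  void *next_free_block;                   /* next unallocated byte */
  uint free_bytes;                         /* bytes left in this heap */
  struct early_heap_info *next_early_heap; /* NULL on the last heap */
};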

early_malloc() would traverse the early_heap_info list until it found an
early heap with enough space to fulfil the request or, if the last one has
not enough space and next_early_heap == NULL, then call early_sbrk() to
attempt to create more early heap. early_sbrk() is platform (and even board)
specific and is intended to release pre-SDRAM memory in the most
appropriate manner possible (fastest first for example, or maybe biggest
first if the cost of setting up the fastest is too much).
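
A minimal sketch of that traversal, to make the idea concrete. It is an
illustration under stated assumptions, not a proposed implementation:
early_heap_init(), the two static arrays standing in for real pre-SDRAM
regions, the 8-byte alignment, and the head/size parameters are all
hypothetical, and free_bytes is treated as a plain byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EARLY_ALIGN 8u	/* assumed allocation granularity */
#define ALIGN_UP(x) (((x) + EARLY_ALIGN - 1) & ~(size_t)(EARLY_ALIGN - 1))

/* Header at the start of each early heap region. */
struct early_heap_info {
	void *next_free_block;			/* first unallocated byte */
	size_t free_bytes;			/* bytes left in this region */
	struct early_heap_info *next_early_heap;	/* NULL on the last region */
};

/* Hypothetical stand-ins for two non-contiguous pre-SDRAM regions. */
static _Alignas(16) unsigned char region_a[64];	/* e.g. locked cache lines */
static _Alignas(16) unsigned char region_b[256];	/* e.g. SRAM */

/* Turn a raw memory region into an empty early heap. */
static struct early_heap_info *early_heap_init(void *base, size_t size)
{
	struct early_heap_info *h = base;
	size_t hdr = ALIGN_UP(sizeof(*h));

	assert(size > hdr);
	h->next_free_block = (unsigned char *)base + hdr;
	h->free_bytes = size - hdr;
	h->next_early_heap = NULL;
	return h;
}

/* Board-specific in the proposal; this stub hands out region_b exactly once. */
static struct early_heap_info *early_sbrk(size_t min_bytes)
{
	static int used;

	if (used || min_bytes > sizeof(region_b) - ALIGN_UP(sizeof(struct early_heap_info)))
		return NULL;
	used = 1;
	return early_heap_init(region_b, sizeof(region_b));
}

/* First-fit bump allocation over the list of early heaps. */
static void *early_malloc(struct early_heap_info *head, size_t size)
{
	struct early_heap_info *h = head;

	if (size == 0 || size > SIZE_MAX - EARLY_ALIGN)
		return NULL;
	size = ALIGN_UP(size);

	for (;;) {
		if (h->free_bytes >= size) {
			void *p = h->next_free_block;

			h->next_free_block = (unsigned char *)p + size;
			h->free_bytes -= size;
			return p;
		}
		if (h->next_early_heap == NULL) {
			/* Last heap is full: ask the platform for more. */
			h->next_early_heap = early_sbrk(size);
			if (h->next_early_heap == NULL)
				return NULL;
		}
		h = h->next_early_heap;
	}
}
```

Because the walk always starts at the head, a small request made after the
heap has grown can still land in leftover space in an earlier (presumably
faster) region. There is deliberately no early_free() in this sketch.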

And what about global data - I'm wondering how much of it is actually used
across all boards of a particular architecture. I'm thinking that, after
relocation, some contents of global data could be cherry-picked and some
not copied at all. Maybe it could be split into 'Global Data that lives
across relocation' and 'Global Data only used pre-relocation'. Examples of
the latter may include:

  reloc_off - After relocation, is this ever referenced anymore?
  env_buf - Isn't this for pre-relocation anyway? Could it be malloc'd?
  x86 has a few I know are only referenced during the transit through
  relocation and are then forgotten about.

If we move global data onto the heap, does this simplify things or does it
become even more complex?

And as for the question of fixing up pointers in the structures allocated on
the early heap, that is entirely up to the user of the early heap as only
they know what the contents of the structures mean.
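
To illustrate what such a user-side fix-up could look like, here is a rough
sketch. Everything in it is hypothetical (struct early_dev, early_heap_copy()
and early_fixup_ptr() are made-up names); it only assumes that the early heap
is blind-copied as one block and that the owner of a structure knows which
of its members point into the early heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical early-heap user with one pointer that needs fixing up. */
struct early_dev {
	const char *name;		/* points outside the early heap: left alone */
	struct early_dev *parent;	/* points into the early heap: must be moved */
};

/* Blind-copy one early heap region; returns the address delta to apply. */
static intptr_t early_heap_copy(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);
	return (intptr_t)((uintptr_t)dst - (uintptr_t)src);
}

/*
 * Adjust a pointer only if it points into the old region
 * [old_base, old_base + len); anything else is returned unchanged.
 */
static void *early_fixup_ptr(void *p, const void *old_base, size_t len,
			     intptr_t delta)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)old_base;

	if (p == NULL || addr < lo || addr - lo >= len)
		return p;
	return (void *)(addr + (uintptr_t)delta);
}

/* Owner-supplied fix-up: only the owner knows that 'parent' is relocatable. */
static void early_dev_fixup(struct early_dev *dev, const void *old_base,
			    size_t len, intptr_t delta)
{
	dev->parent = early_fixup_ptr(dev->parent, old_base, len, delta);
}
```

The generic code only does the copy and reports the delta; which members get
passed through early_fixup_ptr() stays entirely with the owner of the
structure.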

Regards,

Graeme