[U-Boot] Early malloc() summary

Graeme Russ graeme.russ at gmail.com
Mon Aug 20 01:47:20 CEST 2012


Hi Tomas,

On Sun, Aug 19, 2012 at 11:21 PM, Tomas Hlavacek <tmshlvck at gmail.com> wrote:
> Hello Graeme!
>
> On Fri, Aug 17, 2012 at 3:15 AM, Graeme Russ <graeme.russ at gmail.com> wrote:
>> dm_malloc(bytes, driver *)
>>   |
>>   +-> early_malloc(bytes, reloc_helper *)  /* Pre-Relocation */
>>   |     |
>>   |     +->register_helper(reloc_helper *)
>>   |     |
>>   |     +->pre_reloc_malloc(size_t bytes)
>>   |
>>   +-> malloc(bytes)                        /* Post-Relocation */
>>
>>
>> Drivers call dm_malloc(); helper functions call early_malloc().
>>
>> dm_malloc() is implemented in the DM core code and checks whether the
>> call is pre- or post-relocation. If pre-relocation, it checks that the
>> driver has a relocation helper (or the 'I don't need one' flag).
>>
>> early_malloc() is implemented in the early malloc code, separate from the
>> DM code.
>>
>> **** WARNING!!! STOP READING NOW!!! ****
>>
>> early_malloc() registers a relocation function (if provided) which will be
>> called during relocation. DM core will strip this out as it will (for the
>> time being) handle the calling of the relocation helper for each of the
>> registered drivers. In the long term, I think that responsibility might be
>> able to be taken away from DM core (but there may be call-order issues that
>> might make that impossible).
>>
>> The way I imagine it in the future, any code that might possibly allocate
>> memory prior to relocation would do something like:
>>
>> static int my_relocator(void *data)
>> {
>>   struct foo *new_bar;
>>
>>   new_bar = malloc(sizeof(struct foo));
>>   if (!new_bar)
>>     return -ENOMEM;
>>   memcpy(new_bar, data, sizeof(struct foo));
>>
>>   /* Tweak internal new_bar members and hand new_bar to its owner */
>>
>>   return 0;
>> }
>>
>> int some_function(void)
>> {
>>   struct foo *bar;
>>
>>   bar = malloc(sizeof(struct foo));
>>   if (!bar)
>>     return -ENOMEM;
>>   register_helper(bar, my_relocator);
>>
>>   return 0;
>> }
>>
>>
>> And behind the scenes we have:
>>
>> data = malloc(bytes);
>>           |
>>           +->data = pre_reloc_malloc(size_t bytes)   /* Pre-Relocation */
>>           |     |
>>           |     +->add_to_reloc_list(data)
>>           |     |
>>           |     +->return data;
>>           |
>>           +->malloc(size_t bytes);                   /* Post-Relocation */
>>
>> register_helper(data, reloc_helper *)
>>           |
>>           +->update_reloc_list(data, reloc_helper *) /* Pre-Relocation */
>>           |
>>           +->Do Nothing                              /* Post-Relocation */
>>
>> During relocation, the 'reloc list' is processed. Each 'data' entry with no
>> 'reloc_helper' will elicit a (debug) warning to let you know about data
>> that was allocated but will not be relocated.
>
> OK, I got this. It seems to me that everything starts with
> pre_reloc_malloc(). And I think that this is roughly equivalent to my
> void *early_malloc(size_t) function in previous experimental patches.

Correct
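For illustration, the quoted call flow (drivers call dm_malloc(), helpers call early_malloc(), and pre_reloc_malloc() does the raw allocation) might look roughly like the C sketch below. It assumes a fixed static buffer as the early heap and a plain flag in place of whatever gd field would record that relocation has happened; struct driver, DRV_NO_RELOC_NEEDED, gd_relocated and the helper table are hypothetical names, not existing U-Boot code. The thread does not say what dm_malloc() should do when a driver has neither a helper nor the opt-out flag; this sketch simply refuses the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*reloc_helper)(void *data);

/* Hypothetical driver descriptor: only the fields this sketch needs. */
struct driver {
	reloc_helper helper;     /* NULL if the driver has no helper */
	unsigned int flags;
};
#define DRV_NO_RELOC_NEEDED 0x1  /* the "I don't need one" flag */

/* Hypothetical stand-in for a "relocation has happened" flag in gd. */
static int gd_relocated;

/* Hypothetical fixed early heap; the real source is arch-specific. */
#define EARLY_HEAP_SIZE 256
static _Alignas(8) unsigned char early_heap[EARLY_HEAP_SIZE];
static size_t early_heap_used;

#define MAX_HELPERS 8
static reloc_helper helpers[MAX_HELPERS];
static int n_helpers;

/* Raw early allocator: 8-byte aligned bump allocation, no free. */
static void *pre_reloc_malloc(size_t bytes)
{
	size_t start = (early_heap_used + 7) & ~(size_t)7;

	if (bytes == 0 || bytes > EARLY_HEAP_SIZE - start)
		return NULL;
	early_heap_used = start + bytes;
	return &early_heap[start];
}

/* Remember a helper so it can be called during relocation. */
static void register_helper(reloc_helper helper)
{
	if (helper && n_helpers < MAX_HELPERS)
		helpers[n_helpers++] = helper;
}

/* Called directly by helper code; lives outside the DM core. */
void *early_malloc(size_t bytes, reloc_helper helper)
{
	void *p = pre_reloc_malloc(bytes);

	if (p)
		register_helper(helper);
	return p;
}

/* Called by drivers; lives in the DM core. */
void *dm_malloc(size_t bytes, const struct driver *drv)
{
	assert(drv != NULL);
	if (gd_relocated)
		return malloc(bytes);
	/* Pre-relocation: require a helper or an explicit opt-out. */
	if (!drv->helper && !(drv->flags & DRV_NO_RELOC_NEEDED))
		return NULL;
	return early_malloc(bytes, drv->helper);
}
```

Keeping pre_reloc_malloc() ignorant of drivers and relocation state means the same bump allocator could back the early heap on boards that never relocate at all.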

> But I am not sure that the identifier pre_reloc_malloc() is proper for
> this function because on archs without strict separation of
> board_init_f and board_init_r, where the U-Boot is running in RAM from
> the very beginning and no relocation is needed (microblaze, nios2,
> openrisc, sh) it does not reflect the actual use - it is the function

Good point, which also highlights why wrapping malloc() might be a good
approach. Architectures which already have SDRAM initialised (either by the
SoC's IPL or the board's SPL, for example) prior to U-Boot being loaded
should be allowed to initialise the malloc heap 'extremely early' (perhaps
before even console output). In such cases, there would be no need to
perform any kind of malloc chunk relocation.
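One possible reading of the 'wrap malloc()' idea, following the 'behind the scenes' diagram quoted earlier, is sketched below. wrapped_malloc(), heap_ready, struct reloc_entry and the fixed-size list are hypothetical names chosen for illustration. On a board whose SDRAM is already running when U-Boot starts, heap_ready would be set before the first allocation, so no chunk would ever land on the reloc list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*reloc_helper)(void *data);

/* One entry per early allocation (hypothetical layout). */
struct reloc_entry {
	void *data;
	size_t size;
	reloc_helper helper;   /* NULL until register_helper() is called */
};

#define MAX_RELOC_ENTRIES 16
static struct reloc_entry reloc_list[MAX_RELOC_ENTRIES];
static int reloc_count;

/* Hypothetical flag: set once the real malloc() heap is usable. */
static int heap_ready;

#define EARLY_HEAP_SIZE 256
static _Alignas(8) unsigned char early_heap[EARLY_HEAP_SIZE];
static size_t early_used;

static void *pre_reloc_malloc(size_t bytes)
{
	size_t start = (early_used + 7) & ~(size_t)7;
	void *data;

	if (bytes == 0 || bytes > EARLY_HEAP_SIZE - start ||
	    reloc_count == MAX_RELOC_ENTRIES)
		return NULL;
	data = &early_heap[start];
	early_used = start + bytes;
	/* add_to_reloc_list(data) */
	reloc_list[reloc_count++] = (struct reloc_entry){ data, bytes, NULL };
	return data;
}

/* Hypothetical wrapper that callers would use in place of malloc(). */
void *wrapped_malloc(size_t bytes)
{
	return heap_ready ? malloc(bytes) : pre_reloc_malloc(bytes);
}

/* Pre-relocation: update_reloc_list(data, helper). Afterwards: nothing. */
void register_helper(void *data, reloc_helper helper)
{
	int i;

	if (heap_ready)
		return;
	for (i = 0; i < reloc_count; i++) {
		if (reloc_list[i].data == data) {
			reloc_list[i].helper = helper;
			return;
		}
	}
	assert(!"register_helper() called on a non-early block");
}
```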

> used to obtain allocation from early_heap. And I think that in case of
> that architectures we still need early_heap and working dm_malloc()

Yes, but as above, in some cases early heap may be one and the same as
'late' heap.

> before the true malloc() is initialized. (It is because we might still
> need to create the DM tree before malloc is initialized to facilitate
> DM part of actual memory and malloc initialization.)

I think it may be a good exercise (later, not now) to look at these 'U-Boot
already running in RAM' cases and see if 'late' malloc can be initialised
before DM...

> I am thinking about a way to obtain some space for the first
> early_heap (assuming that I have the heap header you suggested some
> time ago that has void *next_early_heap for future expansion with
> arch-specific or CPU/board-specific ways to grab non-contiguous
> early_heap). Do you know some elegant way to obtain some early_heap
> space that would work on each of the architectures in question? It came to my

No, it is very arch-specific. Some may allocate from locked cache lines,
others from SRAM; who knows? That is why I suggested a brk() function
that would do the allocation in the background.
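As a sketch of that brk() idea combined with the chained early heap header Tomas describes (a header carrying a next_early_heap pointer), the core allocator below only asks an arch hook for more memory when the regions it already has are full. arch_early_brk(), the size/used header fields, and the two static pools standing in for locked cache lines or SRAM are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header at the start of every early heap region. next_early_heap follows
 * the header idea from this thread; size/used are assumptions. */
struct early_heap_header {
	struct early_heap_header *next_early_heap;
	size_t size;   /* usable bytes after the header */
	size_t used;
};

#define HDR_SIZE ((sizeof(struct early_heap_header) + 7) & ~(size_t)7)

/* Hypothetical arch hook ("brk"): a real port might return locked cache
 * lines or on-chip SRAM. Two static pools simulate that here. */
static _Alignas(8) unsigned char pool[2][128];
static int pools_handed_out;

static void *arch_early_brk(size_t *size)
{
	if (pools_handed_out == 2)
		return NULL;
	*size = sizeof(pool[0]);
	return pool[pools_handed_out++];
}

static struct early_heap_header *first_heap;

static void *heap_alloc(struct early_heap_header *h, size_t bytes)
{
	size_t start = (h->used + 7) & ~(size_t)7;

	if (start > h->size || bytes > h->size - start)
		return NULL;
	h->used = start + bytes;
	return (unsigned char *)h + HDR_SIZE + start;
}

void *early_malloc(size_t bytes)
{
	struct early_heap_header *h, **link = &first_heap;
	size_t size;
	void *p;

	/* Try every region already chained. */
	for (h = first_heap; h; h = h->next_early_heap) {
		p = heap_alloc(h, bytes);
		if (p)
			return p;
		link = &h->next_early_heap;
	}
	/* Nothing fits: ask the arch code for another region and chain it. */
	h = arch_early_brk(&size);
	if (!h || size <= HDR_SIZE)
		return NULL;
	assert(((uintptr_t)h & 7) == 0);
	h->next_early_heap = NULL;
	h->size = size - HDR_SIZE;
	h->used = 0;
	*link = h;
	return heap_alloc(h, bytes);
}
```

The generic code never needs to know where the memory came from; only arch_early_brk() would differ between architectures.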

> mind that I can steal the space from the early stack by something like
> this:
>
> #define DECLARE_EARLY_HEAP_ON_STACK \
>         char __early_heap[CONFIG_SYS_EARLY_HEAP_SIZE]; \
>         gd->early_heap_first = (void *)__early_heap
>
> void board_init_f()
> {
> ...
> memset(gd) here
> ...
> DECLARE_EARLY_HEAP_ON_STACK;

Yes, that could be a possibility

> Although it is largely architecture independent (except the fact that
> we need sensible value of CONFIG_SYS_EARLY_HEAP_SIZE and it is perhaps
> not feasible for x86 which has 3 init stages - board_init_f,
> board_init_f_r and board_init_r, the stack is lost in between
> board_init_f and board_init_f_r, but true malloc() is initialized as
> late as in board_init_r, if I understand it well), but I am not sure

Yes, you understand it well. The init phases for x86 are designed that way
to get caches online as quickly as possible. IMNSHO, I think this sequence
should extend to all architectures that initialise SDRAM in board_init_f()
(as opposed to IPL or SPL). board_init_f_r() copies gd to SDRAM, turns on
caches and then relocates U-Boot into SDRAM. It would be trivial to change
that to:

  - init malloc() pool
  - copy gd to SDRAM
  - relocate early malloc pool
  - turn on caches
  - relocate U-Boot
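The five steps above could be arranged roughly as follows, with the early pool relocated by walking the reloc list once and emitting a warning for any block that has no helper. Every function name here is a hypothetical placeholder rather than a real U-Boot init function, and foo_relocator() is an example helper that copies its block into the real heap and stores the new pointer where its owner can find it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*reloc_helper)(void *data);

struct reloc_entry {
	void *data;
	reloc_helper helper;
};

/* Hypothetical: filled in by the early allocator before relocation. */
#define MAX_RELOC_ENTRIES 8
static struct reloc_entry reloc_list[MAX_RELOC_ENTRIES];
static int reloc_count;

/* Example helper: copy the block into the real heap and publish it. */
struct foo { int x; };
static struct foo *relocated_foo;

static int foo_relocator(void *data)
{
	relocated_foo = malloc(sizeof(*relocated_foo));
	if (!relocated_foo)
		return -1;
	memcpy(relocated_foo, data, sizeof(*relocated_foo));
	return 0;
}

/* Step 3: walk the reloc list once. Returns the number of failed helpers. */
static int relocate_early_malloc_pool(void)
{
	int i, failures = 0;

	assert(reloc_count <= MAX_RELOC_ENTRIES);
	for (i = 0; i < reloc_count; i++) {
		if (!reloc_list[i].helper) {
			/* debug warning: allocated but not relocated */
			fprintf(stderr, "early block %p not relocated\n",
				reloc_list[i].data);
			continue;
		}
		if (reloc_list[i].helper(reloc_list[i].data))
			failures++;
	}
	reloc_count = 0;   /* early pool is abandoned from here on */
	return failures;
}

/* Placeholders for the arch-specific steps (hypothetical names). */
static int init_malloc_pool(void) { return 0; }
static void copy_gd_to_sdram(void) { }
static void turn_on_caches(void) { }
static void relocate_u_boot(void) { }

int board_init_f_r_sketch(void)
{
	if (init_malloc_pool())              /* 1. init malloc() pool */
		return -1;
	copy_gd_to_sdram();                  /* 2. copy gd to SDRAM */
	if (relocate_early_malloc_pool())    /* 3. relocate early pool */
		return -1;
	turn_on_caches();                    /* 4. turn on caches */
	relocate_u_boot();                   /* 5. relocate U-Boot */
	return 0;
}
```

Because step 3 runs before the caches are enabled, its cost grows directly with the number and size of entries on the reloc list.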

The crux is to keep the early malloc pool as small as possible (or more
specifically, keep the amount which needs to be relocated as small as
possible). That way, the amount of data moved while caches are off is
minimised.

Hmmm... Maybe early_free() could de-register any relocation helper that
has been associated with that block...
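If early_free() were to de-register helpers, a minimal version might look like this. It assumes the reloc list is kept in allocation order, so that freeing the most recent block can also rewind the bump pointer; any other freed space is simply abandoned until relocation. The names and the list layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*reloc_helper)(void *data);

struct reloc_entry {
	void *data;
	size_t offset;         /* where the block starts in the early heap */
	reloc_helper helper;   /* may be NULL */
};

#define EARLY_HEAP_SIZE 256
#define MAX_RELOC_ENTRIES 16
static _Alignas(8) unsigned char early_heap[EARLY_HEAP_SIZE];
static size_t early_used;
static struct reloc_entry reloc_list[MAX_RELOC_ENTRIES]; /* oldest first */
static int reloc_count;

void *early_malloc(size_t bytes, reloc_helper helper)
{
	size_t start = (early_used + 7) & ~(size_t)7;
	struct reloc_entry *e;

	if (bytes == 0 || bytes > EARLY_HEAP_SIZE - start ||
	    reloc_count == MAX_RELOC_ENTRIES)
		return NULL;
	e = &reloc_list[reloc_count++];
	e->data = &early_heap[start];
	e->offset = start;
	e->helper = helper;
	early_used = start + bytes;
	assert(early_used <= EARLY_HEAP_SIZE);
	return e->data;
}

void early_free(void *ptr)
{
	int i;

	for (i = 0; i < reloc_count; i++)
		if (reloc_list[i].data == ptr)
			break;
	if (i == reloc_count)
		return;   /* not an early block: nothing to do */

	/* Freeing the newest block lets the bump pointer move back. */
	if (i == reloc_count - 1)
		early_used = reloc_list[i].offset;

	/* Drop the entry, and with it any relocation helper, keeping order. */
	for (; i < reloc_count - 1; i++)
		reloc_list[i] = reloc_list[i + 1];
	reloc_count--;
}
```

Removing the entry is what keeps the freed block out of the relocation pass, which also helps keep the amount of data moved with caches off small.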

> whether it is an acceptable way to grab early_heap space like that.

I think it is valid.

> My intention is to keep the prospective patch with early_heap and
> pre_reloc_malloc() relatively low-profile and do it without
> unnecessary architecture/CPU/board-specific code when possible.

Good :)

> Anyway I think we are going to need as little as 20 bytes of early_heap
> for the root DM node on the vast majority of boards, and therefore we could
> go forward with a really small early_heap in the beginning.

You will need as much as is required for each of the drivers that are
initialised early. Some boards may not even need DM early. But this is
not really anything that you need to be concerned about...

> What do you think?

Start by assuming there is some arbitrary amount of memory available to
the early_malloc() core and create the implementation from there. Don't
worry about all my 'relocation helper' stuff (that is the DM core's problem
for now).

We can then investigate, at an arch-specific level, how to create that
block of memory for early_malloc().

Regards,

Graeme

