[U-Boot] [PATCH v9] [RFC] Add dmmalloc module for DM.

Fri Oct 26 01:04:28 CEST 2012

Hi Tomas,

On Fri, Oct 26, 2012 at 6:16 AM, Tomas Hlavacek <tmshlvck at gmail.com> wrote:
> Hello Graeme,
>
> On Thu, Oct 25, 2012 at 3:40 AM, Graeme Russ <graeme.russ at gmail.com> wrote:
>
>>> diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h
>>> index 2b9af93..9045829 100644
>>> --- a/arch/arm/include/asm/global_data.h
>>> +++ b/arch/arm/include/asm/global_data.h
>>> @@ -82,6 +82,9 @@ typedef       struct  global_data {
>>>         unsigned long   post_log_res; /* success of POST test */
>>>         unsigned long   post_init_f_time; /* When post_init_f started */
>>>  #endif
>>> +#ifdef CONFIG_SYS_EARLY_MALLOC
>>> +       void            *early_heap;    /* heap for early_malloc */
>>> +#endif
>>
>> Why not early_heap_header *early_heap; ?
>>
>
> It might be.
>
> Actually it is a good point because I am using 3 different ways of
> dealing with addresses: 1) struct early_heap header * or struct
> early_block_header * - I am using this when I want to access members
> of the stucture or compute address past the structure (which is where
> the heap or block starts); 2) phys_addr_t - which is plain integer and
> I use this for simple computations when I do not want to worry about
> pointer arithmetic; 3) void * when I have just plain address,
> especially when I want to pass an addres among logical parts of the
> mallocator or outside. This may a bit controversial and perhaps I
> should replace it by specific strucutre pointers internally.
>
> I am unable to decide: Should I remove struct early_heap_header from
> dmmalloc.h making it publicly unavailable or should I rather change
> the void * to struct early_heap_header * in the GD structure? What do
> you think is better?

I think struct early_heap_header * in the GD structure is the better way to
go as that is exactly what it is.

[snip]

>>> +       h = (struct early_heap_header *)CONFIG_SYS_EARLY_HEAP_ADDR;
>>> +       b = (struct early_block_header *)(h + 1);
>>
>> Hmmm, does this really work? I would have thought:
>>
>> b = (struct early_block_header *)(h + sizeof(struct early_heap_header));
>>
>> but I could be mistaken
>
> It seems that it works as it is (at least I wrote bunch of tests and I
> inspected resulting heaps and it was all right). I believe that since
> h is a pointer to the struct early_heap_header then pointer arithmetic
> is in effect and h+1 actually means "next element in the array of
> struct early_heap_header". Which is the address past the header that
> equals beginning of the heap data block. (?)

As I said, I could be mistaken - it appears I am :)

>>> +int early_malloc_active(void)
>>> +{
>>> +       return ((gd->flags & GD_FLG_RELOC) != GD_FLG_RELOC);
>>> +}
>>
>> I think we need another flag - GD_FLG_RELOC gets set before the permanent
>> heap is initialised, so there is a window of opportunity where things may
>> break
>
> Well, as I wrote in the commit message - this is only a temporary
> implementation. I suppose I am going to change this when we have more
> coarse initialization flags wired into DM (which I believe we are
> going to have it anyway). So now I am just working around "forward
> dependency" here.
>
>>
>>> +
>>> +void early_heap_dump(struct early_heap_header *h)
>>> +{
>>> +       struct early_block_header *b;
>>> +
>>> +       debug("heap: h=%p, h->size=%d\n", h, h->size);
>>> +
>>> +       b = (struct early_block_header *)(h+1);
>>> +       while ((phys_addr_t)b + sizeof(struct early_block_header)
>>> +                       < (phys_addr_t)h + h->size) {
>>> +               debug("block: h=%p h->size=%d b=%p b->size=%d b->(used)=%d\n",
>>> +                               h, h->size, b, BLOCK_SIZE(b->size),
>>> +                               BLOCK_USED(b->size));
>>> +               assert(BLOCK_SIZE(b->size) > 0);
>>> +               b = (struct early_block_header *)((phys_addr_t)b +
>>> +                               sizeof(struct early_block_header) +
>>> +                               BLOCK_SIZE(b->size));
>>> +       }
>>> +       debug("--- heap dump end ---\n");
>>> +}
>>
>> Nice touch, but could we just iterate through all ealry heap chunks starting
>> from gd->early_heap?
>
> Or I can have two functions. One heap specific and one for all heaps.
> I think both might be useful when somebody needs to debug early_malloc
> or memory usage etc. in the early stage. Thanks.

True, just adding another function which iterates through the heaps and
calls this function would be fine.

[snip]

>>> +static inline void *dmrealloc(void *oldaddr, size_t bytes)
>>> +{
>>> +#ifdef CONFIG_SYS_EARLY_MALLOC
>>> +       char *addr;
>>> +       struct early_block_header *h;
>>> +       if (early_malloc_active()) {
>>> +               addr = dmmalloc(bytes);
>>> +               if (addr == NULL)
>>> +                       return NULL;
>>> +
>>> +               h = BLOCK_HEADER(oldaddr);
>>> +               if (BLOCK_FREE(h->size))
>>> +                       return NULL;
>>> +
>>> +               if (bytes > BLOCK_SIZE(h->size))
>>> +                       bytes = BLOCK_SIZE(h->size);
>>> +
>>> +               memcpy(addr, oldaddr, bytes);
>>> +               dmfree(oldaddr);
>>> +               return addr;
>>> +       }
>>> +#endif /* CONFIG_SYS_EARLY_MALLOC */
>>> +       return realloc(oldaddr, bytes);
>>> +}
>>
>> Hmmm, we have a very intersting corner case to deal with. What if we use
>> dmrealloc() to move allocated data from the early malloc pool to the final
>> malloc pool?
>>
>> At some point, we need to assume the developer is only ever going to pass
>> early malloc'd memory to dmrealloc()
>
> Good point! The current code would do the job assuming that the
> early_malloc can still access the proper early_heap (or a copy, but in
> that case some additional magic is needed) and the real malloc is
> already initialized.
>
> As you can see I am still sticking with the double-copy method. It is
> maybe due to lack of insight. But the discussion here was not
> absolutely conclusive last time. I have even some experimental code
> (not ready for submitting at all) for double copy method but I would
> prefer to discuss it separately when the early_malloc() is done.

Well the double-copy approach to reallocating early-malloc'd memory and
how dmrealloc() is implemented are tightly coupled.

I understand the reason for the double copy is performance related (getting
into a position to enable cache as early as possible), but I wonder how
much will really be gained. It seems to me that the only performance gain
will be for the execution of the driver 'relocation fixup' code wich, I
assume, would not really consume that many CPU cycles.

There is a point where code simplicity outweighs performance gains.

>> The other option is to use the gd->flags...
>>
>> Define two flags - something like:
>>
>> GD_FLG_HEAP_INIT -> Final heap has been initialised
>> GD_FLG_EH_DONE    -> free(), realloc() refer to final heap
>>
>> (I don't like the names, I'm just not up to thinking of anything better)
>>
>> This way we can use dmalloc() prior to the heap being initialised and then
>> set GD_FLG_HEAP_INIT. Once GD_FLG_HEAP_INIT has been set, do a bunch of
>> dmrealloc() calls to essentially move the data into the permanent heap (of
>> course pointers int the structures need to be fixed up by the drivers) and
>> finally set GD_FLG_EH_DONE (and call early_heap_dump)
>>
>> One problem I see is what happens of you call malloc() and free() on the
>> same block between the setting of GD_FLG_HEAP_INIT and GD_FLG_EH_DONE? The
>> code will try to reference the block as if it is in the early heap, but it
>> won't be.
>>
>> One solution is, on detection of GD_FLG_HEAP_INIT being set, dmfree() and
>> dmrealloc() could search the early heap chunks instead of just assuming
>> that the referenced block is in the early heap (if GD_FLG_HEAP_INIT has not
>> been set, it will be safe to assume the memory is on the early heap)
>
> Sure we will have that flags. But I think we can use them as well for
> switching from DM driver instance list to tree for example. Or other
> way around: I can use DM flags for early_malloc. Therefore I would
> like to synchronize with DM cores for PCI and another low-level things
> which are certainly going to start in early stage. It would be best to
> use the same flags and switch on/off early_malloc based on DM internal
> state.

Ah, I see where more performance gains are to be made by switching on cache
earlier - During the reallocation phase, you are switching from the list
based structure (fast for the small number of pre-SDRAM drivers) into the
final tree based structure.

I'm looking at this early malloc code from a much more generic point of
view - I think there are use-cases outside the driver model, so I don't
see a need (rather the opposite) to tie early malloc to the driver model

Regards,

Graeme