[U-Boot] [PATCH V3 1/3] drivers: block: add block device cache

Sat Apr 2 16:24:05 CEST 2016

Hi Stephen,

On 04/01/2016 07:07 PM, Stephen Warren wrote:
> On 04/01/2016 05:16 PM, Eric Nelson wrote:
>> On 04/01/2016 03:57 PM, Stephen Warren wrote:
>>> On 03/31/2016 02:24 PM, Eric Nelson wrote:
>>>> On 03/30/2016 02:57 PM, Stephen Warren wrote:
>>>>> On 03/30/2016 11:34 AM, Eric Nelson wrote:
>>>>>> On 03/30/2016 07:36 AM, Stephen Warren wrote:
>>>>>>> On 03/28/2016 11:05 AM, Eric Nelson wrote:
>>
>> <snip>
>>
>>>>>
>>>>> We could allocate the data storage for the block cache at the top of
>>>>> RAM before relocation, like many other things are allocated, and
>>>>> hence not use malloc() for that.
>>>>
>>>> Hmmm. We seem to have gone from a discussion about data structures to
>>>> the type of allocation.
>>>>
>>>> I'm interested in seeing how that works. Can you provide hints about
>>>> what's doing this now?
>>>
>>> Something like common/board_f.c:reserve_mmu() and many other functions
>>> there. relocaddr starts at approximately the top of RAM, continually
>>> gets adjusted down as many static allocations are reserved, and
>>> eventually becomes the address that U-Boot is relocated to. Simply
>>> adding another entry into init_sequence_f[] for the disk cache might
>>> work.
>>>
>>
>> Thanks for the pointer. I'll review that when time permits.
>>
>> This would remove the opportunity to re-configure the cache though,
>> right?
> 
> Well, it would make it impossible to use less RAM. One could use more by
> having a mix of the initial static allocation plus some additional
> dynamic allocation, but that might get a bit painful to manage.
> 

This might not be too bad though. Even if we allocated 4x the current
defaults, we're only at ~64k.
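
For anyone following along, here is a rough model of the kind of
reservation Stephen describes, written outside U-Boot so that it compiles
on its own. It is only a sketch: struct fake_gd, reserve_block_cache()
and the BCACHE_* values are hypothetical names and sizes of mine, not
taken from the patch or from common/board_f.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few global-data fields used here. */
struct fake_gd {
	uintptr_t relocaddr;	/* moves down as regions are reserved */
	uintptr_t bcache_addr;	/* hypothetical: start of cache storage */
};

/* Hypothetical sizing, chosen for illustration only. */
#define BCACHE_ENTRIES		32u
#define BCACHE_BLKS_PER_ENTRY	2u
#define BCACHE_BLKSZ		512u
#define BCACHE_ALIGN		64u	/* assumed alignment, power of two */

/* Carve the cache storage out of the region just below relocaddr. */
static int reserve_block_cache(struct fake_gd *gd)
{
	uintptr_t size = (uintptr_t)BCACHE_ENTRIES *
			 BCACHE_BLKS_PER_ENTRY * BCACHE_BLKSZ;

	assert(gd->relocaddr > size);
	gd->relocaddr -= size;
	/* Round down so the reserved region starts on an aligned address. */
	gd->relocaddr &= ~((uintptr_t)BCACHE_ALIGN - 1);
	gd->bcache_addr = gd->relocaddr;
	return 0;
}
```

Because the size is fixed at build time in this model, the cache could
not shrink at run time, which is the trade-off discussed above.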

> It might be interesting to use the MMU more and allow de-fragmentation
> of VA space. That is, assuming there's much more VA space than RAM, such
> as is true on current 64-bit architectures. Then I wouldn't dislike
> dynamic allocation so much:-)
> 

That's interesting, but probably more invasive than this patch set.

>> I'm not sure how important this feature is, and I think
>> only time and use will tell.
>>
>> I'd prefer to keep that ability at least for a cycle or two so that
>> I and others can test.
>>
>>>>>> While re-working the code, I also thought more about using an array
>>>>>> and still don't see how the implementation doesn't get more complex.
>>>>>>
>>>>>> The key bit is that the list is implemented in MRU order so
>>>>>> invalidating the oldest is trivial.
>>>>>
>>>>> Yes, the MRU logic would make it more complex. Is that particularly
>>>>> useful, i.e. is it an intrinsic part of the speedup?
>>>>
>>>> It's not a question of speed with small numbers of entries. The code
>>>> to handle eviction would just be more complex.
>>>
>>> My thought was that if the eviction algorithm wasn't important (i.e.
>>> most of the speedup comes from having some (any) kind of cache, but the
>>> eviction algorithm makes little difference to the gain from having the
>>> cache), we could just drop MRU completely. If that's not possible, then
>>> indeed a list would make implementing MRU easier.
>>>
>>
>> How would we decide which block to discard? I haven't traced enough
>> to know what algorithm(s) might be best, but I can say that there's
>> a preponderance of repeated accesses to the last-accessed block,
>> especially in ext4.
> 
> Perhaps just keep an index into the array, use that index any time
> something is written to the cache, and then increment it each time.
> Probably not anywhere near as optimal as MRU/LRU though.
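
To make sure I understand the suggestion, a minimal sketch of that
index-based replacement might look like the code below. This is not the
V3 code: struct rr_cache, rr_lookup(), rr_fill() and the sizes are
hypothetical and only model one block per entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NENTRIES 4u	/* hypothetical; small to keep the example short */
#define BLKSZ	 512u

struct rr_entry {
	int valid;
	uint64_t lba;
	uint8_t data[BLKSZ];
};

struct rr_cache {
	struct rr_entry e[NENTRIES];
	unsigned int next;	/* slot the next new block will overwrite */
};

/* Linear search; returns the cached data or NULL on a miss. */
static const uint8_t *rr_lookup(const struct rr_cache *c, uint64_t lba)
{
	for (unsigned int i = 0; i < NENTRIES; i++)
		if (c->e[i].valid && c->e[i].lba == lba)
			return c->e[i].data;
	return NULL;
}

/* Store one block; a block not yet cached replaces slot 'next'. */
static void rr_fill(struct rr_cache *c, uint64_t lba, const uint8_t *buf)
{
	struct rr_entry *e;

	assert(buf != NULL);

	/* Refresh in place if already cached, so no stale copy remains. */
	for (unsigned int i = 0; i < NENTRIES; i++) {
		if (c->e[i].valid && c->e[i].lba == lba) {
			memcpy(c->e[i].data, buf, BLKSZ);
			return;
		}
	}

	e = &c->e[c->next];
	e->valid = 1;
	e->lba = lba;
	memcpy(e->data, buf, BLKSZ);
	c->next = (c->next + 1) % NENTRIES;
}
```

In this model a hit does not refresh an entry's position, so a block
that is read over and over is still overwritten once NENTRIES other
blocks have been stored. That is the difference from the MRU list.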

I see that Tom just applied V3, so I'd be interested in seeing
patches on top of that.

Regards,

Eric