[U-Boot] [PATCH V3 1/3] drivers: block: add block device cache

Wed Mar 30 23:57:32 CEST 2016

On 03/30/2016 11:34 AM, Eric Nelson wrote:
> Thanks again for the detailed review, Stephen.
>
> On 03/30/2016 07:36 AM, Stephen Warren wrote:
>> On 03/28/2016 11:05 AM, Eric Nelson wrote:
>>> Add a block device cache to speed up repeated reads of block devices by
>>> various filesystems.

>>> diff --git a/disk/part.c b/disk/part.c
>>
>>> @@ -268,6 +268,8 @@ void part_init(struct blk_desc *dev_desc)
>>>        const int n_ents = ll_entry_count(struct part_driver, part_driver);
>>>        struct part_driver *entry;
>>>
>>> +    blkcache_invalidate(dev_desc->if_type, dev_desc->devnum);
>>
>> Doesn't this invalidate the cache far too often? I expect that function
>> is called for command the user executes from the command-line, whereas
>> it'd be nice if the cache persisted across commands. I suppose this is a
>> reasonable (and very safe) first implementation though, and saves having
>> to go through each storage provider type and find out the right place to
>> detect media changes.
>
> I'm not sure it does. I traced through the mmc initialization and it's
> only called when the card itself is initialized.

I don't believe U-Boot caches the partition structure across user 
commands. Doesn't each user command (e.g. part list, ls, load, save) 
first look up the block device, then scan the partition table, then 
"mount" the filesystem, then perform its action, then throw all that 
state away? Conversely, "mmc rescan" only happens under explicit user 
control. Still as I said, the current implementation is probably fine to 
start with, and at least is safe.

>>> diff --git a/drivers/block/blkcache.c b/drivers/block/blkcache.c
>>
>>> +struct block_cache_node {
>>> +    struct list_head lh;
>>> +    int iftype;
>>> +    int devnum;
>>> +    lbaint_t start;
>>> +    lbaint_t blkcnt;
>>> +    unsigned long blksz;
>>> +    char *cache;
>>> +};
>>> +
>>> +static LIST_HEAD(block_cache);
>>> +
>>> +static struct block_cache_stats _stats = {
>>> +    .max_blocks_per_entry = 2,
>>> +    .max_entries = 32
>>> +};
>>
>> Now is a good time to mention another reason why I don't like using a
>> dynamically allocated linked list for this: Memory fragmentation. By
>> dynamically allocating the cache, we could easily run into a situation
>> where the user runs a command that allocates memory and also adds to the
>> block cache, then most of that memory gets freed when U-Boot returns to
>> the command prompt, then the user runs the command again but it fails
>> since it can't allocate the memory due to fragmentation of the heap.
>> This is a real problem I've seen e.g. with the "ums" and "dfu" commands,
>> since they might initialize the USB controller the first time they're
>> run, which allocates some new memory. Statically allocation would avoid
>> this.
>
> We're going to allocate a block or set of blocks every time we allocate
> a new node for the list, so having the list in an array doesn't fix the
> problem.

We could allocate the data storage for the block cache at the top of RAM 
before relocation, like many other things are allocated, and hence not 
use malloc() for that.

> While re-working the code, I also thought more about using an array and
> still don't see how the implementation doesn't get more complex.
>
> The key bit is that the list is implemented in MRU order so
> invalidating the oldest is trivial.

Yes, the MRU logic would make it more complex. Is that particularly 
useful, i.e. is it an intrinsic part of the speedup?