[U-Boot] [PATCH V3 1/3] drivers: block: add block device cache
Stephen Warren
swarren at wwwdotorg.org
Wed Mar 30 17:21:27 CEST 2016
On 03/30/2016 09:19 AM, Tom Rini wrote:
> On Wed, Mar 30, 2016 at 08:36:21AM -0600, Stephen Warren wrote:
>> On 03/28/2016 11:05 AM, Eric Nelson wrote:
>>> Add a block device cache to speed up repeated reads of block devices by
>>> various filesystems.
>>>
>>> This small amount of cache can dramatically speed up filesystem
>>> operations by skipping repeated reads of common areas of a block
>>> device (typically directory structures).
>>>
>>> This has been shown to provide some benefit for FAT filesystem
>>> operations when loading a kernel and RAM disk, and more dramatic
>>> benefits on ext4 filesystems when the kernel and/or RAM disk are
>>> spread across multiple extent header structures, as described in
>>> commit fc0fc50.
>>>
>>> The cache is implemented as a minimal list (block_cache) maintained
>>> in most-recently-used order, together with a count of the current
>>> number of entries (cache_count). It uses a maximum-block-count
>>> setting to avoid caching copies of large block reads, and an upper
>>> bound on the number of cached areas.
>>>
>>> The maximum number of entries in the cache defaults to 32, and the
>>> maximum number of blocks per cache entry defaults to 2; these values
>>> have been shown to produce the best results in testing of ext4 and
>>> FAT filesystems.
>>>
>>> The 'blkcache' command (enabled through CONFIG_CMD_BLOCK_CACHE) allows
>>> changing these values and can be used to tune for a particular filesystem
>>> layout.
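(As an illustration for readers following the thread: a lookup in an
MRU-ordered cache of this shape might look roughly like the sketch
below. This is based on the description above rather than the patch's
exact code; cache_find() and its containment test are assumptions,
but the list_del()/list_add() pair that re-files a hit at the head of
the list is the essence of the MRU ordering.)

	static struct block_cache_node *cache_find(int iftype, int devnum,
						   lbaint_t start,
						   lbaint_t blkcnt,
						   unsigned long blksz)
	{
		struct block_cache_node *node;

		list_for_each_entry(node, &block_cache, lh) {
			if (node->iftype == iftype &&
			    node->devnum == devnum &&
			    node->blksz == blksz &&
			    node->start <= start &&
			    node->start + node->blkcnt >= start + blkcnt) {
				/* hit: move to the head to keep MRU order */
				if (block_cache.next != &node->lh) {
					list_del(&node->lh);
					list_add(&node->lh, &block_cache);
				}
				return node;
			}
		}
		return NULL;
	}

A read that hits can then memcpy() out of node->cache at offset
(start - node->start) * blksz instead of touching the device. The
tuning command is presumably along the lines of "blkcache show" and
"blkcache configure <blocks> <entries>", per the patch's command
implementation.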
> [snip]
>>> diff --git a/disk/part.c b/disk/part.c
>>
>>> @@ -268,6 +268,8 @@ void part_init(struct blk_desc *dev_desc)
>>> const int n_ents = ll_entry_count(struct part_driver, part_driver);
>>> struct part_driver *entry;
>>>
>>> + blkcache_invalidate(dev_desc->if_type, dev_desc->devnum);
>>
>> Doesn't this invalidate the cache far too often? I expect that
>> function is called for every command the user executes from the
>> command line, whereas it would be nice if the cache persisted across
>> commands. I suppose this is a reasonable (and very safe) first
>> implementation though, and it saves having to go through each
>> storage provider type to find the right place to detect media
>> changes.
>
> My initial reaction here is that we should stay conservative and
> invalidate the cache too often rather than too infrequently. I mean,
> what's the worst case here, an extra read? A few extra reads? We want
> to make sure we keep the complexity-to-functionality ratio right
> here: if we make the recovery/flashing/factory cases a whole lot
> better but leave 1 second of wall clock time on the table when we've
> just gained a minute, we're OK.
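(To make that worst case concrete: invalidation in a cache like this
is presumably just a walk that frees matching entries, as sketched
below, so the only penalty after it runs is that subsequent reads hit
the device again and repopulate the cache. The function body here is
an assumption based on the struct quoted below, not the patch's exact
code; cache_count is named after the commit message.)

	static void blkcache_invalidate(int iftype, int devnum)
	{
		struct list_head *entry, *n;
		struct block_cache_node *node;

		list_for_each_safe(entry, n, &block_cache) {
			node = list_entry(entry, struct block_cache_node,
					  lh);
			if (node->iftype == iftype &&
			    node->devnum == devnum) {
				list_del(entry);
				free(node->cache);
				free(node);
				cache_count--;
			}
		}
	}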
>
>>> diff --git a/drivers/block/blkcache.c b/drivers/block/blkcache.c
>>
>>> +struct block_cache_node {
>>> + struct list_head lh;
>>> + int iftype;
>>> + int devnum;
>>> + lbaint_t start;
>>> + lbaint_t blkcnt;
>>> + unsigned long blksz;
>>> + char *cache;
>>> +};
>>> +
>>> +static LIST_HEAD(block_cache);
>>> +
>>> +static struct block_cache_stats _stats = {
>>> + .max_blocks_per_entry = 2,
>>> + .max_entries = 32
>>> +};
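(For context on the allocation pattern discussed next: filling such a
cache presumably involves one malloc() for the node and another for
its data, roughly as sketched below, with the least-recently-used
entry dropped from the tail once max_entries is reached. The names
follow the commit message and the struct above; the details are
assumptions, not the patch's exact code.)

	static unsigned cache_count;	/* current number of entries */

	static void cache_insert(int iftype, int devnum, lbaint_t start,
				 lbaint_t blkcnt, unsigned long blksz,
				 const void *buffer)
	{
		struct block_cache_node *node;

		/* refuse to cache copies of large block reads */
		if (blkcnt > _stats.max_blocks_per_entry)
			return;

		if (cache_count >= _stats.max_entries) {
			/* evict the LRU entry from the tail of the list */
			node = list_entry(block_cache.prev,
					  struct block_cache_node, lh);
			list_del(&node->lh);
			free(node->cache);
			free(node);
			cache_count--;
		}

		node = malloc(sizeof(*node));
		if (!node)
			return;
		node->cache = malloc(blkcnt * blksz);
		if (!node->cache) {
			free(node);
			return;
		}
		node->iftype = iftype;
		node->devnum = devnum;
		node->start = start;
		node->blkcnt = blkcnt;
		node->blksz = blksz;
		memcpy(node->cache, buffer, blkcnt * blksz);
		list_add(&node->lh, &block_cache);	/* head = newest */
		cache_count++;
	}

It is exactly these two small allocations per cached read that the
fragmentation concern below is about.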
>>
>> Now is a good time to mention another reason why I don't like using
>> a dynamically allocated linked list for this: memory fragmentation.
>> By dynamically allocating the cache, we could easily run into a
>> situation where the user runs a command that allocates memory and
>> also adds to the block cache, most of that memory is freed when
>> U-Boot returns to the command prompt, and then the user runs the
>> command again but it fails because it can't allocate the memory due
>> to fragmentation of the heap. This is a real problem I've seen with
>> e.g. the "ums" and "dfu" commands, since they might initialize the
>> USB controller the first time they're run, which allocates some new
>> memory. Static allocation would avoid this.
>
> That is a good point. But how would you hit this? The problem in
> ums/dfu was that it was several megabytes, yes? My quick read over the
> code right now has me thinking this is something measured in kilobytes.
The allocation that failed was a large multi-megabyte buffer. However,
the allocations that caused the fragmentation/failure were small
(probably tens/hundreds of bytes, but I didn't check).
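For scale: with the defaults above (32 entries of 2 blocks each, and
assuming 512-byte blocks) the cached data tops out at 32 * 2 * 512
bytes = 32 KiB, so a fully static variant would be cheap. A sketch of
what that could look like (hypothetical, not something this patch
does):

	/* hypothetical static pool; sizes taken from the defaults
	 * above, block size assumed to be 512 bytes */
	#define CACHE_ENTRIES	32
	#define CACHE_BLOCKS	2
	#define CACHE_BLKSZ	512

	static struct block_cache_node cache_nodes[CACHE_ENTRIES];
	static char cache_data[CACHE_ENTRIES][CACHE_BLOCKS * CACHE_BLKSZ];
	static LIST_HEAD(free_nodes);

	/* one-time setup: hand every node its fixed buffer and park it
	 * on a free list; cache_insert() would then pull nodes from
	 * free_nodes instead of calling malloc() */
	static void blkcache_static_init(void)
	{
		int i;

		for (i = 0; i < CACHE_ENTRIES; i++) {
			cache_nodes[i].cache = cache_data[i];
			list_add(&cache_nodes[i].lh, &free_nodes);
		}
	}

The trade-off is that each buffer has to be sized for the worst-case
block size up front rather than per read, but it takes the allocator,
and hence the fragmentation, out of the picture entirely.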