[U-Boot] [PATCH V3 1/3] drivers: block: add block device cache
Eric Nelson
eric at nelint.com
Thu Mar 31 22:24:34 CEST 2016
Hi Stephen,
On 03/30/2016 02:57 PM, Stephen Warren wrote:
> On 03/30/2016 11:34 AM, Eric Nelson wrote:
>> Thanks again for the detailed review, Stephen.
>>
>> On 03/30/2016 07:36 AM, Stephen Warren wrote:
>>> On 03/28/2016 11:05 AM, Eric Nelson wrote:
>>>> Add a block device cache to speed up repeated reads of block devices by
>>>> various filesystems.
>>>> diff --git a/disk/part.c b/disk/part.c
>>>
>>>> @@ -268,6 +268,8 @@ void part_init(struct blk_desc *dev_desc)
>>>> const int n_ents = ll_entry_count(struct part_driver,
>>>> part_driver);
>>>> struct part_driver *entry;
>>>>
>>>> + blkcache_invalidate(dev_desc->if_type, dev_desc->devnum);
>>>
>>> Doesn't this invalidate the cache far too often? I expect that function
>>> is called for each command the user executes from the command-line, whereas
>>> it'd be nice if the cache persisted across commands. I suppose this is a
>>> reasonable (and very safe) first implementation though, and saves having
>>> to go through each storage provider type and find out the right place to
>>> detect media changes.
>>
>> I'm not sure it does. I traced through the mmc initialization and it's
>> only called when the card itself is initialized.
>
> I don't believe U-Boot caches the partition structure across user
> commands. Doesn't each user command (e.g. part list, ls, load, save)
> first look up the block device, then scan the partition table, then
> "mount" the filesystem, then perform its action, then throw all that
> state away? Conversely, "mmc rescan" only happens under explicit user
> control. Still as I said, the current implementation is probably fine to
> start with, and at least is safe.
>
At least for MMC, this isn't the case. Various filesystem commands
operate without calling part_init.
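To make the behaviour concrete: entries stay cached across commands until
something invalidates them for that particular device. The following is a
rough standalone sketch of a per-device invalidate, not the code from the
patch. The names sketch_entry, sketch_cache_add, sketch_cache_count and
sketch_cache_invalidate are hypothetical, and the entry is reduced to the
two fields used for matching.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified entry: the node in the patch also records the
 * block range and the cached data. */
struct sketch_entry {
	struct sketch_entry *next;
	int iftype;
	int devnum;
};

static struct sketch_entry *sketch_head;

/* Record an entry for a device; returns 0 if out of memory. */
int sketch_cache_add(int iftype, int devnum)
{
	struct sketch_entry *e = malloc(sizeof(*e));

	if (!e)
		return 0;
	e->iftype = iftype;
	e->devnum = devnum;
	e->next = sketch_head;
	sketch_head = e;
	return 1;
}

/* Count the entries currently held for one device. */
unsigned int sketch_cache_count(int iftype, int devnum)
{
	const struct sketch_entry *e;
	unsigned int count = 0;

	for (e = sketch_head; e; e = e->next)
		if (e->iftype == iftype && e->devnum == devnum)
			count++;
	return count;
}

/* Drop every entry for one device and leave other devices untouched.
 * Returns the number of entries that were dropped. */
unsigned int sketch_cache_invalidate(int iftype, int devnum)
{
	struct sketch_entry **pp = &sketch_head;
	struct sketch_entry *e;
	unsigned int dropped = 0;

	while ((e = *pp) != NULL) {
		if (e->iftype == iftype && e->devnum == devnum) {
			*pp = e->next;	/* unlink, then free */
			free(e);
			dropped++;
		} else {
			pp = &e->next;
		}
	}
	/* Postcondition: nothing is left for this device. */
	assert(sketch_cache_count(iftype, devnum) == 0);
	return dropped;
}
```

With something like this, a call made only at card initialization leaves
the entries for that device in place between filesystem commands.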
>>>> diff --git a/drivers/block/blkcache.c b/drivers/block/blkcache.c
>>>
>>>> +struct block_cache_node {
>>>> + struct list_head lh;
>>>> + int iftype;
>>>> + int devnum;
>>>> + lbaint_t start;
>>>> + lbaint_t blkcnt;
>>>> + unsigned long blksz;
>>>> + char *cache;
>>>> +};
>>>> +
>>>> +static LIST_HEAD(block_cache);
>>>> +
>>>> +static struct block_cache_stats _stats = {
>>>> + .max_blocks_per_entry = 2,
>>>> + .max_entries = 32
>>>> +};
>>>
>>> Now is a good time to mention another reason why I don't like using a
>>> dynamically allocated linked list for this: Memory fragmentation. By
>>> dynamically allocating the cache, we could easily run into a situation
>>> where the user runs a command that allocates memory and also adds to the
>>> block cache, then most of that memory gets freed when U-Boot returns to
>>> the command prompt, then the user runs the command again but it fails
>>> since it can't allocate the memory due to fragmentation of the heap.
>>> This is a real problem I've seen e.g. with the "ums" and "dfu" commands,
>>> since they might initialize the USB controller the first time they're
>>> run, which allocates some new memory. Static allocation would avoid
>>> this.
>>
>> We're going to allocate a block or set of blocks every time we allocate
>> a new node for the list, so having the list in an array doesn't fix the
>> problem.
>
> We could allocate the data storage for the block cache at the top of RAM
> before relocation, like many other things are allocated, and hence not
> use malloc() for that.
>
Hmmm. We seem to have gone from a discussion about data structures to
one about the type of allocation.
I'm interested in seeing how that works. Can you point me to existing
code that allocates its storage this way?
>> While re-working the code, I also thought more about using an array and
>> still don't see how the implementation doesn't get more complex.
>>
>> The key bit is that the list is implemented in MRU order so
>> invalidating the oldest is trivial.
>
> Yes, the MRU logic would make it more complex. Is that particularly
> useful, i.e. is it an intrinsic part of the speedup?
It's not a question of speed with small numbers of entries. The code
to handle eviction would just be more complex.
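As an illustration of why eviction is trivial with a list kept in MRU
order, here is a minimal standalone sketch. It is not the code from the
patch: it uses a plain singly linked list instead of list_head, and the
names (mru_node, mru_read, mru_fill) and the MRU_MAX_ENTRIES and MRU_BLKSZ
values are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MRU_MAX_ENTRIES 4	/* assumed small limit for the example */
#define MRU_BLKSZ 512		/* assumed fixed block size */

struct mru_node {
	struct mru_node *next;
	int devnum;
	unsigned long start;	/* first block held by this entry */
	unsigned long blkcnt;	/* number of blocks held */
	char *data;
};

static struct mru_node *mru_head;	/* most recently used entry first */
static unsigned int mru_entries;

/* Look up a block range. On a hit, copy the data out and move the entry
 * to the front of the list. Returns 1 on a hit, 0 on a miss. */
int mru_read(int devnum, unsigned long start, unsigned long blkcnt, void *buf)
{
	struct mru_node **pp, *n;

	for (pp = &mru_head; (n = *pp) != NULL; pp = &n->next) {
		if (n->devnum != devnum || start < n->start ||
		    start + blkcnt > n->start + n->blkcnt)
			continue;
		*pp = n->next;		/* unlink from current position */
		n->next = mru_head;	/* relink at the front */
		mru_head = n;
		memcpy(buf, n->data + (start - n->start) * MRU_BLKSZ,
		       blkcnt * MRU_BLKSZ);
		return 1;
	}
	return 0;
}

/* Add a new entry at the front. If the cache is full, the tail of the
 * list is the least recently used entry, so it is the one evicted.
 * Returns 1 on success, 0 on failure. */
int mru_fill(int devnum, unsigned long start, unsigned long blkcnt,
	     const void *buf)
{
	struct mru_node **pp, *n;

	if (blkcnt == 0)
		return 0;
	if (mru_entries == MRU_MAX_ENTRIES) {
		assert(mru_head != NULL);	/* full implies non-empty */
		for (pp = &mru_head; (*pp)->next; pp = &(*pp)->next)
			;
		n = *pp;			/* tail entry */
		*pp = NULL;
		free(n->data);
		free(n);
		mru_entries--;
	}
	n = malloc(sizeof(*n));
	if (!n)
		return 0;
	n->data = malloc(blkcnt * MRU_BLKSZ);
	if (!n->data) {
		free(n);
		return 0;
	}
	memcpy(n->data, buf, blkcnt * MRU_BLKSZ);
	n->devnum = devnum;
	n->start = start;
	n->blkcnt = blkcnt;
	n->next = mru_head;
	mru_head = n;
	mru_entries++;
	return 1;
}
```

An array-based version would need either a per-entry age counter or a
shuffle of the slots on every hit to get the same behaviour, which is the
extra complexity I'd like to avoid.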
Given that the command "blkcache configure 0 0" discards all cached
data, and since both dfu and ums should have the cache disabled anyway,
I'd like to proceed as-is with the list and heap approach.
A follow-up change to use another form of allocation is unlikely to
change the primary interfaces, though I can't be sure until I
understand how these allocations would occur.
I have a V3 prepped that addresses your other comments.
To reiterate the impact of this code, I have use cases where file
loading takes minutes when it should take seconds and suspect that
others have been seeing the same for quite some time.
Let me know your thoughts.
Regards,
Eric