[U-Boot] UBIFS mount bug when mounting from multiple MTD partitions

Schrempf Frieder frieder.schrempf at kontron.de
Thu Apr 11 07:50:11 UTC 2019


On 10.04.19 15:59, Heiko Schocher wrote:
> Hello Frieder,
> 
> Am 10.04.2019 um 15:31 schrieb Schrempf Frieder:
>> Hi Heiko,
>>
>> On 10.04.19 14:44, Heiko Schocher wrote:
>>> Hello Frieder,
>>>
>>> Am 10.04.2019 um 12:49 schrieb Schrempf Frieder:
>>>> Hi,
>>>>
>>>> I have a customer who has a NAND device with two MTD partitions and
>>>> each > of the partitions contains one UBI volume with a UBIFS 
>>>> filesystem.
>>>
>>> Bad idea ... why?
>>>
>>> You may loose lifetime of the board, as UBI cannot use PEBs between the
>>> 2 MTD partitions on the nand ... better you would have one big MTD
>>> Partition
>>> with n ubi volumes in it ...
>>>
>>> But ... okay... this must work also.
>>
>> Yeah, I only recently learned about the disadvantages of this setup.
>> Maybe we can change it in the future, but for now they are using
>> separate partitions.
> 
> Ok.
> 
>>>> Now U-Boot can mount the UBIFS from the first partition just fine, but
>>>> if the UBIFS from the second partition is mounted afterwards this fails
>>>> in some cases.
>>>
>>> :-(
>>>
>>>> I can reproduce the error and tracked it down to uboot_ubifs_mount() in
>>>> fs/ubifs/super.c. If this function is run for the second mount, the
>>>> struct ubifs_fs_type is reused and it contains a list fs_supers, that
>>>> still holds one entry for the first mount.
>>>
>>> Sure?
>>>
>>> fs_supers in struct file_system_type seems used only in none
>>> U-Boot code...
>>
>> Right, I had a closer look and fs_supers seems to be unused indeed, but
>> somehow it causes corruption in my case. When I apply 5a08cfee3967
>> (ubifs: remove useless code) the problem disappears.
> 
> Ah, 5a08cfee3967 is in mainline ... good ;-)
> 
>> Without this patch there still is hlist_add_head(&s->s_instances,
>> &type->fs_supers) and this line somehow seems to cause the error.
>>
>> I applied this debug diff:
>>
>> --- a/fs/ubifs/super.c
>> +++ b/fs/ubifs/super.c
>> @@ -2428,7 +2428,15 @@ retry:
>>    #else
>>           strncpy(s->s_id, type->name, sizeof(s->s_id));
>>    #endif
>> +       printf("%s:%d: ubi_num: %d, vol_id: %d\n",
>> +              __func__, __LINE__,
>> +              ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num,
>> +              ((struct ubifs_info *)s->s_fs_info)->vi.vol_id);
>>           hlist_add_head(&s->s_instances, &type->fs_supers);
>> +       printf("%s:%d: ubi_num: %d, vol_id: %d\n",
>> +              __func__, __LINE__,
>> +              ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num,
>> +              ((struct ubifs_info *)s->s_fs_info)->vi.vol_id);
>>    #ifndef __UBOOT__
>>           spin_unlock(&sb_lock);
>>           get_filesystem(type);
>>
>> And I'm getting this for the first mount:
>>
>> sget:2431: ubi_num: 0, vol_id: 0
>> sget:2433: ubi_num: 0, vol_id: 0
>>
>> And this for the second mount:
>>
>> sget:2431: ubi_num: 0, vol_id: 0
>> sget:2433: ubi_num: -1678121552, vol_id: -1678120656
> 
> Hmm...
> 
>>>> I guess, that if the second mount would happen on a volume that is on
>>>> the same MTD partition as the first volume, than this will work. The
>>>> second entry is added to ubifs_fs_type.fs_supers.
>>>
>>> I cannot see this from looking into code ... so hard to say, but
>>> I only looked into mainline code ...
>>
>> Yeah, I was probably wrong with these first wild guesses...
> 
> ;-)
> 
>>>> In my case however, the second entry being added to
>>>> ubifs_fs_type.fs_supers is invalid and causes the mount error.
>>>>
>>>> Reinitializing the list in uboot_ubifs_mount() before each mount, 
>>>> solves
>>>> the problem, but I guess that it will cause failures in other setups,
>>>> where there are actually multiple volumes on one MTD device.
>>>>
>>>> So how can I solve this properly? Do we need one instance of struct
>>>> ubifs_fs_type for each MTD device?
>>>
>>> Hmm.. without digging into it, it is difficult to say...
>>>
>>>> I tested this on an old version (2017.03), but looking at the current
>>>> code, it looks like the same problem applies to current mainline.
>>>
>>> Is there any chance to try it with current mainline ?
>>
>> The problem is a bit strange and this what I'm actually worried about.
>> It is persistent in a certain environment: U-Boot loaded from SPI NOR,
>> environment set to certain values, data written to UBIFS partition in
>> Linux and then power-cut.
>>
>> If one of these conditions changes, the error usually disappears, for
>> example if I use the exact same setup, but load the Bootloader from MMC
>> or RAM. Or if no write access with power-cut happens.
>>
>> So I wonder if there's some memory corruption somewhere else. Though,
>> the error happens always at the same place. Debug prints or other code
>> changes have no influence.
>>
>> I really would like to understand what's going on so I can make sure
>> that 5a08cfee3967 actually solves the real issue or just hides it.
> 
> Please keep me informed ... I try to reproduce it here on a hw I
> have access to, but need some "free" time ...

So I got down to the roots of the problem. On the second mount 
operation, the fs_supers list contains a reference to the super_block of 
the previous mount, that has local scope and is not valid anymore.

Upon calling hlist_add_head(&s->s_instances, &type->fs_supers), the new 
super_block is added to the list fs_supers as a second entry. As this is 
a double linked list, the first entries pprev pointer is updated, what 
causes a corruption.

So 5a08cfee3967 is not just a simple cleanup change, as noted in the 
commit message. It also fixes a real bug by removing the update of the 
unused list.

Regards,
Frieder


More information about the U-Boot mailing list