[U-Boot] UBIFS mount bug when mounting from multiple MTD partitions
Schrempf Frieder
frieder.schrempf at kontron.de
Wed Apr 10 13:31:27 UTC 2019
Hi Heiko,
On 10.04.19 14:44, Heiko Schocher wrote:
> Hello Frieder,
>
> Am 10.04.2019 um 12:49 schrieb Schrempf Frieder:
>> Hi,
>>
>> I have a customer who has a NAND device with two MTD partitions and
>> each > of the partitions contains one UBI volume with a UBIFS filesystem.
>
> Bad idea ... why?
>
> You may loose lifetime of the board, as UBI cannot use PEBs between the
> 2 MTD partitions on the nand ... better you would have one big MTD
> Partition
> with n ubi volumes in it ...
>
> But ... okay... this must work also.
Yeah, I only recently learned about the disadvantages of this setup.
Maybe we can change it in the future, but for now they are using
separate partitions.
>
>> Now U-Boot can mount the UBIFS from the first partition just fine, but
>> if the UBIFS from the second partition is mounted afterwards this fails
>> in some cases.
>
> :-(
>
>> I can reproduce the error and tracked it down to uboot_ubifs_mount() in
>> fs/ubifs/super.c. If this function is run for the second mount, the
>> struct ubifs_fs_type is reused and it contains a list fs_supers, that
>> still holds one entry for the first mount.
>
> Sure?
>
> fs_supers in struct file_system_type seems used only in none
> U-Boot code...
Right, I had a closer look and fs_supers seems to be unused indeed, but
somehow it causes corruption in my case. When I apply 5a08cfee3967
(ubifs: remove useless code) the problem disappears.
Without this patch there still is hlist_add_head(&s->s_instances,
&type->fs_supers) and this line somehow seems to cause the error.
I applied this debug diff:
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2428,7 +2428,15 @@ retry:
#else
strncpy(s->s_id, type->name, sizeof(s->s_id));
#endif
+ printf("%s:%d: ubi_num: %d, vol_id: %d\n",
+ __func__, __LINE__,
+ ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num,
+ ((struct ubifs_info *)s->s_fs_info)->vi.vol_id);
hlist_add_head(&s->s_instances, &type->fs_supers);
+ printf("%s:%d: ubi_num: %d, vol_id: %d\n",
+ __func__, __LINE__,
+ ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num,
+ ((struct ubifs_info *)s->s_fs_info)->vi.vol_id);
#ifndef __UBOOT__
spin_unlock(&sb_lock);
get_filesystem(type);
And I'm getting this for the first mount:
sget:2431: ubi_num: 0, vol_id: 0
sget:2433: ubi_num: 0, vol_id: 0
And this for the second mount:
sget:2431: ubi_num: 0, vol_id: 0
sget:2433: ubi_num: -1678121552, vol_id: -1678120656
>> I guess, that if the second mount would happen on a volume that is on
>> the same MTD partition as the first volume, than this will work. The
>> second entry is added to ubifs_fs_type.fs_supers.
>
> I cannot see this from looking into code ... so hard to say, but
> I only looked into mainline code ...
Yeah, I was probably wrong with these first wild guesses...
>
>> In my case however, the second entry being added to
>> ubifs_fs_type.fs_supers is invalid and causes the mount error.
>>
>> Reinitializing the list in uboot_ubifs_mount() before each mount, solves
>> the problem, but I guess that it will cause failures in other setups,
>> where there are actually multiple volumes on one MTD device.
>>
>> So how can I solve this properly? Do we need one instance of struct
>> ubifs_fs_type for each MTD device?
>
> Hmm.. without digging into it, it is difficult to say...
>
>> I tested this on an old version (2017.03), but looking at the current
>> code, it looks like the same problem applies to current mainline.
>
> Is there any chance to try it with current mainline ?
The problem is a bit strange and this what I'm actually worried about.
It is persistent in a certain environment: U-Boot loaded from SPI NOR,
environment set to certain values, data written to UBIFS partition in
Linux and then power-cut.
If one of these conditions changes, the error usually disappears, for
example if I use the exact same setup, but load the Bootloader from MMC
or RAM. Or if no write access with power-cut happens.
So I wonder if there's some memory corruption somewhere else. Though,
the error happens always at the same place. Debug prints or other code
changes have no influence.
I really would like to understand what's going on so I can make sure
that 5a08cfee3967 actually solves the real issue or just hides it.
Thanks,
Frieder
More information about the U-Boot
mailing list