[U-Boot] UBIFS mount bug when mounting from multiple MTD partitions

Schrempf Frieder frieder.schrempf at kontron.de
Wed Apr 10 13:31:27 UTC 2019


Hi Heiko,

On 10.04.19 14:44, Heiko Schocher wrote:
> Hello Frieder,
> 
> Am 10.04.2019 um 12:49 schrieb Schrempf Frieder:
>> Hi,
>>
>> I have a customer who has a NAND device with two MTD partitions and
>> each of the partitions contains one UBI volume with a UBIFS filesystem.
> 
> Bad idea ... why?
> 
> You may lose lifetime of the board, as UBI cannot use PEBs between the
> 2 MTD partitions on the NAND ... better you would have one big MTD
> partition with n UBI volumes in it ...
> 
> But ... okay... this must work also.

Yeah, I only recently learned about the disadvantages of this setup. 
Maybe we can change it in the future, but for now they are using 
separate partitions.

> 
>> Now U-Boot can mount the UBIFS from the first partition just fine, but
>> if the UBIFS from the second partition is mounted afterwards this fails
>> in some cases.
> 
> :-(
> 
>> I can reproduce the error and tracked it down to uboot_ubifs_mount() in
>> fs/ubifs/super.c. If this function is run for the second mount, the
>> struct ubifs_fs_type is reused and it contains a list fs_supers, that
>> still holds one entry for the first mount.
> 
> Sure?
> 
> fs_supers in struct file_system_type seems to be used only in
> non-U-Boot code...

Right, I had a closer look and fs_supers seems to be unused indeed, but 
somehow it causes corruption in my case. When I apply 5a08cfee3967 
(ubifs: remove useless code) the problem disappears.

Without this patch, the call hlist_add_head(&s->s_instances,
&type->fs_supers) is still there, and this line somehow seems to cause
the error.
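
For reference, here is a minimal standalone sketch of what hlist_add_head() does, modelled on the helper in include/linux/list.h. The structs are re-declared so that it compiles outside of U-Boot, and old_head_is_written() is a hypothetical helper for illustration only. It shows that adding a new node also writes into the node that was previously first in the list, so a leftover entry from the first mount would be written to during the second mount. Whether that is what actually happens here is only an assumption.

```c
#include <stddef.h>

/* Same layout as the kernel's hlist types, re-declared for a host build. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;	/* write into the OLD head node */
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical helper: returns 1 if adding a second node modified the
 * pprev field of the node that was added first.
 */
static int old_head_is_written(void)
{
	struct hlist_head head = { NULL };
	struct hlist_node first_mount = { NULL, NULL };
	struct hlist_node second_mount = { NULL, NULL };

	hlist_add_head(&first_mount, &head);
	hlist_add_head(&second_mount, &head);

	return head.first == &second_mount &&
	       first_mount.pprev == &second_mount.next;
}
```

If the node from the first mount lived in memory that has been freed or reused in the meantime, that write to first->pprev would land in whatever occupies that memory now.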

I applied this debug diff:

--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2428,7 +2428,15 @@ retry:
  #else
         strncpy(s->s_id, type->name, sizeof(s->s_id));
  #endif
+       printf("%s:%d: ubi_num: %d, vol_id: %d\n",
+              __func__, __LINE__,
+              ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num,
+              ((struct ubifs_info *)s->s_fs_info)->vi.vol_id);
         hlist_add_head(&s->s_instances, &type->fs_supers);
+       printf("%s:%d: ubi_num: %d, vol_id: %d\n",
+              __func__, __LINE__,
+              ((struct ubifs_info *)s->s_fs_info)->vi.ubi_num,
+              ((struct ubifs_info *)s->s_fs_info)->vi.vol_id);
  #ifndef __UBOOT__
         spin_unlock(&sb_lock);
         get_filesystem(type);

And I'm getting this for the first mount:

sget:2431: ubi_num: 0, vol_id: 0
sget:2433: ubi_num: 0, vol_id: 0

And this for the second mount:

sget:2431: ubi_num: 0, vol_id: 0
sget:2433: ubi_num: -1678121552, vol_id: -1678120656
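
The two values after hlist_add_head() do not look like random garbage. A small host-side sketch (plain C, not U-Boot code; as_u32() and print_as_hex() are hypothetical helpers) reinterprets them as unsigned 32-bit values:

```c
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a signed 32-bit value from the log as its unsigned bit pattern. */
static uint32_t as_u32(int32_t v)
{
	return (uint32_t)v;
}

/* Print one field from the debug output in hex. */
static void print_as_hex(const char *name, int32_t v)
{
	printf("%s: 0x%08x\n", name, (unsigned int)as_u32(v));
}
```

Calling print_as_hex("ubi_num", -1678121552) and print_as_hex("vol_id", -1678120656) gives 0x9bf9e5b0 and 0x9bf9e930, which are 0x380 bytes apart. That would fit two pointers having been written over the fields, but this is only a guess.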


>> I guess that if the second mount happened on a volume that is on
>> the same MTD partition as the first volume, then this would work. The
>> second entry is added to ubifs_fs_type.fs_supers.
> 
> I cannot see this from looking into code ... so hard to say, but
> I only looked into mainline code ...

Yeah, I was probably wrong with these first wild guesses...

> 
>> In my case however, the second entry being added to
>> ubifs_fs_type.fs_supers is invalid and causes the mount error.
>>
>> Reinitializing the list in uboot_ubifs_mount() before each mount solves
>> the problem, but I guess that it will cause failures in other setups,
>> where there are actually multiple volumes on one MTD device.
>>
>> So how can I solve this properly? Do we need one instance of struct
>> ubifs_fs_type for each MTD device?
> 
> Hmm.. without digging into it, it is difficult to say...
> 
>> I tested this on an old version (2017.03), but looking at the current
>> code, it looks like the same problem applies to current mainline.
> 
> Is there any chance to try it with current mainline ?

The problem is a bit strange, and this is what I'm actually worried about.
It is persistent in a certain environment: U-Boot loaded from SPI NOR, 
environment set to certain values, data written to UBIFS partition in 
Linux and then power-cut.

If one of these conditions changes, the error usually disappears, for
example if I use the exact same setup but load the bootloader from MMC
or RAM, or if no write followed by a power cut happens.

So I wonder if there's some memory corruption somewhere else. The error
always happens at the same place, though. Debug prints or other code
changes have no influence.

I would really like to understand what's going on, so I can find out
whether 5a08cfee3967 actually solves the real issue or just hides it.

Thanks,
Frieder

