[EXTERNAL] Re: [PATCH 0/4] fs: ubifs: Fix crash and add safeguards

Michael Nazzareno Trimarchi michael at amarulasolutions.com
Mon Aug 5 10:58:31 CEST 2024


Hi all

On Mon, Aug 5, 2024 at 10:49 AM Alexander Dahl <ada at thorsis.com> wrote:
>
> Hello,
>
> Am Mon, Aug 05, 2024 at 07:28:19AM +0200 schrieb Heiko Schocher:
> > Hello Ravi,
> >
> > On 01.08.24 21:39, Ravi Minnikanti wrote:
> > > Hi Heiko, Alexander,
> > >
> > > On 7/31/24 23:54, Heiko Schocher wrote:
> > > > Hello Alexander, On 01. 08. 24 08: 50, Alexander Dahl wrote: > Hei, > > Am Thu,
> > > > Jul 04, 2024 at 10: 18: 55AM +0200 schrieb Alexander Dahl: >> Hello Heiko, >> >>
> > > > Am Thu, Jul 04, 2024 at 06: 28: 31AM +0200 schrieb Heiko Schocher:
> > > >
> > > >
> > > > Hello Alexander,
> > > >
> > > > On 01.08.24 08:50, Alexander Dahl wrote:
> > > > > Hei,
> > > > >
> > > > > Am Thu, Jul 04, 2024 at 10:18:55AM +0200 schrieb Alexander Dahl:
> > > > > > Hello Heiko,
> > > > > >
> > > > > > Am Thu, Jul 04, 2024 at 06:28:31AM +0200 schrieb Heiko Schocher:
> > > > > > > Hello Alexander,
> > > > > > >
> > > > > > > On 03.07.24 12:12, Alexander Dahl wrote:
> > > > > > > > Hei hei,
> > > > > > > >
> > > > > > > > filesystem handling is different in U-Boot and beyond that UBI/UBIFS is
> > > > > > > > different from other filesystems in U-Boot.  There's UBI and UBIFS code
> > > > > > > > ported from Linux (quite old already now, maybe someone wants to update
> > > > > > > > that?), and there's "glue code" or "wrapper code" to interface with
> > > > > > > > U-Boot scripts, commands, and filesystem handling.  The fixes and
> > > > > > > > improvements in this patch series are for this U-Boot specific glue
> > > > > > > > code.
> > > > > > >
> > > > > > > Yes, the linux base is very old ... patches are welcome!
> > > > > >
> > > > > > The last sync was back in 2015 from linux v4.2, there were 800+
> > > > > > changes to ubi/ubifs in Linux since then. :-/
> > > > > >
> > > > > > > And for me it is not that easy, as I do not have a hardware with
> > > > > > > current mainline U-Boot running on it... I want to update a hardware
> > > > > > > I have to current mainline, but I had no time yet for it...
> > > > > >
> > > > > > Besides the custom hardware here, I used Microchip SAM9X60-Curiosity
> > > > > > lately, which is coming with a raw NAND flash and can boot from it.
> > > > > >
> > > > > > >
> > > > > > > > I'm no filesystem expert, but after days of debugging I'm quite sure the
> > > > > > > > bug is in U-Boot since UBIFS support was added in 2009, and it was
> > > > > > > > repeated in 2015 when generic filesystem support for UBIFS was added.
> > > > > > > > So please review carefully!
> > > > > > >
> > > > > > > Which bug?
> > > > > >
> > > > > > The memory leak and double free fixed with patch 1 of the series.
> > > > > >
> > > > > > >
> > > > > > > > The crashes were not easily reproducible, only with boards using the old
> > > > > > > > distroboot _and_ a boot script inspired by (but not equal to) the one
> > > > > > > > proposed by RAUC [1], which basically boils down to:
> > > > > > > >
> > > > > > > >      ubifsmount ubi0:boot (from distroboot)
> > > > > > > >      test -e (from distroboot)
> > > > > > > >      ubifsmount ubi0:rootfs1 (this time from the boot script,
> > > > > > > >                               triggering a ubifs_umount)
> > > > > > >
> > > > > > > So, you have a special sequence you execute to trigger the bug, good!
> > > > > > >
> > > > > > > In special 2 ubifsmount in a row... may not often needed for booting!
> > > > > > > (I ask me, why that is needed? Boottime is not good than...)
> > > > > >
> > > > > > Using distroboot with a script here.  The script is in a separate UBI
> > > > > > volume ubi0:boot, kernel is loaded from ubi0:rootfs1 or ubi0:rootfs2
> > > > > > however.  So there is 'ubifsmount ubi0:boot' from distroboot and in the
> > > > > > script found, loaded, and run there is 'ubifsmount ubi0:rootfs1' (or
> > > > > > rootfs2) later.  ubifsmount calls ubifsumount internally if some
> > > > > > volume is mounted already.
> > > > > >
> > > > > > >
> > > > > > > BTW: Is this really a good bootcmd in [1] as on *every* boot your
> > > > > > >        Environment is saved? This is not good for lifetime of your
> > > > > > >        storage device ... why not using bootcounter?
> > > > > >
> > > > > > Well, I was not aware of bootcounter, but I had a look and the actual
> > > > > > counter must be stored somewhere.  Possible are:
> > > > > >
> > > > > > - pmic → has no storage possibility on my board
> > > > > > - rtc → soc internal only, volatile in the end (if battery is empty)
> > > > > > - i2c eeprom → missing
> > > > > > - spi flash → missing
> > > > > > - filesystem → ends up on the flash
> > > > > > - nvmem → no other nvmems present
> > > > > > - syscon or some cpu register or sram → volatile
> > > > > >
> > > > > > So none of these are possible in my case, I only have a raw NAND as
> > > > > > storage and thus I have to use U-Boot env, which is stored in UBI here
> > > > > > btw to not stress the flash too much.
> > > > > >
> > > > > > I could investigate if it would be possible to let RAUC use the
> > > > > > U-Boot bootcounter infrastructure, but currently RAUC depends on
> > > > > > U-Boot environment variables for tracking boot attempts.
> > > > > >
> > > > > > btw: documentation of bootcount is sparse, I only found the very short
> > > > > > 'doc/README.bootcount' and it's not even migrated to recent U-Boot
> > > > > > sphinx based docs. ;-)
> > > > > >
> > > > > > But from what I understood the concept is the same, U-Boot counts
> > > > > > something and Linux userspace has to reset it.  The counter must be
> > > > > > stored somewhere, for example in U-Boot env if no other storage is
> > > > > > possible.
> > > > > >
> > > > > > >
> > > > > > > > Crashes can be triggered more easily, if patch order is changed and
> > > > > > > > patch 2 (resetting pointers to NULL after free) comes first, or if patch
> > > > > > > > 2 is applied on its own only.
> > > > > > >
> > > > > > > Hmm...
> > > > > > >
> > > > > > > > The fix is in the first patch, and on my boards I see no crashes
> > > > > > > > anymore.  I also tested all kinds of combinations of calling `ubi part`,
> > > > > > > > `ubi detach`, `ubifsmount`, `ubifsumount`, `ubifsls`, `ubifsload`, `ls`,
> > > > > > > > `load`, `size`, and `test -e` and got no crashes anymore after the fix.
> > > > > > >
> > > > > > > That sounds good! Hmm.. test -e has nothing to do with ubi/ubifs I think.
> > > > > >
> > > > > > Oh it has, 'test -e' calls file_exists() which calls fs_exists() which
> > > > > > ends up calling ubifs_exists() which is one of the functions causing
> > > > > > an immediate memory leak, see patch 1.
> > > > > >
> > > > > > > On what hardware do you test? Is it in mainline?
> > > > > >
> > > > > > Tested on custom hardware, but I'm confident it should be reproducible
> > > > > > on any board using ubifs, especially after applying patch 2 resetting
> > > > > > pointers of freed memory to NULL.  This should trigger the bug with
> > > > > > the simple sequence already described:
> > > > > >
> > > > > >     > ubifsmount ubi0:anyvolume
> > > > > >     > ls ubi ubi0:anyvolume / # (or load, or test -e, or size)
> > > > > >     > ubifsumount
> > > > > >
> > > > > > ubifsumount will call ubifs_umount() which calls
> > > > > > ubi_close_volume(c->ubi), that pointer is either invalid leading to a
> > > > > > double free inside of ubi_close_volume() and it will crash only in
> > > > > > certain conditions or the pointer is NULL after applying patch 2 of
> > > > > > the series, then ubi_close_volume() crashes right away with a NULL
> > > > > > pointer exception.
> > > > > >
> > > > > > Note: without patch 2 it very much depends on the sequence of
> > > > > > commands, but after the first ubi_close_volume() triggered by
> > > > > > ls/load/size/exists the pointer in ubifs_sb is invalid, but accessed later
> > > > > > by the second ubi_close_volume() triggered by ubifs_umount().  If you
> > > > > > do something in between those using the freed memory by something else
> > > > > > again, the second ubi_close_volume() access might get corrupted data
> > > > > > or access things outside of RAM.  Patch 2 redirects this on a clean
> > > > > > NULL pointer exception you can easily trigger.
> > > > > >
> > > > > > In my case I got a pointer variable actually containing a string
> > > > > > "ng.." aka 0x2e2e676e which looked suspiciously similar to a valid
> > > > > > pointer on the platform somewhere in RAM between 0x20000000 and
> > > > > > 0x28000000 so it took me two days to realize it's not a pointer. ;-)
> > > > > >
> > > > > > >
> > > > > > > > The three additional patches (2 to 4) are more or less safeguards and
> > > > > > > > improvements for the future, and come from me trying and my debugging
> > > > > > > > code done on the way, more or less optional, but I think nice to have.
> > > > > > >
> > > > > > > I will look at them .. but give me some time, as I am in holidays the
> > > > > > > next 2 weeks ... Hmm.. and it would be good to get some Tested-by
> > > > > > > from people with hardware...
> > > > > >
> > > > > > Take your time, no need to work in holidays.  Would appreciate a
> > > > > > Tested-by by anyone else though, maybe some of the raw NAND folks?
> > > > >
> > > > > Well, apparently nobody had a look in the month of July, I add the raw
> > > > > NAND maintainers in Cc, maybe I should have done in the first place.
> > > > >
> > > > > Would be happy if someone could have a look at the fix, maybe read the
> > > > > patches first before the discussion? ;-)
> > > >
> > > > I asked Ravi and Alexey (added to cc) if they have time to look it they
> > > > can reproduce the issue and test your patches...
> > > >
> > > > bye,
> > > > Heiko
> > >
> > > I tried to reproduce with CONFIG_SYS_MALLOC_LEN reduced to 1MB and
> > > calling ubifsls in a loop, logic below:
> > >


> > > ubi part ubi0:rootfs
> > > ubifsmount ubi0:rootfs
> > > setenv i 1
> > > while test $i -le 200; do
> > >      ubifsls a/b/test.bin
> > >      setexpr i $i + 1
> > >      sleep 1
> > >      echo "Loop count: $i"
> > > done
> >
> > Thanks for testing! But it is not exactly the same sequence Alexander
> > described ... but you also trigger a bug.
>

diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c
index 048730db7f..33df4ff51f 100644
--- a/fs/ubifs/ubifs.c
+++ b/fs/ubifs/ubifs.c
@@ -557,6 +557,7 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)

                        /* We have some sort of symlink recursion, bail out */
                        if (symlink_count++ > 8) {
+                               ubifs_iput(inode);
                                printf("Symlink recursion, aborting\n");
                                return 0;
                        }
@@ -568,6 +569,7 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)
                                 * the leading slash */
                                next = name = link_name + 1;
                                root_inum = 1;
+                               ubifs_iput(inode);
                                continue;
                        }
                        /* Relative to cur dir */
@@ -575,6 +577,7 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)
                                        link_name, next == NULL ? "" : next);
                        memcpy(symlinkpath, buf, sizeof(buf));
                        next = name = symlinkpath;
+                       ubifs_iput(inode);
                        continue;
                }

@@ -583,8 +586,10 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)
                 */

                /* Found the node!  */
-               if (!next || *next == '\0')
+               if (!next || *next == '\0') {
+                       ubifs_iput(inode);
                        return inum;
+               }

                root_inum = inum;
                name = next;


Should we not free inode_iget?

Michael

> +1 … thanks for testing. :-)
>
> > > It crashes with or without patch. With patch it takes 4,5 loops more to crash.
> > >
> > > Log:
> > > UBIFS error (ubi0:0 pid 0): ubifs_iget: failed to read inode 85, error -12
> > > "Synchronous Abort" handler, esr 0x96000004
> >
> > -12 = -ENOMEM
> >
> > So it seems, we have a memleak...
>
> If with my patch it takes more loops to crash, this suggests we have
> more than one memory leak, and of them got fixed with my patch(es).
>
> > Yes, ubi/ubifs linux base is very old, but I tend to think that problem is
> > in U-Boot adaption layer...
>
> I would also suspect the adaption layer.
>
> However, although my patch fixes one memory leak, that was not the
> main reason doing this.  The reason of the crash in my case was the
> double free, and accessing invalid memory afterwards.  So from my
> perspective the patches are okay on their own, and looking after more
> memory leaks should not be part of this series.
>
> Greets
> Alex
>
> P.S.: not sure what's causing this, but I only got the mails from you
> all through the mailing list.  You can keep me in Cc, so I won't
> overlook your answers easily.
>
> >
> > bye,
> > Heiko
> > >
> > >
> > > Thanks,
> > > Ravi
> > > > >
> > > > > Greets
> > > > > Alex
> > > > >
> > > > > >
> > > > > > Greets
> > > > > > Alex
> > > > > >
> > > > > > >
> > > > > > > bye,
> > > > > > > Heiko
> > > > > > > >
> > > > > > > > Greets
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rauc_rauc_blob_master_contrib_uboot.sh&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3apjoJolwzwH82pd_c1HzSLzzrwfu9x1lSm802edgOg&m=lpy642UhqtTXHV0dd0xnOtQxkQSO5RC6FwcUbzt-FsNwd3QSXZsulfqC6dP9qV7G&s=wbnw_T9_r6taMKIFqJ2-wtj9Zv02OD7mbIy-ZZFRjRE&e=  <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rauc_rauc_blob_master_contrib_uboot.sh&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3apjoJolwzwH82pd_c1HzSLzzrwfu9x1lSm802edgOg&m=lpy642UhqtTXHV0dd0xnOtQxkQSO5RC6FwcUbzt-FsNwd3QSXZsulfqC6dP9qV7G&s=wbnw_T9_r6taMKIFqJ2-wtj9Zv02OD7mbIy-ZZFRjRE&e=>
> > > > > > > >
> > > > > > > > Alexander Dahl (4):
> > > > > > > >      fs: ubifs: Fix memleak and double free in u-boot wrapper functions
> > > > > > > >      fs: ubifs: Set pointers to NULL after free
> > > > > > > >      fs: ubifs: Make k(z)alloc/kfree symmetric
> > > > > > > >      fs: ubifs: Add volume mounted check
> > > > > > > >
> > > > > > > >     fs/ubifs/super.c |  8 ++++++--
> > > > > > > >     fs/ubifs/ubifs.c | 31 +++++++++++++++++++------------
> > > > > > > >     2 files changed, 25 insertions(+), 14 deletions(-)
> > > > > > > >
> > > > > > > >
> > > > > > > > base-commit: 65fbdab27224ee3943a89496b21862db83c34da2
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > DENX Software Engineering GmbH,      Managing Director: Erika Unter
> > > > > > > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> > > > > > > Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: hs at denx.de
> > > >
> > > > --
> > > > DENX Software Engineering GmbH,      Managing Director: Erika Unter
> > > > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> > > > Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: hs at denx.de
> > > >
> >
> > --
> > DENX Software Engineering GmbH,      Managing Director: Erika Unter
> > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> > Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: hs at denx.de



-- 
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael at amarulasolutions.com
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info at amarulasolutions.com
www.amarulasolutions.com


More information about the U-Boot mailing list