[PATCH] nand: Add a watch command

Michael Nazzareno Trimarchi michael at amarulasolutions.com
Tue Nov 26 21:19:58 CET 2024


Hi Miquel

On Tue, Nov 28, 2023 at 11:56 AM Miquel Raynal
<miquel.raynal at bootlin.com> wrote:
>
> This is a debug command to monitor the retention state of the data on
> the array. The command needs a duplication of the mtd_read_oob()
> function to actually return the maximum number of bitflips encountered
> while reading the page. We could write a specific implementation for the
> Sunxi driver but this is probably enough.
>
>     nand watch <off> <size> - check an area for bitflips
>     nand watch.part <part> - check a partition for bitflips
>     nand watch.chip - check the whole device for bitflips
>
> The output may be a bit verbose and could look like:
>

I have rebased this and will take a look at the patch tomorrow. I think
having it is fine.

Michael

>     => nand watch.chip
>     device 0 whole chip
>     size adjusted to 0xff60000 (5 bad blocks)
>
>     NAND watch for bitflips in area 0x0-0xff60000:
>     Page       0 (0x00000000) -> error -74
>     Page       1 (0x00000800) -> error -74
>     Page       2 (0x00001000) -> error -74
>     Page       3 (0x00001800) -> error -74
>     Page       4 (0x00002000) -> error -74
>     Page       5 (0x00002800) -> error -74
>     Page       6 (0x00003000) -> error -74
>     Page       7 (0x00003800) -> error -74
>     Page       8 (0x00004000) -> error -74
>     Page       9 (0x00004800) -> error -74
>     Page      10 (0x00005000) -> error -74
>     Page      11 (0x00005800) -> error -74
>     Page      12 (0x00006000) -> error -74
>     Page      13 (0x00006800) -> error -74
>     Page      14 (0x00007000) -> error -74
>     Page      15 (0x00007800) -> error -74
>     Page      16 (0x00008000) -> error -74
>     Page      17 (0x00008800) -> error -74
>     Page      18 (0x00009000) -> error -74
>     Page      19 (0x00009800) -> error -74
>     Page      20 (0x0000a000) -> error -74
>     Page      21 (0x0000a800) -> error -74
>     Page      22 (0x0000b000) -> error -74
>     Page      23 (0x0000b800) -> error -74
>     Page    1110 (0x0022b000) -> up to  1 bf/chunk
>     Page    1122 (0x00231000) -> up to  1 bf/chunk
>     Page    1132 (0x00236000) -> up to  1 bf/chunk
>     Page    1362 (0x002a9000) -> up to  1 bf/chunk
>     Page    4990 (0x009bf000) -> up to  1 bf/chunk
>     Page    5728 (0x00b30000) -> up to  1 bf/chunk
>     Page    7116 (0x00de6000) -> up to  1 bf/chunk
>     Page    7160 (0x00dfc000) -> up to  1 bf/chunk
>     Page    7494 (0x00ea3000) -> up to  1 bf/chunk
>     Page   10842 (0x0152d000) -> up to  1 bf/chunk
>     Page   11614 (0x016af000) -> up to  1 bf/chunk
>     Page   11970 (0x01761000) -> up to  1 bf/chunk
>     Page   12536 (0x0187c000) -> up to  1 bf/chunk
>     Page   12687 (0x018c7800) -> up to  1 bf/chunk
>     Page   14298 (0x01bed000) -> up to  1 bf/chunk
>     Page   18268 (0x023ae000) -> up to  1 bf/chunk
>     Page   18760 (0x024a4000) -> up to  1 bf/chunk
>     Page   21440 (0x029e0000) -> up to  1 bf/chunk
>     Page   22336 (0x02ba0000) -> up to  1 bf/chunk
>     Page   22592 (0x02c20000) -> up to  1 bf/chunk
>     Page   23872 (0x02ea0000) -> up to  1 bf/chunk
>     Page   27584 (0x035e0000) -> up to  1 bf/chunk
>     Page   35008 (0x04460000) -> up to  1 bf/chunk
>     Page   37184 (0x048a0000) -> up to  1 bf/chunk
>     Page   41728 (0x05180000) -> up to  1 bf/chunk
>     Page   42176 (0x05260000) -> up to  1 bf/chunk
>     Page   43200 (0x05460000) -> up to  1 bf/chunk
>     Page   43328 (0x054a0000) -> up to  1 bf/chunk
>     Page   45376 (0x058a0000) -> up to  1 bf/chunk
>     Page   47040 (0x05be0000) -> up to  1 bf/chunk
>     Page   47552 (0x05ce0000) -> up to  1 bf/chunk
>     Page   49344 (0x06060000) -> up to  1 bf/chunk
>     Page   49856 (0x06160000) -> up to  1 bf/chunk
>     Page   62784 (0x07aa0000) -> up to  1 bf/chunk
>     Page   65153 (0x07f40800) -> up to  1 bf/chunk
>     Page   65228 (0x07f66000) -> up to  1 bf/chunk
>     Page   65382 (0x07fb3000) -> up to  1 bf/chunk
>     Page   98624 (0x0c0a0000) -> up to  1 bf/chunk
>     Page  101952 (0x0c720000) -> up to  1 bf/chunk
>     Page  107584 (0x0d220000) -> up to  1 bf/chunk
>     Page  118208 (0x0e6e0000) -> up to  1 bf/chunk
>     Page  126656 (0x0f760000) -> up to  1 bf/chunk
>     Page  127680 (0x0f960000) -> up to  1 bf/chunk
>     Page  129920 (0x0fdc0000) -> up to  1 bf/chunk
>     Maximum number of bitflips: 1
>     Pages with bitflips: 44/130752
>
> It is also possible to reduce the output with the .quiet suffix in order
> to just show the summary.
>
>     => nand watch.chip.quiet
>     device 0 whole chip
>     size adjusted to 0xff60000 (5 bad blocks)
>
>     NAND watch for bitflips in area 0x0-0xff60000:
>     Maximum number of bitflips: 1
>     Pages with bitflips: 44/130752
>
> Signed-off-by: Miquel Raynal <miquel.raynal at bootlin.com>
> ---
>
> Hello, I recently came across a batch of NANDs with a lot of "natural"
> bitflips so in order to easily and objectively characterize how
> unstable these parts were, I wrote this little tool which was pretty
> handy to have in U-Boot. I believe it can be useful for others as well,
> so here is the patch.
> Cheers, Miquèl
>
>  cmd/Kconfig             |   5 ++
>  cmd/nand.c              | 103 ++++++++++++++++++++++++++++++++++++++++
>  drivers/mtd/mtdcore.c   |  22 +++++++++
>  include/linux/mtd/mtd.h |   1 +
>  4 files changed, 131 insertions(+)
>
> diff --git a/cmd/Kconfig b/cmd/Kconfig
> index 451baa3ecac..0524328d373 100644
> --- a/cmd/Kconfig
> +++ b/cmd/Kconfig
> @@ -1384,6 +1384,11 @@ config CMD_NAND_TORTURE
>         help
>           NAND torture support.
>
> +config CMD_NAND_WATCH
> +       bool "nand watch"
> +       help
> +         NAND watch bitflip support.
> +
>  endif # CMD_NAND
>
>  config CMD_NVME
> diff --git a/cmd/nand.c b/cmd/nand.c
> index 71b8f964429..3bf67f5b65e 100644
> --- a/cmd/nand.c
> +++ b/cmd/nand.c
> @@ -231,6 +231,54 @@ free_dat:
>         return ret;
>  }
>
> +#ifdef CONFIG_CMD_NAND_WATCH
> +static int nand_watch_bf(struct mtd_info *mtd, ulong off, ulong size, bool quiet)
> +{
> +       unsigned int max_bf = 0, pages_wbf = 0;
> +       unsigned int first_page, pages, i;
> +       struct mtd_oob_ops ops = {};
> +       u_char *buf;
> +       int ret;
> +
> +       buf = memalign(ARCH_DMA_MINALIGN, mtd->writesize);
> +       if (!buf) {
> +               puts("No memory for page buffer\n");
> +               return 1;
> +       }
> +
> +       first_page = off / mtd->writesize;
> +       pages = size / mtd->writesize;
> +
> +       ops.datbuf = buf;
> +       ops.len = mtd->writesize;
> +       for (i = first_page; i < first_page + pages; i++) {
> +               ulong addr = mtd->writesize * i;
> +               ret = mtd_read_oob_bf(mtd, addr, &ops);
> +               if (ret < 0) {
> +                       if (quiet)
> +                               continue;
> +
> +                       printf("Page %7d (0x%08lx) -> error %d\n",
> +                              i, addr, ret);
> +               } else if (ret) {
> +                       max_bf = max(max_bf, (unsigned int)ret);
> +                       pages_wbf++;
> +                       if (quiet)
> +                               continue;
> +                       printf("Page %7d (0x%08lx) -> up to %2d bf/chunk\n",
> +                              i, addr, ret);
> +               }
> +       }
> +
> +       printf("Maximum number of bitflips: %u\n", max_bf);
> +       printf("Pages with bitflips: %u/%u\n", pages_wbf, pages);
> +
> +       free(buf);
> +
> +       return 0;
> +}
> +#endif
> +
>  /* ------------------------------------------------------------------------- */
>
>  static int set_dev(int dev)
> @@ -778,6 +826,55 @@ static int do_nand(struct cmd_tbl *cmdtp, int flag, int argc,
>                 return ret == 0 ? 0 : 1;
>         }
>
> +#ifdef CONFIG_CMD_NAND_WATCH
> +       if (strncmp(cmd, "watch", 5) == 0) {
> +               int args = 2;
> +
> +               if (cmd[5]) {
> +                       if (!strncmp(&cmd[5], ".part", 5)) {
> +                               args = 1;
> +                       } else if (!strncmp(&cmd[5], ".chip", 5)) {
> +                               args = 0;
> +                       } else {
> +                               goto usage;
> +                       }
> +               }
> +
> +               if (strlen(cmd) > 10 &&
> +                   !strncmp(&cmd[10], ".quiet", 6))
> +                       quiet = true;
> +
> +               if (argc != 2 + args)
> +                       goto usage;
> +
> +               ret = mtd_arg_off_size(argc - 2, argv + 2, &dev, &off, &size,
> +                                      &maxsize, MTD_DEV_TYPE_NAND, mtd->size);
> +               if (ret)
> +                       return ret;
> +
> +               /* size is unspecified */
> +               if (argc < 4)
> +                       adjust_size_for_badblocks(&size, off, dev);
> +
> +               if ((off & (mtd->writesize - 1)) ||
> +                   (size & (mtd->writesize - 1))) {
> +                       printf("Attempt to read non page-aligned data\n");
> +                       return -EINVAL;
> +               }
> +
> +               ret = set_dev(dev);
> +               if (ret)
> +                       return ret;
> +
> +               mtd = get_nand_dev_by_index(dev);
> +
> +               printf("\nNAND watch for bitflips in area 0x%llx-0x%llx:\n",
> +                      off, off + size);
> +
> +               return nand_watch_bf(mtd, off, size, quiet);
> +       }
> +#endif
> +
>  #ifdef CONFIG_CMD_NAND_TORTURE
>         if (strcmp(cmd, "torture") == 0) {
>                 loff_t endoff;
> @@ -943,6 +1040,12 @@ U_BOOT_LONGHELP(nand,
>         "nand erase.chip [clean] - erase entire chip'\n"
>         "nand bad - show bad blocks\n"
>         "nand dump[.oob] off - dump page\n"
> +#ifdef CONFIG_CMD_NAND_WATCH
> +       "nand watch <off> <size> - check an area for bitflips\n"
> +       "nand watch.part <part> - check a partition for bitflips\n"
> +       "nand watch.chip - check the whole device for bitflips\n"
> +       "\t\t.quiet - Query only the summary, not the details\n"
> +#endif
>  #ifdef CONFIG_CMD_NAND_TORTURE
>         "nand torture off - torture one block at offset\n"
>         "nand torture off [size] - torture blocks from off to off+size\n"
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index aa78d41a55e..2baf92a9056 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -1126,6 +1126,28 @@ int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(mtd_read_oob);
>
> +/* This is a bare copy of mtd_read_oob returning the actual number of bitflips */
> +int mtd_read_oob_bf(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
> +{
> +       int ret_code;
> +       ops->retlen = ops->oobretlen = 0;
> +       if (!mtd->_read_oob)
> +               return -EOPNOTSUPP;
> +       /*
> +        * In cases where ops->datbuf != NULL, mtd->_read_oob() has semantics
> +        * similar to mtd->_read(), returning a non-negative integer
> +        * representing max bitflips. In other cases, mtd->_read_oob() may
> +        * return -EUCLEAN. In all cases, perform similar logic to mtd_read().
> +        */
> +       ret_code = mtd->_read_oob(mtd, from, ops);
> +       if (unlikely(ret_code < 0))
> +               return ret_code;
> +       if (mtd->ecc_strength == 0)
> +               return 0;       /* device lacks ecc */
> +       return ret_code;
> +}
> +EXPORT_SYMBOL_GPL(mtd_read_oob_bf);
> +
>  int mtd_write_oob(struct mtd_info *mtd, loff_t to,
>                                 struct mtd_oob_ops *ops)
>  {
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 09f52698877..28afbb86ea9 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -413,6 +413,7 @@ int mtd_panic_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen,
>                     const u_char *buf);
>
>  int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops);
> +int mtd_read_oob_bf(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops);
>  int mtd_write_oob(struct mtd_info *mtd, loff_t to, struct mtd_oob_ops *ops);
>
>  int mtd_get_fact_prot_info(struct mtd_info *mtd, size_t len, size_t *retlen,
> --
> 2.34.1
>


-- 
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael at amarulasolutions.com
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info at amarulasolutions.com
www.amarulasolutions.com

