[EXTERNAL] Re: [PATCH] mtd: nand: pxa3xx: Incorrect bitflip return on page read

Tue May 21 22:15:01 CEST 2024

Hi Dario

Can you add to next pull?

Michael

On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti <rminnikanti at marvell.com>
wrote:

> Hi,
>
> Can you please merge this PR, if there are no more review comments?
>
> Thanks,
> Ravi
> On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
>
> > ----------------------------------------------------------------------
> > Hi Ravi
> >
> > On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti at marvell.com>
> wrote:
> >>
> >> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
> >>> Hi Ravi
> >>>
> >>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <
> rminnikanti at marvell.com> wrote:
> >>>>
> >>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
> >>>>
> >>>>>
> ----------------------------------------------------------------------
> >>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <
> judge.packham at gmail.com> wrote:
> >>>>>>
> >>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <
> rminnikanti at marvell.com> wrote:
> >>>>>>>
> >>>>>>> Once a page is read with higher bitflips all subsequent reads
> >>>>>>> are returning the same bitflip value even though they have none.
> >>>>>>> max_bitflip variable is not being reset to 0 across page reads.
> >>>>>>>
> >>>>>>> This is causing problems like incorrectly
> >>>>>>> marking erase blocks bad by UBI and causing read failures.
> >>>>>>>
> >>>>>>> Verified the change with both MTD reads and UBI.
> >>>>>>> This change is inline with other NFC drivers.
> >>>>>>>
> >>>>>>> Sample error log where a block is marked bad incorrectly:
> >>>>>>>
> >>>>>>> ubi0: fixable bit-flip detected at PEB 125
> >>>>>>> ubi0: run torture test for PEB 125
> >>>>>>> ubi0: fixable bit-flip detected at PEB 125
> >>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> >>>>>>> must be bad
> >>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> >>>>>>> ubi0: mark PEB 125 as bad
> >>>>>>>
> >>>>>>> Signed-off-by: rminnikanti <rminnikanti at marvell.com>
> >>>>>>
> >>>>>> Looks good to me
> >>>>>>
> >>>>>> Reviewed-by: Chris Packham <judge.packham at gmail.com>
> >>>>>>
> >>>>>>> ---
> >>>>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >>>>>>>  1 file changed, 5 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
> b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>>>> index 1d9a6d107b..d2a4faad56 100644
> >>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct
> pxa3xx_nand_info *info, int command)
> >>>>>>>         info->ecc_err_cnt       = 0;
> >>>>>>>         info->ndcb3             = 0;
> >>>>>>>         info->need_wait         = 0;
> >>>>>>> +       /*
> >>>>>>> +        * Reset max_bitflips to zero. Once command is complete,
> >>>>>>> +        * max_bitflips for this READ is returned in
> ecc.read_page()
> >>>>>>> +        */
> >>>>>>> +       info->max_bitflips      = 0;
> >>>>>>>
> >>>>>
> >>>>> Why this should not be put to 0 in read_page instead on
> prepare_start_command?
> >>>>>
> >>>>> Michael
> >>>>>
> >>>>
> >>>> ecc.read_page is invoked after the read command execution.
> >>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then
> ecc.read_page is invoked
> >>>> to read the page from buffer. So, by the time read_page is invoked,
> info->max_bitflips
> >>>> must already have the bit flip value.
> >>>>
> >>>
> >>> All the other implementation has a slight different way to handle.
> >>> From what you said the reset should
> >>> be done on for NAND_CMD_READ0 command and should be sufficient.
> >>> Technically should be moved
> >>> in switch and not unconditionally.
> >>>
> >>> Michael
> >>>
> >>
> >> max_bitflip is not being reset to 0 across page reads.
> >> Once a page is read with higher bitflips all subsequent reads
> >> are returning the same bitflip value even though they have none.
> >>
> >> This is causing problems like incorrectly
> >> marking erase blocks bad by UBI and read failures.
> >>
> >> Tested it with both MTD reads and UBI attach.
> >> This change is inline with other NFC drivers.
> >>
> >> Sample error log where a block is marked bad incorrectly:
> >>
> >> ubi0: fixable bit-flip detected at PEB 125
> >> ubi0: run torture test for PEB 125
> >> ubi0: fixable bit-flip detected at PEB 125
> >> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> >> must be bad
> >> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> >> ubi0: mark PEB 125 as bad
> >>
> >> Signed-off-by: rminnikanti <rminnikanti at marvell.com>
> >> ---
> >>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
> b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >> index 1d9a6d107b..97f250483f 100644
> >> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> >> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >> @@ -803,6 +803,11 @@ static void prepare_start_command(struct
> pxa3xx_nand_info *info, int command)
> >>
> >>         switch (command) {
> >>         case NAND_CMD_READ0:
> >> +               /*
> >> +                * Reset max_bitflips to zero. Once command is complete,
> >> +                * max_bitflips for this READ is returned in
> ecc.read_page()
> >> +                */
> >> +               info->max_bitflips = 0;
> >>         case NAND_CMD_READOOB:
> >>         case NAND_CMD_PAGEPROG:
> >>                 if (!info->force_raw)
> >> --
> >> 2.17.1
> >>
> >> Thanks Michael. Fixed it.
> >>
> >>>> Thanks, Ravi.
> >>>>
> >>>>>>>         switch (command) {
> >>>>>>>         case NAND_CMD_READ0:
> >>>>>>> --
> >>>>>>> 2.17.1
> >>>>>
> >>>>
> >>>
> >>>
> >
> > Acked-by: Michael Trimarchi <michael at amarulasolutions.com>
> >
> >
> >
>