[PATCH] mtd: nand: pxa3xx: Incorrect bitflip return on page read

Sun Jun 16 12:24:31 CEST 2024

Hi

Can this be merged?
Let me know if I missed any process.

Thanks,
Ravi

On 5/21/24 13:15, Michael Nazzareno Trimarchi wrote:
> Hi Dario
> 
> Can you add to next pull?
> 
> Michael
> 
> On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti <rminnikanti at marvell.com>
> wrote:
> 
>> Hi,
>>
>> Can you please merge this PR, if there are no more review comments?
>>
>> Thanks,
>> Ravi
>> On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
>>
>>> ----------------------------------------------------------------------
>>> Hi Ravi
>>>
>>> On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti at marvell.com>
>> wrote:
>>>>
>>>> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
>>>>> Hi Ravi
>>>>>
>>>>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <
>> rminnikanti at marvell.com> wrote:
>>>>>>
>>>>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
>>>>>>
>>>>>>>
>> ----------------------------------------------------------------------
>>>>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <
>> judge.packham at gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <
>> rminnikanti at marvell.com> wrote:
>>>>>>>>>
>>>>>>>>> Once a page is read with higher bitflips all subsequent reads
>>>>>>>>> are returning the same bitflip value even though they have none.
>>>>>>>>> max_bitflip variable is not being reset to 0 across page reads.
>>>>>>>>>
>>>>>>>>> This is causing problems like incorrectly
>>>>>>>>> marking erase blocks bad by UBI and causing read failures.
>>>>>>>>>
>>>>>>>>> Verified the change with both MTD reads and UBI.
>>>>>>>>> This change is inline with other NFC drivers.
>>>>>>>>>
>>>>>>>>> Sample error log where a block is marked bad incorrectly:
>>>>>>>>>
>>>>>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>>>>>> ubi0: run torture test for PEB 125
>>>>>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>>>>>>>> must be bad
>>>>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>>>>>>>> ubi0: mark PEB 125 as bad
>>>>>>>>>
>>>>>>>>> Signed-off-by: rminnikanti <rminnikanti at marvell.com>
>>>>>>>>
>>>>>>>> Looks good to me
>>>>>>>>
>>>>>>>> Reviewed-by: Chris Packham <judge.packham at gmail.com>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
>> b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>>>> index 1d9a6d107b..d2a4faad56 100644
>>>>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct
>> pxa3xx_nand_info *info, int command)
>>>>>>>>>         info->ecc_err_cnt       = 0;
>>>>>>>>>         info->ndcb3             = 0;
>>>>>>>>>         info->need_wait         = 0;
>>>>>>>>> +       /*
>>>>>>>>> +        * Reset max_bitflips to zero. Once command is complete,
>>>>>>>>> +        * max_bitflips for this READ is returned in
>> ecc.read_page()
>>>>>>>>> +        */
>>>>>>>>> +       info->max_bitflips      = 0;
>>>>>>>>>
>>>>>>>
>>>>>>> Why this should not be put to 0 in read_page instead on
>> prepare_start_command?
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>
>>>>>> ecc.read_page is invoked after the read command execution.
>>>>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then
>> ecc.read_page is invoked
>>>>>> to read the page from buffer. So, by the time read_page is invoked,
>> info->max_bitflips
>>>>>> must already have the bit flip value.
>>>>>>
>>>>>
>>>>> All the other implementation has a slight different way to handle.
>>>>> From what you said the reset should
>>>>> be done on for NAND_CMD_READ0 command and should be sufficient.
>>>>> Technically should be moved
>>>>> in switch and not unconditionally.
>>>>>
>>>>> Michael
>>>>>
>>>>
>>>> max_bitflip is not being reset to 0 across page reads.
>>>> Once a page is read with higher bitflips all subsequent reads
>>>> are returning the same bitflip value even though they have none.
>>>>
>>>> This is causing problems like incorrectly
>>>> marking erase blocks bad by UBI and read failures.
>>>>
>>>> Tested it with both MTD reads and UBI attach.
>>>> This change is inline with other NFC drivers.
>>>>
>>>> Sample error log where a block is marked bad incorrectly:
>>>>
>>>> ubi0: fixable bit-flip detected at PEB 125
>>>> ubi0: run torture test for PEB 125
>>>> ubi0: fixable bit-flip detected at PEB 125
>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>>> must be bad
>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>>> ubi0: mark PEB 125 as bad
>>>>
>>>> Signed-off-by: rminnikanti <rminnikanti at marvell.com>
>>>> ---
>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
>> b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>> index 1d9a6d107b..97f250483f 100644
>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>> @@ -803,6 +803,11 @@ static void prepare_start_command(struct
>> pxa3xx_nand_info *info, int command)
>>>>
>>>>         switch (command) {
>>>>         case NAND_CMD_READ0:
>>>> +               /*
>>>> +                * Reset max_bitflips to zero. Once command is complete,
>>>> +                * max_bitflips for this READ is returned in
>> ecc.read_page()
>>>> +                */
>>>> +               info->max_bitflips = 0;
>>>>         case NAND_CMD_READOOB:
>>>>         case NAND_CMD_PAGEPROG:
>>>>                 if (!info->force_raw)
>>>> --
>>>> 2.17.1
>>>>
>>>> Thanks Michael. Fixed it.
>>>>
>>>>>> Thanks, Ravi.
>>>>>>
>>>>>>>>>         switch (command) {
>>>>>>>>>         case NAND_CMD_READ0:
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>> Acked-by: Michael Trimarchi <michael at amarulasolutions.com>
>>>
>>>
>>>
>>
>