[U-Boot] UBI fixable bit-flip issue
Heiko Schocher
hs at denx.de
Thu Jul 12 05:22:13 UTC 2018
Hello Mark,
added Richard Weinberger to cc...
On 12.07.2018 at 02:28, Mark Spieth wrote:
> Hi
>
> In the process of investigating a boot failure on one of our devices, the
>
> UBI: fixable bit-flip detected at PEB
>
> message was seen with the following behaviour during kernel load in u-boot.
>
> Read [2285568] bytes
> UBI: fixable bit-flip detected at PEB 415
> UBI: schedule PEB 415 for scrubbing
> UBI: fixable bit-flip detected at PEB 415
> UBI: fixable bit-flip detected at PEB 419
> UBI: schedule PEB 419 for scrubbing
> UBI: fixable bit-flip detected at PEB 419
> UBI: fixable bit-flip detected at PEB 420
> UBI: schedule PEB 420 for scrubbing
> UBI: fixable bit-flip detected at PEB 420
> UBI: fixable bit-flip detected at PEB 419
> UBI: fixable bit-flip detected at PEB 420
> UBI: fixable bit-flip detected at PEB 419
> UBI: fixable bit-flip detected at PEB 420
> UBI: fixable bit-flip detected at PEB 419
> UBI: fixable bit-flip detected at PEB 420
> UBI: fixable bit-flip detected at PEB 419
> UBI: fixable bit-flip detected at PEB 420
> UBI: fixable bit-flip detected at PEB 419
> UBI: fixable bit-flip detected at PEB 420
> UBI: fixable bit-flip detected at PEB 419
>
> This repeats until reset.
>
> U-Boot is a patched version of 2010.06 supplied by the chip vendor. No newer version is available
> from the vendor to try.
:-(
Can you use current mainline? It's hard to say something
about an 8-year-old vendor U-Boot version ...
> The patches include the init eba/wl swap.
What do you mean here?
> A more detailed log with debugging available follows:
>
> UBI: fixable bit-flip detected at PEB 419
> UBI DBG: schedule_erase: schedule erasure of PEB 419, EC 19, torture 0
> UBI DBG: erase_worker: erase PEB 419 EC 19
> UBI DBG: sync_erase: erase PEB 419, old EC 19
> UBI DBG: do_sync_erase: erase PEB 419
> UBI DBG: sync_erase: erased PEB 419, new EC 20
> UBI DBG: ubi_io_write_ec_hdr: write EC header to PEB 419
> UBI DBG: ubi_io_write: write 2048 bytes to PEB 419:0
> UBI DBG: ensure_wear_leveling: schedule scrubbing
> UBI DBG: wear_leveling_worker: scrub PEB 420 to PEB 419
> UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 420
> UBI DBG: ubi_io_read: read 2048 bytes from PEB 420:2048
> UBI DBG: ubi_eba_copy_leb: copy LEB 6:11, PEB 420 to PEB 419
> UBI DBG: ubi_eba_copy_leb: read 126976 bytes of data
> UBI DBG: ubi_io_read: read 126976 bytes from PEB 420:4096
> UBI: fixable bit-flip detected at PEB 420
> UBI DBG: ubi_io_write_vid_hdr: write VID header to PEB 419
> UBI DBG: ubi_io_write: write 2048 bytes to PEB 419:2048
> UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 419
> UBI DBG: ubi_io_read: read 2048 bytes from PEB 419:2048
> UBI DBG: ubi_io_write: write 126976 bytes to PEB 419:4096
> UBI DBG: ubi_io_read: read 126976 bytes from PEB 419:4096
> UBI: fixable bit-flip detected at PEB 419
> UBI DBG: schedule_erase: schedule erasure of PEB 419, EC 20, torture 0
> UBI DBG: erase_worker: erase PEB 419 EC 20
> UBI DBG: sync_erase: erase PEB 419, old EC 20
> UBI DBG: do_sync_erase: erase PEB 419
> UBI DBG: sync_erase: erased PEB 419, new EC 21
> UBI DBG: ubi_io_write_ec_hdr: write EC header to PEB 419
> UBI DBG: ubi_io_write: write 2048 bytes to PEB 419:0
> UBI DBG: ensure_wear_leveling: schedule scrubbing
> UBI DBG: wear_leveling_worker: scrub PEB 420 to PEB 419
> UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 420
> UBI DBG: ubi_io_read: read 2048 bytes from PEB 420:2048
> UBI DBG: ubi_eba_copy_leb: copy LEB 6:11, PEB 420 to PEB 419
> UBI DBG: ubi_eba_copy_leb: read 126976 bytes of data
> UBI DBG: ubi_io_read: read 126976 bytes from PEB 420:4096
> UBI: fixable bit-flip detected at PEB 420
> UBI DBG: ubi_io_write_vid_hdr: write VID header to PEB 419
> UBI DBG: ubi_io_write: write 2048 bytes to PEB 419:2048
> UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 419
> UBI DBG: ubi_io_read: read 2048 bytes from PEB 419:2048
> UBI DBG: ubi_io_write: write 126976 bytes to PEB 419:4096
> UBI DBG: ubi_io_read: read 126976 bytes from PEB 419:4096
> UBI: fixable bit-flip detected at PEB 419
>
> Investigation showed that a read with correctable bit errors was done returning -EUCLEAN to the ubi
> read function.
>
> I then read https://lists.denx.de/pipermail/u-boot/2013-September/161961.html, which details a
> workaround: do not return EUCLEAN from the NAND reader unless the number of bits fixed during the
> read exceeds 75% of the total number of correctable bits. This was implemented in
> this version of UBI in U-Boot 2010.06, and it does hide the infinite bit-flip loop, since this is new
> NAND flash. The original 2010.06 implementation returns EUCLEAN for any number of fixable bit flips
> and thus causes the PEB to be moved to the best free one (scrub mode in wear_leveling_worker).
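
The 75% rule described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the 2013 thread or from any vendor tree: the function name read_status_from_bitflips and both parameters are hypothetical, and the fallback value for EUCLEAN is only there for hosts whose errno.h does not define it.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; fallback for hosts that lack it */
#endif

/*
 * Hypothetical helper, for illustration only.
 *
 * corrected: number of bit flips the ECC fixed during the read
 * strength:  total number of bit flips the ECC is able to fix
 *
 * Reports -EUCLEAN only when more than 75% of the correctable bits
 * were actually needed; otherwise the read is reported as clean.
 */
int read_status_from_bitflips(unsigned int corrected, unsigned int strength)
{
	/* Integer form of "corrected > 0.75 * strength". */
	if (corrected * 4 > strength * 3)
		return -EUCLEAN;
	return 0;
}
```

With a 4-bit ECC this reports -EUCLEAN only when all 4 bits had to be corrected, so a block with 1 to 3 fixed bits is no longer scheduled for scrubbing.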
>
> This fix is not a root cause fix though. Investigating further led to the following root cause
> solution. The following is AFAICT.
>
> The scrubber chooses the PEB to move the data to from the free balanced tree. This tree is sorted
> by EC (erase count) and then by PEB number.
>
> The find_wl_entry call uses a max parameter of WL_FREE_MAX_DIFF, which is 8192 in this config. So the
> find_wl_entry function will find a PEB that is better in erase count than the current PEB EC. This
> can easily cause it to find the PEB that was just moved from if it is the lowest numbered PEB in the
> free tree. Waiting for EC to go above 8192 would take a long time and cause premature aging of the
> flash PEBs in question.
>
> The easy solution is to change the max parameter of this call to 0 so it finds a PEB with a smaller
> EC than the one being replaced. This means it won't use the previously discarded PEB as its first
> choice.
I am not sure if it is so easy ...
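
For reference, the selection step under discussion can be modelled roughly as below. This is a minimal sketch, not the U-Boot or kernel code: it assumes that find_wl_entry returns the entry with the largest erase counter that is still below "lowest EC in the tree + max", and it replaces the red-black tree with an array that is already sorted by EC and then by PEB number. The names wl_entry and pick_free_entry are made up for the illustration.

```c
#include <stddef.h>

/* Simplified stand-in for a UBI wear-leveling entry. */
struct wl_entry {
	int ec;		/* erase counter */
	int pnum;	/* physical eraseblock number */
};

/*
 * Model of the free-PEB selection over an array sorted by (ec, pnum).
 * Returns the last entry whose erase counter is below "lowest EC + max".
 * If no other entry qualifies, the first (lowest EC) entry is returned.
 * n must be greater than 0.
 */
const struct wl_entry *pick_free_entry(const struct wl_entry *sorted,
				       size_t n, int max)
{
	const struct wl_entry *e = &sorted[0];
	int limit = sorted[0].ec + max;
	size_t i;

	for (i = 1; i < n; i++) {
		if (sorted[i].ec >= limit)
			break;	/* array is sorted, nothing further qualifies */
		e = &sorted[i];
	}
	return e;
}
```

With a hypothetical free list of (EC 18, PEB 97), (EC 19, PEB 412) and (EC 21, PEB 419), a max of 8192 selects PEB 419, the block that was just erased and therefore carries the highest counter, while a max of 0 selects PEB 97. Only PEB 419 with EC 21 is taken from the log above; the other two entries are invented.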
> This fix was implemented and fixable bit-flip errors no longer hang/freeze the boot process! UBI
> erase and reformat was used between re-tests to get consistent results.
>
> Adding the above 75% correctable bitflip threshold is also a good thing as less movement will ensue
> when the FLASH is new, but as the flash ages, the root cause will once again be invoked causing
> un-recoverable boot failures.
>
> Note this fault is also in the latest kernel drivers for UBI and may also exist in other wear
> leveling implementations. The kernel driver issue may be at fault for Android devices locking
> up/freezing sporadically during flash reads when scrubbing, due to a relatively full flash and
> correctable errors causing ping-pong PEB moves.
>
> The question is, is my root cause solution sound or have I missed something?
I have to think about it before I write nonsense, but maybe Richard has
deeper insight here.
> I know an algo change would probably be better or a way to detect move loops to prevent this from
> occurring, but this solution does work on all the devices that were failing manufacture tests
> previously.
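
One possible shape for the move-loop detection mentioned above is a small retry counter. In the debug log the same target, PEB 419, is chosen again right after it showed a bit-flip on read-back and was erased. The sketch below is purely hypothetical and not part of UBI: the struct scrub_retry, the function scrub_target_allowed and the max_attempts limit are assumptions made for illustration.

```c
/* Hypothetical bookkeeping for consecutive scrub attempts. */
struct scrub_retry {
	int target_pnum;	/* target PEB of the previous attempt, -1 if none */
	int count;		/* consecutive attempts onto that target */
};

/*
 * Records an attempt to scrub onto target_pnum.
 * Returns 1 while the same target has been chosen at most max_attempts
 * times in a row, and 0 once that limit is exceeded, so that the caller
 * can pick a different free PEB instead of looping.
 */
int scrub_target_allowed(struct scrub_retry *st, int target_pnum,
			 int max_attempts)
{
	if (st->target_pnum != target_pnum) {
		/* New target: restart the count. */
		st->target_pnum = target_pnum;
		st->count = 1;
		return 1;
	}
	st->count++;
	return st->count <= max_attempts;
}
```

Starting from { -1, 0 } and a limit of 3, the first three attempts onto PEB 419 are allowed and the fourth is refused; choosing any other target resets the counter.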
bye,
Heiko
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52 Fax: +49-8142-66989-80 Email: hs at denx.de