[PATCH 1/2] cmd: mtd: fix read failure on default full partition read with bad blocks
Michael Nazzareno Trimarchi
michael at amarulasolutions.com
Sat Dec 20 09:39:37 CET 2025
HI Peter
On Tue, Dec 16, 2025 at 2:41 PM Peter Suti
<peter.suti at streamunlimited.com> wrote:
>
> When performing an 'mtd read' on a NAND partition, the presence of bad
> blocks can cause the command to fail with error -22 (EINVAL) if the
> requested size is equal to (or close to) the partition size.
> When size is not provided, the whole partition is read by default.
>
> This issue occurs because the bad block skipping logic increments the
> physical read offset ('off') without decrementing the 'remaining' byte
> count. If enough bad blocks are skipped, the 'off' pointer eventually
> slides past the end of the partition boundary ('mtd->size') while
> 'remaining' is still non-zero. The subsequent call to 'mtd_read_oob'
> then fails its internal boundary checks.
>
> The Scenario:
> Partition Size: 0x1200000
> Bad Blocks: 1
> Command: mtd read swufit ${loadaddr}
> (Defaults to reading full 0x1200000 bytes)
>
> 1. U-Boot attempts to read 0x1200000 valid bytes.
> 2. It encounters the bad block and skips it (off += mtd->erasesize).
> 3. To satisfy the full request, it must read
> (0x1200000 + mtd->erasesize) physical bytes.
> 4. The read pointer 'off' exceeds 'mtd->size'.
> 5. Command aborts with: "Read on swufit failed with error -22".
>
> Add a boundary check at the start of the read loop. If the read pointer
> 'off' reaches the end of the physical device, check if the user explicitly
> requested a size (argc >= 2).
>
> * If no size was specified (default read): Stop the read gracefully,
> print a notice, and return success. We assume the user wants "all
> available data" up to the physical limit.
> * If a size was explicitly specified: Continue standard execution. This
> will result in an error (-22) as expected, since the specific requirement
> cannot be met.
>
> It is physically impossible to read the full logical size of a partition
> if it contains bad blocks (as the bad blocks consume physical space
> required for the data). In this case, truncating the read is reasonable.
> For common use cases like loading FIT images (which often
> contain padding at the end), missing the trailing 0xFF data is acceptable,
> whereas failing the entire load operation is not.
>
> Also add some logging when bad blocks are encountered.
>
> NOTE: This changes the behavior of 'mtd read' when no size is provided.
> Previously, it failed if the full read was impossible. Now, it
> succeeds with a truncated buffer. The caller is responsible for
> verifying data integrity (e.g., via fitImage hash verification).
>
> Signed-off-by: Peter Suti <peter.suti at streamunlimited.com>
> [polished the comment and commit message]
> Signed-off-by: Radek Dostál <radek.dostal at streamunlimited.com>
> ---
> cmd/mtd.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/cmd/mtd.c b/cmd/mtd.c
> index 7f25144098b..1d1845bce44 100644
> --- a/cmd/mtd.c
> +++ b/cmd/mtd.c
> @@ -559,8 +559,10 @@ static int do_mtd_io(struct cmd_tbl *cmdtp, int flag, int argc,
>
> /* Search for the first good block after the given offset */
> off = start_off;
> - while (mtd_block_isbad(mtd, off))
> + while (mtd_block_isbad(mtd, off)) {
> + printf("Bad block: failed to read at offset 0x%llx, skipping.\n", off);
> off += mtd->erasesize;
> + }
Would like to avoid adding more debug stuff or printing here.
>
> led_activity_blink();
>
> @@ -569,9 +571,32 @@ static int do_mtd_io(struct cmd_tbl *cmdtp, int flag, int argc,
>
> /* Loop over the pages to do the actual read/write */
> while (remaining) {
> + /*
> + * Boundary Check: Bad block skipping may push the physical read offset
> + * past the partition end (mtd->size).
> + * If the read pointer 'off' has reached the end of the physical flash,
> + * we physically cannot read more.
> + *
> + * Check 'argc' to see if the user explicitly requested a specific size.
> + * * If argc < 2, the user did NOT specify a size (defaulted to partition size).
> + * In this case, we treat "end of flash" as a natural stopping point
> + * and truncate the read gracefully.
> + * * If argc >= 2, the user explicitly asked for N bytes.
> + * In this case, we do NOT truncate. We let the loop continue, which
> + * will trigger the standard error (-22) because the request cannot be met.
> + */
> + if (argc < 2 && off >= mtd->size) {
> + printf("Notice: Reached end of partition with %lld bytes remaining. "
> + "Truncating read.\n", remaining);
This is ok as print, to inform about the boundary check
> + remaining = 0;
> + ret = CMD_RET_SUCCESS;
What is the drawback to failing here, I mean you can still verify the
integrity. GIve a script example
where this is needed in the commit message
> + break;
> + }
> +
> /* Skip the block if it is bad */
> if (mtd_is_aligned_with_block_size(mtd, off) &&
> mtd_block_isbad(mtd, off)) {
> + printf("Bad block: failed to read at offset 0x%llx, skipping.\n", off);
> off += mtd->erasesize;
This is an extra print for now, you can propose it on a separated patch
Michael
> continue;
> }
> --
> 2.43.0
>
--
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
michael at amarulasolutions.com
__________________________________
Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
info at amarulasolutions.com
www.amarulasolutions.com
More information about the U-Boot
mailing list