[U-Boot-Users] Loading from NAND using 'nboot' Periodically Fails Where 'nand read' Succeeds

Fri Jun 6 00:30:09 CEST 2008

On 6/5/08 1:47 PM, Grant Erickson wrote:
> On 6/2/08 3:02 PM, Grant Erickson wrote:
>> On 6/2/08 11:21 AM, Scott Wood wrote:
>>> On Mon, Jun 02, 2008 at 08:22:21AM +0200, Stefan Roese wrote:
>>>> Hi Grant,
>>>> 
>>>> On Monday 02 June 2008, Grant Erickson wrote:
>>>>> Before I jump in with the BDI and start debugging, has anyone else using
>>>>> 'nboot' and FIT images noticed that 'nboot' periodically fails where 'nand
>>>>> read.i' of the SAME region of NAND succeeds?
>>>> 
>>>> Not sure here, since I never used nboot before. But "nand read.i" skips bad
>>>> blocks and perhaps "nboot" not? I suggest that you check if this is the
>>>> case and if you have bad blocks in this NAND area.
>>> 
>>> It is indeed the case -- you need to use "nboot.i".
>>> 
>>> -Scott
>> 
>> Scott and Stefan,
>> 
>> Thanks for the suggestion. That solved it. As an academic exercise, is there
>> any practical reason a system would want to use nboot, as I erroneously chose
>> to do, without .i|.jffs2|.e?
> 
> It would appear I was slightly too quick to regard this as fixed. What I found
> this morning is the following with the AMCC "Haleakala" board:
> 
> 1) A 48+ hour reboot test bouncing between the boot0 and boot1 partitions
> passed, where boot0 and boot1 are defined as:
> 
>     => printenv bootaddr bootcmd boot0 boot1
>     bootaddr=800000
>     bootcmd=run boot0 || run boot1 || reset
>     boot0=nboot.i ${bootaddr} 0 0 && setenv bootargs root=/dev/mtdblock9 && run addjffs2 addtty && bootm ${bootaddr}
>     boot1=nboot.i ${bootaddr} 0 1C00000 && setenv bootargs root=/dev/mtdblock11 && run addjffs2 addtty && bootm ${bootaddr}
> 
> 2) I stopped that test and added power-cycling to the mix and it, again,
> immediately failed with:
> 
>     Loading from NAND 64MiB 3,3V 8-bit, offset 0x0
>     ** Bad FIT image format
> 
>     Loading from NAND 64MiB 3,3V 8-bit, offset 0x1c00000
>     ** Bad FIT image format
> 
> So, resets are not enough to trigger this issue, it takes a power cycle.
> 
> I have found that this state is recoverable by issuing a 'nand read.i' and
> then re-running 'boot0':
> 
>     => nand read.i ${bootaddr} 0 400000 && run boot0
> 
> From that point forward, both boot0 and boot1 work flawlessly.
> 
> I have also found that NFS booting to get Linux up and running, restarting and
> then running boot0 or boot1 also works from that point forward until the next
> power cycle.
> 
> So, there seems to be some specific state the PowerPC NDFC (NAND controller)
> or Samsung K9F1208U0B NAND gets into, from which either 'nand read.i' or the
> Linux MTD driver kicks one or both of them out, and which otherwise prevents
> nboot.i from working.
> 
> Strangely, though, nand read.i and nboot.i both exercise the same
> nand_read_opts path in nand_util.c.
> 
> Any thoughts?

Marian:

I'm following up with you on this since 'git blame cmd_nand.c' seems to
indicate you added the CONFIG_FIT support to this file.

Based on stepping through with the debugger, my initial guess about hardware
issues may have been incorrect. Is there an implicit assumption in the
following snippet from nand_load_image() in cmd_nand.c:

    ...

    cnt = nand->oobblock;
    if (jffs2) {
        nand_read_options_t opts;
        memset(&opts, 0, sizeof(opts));
        opts.buffer = (u_char*) addr;
        opts.length = cnt;
        opts.offset = offset;
        opts.quiet      = 1;
        r = nand_read_opts(nand, &opts);
    } else {
        r = nand_read(nand, offset, &cnt, (u_char *) addr);
    }

    if (r) {
        puts("** Read error\n");
        show_boot_progress (-56);
        return 1;
    }

    ...

    switch (genimg_get_format ((void *)addr)) {

    ...

#if defined(CONFIG_FIT)
    case IMAGE_FORMAT_FIT:
        fit_hdr = (const void *)addr;
        if (!fit_check_format (fit_hdr)) {
            show_boot_progress (-150);
            puts ("** Bad FIT image format\n");
            return 1;
        }
        show_boot_progress (151);
        puts ("Fit image detected...\n");

        cnt = fit_get_size (fit_hdr);
        break;
#endif

    ...

that casting 'addr' to 'fit_hdr' gives fit_check_format() more than 512 bytes
(one NAND page on this part) of valid data to access? If so, should
'cnt = nand->oobblock' not be set explicitly to match that assumption?
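
To make the question concrete, here is a minimal sketch of the arithmetic
involved. It assumes the FIT blob uses the standard flattened device tree
header (big-endian fields, totalsize at byte 4, off_dt_strings at byte 12,
size_dt_strings at byte 32) and that the format check needs the strings block
to resolve property names. The helper name fdt_strings_readable() is
hypothetical and is not part of U-Boot; it only answers whether the strings
block falls inside the bytes actually read.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC             0xd00dfeedu
#define FDT_OFF_TOTALSIZE     4   /* header field offsets, in bytes */
#define FDT_OFF_DT_STRINGS    12
#define FDT_OFF_SIZE_STRINGS  32
#define FDT_HEADER_MIN        40  /* length of a version 17 header */

/* Read a big-endian 32-bit field; FDT header fields are stored big-endian. */
static uint32_t be32_at(const uint8_t *p, size_t off)
{
    return ((uint32_t)p[off] << 24) | ((uint32_t)p[off + 1] << 16) |
           ((uint32_t)p[off + 2] << 8) | (uint32_t)p[off + 3];
}

/*
 * Hypothetical helper (an assumption for illustration, not a U-Boot API).
 * Returns 1 if the header is plausible and the whole strings block lies
 * within the first 'bytes_read' bytes of the buffer, 0 otherwise.
 * Only the header itself is dereferenced here.
 */
static int fdt_strings_readable(const uint8_t *buf, size_t bytes_read)
{
    uint32_t totalsize, off_strings, size_strings;
    uint64_t strings_end;

    if (bytes_read < FDT_HEADER_MIN || be32_at(buf, 0) != FDT_MAGIC)
        return 0;

    totalsize    = be32_at(buf, FDT_OFF_TOTALSIZE);
    off_strings  = be32_at(buf, FDT_OFF_DT_STRINGS);
    size_strings = be32_at(buf, FDT_OFF_SIZE_STRINGS);
    strings_end  = (uint64_t)off_strings + size_strings;

    if (strings_end > totalsize)
        return 0;   /* header is inconsistent */

    return strings_end <= bytes_read;
}
```

In the nand_load_image() context above, bytes_read would correspond to
nand->oobblock, which is 512 on this part. If the strings block sits after a
structure block that embeds the kernel data, which I believe is the usual
layout but have not verified for this image, the helper returns 0 for a
single page.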

I am guessing that NFS booting and 'nand read.i' only appeared to address the
issue because the memory at the 8 MiB load address was not being used or
otherwise overwritten between resets after the kernel booted, which allowed
subsequent runs of 'nboot' to "leverage" the stale data.
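
The stale-data hypothesis can be illustrated with a toy model. Everything in
it is an assumption made for illustration: RAM and flash are plain arrays, a
power cycle is modeled as RAM being zeroed (real DRAM contents after power-up
are undefined), and format_check() is a stand-in for any check that looks past
the first page. None of these functions are U-Boot code.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   512u    /* small-page NAND, as on the K9F1208U0B */
#define IMAGE_SIZE  2048u   /* toy image size (assumption) */
#define MARK_OFF    1024u   /* toy metadata located past the first page */
#define MARK_VAL    0xA5u

/* Toy stand-in for the single-page read done before the format check. */
static void read_first_page(uint8_t *ram, const uint8_t *flash)
{
    memcpy(ram, flash, PAGE_SIZE);
}

/* Toy stand-in for a full 'nand read.i' or an earlier complete load. */
static void read_whole_image(uint8_t *ram, const uint8_t *flash)
{
    memcpy(ram, flash, IMAGE_SIZE);
}

/* Toy stand-in for a format check that inspects data beyond one page. */
static int format_check(const uint8_t *ram)
{
    return ram[MARK_OFF] == MARK_VAL;
}

/* One modeled boot attempt: read only the first page, then run the check. */
static int boot_attempt(uint8_t *ram, const uint8_t *flash)
{
    read_first_page(ram, flash);
    return format_check(ram);
}
```

In this model, boot_attempt() fails on zeroed RAM, succeeds once
read_whole_image() has populated the buffer, keeps succeeding across modeled
warm resets because the buffer is left alone, and fails again as soon as the
buffer is cleared. That matches the pattern I observed, though it does not
prove the cause.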

Regards,

Grant