[U-Boot] Problem converting da850evm to generic board and use libfdt

Simon Glass sjg at chromium.org
Thu Dec 11 03:10:49 CET 2014


Hi Peter,

On 10 December 2014 at 18:37, Simon Glass <sjg at chromium.org> wrote:
> Hi Peter,
>
> On Dec 10, 2014 6:23 PM, "Peter Howard" <pjh at northern-ridge.com.au> wrote:
>>
>> On Wed, 2014-12-10 at 17:49 -0700, Simon Glass wrote:
>> > Hi Peter,
>> >
>> > On 10 December 2014 at 17:19, Peter Howard <pjh at northern-ridge.com.au>
>> > wrote:
>> > > On Wed, 2014-12-10 at 15:43 -0700, Simon Glass wrote:
>> > >> Hi Peter,
>> > >>
>> > >> On 10 December 2014 at 15:17, Peter Howard
>> > >> <pjh at northern-ridge.com.au> wrote:
>> > >> >
>> > >> > On Tue, 2014-12-09 at 17:45 -0700, Simon Glass wrote:
>> > >> > > Hi Peter,
>> > >> > >
>> > >> > > On 9 December 2014 at 17:13, Peter Howard
>> > >> > > <pjh at northern-ridge.com.au> wrote:
>> > >> > > >
>> > >> > > > On Wed, 2014-12-03 at 14:20 -0800, Simon Glass wrote:
>> > >> > > > > Hi Peter,
>> > >> > > > >
>> > >> > > > > On 3 December 2014 at 13:53, Peter Howard
>> > >> > > > > <pjh at northern-ridge.com.au> wrote:
>> > >> > > > > > On Wed, 2014-12-03 at 06:38 -0700, Simon Glass wrote:
>> > >> > > > > >> Hi Peter,
>> > >> > > > > >>
>> > >> > > > > >> On 2 December 2014 at 14:59, Peter Howard
>> > >> > > > > >> <pjh at northern-ridge.com.au> wrote:
>> > >> > > > > >> >
>> > >> > > > > >> > I'm trying to make two changes to building u-boot for
>> > >> > > > > >> > the da850evm.
>> > >> > > > > >> >       * Use the generic board code to get rid of the
>> > >> > > > > >> > warning, and
>> > >> > > > > >> >       * Enable libfdt to allow booting of linux with a
>> > >> > > > > >> > standalone dtb
>> > >> > > > > >> >         image.
>> > >> > > > > >> >
>> > >> > > > > >> > The first part appears to be simple.  Just adding
>> > >> > > > > >> >
>> > >> > > > > >> >         #define CONFIG_SYS_GENERIC_BOARD
>> > >> > > > > >> >
>> > >> > > > > >> > in include/configs/da850evm.h works with no obvious
>> > >> > > > > >> > side-effects.
>> > >> > > > > >> >
>> > >> > > > > >> > However, adding
>> > >> > > > > >> >
>> > >> > > > > >> >         #define CONFIG_OF_LIBFDT
>> > >> > > > > >> >
>> > >> > > > > >> > is a different story.  It appears to introduce memory
>> > >> > > > > >> > corruption when
>> > >> > > > > >> > loading the environment.  On first boot it gives the
>> > >> > > > > >> > "bad CRC!" warning
>> > >> > > > > >> > and uses the default environment.  If you *don't* save
>> > >> > > > > >> > the environment
>> > >> > > > > >> > you can boot fine (including manual editing of the
>> > >> > > > > >> > environment). However
>> > >> > > > > >> > if you save the environment via saveenv bad things
>> > >> > > > > >> > happen on the next
>> > >> > > > > >> > boot.  An example log:
>> > >> > > > > >> >
>> > >> > > > > >> > U-Boot SPL 2015.01-rc1 (Nov 27 2014 - 14:30:26)
>> > >> > > > > >> >
>> > >> > > > > >> >
>> > >> > > > > >> > U-Boot 2015.01-rc1 (Nov 27 2014 - 14:30:26)
>> > >> > > > > >> >
>> > >> > > > > >> > I2C:   ready
>> > >> > > > > >> > DRAM:  64 MiB
>> > >> > > > > >> > WARNING: Caches not enabled
>> > >> > > > > >> > MMC:   davinci: 0
>> > >> > > > > >> > SF: Detected M25P64 with page size 256 Bytes, erase size
>> > >> > > > > >> > 64 KiB, total 8 MiB
>> > >> > > > > >> > In:    serial
>> > >> > > > > >> > Out:   serial
>> > >> > > > > >> > Err:   serial
>> > >> > > > > >> > SF: Detected M25P64 with page size 256 Bytes, erase size
>> > >> > > > > >> > 64 KiB, total 8 MiB
>> > >> > > > > >> > Warning: Invalid MAC address read from SPI flash
>> > >> > > > > >> > Net:   DaVinci-EMAC
>> > >> > > > > >> > Error: DaVinci-EMAC address not set.
>> > >> > > > > >> >
>> > >> > > > > >> > U-Boot > help
>> > >> > > > > >> > data abort
>> > >> > > > > >> > pc : [<c108ffd8>]          lr : [<c10900b4>]
>> > >> > > > > >> > sp : c3e5f838  ip : 00000000     fp : c3e5fda4
>> > >> > > > > >> > r10: c10b1f28  r9 : c3e5ff08     r8 : 0000000e
>> > >> > > > > >> > r7 : c10b22c4  r6 : c10aa2a0     r5 : 00000000  r4 :
>> > >> > > > > >> > 0000001b
>> > >> > > > > >> > r3 : c10b8f70  r2 : 00000001     r1 : c3e5f840  r0 :
>> > >> > > > > >> > ffffffff
>> > >> > > > > >> > Flags: Nzcv  IRQs off  FIQs off  Mode SVC_32
>> > >> > > > > >> > Resetting CPU ...
>> > >> > > > > >> >
>> > >> > > > > >> > If I rebuild  with CONFIG_OF_LIBFDT removed again from
>> > >> > > > > >> > da850evm.h the
>> > >> > > > > >> > problem disappears.  And you can see that the saveenv
>> > >> > > > > >> > worked (i.e. the
>> > >> > > > > >> > environment is what was saved before the reboot and data
>> > >> > > > > >> > abort).
>> > >> > > > > >> >
>> > >> > > > > >> > I've traced the problem as far as the inline version of
>> > >> > > > > >> > console_puts()
>> > >> > > > > >> > in common/console.c.  The table dispatch there and the
>> > >> > > > > >> > fact that the
>> > >> > > > > >> > problem appears only when you load the environment makes
>> > >> > > > > >> > me think it's
>> > >> > > > > >> > memory corruption.
>> > >> > > > > >> >
>> > >> > > > > >> > Note: if you do *not* specify CONFIG_SYS_GENERIC_BOARD
>> > >> > > > > >> > you still get the
>> > >> > > > > >> > data abort, however it takes a bit more effort to
>> > >> > > > > >> > trigger (like actually
>> > >> > > > > >> > looking at the environment :-)  )
>> > >> > > > > >> >
>> > >> > > > > >> > (Note: This is building against the u-boot-2015.01-rc1
>> > >> > > > > >> > tree)
>> > >> > > > > >> >
>> > >> > > > > >> > Suggestions?
>> > >> > > > > >>
>> > >> > > > > >> In case it helps, I got the same symptom (help crashes)
>> > >> > > > > >> and it was due
>> > >> > > > > >> to BSS not being cleared. Stefan (on cc) found this
>> > >> > > > > >> problem - he said
>> > >> > > > > >> something to do with GDT calculation or handling. However
>> > >> > > > > >> it is just a
>> > >> > > > > >> guess and probably has nothing to do with your issue.
>> > >> > > > > >
>> > >> > > > > > I may be missing something, but the GDT appears to be
>> > >> > > > > > x86-specific
>> > >> > > > > > whereas I'm building for ARMv5.
>> > >> > > > >
>> > >> > > > > OK for some reason I thought this was PPC!
>> > >> > > > >
>> > >> > > > > Maybe you can find your pc in System.map and work out where
>> > >> > > > > it is
>> > >> > > > > going wrong? Are you hitting some image size limit?
>> > >> > > > >
>> > >> > > > > pc : [<c108ffd8>]
>> > >> > > >
>> > >> > > >
>> > >> > > > Sorry, been distracted on other stuff for a few days.
>> > >> > > >
>> > >> > > > First, I now understand the global descriptor a bit better.
>> > >> > > > For ARMv5
>> > >> > > > it's stored in r9 and still looks sane.  The relevant info:
>> > >> > > >
>> > >> > > > (gdb) print/x *((gd_t *)$r9)
>> > >> > > > $1 = {bd = 0xc3e5ffb0, flags = 0x183, baudrate = 0x1c200,
>> > >> > > > cpu_clk = 0x0,
>> > >> > > >   bus_clk = 0x0, pci_clk = 0x0, mem_clk = 0x0, have_console =
>> > >> > > > 0x1,
>> > >> > > >   env_addr = 0xc10a8fcc, env_valid = 0x1, ram_top = 0xc4000000,
>> > >> > > >   relocaddr = 0xc3f80000, ram_size = 0x4000000, mon_len =
>> > >> > > > 0x6ffb0,
>> > >> > > >   irq_sp = 0xc3e5fef0, start_addr_sp = 0xc3e5fee0, reloc_off =
>> > >> > > > 0x2f00000,
>> > >> > > >   new_gd = 0xc3e5ff08, fdt_blob = 0x0, new_fdt = 0x0, fdt_size
>> > >> > > > = 0x0,
>> > >> > > >   jt = 0xc3e601c0, env_buf = {0x31, 0x31, 0x35, 0x32, 0x30,
>> > >> > > > 0x30,
>> > >> > > >     0x0 <repeats 26 times>}, cur_i2c_bus = 0x0, timebase_h =
>> > >> > > > 0x0,
>> > >> > > >   timebase_l = 0x0, arch = {timer_rate_hz = 0x16e360, tbu =
>> > >> > > > 0x0,
>> > >> > > >     tbl = 0x4cc62, lastinc = 0x0, timer_reset_value = 0x0,
>> > >> > > >     tlb_addr = 0xc3ff0000, tlb_size = 0x4000}}
>> > >> > > >
>> > >> > > >
>> > >> > > > The pc is definitely bogus.  The reloc address is 0xc3f80000
>> > >> > > > whereas
>> > >> > > > that would be a pre-reloc address (starting at 0xc1080000).
>> > >> > > > And it's
>> > >> > > > definitely relocated by the time of failure.  The only other
>> > >> > > > bit of
>> > >> > > > information I have right now is that adding CONFIG_OF_LIBFDT
>> > >> > > > drops the
>> > >> > > > reloc address from 0xc3f85000 to 0xc3f80000.
>> > >> > > >
>> > >> > > > Don't know if any of that gives additional insight.  Meanwhile
>> > >> > > > I
>> > >> > > > continue tracing.
>> > >> > >
>> > >> > > Yes, continue tracing.
>> > >> > >
>> > >> > > If ram_size is 0x4000000 and ram_top is 0xc4000000 then your RAM
>> > >> > > presumably starts at 0xc0000000. Then the relocation address
>> > >> > > actually
>> > >> > > seems reasonable to me.
>> > >> > >
>> > >> > > I don't know why the reloc address changes when you add
>> > >> > > CONFIG_OF_LIBFDT.
>> > >> > >
>> > >> > > You can add '#define DEBUG' at the very top of board_f/r.c to see
>> > >> > > addresses.
>> > >> >
>> > >> > I'm not sure what you meant by board_f/r.c as that file doesn't
>> > >> > seem to
>> > >>
>> > >> common/board_f.c
>> > >> common/board_r.c
>> > >>
>> > >> >
>> > >> > exist.  I whacked '#define DEBUG' in da850evm.h and got a wealth of
>> > >> > output.  However, the only new bit of information I've gleaned is
>> > >> > that
>> > >> > the lower that the reloc address goes, the faster things die.  It
>> > >> > goes
>> > >> > lower in -rc3 (0xc3f7f000), and it doesn't make it to the prompt on
>> > >> > a
>> > >> > reset after saving the environment.  Likewise with '#define DEBUG';
>> > >> > after saving the environment it doesn't get back to the prompt on
>> > >> > the
>> > >> > next reset.  All the addresses printed seem reasonable.
>> > >> >
>> > >> > The only thing that doesn't look right is that the command function
>> > >> > pointers all look to be pre-reloc addresses.  Though I don't see
>> > >> > how
>> > >> > this change would cause a failure that wouldn't happen already.
>> > >> >
>> > >> > So it seems that _something_ is being overwritten by the
>> > >> > environment
>> > >> > load, but I'm yet to get an idea of what.
>> > >> >
>> > >> > --
>> > >> > Peter Howard <pjh at northern-ridge.com.au>
>> > >> >
>> > >>
>> > >> Me neither. But you do have a data abort so may be able to look
>> > >> around
>> > >> there and figure out where exactly it died. Better if you can use a
>> > >> debugger.
>> > >>
>> > >
>> > >
>> > > Here's what appears to be happening with a death on typing
>> > > "help" (-rc1): The logic flow gets to the (unrelocated) fputs() - and
>> > > into the inline version of console_putc().  It looks up
>> > > stdio_devices[1]
>> > > (again, unrelocated addr) which is a valid pointer - sort of.  The
>> > > value
>> > > is 0x2081004 which is outside of RAM, and the contents of the address
>> > > are, according to gdb, zeroed out.  Which means
>> > > stdio_devices[1]->putc()
>> > > is a jump to 0x0.  I've stepped through that using JTAG+openocd+gdb.
>> > >
>> > > With extra debug statements, console output seems to cause a hang from
>> > > somewhere in himport_r() (which is using relocated addresses including
>> > > data).
>> > >
>> > > All this, to me, points to an issue with the unrelocated locations
>> > > being
>> > > used after environment import, but I don't know enough about u-boot
>> > > structure to know if that is right or not . . .
>> > >
>> > >
>> > > Peter Howard <pjh at northern-ridge.com.au>
>> > >
>> >
>> > Perhaps look at how it gets to the unrelocated fputs()? If it can call
>> > the correct fputs() before initr_env() then you can perhaps narrow it
>> > down.
>> >
>> > But I can't see how you would be able to type at the console with this
>> > problem, since fputs() is used by the command line editor.
>> >
>> > I suspect you are actually seeing a symptom of something else. You
>> > could try enabling CONFIG_CONSOLE_MUX and see if that changes the bug.
>>
>> Hmmm.  That produces a new failure - it goes into an endless loop in
>> fgetc().  And it does that:
>>       * With CONFIG_SYS_GENERIC_BOARD and CONFIG_OF_LIBFDT - both with
>>         and without saving the environment
>>       * With CONFIG_SYS_GENERIC_BOARD only,
>>       * Without CONFIG_SYS_GENERIC_BOARD.
>>
>> :-)
>
> So just adding the console config changes the behavior on your board? Does
> your BSS work? Do you have a custom link script? Are you writing to BSS
> before relocation?

I see a few things:

- 4KB stack (should be enough I suppose)
- SPL link script, but it doesn't look like it does anything useful.
Maybe drop it?

But I'm pretty sure this is nothing to do with it. This is a bit of a
long shot, but if your relocation is broken you might be corrupting
BSS - the variables in System.map between __rel_dyn_start and
__rel_dyn_end. This can happen if you write to a BSS variable before
relocation. You can check the area (e.g. by checksumming it) early in
board_init_f() - e.g. setup_mon_len(). Put the result in a new member
of struct global_data (gd) - then checksum again and compare before
relocation in setup_reloc().
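
As a rough, untested sketch of that check (host-side C, not U-Boot code):
struct region_check stands in for the new gd member, and the function
names are made up for illustration. In the tree you would presumably use
the existing crc32() and pass the __rel_dyn_start/__rel_dyn_end linker
symbols as the range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a new member added to struct global_data. */
struct region_check {
	uint32_t saved_crc;
};

/* Plain bitwise CRC-32 (reflected polynomial 0xedb88320) over [start, end). */
static uint32_t crc32_range(const uint8_t *start, const uint8_t *end)
{
	uint32_t crc = 0xffffffffu;

	while (start < end) {
		crc ^= *start++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Early step: remember what the region looked like. */
static void region_check_record(struct region_check *rc,
				const uint8_t *start, const uint8_t *end)
{
	rc->saved_crc = crc32_range(start, end);
}

/* Later step: returns 1 if the region is unchanged, 0 if it was modified. */
static int region_check_intact(const struct region_check *rc,
			       const uint8_t *start, const uint8_t *end)
{
	return rc->saved_crc == crc32_range(start, end);
}
```

The idea would be to call region_check_record() early on (around
setup_mon_len()) and region_check_intact() just before relocation (around
setup_reloc()), printing a message if it returns 0. If it does, something
wrote into that area in between.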

Probably nothing else for it but to keep digging.

Regards,
Simon

