[U-Boot] Problem converting da850evm to generic board and use libfdt

Peter Howard pjh at northern-ridge.com.au
Tue Dec 16 07:27:35 CET 2014


On Wed, 2014-12-10 at 19:10 -0700, Simon Glass wrote:
> Hi Peter,
> 
> On 10 December 2014 at 18:37, Simon Glass <sjg at chromium.org> wrote:
> > Hi Peter,
> >
> > On Dec 10, 2014 6:23 PM, "Peter Howard" <pjh at northern-ridge.com.au> wrote:
> >>
> >> On Wed, 2014-12-10 at 17:49 -0700, Simon Glass wrote:
> >> > Hi Peter,
> >> >
> >> > On 10 December 2014 at 17:19, Peter Howard <pjh at northern-ridge.com.au>
> >> > wrote:
> >> > > On Wed, 2014-12-10 at 15:43 -0700, Simon Glass wrote:
> >> > >> Hi Peter,
> >> > >>
> >> > >> On 10 December 2014 at 15:17, Peter Howard
> >> > >> <pjh at northern-ridge.com.au> wrote:
> >> > >> >
> >> > >> > On Tue, 2014-12-09 at 17:45 -0700, Simon Glass wrote:
> >> > >> > > Hi Peter,
> >> > >> > >
> >> > >> > > On 9 December 2014 at 17:13, Peter Howard
> >> > >> > > <pjh at northern-ridge.com.au> wrote:
> >> > >> > > >
> >> > >> > > > On Wed, 2014-12-03 at 14:20 -0800, Simon Glass wrote:
> >> > >> > > > > Hi Peter,
> >> > >> > > > >
> >> > >> > > > > On 3 December 2014 at 13:53, Peter Howard
> >> > >> > > > > <pjh at northern-ridge.com.au> wrote:
> >> > >> > > > > > On Wed, 2014-12-03 at 06:38 -0700, Simon Glass wrote:
> >> > >> > > > > >> Hi Peter,
> >> > >> > > > > >>
> >> > >> > > > > >> On 2 December 2014 at 14:59, Peter Howard
> >> > >> > > > > >> <pjh at northern-ridge.com.au> wrote:
> >> > >> > > > > >> >
> >> > >> > > > > >> > I'm trying to make two changes to building u-boot for
> >> > >> > > > > >> > the da850evm.
> >> > >> > > > > >> >       * Use the generic board code to get rid of the
> >> > >> > > > > >> > warning, and
> >> > >> > > > > >> >       * Enable libfdt to allow booting of linux with a
> >> > >> > > > > >> > standalone dtb
> >> > >> > > > > >> >         image.
> >> > >> > > > > >> >
> >> > >> > > > > >> > The first part appears to be simple.  Just adding
> >> > >> > > > > >> >
> >> > >> > > > > >> >         #define CONFIG_SYS_GENERIC_BOARD
> >> > >> > > > > >> >
> >> > >> > > > > >> > in include/configs/da850evm.h works with no obvious
> >> > >> > > > > >> > side-effects.
> >> > >> > > > > >> >
> >> > >> > > > > >> > However, adding
> >> > >> > > > > >> >
> >> > >> > > > > >> >         #define CONFIG_OF_LIBFDT
> >> > >> > > > > >> >
> >> > >> > > > > >> > is a different story.  It appears to introduce memory
> >> > >> > > > > >> > corruption when
> >> > >> > > > > >> > loading the environment.  On first boot it gives the
> >> > >> > > > > >> > "bad CRC!" warning
> >> > >> > > > > >> > and uses the default environment.  If you *don't* save
> >> > >> > > > > >> > the environment
> >> > >> > > > > >> > you can boot fine (including manual editing of the
> >> > >> > > > > >> > environment). However
> >> > >> > > > > >> > if you save the environment via saveenv bad things
> >> > >> > > > > >> > happen on the next
> >> > >> > > > > >> > boot.  An example log:
> >> > >> > > > > >> >
> >> > >> > > > > >> > U-Boot SPL 2015.01-rc1 (Nov 27 2014 - 14:30:26)
> >> > >> > > > > >> >
> >> > >> > > > > >> >
> >> > >> > > > > >> > U-Boot 2015.01-rc1 (Nov 27 2014 - 14:30:26)
> >> > >> > > > > >> >
> >> > >> > > > > >> > I2C:   ready
> >> > >> > > > > >> > DRAM:  64 MiB
> >> > >> > > > > >> > WARNING: Caches not enabled
> >> > >> > > > > >> > MMC:   davinci: 0
> >> > >> > > > > >> > SF: Detected M25P64 with page size 256 Bytes, erase size
> >> > >> > > > > >> > 64 KiB, total 8 MiB
> >> > >> > > > > >> > In:    serial
> >> > >> > > > > >> > Out:   serial
> >> > >> > > > > >> > Err:   serial
> >> > >> > > > > >> > SF: Detected M25P64 with page size 256 Bytes, erase size
> >> > >> > > > > >> > 64 KiB, total 8 MiB
> >> > >> > > > > >> > Warning: Invalid MAC address read from SPI flash
> >> > >> > > > > >> > Net:   DaVinci-EMAC
> >> > >> > > > > >> > Error: DaVinci-EMAC address not set.
> >> > >> > > > > >> >
> >> > >> > > > > >> > U-Boot > help
> >> > >> > > > > >> > data abort
> >> > >> > > > > >> > pc : [<c108ffd8>]          lr : [<c10900b4>]
> >> > >> > > > > >> > sp : c3e5f838  ip : 00000000     fp : c3e5fda4
> >> > >> > > > > >> > r10: c10b1f28  r9 : c3e5ff08     r8 : 0000000e
> >> > >> > > > > >> > r7 : c10b22c4  r6 : c10aa2a0     r5 : 00000000  r4 :
> >> > >> > > > > >> > 0000001b
> >> > >> > > > > >> > r3 : c10b8f70  r2 : 00000001     r1 : c3e5f840  r0 :
> >> > >> > > > > >> > ffffffff
> >> > >> > > > > >> > Flags: Nzcv  IRQs off  FIQs off  Mode SVC_32
> >> > >> > > > > >> > Resetting CPU ...
> >> > >> > > > > >> >
> >> > >> > > > > >> > If I rebuild  with CONFIG_OF_LIBFDT removed again from
> >> > >> > > > > >> > da850evm.h the
> >> > >> > > > > >> > problem disappears.  And you can see that the saveenv
> >> > >> > > > > >> > worked (i.e. the
> >> > >> > > > > >> > environment is what was saved before the reboot and data
> >> > >> > > > > >> > abort).
> >> > >> > > > > >> >
> >> > >> > > > > >> > I've traced the problem as far as the inline version of
> >> > >> > > > > >> > console_puts()
> >> > >> > > > > >> > in common/console.c.  The table dispatch there and the
> >> > >> > > > > >> > fact that the
> >> > >> > > > > >> > problem appears only when you load the environment makes
> >> > >> > > > > >> > me think it's
> >> > >> > > > > >> > memory corruption.
> >> > >> > > > > >> >
> >> > >> > > > > >> > Note: if you do *not* specify CONFIG_SYS_GENERIC_BOARD
> >> > >> > > > > >> > you still get the
> >> > >> > > > > >> > data abort, however it takes a bit more effort to
> >> > >> > > > > >> > trigger (like actually
> >> > >> > > > > >> > looking at the environment :-)  )
> >> > >> > > > > >> >
> >> > >> > > > > >> > (Note: This is building against the u-boot-2015.01-rc1
> >> > >> > > > > >> > tree)
> >> > >> > > > > >> >
> >> > >> > > > > >> > Suggestions?
> >> > >> > > > > >>
> >> > >> > > > > >> In case it helps, I got the same symptom (help crashes)
> >> > >> > > > > >> and it was due
> >> > >> > > > > >> to BSS not being cleared. Stefan (on cc) found this
> >> > >> > > > > >> problem - he said
> >> > >> > > > > >> something to do with GDT calculation or handling. However
> >> > >> > > > > >> it is just a
> >> > >> > > > > >> guess and probably has nothing to do with your issue.
> >> > >> > > > > >
> >> > >> > > > > > I may be missing something, but the GDT appears to be
> >> > >> > > > > > x86-specific
> >> > >> > > > > > whereas I'm building for ARMv5.
> >> > >> > > > >
> >> > >> > > > > OK for some reason I thought this was PPC!
> >> > >> > > > >
> >> > >> > > > > Maybe you can find your pc in System.map and work out where
> >> > >> > > > > it is
> >> > >> > > > > going wrong? Are you hitting some image size limit?
> >> > >> > > > >
> >> > >> > > > > pc : [<c108ffd8>]
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > Sorry, been distracted on other stuff for a few days.
> >> > >> > > >
> >> > >> > > > First, I now understand the global descriptor a bit better.
> >> > >> > > > For ARMv5
> >> > >> > > > it's stored in r9 and still looks sane.  The relevant info:
> >> > >> > > >
> >> > >> > > > (gdb) print/x *((gd_t *)$r9)
> >> > >> > > > $1 = {bd = 0xc3e5ffb0, flags = 0x183, baudrate = 0x1c200,
> >> > >> > > > cpu_clk = 0x0,
> >> > >> > > >   bus_clk = 0x0, pci_clk = 0x0, mem_clk = 0x0, have_console =
> >> > >> > > > 0x1,
> >> > >> > > >   env_addr = 0xc10a8fcc, env_valid = 0x1, ram_top = 0xc4000000,
> >> > >> > > >   relocaddr = 0xc3f80000, ram_size = 0x4000000, mon_len =
> >> > >> > > > 0x6ffb0,
> >> > >> > > >   irq_sp = 0xc3e5fef0, start_addr_sp = 0xc3e5fee0, reloc_off =
> >> > >> > > > 0x2f00000,
> >> > >> > > >   new_gd = 0xc3e5ff08, fdt_blob = 0x0, new_fdt = 0x0, fdt_size
> >> > >> > > > = 0x0,
> >> > >> > > >   jt = 0xc3e601c0, env_buf = {0x31, 0x31, 0x35, 0x32, 0x30,
> >> > >> > > > 0x30,
> >> > >> > > >     0x0 <repeats 26 times>}, cur_i2c_bus = 0x0, timebase_h =
> >> > >> > > > 0x0,
> >> > >> > > >   timebase_l = 0x0, arch = {timer_rate_hz = 0x16e360, tbu =
> >> > >> > > > 0x0,
> >> > >> > > >     tbl = 0x4cc62, lastinc = 0x0, timer_reset_value = 0x0,
> >> > >> > > >     tlb_addr = 0xc3ff0000, tlb_size = 0x4000}}
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > The pc is definitely bogus.  The reloc address is 0xc3f80000
> >> > >> > > > whereas
> >> > >> > > > that would be a pre-reloc address (starting at 0xc1080000).
> >> > >> > > > And it's
> >> > >> > > > definitely relocated by the time of failure.  The only other
> >> > >> > > > bit of
> >> > >> > > > information I have right now is that adding CONFIG_OF_LIBFDT
> >> > >> > > > drops the
> >> > >> > > > reloc address from 0xc3f85000 to 0xc3f80000.
> >> > >> > > >
> >> > >> > > > Don't know if any of that gives additional insight.  Meanwhile
> >> > >> > > > I
> >> > >> > > > continue tracing.
> >> > >> > >
> >> > >> > > Yes, continue tracing.
> >> > >> > >
> >> > >> > > If ram_size is 0x4000000 and ram_top is 0xc4000000 then your RAM
> >> > >> > > presumably starts at 0xc0000000. Then the relocation address
> >> > >> > > actually
> >> > >> > > seems reasonable to me.
> >> > >> > >
> >> > >> > > I don't know why the reloc address changes when you add
> >> > >> > > CONFIG_OF_LIBFDT.
> >> > >> > >
> >> > >> > > You can add '#define DEBUG' at the very top of board_f/r.c to see
> >> > >> > > addresses.
> >> > >> >
> >> > >> > I'm not sure what you meant by board_f/r.c as that file doesn't
> >> > >> > seem to
> >> > >>
> >> > >> common/board_f.c
> >> > >> common/board_r.c
> >> > >>
> >> > >> >
> >> > >> > exist.  I whacked '#define DEBUG' in da850evm.h and got a wealth of
> >> > >> > output.  However, the only new bit of information I've gleaned is
> >> > >> > that
> >> > >> > the lower that the reloc address goes, the faster things die.  It
> >> > >> > goes
> >> > >> > lower in -rc3 (0xc3f7f000), and it doesn't make it to the prompt on
> >> > >> > a
> >> > >> > reset after saving the environment.  Likewise with '#define DEBUG';
> >> > >> > after saving the environment it doesn't get back to the prompt on
> >> > >> > the
> >> > >> > next reset.  All the addresses printed seem reasonable.
> >> > >> >
> >> > >> > The only thing that doesn't look right is that the command function
> >> > >> > pointers all look to be pre-reloc addresses.  Though I don't see
> >> > >> > how
> >> > >> > this change would cause a failure that wouldn't happen already.
> >> > >> >
> >> > >> > So it seems that _something_ is being overwritten by the
> >> > >> > environment
> >> > >> > load, but I'm yet to get an idea of what.
> >> > >> >
> >> > >> > --
> >> > >> > Peter Howard <pjh at northern-ridge.com.au>
> >> > >> >
> >> > >>
> >> > >> Me neither. But you do have a data abort so may be able to look
> >> > >> around
> >> > >> there and figure out where exactly it died. Better if you can use a
> >> > >> debugger.
> >> > >>
> >> > >
> >> > >
> >> > > Here's what appears to be happening with a death on typing
> >> > > "help" (-rc1): The logic flow gets to the (unrelocated) fputs() - and
> >> > > into the inline version of console_putc().  It looks up
> >> > > stdio_devices[1]
> >> > > (again, unrelocated addr) which is a valid pointer - sort of.  The
> >> > > value
> >> > > is 0x2081004 which is outside of RAM, and the contents of the address
> >> > > are, according to gdb, zeroed out.  Which means
> >> > > stdio_devices[1]->putc()
> >> > > is a jump to 0x0.  I've stepped through that using JTAG+openocd+gdb.
> >> > >
> >> > > With extra debug statements, console output seems to cause a hang from
> >> > > somewhere in himport_r() (which is using relocated addresses including
> >> > > data).
> >> > >
> >> > > All this, to me, points to an issue with the unrelocated locations
> >> > > being
> >> > > used after environment import, but I don't know enough about u-boot
> >> > > structure to know if that is right or not . . .
> >> > >
> >> > >
> >> > > Peter Howard <pjh at northern-ridge.com.au>
> >> > >
> >> >
> >> > Perhaps look at how it gets to the unrelocated fputs()? If it can call
> >> > the correct fputs() before initr_env() then you can perhaps narrow it
> >> > down.
> >> >
> >> > But I can't see how you would be able to type at the console with this
> >> > problem, since fputs() is used by the command line editor.
> >> >
> >> > I suspect you are actually seeing a symptom of something else. You
> >> > could try enabling CONFIG_CONSOLE_MUX and see if that changes the bug.
> >>
> >> Hmmm.  That produces a new failure - it goes into an endless loop in
> >> fgetc().  And it does that:
> >>       * With CONFIG_SYS_GENERIC_BOARD and CONFIG_OF_LIBFDT - both with
> >>         and without saving the environment,
> >>       * With CONFIG_SYS_GENERIC_BOARD only,
> >>       * Without CONFIG_SYS_GENERIC_BOARD.
> >>
> >> :-)
> >
> > So just adding the console config changes the behavior on your board? Does
> > your BSS work? Do you have a custom link script? Are you writing to BSS
> > before relocation?
> 
> I see a few things:
> 
> - 4KB stack (should be enough I suppose)
> - SPL link script, but it doesn't look like it does anything useful.
> Maybe drop it?
> 
> But I'm pretty sure this is nothing to do with it. This is a bit of a
> long shot, but if your relocation is broken you might be corrupting
> BSS - the variables in System.map between __rel_dyn_start and
> __rel_dyn_end. This can happen if you write to a BSS variable before
> relocation. You can check the area (e.g. by checksumming it) early in
> board_init_f() - e.g. setup_mon_len(). Put the result in a new member
> of struct global_data (gd) - then checksum again and compare before
> relocation in setup_reloc().
> 
> Probably nothing else but to keep digging.
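
The checksum-and-compare check suggested above could look roughly like
the following.  This is only a sketch: region_sum() and
region_unchanged() are hypothetical helpers, not existing U-Boot
functions, and the rotate-and-xor sum simply stands in for whatever
checksum is convenient.  In U-Boot the first result would be stored in a
new member of gd early in board_init_f() and compared again just before
relocation.

```c
#include <stdint.h>

/*
 * Rotate-and-xor checksum over the bytes in [start, end).
 * Hypothetical helper for illustration; not part of U-Boot.
 */
static uint32_t region_sum(const uint8_t *start, const uint8_t *end)
{
	uint32_t sum = 0;

	while (start < end)
		sum = ((sum << 1) | (sum >> 31)) ^ *start++;

	return sum;
}

/*
 * Return 1 if the region still produces the checksum saved earlier,
 * 0 if something has written to it in between.
 */
static int region_unchanged(const uint8_t *start, const uint8_t *end,
			    uint32_t saved)
{
	return region_sum(start, end) == saved;
}
```

The region bounds would be the __rel_dyn_start and __rel_dyn_end symbols
mentioned above; a mismatch at the second check means something wrote to
that area before relocation.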

OK, I _think_ I have a handle on this.  But hopefully there's someone
out there who understands better than me how the da850 SPI flash is
set up with respect to u-boot usage.

It appears that the damage occurs with the actual writing of the env via
saveenv (i.e. not the reading back of it next time round).  Why?
Because stepping through crt0.S and relocate.S shows different results
by the time execution reaches relocate_done: in relocate.S.  When the
environment is not read, all the relocations are correct.  After saveenv
is done and the board is reset, various relocations are incomplete -
i.e. the addresses in the relocated tables still point to the
pre-relocation addresses.  Those pre-relocation locations then get
trashed when the environment is read afterwards.
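
To make "points to the pre-relocation addresses" concrete, here is a
small sketch of the address test, using the numbers from the gd dump
quoted earlier (image linked at 0xc1080000, mon_len 0x6ffb0, reloc_off
0x2f00000).  The struct and function names are made up for illustration
and are not U-Boot APIs.

```c
#include <stdint.h>

/* Values as reported in the gd dump earlier in this thread. */
struct reloc_info {
	uint32_t link_base;	/* pre-relocation start, e.g. 0xc1080000 */
	uint32_t mon_len;	/* monitor length, e.g. 0x6ffb0 */
	uint32_t reloc_off;	/* relocation offset, e.g. 0x2f00000 */
};

/* Return 1 if addr lies inside the pre-relocation copy of the image. */
static int is_prereloc_addr(const struct reloc_info *ri, uint32_t addr)
{
	return addr >= ri->link_base && addr - ri->link_base < ri->mon_len;
}

/* Where a pre-relocation address should point once it has been fixed up. */
static uint32_t relocated_addr(const struct reloc_info *ri, uint32_t addr)
{
	return addr + ri->reloc_off;
}
```

With those values the faulting pc, 0xc108ffd8, is classified as
pre-relocation, and its fixed-up counterpart would be 0xc3f8ffd8.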

I'm guessing that the problem is that the u-boot image has grown to the
point where it overlaps in SPI flash with the location of the
environment.  So saving the environment actually trashes part of the
u-boot image.  My further guess is that the trashed part includes the
__rel_dyn area, so the address fixups don't happen.
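
The overlap guess comes down to a simple range check, sketched below.
The function name is hypothetical.  I have not checked the actual flash
offsets for this board; the real values would come from the board config
(e.g. CONFIG_ENV_OFFSET and the environment size) and from the size of
the built image.

```c
#include <stdint.h>

/*
 * Return 1 if the flash ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) share at least one byte.
 * Hypothetical helper for illustration; not part of U-Boot.
 */
static int flash_regions_overlap(uint32_t a_off, uint32_t a_len,
				 uint32_t b_off, uint32_t b_len)
{
	return a_off < b_off + b_len && b_off < a_off + a_len;
}
```

Calling it with the image's flash offset and length as the first range
and the environment's offset and size as the second would confirm or
rule out the theory without needing a debugger.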

Does that sound believable?
-- 
Peter Howard <pjh at northern-ridge.com.au>


