data abort when run 'dhcp'
qianfan
qianfanguijin at 163.com
Fri Mar 25 11:04:46 CET 2022
It's very strange. And I can't detect it's a bug of usb or dlmalloc.
1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
2. Starting u-boot and dhcp via am335x's usb net, data abort.
3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.
在 2022/3/24 17:33, qianfan 写道:
>
> 在 2022/3/23 18:12, Heinrich Schuchardt 写道:
>> On 3/23/22 11:07, qianfan wrote:
>>>
>>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>>>> On 3/23/22 10:13, qianfan wrote:
>>>>>
>>>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>>>
>>>>>>
>>>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>>>
>>>>>>>
>>>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>>>
>>>>>>>> Hi:
>>>>>>>>
>>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>>>> always report data abort when 'dhcp':
>>>>>>>>
>>>>>>>> Next it the log:
>>>>>>>>
>>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>>> +0800)
>>>>>>>>
>>>>>>>> CPU : AM335X-GP rev 2.1
>>>>>>>> Model: WISDOM AM335X CCT
>>>>>>>> DRAM: 512 MiB
>>>>>>>> NAND: 256 MiB
>>>>>>>> MMC: OMAP SD/MMC: 0
>>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>>> default environment
>>>>>>>>
>>>>>>>> Net: Could not get PHY for ethernet at 4a100000: addr 0
>>>>>>>> eth2: ethernet at 4a100000, eth3: usb_ether
>>>>>>>> Hit any key to stop autoboot: 0
>>>>>>>> => setenv autoload no
>>>>>>>> => dhcp
>>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>>> RNDIS ready
>>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>>> USB RNDIS network up!
>>>>>>>> BOOTP broadcast 1
>>>>>>>> BOOTP broadcast 2
>>>>>>>> BOOTP broadcast 3
>>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>>> data abort
>>>>>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>>>>>>>> reloc pc : [<808130a2>] lr : [<80833c3f>]
>>>>>>>> sp : 9de53410 ip : 9de53578 fp : 00000001
>>>>>>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
>>>>>>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
>>>>>>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
>>>>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>>> Resetting CPU ...
>>>>>>>>
>>>>>>>> resetting ...
>>>>>>>>
>>>>>>>>
>>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>>>> is already fixed?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>>>> v2021.04-rc2 is bad.
>>>>>>>
>>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>>>> has the simliar problem.
>>>>>>>
>>>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>>>
>>>>>>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>>> Author: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>>>>> Date: Wed Jan 20 22:21:53 2021 +0100
>>>>>>>
>>>>>>> fs: fat: consistent error handling for flush_dir()
>>>>>>>
>>>>>>> Provide function description for flush_dir().
>>>>>>> Move all error messages for flush_dir() from the callers to the
>>>>>>> function.
>>>>>>> Move mapping of errors to -EIO to the function.
>>>>>>> Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>>
>>>>>>> In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>>
>>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>>>>>
>>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
>>>>>>>
>>>>>>>
>>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>>>
>>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>>> handling for flush_dir()
>>>>>>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>>> 'u-boot-rockchip-20210121' of
>>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>>> |\
>>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>>> based PCIe controller driver
>>>>>>>
>>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>>>
>>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>>>
>>>>>>> CPU : AM335X-GP rev 2.1
>>>>>>> Model: TI AM335x BeagleBone Black
>>>>>>> DRAM: 512 MiB
>>>>>>> WDT: Started with servicing (60s timeout)
>>>>>>> NAND: 0 MiB
>>>>>>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>>> E-fuse MAC
>>>>>>> Net: eth2: ethernet at 4a100000, eth3: usb_ether
>>>>>>> Hit any key to stop autoboot: 0
>>>>>>> => dhcp
>>>>>>> ethernet at 4a100000 Waiting for PHY auto negotiation to
>>>>>>> complete......... TIMEOUT !
>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>> RNDIS ready
>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>> USB RNDIS network up!
>>>>>>> BOOTP broadcast 1
>>>>>>> BOOTP broadcast 2
>>>>>>> BOOTP broadcast 3
>>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>>>> Using usb_ether device
>>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>>>> Filename 'u-boot.img'.
>>>>>>> Load address: 0x82000000
>>>>>>> Loading:
>>>>>>> #################################################################
>>>>>>> #################################################################
>>>>>>> #################################################################
>>>>>>> #########################
>>>>>>> 2.5 MiB/s
>>>>>>> done
>>>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>>>> =>
>>>>>>>
>>>>> "data abort" messages:
>>>>>
>>>>> data abort
>>>>> pc : [<9ff8196c>] lr : [<9ffa1cd7>]
>>>>> reloc pc : [<8081496c>] lr : [<80834cd7>]
>>>>> sp : 9df38e60 ip : 9df38fc8 fp : 00000001
>>>>> r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
>>>>> r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
>>>>> r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>>>>> Code: 0303 60ca 4403 6091 (685a) f042
>>>>> Resetting CPU ...
>>>>>
>>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>>>
>>>>> unlink(victim, bck, fwd);
>>>>> 80814966: 60ca str r2, [r1, #12]
>>>>> set_inuse_bit_at_offset(victim, victim_size);
>>>>> 80814968: 4403 add r3, r0
>>>>> unlink(victim, bck, fwd);
>>>>> 8081496a: 6091 str r1, [r2, #8]
>>>>> set_inuse_bit_at_offset(victim, victim_size);
>>>>> 8081496c: 685a ldr r2, [r3, #4]
>>>>> 8081496e: f042 0201 orr.w r2, r2, #1
>>>>> 80814972: 605a str r2, [r3, #4]
>>>>>
>>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>>>
>>>>>
>>>>
>>>> I have seen crashes in common/dlmalloc.c before after double free() or
>>>> free() with an incorrect pointer.
>>>>
>>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>>>> this but assert() as defined in include/log.h does not stop the code and
>>>> even does not print without _DEBUG=1.
>>>>
>>>> You should be able to get the assert output with
>>>>
>>>> #include <common.h>
>>>> #define _DEBUG 1
>>>> #include <log.h>
>>>>
>>>> at the top of common/dlmalloc.c.
>>>>
>>>> You should get full malloc debug output with
>>>
>>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>>
>> assert() checks for _DEBUG. Defining DEBUG after common.h will not
>> define _DEBUG.
>
> Finally I got a malloc error message on console:
>
> TFTP from server 192.168.200.1; our IP address is 192.168.200.39
> Filename 'u-boot.img'.
> Load address: 0x82000000
> Loading: #################################################################
> #################################################################
> #################################################################
> ###################################################### 0 Bytes
> 1.9 MiB/s
> done
> Bytes transferred = 1274816 (1373c0 hex)
> common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top'
> failed.
>
> I had tried many times, do_check_chunk not always failed, and sometimes it
> report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)'
> failed. The situation is not the same.
>
> I got a bt stack when malloc failed:
>
> (gdb) bt
> #0 0x9ffb5684 in panic_finish () at lib/panic.c:23
> #1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
> #2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized
> out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
> #3 0x9ff76910 in do_check_inuse_chunk (p=p at entry=0x9ffd7200) at
> common/dlmalloc.c:866
> #4 0x9ff769d6 in do_check_malloced_chunk (p=p at entry=0x9ffd7200, s=s at entry=24)
> at common/dlmalloc.c:900
> #5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
> #6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>,
> callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
> #7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized
> out>, attributes=0x9df28dec "") at env/attr.c:184
> #8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
> #9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60,
> htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
> #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>,
> env_flag=512, flag=0) at cmd/nvedit.c:296
> #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>)
> at cmd/nvedit.c:318
> #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
> #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>,
> argv=0x9df442c8) at cmd/net.c:268
> #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1,
> flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
> #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8,
> repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
> #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
> #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
> #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
> #19 parse_stream_outer (inp=inp at entry=0x9df290e8, flag=flag at entry=2) at
> common/cli_hush.c:3206
> #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
> #21 0x9ff77c1a in cli_loop () at common/cli.c:229
> #22 0x9ff70d3e in main_loop () at common/main.c:66
> #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
> #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at
> include/initcall.h:46
> #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at
> common/board_r.c:822
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
>>
>> Best regards
>>
>> Heinrich
>>
>>> printed.
>>>
>>>>
>>>> #define DEBUG 1
>>>> #include <common.h>
>>>> #include <log.h>
>>>>
>>>> Best regards
>>>>
>>>> Heinrich
>>>
More information about the U-Boot
mailing list