How to debug u-boot data abort

qianfan qianfanguijin at 163.com
Thu Mar 24 08:38:23 CET 2022


On 2022/3/24 11:18, AKASHI Takahiro wrote:
> On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
>> On 3/23/22 08:45, qianfan wrote:
>>> On 2022/3/23 10:28, qianfan wrote:
>>>> Hi:
>>>>
>>>> I have a custom AM335X board connected to my computer via usbnet. It
>>>> always reports a data abort when I run 'dhcp'.
>>>>
>>>> Here is the log:
>>>>
>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>>>
>>>> CPU  : AM335X-GP rev 2.1
>>>> Model: WISDOM AM335X CCT
>>>> DRAM:  512 MiB
>>>> NAND:  256 MiB
>>>> MMC:   OMAP SD/MMC: 0
>>>> Loading Environment from NAND... *** Warning - bad CRC, using default
>>>> environment
>>>>
>>>> Net:   Could not get PHY for ethernet at 4a100000: addr 0
>>>> eth2: ethernet at 4a100000, eth3: usb_ether
>>>> Hit any key to stop autoboot:  0
>>>> => setenv autoload no
>>>> => dhcp
>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>> MAC de:ad:be:ef:00:01
>>>> HOST MAC de:ad:be:ef:00:00
>>>> RNDIS ready
>>>> musb-hdrc: peripheral reset irq lost!
>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>> USB RNDIS network up!
>>>> BOOTP broadcast 1
>>>> BOOTP broadcast 2
>>>> BOOTP broadcast 3
>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>> data abort
>> This could be an alignment error.
>>
>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>> You can use these addresses together with the u-boot.map file to figure
>> out in which function the abort occurs and from where it was called.
>>
>> Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
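For reference, the map-file lookup described above comes down to finding the last symbol whose address is not greater than the faulting address. A minimal sketch, assuming the symbols arrive on stdin as "address type name" lines sorted by address; the function name lookup_symbol is my own and is not part of any U-Boot script:

```shell
# Print the name of the last symbol whose address is <= the given hex address.
# Input on stdin: "address type name" lines, sorted by ascending address.
lookup_symbol() {
    target=$(( 0x${1#0x} ))
    best=""
    while read -r addr type name; do
        # Skip lines without three fields (for example undefined symbols).
        [ -n "$name" ] || continue
        if [ "$(( 0x$addr ))" -le "$target" ]; then
            best=$name
        fi
    done
    printf '%s\n' "$best"
}
```

With a real build the input could come from "${CROSS_COMPILE}nm -n ${BUILDDIR}/u-boot". The address to pass is the "reloc pc" (808130a2 here), because that is the one expressed in link-time terms.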
>>
>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>> Code: f023 0303 60ca 4403 (6091) 685a
>> This is how to find the exact instruction causing the problem:
>>
>> $ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
>>        ARCH=arm scripts/decodecode
>> Code: f023 0303 60ca 4403 (6091) 685a
>> All code
>> ========
>>     0:   23 f0                   and    %eax,%esi
>>     2:   03 03                   add    (%rbx),%eax
>>     4:   ca 60 03                lret   $0x360
>>     7:*  44 91                   rex.R xchg %eax,%ecx            <-- trapping instruction
>>     9:   60                      (bad)
>>     a:   5a                      pop    %rdx
>>     b:   68                      .byte 0x68
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:   91                      xchg   %eax,%ecx
>>     1:   60                      (bad)
>>     2:   5a                      pop    %rdx
>>     3:   68                      .byte 0x68
> The code looks like x86 instructions.
> Please don't forget to add "CROSS_COMPILE=..." :)
>
> Code: f023 0303 60ca 4403 (6091) 685a
> All code
> ========
>     0:	f023 0303 	bic.w	r3, r3, #3
>     4:	60ca      	str	r2, [r1, #12]
>     6:	4403      	add	r3, r0
>     8:*	6091      	str	r1, [r2, #8]		<-- trapping instruction
>     a:	685a      	ldr	r2, [r3, #4]
>
> Code starting with the faulting instruction
> ===========================================
>     0:	6091      	str	r1, [r2, #8]
>     2:	685a      	ldr	r2, [r3, #4]
>
> Then,
> ${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
> # Here, PATTERN may be the instruction ("6091") or the location ("808130a2" in your case)
>
> or similarly
>
> ${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
> # Here, LOC is your "reloc pc" (0x808130a2)
>
> gives you some hint about the exact location.
>
> -Takahiro Akashi
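
A note on the two address columns in the dump: U-Boot relocates itself to the top of RAM, so "pc" is the runtime address and "reloc pc" is the same location in link-time terms. The difference between them is the relocation offset, and subtracting it from other runtime code addresses in the dump makes them searchable in the ELF file as well. A small sketch using the values from the log above; the helper names hex_sub and clear_thumb_bit are my own, and whether r8 really holds a code pointer is only an assumption based on its value:

```shell
# Subtract two hex numbers (with or without a 0x prefix), print 8-digit hex.
hex_sub() {
    printf '%08x\n' "$(( 0x${1#0x} - 0x${2#0x} ))"
}

# Clear bit 0, which marks Thumb state in lr and is not part of the address.
clear_thumb_bit() {
    printf '%08x\n' "$(( 0x${1#0x} & ~1 ))"
}

off=$(hex_sub 9fe9b0a2 808130a2)   # pc minus reloc pc
echo "relocation offset:    $off"
echo "r8 before relocation: $(hex_sub 9febbae5 "$off")"
echo "lr without Thumb bit: $(clear_thumb_bit 80833c3f)"
```

The same offset falls out of the lr pair (9febbc3f and 80833c3f), which is a quick sanity check that both columns were read correctly.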

Hi:

Thanks for your guidance. I now know that pc is in malloc and lr is in
env_attr_walk, but I can't get the full call stack of malloc.

I don't understand dlmalloc's logic, so it is hard for me to solve this problem.
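
One thing that can be read directly from the dump, though: the trapping instruction is "str r1, [r2, #8]" and r2 holds 00000002, so the store goes to r2 + 8. A quick check of that arithmetic (the function name store_target is only for illustration):

```shell
# Effective address of "str rX, [rN, #imm]": base register value plus offset.
store_target() {
    printf '0x%08x\n' "$(( 0x${1#0x} + $2 ))"
}

store_target 00000002 8
```

The result is neither word-aligned nor anywhere near the RAM addresses seen elsewhere in the dump, so r2 is not a valid pointer at this point. The offsets 8 and 12 used by the two stores would match the fd and bk fields of a dlmalloc chunk on a 32-bit target, which might mean a free-list pointer was overwritten earlier, but that is only a guess.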

>
>
>> I hope this helps to figure out, where exactly the problem occurs
>>
>> Best regards
>>
>> Heinrich
>>
>>>> Resetting CPU ...
>>>>
>>>> resetting ...
>>>>
>>>>
>>>> Is there any documentation on how to debug a data abort? Or has this
>>>> bug already been fixed?
>>>>
>>>> Thanks
>>>>
>>> This bug is not fixed on master. I found that v2021.01 is good and
>>> v2021.04-rc2 is bad.
>>>
>>> I also tested this on a BeagleBone Black with am335x_evm_defconfig, and
>>> it has a similar problem.
>>>
>>> I looked for the first bad commit with 'git bisect', which told me that
>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. That is very
>>> strange, because this commit doesn't touch any DHCP or network code.
>>>
>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bad
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bad commit
>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>> Author: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>
>>>       fs: fat: consistent error handling for flush_dir()
>>>
>>>       Provide function description for flush_dir().
>>>       Move all error messages for flush_dir() from the callers to the function.
>>>       Move mapping of errors to -EIO to the function.
>>>       Always check return value of flush_dir() (Coverity CID 316362).
>>>
>>>       In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>
>>>       Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>
>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>
>>>
>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the commit immediately before
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>
>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling for flush_dir()
>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>> |\
>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver
>>>
>>> I verified that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine:
>>>
>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>
>>> CPU  : AM335X-GP rev 2.1
>>> Model: TI AM335x BeagleBone Black
>>> DRAM:  512 MiB
>>> WDT:   Started with servicing (60s timeout)
>>> NAND:  0 MiB
>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>> E-fuse MAC
>>> Net:   eth2: ethernet at 4a100000, eth3: usb_ether
>>> Hit any key to stop autoboot:  0
>>> => dhcp
>>> ethernet at 4a100000 Waiting for PHY auto negotiation to complete.........
>>> TIMEOUT !
>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>> MAC de:ad:be:ef:00:01
>>> HOST MAC de:ad:be:ef:00:00
>>> RNDIS ready
>>> musb-hdrc: peripheral reset irq lost!
>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>> USB RNDIS network up!
>>> BOOTP broadcast 1
>>> BOOTP broadcast 2
>>> BOOTP broadcast 3
>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>> Using usb_ether device
>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>> Filename 'u-boot.img'.
>>> Load address: 0x82000000
>>> Loading: #################################################################
>>> #################################################################
>>> #################################################################
>>>            #########################
>>>            2.5 MiB/s
>>> done
>>> Bytes transferred = 1123888 (112630 hex)
>>> =>
>>>


