[Regression] rk3588 failed to boot linux
Andy Yan
andyshrk at 163.com
Tue Oct 15 03:19:19 CEST 2024
Hello,
On 2024-10-14 23:14:04, "Jonas Karlman" <jonas at kwiboo.se> wrote:
>Hi Sughosh,
>
>On 2024-10-14 13:32, Sughosh Ganu wrote:
>> On Mon, 14 Oct 2024 at 16:49, Andy Yan <andyshrk at 163.com> wrote:
>>>
>>>
>>> Hi Sughosh,
>>>
>>> At 2024-10-14 19:00:24, "Sughosh Ganu" <sughosh.ganu at linaro.org> wrote:
>>>> On Mon, 14 Oct 2024 at 16:12, Andy Yan <andyshrk at 163.com> wrote:
>>>>>
>>>>>
>>>>> Hi Sughosh,
>>>>>
>>>>> At 2024-10-14 18:13:35, "Sughosh Ganu" <sughosh.ganu at linaro.org> wrote:
>>>>>> On Mon, 14 Oct 2024 at 15:12, Andy Yan <andyshrk at 163.com> wrote:
>>>>>>>
>>>>>>> When testing the current main branch on the rk3588-based coolpi 4b,
>>>>>>> the board failed to boot Linux [0].
>>>>>>> I did the same test on another board that I am preparing to send
>>>>>>> upstream patches for, and it also failed.
>>>>>>>
>>>>>>> Both boards boot fine with v2024.10.
>>>>>>> After bisecting, it seems that this issue is caused by:
>>>>>>>
>>>>>>> commit 360aaddd9cea8c256f50c576794415cadfb61819
>>>>>>> Merge: 2c832abc732 f8ffc6f3cc4
>>>>>>> Author: Tom Rini <trini at konsulko.com>
>>>>>>> Date: Tue Sep 3 14:09:30 2024 -0600
>>>>>>>
>>>>>>> Merge patch series "Make LMB memory map global and persistent"
>>>>>>>
>>>>>>> Sughosh Ganu <sughosh.ganu at linaro.org> says:
>>>>>>>
>>>>>>> My boards booted fine before this merge on u-boot/next, and all of them
>>>>>>> failed to boot Linux after it.
>>>>>>>
>>>>>>> I am not familiar with the LMB mechanism, so I don't know how to find
>>>>>>> the root cause yet.
>>>>>>>
>>>>>>> I dumped the bdinfo for both the good case [1] and the regression case [2];
>>>>>>> I am not sure if they are useful for debugging this issue.
>>>>>>>
>>>>>>> [0] Synchronous Abort when starting kernel:
>>>>>>> Scanning bootdev 'mmc at fe2c0000.bootdev':
>>>>>>> Card did not respond to voltage select! : -110
>>>>>>> Scanning bootdev 'mmc at fe2e0000.bootdev':
>>>>>>> 1 script ready mmc 1 mmc at fe2e0000.bootdev.part
>>>>>>> /boot/boot.scr
>>>>>>> ** Booting bootflow 'mmc at fe2e0000.bootdev.part_1' with script
>>>>>>> Boot script loaded from mmc 0:1
>>>>>>> 224 bytes read in 7 ms (31.3 KiB/s)
>>>>>>> 26510301 bytes read in 97 ms (260.6 MiB/s)
>>>>>>> 32883200 bytes read in 122 ms (257 MiB/s)
>>>>>>> 148323 bytes read in 30 ms (4.7 MiB/s)
>>>>>>> Working FDT set to 12000000
>>>>>>> Trying kaslrseed command... Info: Unknown command can be safely ignored
>>>>>>> since kaslrseed does not apply to all boards.
>>>>>>> Unknown command 'kaslrseed' - try 'help'
>>>>>>> ## Loading init Ramdisk from Legacy Image at 12180000 ...
>>>>>>> Image Name: uInitrd
>>>>>>> Image Type: AArch64 Linux RAMDisk Image (gzip compressed)
>>>>>>> Data Size: 26510237 Bytes = 25.3 MiB
>>>>>>> Load Address: 00000000
>>>>>>> Entry Point: 00000000
>>>>>>> Verifying Checksum ... OK
>>>>>>> ## Flattened Device Tree blob at 12000000
>>>>>>> Booting using the fdt blob at 0x12000000
>>>>>>> Working FDT set to 12000000
>>>>>>> Loading Ramdisk to eb58f000, end eced739d ... OK
>>>>>>> Loading Device Tree to 00000000eb502000, end 00000000eb58efff ... OK
>>>>>>> Working FDT set to eb502000
>>>>>>>
>>>>>>> Starting kernel ...
>>>>>>>
>>>>>>> "Synchronous Abort" handler, esr 0x96000004, far 0x96142b896930b907
>>>>>>> elr: 0000000000a8d3b8 lr : 0000000000a77450 (reloc)
>>>>>>> elr: 00000000effa13b8 lr : 00000000eff8b450
>>>>>>> x0 : 96142b896930b907 x1 : 00000000effab870
>>>>>>> x2 : 0000000000000010 x3 : 00000000edf47310
>>>>>>> x4 : 0000000000000000 x5 : 96142b896930b907
>>>>>>> x6 : 0000000000000007 x7 : 0000000000000004
>>>>>>> x8 : 0000000000000040 x9 : fffffffffffffff0
>>>>>>> x10: 00000000eb526fff x11: 00000000edf3a808
>>>>>>> x12: 0000000000000006 x13: 00000000eb502000
>>>>>>> x14: 00000000ffffffff x15: 00000000ededb588
>>>>>>> x16: 00000000eff68738 x17: 0000000000000000
>>>>>>> x18: 00000000edef4d70 x19: 00000000eceb4040
>>>>>>> x20: 00000000eff14f50 x21: 00000000eceac000
>>>>>>> x22: 00000000effcc000 x23: 0000000000000001
>>>>>>> x24: 0000000000000001 x25: 0000000000200000
>>>>>>> x26: 00000000edf385c0 x27: 00000000eced8000
>>>>>>> x28: 00000000eced9000 x29: 00000000ededb3c0
>>>>>>>
>>>>>>> Code: eb04005f 54000061 52800000 14000006 (386468a3)
>>>>>>> Resetting CPU ...
>>>>>>>
>>>>>>> [1] bdinfo for the successful boot:
>>>>>>> => bdinfo
>>>>>>> boot_params = 0x0000000000000000
>>>>>>> DRAM bank = 0x0000000000000000
>>>>>>> -> start = 0x0000000000200000
>>>>>>> -> size = 0x00000000efe00000
>>>>>>> DRAM bank = 0x0000000000000001
>>>>>>> -> start = 0x00000001f0000000
>>>>>>> -> size = 0x0000000010000000
>>>>>>> flashstart = 0x0000000000000000
>>>>>>> flashsize = 0x0000000000000000
>>>>>>> flashoffset = 0x0000000000000000
>>>>>>> baudrate = 1500000 bps
>>>>>>> relocaddr = 0x00000000eff14000
>>>>>>> reloc off = 0x00000000ef514000
>>>>>>> Build = 64-bit
>>>>>>> current eth = unknown
>>>>>>> eth-1addr = (not set)
>>>>>>> IP addr = <NULL>
>>>>>>> fdt_blob = 0x00000000ededcd80
>>>>>>> new_fdt = 0x00000000ededcd80
>>>>>>> fdt_size = 0x0000000000017fa0
>>>>>>> lmb_dump_all:
>>>>>>> memory.cnt = 0x2 / max = 0x10
>>>>>>> memory[0] [0x200000-0xefffffff], 0xefe00000 bytes flags: 0
>>>>>>> memory[1] [0x1f0000000-0x1ffffffff], 0x10000000 bytes flags: 0
>>>>>>> reserved.cnt = 0x2 / max = 0x10
>>>>>>> reserved[0] [0xeced8000-0xefffffff], 0x03128000 bytes flags: 0
>>>>>>> reserved[1] [0x1f0000000-0x1ffffffff], 0x10000000 bytes flags: 0
>>>>>>> devicetree = separate
>>>>>>> serial addr = 0x00000000feb50000
>>>>>>> width = 0x0000000000000004
>>>>>>> shift = 0x0000000000000002
>>>>>>> offset = 0x0000000000000000
>>>>>>> clock = 0x00000000016e3600
>>>>>>> arch_number = 0x0000000000000000
>>>>>>> TLB addr = 0x00000000efff0000
>>>>>>> irq_sp = 0x00000000ededcd70
>>>>>>> sp start = 0x00000000ededcd70
>>>>>>> Early malloc usage: 2440 / 10000
>>>>>>> =>
>>>>>>>
>>>>>>>
>>>>>>> [2] bdinfo for the failed boot case:
>>>>>>> => bdinfo
>>>>>>> boot_params = 0x0000000000000000
>>>>>>> DRAM bank = 0x0000000000000000
>>>>>>> -> start = 0x0000000000200000
>>>>>>> -> size = 0x00000000efe00000
>>>>>>> DRAM bank = 0x0000000000000001
>>>>>>> -> start = 0x00000001f0000000
>>>>>>> -> size = 0x0000000010000000
>>>>>>> flashstart = 0x0000000000000000
>>>>>>> flashsize = 0x0000000000000000
>>>>>>> flashoffset = 0x0000000000000000
>>>>>>> baudrate = 1500000 bps
>>>>>>> relocaddr = 0x00000000eff14000
>>>>>>> reloc off = 0x00000000ef514000
>>>>>>> Build = 64-bit
>>>>>>> current eth = unknown
>>>>>>> eth-1addr = (not set)
>>>>>>> IP addr = <NULL>
>>>>>>> fdt_blob = 0x00000000ededc1b0
>>>>>>> lmb_dump_all:
>>>>>>> memory.count = 0x1
>>>>>>> memory[0] [0x200000-0xefffffff], 0xefe00000 bytes flags: none
>>>>>>> reserved.count = 0x1
>>>>>>> reserved[0] [0xeced81a0-0xefffffff], 0x03127e60 bytes flags: no-overwrite
>>>>>>> devicetree = separate
>>>>>>> serial addr = 0x00000000feb50000
>>>>>>> width = 0x0000000000000004
>>>>>>> shift = 0x0000000000000002
>>>>>>> offset = 0x0000000000000000
>>>>>>> clock = 0x00000000016e3600
>>>>>>> arch_number = 0x0000000000000000
>>>>>>> TLB addr = 0x00000000efff0000
>>>>>>> irq_sp = 0x00000000ededc1a0
>>>>>>> sp start = 0x00000000ededc1a0
>>>>>>> Early malloc usage: 2440 / 10000
>>>>>>
>>>>>> With the LMB series applied, the memory region covered by the DRAM
>>>>>> Bank 1 is not getting added to the LMB memory map. And I suspect that
>>>>>> the scripts are using addresses in the bank 1 to load and boot the
>>>>>> kernel. Can you confirm this? This seems to be happening because of
>>>>>> the value of gd->ram_top that is being set for the rockchip boards.
>>>>>> Based on a cursory look at arch/arm/mach-rockchip/sdram.c, the value
>>>>>> of gd->ram_top is capped at 0xf0000000 for the rk3588 boards. Can you
>>>>>> confirm if this is indeed the case? If so, the second DRAM bank will
>>>>>> not get added to the LMB memory map, and consequently you will not be
>>>>>> able to load images to addresses in this bank. To fix this, the value
>>>>>> of gd->ram_top will have to be changed for the rockchip boards to
>>>>>> reflect the presence of memory above 4GB. Another possible solution is
>>>>>> to use addresses in bank 0 for booting the images.
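Sughosh's hypothesis can be illustrated with a short model. This is a simplified sketch under stated assumptions, not U-Boot's lmb implementation: `lmb_memory_regions` is a hypothetical helper, `RAM_TOP` is the 0xf0000000 cap mentioned above, and the bank layout is copied from the bdinfo output. Under that cap only bank 0 survives, which matches the single memory region in the failing bdinfo [2].

```python
# Simplified model, NOT U-Boot's lmb code: which DRAM banks would end up
# in an LMB-style memory map if everything at or above ram_top is ignored?

# DRAM banks as (start, size), copied from the bdinfo output in this thread.
BANKS = [
    (0x0000000000200000, 0x00000000EFE00000),
    (0x00000001F0000000, 0x0000000010000000),
]

# Assumed value: the gd->ram_top cap for rk3588 discussed above.
RAM_TOP = 0xF0000000


def lmb_memory_regions(banks, ram_top):
    """Return (start, end_inclusive) regions below ram_top, clipping banks that straddle it."""
    regions = []
    for start, size in banks:
        end = start + size  # exclusive end
        if start >= ram_top:
            continue  # bank lies entirely above ram_top, so it is dropped
        end = min(end, ram_top)  # clip any part above ram_top
        regions.append((start, end - 1))
    return regions


regions = lmb_memory_regions(BANKS, RAM_TOP)
print(f"memory.count = {len(regions):#x}")
for i, (start, end) in enumerate(regions):
    print(f"memory[{i}] [{start:#x}-{end:#x}]")
```

Raising `RAM_TOP` above 0x200000000 in this model makes bank 1 reappear, which is the effect of the first fix Sughosh suggests.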
>>>>>
>>>>>
>>>>> According to the bdinfo [1][2], the board has two banks:
>>>>> DRAM bank = 0x0000000000000000
>>>>> -> start = 0x0000000000200000
>>>>> -> size = 0x00000000efe00000
>>>>> DRAM bank = 0x0000000000000001
>>>>> -> start = 0x00000001f0000000
>>>>> -> size = 0x0000000010000000
>>>>>
>>>>> The Armbian script boots the Linux kernel with the command:
>>>>>
>>>>> booti ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r}
>>>>> fdt_addr_r=0x12000000
>>>>> kernel_addr_r=0x02000000
>>>>> ramdisk_addr_r=0x12180000
>>>>>
>>>>> It seems that they are all in bank 0?
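Whether the load addresses quoted above really fall in bank 0 is simple range arithmetic. A minimal sketch, assuming the bank layout from the bdinfo output and the three `*_addr_r` values listed above; `bank_of` is a hypothetical helper:

```python
# Check which DRAM bank each load address falls into.
# Bank layout copied from the bdinfo output; addresses from the env values above.
BANKS = [
    (0x0000000000200000, 0x00000000EFE00000),  # bank 0
    (0x00000001F0000000, 0x0000000010000000),  # bank 1
]

LOAD_ADDRS = {
    "kernel_addr_r": 0x02000000,
    "fdt_addr_r": 0x12000000,
    "ramdisk_addr_r": 0x12180000,
}


def bank_of(addr, banks):
    """Return the index of the bank containing addr, or None if it is outside all banks."""
    for i, (start, size) in enumerate(banks):
        if start <= addr < start + size:
            return i
    return None


for name, addr in LOAD_ADDRS.items():
    print(f"{name}={addr:#010x} -> bank {bank_of(addr, BANKS)}")
```

It only checks the start address of each image, not whether the whole image fits inside the bank.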
>>>>
>>>> Yes, they all seem to be booting from bank 0. Can you share the
>>>> command that you use for trying to boot these images? I will try to
>>> The boot script is here [3].
>>>
>>> U-Boot loads the boot script from eMMC, then loads the DTB, ramdisk and kernel Image,
>>> and boots them with the booti command; see the boot log above.
>>>
>>>
>>>> boot on the rockpi-4 that I have with me and see if I hit this issue.
>>>> It will be much easier for me to understand what is happening if I can
>>>> reproduce the issue on my end.
>>>
>>> Thanks for that. I can also help provide more debug information (such as modifying U-Boot
>>> code and adding debug logs) if you need it.
>>
>> Thanks for sharing this. Will try this on my rockpi-4 and get back.
>> IIRC, Jonas has tried booting Linux on other Rockchip-based boards,
>> and has been able to do so with the EFI part of the series applied. So
>> I suspect that this is something specific to the memory layout defined
>> for this SoC.
>
>This has nothing to do with the memory layout defined for the SoC.
>
>Images are loaded in low RAM and later moved into the LMB ram_top area;
>however, after the series "Make LMB memory map global and persistent",
>there is an overlap between an EFI pool and the area these images are moved to.
>
>One or two EFI pools are allocated early during boot, possibly when the
>efi_mgr bootmeth is tested. When this fails and the script or extlinux
>bootmeth is used instead, the images are loaded and moved into the LMB area.
>
>Before jumping to the kernel, the one or two remaining EFI pools are freed,
>and this causes a crash or an illegal free, depending on which ramdisk or
>FDT data happened to overwrite the EFI pool data.
>
>Here is an example:
>
> Scanning global bootmeth 'efi_mgr':
> EFI: efi_add_memory_map_pg: 0xecedf000 0x1 4 yes
> EFI: efi_add_memory_map_pg: 0xecede000 0x1 4 yes
> EFI: BlockIO: part 0, present 1, logical 0, removable 1, last_block 62357503
> EFI: efi_add_memory_map_pg: 0xecedd000 0x1 4 yes
> EFI: efi_add_memory_map_pg: 0xecedc000 0x1 4 yes
> ...
> EFI boot manager: Cannot load any image
> Boot failed (err=-14)
> Scanning bootdev 'mmc at fe2b0000.bootdev':
> 1 extlinux ready mmc 1 mmc at fe2b0000.bootdev.part /extlinux/extlinux.conf
> ** Booting bootflow 'mmc at fe2b0000.bootdev.part_1' with extlinux
> ...
> Working FDT set to edee3f90
> Loading Ramdisk to ecd42000, end ecedf8f5 ... OK
> Loading Device Tree to 00000000ecd2d000, end 00000000ecd41dd7 ... OK
> Working FDT set to ecd2d000
>
> Starting kernel ...
>
> efi_free_pool: illegal free 0x00000000ecedf040
> efi_free_pool: illegal free 0x00000000ecedc040
>
>In the log above, 0xecedf000 and 0xecedc000 were mapped early, while efi_mgr
>was being tested. However, they were not freed until after the ramdisk
>(0xecd42000 - 0xecedf8f5) had been loaded over these locations. This time it
>only resulted in an illegal free instead of a crash.
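The overlap in this example can be confirmed from the logged addresses alone. A small sketch, assuming the second field of each efi_add_memory_map_pg line is a page count of 1 and that EFI pages are 4 KiB (the UEFI page size):

```python
# Overlap check using the addresses printed in the log above.
# U-Boot prints the ramdisk as an inclusive [start, end] range.
RAMDISK = (0xECD42000, 0xECEDF8F5)  # "Loading Ramdisk to ecd42000, end ecedf8f5"

# Early EFI pages from efi_add_memory_map_pg, read as one 4 KiB page each.
EFI_PAGES = [0xECEDF000, 0xECEDE000, 0xECEDD000, 0xECEDC000]
EFI_PAGE_SIZE = 0x1000

# Addresses reported by efi_free_pool as illegal frees.
ILLEGAL_FREES = [0xECEDF040, 0xECEDC040]


def overlaps(a, b):
    """True if inclusive ranges a and b, given as (start, end), intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


for page in EFI_PAGES:
    page_range = (page, page + EFI_PAGE_SIZE - 1)
    print(f"EFI page {page:#x} overlaps ramdisk: {overlaps(page_range, RAMDISK)}")

for addr in ILLEGAL_FREES:
    print(f"illegal free {addr:#x} inside ramdisk: {overlaps((addr, addr), RAMDISK)}")
```

All four early EFI pages, and both addresses reported by efi_free_pool, fall inside the ramdisk range.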
>
The system boots with EFI_LOADER disabled, so it seems the solution is to
wait until Sughosh's EFI/LMB sync series or Simon's alternative series
is merged.
>More examples:
>https://gist.github.com/Kwiboo/7ed4fd2dea4877672189b0219b25c28b#file-u-boot-next-20241002-illegal-free-log
>
>Regards,
>Jonas
>
>>
>> -sughosh
>>
>>>
>>> [3] https://github.com/armbian/build/blob/main/config/bootscripts/boot-rockchip64.cmd
>>>
>>>>
>>>> -sughosh
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> -sughosh
>>>>>>
>>>>>>>