[PATCH v2 00/12] mips: Add initial Octeon MIPS64 base support

Daniel Schwierzeck daniel.schwierzeck at gmail.com
Tue Jun 16 19:27:09 CEST 2020


Hi Stefan,

Am 15.06.20 um 09:49 schrieb Stefan Roese:

>>
>>> Actually I wanted to do a more complete header sync with Linux and
>>> submit some minimal refactorings of start.S so that you have a better
>>> working base. Unfortuneately I hadn't time to do so in the last 3 weeks.
>>
>> I understand. Is there something I can help you with, to speed this
>> up?
> 
> Again, can I assist with some of your tasks? I really would like to see
> the base Octeon support appear in mainline U-Boot in the upcoming
> merge window. Please see below.

my current work state is pushed to u-boot-mips in branches
header_sync_v2 and lowlevel_refactoring. That should be enough for the
first refactoring step for the next merge window. I'll try to send
patches soon. With this you can also drop patches 2/12 and 8/12.

> 
>>> Regarding the copying to L2 cache: could we do this later and add the
>>> init stuff at first? So we could have functionality at first and then
>>> boot time optimization later. This also would give me some more time to
>>> refactor the start.S hooks which would give you a better base for
>>> implementing such optimizations.
>>
>> IIUYC, then your main reason that I should remove the L2 cache copy
>> is, that you would like to refactor start.S first and add some hooks
>> for e.g. this L2C cache copy, where it would fit better than currently
>> in mips_sram_init(). I definitely promise to re-work this Octeon early
>> boot code to fit into your refactored start.S, once its ready. This
>> way, you wouldn't be pressed in time to get this rework done. And we
>> have a chance to get this Octeon base support integrated in a "decent"
>> state (bootup time, please see DDR4 init below) soon.
>>
>> There are other reasons, beside the boot speedup, for the L2 cache
>> copy:
>>
>> Running from L2 cache also helps with re-mapping the CFI flash
>> bootmap, so that the 8 MiB flash can be fully accessed. Otherwise
>> the flash is too big for the initial bootmap size and things like
>> environement can't be accessed.
>>
>> The DDR init code that I'm currently working on - I'm actually
>> preparing the first patchset right now - is only tested with this
>> configration right now.
>>
>> I'm also working on other peripheral drivers for Octeon. Some have
>> already started (like USB) and others are on my list. All this work
>> is pretty painful without the boot speedup, as e.g. the DDR4 init
>> will take very long (many minutes compared to a few seconds, as I've
>> been told).
>>
>> But of course this is your call. If you insist on removing the L2 cache
>> copy from the base Octeon port, then I will re-visit the patchet and
>> try to address it.

I had a deeper look into it, but I still can't understand why and how
this L2 copy even works. Especially after a cold-boot from flash without
any Octeon specific CPU initialization.

U-Boot is a position-dependent binary and all branches and jumps are
implemented with absolute addresses. You can't simply add an offset to
$pc (except when changing KSEG) and expect U-Boot to be running from the
new location. That's why we have the relocation step and the reloc table
where all absolute addresses are fixed relative to the relocation address.

Assembly dump of the current patch series:

ffffffff80000000 <_start>:
ENTRY(_start)
	/* U-Boot entry point */
	b	reset
ffffffff80000000:	1000013f 	b	ffffffff80000500 <reset>
	 mtc0	zero, CP0_COUNT	# clear cp0 count for most accurate boot timing
ffffffff80000004:	40804800 	mtc0	zero,$9
	...
ffffffff80000500 <reset>:
1:	mfc0	t0, CP0_EBASE
ffffffff80000500:	400c7801 	mfc0	t0,$15,1
	and	t0, t0, EBASE_CPUNUM
ffffffff80000504:	318c03ff 	andi	t0,t0,0x3ff

	/* Hang if this isn't the first CPU in the system */
2:	beqz	t0, 4f
ffffffff80000508:	11800004 	beqz	t0,ffffffff8000051c <reset+0x1c>
	 nop
ffffffff8000050c:	00000000 	nop
3:	wait
ffffffff80000510:	42000020 	wait
	b	3b
ffffffff80000514:	1000fffe 	b	ffffffff80000510 <reset+0x10>
	 nop
ffffffff80000518:	00000000 	nop

	/* Init CP0 Status */
4:	mfc0	t0, CP0_STATUS
ffffffff8000051c:	400c6000 	mfc0	t0,$12
...

If you boot off from flash at ffffffffbfc00000, the very first
instruction will jump to ffffffff80000500. This can only work if there
is already some magic memory/TLB mapping from ffffffffbfc00000 to
ffffffff80000000 or I-caches are already active.

Could you describe the basic system design (or share some documents)
i.e. how CPUs, busses, cache coherency engine, memory controller, flash
are connected and at which addresses them are mapped? And how the boot
process works? Are the I-caches already active when booting from flash?
This would really help me for understanding the code ;)

Also the complete copy of the U-Boot binary to L2 cache breaks the
U-Boot design philosophy. The task of start.S is to initialize the CPU,
initialize the external memory, prepare global_data/BSS and setup the
initial stack space. Originally the MIPS start.S required to implement
everything in assembly. Later we added the option to use global_data and
initial stack from some SRAM to be able to use a full C environment for
the lowlevel_init() code. That's why start.S is now cluttered and needs
some refactoring.

Wouldn't it possible for the first Octeon patch series to simply fit in
this scheme and use the normal relocation step to copy the U-Boot binary
to RAM or effectively L2 cache? Without the DDR init code there is not
much code which would run slowly from flash.

While writing this, it occurred to me that you maybe just work-around a
non-working or not properly initialized I-cache when booting from flash.
Or maybe fetching single instructions from flash to I-cache is slower
than copying the code at once? Can't that be fixed? Would something like
arch/mips/mach-mscc/cpu.c:vcoreiii_tlb_init() work with Octeon? Or was
something like that already done in Aaron's custom start.S but not in a
much understandable way (when not knowing the Octeon internals)?

> 
> Do you have some guidance for me on this? How should we continue here?
> Can I help with the Linux header sync or start.S refactoring? Did you
> start with this work already or do you have a rough schedule?
> 
> Sorry for being so persistent on this. I really don't want to rush
> you, but as mentioned above, I really would like to see at least the
> Octeon base port in mainline U-Boot in the next merge window. And if
> there is something that I can do to help you, then please let me know.
> 

I also want to merge the base port as soon as possible. Than we could
figure out how to properly accelerate the DDR init code respectively all
code running from flash. But I want to avoid adding a second and custom
relocation step if there are better ways. And the discussion about this
should not block the initial patch series.

Could you rebase your patch series to u-boot-mips/header_sync_v2 and
move the L2 copy code to the DDR init patch series? Than I could apply
the base port to u-boot-mips/next and you could still develop based on
the current L2 copy code?

-- 
- Daniel


More information about the U-Boot mailing list