[U-Boot] [PATCH 1/4] Revert "sunxi: Move the SPL stack top to 0x1A000 on Allwinner A64/A80"

Thu Sep 8 15:23:14 CEST 2016

Hi,

On 08/09/16 11:51, Siarhei Siamashka wrote:
> On Mon, 5 Sep 2016 09:23:00 +0100
> Andre Przywara <andre.przywara at arm.com> wrote:
> 
>> Hi,
>>
>> On 05/09/16 05:12, Siarhei Siamashka wrote:
>>> On Mon,  5 Sep 2016 01:32:38 +0100
>>> Andre Przywara <andre.przywara at arm.com> wrote:
>>>   
>>>> This commit moved the SPL stack into SRAM C, which worked when the SPL
>>>> set the AHB1 clock down to 100 MHz to cope with the flaky SRAM C access
>>>> from the CPU.
>>>> However booting with boot0 (and thus not using SPL at all) we still run
>>>> with a 200 MHz AHB1, so any access to SRAM C is prone to fail.
>>>> Since this commit does _not_ only affect the SPL code, but also the
>>>> U-Boot proper, we fail when booting with boot0.  
>>>
>>> Yes, it unfortunately affected both the SPL and the U-Boot
>>> proper because currently both CONFIG_SPL_STACK and
>>> CONFIG_SYS_INIT_SP_ADDR defines affect the SPL stack
>>> location and in practice this only works in a predictable
>>> way if they are set to the same value. I have sent a patch
>>> to address this problem (but the fix may be unsafe for
>>> v2016.09 because many ARM platforms are affected):
>>>
>>>     https://patchwork.ozlabs.org/patch/665608/
>>>
>>> After this problem is resolved, the CONFIG_SYS_INIT_SP_ADDR
>>> define can be decoupled from CONFIG_SPL_STACK and configured to
>>> even use the DRAM instead of thrashing some part of the scarce
>>> SRAM space (which may be already occupied by the OpenRISC
>>> firmware and/or the ATF at the time when the U-Boot proper is
>>> starting).
>>>   
>>>> As the introduction of tiny-printf reduced the size of the SPL, we
>>>> can afford to have the SPL stack in SRAM A1.  
>>>
>>> We still need to check how much space is really available. The FIT
>>> support is rather heavyweight and we may want to enable some other
>>> features too.  
>>
>> Yes, I had to learn this yesterday ;-)
>> So 64-bit SPL works for me now with Jens' DRAM support patches (yeah!),
>> but enabling FIT support makes mksunxiboot barf about the file being too
>> big. The actual SPL code is about 31K, so maybe I can talk mksunxiboot
>> into relaxing its alignment requirements a bit (from 8K down to 512) and
>> also increase the available SRAM size - it says 0x7600 for sun4i, is
>> this still true for newer SoCs/BROMs?
> 
> We have had this information in the linux-sunxi wiki for a long time
> (at least for the SoC variants that I have and could experiment with)
> and it is available here:
> 
>     https://linux-sunxi.org/BROM#U-Boot_SPL_limitations
> 
> All the new SoCs have a 32K size limit for the SPL code, which can be
> loaded by the BROM. Older A10/A20 SoCs artificially limit it to 24K,
> probably trying to forcefully encourage the users to have 8K stack in
> the remaining part of the SRAM A1.
> 
> On A64, we have 32K of SRAM A1. Then we have 108K of SRAM C, which is a
> continuation of SRAM A1 in the address space, thus making it look like a
> nice single 140K chunk. Then we also have 64K of SRAM A2, which is
> supposed to be used by the OpenRISC core and is the only memory area
> that has reasonable performance when used by OpenRISC:
> 
>     https://linux-sunxi.org/AR100#Memory_Map
> 
> The idea was to let the BROM load up to 32K of the SPL code to the
> SRAM A1 (like it normally does) and then have 8K of stack a bit higher
> in the address space in SRAM C. But it turned out that the SRAM C is a
> bit quirky and suffers from data corruption problems if we reclock
> AHB1 too early.
> 
> Now there are two possible ways to move forward on A64:
>   1) Try to use SRAM C in such a way that it does not fail (and hope
>      that no additional quirks get discovered later).
>   2) Move the initial SPL stack to SRAM A2.
> 
> If we move everything to SRAM A2, then we will have to make sure that
> all the SRAM users (the FEL storage area, the SPL stack, the ATF and
> the yet to be implemented OpenRISC firmware) never clash with each
> other.

So I moved the initial stack into SRAM A2 already, which made the SPL
work in AArch64 (in contrast to having the stack at the end of SRAM A1,
which breaks quite early - though I managed to see the SPL banner ;-)
But I agree that we need to teach FEL about it as well, and this may
then actually break loading things like ATF into SRAM A2 with FEL.
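
For reference, the A64 SRAM arithmetic from the discussion above can be
written down as a small sketch. The sizes come from this thread and the
0x1A000 stack top from the reverted commit's subject; the SRAM A1 base
address of 0x10000 is my assumption and is not stated anywhere above.

```python
KIB = 1024

# Assumption: SRAM A1 starts at 0x10000 on the A64 (not stated in this thread).
SRAM_A1_BASE = 0x10000
SRAM_A1_SIZE = 32 * KIB    # the BROM loads up to 32K of SPL code here
SRAM_C_SIZE = 108 * KIB    # directly follows SRAM A1 in the address space
SPL_STACK_SIZE = 8 * KIB   # stack placed just above the SPL code

# SRAM C begins where SRAM A1 ends.
sram_c_base = SRAM_A1_BASE + SRAM_A1_SIZE
# The stack grows down from the top of the first 8K of SRAM C.
stack_top = sram_c_base + SPL_STACK_SIZE
# A1 and C together look like one contiguous chunk.
contiguous_size = SRAM_A1_SIZE + SRAM_C_SIZE

print(hex(sram_c_base), hex(stack_top), contiguous_size // KIB)
```

Under that base-address assumption the stack top lands on 0x1A000, which
matches the value in the subject line.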

> About the 31K code size. This does not look good and is very close to
> the BROM limit (32K). Just using a different compiler may get us into
> trouble, as may some minor code tweaks or feature additions.

I totally agree. A nasty drawback is already that I can't enable debug.
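
To put numbers on how tight this is, here is a rough sketch of the size
check. It is a simplification and not mksunxiboot's actual logic: it
ignores any header the tool adds and just pads the image to the alignment
and compares it against a limit. The helper names are made up for this
illustration; the sizes (31K, 8K vs. 512 alignment, 0x7600, 32K) are the
ones mentioned in this thread.

```python
KIB = 1024

def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

def fits(spl_size, alignment, limit):
    """True if the padded image does not exceed the limit."""
    return align_up(spl_size, alignment) <= limit

spl_size = 31 * KIB  # approximate SPL size quoted above

for alignment in (8 * KIB, 512):
    for limit in (0x7600, 32 * KIB):
        padded = align_up(spl_size, alignment)
        print(f"align={alignment:5d} limit={limit:#07x} "
              f"padded={padded} headroom={limit - padded} "
              f"fits={fits(spl_size, alignment, limit)}")
```

A negative headroom means the image is over the limit. With these inputs
the 0x7600 limit fails for either alignment, so relaxing the alignment
alone would not be enough.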

>> Trying this in the past (with libdram) and compiling for (32-bit) Thumb2
>> worked, but I need to check what the actual size with Jens' patches is
>> these days for Thumb2.
> 
> We have already discussed this off-list a long time ago. I know that
> both you and Alexander Graf are generally in favour of compiling the
> SPL as 64-bit code.
> 
> I think that this is the usual case of utility versus fashion. Everyone
> wants to plug every hole with 64-bit ARM code right now just because it
> is new and innovative. But this fad will fade away in a few years. Now
> just imagine an alternative reality, where ARM64 is an old and boring
> thing, while Thumb2 is a recent invention to improve code density in
> microcontrollers and other code space constrained systems. I'm sure
> that everyone would be trying to find a way to replace the legacy
> bloated 64-bit ARM code in the SPL with the new and shiny Thumb2
> stuff for improving code density ;-)
> 
> If we take a pragmatic approach and try to evaluate the pros and
> cons, then we can see that the 64-bit code in the SPL on
> Allwinner A64 hardware does not give us any real improvements.
> Quite the contrary: it offers worse code density than 32-bit
> Thumb2, and functional USB FEL boot support also becomes much
> trickier (because the boot ROM implements FEL as 32-bit code).

Well, when we did 64-bit SPL experiments earlier this year, we found the
code size to be quite reasonable (around 20-24K?), but apparently this
was missing some stuff (FIT for instance, which includes some libfdt
code, or SPI flash).

Now contrary to your apparent belief I am not married to aarch64 ;-)
Actually I started yesterday with going back to a 32-bit SPL, and the
code size is amazingly small there, up to a point where I wonder if
there is some bug somewhere in the build which either optimizes armv7
more or includes unneeded stuff in armv8. I need to investigate this.

So after I spent an evening cutting bytes off the 64-bit build
(don't link ccn and GIC code, creating tiny-ctype, only instantiating
two MMC devices, ...) I am quite open to the idea of going with a 32-bit
SPL ;-)

> The only real argument in favour of having a 64-bit SPL is that we
> can use a single AArch64 toolchain to build both the SPL and the
> main U-Boot. And we are in this situation only because the AArch64
> toolchain does not support the "-m32" option. There is no technical
> justification for this.

Well, there is. In contrast to other architectures that are both 32-bit
and 64-bit capable, the two ISAs here are quite different: assembly
isn't compatible (w0 vs. r0), the encoding is _totally_ different, many
instructions are different, and the capabilities of relative addressing
and immediate encoding differ (you can't orr w0, w0, #0x28000, for
instance ;-)
Yes, they also share many ideas and the assembly bears some
similarities, so in the end I guess there are an equal number of
arguments in favour and against this. Plus the GCC and binutils code
base is reportedly not in a shape which would encourage such mergers ;-)
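
To illustrate the immediate-encoding point, here is a small sketch that
checks whether a 32-bit constant can be encoded as an immediate in each
ISA. It is written from the general encoding rules as I understand them
(32-bit ARM: an 8-bit value rotated right by an even amount; AArch64
logical instructions: a rotated run of ones replicated across the
register), and the function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def is_a32_immediate(value):
    """32-bit ARM data-processing immediate: 8 bits rotated by an even amount."""
    return any(rotl32(value, rot) <= 0xFF for rot in range(0, 32, 2))

def is_a64_logical_immediate32(value):
    """AArch64 logical immediate for a w register (bitmask immediate)."""
    for size in (2, 4, 8, 16, 32):
        mask = (1 << size) - 1
        element = value & mask
        # The element has to repeat across the whole 32-bit register.
        if any(((value >> shift) & mask) != element
               for shift in range(0, 32, size)):
            continue
        # A single (possibly rotated) run of ones has exactly two
        # 0/1 boundaries when the element is viewed as a ring of bits.
        rotated = (element >> 1) | ((element & 1) << (size - 1))
        if bin(element ^ rotated).count("1") == 2:
            return True
    return False

for constant in (0xFF, 0x28000, 0xFF00FF00):
    print(hex(constant),
          is_a32_immediate(constant),
          is_a64_logical_immediate32(constant))
```

With this check, 0x28000 is accepted by the 32-bit rule and rejected by
the AArch64 one, because bits 15 and 17 do not form a contiguous run.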

> ARM decided to be different just for the
> sake of being different (every other architecture has the -m32
> option in GCC if the processor is able to work in both modes).

As mentioned above, this isn't entirely true, and I think there were
quite a few discussions about that - and there still are (I heard
something about -m32 over lunch a few months back).
Feel free to bring this up again on the respective toolchain mailing
lists ;-)

> If the "-m32" option was supported, then building a 32-bit SPL
> would have been mostly a trivial matter of adding "-m32 -mthumb"
> options to CFLAGS.
> 
> But we can try to do a 32-bit SPL build by introducing something like
> a CROSS_COMPILE_SPL environment variable, just like suggested some
> time ago: http://lists.denx.de/pipermail/u-boot/2012-April/122236.html

Which seemed to have been shot down by Wolfgang ;-), though with
arguments not really applying to our case.
That being said, I remember having written a small wrapper script back
when we had this discussion, which scans for a -m32 option in the
command line and then either calls the arm(32) compiler (with the -m32
removed) or passes everything to the aarch64 cross compiler.
This was surprisingly simple; I wonder if this could be integrated into
the (sunxi) U-Boot build environment somehow.
Still it would require people to have two cross-compilers installed - or
(probably more annoying) to have a cross-compiler in the first place
even if one compiles natively.
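
I don't have the original script at hand, so the following is only a
sketch of the idea, not the script itself. The two compiler names are
placeholders for whatever cross toolchains are installed, and
build_command() and main() are names made up for this example.

```python
import os
import sys

# Placeholder toolchain names; adjust to whatever is actually installed.
ARM32_CC = "arm-linux-gnueabihf-gcc"
ARM64_CC = "aarch64-linux-gnu-gcc"

def build_command(args):
    """Pick the compiler based on -m32 and return the full command line."""
    if "-m32" in args:
        # The 32-bit ARM compiler does not know -m32, so drop it.
        return [ARM32_CC] + [arg for arg in args if arg != "-m32"]
    return [ARM64_CC] + list(args)

def main(argv):
    """Replace the current process with the selected compiler."""
    command = build_command(argv[1:])
    os.execvp(command[0], command)

# Show the dispatch decision without actually running a compiler.
print(build_command(["-m32", "-mthumb", "-c", "spl.c"]))
print(build_command(["-c", "board.c"]))
```

To use it as a real wrapper, one would call main(sys.argv) at the end
instead of the two print lines and point the build at the script.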

> Also I'm finally going to submit the runtime SPL code decompression
> patches for the next U-Boot release, because there is no need to delay
> the implementation of this feature any longer. Yes, I know that any
> saved space will be wasted almost instantly by various gimmicks, but
> that's just how it is.

Great! I was wondering about the state of that. Can you give some
ballpark figures already?

>> Anyway, thanks for your patch, I will try tonight if I can squeeze all
>> the bits in.
> 
> If you mean https://patchwork.ozlabs.org/patch/665608/ then it only
> gives us the freedom to move CONFIG_SYS_INIT_SP_ADDR somewhere else.
> 
> And moving the initial stack of the U-Boot proper into the DRAM would
> make a lot of sense.

Oh, I was under the impression that it would do that already?

> We only need to agree what kind of DRAM address to
> use. After all, even the SPL relocates the stack into the DRAM. Why
> does the U-Boot proper want to use the SRAM for its stack again?
> 

Cheers,
Andre.