Requiring SPL_DM for new boards?

Wed Nov 2 02:54:11 CET 2022

Hi all,

Thanks for CCing me, Andre.

On 11/1/22 13:15, Andre Przywara wrote:
> On Mon, 31 Oct 2022 15:43:01 -0400
> Tom Rini <trini at konsulko.com> wrote:
> 
> Hi Tom, Simon,
> 
>> On Mon, Oct 31, 2022 at 01:27:06PM -0600, Simon Glass wrote:
>>> Hi Tom,
>>>
>>> On Sun, 30 Oct 2022 at 11:53, Tom Rini <trini at konsulko.com> wrote:  
>>>>
>>>> On Sat, Oct 29, 2022 at 07:44:01PM -0600, Simon Glass wrote:  
>>>>> Hi Tom,
>>>>>
>>>>> On Fri, 21 Oct 2022 at 10:26, Tom Rini <trini at konsulko.com> wrote:
>>>>>  
>>>>>>
>>>>>> On Fri, Oct 14, 2022 at 09:56:44AM -0600, Simon Glass wrote:
>>>>>>  
>>>>>>> Hi,
>>>>>>>
>>>>>>> What do people think about requiring SPL_DM for new boards?
>>>>>>> Would that cause any problems?
>>>>>>>
>>>>>>> There is not much use of of-platdata (compiling the DT into C
>>>>>>> to save space) - is that because it doesn't work for people?

Indeed, I tried enabling of-platdata on sunxi at one time, and it was
not feasible to get working. Some of the issues I saw:

 - There was no support for udevice_id::data -> udevice::driver_data.
   Several of our drivers share one U_BOOT_DRIVER for the entire SoC
   family; DM_DRIVER_ALIAS would need to plumb through a data argument.

 - Functions like clk_get_by_index and gpio_request_by_name would need
   to work at least for index zero, where you could sort of get away
   with not being able to read #foo-cells from the supplier. (The
   OF_PLATDATA code would need to rewrite phandles to device IDs.)
   Otherwise, you need totally separate code for OF_PLATDATA vs OF_REAL.

 - We recently converted our platform to use the pinctrl nodes from the
   devicetree (PINCTRL_GENERIC) in U-Boot proper. That does not work at
   all with OF_PLATDATA.
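
To make the first point concrete, here is a rough, self-contained model
of the data flow I mean. It is not U-Boot code: the model_* types are
simplified stand-ins for struct udevice_id and struct udevice, the
VARIANT_* values are made up, and the compatible strings are only
illustrative. With OF_REAL the match-table value ends up in driver_data
at bind time; with OF_PLATDATA something equivalent would have to carry
that value through.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for U-Boot's struct udevice_id / struct udevice. */
struct model_udevice_id {
	const char *compatible;
	unsigned long data;		/* per-variant value from the match table */
};

struct model_udevice {
	const char *compatible;		/* would come from the DT node */
	unsigned long driver_data;	/* filled in when a match is found */
};

/* Hypothetical per-SoC variant identifiers. */
enum { VARIANT_A64 = 1, VARIANT_H6 = 2 };

/* One driver, one match table, several SoCs (illustrative entries). */
static const struct model_udevice_id model_ids[] = {
	{ "allwinner,sun50i-a64-ccu", VARIANT_A64 },
	{ "allwinner,sun50i-h6-ccu", VARIANT_H6 },
	{ NULL, 0 }
};

/* Copy the matching entry's data into the device; 0 on match, -1 if none. */
static int model_bind(struct model_udevice *dev,
		      const struct model_udevice_id *ids)
{
	assert(dev && ids);

	for (; ids->compatible; ids++) {
		if (!strcmp(ids->compatible, dev->compatible)) {
			dev->driver_data = ids->data;
			return 0;
		}
	}

	return -1;
}
```

The shared driver then only ever looks at driver_data, so it does not
care which SoC it is running on. That is the piece that had no
equivalent when I tried OF_PLATDATA.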

>>>>>>> I am particularly keen to drop the old block interface from
>>>>>>> SPL. It seems to me that boards that can use that might have
>>>>>>> enough space to enable SPL_DM and SPL_DM_BLK? What do people
>>>>>>> think?  
>>>>>>
>>>>>> I don't think this works. The problem is we aren't seeing new
>>>>>> SoCs that have a large initial amount of memory but rather many
>>>>>> continuing to have 32KiB or similar tiny sizes. So, I'd rather
>>>>>> continue to go with saying it's optional, but that we won't
>>>>>> introduce new SPL functionality that can be DM or not DM, but
>>>>>> only new functionality that needs SPL_DM and if platforms want
>>>>>> it, but have limited memory, we need to go TPL->SPL in that
>>>>>> case.  
>>>>>
>>>>> OK I see.
>>>>>
>>>>> What do you think of a migration method for boards which don't use
>>>>> SPL_DM, so they migrate to TPL? Would that cause a lot of
>>>>> problems?  
>>>>
>>>> I'm not sure what it gains us. Maybe the first step here is to see
>>>> what the list of non-DM_SPL platforms / SoCs are?  
>>>
>>> OK:
>>>
>>> $ ./tools/moveconfig.py -b
>>>
>>> $ ./tools/moveconfig.py -f SPL ~SPL_DM
>>> 323 matches
>>> ...
>>>
>>> $ ./tools/moveconfig.py -f SPL_DM
>>> 333 matches
>>> ...  
>>
>> OK, if we start parsing things out, PowerPC is one chunk of that and
>> won't change. Another chunk of that is sunxi which is a "still making
>> new SoCs with very small SRAM" and it's worth talking with Andre for
>> thoughts there.
> 
> Most newer SoCs are not that seriously limited anymore - though that's
> not universal, since Allwinner is still making a lot of new "small"
> SoCs (with single Cortex-A7s (or older!) and embedded DRAM). And
> regardless of that, until recently the BROM wouldn't load more than 32 KB
> into SRAM, anyway.

Starting with H6, and as far as I know for every SoC after that, the
normal BROM will load up to the full size of SRAM A1 + SRAM C, which is
more than 96 KB. That is plenty of space for SPL_DM + SPL_OF_REAL. I can
look back at my BROM analyses if you want the exact numbers.

So I don't think it is correct to say Allwinner is "still making new
SoCs with very small SRAM".

> So I cannot say for sure what the situation for new "boards" (rather
> SoCs?) will be, but we are stuck with the current legacy SPL for existing
> SoCs, for sure.

That is mostly true, with the exception that H6 and H616 can upgrade to
SPL_DM. As you mention, A64 and H5 are the most problematic cases, since
they are aarch64 but still have the 32 KB limit.

> Samuel has been working on SPL_DM for the D1 (RISC-V) port, though I am
> not a big fan of it:
> - We would still need the legacy SPL code, since older SoCs are still
>   bound to the 32K limit. That is already a stretch for SoCs like the
>   A64, where we are already very close to that limit.
> - It adds to the test matrix, since we now need to support and
>   maintain DM/proper, legacy SPL and SPL-DM.

I agree that there is some additional complication here. The two main
sources of surprising differences between the SPL and U-Boot proper DM
execution environments are inconsistent Kconfig options and malloc.

However, on the driver side, the code running under SPL_DM and U-Boot is
exactly the same. There is not a third driver variant.

> - Forcing a DT and DM code into that very restricted space requires too
>   many compromises for my taste. I like nice driver frameworks and love
>   DT, but one must be able to afford all of this. If you have 100s of
>   KBs or MBs available, that's all fine, but cutting corners to make it
>   fit into 32K takes away much of the beauty and flexibility. The DT
>   changes (u-boot,dm-pre-reloc) we need to make are some sign of it.

Yes, I would *really* appreciate the ability to skip fdtgrep and just
put the whole devicetree in SPL, or at least have it be much more
conservative about what it drops (certain properties, disabled nodes) so
that annotating the devicetree is not necessary.

The current Makefile logic does not allow us to have a per-SoC
"-u-boot.dtsi" file, so we would need to annotate per board. But even if
it did, I don't want to pick and choose nodes; that's what Kconfig is for!

> - We actually don't gain much, because the information the SPL needs is
>   mostly not in the DT to begin with:
>   - The whole DT clock node is opaque, it basically just says "it's this
>     SoC's CCU". That is OK for a single image kernel like Linux, but the
>     SPL knows that already - either by build time config or by reading the
>     SOCID register. And the SPL does *basic* clock setup, which we cannot
>     really describe in the DT at all.
>   - The situation is similar for pinctrl: the actual mux value for a
>     certain function is not in the DT, but hardcoded in the driver. We
>     already tried to hack this down for U-Boot, and only got away with
>     quite some squinting.

You say "we don't gain much", but I see clk_enable() and the automatic
pinctrl_select_state() Just Working as an absolutely massive improvement.

Currently, SoC bringup requires updating the DT driver... and then 20+
ifdefs in various files. If new SoCs use SPL_DM, then new SoCs just need
an updated DT driver. They don't have to touch any of the ifdefs.
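
As a rough illustration of why the ifdefs go away, here is a
self-contained model of a table-driven gate clock. It is a sketch under
assumptions, not the real driver: the model_* names, the clock IDs, the
register offsets and the fake register array are all invented for the
example. The point is that the per-SoC knowledge lives in one data
table, and the consumer just asks for a clock by ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One gate: register offset and bit position (hypothetical layout). */
struct model_gate {
	uint16_t off;
	uint8_t bit;
};

/* Per-SoC description: the only thing a new SoC would have to supply. */
struct model_ccu_desc {
	const struct model_gate *gates;
	size_t num_gates;
};

/* Made-up clock IDs, standing in for the DT binding constants. */
enum { MODEL_CLK_BUS_MMC0, MODEL_CLK_BUS_UART0 };

static const struct model_gate model_soc_x_gates[] = {
	[MODEL_CLK_BUS_MMC0]  = { 0x060, 8 },
	[MODEL_CLK_BUS_UART0] = { 0x06c, 16 },
};

static const struct model_ccu_desc model_soc_x = {
	.gates = model_soc_x_gates,
	.num_gates = sizeof(model_soc_x_gates) / sizeof(model_soc_x_gates[0]),
};

/* Fake register file so the example runs without hardware. */
static uint32_t model_regs[0x100 / 4];

/* Set the gate bit for the given clock ID; 0 on success, -1 if unknown. */
static int model_clk_enable(const struct model_ccu_desc *desc, unsigned int id)
{
	const struct model_gate *g;

	assert(desc);
	if (id >= desc->num_gates)
		return -1;

	g = &desc->gates[id];
	model_regs[g->off / 4] |= (uint32_t)1 << g->bit;

	return 0;
}
```

A consumer would call something like
model_clk_enable(&model_soc_x, MODEL_CLK_BUS_UART0) and never mention
the SoC by name, which is roughly what clk_enable() gives us once the
clock comes from the devicetree.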

>   - The DRAM controller isn't even mentioned in the DT. And while we could
>     add that, the information we need is very minimal.

It is mentioned; the DRAM controller is the mbus node.

Writing a DM driver for the DRAM controller is effectively the same
amount of effort as doing it the legacy way. You just wrap it in a
U_BOOT_DRIVER and do the init in the .probe function. The change in the
board code is similarly trivial:

-	sunxi_dram_init();
+	uclass_get_device(UCLASS_RAM, 0, &dev);
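
For anyone who has not written one of these, here is a self-contained
model of the shape I mean. It is only a sketch: the model_* types stand
in for struct driver and struct udevice, model_dram_init() stands in for
the existing legacy init routine, and the returned size is a made-up
value. model_get_device() loosely mirrors the fact that
uclass_get_device() probes the device before handing it back.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;

/* Stand-in for struct driver: just a name and a probe hook. */
struct model_driver {
	const char *name;
	int (*probe)(struct model_dev *dev);
};

/* Stand-in for struct udevice plus the driver's private data. */
struct model_dev {
	const struct model_driver *drv;
	int active;			/* set once probe() has succeeded */
	unsigned long ram_size;		/* result of the DRAM init */
};

/* Placeholder for the legacy init routine; returns a made-up size. */
static unsigned long model_dram_init(void)
{
	return 512UL << 20;
}

/* The "wrapper": the same init code, now called from .probe. */
static int model_dram_probe(struct model_dev *dev)
{
	dev->ram_size = model_dram_init();

	return dev->ram_size ? 0 : -1;
}

static const struct model_driver model_dram_driver = {
	.name = "model_dram",
	.probe = model_dram_probe,
};

static struct model_dev model_ram_dev = { .drv = &model_dram_driver };

/* Probe on first use, then return the device; 0 on success. */
static int model_get_device(struct model_dev *candidate,
			    struct model_dev **devp)
{
	int ret;

	assert(candidate && devp);
	*devp = NULL;

	if (!candidate->active) {
		ret = candidate->drv->probe(candidate);
		if (ret)
			return ret;
		candidate->active = 1;
	}

	*devp = candidate;

	return 0;
}
```

In real board code you would of course check the return value of
uclass_get_device() instead of ignoring it as in my two-line diff.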

>   - For storage devices (MMC, SPI-NOR) we can use the same fixed per-SoC
>     values as the BROM does, so just need the base address. There are only
>     like four different values across all Allwinner SoCs. The rest of the

There are even fewer values to keep track of if the only MMIO addresses
hardcoded in some header are the ones for pre-H6 SoCs ;-).

>     DT node is either not useful (opaque clock handles) or not needed
>     (interrupts).

As long as you ignore raw NAND, maybe you can get away with using slow,
safe code like the BROM does. But sometimes you really do need per-board
information like ECC parameters from the DT node.

> Yes, there are some boards which require regulator setup in the SPL, which
> is described in the DT, but again this still requires regulator
> knowledge in the code, and is also quite universal (mostly by SoC again).
> 
> So in summary: it would be a lot of work, which we cannot extend to older
> SoCs because of technical limitations. But more importantly I think we
> don't gain much to make it worth it.

I agree that old SoCs pre-H6 are stuck for now. But I do see a lot of
benefit for new SoCs. Especially when starting from a blank slate on
RISC-V, it was much easier to port one set of drivers plus SPL_DM than
to port two sets of drivers.

> Historically we more naturally shared code between SPL and U-Boot proper,
> because U-Boot proper used to look much like the SPL looks today (clock
> code, for instance). But much of this is mostly obsolete, because there is
> not much overlap, code-wise, the only exception being the common MMC protocol
> handling, maybe. So I am actually more tempted to spell this out more
> openly, and separate and trim down the SPL code, avoiding full-featured
> (DM) drivers at all, if possible (like the SPI NOR code does).
> 
> We can look into parsing the DT to gather base addresses (and putting them
> into generated headers), or to enable Kconfig options (board needs a
> regulator), but I would very much like to keep the SPL lean and mean.
> 
> The BROM is able to do all the loading without *any* board information
> whatsoever. All that the BROM is missing is the DRAM init, which requires
> just two or three parameters (LPDDR vs. DDR, frequency). So we could
> actually live with a *per-SoC* SPL: we know where the BROM booted from, so
> can continue doing so using the same fixed settings as the BROM used (for
> SD card, eMMC, SPI, FEL). We actually exercise this idea already in
> arch/arm/mach-sunxi/spl_spi_sunxi.c, which is separate from the normal
> SPI-NOR code, just focusing on some conservative read-only command to get
> the FIT image into DRAM.
> I would rather go into this direction than forcing DM into the SPL.

I suppose there are two ways of thinking about SPL:
 1) Exactly like U-Boot proper, except we removed all of the interactive
    parts to make it smaller.
 2) Load U-Boot from a fixed location on disk as fast as possible using
    as little code as possible, and do nothing else.

The second view ignores things like disk/MTD partitions, verified boot,
falcon mode, reboot modes, multi-DTB FITs, etc. that would benefit from
having all of the DM and FDT infrastructure available. But on the other
hand, it gets you a highly-optimized program that does one thing and
does it well. I do like the idea of only needing one binary per SoC.

I guess my question is, what do the U-Boot maintainers want U-Boot SPL
to be? If it's more like the first description, then maybe it makes more
sense to build the lean and mean SPL outside the U-Boot infrastructure.
Then if the SPL_DM migration is forced at some point, we could just
disable SUPPORT_SPL for anything older than H6, and treat SPL as a blob.
But that still seems like quite a lot of unnecessary work when we have a
working U-Boot SPL for those SoCs today. What do you think?

Regards,
Samuel