[U-Boot] [EXT] Re: Cavium/Marvell Octeon Support

Sun Oct 27 03:08:29 UTC 2019

On Saturday, October 26, 2019 3:15:36 PM PDT Tom Rini wrote:
> On Fri, Oct 25, 2019 at 05:13:57PM +0200, Daniel Schwierzeck wrote:
> > Hi Aaron,
> > 
> > Am 23.10.19 um 05:50 schrieb Aaron Williams:
> > > Hi all,
> > > 
> > > I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> > > and merging it upstream. This will involve a very significant amount of
> > > code that generally will not be compatible with other MIPS processors
> > > due to our needs and requirements. For example, the start.S will need to
> > > be completely different than what is present. For example, our existing
> > > start.S is 3577 lines of code in order to deal with things like RAS,
> > > exceptions, virtual memory and more. We need to use virtual memory since
> > > U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000.
> > > A number of drivers will need to be updated in order to properly map
> > > pointers to physical addresses. This is needed anyway, since I see
> > > numerous drivers that assume that a pointer is a DMA address. For MIPS
> > > this is never the case (I'm looking at XHCI).
> > 
> > Good to see some progress in mainline Octeon support. Could you briefly
> > describe the differences and commonalities in booting an Octeon CPU
> > compared to other "generic" MIPS cores? Or could you point me to a
> > public Git tree? It can't be that different because Linux kernel is also
> > able to share most of the code ;)
> > 
> > In principle you could compile an own start.S in your mach-octeon
> > directory, but you should try to use the generic start.S which is
> > already customisable and extensible. If needed, we could add more
> > extension points to it. Booting from any custom memory address is
> > already supported and very common for other MIPS based SoC's. Exception
> > support is also already there.
> > 
> > > The new Octeon U-Boot will be native 64-bit instead of how the earlier
> > > one was 32-bit using the N32 ABI (so 64-bit addresses could be
> > > accessed). We had to jump through some hoops to make a 32-bit U-Boot
> > > fully support 64-bit hardware.
> > 
> > We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from
> > Linux in the past (which includes support for Octeon) so that you would
> > be able to use the standard IO primitives and ioremap stuff and hook in
> > your platform-specific memory mappings.
> > 
> > > I think we can shrink the code by removing support for starting "simple
> > > executive" tasks. Simple executive tasks are bare metal applications
> > > that can run on dedicated cores beside Linux (or without Linux). I will
> > > also not be porting any support for anything older than Octeon3.
> > > 
> > > We also make heavy use of our SDK in order to perform hardware
> > > initialization and networking. In our old U-Boot, we have almost 900K
> > > lines of code. I can cut out much of this but much will remain.
> > > 
> > > We also have added extensive infrastructure for handling SFP and QSFP
> > > cables as well as very extensive phy support for phys from
> > > Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox.
> > > Our customer wants us to port all of this to the new U-Boot and upstream
> > > it. I'm worried about the sheer amount of code since it is absolutely
> > > massive.
> > 
> > Maybe you should cut down your customer's expectations a bit. According
> > to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess
> > Tom or Wolfgang wouldn't agree with adding another 900k only for one
> > CPU. Actually what should be upstream is the basic CPU, driver and board
> > support to be able to boot a mainline kernel. Everything else like
> > custom bare metal applications or the SFP/PHY handling stuff mentioned
> > below could also be maintained in a downstream tree. Maybe Wolfgang is
> > willing to host one on gitlab.denx.de.
> > 
> > > Some of these phy drivers are extremely complex and need to tie
> > > into the SFP management. We also need to use a background polling thread
> > > while at the command prompt. A fair bit of our phy code is not in the
> > > normal phy drivers because it did not fit the model. Some of these phy
> > > drivers need to interact with the SFP support code in order to handle
> > > hot plug events in order to reconfigure themselves based on the cable
> > > type. The existing SFP code handles everything from SFP to SFP28 as well
> > > as QSFP and 100G QSFP (never tested).
> > > 
> > > In the old U-Boot the PHY support had to be significantly enhanced due
> > > to requirements for hot-plugging and how some of the PHYs are
> > > configured. It gets quite complicated with phys like the Inphi where one
> > > phy can handle either four ports (XFI/SGMII) or a single 4-lane port
> > > (XLAUI). It gets even worse since in some boards we use reclocking chips
> > > and there is one chip that handles the receive path of a QSFP and
> > > another that handles the transmit path. Further complicating things,
> > > with a QSFP it can be treated either as XLAUI or as four XFI ports, so
> > > you can have four ports spread across two chips, with each port using
> > > different slices of each chip. In the case of the Inphi/Cortina chip, a
> > > single device can handle one or four ports based on the configuration
> > > and it is configured by "slice" which is basically an offset into the
> > > MDIO register space. We had to jump through hoops in order to have this
> > > stuff work in a sane way in the device tree. We added entries for SFP
> > > and QSFP slots in the device tree which point to the MACs, GPIOs and I2C
> > > bus because pointing them to the phys just got too insane. This will
> > > need to be ported to the new U-Boot. It should not break the existing
> > > support since most of it was implemented outside of the core PHY
> > > handling code. In the port, it would be far better if this could be
> > > integrated in. The SFP management code is architecture agnostic as is
> > > all of the PHY support. The callbacks for the SFP support are used by
> > > the MAC which then notifies the PHY since the MAC often needs to
> > > reconfigure itself. It can handle some crazy configurations.
> > > 
> > > While I see some phy drivers that we also support, i.e. Cortina, our
> > > drivers tend to have a lot more functionality. For example, all of our
> > > phy drivers that support firmware support commands for upgrading the
> > > firmware as well as things like cable testing and other features.
> > 
> > PHY drivers and ethernet drivers should be really reduced to the
> > required functionality to enable basic networking like Ping, DHCP, TFTP.
> > U-Boot is still "just" a bootloader and not a system management tool ;)
> > You should do that stuff either in Linux or in a downstream fork.
> > 
> > > Our bootloader needs to be able to be booted from a variety of sources,
> > > including SPI, eMMC, NOR flash and booting over the PCI bus from a host
> > > system. This is one reason we use virtual memory. The other reason is
> > > that it eliminates the need to perform relocation. Our start.S code
> > > handles all of these different cases as well as exception handling.
> > 
> > This is already supported for MIPS. You should try to use the generic
> > SPL framework for that. Whether you like the relocation or not, it's one
> > of the basic design principles of U-Boot. I guess it likely won't be
> > accepted if you circumvent this. In fact by now we're sharing the same
> > technology as Linux to have relocatable binaries without using gcc's
> > -fPIC or -mabicalls to reduce the binary footprint. You can configure
> > gd->ram_top to any address of your liking as reference address for the
> > relocation.
> > 
> > > I will also say up front that the memory initialization code is a mess
> > > and quite large (it was written by a hardware engineer who never heard
> > > of functions).
> > > 
> > > One thing is that this will break mips unless it is refactored like ARM
> > > is, for example, separating armv7 and armv8. This way we could have
> > > arch/mips/cpu/octeon. I did this with the old bootloader to separate our
> > > stuff. I'm open to suggestions as for the naming. I don't see how we can
> > > share much of the code with the other MIPS CPUs.
> > 
> > We have the same mach directory handling as in Linux MIPS. So you could
> > easily add all your platform specific code (except drivers) to
> > arch/mips/mach-octeon or (-cavium). Inside that directory you can have
> > an include directory for your custom header files, you can even override
> > the generic files from arch/mips/include like in Linux. arch/mips/cpu
> > and arch/mips/lib should only contain generic code. As already mentioned
> > you could provide an own start.S inside arch/mips/mach-octeon but if
> > possible you should try to reuse or extend the generic variant.
> > 
> > > All in all, I think the final port will add between 500K-1M lines of
> > > code for the Octeon CPU. It is much more extensive than what is required
> > > for OcteonTX since in the latter case most of the hardware
> > > initialization is done by earlier stage bootloaders and the ATF handles
> > > things like SFP port management and many of the networking operations.
> > > 
> > > I'm not sure how well I'll be able to upstream all of this code at this
> > > point since I was just handed this task. We already have at least 1M
> > > lines of code added to the old U-Boot which is based off of 2013.08 with
> > > a lot of backports.
> 
> Daniel makes a lot of good points and I defer to him on general MIPS
> questions.  What I do want to add is that it's a good idea to start by
> focusing on the minimum needs to be able to boot Linux and aim for a
> medium term goal of having enough upstream that all of the other things
> that can live downstream, as Daniel suggests, be applied in your
> internal tree and work over time to minimize that delta, either by
> re-evaluating use-cases or submitting more code upstream.

This is my goal; unfortunately, getting to this point requires that most of 
the stuff works. I'll start on the "simpler" boards like the one the customer 
requires we first support, though there's not much simple about it. It 
requires the full networking support, SFP management and one of the more 
complex phys (and a custom one at that). Booting Linux also requires a lot of 
stuff to work, including our custom command for booting Linux and all the code 
to bring cores out of reset and initialize them, at least for the current Linux 
kernel. Hopefully we can move away from this, but we will still need to 
support the current stuff. I think much of our existing code can be used and 
cleaned up. We had to jump through some hoops because our current U-Boot is 
32-bit while the environment is 64-bit, so going native 64-bit allows some 
code to be cleaned up and simplified. That said, even as a 32-bit binary the 
current U-Boot can natively perform 64-bit addressing using the N32 ABI.
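
To illustrate the pointer-versus-physical-address issue raised earlier in this 
thread, here is a minimal sketch. It assumes only the standard MIPS unmapped 
segments (KSEG0/KSEG1 for 32-bit addresses, XKPHYS for 64-bit addresses). The 
helper and macro names are hypothetical, made up for this example; they are not 
existing U-Boot or SDK functions, and our real code also has to handle the 
TLB-mapped case, which this does not show.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only (not U-Boot/SDK API). */

/* KSEG0 (cached) and KSEG1 (uncached) both alias the low 512MB of
 * physical memory, so the physical address is the low 29 bits. */
#define EX_KSEG_PHYS_MASK   0x1fffffffu

static inline uint32_t ex_kseg_to_phys(uint32_t vaddr)
{
	return vaddr & EX_KSEG_PHYS_MASK;
}

/* XKPHYS: bits 63:62 = 0b10, bits 61:59 = cache coherency attribute,
 * bits 58:0 = physical address. */
#define EX_XKPHYS_BASE      0x8000000000000000ull
#define EX_XKPHYS_PHYS_MASK 0x07ffffffffffffffull

static inline uint64_t ex_phys_to_xkphys(uint64_t paddr, unsigned int cca)
{
	return EX_XKPHYS_BASE | ((uint64_t)(cca & 0x7) << 59) | paddr;
}

static inline uint64_t ex_xkphys_to_phys(uint64_t vaddr)
{
	return vaddr & EX_XKPHYS_PHYS_MASK;
}
```

With these, the usual boot vector 0xbfc00000 (a KSEG1 address) translates to 
physical 0x1fc00000, which is the value a DMA engine would need, not the 
pointer itself.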

The required networking and initialization code alone is massive, and that's 
just for ping, dhcp and tftp! The Linux code is much smaller because U-Boot 
needs to do all the low-level hardware initialization first. Fortunately I've 
generally been fairly strict at following the U-Boot coding standard (such as 
it was) and tried to keep the code fairly modular. I can move a few drivers 
out of the arch section and into the driver section. It's also generally well 
commented (which leads to some of the size).

I'll basically strip out all the support for earlier Octeon devices which will 
help some; unfortunately, most of the current code is for Octeon3.

My goal is to re-use as much existing U-Boot code as possible and make the 
smallest impact on it as I can. There are a handful of changes I will need to 
make to the U-Boot core code, but most of these are generally quite minor.

--Aaron