[U-Boot] [EXT] Re: Cavium/Marvell Octeon Support

Wed Oct 30 16:20:31 UTC 2019

Hi Aaron,

On 27.10.19 at 03:34, Aaron Williams wrote:
> Hi Daniel,
> 
> On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
>> Hi Aaron,
>>
>> On 23.10.19 at 05:50, Aaron Williams wrote:
>>> Hi all,
>>>
>>> I have been tasked with porting our Octeon U-Boot to the latest U-Boot
>>> and merging it upstream. This will involve a very significant amount of
>>> code that generally will not be compatible with other MIPS processors
>>> due to our needs and requirements. For example, the start.S will need to
>>> be completely different than what is present. For example, our existing
>>> start.S is 3577 lines of code in order to deal with things like RAS,
>>> exceptions, virtual memory and more. We need to use virtual memory since
>>> U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000.
>>> A number of drivers will need to be updated in order to properly map
>>> pointers to physical addresses. This is needed anyway, since I see
>>> numerous drivers that assume that a pointer is a DMA address. For MIPS
>>> this is never the case (I'm looking at XHCI).
>>
>> Good to see some progress in mainline Octeon support. Could you briefly
>> describe the differences and commonalities in booting an Octeon CPU
>> compared to other "generic" MIPS cores? Or could you point me to a
>> public Git tree? It can't be that different because Linux kernel is also
>> able to share most of the code ;)
>>
> 
> Actually the low level code is significantly different. First of all, we need 
> the U-Boot bootloader to be able to boot from different memory locations. 
> Because of this, we use mapped memory for U-Boot. A side effect of this is 
> that it eliminates the need for relocation when it is shifted to the top of 
> memory. All we need to do is just set a couple of TLB entries.

Understood, but U-Boot still relocates itself from its initial entry
address to its destination address based on gd->ram_top. Maybe this is
inefficient nowadays with the various SPL/TPL boot methods, because
U-Boot proper is already loaded to an executable memory location by
SPL, but you initially have to deal with that design. Feel free to
suggest/submit a patch for the generic board init code to make the
relocation configurable.
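
To illustrate what "relative to gd->ram_top" means, here is a minimal
sketch of the idea. It is not the actual board_f.c code; the function
names, the parameters and the alignment value in the example are
assumptions made up for this mail.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick a relocation address just below ram_top.
 * Assumes align is a power of two.
 */
uint64_t sketch_reloc_addr(uint64_t ram_top, uint64_t image_size,
			   uint64_t align)
{
	uint64_t addr = ram_top - image_size;	/* reserve room for the image */

	return addr & ~(align - 1);		/* round down to the alignment */
}

/* Offset that would be applied to link-time addresses after the move. */
int64_t sketch_reloc_off(uint64_t reloc_addr, uint64_t link_addr)
{
	return (int64_t)(reloc_addr - link_addr);
}
```

With a ram_top of 0x90000000, an image of 0x80000 bytes and a 64 KiB
alignment this yields 0x8ff80000. A configurable relocation could, for
example, skip the copy when the offset is zero.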

> 
> The assembly code is significantly different and is far more extensive.
> 
> Additionally, the way Octeon Linux is booted is different.
> 
> The generic start.S is not usable in our case.
> 
> We have a significant amount of code for dealing with the cache and for things 
> like copying U-Boot from flash into the L2 cache. We also have to deal with 
> taking other cores out of reset in our start.S. Our exception handler has also 
> been extended to handle multiple cores.

It's hard to discuss this without example code, but I still think the
basic principles of cache and exception handling can't be that different
from generic MIPS cores. Locking cache lines and loading code into them
could be useful for other MIPS platforms and should be added as a
generic feature. BTW, the exception handler code is a port of the Linux
one; I only skipped the stack trace output because of the complicated
stack unwinding code. I think the current dump of the general-purpose,
CP0 and EPC registers is more than sufficient for a bootloader. It has
already helped me multiple times to quickly locate faulty code, e.g.
null pointer dereferences.

> 
> Some other things we have included are a native API that allows Simple 
> Executive applications to make calls into U-Boot for such things as 
> environment variable access as well as access to block devices and 
> filesystems.

This is one of the parts that shouldn't be needed for basic upstream
support. If your API is a parallel and independent implementation of the
API that U-Boot already has for standalone applications, then I'm afraid
this won't be accepted and should be kept in a downstream fork.

> 
> 
> We used to have our Octeon SDK available for download but it seems this has 
> been taken down :( I'm trying to find out how I can make it available but I'm 
> getting pushback in sharing our GPLed U-Boot even though it is GPL.
> 
>> In principle you could compile an own start.S in your mach-octeon
>> directory, but you should try to use the generic start.S which is
>> already customisable and extensible. If needed, we could add more
>> extension points to it. Booting from any custom memory address is
>> already supported and very common for other MIPS based SoC's. Exception
>> support is also already there.
>>
> 
> The bootloader needs to be able to start from multiple memory locations 
> without recompiling. Our existing bootloader can run from any 4MB boundary 
> without recompiling or relocation. It can start out of flash (from any sector 
> boundary, not just 0) or L2 cache. Starting by L2 cache is supported by eMMC, 
> SPI and PCI target bootloaders. Additionally the same bootloader can be 
> started from RAM such as when the failsafe bootloader starts the main 
> bootloader. In most cases, the failsafe is the same full-featured bootloader 
> since it fits entirely within the L2 cache. Our only bootloader requirement is 
> that it fits in the L2 cache (except when booting from Flash, though this is 
> preferred for speed) and that it remain under 4 MiB in size.
> 
> I believe our exception handling is more extensive than the standard U-Boot 
> exception handler. It includes the stack output as well as numerous COP0 
> registers and decoding the cause of the exception. The exception handler is 
> also independent of a working C environment. We also need to handle exceptions 
> occurring on multiple cores as they're brought out of reset and not all cases 
> are exceptions. 

As I wrote above, the current exception handling is already sufficient
in almost all cases to quickly locate code bugs and doesn't need much
code. Adding stack trace output would require adding a lot more code.
But if you are only missing some registers or want to dump the stack
itself, feel free to extend the current code.
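
As an example of the kind of small extension I have in mind, decoding
the exception cause only needs a few lines. This is a rough sketch, not
code from the tree; the sketch_* names are made up. ExcCode sits in
bits 6..2 of the CP0 Cause register, and the table only covers the
common codes.

```c
#include <stdint.h>

/* ExcCode occupies bits 6..2 of the CP0 Cause register. */
#define SKETCH_CAUSE_EXCCODE_SHIFT	2
#define SKETCH_CAUSE_EXCCODE_MASK	0x1f

/* Extract the exception code from a raw Cause register value. */
unsigned int sketch_exccode(uint32_t cause)
{
	return (cause >> SKETCH_CAUSE_EXCCODE_SHIFT) &
	       SKETCH_CAUSE_EXCCODE_MASK;
}

/* Map the common exception codes to a short description. */
const char *sketch_exccode_name(unsigned int code)
{
	switch (code) {
	case 0:  return "Interrupt";
	case 1:  return "TLB modification";
	case 2:  return "TLB exception (load/fetch)";
	case 3:  return "TLB exception (store)";
	case 4:  return "Address error (load/fetch)";
	case 5:  return "Address error (store)";
	case 6:  return "Bus error (fetch)";
	case 7:  return "Bus error (data)";
	case 8:  return "Syscall";
	case 9:  return "Breakpoint";
	case 10: return "Reserved instruction";
	case 11: return "Coprocessor unusable";
	case 12: return "Arithmetic overflow";
	case 13: return "Trap";
	default: return "Other";
	}
}
```

A handler would pass the saved Cause value through sketch_exccode() and
print the returned string next to the register dump.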

> Cores are first powered on and kept in a halted state, then
> later when we start the Linux kernel or simple executive applications, the 
> exception handler is updated (via a bootbus moveable memory region)  and an 
> NMI is generated for the cores where they will begin executing code out of 
> start.S before moving to the code that sets up the environment for booting 
> Linux and/or simple executive applications. In the latter case, TLB entries 
> are programmed in for each core.
> 
>>> The new Octeon U-Boot will be native 64-bit instead of how the earlier
>>> one was 32-bit using the N32 ABI (so 64-bit addresses could be
>>> accessed). We had to jump through some hoops to make a 32-bit U-Boot
>>> fully support 64-bit hardware.
>>
>> We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from
>> Linux in the past (which includes support for Octeon) so that you would
>> be able to use the standard IO primitives and ioremap stuff and hook in
>> your platform-specific memory mappings.
>>
> That is good to know. What I have run into is the fact that many drivers do 
> not support I/O remapping. I.e. XHCI assumes that a pointer is a DMA address. 
> Also, does the 64-bit support handle multiple cores in U-Boot?

We already have helpers like dev_remap_addr(struct udevice *dev) as part
of the driver model API to map physical addresses from the device tree
to virtual addresses. This is used in all drivers compatible with MIPS.
That function is backed by the MIPS-specific ioremap_nocache() function
(also ported from Linux) so that you can hook in platform-specific
mapping code. If you want to use existing drivers which don't do
remapping yet, you have to patch them. But this should be simple; we
recently did that for the Broadcom and Mediatek platforms, which share
drivers between their MIPS and ARM CPUs.
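
To make the mapping step concrete, this is roughly what such a hook
computes on MIPS. It is only a sketch with made-up sketch_* names, not
the real ioremap code: on 32-bit the unmapped, uncached KSEG1 window
covers the low 512 MiB, and on 64-bit the uncached XKPHYS window can
reach any physical address.

```c
#include <stdint.h>

/* MIPS64 XKPHYS window with the uncached cache attribute (CCA = 2). */
#define SKETCH_XKPHYS_UNCACHED	0x9000000000000000ULL
/* MIPS32 KSEG1: unmapped, uncached view of the low 512 MiB. */
#define SKETCH_KSEG1_BASE	0xA0000000UL
#define SKETCH_KSEG1_SIZE	0x20000000UL

/* 64-bit case: every physical address is reachable through XKPHYS. */
uint64_t sketch_ioremap64(uint64_t phys)
{
	return SKETCH_XKPHYS_UNCACHED | phys;
}

/*
 * 32-bit case: only the low 512 MiB are reachable through KSEG1.
 * Returns 0 when the address would need a TLB mapping instead.
 */
uint32_t sketch_ioremap32(uint64_t phys)
{
	if (phys >= SKETCH_KSEG1_SIZE)
		return 0;

	return (uint32_t)(SKETCH_KSEG1_BASE | phys);
}
```

A platform with its own address layout would replace exactly this
computation and leave the drivers untouched.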

For XHCI you probably only need to patch the xhci_readl() and
xhci_writel() functions and establish the memory mappings in your
platform specific glue code. But USB support shouldn't be your first
priority ;)
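
The other direction, pointer to DMA address, is the part those drivers
currently get wrong. A rough sketch for the 32-bit unmapped segments,
with hypothetical names and not the actual virt_to_phys() code:

```c
#include <stdint.h>

/* KSEG0 and KSEG1 both alias the low 512 MiB of physical memory. */
#define SKETCH_KSEG_PHYS_MASK	0x1FFFFFFFUL

/*
 * Hypothetical stand-in for a virt-to-bus helper on 32-bit MIPS.
 * Only valid for KSEG0/KSEG1 pointers; mapped segments need a
 * platform-specific lookup instead.
 */
uint32_t sketch_virt_to_dma(uint32_t vaddr)
{
	return vaddr & SKETCH_KSEG_PHYS_MASK;
}

/* Split a DMA address into the two 32-bit halves of a 64-bit field. */
void sketch_split_dma(uint64_t dma, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)(dma & 0xFFFFFFFFULL);
	*hi = (uint32_t)(dma >> 32);
}
```

A buffer at the KSEG0 address 0x80100000 has to be handed to the
controller as 0x00100000, not as the pointer value itself.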

> 
> I agree about using the standard ioremap stuff. I'm only pointing out that 
> there are places where it is missing in the common U-Boot code. Where it is 
> present, there won't be any issues since traditionally I used those methods to 
> call our platform specific remapping. I will look to see what is present and 
> if it will work or not.

Yes, those places need some patching anyway. There is already an ongoing
task to address this:

https://gitlab.denx.de/u-boot/custodians/u-boot-mips/issues/15

> 
>>> I think we can shrink the code by removing support for starting "simple
>>> executive" tasks. Simple executive tasks are bare metal applications
>>> that can run on dedicated cores beside Linux (or without Linux). I will
>>> also not be porting any support for anything older than Octeon3.
>>>
>>> We also make heavy use of our SDK in order to perform hardware
>>> initialization and networking. In our old U-Boot, we have almost 900K
>>> lines of code. I can cut out much of this but much will remain.
>>>
>>> We also have added extensive infrastructure for handling SFP and QSFP
>>> cables as well as very extensive phy support for phys from
>>> Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox.
>>> Our customer wants us to port all of this to the new U-Boot and upstream
>>> it. I'm worried about the sheer amount of code since it is absolutely
>>> massive.
>>
>> Maybe you should cut down your customers expectations a bit. According
>> to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess
>> Tom or Wolfgang wouldn't agree with adding another 900k only for one
>> CPU. Actually what should be upstream is the basic CPU, driver and board
>> support to be able to boot a mainline kernel. Everything else like
>> custom bare metal applications or the SFP/PHY handling stuff mentioned
>> below could also be maintained in a downstream tree. Maybe Wolfgang is
>> willing to host one on gitlab.denx.de.
>>
> 
> I will try and cut it down. Much of the code is register definitions. The 
> register definition files are auto-generated and tend to be huge. They're 
> fully commented and include both big and little endian bitfields. In this case 
> I can do like I did for OcteonTX and modify the scripts that generate these 
> headers to strip out the little-endian and comments. There is a huge amount of 
> code for configuring our QLM hardware interfaces. We also have a lot of code 
> for SFP/QSFP ports. 
> 
> There are some other huge files that can also be eliminated by dropping 
> support for Octeon II and earlier. The error handling files are massive for 
> those chips.
> 
> Much of the rest can be shrunk somewhat, but a lot of that code is still 
> required.
> 
> There is a huge amount of code for dealing with our quad-lane modules (QLMs). 
> The QLMs can be configured to run in a variety of modes, from PCIe, SGMII, 
> SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and more. There is a 
> lot of tuning and configuration code needed in order to handle different 
> clocks, equalization, gain, AGC and a whole host of other serdes issues.
> 
> The MAC code is also quite large and complex since there are many coprocessors 
> that must be configured. These chips are designed as network processors. While 
> it makes their networking quite powerful and fast, it also means that a lot of 
> programming is needed before they will work. There are input parser engines, 
> buffer management engines, queueing engines, output engines and more that must 
> be fully configured before any packets can be sent or received.

What I meant was that your customer shouldn't expect to get their custom
code merged upstream as it is, with only some cleanups. Of course a
user/customer can decide to use U-Boot as a system management and
hardware initialisation tool, but that doesn't correspond with U-Boot's
design. I think most people would agree that a proper OS like Linux
should be doing the heavy network initialisation and hardware-offloading
stuff as well as booting all remaining CPU cores. U-Boot's
responsibility should only be to boot that OS on the first CPU ;)

> 
> There is a fair bit of code used to bring additional cores out of reset. In 
> our biggest configuration, there can be two Octeon CN78XX chips connected in 
> tandem where each chip has 48 cores. In this case there is a lot of tuning 
> that needs to happen with the lanes connecting the two chips before this 
> configuration works reliably. There is a tuning process that is required to 
> run on both sides (and the second chip runs a small binary image as well to 
> perform its half of the tuning).
> 
> I do not know if this will change or not but the way the Linux kernel is 
> booted on Octeon is not compatible with the standard boot commands. Part of 
> this is due to the fact that Linux can be run in parallel with Simple 
> Executive applications. It's even possible to run two copies of Linux 
> simultaneously on different cores. To go along with this, there is also a 
> mechanism with named memory blocks that is used. When bringing cores out of reset 
> for SE applications, the TLB entries need to be configured. There also is a 
> fair bit of code dealing with core masks when choosing which cores are used 
> for what.
> 
> We also have a named memory block feature which is used by Linux and simple 
> executive applications where blocks of memory can be carved up. U-Boot needs 
> to tie into this.
> 
> There are also numerous other I/O interfaces that we need to 
> initialize. Unfortunately we also have some errata we need to work around as 
> well and a few are non-trivial.
> 
> The DRAM initialization code is also massive.  It handles DDR3 and DDR4 for 
> both registered and unregistered memory with ECC.
> 
> In many cases, the reason for the size of the code is due to the complexity of 
> the SoC and the platforms built around it. You can think of CN78XX as being 
> more like an enterprise-class server than a simple embedded device. The CN73XX 
> is not too far behind the CN78XX. The only reason our Octeon TX2 U-Boot is so 
> much smaller is that most of the early initialization takes place before 
> U-Boot is started and the fact that a lot of the networking support (such as SFP 
> management and PHY support) is handled by ATF as well as on-chip management 
> cores. This is necessary because Linux does not have any SFP management 
> support 

Last year the PHY framework in Linux was reworked into a phylink
framework which supports hot-plugging and dynamic linking of PHY drivers
with MAC drivers, especially to support SFP modules. An SFP module
driver is there as well. There was a talk at ELCE 2018 about this:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/chevallier-tenart-from-the-ethernet-mac-to-the-link-partner.pdf

> nor can it handle the complex topologies we're frequently running into
> today.  The requirements of Redhat also preclude any additional software being 
> installed in order for the networking support to run.
> 
> One thing I may need to re-introduce to U-Boot is the temperature sensor 
> support for devices like this, since thermal monitoring is important.

This should be easy, as U-Boot already has a thermal uclass within the
driver model.

> 
> Some boards require a background task to perform periodic monitoring for 
> certain events, including the board that needs to be upstreamed. I haven't 
> checked if anything is available now, but what I did in the past was hook into 
> the input function and while waiting for input it calls a user-defined polling 
> function.
> 
> If interrupts are supported it makes the polling job easier.
>>> Some of these phy drivers are extremely complex and need to tie
>>> into the SFP management. We also need to use a background polling thread
>>> while at the command prompt. A fair bit of our phy code is not in the
>>> normal phy drivers because it did not fit the model. Some of these phy
>>> drivers need to interact with the SFP support code in order to handle
>>> hot plug events in order to reconfigure themselves based on the cable
>>> type. The existing SFP code handles everything from SFP to SFP28 as well
>>> as QSFP and 100G QSFP (never tested).
>>>
>>> In the old U-Boot the PHY support had to be significantly enhanced due
>>> to requirements for hot-plugging and how some of the PHYs are
>>> configured. It gets quite complicated with phys like the Inphi where one
>>> phy can handle either four ports (XFI/SGMII) or a single 4-lane port
>>> (XLAUI). It gets even worse since in some boards we use reclocking chips
>>> and there is one chip that handles the receive path of a QSFP and
>>> another that handles the transmit path. Further complicating things,
>>> with a QSFP it can be treated either as XLAUI or as four XFI ports, so
>>> you can have four ports spread across two chips, with each port using
>>> different slices of each chip. In the case of the Inphi/Cortina chip, a
>>> single device can handle one or four ports based on the configuration
>>> and it is configured by "slice" which is basically an offset into the
>>> MDIO register space. We had to jump through hoops in order to have this
>>> stuff work in a sane way in the device tree. We added entries for SFP
>>> and QSFP slots in the device tree which point to the MACs, GPIOs and I2C
>>> bus because pointing them to the phys just got too insane. This will
>>> need to be ported to the new U-Boot. It should not break the existing
>>> support since most of it was implemented outside of the core PHY
>>> handling code. In the port, it would be far better if this could be
>>> integrated in. The SFP management code is architecture agnostic as is
>>> all of the PHY support. The callbacks for the SFP support are used by
>>> the MAC which then notifies the PHY since the MAC often needs to
>>> reconfigure itself. It can handle some crazy configurations.
>>>
>>> While I see some phy drivers that we also support, i.e. Cortina, our
>>> drivers tend to have a lot more functionality. For example, all of our
>>> phy drivers that support firmware support commands for upgrading the
>>> firmware as well as things like cable testing and other features.
>>
>> PHY drivers and ethernet drivers should be really reduced to the
>> required functionality to enable basic networking like Ping, DHCP, TFTP.
>> U-Boot is still "just" a bootloader and not a system management tool ;)
>> You should do that stuff either in Linux or in a downstream fork.
>>
> 
> This is the case for the most part. Unfortunately, many of these drivers 
> require a lot of code and some require frequent monitoring to make 
> adjustments. The SFP support is required to monitor what cable type is plugged 
> in and to reprogram the phy as needed based on the type of cable. The 10G and 
> 25G phys need different settings for optical/active vs passive copper vs SFP 
> connectors. In addition, some require different settings based on the cable 
> length and in some cases exceptions are needed for certain modules (there are 
> a series of Avago SFP to Gigabit modules that require autonegotiation to be 
> disabled in 1000Base-X mode). In at least one case there needs to be frequent 
> polling to make adjustments (25G) as the equalization settings can change 
> based on temperature. The SFP management code identifies the type of cable 
> connected and its parameters so that the phy driver can adjust the appropriate 
> settings. The SFP management code is generic and not tied to any one type of 
> phy or MAC or brand of module. It also monitors all of the GPIO pins and will 
> make callbacks when needed. Many phys lack the support for doing this 
> themselves. Phys I have worked with that need this support include 
> Cortina/Inphi and several Microsemi/Vitesse devices.
> 
> The Inphi devices will typically handle four XFI lanes with four 
> bi-directional slices with each slice given a different register range. Further 
> complicating matters is that a QSFP port can either be four XFI interfaces or 
> a single XLAUI interface. We have code to update the firmware for the Inphi 
> chips, but this is small compared to the rest of the initialization code. 
> These chips require that equalization and gain be configured on each slice 
> based on the board and cable characteristics as well as LED configuration.
> 
> With the Microsemi reclocking chips, each chip has four unidirectional lanes. 
> For a QSFP port, two chips are required with one chip configured for ingress 
> and the other for egress. This can support either XLAUI or four XFI 
> interfaces. When it is configured for XFI there are four XFI interfaces, since 
> now four MACs are shared with two chips with each MAC going to one lane on 
> each chip.
> 
> Also making things fun is that Inphi and the reclocking chips do not conform 
> to the clause 45 standard at all. In the case of Inphi, the ID registers are 
> 0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause 45.
> 
> The MAC drivers are also non-trivial. The Octeon chips are designed as network 
> processors with a lot of hardware offloading and coprocessors. Bringing up a 
> "simple Ethernet" interface is anything but simple. There are numerous offload 
> engines that must be configured before it will work. While we do have one 
> "simple" interface that can be configured, it often isn't because it's usually 
> only good for a management port and many boards do not have this and the 
> customers desire to be able to use any port.
> 
> Just configuring the interface between the MAC and PHY is also non-trivial. 
> The Octeon (and later CPUs) have what are called "QLMs" or quad lane modules. 
> These QLMs contain programmable serdes which can be configured for PCIe, SATA, 
> XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole host of other interface 
> types with a lot of tuning for things like equalization and clocks. The amount 
> of QLM initialization code is quite large but necessary. There are a lot of 
> clock and analog tuning parameters and sequences that must be run.
> 
> Sadly all of this is needed just for basic ping and DHCP. This isn't like a 
> simple e1000 NIC or the NICs common with most SoCs.

As already stated, this heavy networking stuff should be the task of an
OS. I understand why you chose another way, because Linux only recently
got real support for SFP and more hardware-offloading capabilities, but
maybe you should take the chance to update your system design and
submit missing functionality to Linux rather than adding a lot of
network management stuff to U-Boot.

> 
> Think of scaling from a Raspberry Pi to a dual-CPU XEON enterprise-class 
> server with 96 cores and 256GiB of RAM with 10, 25 and 40Gbe ports but without 
> a BCM or MCU to handle low-level board changes while also having many 
> enterprise-class requirements for RAS, etc. That is why our code is so large 
> and complex. There are a lot of hardware engines for offloading a lot of tasks 
> since the chips are often used in security appliances. There are engines for 
> ZIP compression, hardware regex engines, packet ordering engines, packet 
> parsing engines, buffer management engines, RAID engines and a whole host of 
> others. Many are not used in U-Boot, but a fair number are required for basic 
> packet I/O.
> 
> For example, one of the boxes contains a CN78XX with 8 10G ports (where either 
> can also be configured in XLAUI using 4 to 1 using a QSFP to SFP+ splitter 
> cable). It has 128GiB of registered DDR4 DIMMS, 4 SATA drives, redundant power 
> supplies and a whole host of other things including multiple temperature 
> monitors. This uses an Inphi/Cortina phy chip that requires full SFP 
> management support. With Inphi phys, the phy cannot drive LEDs based on 
> traffic since it has no concept of packets, especially in XLAUI mode since 
> each lane is independent of the others.
> 
> Another board, one I specifically have been told to upstream is a NIC that 
> contains a CN73XX and two 10G/25G ports that go through a complex gearbox 
> chip. Since there is no hardware support for LEDs in the Octeon SoC to 
> indicate link and packet I/O this must be done in software (including U-Boot, 
> customer requirement) and SFP port management is also a must. The phy is not 
> at all a traditional phy. It uses i2c instead of MDIO and requires frequent 
> monitoring of the link parameters (it's an older custom gearbox chip, there 
> are newer and better chips that don't require this now). I have a hook while 
> U-Boot is sitting at the prompt which allows for background tasks to operate 
> while it's sitting.
> 
> I have several other NICs to support that use a Microsemi reclocking chip that 
> has four unidirectional lanes per chip. The chip has zero intelligence and is 
> shared between ports (and on some devices, multiple chips are shared between 
> ports). Everything must be tuned based on the SFP/QSFP module type and cable 
> length. LEDs also must be software driven. (The software driving of LEDs is 
> eliminated in OcteonTX2). These chips have no way to drive the LEDs themselves 
> to indicate packet I/O or link status.
> 
> There are also other boards that use the Microsemi reclocking chips. They were 
> chosen in part due to the power budget and these chips are very low power (and 
> inexpensive).
> 
> In all of these phy cases, all of the parameters are maintained in the device 
> tree so the drivers are generic. Unfortunately these drivers also require SFP 
> and QSFP management support.
> 
> I figure if there are several boards I need to upstream, it's not much more 
> effort to port all of the boards to the new U-Boot. I've worked hard to 
> minimize the board-specific code and make as much of it generic and based on 
> the device tree as possible.
> 
> Someday I would love for SFP/QSFP infrastructure to get into Linux. Some NIC 
> cards do it in their drivers, but I'd like to see generic infrastructure (like 
> my U-Boot support). This might make it harder for some drivers to only support 
> certain brands of modules too :) The generic code I wrote works with most 
> modules except Intel (because they have bad checksums, but counterfeit Intel 
> modules work fine!). It still can be expanded at some point since there is no 
> support for module diagnostics other than identifying if it is present. Pretty 
> much all it does is monitor the GPIO pins and parse and decode the EEPROM. The 
> SFP code is generic enough such that any phy driver that needs it can easily 
> hook into it.

As already noted, this is already in Linux:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/sfp.c

> 
>>> Our bootloader needs to be able to be booted from a variety of sources,
>>> including SPI, eMMC, NOR flash and booting over the PCI bus from a host
>>> system. This is one reason we use virtual memory. The other reason is
>>> that it eliminates the need to perform relocation. Our start.S code
>>> handles all of these different cases as well as exception handling.
>>
>> This is already supported for MIPS. You should try to use the generic
>> SPL framework for that. Whether you like the relocation or not, it's one
>> of the basic design principles of U-Boot. I guess it likely won't be
>> accepted if you circumvent this. In fact by now we're sharing the same
>> technology as Linux to have relocatable binaries without using gcc's
>> -fPIC or -mabicalls to reduce the binary footprint. You can configure
>> gd->ram_top to any address of your liking as reference address for the
>> relocation.
>>
> 
> I will look into this. One other complication is the fact that we require both 
> a failsafe as well as a default bootloader. With the older U-Boot we got 
> around all of this by just using TLB entries to map U-Boot to always run in 
> the same virtual address regardless of the physical address. It eliminated any 
> need for -fPIC and helped keep the binary small. For our older bootloader, it 
> always executes at 0xC0000000 regardless of where it sits in physical memory. 
> Using virtual memory also helps keep U-Boot simple and small.
> 
>>> I will also say up front that the memory initialization code is a mess
>>> and quite large (it was written by a hardware engineer who never heard
>>> of functions).
>>>
>>> One thing is that this will break mips unless it is refactored like ARM
>>> is, for example, separating armv7 and armv8. This way we could have
>>> arch/mips/cpu/octeon. I did this with the old bootloader to separate our
>>> stuff. I'm open to suggestions as for the naming. I don't see how we can
>>> share much of the code with the other MIPS CPUs.
>>
>> We have the same mach directory handling as in Linux MIPS. So you could
>> easily add all your platform specific code (except drivers) to
>> arch/mips/mach-octeon or (-cavium). Inside that directory you can have
>> an include directory for your custom header files, you can even override
>> the generic files from arch/mips/include like in Linux. arch/mips/cpu
>> and arch/mips/lib should only contain generic code. As already mentioned
>> you could provide an own start.S inside arch/mips/mach-octeon but if
>> possible you should try to reuse or extend the generic variant.
>>
> 
> We can't use the existing start.S. We have a lot of requirements that are not 
> supported there as well as a fair bit of code dedicated to dealing with the 
> cache and TLBs and bringing additional cores out of reset. We make use of a 
> boot bus movable region in order to do this and handle other cases like NMIs 
> and the watchdog. Our start.S currently sits at around 3800 lines of code. 
> Some is common but most is not.
> 
> Our start.S is designed to be able to boot both a failsafe and non-failsafe 
> image and supports adjusting the flash mapping in order to start from an 
> offset other than zero in the flash. There is also a fair bit of code for 
> copying the image out of flash into the L2 cache for a significant speedup for 
> DRAM initialization. I'm trying to get permission to share our existing code 
> but I'm getting push-back (even though it's GPL!?!). How they want me to 
> upstream it without sharing the code is beyond me.
> 
> While U-Boot has an exception handler, I believe ours is more comprehensive. 
> It is written entirely in assembler and is not dependent on a working C 
> runtime environment. It also dumps more information than just the registers 
> such as the stack and a number of other exception registers and does some 
> exception decoding. It's quite a bit better than the ARMv8 exception handler 
> IMHO.
> 
> Putting this under mach-octeon will make it much easier. I'll try and re-use 
> where I can.
> 
>>> All in all, I think the final port will add between 500K-1M lines of
>>> code for the Octeon CPU. It is much more extensive than what is required
>>> for OcteonTX since in the latter case most of the hardware
>>> initialization is done by earlier stage bootloaders and the ATF handles
>>> things like SFP port management and many of the networking operations.
>>>
>>> I'm not sure how well I'll be able to upstream all of this code at this
>>> point since I was just handed this task. We already have at least 1M
>>> lines of code added to the old U-Boot which is based off of 2013.08 with
>>> a lot of backports.
> 
> I'm trying to get our existing code made available someplace online. I'm 
> getting pushback even though U-Boot is GPL and the license on our SDK is 
> BSD-like (i.e. do whatever you want but don't hold us responsible). It looks like 
> it used to be available but was taken down. I don't understand lawyers. All 
> of the code I wrote is GPL. There is some U-Boot specific code in our SDK, but 
> none was copied from U-Boot. There also is some duplication of functionality 
> between U-Boot and our SDK that I'll try and eliminate.
> 
> I have implemented just about every feature in U-Boot I could with our Octeon 
> SoC. That's another reason it's so large. Some customer always comes back and 
> says they want feature X to work. Fortunately, the changes to the U-Boot 
> supplied code are generally minimal, despite it being so large.
> 
> I likely will need to add some more hooks to board_f.c and board_r.c. I have 
> run into many cases where we need a specific order of initialization that does 
> not match the normal U-Boot order. Perhaps make init_sequence_f and 
> init_sequence_r weak so that they can be overridden if needed by a specific 
> board or architecture. While much of the current init order works,  we need 
> some things initialized as quickly as possible and others initialized later. 
> For example, the first thing we call is an early_errate_workaround function in 
> the init sequence before anything else is called. 
> 

I guess overriding the complete generic board init code is not
acceptable. It was hard work to unify this. A hook like
early_errate_workaround() sounds reasonable but could also be called
from start.S before handing over to board_init_f(). But everything else
should fit into the existing init hooks. There are quite a lot of them.
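
To show what I mean by a hook instead of an overridable sequence, here
is a simplified model. It is not the real board_f.c code: all sketch_*
names are invented, and the weak attribute assumes GCC or Clang. The
generic sequence stays fixed and only the first entry can be replaced
by a platform.

```c
#include <stddef.h>

typedef int (*sketch_init_fn)(void);

/* Counts how many steps ran; only here to make the sketch observable. */
int sketch_calls;

/* Weak default: platforms without an early workaround define nothing. */
__attribute__((weak)) int sketch_early_hook(void)
{
	return 0;
}

/* Two dummy init steps standing in for the generic sequence. */
int sketch_step_ok(void)
{
	sketch_calls++;
	return 0;
}

int sketch_step_fail(void)
{
	sketch_calls++;
	return -1;
}

/* Run a NULL-terminated list in order; stop at the first failure. */
int sketch_run_init_list(const sketch_init_fn *list)
{
	for (; *list; list++) {
		int ret = (*list)();

		if (ret)
			return ret;
	}

	return 0;
}
```

A platform that needs the workaround would provide a strong definition
of the hook and the order of everything after it stays the same for all
boards.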

-- 
- Daniel