[PATCH v1 01/10] mips: octeon: Initial minimal support for the Marvell Octeon SoC
Stefan Roese
sr at denx.de
Thu May 14 11:19:52 CEST 2020
On 14.05.20 01:43, Daniel Schwierzeck wrote:
>
>
> Am 02.05.20 um 10:59 schrieb Stefan Roese:
>> From: Aaron Williams <awilliams at marvell.com>
>>
>> This patch adds very basic support for the Octeon III SoCs. Only
>> CFI parallel NOR flash and UART is supported for now.
>>
>> Please note that the basic Octeon port does not include the DDR3/4
>> initialization yet. This will be added in some follow-up patches
>> later. To still use U-Boot on with this port, the L2 cache (4MiB on
>> Octeon III CN73xx) is used as RAM. This way, U-Boot can boot to the
>> prompt on such boards.
>>
>> Signed-off-by: Aaron Williams <awilliams at marvell.com>
>> Signed-off-by: Stefan Roese <sr at denx.de>
>> ---
>>
>> MAINTAINERS | 6 +
>> arch/Kconfig | 1 +
>> arch/mips/Kconfig | 49 +-
>> arch/mips/Makefile | 7 +
>> arch/mips/cpu/Makefile | 4 +-
>> arch/mips/include/asm/arch-octeon/cavm-reg.h | 42 +
>> arch/mips/include/asm/arch-octeon/clock.h | 24 +
>> arch/mips/mach-octeon/Kconfig | 92 ++
>> arch/mips/mach-octeon/Makefile | 10 +
>> arch/mips/mach-octeon/clock.c | 22 +
>> arch/mips/mach-octeon/cpu.c | 55 +
>> arch/mips/mach-octeon/dram.c | 27 +
>> arch/mips/mach-octeon/include/ioremap.h | 30 +
>> arch/mips/mach-octeon/start.S | 1241 ++++++++++++++++++
>> 14 files changed, 1608 insertions(+), 2 deletions(-)
>> create mode 100644 arch/mips/include/asm/arch-octeon/cavm-reg.h
>> create mode 100644 arch/mips/include/asm/arch-octeon/clock.h
>> create mode 100644 arch/mips/mach-octeon/Kconfig
>> create mode 100644 arch/mips/mach-octeon/Makefile
>> create mode 100644 arch/mips/mach-octeon/clock.c
>> create mode 100644 arch/mips/mach-octeon/cpu.c
>> create mode 100644 arch/mips/mach-octeon/dram.c
>> create mode 100644 arch/mips/mach-octeon/include/ioremap.h
>> create mode 100644 arch/mips/mach-octeon/start.S
>>
>
> I couldn't completely understand the start.S. There is too much stuff in
> it for an initial merge. But I don't see a hard reason against using the
> generic start.S. So the first patch series should only implement the
> bare minimum needed to boot from flash, init the boot CPU core, maybe
> suspend all other cores and relocate to L2 cache.
I already worked on using the common start.S with minimal custom
additions for Octeon. This will be included in v2 of the base Octeon
patchset.
> I know the current start.S is not really suited yet but I'm working on a
> refactoring to add some more hooks which a SoC/CPU can implement. Once
> we have your initial patch series and the refactoring in mainline, it
> should be possible to gradually add more Octeon stuff like memory init.
>
> Basic idea for refactoring is something like this:
>
> reset:
> - mips_cpu_early_init() # custom early init, fix errata
> - init CP0 registers, Watch registers
> - mips_cache_disable() # set K0 CCA to uncached
> - mips_cpu_core_init() # per CPU core init
> # -> generic code issues wait instr.
> # -> custom code can do custom init
> # or custom boot protocols
> - mips_cm_map() # init CM if available
> - mips_cache_init() # init caches, set K0 CCA to non-coh.
> - mips_sram_init() # init SRAM, Scratch RAM if avail
> - setup initial stack and global_data
> - debug_uart_init()
> - mips_mem_init() # init external memory, C env avail.
> - init malloc_f
> - board_init_f()
Thanks Daniel, this sounds like a very good approach. I'll send v2 later
today (as its already finished). We can then work on how to integrate
it, either by using the currently available functions like
mips_sram_init(), or by extending start.S (and the Octeon custom code)
with some other, newly introduced functions.
Thanks,
Stefan
>> +
>> +#endif /* __ASM_MACH_OCTEON_IOREMAP_H */
>> diff --git a/arch/mips/mach-octeon/start.S b/arch/mips/mach-octeon/start.S
>> new file mode 100644
>> index 0000000000..acb967201a
>> --- /dev/null
>> +++ b/arch/mips/mach-octeon/start.S
>> @@ -0,0 +1,1241 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
>> +/*
>> + * Startup Code for OCTEON 64-bit CPU-core
>> + *
>> + * Copyright (c) 2003 Wolfgang Denk <wd at denx.de>
>> + * Copyright 2004, 2005, 2010 - 2015 Cavium Inc..
>> + */
>> +
>> +#include <asm-offsets.h>
>> +#include <config.h>
>> +#include <asm/regdef.h>
>> +#include <asm/mipsregs.h>
>> +#include <asm/asm.h>
>> +
>> +#define BOOT_VECTOR_NUM_WORDS 8
>> +
>> +#define OCTEON_BOOT_MOVEABLE_MAGIC_OFFSET 0x70
>> +#define OCTEON_BOOT_VECTOR_MOVEABLE_OFFSET 0x78
>> +
>> +#define OCTEON_BOOT_MOVEABLE_MAGIC1_RAW 0xdb00110ad358eacd
>> +#define OCTEON_BOOT_MOVEABLE_MAGIC1 OCTEON_BOOT_MOVEABLE_MAGIC1_RAW
>> +
>> +#define OCTEON_CIU_SOFT_RST 0x8001070000000740
>> +
>> +#define OCTEON_L2C_WPAR_PP0 0x8001180080840000
>> +#define OCTEON_MIO_BOOT_BASE 0x8001180000000000
>> +#define OCTEON_MIO_BOOT_REG_CFG0_OFF 0x0000
>> +#define OCTEON_MIO_BOOT_LOC_CFG0_OFF 0x0080
>> +#define OCTEON_MIO_BOOT_LOC_ADR_OFF 0x0090
>> +#define OCTEON_MIO_BOOT_LOC_DAT_OFF 0x0098
>> +#define OCTEON_MIO_RST_BOOT 0x8001180000001600
>> +#define OCTEON_MIO_BOOT_REG_CFG0 0x8001180000000000
>> +#define OCTEON_MIO_BOOT_REG_TIM0 0x8001180000000040
>> +#define OCTEON_MIO_BOOT_LOC_CFG0 0x8001180000000080
>> +#define OCTEON_MIO_BOOT_LOC_ADR 0x8001180000000090
>> +#define OCTEON_MIO_BOOT_LOC_DAT 0x8001180000000098
>> +#define OCTEON_MIO_FUSE_DAT3 0x8001180000001418
>> +#define OCTEON_L2D_FUS3 0x80011800800007B8
>> +#define OCTEON_LMC0_DDR_PLL_CTL 0x8001180088000258
>> +
>> +#define OCTEON_RST 0x8001180006000000
>> +#define OCTEON_RST_BOOT_OFFSET 0x1600
>> +#define OCTEON_RST_SOFT_RST_OFFSET 0x1680
>> +#define OCTEON_RST_COLD_DATAX_OFFSET(X) (0x17C0 + (X) * 8)
>> +#define OCTEON_RST_BOOT 0x8001180006001600
>> +#define OCTEON_RST_SOFT_RST 0x8001180006001680
>> +#define OCTEON_RST_COLD_DATAX(X) (0x80011800060017C0 + (X) * 8)
>> +
>> +#define OCTEON_OCX_COM_NODE 0x8001180011000000
>> +#define OCTEON_L2C_OCI_CTL 0x8001180080800020
>> +#define OCTEON_L2C_TAD_CTL 0x8001180080800018
>> +#define OCTEON_L2C_CTL 0x8001180080800000
>> +
>> +#define OCTEON_DBG_DATA 0x80011F00000001E8
>> +#define OCTEON_PCI_READ_CMD_E 0x80011F0000001188
>> +#define OCTEON_NPEI_DBG_DATA 0x80011F0000008510
>> +#define OCTEON_CIU_WDOG(X) (0x8001070000000500 + (X) * 8)
>> +#define OCTEON_CIU_PP_POKE(X) (0x8001070000000580 + (X) * 8)
>> +#define OCTEON_CIU3_WDOG(X) (0x8001010000020000 + (X) * 8)
>> +#define OCTEON_CIU3_PP_POKE(X) (0x8001010000030000 + (X) * 8)
>> +#define OCTEON_OCX_COM_LINKX_CTL(X) (0x8001180011000020 + (X) * 8)
>> +#define OCTEON_SLI_CTL_STATUS 0x80011F0000028570
>> +#define OCTEON_GSERX_SCRATCH(X) (0x8001180090000020 + (X) * 0x1000000)
>> +
>> +/** PRID for CN56XX */
>> +#define OCTEON_PRID_CN56XX 0x04
>> +/** PRID for CN52XX */
>> +#define OCTEON_PRID_CN52XX 0x07
>> +/** PRID for CN63XX */
>> +#define OCTEON_PRID_CN63XX 0x90
>> +/** PRID for CN68XX */
>> +#define OCTEON_PRID_CN68XX 0x91
>> +/** PRID for CN66XX */
>> +#define OCTEON_PRID_CN66XX 0x92
>> +/** PRID for CN61XX */
>> +#define OCTEON_PRID_CN61XX 0x93
>> +/** PRID for CNF71XX */
>> +#define OCTEON_PRID_CNF71XX 0x94
>> +/** PRID for CN78XX */
>> +#define OCTEON_PRID_CN78XX 0x95
>> +/** PRID for CN70XX */
>> +#define OCTEON_PRID_CN70XX 0x96
>> +/** PRID for CN73XX */
>> +#define OCTEON_PRID_CN73XX 0x97
>> +/** PRID for CNF75XX */
>> +#define OCTEON_PRID_CNF75XX 0x98
>> +
>> +/* func argument is used to create a mark, must be unique */
>> +#define GETOFFSET(reg, func) \
>> + .balign 8; \
>> + bal func ##_mark; \
>> + nop; \
>> + .dword .; \
>> +func ##_mark: \
>> + ld reg, 0(ra); \
>> + dsubu reg, ra, reg;
>> +
>> +#define JAL(func) \
>> + .balign 8; \
>> + bal func ##_mark; \
>> + nop; \
>> + .dword .; \
>> +func ##_mark: \
>> + ld t8, 0(ra); \
>> + dsubu t8, ra, t8; \
>> + dla t9, func; \
>> + daddu t9, t9, t8; \
>> + jalr t9; \
>> + nop;
>> +
>> + .set arch=octeon3
>> + .set noreorder
>> +
>> + .macro uhi_mips_exception
>> + move k0, t9 # preserve t9 in k0
>> + move k1, a0 # preserve a0 in k1
>> + li t9, 15 # UHI exception operation
>> + li a0, 0 # Use hard register context
>> + sdbbp 1 # Invoke UHI operation
>> + .endm
>> +
>> + .macro setup_stack_gd
>> + li t0, -16
>> + PTR_LI t1, big_stack_start
>> + and sp, t1, t0 # force 16 byte alignment
>> + PTR_SUBU \
>> + sp, sp, GD_SIZE # reserve space for gd
>> + and sp, sp, t0 # force 16 byte alignment
>> + move k0, sp # save gd pointer
>> +#if CONFIG_VAL(SYS_MALLOC_F_LEN) && \
>> + !CONFIG_IS_ENABLED(INIT_STACK_WITHOUT_MALLOC_F)
>> + li t2, CONFIG_VAL(SYS_MALLOC_F_LEN)
>> + PTR_SUBU \
>> + sp, sp, t2 # reserve space for early malloc
>> + and sp, sp, t0 # force 16 byte alignment
>> +#endif
>> + move fp, sp
>> +
>> + /* Clear gd */
>> + move t0, k0
>> +1:
>> + PTR_S zero, 0(t0)
>> + PTR_ADDIU t0, PTRSIZE
>> + blt t0, t1, 1b
>> + nop
>> +
>> +#if CONFIG_VAL(SYS_MALLOC_F_LEN) && \
>> + !CONFIG_IS_ENABLED(INIT_STACK_WITHOUT_MALLOC_F)
>> + PTR_S sp, GD_MALLOC_BASE(k0) # gd->malloc_base offset
>> +#endif
>> + .endm
>> +
>> +/* Saved register usage:
>> + * s0: not used
>> + * s1: not used
>> + * s2: Address U-Boot loaded into in L2 cache
>> + * s3: Start address
>> + * s4: flags
>> + * 1: booting from RAM
>> + * 2: executing out of cache
>> + * 4: booting from flash
>> + * s5: u-boot size (data end - _start)
>> + * s6: offset in flash.
>> + * s7: _start physical address
>> + * s8:
>> + */
>> +
>> +ENTRY(_start)
>> + /* U-Boot entry point */
>> + b reset
>> +
>> + /* The above jump instruction/nop are considered part of the
>> + * bootloader_header_t structure but are not changed when the header is
>> + * updated.
>> + */
>> +
>> + /* Leave room for bootloader_header_t header at start of binary. This
>> + * header is used to identify the board the bootloader is for, what
>> + * address it is linked at, failsafe/normal, etc. It also contains a
>> + * CRC of the entire image.
>> + */
>> +
>> +#if defined(CONFIG_ROM_EXCEPTION_VECTORS)
>> + /*
>> + * Exception vector entry points. When running from ROM, an exception
>> + * cannot be handled. Halt execution and transfer control to debugger,
>> + * if one is attached.
>> + */
>> + .org 0x200
>> + /* TLB refill, 32 bit task */
>> + uhi_mips_exception
>> +
>> + .org 0x280
>> + /* XTLB refill, 64 bit task */
>> + uhi_mips_exception
>> +
>> + .org 0x300
>> + /* Cache error exception */
>> + uhi_mips_exception
>> +
>> + .org 0x380
>> + /* General exception */
>> + uhi_mips_exception
>> +
>> + .org 0x400
>> + /* Catch interrupt exceptions */
>> + uhi_mips_exception
>> +
>> + .org 0x480
>> + /* EJTAG debug exception */
>> +1: b 1b
>> + nop
>> +
>> + .org 0x500
>> +#endif
>> +
>> +/* Reserve extra space so that when we use the boot bus local memory
>> + * segment to remap the debug exception vector we don't overwrite
>> + * anything useful
>> + */
>> +
>> +/* Basic exception handler (dump registers) in all ASM. When using the TLB for
>> + * mapping u-boot C code, we can't branch to that C code for exception handling
>> + * (TLB is disabled for some exceptions.
>> + */
>> +
>> +/* RESET/start here */
>> + .balign 8
>> +reset:
>> + nop
>> + synci 0(zero)
>> + mfc0 k0, CP0_STATUS
>> + ori k0, 0x00E0 /* enable 64 bit mode for CSR access */
>> + mtc0 k0, CP0_STATUS
>> +
>> + /* Save the address we're booting from, strip off low bits */
>> + bal 1f
>> + nop
>> +1:
>> + move s3, ra
>> + dins s3, zero, 0, 12
>> +
>> + /* Disable boot bus moveable regions */
>> + PTR_LI k0, OCTEON_MIO_BOOT_LOC_CFG0
>> + sd zero, 0(k0)
>> + sd zero, 8(k0)
>> +
>> + /* Disable the watchdog timer
>> + * First we check if we're running on CN78XX, CN73XX or CNF75XX to see
>> + * if we use CIU3 or CIU.
>> + */
>> + mfc0 t0, CP0_PRID
>> + ext t0, t0, 8, 8
>> + /* Assume CIU */
>> + PTR_LI t1, OCTEON_CIU_WDOG(0)
>> + PTR_LI t2, OCTEON_CIU_PP_POKE(0)
>> + blt t0, OCTEON_PRID_CN78XX, wd_use_ciu
>> + nop
>> + beq t0, OCTEON_PRID_CN70XX, wd_use_ciu
>> + nop
>> + /* Use CIU3 */
>> + PTR_LI t1, OCTEON_CIU3_WDOG(0)
>> + PTR_LI t2, OCTEON_CIU3_PP_POKE(0)
>> +wd_use_ciu:
>> + sd zero, 0(t2) /* Pet the dog */
>> + sd zero, 0(t1) /* Disable watchdog timer */
>> +
>> + /* Errata: CN76XX has a node ID of 3. change it to zero here.
>> + * This needs to be done before we relocate to L2 as addresses change
>> + * For 76XX pass 1.X we need to zero out the OCX_COM_NODE[ID],
>> + * L2C_OCI_CTL[GKSEGNODE] and CP0 of Root.CvmMemCtl2[KSEGNODE].
>> + */
>> + mfc0 a4, CP0_PRID
>> + /* Check for 78xx pass 1.x processor ID */
>> + andi a4, 0xffff
>> + blt a4, (OCTEON_PRID_CN78XX << 8), 1f
>> + nop
>> +
>> + /* Zero out alternate package for now */
>> + dins a4, zero, 6, 1
>> + bge a4, ((OCTEON_PRID_CN78XX << 8) | 0x08), 1f
>> + nop
>> +
>> + /* 78xx or 76xx here, first check for bug #27141 */
>> + PTR_LI a5, OCTEON_SLI_CTL_STATUS
>> + ld a6, 0(a5)
>> + andi a7, a4, 0xff
>> + andi a6, a6, 0xff
>> +
>> + beq a6, a7, not_bug27141
>> + nop
>> +
>> + /* core 0 proc_id rev_id field does not match SLI_CTL_STATUS rev_id */
>> + /* We just hit bug #27141. Need to reset the chip and try again */
>> +
>> + PTR_LI a4, OCTEON_RST_SOFT_RST
>> + ori a5, zero, 0x1 /* set the reset bit */
>> +
>> +reset_78xx_27141:
>> + sync
>> + synci 0(zero)
>> + cache 9, 0(zero)
>> + sd a5, 0(a4)
>> + wait
>> + b reset_78xx_27141
>> + nop
>> +
>> +not_bug27141:
>> + /* 76XX pass 1.x has the node number set to 3 */
>> + mfc0 a4, CP0_EBASE
>> + ext a4, a4, 0, 10
>> + bne a4, 0x180, 1f /* Branch if not node 3 core 0 */
>> + nop
>> +
>> + /* Clear OCX_COM_NODE[ID] */
>> + PTR_LI a5, OCTEON_OCX_COM_NODE
>> + ld a4, 0(a5)
>> + dins a4, zero, 0, 2
>> + sd a4, 0(a5)
>> + ld zero, 0(a5)
>> +
>> + /* Clear L2C_OCI_CTL[GKSEGNODE] */
>> + PTR_LI a5, OCTEON_L2C_OCI_CTL
>> + ld a4, 0(a5)
>> + dins a4, zero, 4, 2
>> + sd a4, 0(a5)
>> + ld zero, 0(a5)
>> +
>> + /* Clear CP0 Root.CvmMemCtl2[KSEGNODE] */
>> + dmfc0 a4, CP0_CVMMEMCTL2
>> + dins a4, zero, 12, 2
>> + dmtc0 a4, CP0_CVMMEMCTL2
>> +
>> + /* Put the flash address in the start of the EBASE register to
>> + * enable our exception handler but only for core 0.
>> + */
>> + mfc0 a4, CP0_EBASE
>> + dext a4, a4, 0, 10
>> + bnez a4, no_flash
>> + /* OK in delay slot */
>> + dext a6, a6, 0, 16 /* Get the base address in flash */
>> + sll a6, a6, 16
>> + mtc0 a6, CP0_EBASE /* Enable exceptions */
>> +
>> +no_flash:
>> + /* Zero out various registers */
>> + mtc0 zero, CP0_DEPC
>> + mtc0 zero, CP0_EPC
>> + mtc0 zero, CP0_CAUSE
>> + mfc0 a4, CP0_PRID
>> + ext a4, a4, 8, 8
>> + mtc0 zero, CP0_DESAVE
>> +
>> + /* The following are only available on Octeon 2 or later */
>> + mtc0 zero, CP0_KSCRATCH1
>> + mtc0 zero, CP0_KSCRATCH2
>> + mtc0 zero, CP0_KSCRATCH3
>> + mtc0 zero, CP0_USERLOCAL
>> +
>> + /* Turn off ROMEN bit to disable ROM */
>> + PTR_LI a1, OCTEON_MIO_RST_BOOT
>> + /* For OCTEON 3 we use RST_BOOT instead of MIO_RST_BOOT.
>> + * The difference is bits 24-26 are 6 instead of 0 for the address.
>> + */
>> + /* For Octeon 2 and CN70XX we can ignore the watchdog */
>> + blt a4, OCTEON_PRID_CN78XX, watchdog_ok
>> + nop
>> +
>> + PTR_LI a1, OCTEON_RST_BOOT
>> +
>> + beq a4, OCTEON_PRID_CN70XX, watchdog_ok
>> + nop
>> +
>> + ld a2, 0(a1)
>> + /* There is a bug where some registers don't get properly reset when
>> + * the watchdog timer causes a reset. In this case we need to force
>> + * a reset.
>> + */
>> + bbit0 a2, 11, watchdog_ok /* Skip if watchdog not hit */
>> + dins a2, zero, 2, 18 /* Don't clear LBOOT, LBOOT_EXT or LBOOT_OCI */
>> + /* Clear bit indicating reset due to watchdog */
>> + ori a2, 1 << 11
>> + sd a2, 0(a1)
>> +
>> + /* Disable watchdog */
>> + PTR_LI a1, OCTEON_CIU3_PP_POKE(0)
>> + sd zero, 0(a1)
>> + PTR_LI a1, OCTEON_CIU3_WDOG(0)
>> + sd zero, 0(a1)
>> +
>> + /* Record this in the GSER0_SCRATCH register in bit 11 */
>> + PTR_LI a1, OCTEON_GSERX_SCRATCH(0)
>> + ld a2, 0(a1)
>> + ori a2, 1 << 11
>> + sd a2, 0(a1)
>> +
>> + PTR_LI a1, OCTEON_RST_SOFT_RST
>> + li a2, 1
>> + sd a2, 0(a1)
>> + wait
>> +
>> + /* We should never get here */
>> +
>> +watchdog_ok:
>> + ld a2, 0(a1)
>> + /* Don't clear LBOOT/LBOOT_EXT or LBOOT_OCI */
>> + dins a2, zero, 2, 18
>> + dins a2, zero, 60, 1 /* Clear ROMEN bit */
>> + sd a2, 0(a1)
>> +
>> + /* Start of Octeon setup */
>> +
>> + /* Check what core we are - if core 0, branch to init tlb
>> + * loop in flash. Otherwise, look up address of init tlb
>> + * loop that was saved in the boot vector block.
>> + */
>> + mfc0 a0, CP0_EBASE
>> + andi a0, EBASE_CPUNUM /* get core */
>> + beqz a0, InitTLBStart_local
>> + nop
>> +
>> + break
>> + /* We should never get here - non-zero cores now go directly to
>> + * tlb init from the boot stub in movable region.
>> + */
>> +
>> + .globl InitTLBStart
>> +InitTLBStart:
>> +InitTLBStart_local:
>> + /* If we don't have working memory yet configure a bunch of
>> + * scratch memory, and set the stack pointer to the top
>> + * of it. This allows us to go to C code without having
>> + * memory set up
>> + *
>> + * Warning: do not change SCRATCH_STACK_LINES as this can impact the
>> + * transition from start.S to crti.asm. crti requires 590 bytes of
>> + * stack space.
>> + */
>> + cache 1,0(zero) /* Clear Dcache so cvmseg works right */
>> +#if CONFIG_OCTEON_BIG_STACK_SIZE
>> + rdhwr v0, $0
>> + bnez v0, 1f
>> + nop
>> + PTR_LA sp, big_stack_start - 16
>> + b stack_clear_done
>> + nop
>> +1:
>> +#endif
>> +#define SCRATCH_STACK_LINES 0x36 /* MAX is 0x36 */
>> + dmfc0 v0, CP0_CVMMEMCTL
>> + dins v0, zero, 0, 9
>> + /* setup SCRATCH_STACK_LINES scratch lines of scratch */
>> + ori v0, 0x100 | SCRATCH_STACK_LINES
>> + dmtc0 v0, CP0_CVMMEMCTL
>> + /* set stack to top of scratch memory */
>> + li sp, 0xffffffffffff8000 + (SCRATCH_STACK_LINES * 128)
>> + /* Clear scratch for CN63XX pass 2.0 errata Core-15169*/
>> + li t0, 0xffffffffffff8000
>> +clear_scratch:
>> + sd zero, 0(t0)
>> + addiu t0, 8
>> + bne t0, sp, clear_scratch
>> + nop
>> +
>> + /* This code run on all cores - core 0 from flash,
>> + * the rest from DRAM. When booting from PCI, non-zero cores
>> + * come directly here from the boot vector - no earlier code in this
>> + * file is executed.
>> + */
>> +
>> + /* Some generic initialization is done here as well, as we need this
>> + * done on all cores even when booting from PCI
>> + */
>> +stack_clear_done:
>> + /* Clear watch registers. */
>> + mtc0 zero, CP0_WATCHLO
>> + mtc0 zero, CP0_WATCHHI
>> +
>> + /* STATUS register */
>> + mfc0 k0, CP0_STATUS
>> + li k1, ~ST0_IE
>> + and k0, k1
>> + mtc0 k0, CP0_STATUS
>> +
>> + /* CAUSE register */
>> + mtc0 zero, CP0_CAUSE
>> +
>> + /* Init Timer */
>> + dmtc0 zero, CP0_COUNT
>> + dmtc0 zero, CP0_COMPARE
>> +
>> +
>> + mfc0 a5, CP0_STATUS
>> + li v0, 0xE0 /* enable 64 bit mode for CSR access */
>> + or v0, v0, a5
>> + mtc0 v0, CP0_STATUS
>> +
>> +
>> + dli v0, 1 << 29 /* Enable large physical address support in TLB */
>> + mtc0 v0, CP0_PAGEGRAIN
>> +
>> +InitTLB:
>> + dmtc0 zero, CP0_ENTRYLO0
>> + dmtc0 zero, CP0_ENTRYLO1
>> + mtc0 zero, CP0_PAGEMASK
>> + dmtc0 zero, CP0_CONTEXT
>> + /* Use an offset into kseg0 so we won't conflict with Mips1 legacy
>> + * TLB clearing
>> + */
>> + PTR_LI v0, 0xFFFFFFFF90000000
>> + mfc0 a0, CP0_CONFIG1
>> + srl a0, a0, 25
>> + /* Check if config4 reg present */
>> + mfc0 a1, CP0_CONFIG3
>> + bbit0 a1, 31, 2f
>> + and a0, a0, 0x3F /* a0 now has the max mmu entry index */
>> + mfc0 a1, CP0_CONFIG4
>> + bbit0 a1, 14, 2f /* check config4[MMUExtDef] */
>> + nop
>> + /* append config4[MMUSizeExt] to most significant bit of
>> + * config1[MMUSize-1]
>> + */
>> + ins a0, a1, 6, 8
>> + and a0, a0, 0x3fff /* a0 now includes max entries for cn6xxx */
>> +2:
>> + dmtc0 zero, CP0_XCONTEXT
>> + mtc0 zero, CP0_WIRED
>> +
>> +InitTLBloop:
>> + dmtc0 v0, CP0_ENTRYHI
>> + tlbp
>> + mfc0 v1, CP0_INDEX
>> + daddiu v0, v0, 1<<13
>> + bgez v1, InitTLBloop
>> +
>> + mtc0 a0, CP0_INDEX
>> + tlbwi
>> + bnez a0, InitTLBloop
>> + daddiu a0, -1
>> +
>> + mthi zero
>> + mtlo zero
>> +
>> + /* Set up status register */
>> + mfc0 v0, CP0_STATUS
>> + /* Enable COP0 and COP2 access */
>> + li a4, (1 << 28) | (1 << 30)
>> + or v0, a4
>> +
>> + /* Must leave BEV set here, as DRAM is not configured for core 0.
>> + * Also, BEV must be 1 later on when the exception base address is set.
>> + */
>> +
>> + /* Mask all interrupts */
>> + ins v0, zero, 0, 16
>> + /* Clear NMI (used to start cores other than core 0) */
>> + ori v0, 0xE4 /* enable 64 bit, disable interrupts */
>> + mtc0 v0, CP0_STATUS
>> +
>> + dli v0,0xE000000F /* enable all readhw locations */
>> + mtc0 v0, CP0_HWRENA
>> +
>> + dmfc0 v0, CP0_CVMCTL
>> + ori v0, 1<<14 /* enable fixup of unaligned mem access */
>> + dmtc0 v0, CP0_CVMCTL
>> +
>> + /* Setup scratch memory. This is also done in
>> + * cvmx_user_app_init, and this code will be removed
>> + * from the bootloader in the near future.
>> + */
>> +
>> + /* Set L2C_LAD_CTL[MAXLFB] = 0 on CN73XX */
>> + mfc0 a4, CP0_PRID
>> + ext a4, a4, 8, 8
>> + blt a4, OCTEON_PRID_CN73XX, 72f
>> + nop
>> + PTR_LI v0, OCTEON_L2C_TAD_CTL
>> + ld t1, 0(v0)
>> + dins t1, zero, 0, 4
>> + sd t1, 0(v0)
>> + ld zero, 0(v0)
>> +
>> +72:
>> +
>> + /* clear these to avoid immediate interrupt in noperf mode */
>> + dmtc0 zero, CP0_COMPARE /* clear timer interrupt */
>> + dmtc0 zero, CP0_COUNT /* clear timer interrupt */
>> + dmtc0 zero, CP0_PERF_CNT0 /* clear perfCnt0 */
>> + dmtc0 zero, CP0_PERF_CNT1 /* clear perfCnt1 */
>> + dmtc0 zero, CP0_PERF_CNT2
>> + dmtc0 zero, CP0_PERF_CNT3
>> +
>> + /* If we're running on a node other than 0 then we need to set KSEGNODE
>> + * to 0. The nice thing with this code is that it also autodetects if
>> + * we're running on a processor that supports CVMMEMCTL2 or not since
>> + * only processors that have this will have a non-zero node ID. Because
>> + * of this there's no need to check if we're running on a 78XX.
>> + */
>> + mfc0 t1, CP0_EBASE
>> + dext t1, t1, 7, 3 /* Extract node number */
>> + beqz t1, is_node0 /* If non-zero then we're not node 0 */
>> + nop
>> + dmfc0 t1, CP0_CVMMEMCTL2
>> + dins t1, zero, 12, 4
>> + dmtc0 t1, CP0_CVMMEMCTL2
>> +is_node0:
>> +
>> + /* Set up TLB mappings for u-boot code in flash. */
>> +
>> + /* Use a bal to get the current PC into ra. Since this bal is to
>> + * the address immediately following the delay slot, the ra is
>> + * the address of the label. We then use this to get the actual
>> + * address that we are executing from.
>> + */
>> + bal __dummy
>> + nop
>> +
>> +__dummy:
>> + /* Get the actual address that we are running at */
>> + PTR_LA a6, _start /* Linked address of _start */
>> + PTR_LA a7, __dummy
>> + dsubu t0, a7, a6 /* offset of __dummy label from _start*/
>> + dsubu a7, ra, t0 /* a7 now has actual address of _start*/
>> +
>> + /* Save actual _start address in s7. This is where we
>> + * are executing from, as opposed to where the code is
>> + * linked.
>> + */
>> + move s7, a7
>> + move s4, zero
>> +
>> + /* s7 has actual address of _start. If this is
>> + * on the boot bus, it will be between 0xBFC000000 and 0xBFFFFFFF.
>> + * If it is on the boot bus, use 0xBFC00000 as the physical address
>> + * for the TLB mapping, as we will be adjusting the boot bus
>> + * to make this adjustment.
>> + * If we are running from DRAM (remote-boot), then we want to use the
>> + * real address in DRAM.
>> + */
>> +
>> + /* Check to see if we are running from flash - we expect that to
>> + * be 0xffffffffb0000000-0xffffffffbfffffff
>> + * (0x10000000-0x1fffffff, unmapped/uncached)
>> + */
>> + dli t2, 0xffffffffb0000000
>> + dsubu t2, s7
>> + slt s4, s7, t2
>> + bltz t2, uboot_in_flash
>> + nop
>> +
>> + /* If we're not core 0 then we don't care about cache */
>> + mfc0 t2, CP0_EBASE
>> + andi t2, EBASE_CPUNUM
>> + bnez t2, uboot_in_ram
>> + nop
>> +
>> + /* Find out if we're OCTEON I or OCTEON + which don't support running
>> + * out of cache.
>> + */
>> + mfc0 t2, CP0_PRID
>> + ext t2, t2, 8, 8
>> + li s4, 1
>> + blt t2, 0x90, uboot_in_ram
>> + nop
>> +
>> + /* U-Boot can be executing either in RAM or L2 cache. Now we need to
>> + * check if DRAM is initialized. The way we do that is to look at
>> + * the reset bit of the LMC0_DDR_PLL_CTL register (bit 7)
>> + */
>> + PTR_LI t2, OCTEON_LMC0_DDR_PLL_CTL
>> + ld t2, 0(t2)
>> + bbit1 t2, 7, uboot_in_ram
>> + nop
>> +
>> + /* We must be executing out of cache */
>> + b uboot_in_ram
>> + li s4, 2
>> +
>> +uboot_in_flash:
>> + /* Set s4 to 4 to indicate we're running in FLASH */
>> + li s4, 4
>> +
>> +#if defined(CONFIG_OCTEON_DISABLE_L2_CACHE_INDEX_ALIASING)
>> + /* By default, L2C index aliasing is enabled. In some cases it may
>> + * need to be disabled. The L2C index aliasing can only be disabled
>> + * if U-Boot is running out of L2 cache and the L2 cache has not been
>> + * used to store anything.
>> + */
>> + PTR_LI t1, OCTEON_L2C_CTL
>> + ld t2, 0(t1)
>> + ori t2, 1
>> + sd t2, 0(t1)
>> +#endif
>> +
>> + /* Use BFC00000 as physical address for TLB mappings when booting
>> + * from flash, as we will adjust the boot bus mappings to make this
>> + * mapping correct.
>> + */
>> + dli a7, 0xFFFFFFFFBFC00000
>> + dsubu s6, s7, a7 /* Save flash offset in s6 */
>> +
>> +#if defined(CONFIG_OCTEON_COPY_FROM_FLASH_TO_L2)
>> + /* For OCTEON II we check to see if the L2 cache is big enough to hold
>> + * U-Boot. If it is big enough then we copy ourself from flash to the
>> + * L2 cache in order to speed up execution.
>> + */
>> +
>> + /* Check for OCTEON 2 */
>> + mfc0 t1, CP0_PRID
>> + ext t1, t1, 8, 8
>> + /* Get number of L2 cache sets */
>> + beq t1, OCTEON_PRID_CNF71XX, got_l2_sets /* CNF71XX */
>> + li t2, 1 << 9
>> + beq t1, OCTEON_PRID_CN78XX, got_l2_sets /* CN78XX */
>> + li t2, 1 << 13
>> + beq t1, OCTEON_PRID_CN70XX, got_l2_sets /* CN70XX */
>> + li t2, 1 << 10
>> + beq t1, OCTEON_PRID_CN73XX, got_l2_sets /* CN73XX */
>> + li t2, 1 << 11
>> + beq t1, OCTEON_PRID_CNF75XX, got_l2_sets /* CNF75XX */
>> + li t2, 1 << 11
>> + b l2_cache_too_small /* Unknown OCTEON model */
>> + nop
>> +
>> +got_l2_sets:
>> + /* Get number of associations */
>> + PTR_LI t0, OCTEON_MIO_FUSE_DAT3
>> + ld t0, 0(t0)
>> + dext t0, t0, 32, 3
>> +
>> + beq t1, OCTEON_PRID_CN70XX, process_70xx_l2sets
>> + nop
>> + /* 0 = 16-way, 1 = 12-way, 2 = 8-way, 3 = 4-way, 4-7 reserved */
>> + beqz t0, got_l2_ways
>> + li t3, 16
>> + beq t0, 1, got_l2_ways
>> + li t3, 12
>> + beq t0, 2, got_l2_ways
>> + li t3, 8
>> + beq t0, 3, got_l2_ways
>> + li t3, 4
>> + b l2_cache_too_small
>> + nop
>> +
>> +process_70xx_l2sets:
>> + /* For 70XX, the number of ways is defined as:
>> + * 0 - full cache (4-way) 512K
>> + * 1 - 3/4 ways (3-way) 384K
>> + * 2 - 1/2 ways (2-way) 256K
>> + * 3 - 1/4 ways (1-way) 128K
>> + * 4-7 illegal (aliased to 0-3)
>> + */
>> + andi t0, 3
>> + beqz t0, got_l2_ways
>> + li t3, 4
>> + beq t0, 1, got_l2_ways
>> + li t3, 3
>> + beq t0, 2, got_l2_ways
>> + li t3, 2
>> + li t3, 1
>> +
>> +got_l2_ways:
>> + dmul a1, t2, t3 /* Calculate cache size */
>> + dsll a1, 7 /* Ways * Sets * cache line sz (128) */
>> + daddiu a1, a1, -128 /* Adjust cache size for copy code */
>> +
>> + /* Calculate size of U-Boot image */
>> + /*
>> + * "uboot_end - _start" is not correct, as the image also
>> + * includes the DTB appended to the end (OF_EMBED is deprecated).
>> + * Lets use a defined max for now here.
>> + */
>> + PTR_LI s5, CONFIG_BOARD_SIZE_LIMIT
>> +
>> + daddu t2, s5, s7 /* t2 = end address */
>> + daddiu t2, t2, 127
>> + ins t2, zero, 0, 7 /* Round up to cache line for memcpy */
>> +
>> + slt t1, a1, s5 /* See if we're bigger than the L2 cache */
>> + bnez t1, l2_cache_too_small
>> + nop
>> + /* Address we plan to load at in the L2 cache */
>> + PTR_LI t9, CONFIG_OCTEON_L2_UBOOT_ADDR
>> +# ifdef CONFIG_OCTEON_L2_MEMCPY_IN_CACHE
>> + /* Enable all ways for PP0. Authentik ROM may have disabled these */
>> + PTR_LI a1, OCTEON_L2C_WPAR_PP0
>> + sd zero, 0(a1)
>> +
>> + /* Address to place our memcpy code */
>> + PTR_LI a0, CONFIG_OCTEON_L2_MEMCPY_ADDR
>> + /* The following code writes a simple memcpy routine into the cache
>> + * to copy ourself from flash into the L2 cache. This makes the
>> + * memcpy routine a lot faster since each instruction can potentially
>> + * require four read cycles to flash over the boot bus.
>> + */
>> + /* Zero cache line in the L2 cache */
>> + zcb (a0)
>> + synci 0(zero)
>> + dli a1, 0xdd840000dd850008 /* ld a0, 0(t0); ld a1, 8(t0) */
>> + sd a1, 0(a0)
>> + dli a1, 0xdd860010dd870018 /* ld a2, 16(t0); ld a3, 24(t0) */
>> + sd a1, 8(a0)
>> + dli a1, 0xfda40000fda50008 /* sd a0, 0(t1); sd a1, 8(t1) */
>> + sd a1, 16(a0)
>> + dli a1, 0xfda60010fda70018 /* sd a2, 16(t1); sd a3, 24(t1) */
>> + sd a1, 24(a0)
>> + dli a1, 0x258c0020158efff6 /* addiu t0, 32; bne t0, t2, -40 */
>> + sd a1, 32(a0)
>> + dli a1, 0x25ad002003e00008 /* addiu t1, 32; jr ra */
>> + sd a1, 40(a0)
>> + sd zero, 48(a0) /* nop; nop */
>> +
>> + /* Synchronize the caches */
>> + sync
>> + synci 0(zero)
>> +
>> + move t0, s7
>> + move t1, t9
>> +
>> + /* Do the memcpy operation in L2 cache to copy ourself from flash
>> + * to the L2 cache.
>> + */
>> + jalr a0
>> + nop
>> +
>> +# else
>> + /* Copy ourself to the L2 cache from flash, 32 bytes at a time */
>> + /* This code is now written to the L2 cache using the code above */
>> +1:
>> + ld a0, 0(t0)
>> + ld a1, 8(t0)
>> + ld a2, 16(t0)
>> + ld a3, 24(t0)
>> + sd a0, 0(t1)
>> + sd a1, 8(t1)
>> + sd a2, 16(t1)
>> + sd a3, 24(t1)
>> + addiu t0, 32
>> + bne t0, t2, 1b
>> + addiu t1, 32
>> +# endif /* CONFIG_OCTEON_L2_MEMCPY_IN_CACHE */
>> +
>> + /* Adjust the start address of U-Boot and the global pointer */
>> + subu t0, s7, t9 /* t0 = address difference */
>> + move s7, t9 /* Update physical address */
>> + move s2, t9
>> + sync
>> + synci 0(zero)
>> +
>> + /* Now we branch to the L2 cache. We first get our PC then adjust it
>> + */
>> + bal 3f
>> + nop
>> +3:
>> + /* Don't add any instructions here! */
>> + subu t9, ra, t0
>> + /* Give ourself 16 bytes */
>> + addiu t9, 0x10
>> +
>> + jal t9 /* Branch to address in L2 cache */
>> +
>> + nop
>> + nop
>> + /* Add instructions after here */
>> +
>> + move a7, s7
>> +
>> + b uboot_in_ram
>> + ori s4, 2 /* Running out of L2 cache */
>> +
>> +l2_cache_too_small: /* We go here if we can't copy ourself to L2 */
>> +#endif /* CONFIG_OCTEON_COPY_FROM_FLASH_TO_L2 */
>> +
>> + /* This code is only executed if booting from flash. */
>> + /* For flash boot (_not_ RAM boot), we do a workaround for
>> + * an LLM errata on CN38XX and CN58XX parts.
>> + */
>> +
>> +uboot_in_ram:
>> + /* U-boot address is now in reg a7, and is 4 MByte aligned.
>> + * (boot bus addressing has been adjusted to make this happen for flash,
>> + * and for DRAM this alignment must be provided by the remote boot
>> + * utility.
>> + */
>> + /* See if we're in KSEG0 range, if so set EBASE register to handle
>> + * exceptions.
>> + */
>> + dli a1, 0x20000000
>> + bge a7, a1, 1f
>> + nop
>> + /* Convert our physical address to KSEG0 */
>> + PTR_LI a1, 0xffffffff80000000
>> + or a1, a1, a7
>> + mtc0 a1, CP0_EBASE
>> +1:
>> + /* U-boot now starts at 0xBFC00000. Use a single 4 MByte TLB mapping
>> + * to map u-boot.
>> + */
>> + move a0, a6 /* Virtual addr in a0 */
>> + dins a0, zero, 0, 16 /* Zero out offset bits */
>> + move a1, a7 /* Physical addr in a1 */
>> +
>> + /* Now we need to remove the MIPS address space bits. For this we
>> + * need to determine if it is a 32 bit compatibility address or not.
>> + */
>> +
>> + /* 'lowest' address in compatibility space */
>> + PTR_LI t0, 0xffffffff80000000
>> + dsubu t0, t0, a1
>> + bltz t0, compat_space
>> + nop
>> +
>> + /* We have a xkphys address, so strip off top bit */
>> + b addr_fixup_done
>> + dins a1, zero, 63, 1
>> +
>> +compat_space:
>> + PTR_LI a2, 0x1fffffff
>> + and a1, a1, a2 /* Mask phy addr to remove address space bits */
>> +
>> +addr_fixup_done:
>> + /* Currenty the u-boot image size is limited to 4 MBytes. In order to
>> + * support larger images the flash mapping will need to be changed to
>> + * be able to access more than that before C code is run. Until that
>> + * is done, we just use a 4 MByte mapping for the secondary cores as
>> + * well.
>> + */
>> + /* page size (only support 4 Meg binary size for now for core 0)
>> + * This limitation is due to the fact that the boot vector is
>> + * 0xBFC00000 which only makes 4MB available. Later more flash
>> + * address space will be available after U-Boot has been copied to
>> + * RAM. For now assume that it is in flash.
>> + */
>> + li a2, 2*1024*1024
>> +
>> + mfc0 a4, CP0_EBASE
>> + andi a4, EBASE_CPUNUM /* get core */
>> + beqz a4, core_0_tlb
>> + nop
>> +
>> + /* Now determine how big a mapping to use for secondary cores,
>> + * which need to map all of u-boot + heap in DRAM
>> + */
>> + /* Here we look at the alignment of the the physical address,
>> + * and use the largest page size possible. In some cases
>> + * this can result in an oversize mapping, but for secondary cores
>> + * this mapping is very short lived.
>> + */
>> +
>> + /* Physical address in a1 */
>> + li a2, 1
>> +1:
>> + sll a2, 1
>> + and a5, a1, a2
>> + beqz a5, 1b
>> + nop
>> +
>> + /* a2 now contains largest page size we can use */
>> +core_0_tlb:
>> + JAL(single_tlb_setup)
>> +
>> + /* Check if we're running from cache */
>> + bbit1 s4, 1, uboot_in_cache
>> + nop
>> +
>> + /* If we are already running from ram, we don't need to muck
>> + * with boot bus mappings.
>> + */
>> + PTR_LI t2, 0xffffffffb0000000
>> + dsubu t2, s7
>> + /* See if our starting address is lower than the boot bus */
>> + bgez t2, uboot_in_ram2 /* If yes, booting from RAM */
>> + nop
>> +
>> +uboot_in_cache:
>> +#if CONFIG_OCTEON_BIG_STACK_SIZE
>> + /* The large stack is only for core 0. For all other cores we need to
>> + * use the L1 cache otherwise the other cores will stomp on top of each
>> + * other unless even more space is reserved for the stack space for
>> + * each core. With potentially 96 cores this gets excessive.
>> + */
>> + mfc0 v0, CP0_EBASE
>> + andi a0, EBASE_CPUNUM
>> + bnez a0, no_big_stack
>> + nop
>> + PTR_LA sp, big_stack_start
>> + daddiu sp, -16
>> +
>> +no_big_stack:
>> +#endif
>> + /* We now have the TLB set up, so we need to remap the boot bus.
>> + * This is tricky, as we are running from flash, and will be changing
>> + * the addressing of the flash.
>> + */
>> + /* Enable movable boot bus region 0, at address 0x10000000 */
>> + PTR_LI a4, OCTEON_MIO_BOOT_BASE
>> + dli a5, 0x81000000 /* EN + base address 0x11000000 */
>> + sd a5, OCTEON_MIO_BOOT_LOC_CFG0_OFF(a4)
>> +
>> + /* Copy code to that remaps the boot bus to movable region */
>> + sd zero, OCTEON_MIO_BOOT_LOC_DAT_OFF(a4)
>> +
>> + PTR_LA a6, change_boot_mappings
>> + GETOFFSET(a5, change_boot_mappings);
>> + daddu a5, a5, a6
>> +
>> + /* The code is 16 bytes (2 DWORDS) */
>> + ld a7, 0(a5)
>> + sd a7, OCTEON_MIO_BOOT_LOC_DAT_OFF(a4)
>> + ld a7, 8(a5)
>> + sd a7, OCTEON_MIO_BOOT_LOC_DAT_OFF(a4)
>> +
>> + /* Read from an RML register to ensure that the previous writes have
>> + * completed before we branch to the movable region.
>> + */
>> + ld zero, OCTEON_MIO_BOOT_LOC_CFG0_OFF(a4)
>> +
>> + /* Compute value for boot bus configuration register */
>> + /* Read region 0 config so we can _modify_ the base address field */
>> + PTR_LI a4, OCTEON_MIO_BOOT_REG_CFG0 /* region 0 config */
>> + ld a0, 0(a4)
>> + dli a4, 0xf0000000 /* Mask off bits we want to save */
>> + and a4, a4, a0
>> + dli a0, 0x0fff0000 /* Force size to max */
>> + or a4, a4, a0
>> +
>> + move a5, s6
>> + /* Convert to 64k blocks, as used by boot bus config */
>> + srl a5, 16
>> + li a6, 0x1fc0 /* 'normal' boot bus base config value */
>> + subu a6, a6, a5 /* Subtract offset */
>> + /* combine into register value to pass to boot bus routine */
>> + or a0, a4, a6
>> +
>> + /* Branch there */
>> + PTR_LA a1, __mapped_continue_label
>> + PTR_LI a2, OCTEON_MIO_BOOT_REG_CFG0
>> + /* If region 0 is not enabled we can skip it */
>> + ld a4, 0(a2)
>> + bbit0 a4, 31, __mapped_continue_label
>> + nop
>> + li a4, 0x10000000
>> + j a4
>> + synci 0(zero)
>> +
>> + /* We never get here, as we go directly to __mapped_continue_label */
>> + break
>> +
>> +
>> +uboot_in_ram2:
>> +
>> + /* Now jump to address in TLB mapped memory to continue execution */
>> + PTR_LA a4, __mapped_continue_label
>> + synci 0(a4)
>> + j a4
>> + nop
>> +
>> +__mapped_continue_label:
>> + /* Check if we are core 0, if we are not then we need
>> + * to vector to code in DRAM to do application setup, and
>> + * skip the rest of the bootloader. Only core 0 runs the bootloader
>> + * and sets up the tables that the other cores will use for
>> + * configuration.
>> + */
>> + mfc0 a0, CP0_EBASE
>> + andi a0, EBASE_CPUNUM /* get core */
>> + /* if (__all_cores_are_equal==0 && core==0),
>> + * then jump to execute BL on core 0; else 'go to next line'
>> + * (core_0_cont1 is executed ONLY when k0=a0=0(core0_ID))
>> + */
>> + lw t0, __all_cores_are_equal
>> + beq a0, t0, core_0_cont1
>> + nop
>> +
>> + /* other cores look up addr from dram */
>> + /* DRAM controller already set up by first core */
>> + li a1, (BOOT_VECTOR_NUM_WORDS * 4)
>> + mul a0, a0, a1
>> +
>> + /* Now find out the boot vector base address from the moveable boot
>> + * bus region.
>> + */
>> +
>> + /* Get the address of the boot bus moveable region */
>> + PTR_LI t8, OCTEON_MIO_BOOT_BASE
>> + ld t9, OCTEON_MIO_BOOT_LOC_CFG0_OFF(t8)
>> + /* Make sure it's enabled */
>> + bbit0 t9, 31, invalid_boot_vector
>> + dext t9, t9, 3, 24
>> + dsll t9, t9, 7
>> + /* Make address XKPHYS */
>> + li t0, 1
>> + dins t9, t0, 63, 1
>> +
>> + ld t0, OCTEON_BOOT_MOVEABLE_MAGIC_OFFSET(t9)
>> + dli t1, OCTEON_BOOT_MOVEABLE_MAGIC1
>> + bne t0, t1, invalid_boot_vector
>> + nop
>> +
>> + /* Load base address of boot vector table */
>> + ld t0, OCTEON_BOOT_VECTOR_MOVEABLE_OFFSET(t9)
>> + /* Add offset for core */
>> + daddu a1, t0, a0
>> +
>> + mfc0 v0, CP0_STATUS
>> + move v1, v0
>> + ins v1, zero, 19, 1 /* Clear NMI bit */
>> + mtc0 v1, CP0_STATUS
>> +
>> + /* Get app start function address */
>> + lw t9, 8(a1)
>> + beqz t9, invalid_boot_vector
>> + nop
>> +
>> + j t9
>> + lw k0, 12(a1) /* Load global data (deprecated) */
>> +
>> +invalid_boot_vector:
>> + wait
>> + b invalid_boot_vector
>> + nop
>> +
>> +__all_cores_are_equal:
>> + /* The following .word tell if 'all_cores_are_equal' or core0 is special
>> + * By default (for the first execution) the core0 should be special,
>> + * in order to behave like the old(existing not-modified) bootloader
>> + * and run the bootloader on core 0 to follow the existing design.
>> + * However after that we make 'all_cores_equal' which allows to run SE
>> + * applications on core0 like on any other core. NOTE that value written
>> + * to '__all_cores_are_equal' should not match any core ID.
>> + */
>> + .word 0
>> +
>> +core_0_cont1:
>> + li t0, 0xffffffff
>> + sw t0, __all_cores_are_equal
>> + /* From here on, only core 0 runs, other cores have branched
>> + * away.
>> + */
>> +#ifdef CONFIG_MIPS_INIT_STACK_IN_SRAM
>> + /* Set up initial stack and global data */
>> + setup_stack_gd
>> +# ifdef CONFIG_DEBUG_UART
>> + PTR_LA t9, debug_uart_init
>> + jalr t9
>> + nop
>> +# endif
>> +#endif
>> + move a0, zero # a0 <-- boot_flags = 0
>> + PTR_LA t9, board_init_f
>> +
>> + jr t9
>> + move ra, zero
>> + END(_start)
>> +
>> + .balign 8
>> + .globl single_tlb_setup
>> + .ent single_tlb_setup
>> + /* Sets up a single TLB entry. Virtual/physical addresses
>> + * must be properly aligned.
>> + * a0 Virtual address
>> + * a1 Physical address
>> + * a2 page (_not_ mapping) size
>> + */
>> +single_tlb_setup:
>> + /* Determine the number of TLB entries available, and
>> + * use the top one.
>> + */
>> + mfc0 a3, CP0_CONFIG1
>> + dext a3, a3, 25, 6 /* a3 now has the max mmu entry index */
>> + mfc0 a5, CP0_CONFIG3 /* Check if config4 reg present */
>> + bbit0 a5, 31, single_tlb_setup_cont
>> + nop
>> + mfc0 a5, CP0_CONFIG4
>> + bbit0 a5, 14, single_tlb_setup_cont /* check config4[MMUExtDef] */
>> + nop
>> + /* append config4[MMUSizeExt] to most significant bit of
>> + * config1[MMUSize-1]
>> + */
>> + dins a3, a5, 6, 8
>> + and a3, a3, 0x3fff /* a3 now includes max entries for cn6xxx */
>> +
>> +single_tlb_setup_cont:
>> +
>> + /* Format physical address for entry low */
>> + nop
>> + dsrl a1, a1, 12
>> + dsll a1, a1, 6
>> + ori a1, a1, 0x7 /* set DVG bits */
>> +
>> + move a4, a2
>> + daddu a5, a4, a4 /* mapping size */
>> + dsll a6, a4, 1
>> + daddiu a6, a6, -1 /* pagemask */
>> + dsrl a4, a4, 6 /* adjust for adding with entrylo */
>> +
>> + /* Now set up mapping */
>> + mtc0 a6, CP0_PAGEMASK
>> + mtc0 a3, CP0_INDEX
>> +
>> + dmtc0 a1, CP0_ENTRYLO0
>> + daddu a1, a1, a4
>> +
>> + dmtc0 a1, CP0_ENTRYLO1
>> + daddu a1, a1, a4
>> +
>> + dmtc0 a0, CP0_ENTRYHI
>> + daddu a0, a0, a5
>> +
>> + ehb
>> + tlbwi
>> + jr ra
>> + nop
>> + .end single_tlb_setup
>> +
>> +
>> +/**
>> + * This code is moved to a movable boot bus region,
>> + * and it is responsible for changing the flash mappings and
>> + * jumping to run from the TLB mapped address.
>> + *
>> + * @param a0 New address for boot bus region 0
>> + * @param a1 Address to branch to afterwards
>> + * @param a2 Address of MIO_BOOT_REG_CFG0
>> + */
>> + .balign 8
>> +change_boot_mappings:
>> + sd a0, 0(a2)
>> + sync
>> + j a1 /* Jump to new TLB mapped location */
>> + synci 0(zero)
>> +
>> +/* If we need a large stack, allocate it here. */
>> +#if CONFIG_OCTEON_BIG_STACK_SIZE
>> + /* Allocate the stack here so it's in L2 cache or DRAM */
>> + .balign 16
>> +big_stack_end:
>> + .skip CONFIG_OCTEON_BIG_STACK_SIZE, 0
>> +big_stack_start:
>> + .dword 0
>> +#endif
>>
>
Viele Grüße,
Stefan
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr at denx.de
More information about the U-Boot
mailing list