[U-Boot] [PATCH v3 3/7] MSCC: add support for Ocelot SoCs

Thu Dec 13 14:55:51 UTC 2018

Am Do., 13. Dez. 2018 um 15:05 Uhr schrieb Gregory CLEMENT
<gregory.clement at bootlin.com>:
>
> Hi Daniel,
>
>  On lun., déc. 10 2018, Daniel Schwierzeck <daniel.schwierzeck at gmail.com> wrote:
>
> >> diff --git a/arch/mips/mach-mscc/include/ioremap.h b/arch/mips/mach-mscc/include/ioremap.h
> >> new file mode 100644
> >> index 0000000000..8ea5c65ce3
> >> --- /dev/null
> >> +++ b/arch/mips/mach-mscc/include/ioremap.h
> >> @@ -0,0 +1,51 @@
> >> +/* SPDX-License-Identifier: (GPL-2.0+ OR MIT) */
> >
> > this line should start with a //. There are more files in this patch
> > which need to be fixed.
>
> Actually, according to the documentation (Licenses/README):
>   The SPDX license identifier is added in form of a comment.  The comment
>    style depends on the file type::
>
>       C source: // SPDX-License-Identifier: <SPDX License Expression>
>       C header: /* SPDX-License-Identifier: <SPDX License Expression> */
>
> So for a C header file, /* comment */ is correct.

oh sorry, I missed that there is a difference

>
> >
> >> +/*
> >> + * Copyright (c) 2018 Microsemi Corporation
> >> + */
> >> +
> >> +#ifndef __ASM_MACH_MSCC_IOREMAP_H
> >> +#define __ASM_MACH_MSCC_IOREMAP_H
> >> +
> >> +#include <linux/types.h>
> >> +#include <mach/common.h>
> >> +
> >> +/*
> >> + * Allow physical addresses to be fixed up to help peripherals located
> >> + * outside the low 32-bit range -- generic pass-through version.
> >> + */
> >> +static inline phys_addr_t fixup_bigphys_addr(phys_addr_t phys_addr,
> >> +                                         phys_addr_t size)
> >> +{
> >> +    return phys_addr;
> >> +}
> >> +
> >> +static inline int is_vcoreiii_internal_registers(phys_addr_t offset)
> >> +{
> >> +#if defined(CONFIG_ARCH_MSCC)
> >
> > this define is superfluous because this directory is only added to the
> > include paths when CONFIG_ARCH_MSCC is selected
>
> OK
>
> >
> >> +    if ((offset >= MSCC_IO_ORIGIN1_OFFSET &&
> >> +         offset < (MSCC_IO_ORIGIN1_OFFSET + MSCC_IO_ORIGIN1_SIZE)) ||
> >> +        (offset >= MSCC_IO_ORIGIN2_OFFSET &&
> >> +         offset < (MSCC_IO_ORIGIN2_OFFSET + MSCC_IO_ORIGIN2_SIZE)))
> >> +            return 1;
> >> +#endif
> >> +
> >> +    return 0;
> >> +}
>
> [...]
>
> >> +/*
> >> + * DDR memory sanity checking failed, tally and do hard reset
> >> + *
> >> + * NB: Assumes inlining as no stack is available!
> >> + */
> >> +static inline void hal_vcoreiii_ddr_failed(void)
> >> +{
> >> +    register u32 reset;
> >> +
> >> +    writel(readl(BASE_CFG + ICPU_GPR(6)) + 1, BASE_CFG + ICPU_GPR(6));
> >> +
> >> +    clrbits_le32(BASE_DEVCPU_GCB + PERF_GPIO_OE, BIT(19));
> >> +
> >> +    /* Jump to reset - does not return */
> >> +    reset = KSEG0ADDR(_machine_restart);
> >> +    /* Reset while running from cache */
> >> +    icache_lock((void *)reset, 128);
> >> +    asm volatile ("jr %0"::"r" (reset));
> >
> > could you briefly describe the reason for this in a comment? It's not
> > clear why this code is necessary without knowing the SoC. AFAIU from
> > your last mail the boot SPI flash is mapped to KUSEG and you need to
> > establish a TLB mapping in lowlevel_init() to be able to move to
> > KSEG0.
>
> The reboot workaround in _machine_restart() will change the SPI NOR
> into SW bitbang.
>
> This will render the CPU unable to execute directly from the NOR, which
> is why the reset instructions are prefetched into the I-cache.
>
> When failing the DDR initialization we are executing from NOR.
>
> The last instruction in _machine_restart() will reset the MIPS CPU
> (and the cache), and the CPU will start executing from the reset vactor.
>
> I will add this explanation as comment.

thanks

>
> >
> >> +
> >> +    panic("DDR init failed\n");
> >> +}
> >> +
> >> +/*
> >> + * DDR memory sanity checking done, possibly enable ECC.
> >> + *
> >> + * NB: Assumes inlining as no stack is available!
> >> + */
> >> +static inline void hal_vcoreiii_ddr_verified(void)
> >> +{
> >> +#ifdef MIPS_VCOREIII_MEMORY_ECC
> >> +    /* Finally, enable ECC */
> >> +    register u32 val = readl(BASE_CFG + ICPU_MEMCTRL_CFG);
> >> +
> >> +    val |= ICPU_MEMCTRL_CFG_DDR_ECC_ERR_ENA;
> >> +    val &= ~ICPU_MEMCTRL_CFG_BURST_SIZE;
> >> +
> >> +    writel(val, BASE_CFG + ICPU_MEMCTRL_CFG);
> >> +#endif
> >> +
> >> +    /* Reset Status register - sticky bits */
> >> +    writel(readl(BASE_CFG + ICPU_MEMCTRL_STAT), BASE_CFG + ICPU_MEMCTRL_STAT);
> >> +}
> >> +
> >> +/* NB: Assumes inlining as no stack is available! */
> >> +static inline int look_for(u32 bytelane)
> >> +{
> >> +    register u32 i;
> >> +
> >> +    /* Reset FIFO in case any previous access failed */
> >> +    for (i = 0; i < sizeof(training_data); i++) {
> >> +            register u32 byte;
> >> +
> >> +            memphy_soft_reset();
> >> +            /* Reset sticky bits */
> >> +            writel(readl(BASE_CFG + ICPU_MEMCTRL_STAT),
> >> +                   BASE_CFG + ICPU_MEMCTRL_STAT);
> >> +            /* Read data */
> >> +            byte = ((volatile u8 *)MSCC_DDR_TO)[bytelane + (i * 4)];
> >
> > __raw_readl()?
>
> I had tried it before but without luck, but after trying harder this
> time I managed to use read(b|l)/write(b|l) everywhere and get ride of
> the volatile variable.
>
> >
> >> +            /*
> >> +             * Prevent the compiler reordering the instruction so
> >> +             * the read of RAM happens after the check of the
> >> +             * errors.
> >> +             */
> >> +            asm volatile("" : : : "memory");
> >
> > this is available as barrier(). But according to context you could use
> > rmb(). Anyway with the current volatile pointer or the suggested
> > __raw_readl() the compiler shouldn't reorder at all
>
> I had a close look on the code generating the __raw_readl and there is
> nothing there to guaranty the ordering. Actually in our case (32 bits)
> __read_readl is just:
> static inline u32 __raw_readl(const volatile void __iomem *mem)
> {
>         u32 __val;
>
>         __val = *mem;
>         return __val;
> }
>
> initial code is here:
> https://elixir.bootlin.com/u-boot/v2018.11-rc3/source/arch/mips/include/asm/io.h#L265
> but __swizzle_addr_l() did nothing
> https://elixir.bootlin.com/u-boot/v2018.11-rc3/source/arch/mips/include/asm/mach-generic/mangle-port.h#L10
> same for __raw_ioswabl():
> https://elixir.bootlin.com/u-boot/v2018.11-rc3/source/arch/mips/include/asm/io.h#L35
>
> So the code is the same that we have written. I agree it is cleaner
> to use __raw_readl but it doesn't add anything about the ordering.
>
> It is the same for the use of the volatile, it ensures that the compiler
> will always produce a operation to read the data in memory, but it is
> not about ordering.
>
> As you suggested I will use rmb();
>
> >> +static inline int hal_vcoreiii_train_bytelane(u32 bytelane)
> >> +{
> >> +    register int res;
> >> +    register u32 dqs_s;
> >> +
> >> +    set_dly(bytelane, 0);   // Start training at DQS=0
> >
> > no C++ style comments
>
> OK
>
> >
> >> +    while ((res = look_for(bytelane)) == DDR_TRAIN_CONTINUE)
> >> +            ;
> >> +    if (res != DDR_TRAIN_OK)
> >> +            return res;
> >> +
> >> +    dqs_s = readl(BASE_CFG + ICPU_MEMCTRL_DQS_DLY(bytelane));
> >> +    while ((res = look_past(bytelane)) == DDR_TRAIN_CONTINUE)
> >> +            ;
> >> +    if (res != DDR_TRAIN_OK)
> >> +            return res;
> >> +    /* Reset FIFO - for good measure */
> >> +    memphy_soft_reset();
> >> +    /* Adjust to center [dqs_s;cur] */
> >> +    center_dly(bytelane, dqs_s);
> >> +    return DDR_TRAIN_OK;
> >> +}
> >> +
> >> +/* This algorithm is converted from the TCL training algorithm used
> >> + * during silicon simulation.
> >> + * NB: Assumes inlining as no stack is available!
> >> + */
> >> +static inline int hal_vcoreiii_init_dqs(void)
> >> +{
> >> +#define MAX_DQS 32
> >> +    register u32 i, j;
> >> +
> >> +    for (i = 0; i < MAX_DQS; i++) {
> >> +            set_dly(0, i);  // Byte-lane 0
> >
> > no C++ style comments
>
> OK
>
> >
> >> +            for (j = 0; j < MAX_DQS; j++) {
> >> +                    register u32 __attribute__ ((unused)) byte;
> >
> > why unused? If you really need it, you could use __maybe_unused
>
> Because the purpose of this variable is just to access the memory, we
> don't do nothing of the value read, and gcc complain about it. But as
> you suggest I will use __maybe_unused.
>
> >
> >> +                    set_dly(1, j);  // Byte-lane 1
> >> +                    /* Reset FIFO in case any previous access failed */
> >> +                    memphy_soft_reset();
> >> +                    writel(readl(BASE_CFG + ICPU_MEMCTRL_STAT),
> >> +                           BASE_CFG + ICPU_MEMCTRL_STAT);
> >> +                    byte = ((volatile u8 *)MSCC_DDR_TO)[0];
> >> +                    byte = ((volatile u8 *)MSCC_DDR_TO)[1];
> >> +                    if (!(readl(BASE_CFG + ICPU_MEMCTRL_STAT) &
> >> +                        (ICPU_MEMCTRL_STAT_RDATA_MASKED |
> >> +                         ICPU_MEMCTRL_STAT_RDATA_DUMMY)))
> >> +                            return 0;
> >> +            }
> >> +    }
> >> +    return -1;
> >> +}
> >> +
> >> +static inline int dram_check(void)
> >> +{
> >> +#define DDR ((volatile u32 *) MSCC_DDR_TO)
> >> +    register u32 i;
> >> +
> >> +    for (i = 0; i < 8; i++) {
> >> +            DDR[i] = ~i;
> >> +            if (DDR[i] != ~i)
> >
> > __raw_readl(), __raw_writel() and drop the explicit volatile?
>
> Yes, as explain above, it s done now.
>
> >> +
> >> +/*
> >> + * Target offset base(s)
> >> + */
> >> +#define MSCC_IO_ORIGIN1_OFFSET 0x70000000
> >> +#define MSCC_IO_ORIGIN1_SIZE   0x00200000
> >> +#define MSCC_IO_ORIGIN2_OFFSET 0x71000000
> >> +#define MSCC_IO_ORIGIN2_SIZE   0x01000000
> >> +#define BASE_CFG        ((void __iomem *)0x70000000)
> >> +#define BASE_DEVCPU_GCB ((void __iomem *)0x71070000)
> >
> > Would it be possible on that SoC to define those register offsets as
> > simple physical address and create the mapping when needed?
> > For example:
> >
> > void foo()
> > {
> >     void __iomem *base_cfg = ioremap(BASE_CFG, ...);
> >     writel(base_cfg + XXX, 0);
> > }
>
> Actually creating the mapping is just casting the physical address in an
> (void __iomem *), see our plat_ioremap.
>
> Calling ioremap in every function will just grow them with little
> benefit.
>
> If you really want it, what I could is sharing void __iomem *base_cfg
> and void __iomem *base_devcpu_gcb at platform level, and initialize them
> only once very early during the boot.

ok, it was only a question. Normally ioremap should be completely optimised
away by the compiler.

>
>
> >> +LEAF(lowlevel_init)
> >> +    /*
> >> +     * As we have no stack yet, we can assume the restricted
> >> +     * luxury of the sX-registers without saving them
> >> +     */
> >> +    move    s0,ra
> >> +
> >> +    jal     vcoreiii_tlb_init
> >> +    nop
> >
> > we use the same style as Linux MIPS where instructions in the delay slot
> > should be indented by an extra space.
>
> OK
>
> Thanks,
>
> Gregory
>
> --
> Gregory Clement, Bootlin
> Embedded Linux and Kernel engineering
> http://bootlin.com

-- 
- Daniel