[PATCH 01/19] spi: cadence_qspi: Add support for DDR PHY mode

Sat Mar 30 03:15:59 CET 2024

Hi Tejas,

+Ian Roberts,

(Apologies for the double mail--I accidentally dropped the u-boot
mailing list in version 1)

We have been developing OSPI support against the same Cadence IP in
the Analog parts (SC594 and SC598) and we have some specific feedback
as this patch series will affect the driver's functionality on our
platform and require changes to our upcoming patches to work together.

> {
> @@ -353,6 +649,9 @@  static int cadence_spi_mem_exec_op(struct spi_slave *spi,
>  break;
>  }
>
> + if (op->cmd.dtr)
> + err = cadence_spi_setup_ddrmode(spi, op);
> +
>  return err;
>  }

This appears to trigger calibration with the first spi command
received with the dtr flag set (cadence_spi_mem_exec_op ->
cadence_spi_setup_ddrmode -> cadence_spi_setdlldelay ->
cadence_qspi_rx_dll_tuning). However the calibration code assumes that
this command is a read ID command, because code has been added to
spi_nor_micro_octal_dtr_enable specifically to read the ID first. This
will cause problems if we are adding support to another chip and are
unaware that this host SPI controller detail needs to be added to the
DTR enable function inside spi-nor-core.

in cadence_qspi_rx_dll_tuning:
> + id_matched = true;
> + for (j = 0; j < op->data.nbytes; j++) {
> + if (spi->device_id[j] != id[j]) {
> + id_matched = false;
> + break;
> + }
> + }

This should probably just be a call to memcmp.

Additionally, this assumes that the ID has already been read into
spi->device_id, which happens in spi_nor_read_id. We believe that
setup_ddrmode should be self contained, for example generating a
non-dtr ID read and store (locally) first, then reconfiguring for DTR
and issuing a DTR read.

Based on our experience with the ADI EZKIT, reading a 3 or 6 byte ID
is not always sufficient to perform robust calibration, as some data
lines may end up being "calibrated" against a constant value. For
example, the ISSI flash chips capable of DTR include a "data learning
pattern" register which could be used to ensure that calibration is
performed against a 0 to 1 transition and a 1 to 0 transition. Such
functionality will be chip-specific, so we need to a way to connect
what happens in the dtr enable function in spi-nor-core with what
happens inside the driver (possibly another ops function).

in priv_setup_ddrmode:
> +    /* Disable DAC mode */
> +    if (priv->use_dac_mode) {
> +        clrbits_le32(regbase + CQSPI_REG_CONFIG,
> +                 CQSPI_REG_CONFIG_DIRECT);
> +        priv->use_dac_mode = false;
> +    }

Why is DAC forcefully disabled? On our platform we can only achieve
max PHY mode performance with DAC enabled. Can we provide a
platform-tunable flag to control this?

In priv_setup_ddrmode:
> +    clrsetbits_le32(regbase + CQSPI_REG_RD_DATA_CAPTURE,
> +            (CQSPI_REG_RD_DATA_CAPTURE_DELAY_MASK <<
> +             CQSPI_REG_RD_DATA_CAPTURE_DELAY_LSB),
> +            CQSPI_REG_READCAPTURE_DQS_ENABLE);

DQS is not required for PHY operation. Some chips can operate in DDR
mode and don't have a DQS line; we have one of these on our boards.
Please make this configurable

> +__weak int cadence_qspi_versal_set_dll_mode(struct udevice *dev)
> +{
> +    return -ENOTSUPP;
> +}
> +

Setting the DLL mode will not be a Versal-specific function. Can we
rename this (and other Versal-specific names) in order to make it
clearer that each platform integrating this IP is providing its own
implementation for this function, rather than it being a uniquely
versal-related capability? For DLL mode in particular, it may be
sufficient to check for a device tree property, which can be included
in the generic driver without any platform-specific functions.

In cadence_spi_setdlldelay:
> +    priv->extra_dummy = false;
> +    for (extra_dummy = 0; extra_dummy <= 1; extra_dummy++) {
> +        if (extra_dummy)
> +            priv->extra_dummy = true;
> +
> +        rxtap = cadence_qspi_rx_dll_tuning(spi, op, txtap, extra_dummy);
> +        if (extra_dummy && rxtap < 0) {
> +            printf("Failed RX dll tuning\n");
> +            return rxtap;
> +        }
> +    }

Can you please explain the extra dummy cycles? Why are more needed
outside of what is called out for in flash datasheets? This looks to
us like magic compensating for a calibration failure or a
configuration failure where the flash chip is not providing the
expected number of dummy cycles.

in cadence_qspi_rx_dll_tuning:
> +        ret = cadence_qspi_apb_command_read_setup(priv, op);
> +        if (!ret) {
> +            ret = cadence_qspi_apb_command_read(priv, op);
> +            if (ret < 0) {
> +                printf("error %d reading JEDEC ID\n", ret);
> +                return ret;
> +            }
> +

I see that this cannot use cadence_spi_mem_exec_op() because it would
recursively trigger itself as the calibration was not yet completed,
but as a result this does not support all of the read modes that the
device could be configured for. I think this is further reason to have
a specialized function for enabling DTR rather than triggering it on
the first DTR operation.

> +static int cadence_spi_child_pre_probe(struct udevice *bus)
> +{
> +    struct spi_slave *slave = dev_get_parent_priv(bus);
> +
> +    slave->bytemode = SPI_4BYTE_MODE;
> +
> +    return 0;
> +}
> +
> +__weak int cadence_qspi_versal_set_dll_mode(struct udevice *dev)
> +{
> +    return -ENOTSUPP;
> +}

Although it is likely that any flash supporting DTR uses 4 byte
addressing, we should not set it here.

> diff --git a/drivers/mtd/spi/spi-nor-core.c b/drivers/mtd/spi/spi-nor-core.c
> index faf02c7778..5895b5de09 100644
> --- a/drivers/mtd/spi/spi-nor-core.c
> +++ b/drivers/mtd/spi/spi-nor-core.c

Can we split the micron-specific changes into a separate patch? I
realize currently that they are tightly coupled with the calibration
implementation, but after changing the calibration implementation
perhaps this will be possible.

Some additional comments not specific to any part of the code:

How does this patch address the case when the calibrated frequency is
faster than the non-calibrated mode can run at? For example, if
spi-max-frequency is 125MHz, but single IO operation breaks down at
speeds greater than 50MHz? (This also happens on our board). Also, the
calibration algorithm doesn't appear to re-calibrate the read capture
delay register, which is required whenever the clock or IO mode
changes.

At Timesys we have a set of patches that implement DDR and PHY
calibration for the same Cadence IP that presents different solutions
to the above issues. However, our patches only support bypass mode
calibration and direct mode, because they are the only options that
work on our hardware. Would you be interested in looking at our
version and working together to create a new version of this patchset
that can support both of our platforms at once?

Thanks,
Greg

-- 
Greg Malysa
Timesys Corporation