[RFC PATCH] nvme: Always invalidate whole cqes[] array

Fri Feb 26 17:12:37 CET 2021

On 08/02/2021 14:31, Andre Przywara wrote:
> At the moment nvme_read_completion_status() tries to invidate a single
> member of the cqes[] array, which is shady as just a single entry is
> not cache line aligned.
> The structure is dictated by hardware, and with 16 bytes is smaller than
> any cache line we usually deal with. Also multiple entries need to be
> consecutive in memory, so we can't pad them to cover a whole cache line.
> 
> As a consequence we can only always invalidate all of them - U-Boot just
> uses two of them anyway. This is fine, as they are only ever read by the
> CPU (apart from the initial zeroing), so they can't become dirty.
> 
> Make this obvious by always invalidating the whole array, regardless of
> the entry number we are about to read.
> Also blow up the allocation size to cover whole cache lines, to avoid
> other heap allocations to sneak in.
> 
> Signed-off-by: Andre Przywara <andre.przywara at arm.com>
> ---
> Hi,
> 
> this is just compile tested, and should fix the only questionable
> cache invalidate call in this driver.
> Please verify if this fixes any issues!
> 
> Cheers,
> Andre
> 
>  drivers/nvme/nvme.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/nvme.c b/drivers/nvme/nvme.c
> index 5d6331ad346..c9efeff4bc9 100644
> --- a/drivers/nvme/nvme.c
> +++ b/drivers/nvme/nvme.c
> @@ -22,6 +22,8 @@
>  #define NVME_AQ_DEPTH		2
>  #define NVME_SQ_SIZE(depth)	(depth * sizeof(struct nvme_command))
>  #define NVME_CQ_SIZE(depth)	(depth * sizeof(struct nvme_completion))
> +#define NVME_CQ_ALLOCATION	ALIGN(NVME_CQ_SIZE(NVME_Q_DEPTH), \
> +				      ARCH_DMA_MINALIGN)
>  #define ADMIN_TIMEOUT		60
>  #define IO_TIMEOUT		30
>  #define MAX_PRP_POOL		512
> @@ -144,8 +146,14 @@ static __le16 nvme_get_cmd_id(void)
>  
>  static u16 nvme_read_completion_status(struct nvme_queue *nvmeq, u16 index)
>  {
> -	u64 start = (ulong)&nvmeq->cqes[index];
> -	u64 stop = start + sizeof(struct nvme_completion);
> +	/*
> +	 * Single CQ entries are always smaller than a cache line, so we
> +	 * can't invalidate them individually. However CQ entries are
> +	 * read only by the CPU, so it's safe to always invalidate all of them,
> +	 * as the cache line should never become dirty.
> +	 */
> +	ulong start = (ulong)&nvmeq->cqes[0];
> +	ulong stop = start + NVME_CQ_ALLOCATION;
>  
>  	invalidate_dcache_range(start, stop);
>  
> @@ -241,7 +249,7 @@ static struct nvme_queue *nvme_alloc_queue(struct nvme_dev *dev,
>  		return NULL;
>  	memset(nvmeq, 0, sizeof(*nvmeq));
>  
> -	nvmeq->cqes = (void *)memalign(4096, NVME_CQ_SIZE(depth));
> +	nvmeq->cqes = (void *)memalign(4096, NVME_CQ_ALLOCATION);
>  	if (!nvmeq->cqes)
>  		goto free_nvmeq;
>  	memset((void *)nvmeq->cqes, 0, NVME_CQ_SIZE(depth));
> @@ -339,7 +347,7 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
>  	nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride];
>  	memset((void *)nvmeq->cqes, 0, NVME_CQ_SIZE(nvmeq->q_depth));
>  	flush_dcache_range((ulong)nvmeq->cqes,
> -			   (ulong)nvmeq->cqes + NVME_CQ_SIZE(nvmeq->q_depth));
> +			   (ulong)nvmeq->cqes + NVME_CQ_ALLOCATION);
>  	dev->online_queues++;
>  }
>  
> 

On Amlogic A311D, fixes timeout on nvme scan:

Tested-by: Neil Armstrong <narmstrong at baylibre.com>