[U-Boot] [PATCH 3/4] mtd: nand: add Freescale NFC driver

Stefan Agner stefan at agner.ch
Wed Aug 13 13:20:35 CEST 2014


Am 2014-08-13 00:58, schrieb Bill Pringlemeir:
[snip]
>>>>> +static u32 nfc_read(struct mtd_info *mtd, uint reg)
>>>>> +{
>>>>> +	struct fsl_nfc *nfc = mtd_to_nfc(mtd);
>>>>> +
>>>>> +	if (reg == NFC_FLASH_STATUS1 ||
>>>>> +	    reg == NFC_FLASH_STATUS2 ||
>>>>> +	    reg == NFC_IRQ_STATUS)
>>>>> +		return __raw_readl(nfc->regs + reg);
>>>>> +	/* Gang read/writes together for most registers. */
>>>>> +	else
>>>>> +		return *(u32 *)(nfc->regs + reg);
>>>>> +}
>>>>> +
>>>>> +static void nfc_write(struct mtd_info *mtd, uint reg, u32 val)
>>>>> +{
>>>>> +	struct fsl_nfc *nfc = mtd_to_nfc(mtd);
>>>>> +
>>>>> +	if (reg == NFC_FLASH_STATUS1 ||
>>>>> +	    reg == NFC_FLASH_STATUS2 ||
>>>>> +	    reg == NFC_IRQ_STATUS)
>>>>> +		__raw_writel(val, nfc->regs + reg);
>>>>> +	/* Gang read/writes together for most registers. */
>>>>> +	else
>>>>> +		*(u32 *)(nfc->regs + reg) = val;
>>>>> +}
>>>>
>>>> You should always be using raw I/O accessors.  If the intent is to
>>>> bypass I/O ordering for certain registers, the raw accessors should
>>>> already be skipping that.
> 
>>> Ok, will do.
> 
>> Sorry, the above should say "always be using I/O accesors", not always
>> raw ones.
> 
> This is probably because of me.  There are lines like this for
> configuration,
> 
>         /* Set configuration register. */
>         nfc_clear(mtd, NFC_FLASH_CONFIG, CONFIG_ADDR_AUTO_INCR_BIT);
>         nfc_clear(mtd, NFC_FLASH_CONFIG, CONFIG_BUFNO_AUTO_INCR_BIT);
>         nfc_clear(mtd, NFC_FLASH_CONFIG, CONFIG_BOOT_MODE_BIT);
>         nfc_clear(mtd, NFC_FLASH_CONFIG, CONFIG_DMA_REQ_BIT);
>         nfc_set(mtd, NFC_FLASH_CONFIG, CONFIG_FAST_FLASH_BIT);
> 
> If you use some sort of 'volatile', you are saying to the compiler that
> these lines are *not*,
> 
>        ldr r0, [r1, #CONFIG]    # read previous value
>        ldr r2, =~MASK
>        orr r0, #FAST_FLASH_BIT  # set one bit.
>        and r0, r2               # clear all bits at once.
>        str r0, [r1, #CONFIG]    # write it back.
> 
> Instead it will change into five different load/modify/stores.  The
> memory itself is just like SRAM except a few registers that are actually
> volatile; ie changed via the hardware.
> 
> Particularly bad are the memcpy_io() on the ARM.  Using these results
> in about 1/2 to 1/4 of the performance of the drivers through-put on the
> Vybrid.  I can see that using accessors is good, but just replacing this
> with 'writel' will result in significantly more code and slower
> throughput.
> 
> If people insist on this, then something like,
> 
>       val = nfc_read(mtd, NFC_REG);
>       val = nfc_clear(val, CONFIG_ADDR_AUTO_INCR_BIT);
>       val = nfc_clear(val, CONFIG_BUFNO_AUTO_INCR_BIT);
>       val = nfc_clear(val, CONFIG_BOOT_MODE_BIT);
>       val = nfc_clear(val, CONFIG_DMA_REQ_BIT);
>       val = nfc_set(val, CONFIG_FAST_FLASH_BIT);
>       nfc_write(mtd, NFC_REG, val);
> 
> would result in the same code with 'nfc_read()' and 'nfc_write()' using
> the I/O accessors.  I found this more difficult to read for the higher
> level functions.  Instead some some standard macros for setting and
> clearing bits could be used.  The original driver was using 'nfc_set()'
> and 'nfc_clear()'.  Maybe they should just go.
> 

I just applied this change:

diff --git a/drivers/mtd/nand/fsl_nfc.c b/drivers/mtd/nand/fsl_nfc.c
index df2c8be..01c010f 100644
--- a/drivers/mtd/nand/fsl_nfc.c
+++ b/drivers/mtd/nand/fsl_nfc.c
@@ -191,26 +191,14 @@ static u32 nfc_read(struct mtd_info *mtd, uint
reg)
 {
 	struct fsl_nfc *nfc = mtd_to_nfc(mtd);
 
-	if (reg == NFC_FLASH_STATUS1 ||
-	    reg == NFC_FLASH_STATUS2 ||
-	    reg == NFC_IRQ_STATUS)
-		return __raw_readl(nfc->regs + reg);
-	/* Gang read/writes together for most registers. */
-	else
-		return *(u32 *)(nfc->regs + reg);
+	return __raw_readl(nfc->regs + reg);
 }
 
 static void nfc_write(struct mtd_info *mtd, uint reg, u32 val)
 {
 	struct fsl_nfc *nfc = mtd_to_nfc(mtd);
 
-	if (reg == NFC_FLASH_STATUS1 ||
-	    reg == NFC_FLASH_STATUS2 ||
-	    reg == NFC_IRQ_STATUS)
-		__raw_writel(val, nfc->regs + reg);
-	/* Gang read/writes together for most registers. */
-	else
-		*(u32 *)(nfc->regs + reg) = val;
+	__raw_writel(val, nfc->regs + reg);
 }
 
 static void nfc_set(struct mtd_info *mtd, uint reg, u32 bits)

And did some performance measurement:

=> nand read ${loadaddr} 0x0 0x2000000

===> With if/else
NAND read: device 0 offset 0x0, size 0x2000000
 33554432 bytes read in 8494 ms: OK (3.8 MiB/s)

===> With __raw_readl only...
NAND read: device 0 offset 0x0, size 0x2000000
 33554432 bytes read in 8184 ms: OK (3.9 MiB/s)

=> nand write ${loadaddr} rootfs-ubi 0x2000000

===> With if/else
NAND write: device 0 offset 0xa00000, size 0x2000000
 33554432 bytes written in 18200 ms: OK (1.8 MiB/s)

===> With __raw_readl only...
NAND write: device 0 offset 0xa00000, size 0x2000000
 33554432 bytes written in 15095 ms: OK (2.1 MiB/s)

These values are reproducible. I guess by removing the conditional
branch in the nfc_read/nfc_write functions we even gain performance.

IMHO we should use the raw_writel only and "hand optimize" for functions
which are used often. For the initialization/configuration functions,
there is little value to save some register access.

--
Stefan


More information about the U-Boot mailing list