[U-Boot] [PATCH 1/4] Optimized nand_read_buf for kirkwood
Scott Wood
scottwood at freescale.com
Tue Nov 27 00:39:19 CET 2012
On 11/26/2012 04:33:08 AM, Phil Sutter wrote:
> The basic idea is taken from the linux-kernel, but further optimized.
>
> First align the buffer to 8 bytes, then use ldrd/strd to read and store
> in 8 byte quantities, then do the final bytes.
>
> Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'.
> Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this
> patch in place, reading the same amount of data was done in 27s
> (~4.89MB/s). So read performance is increased by ~80%!
>
> Signed-off-by: Nico Erfurth <ne at erfurth.eu>
> Tested-by: Phil Sutter <phil.sutter at viprinet.com>
> Cc: Prafulla Wadaskar <prafulla at marvell.com>
> ---
> drivers/mtd/nand/kirkwood_nand.c | 29 +++++++++++++++++++++++++++++
> 1 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/mtd/nand/kirkwood_nand.c b/drivers/mtd/nand/kirkwood_nand.c
> index bdab5aa..e04a59f 100644
> --- a/drivers/mtd/nand/kirkwood_nand.c
> +++ b/drivers/mtd/nand/kirkwood_nand.c
> @@ -38,6 +38,34 @@ struct kwnandf_registers {
> static struct kwnandf_registers *nf_reg =
> (struct kwnandf_registers *)KW_NANDF_BASE;
>
> +
> +/* The basic idea is stolen from the linux kernel, but the inner loop is optimized a bit more */
> +static void kw_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
> +{
> + struct nand_chip *chip = mtd->priv;
> +
> + while (len && (unsigned long)buf & 7)
> + {

Brace goes on the previous line.

> + *buf++ = readb(chip->IO_ADDR_R);
> + len--;
> + };
> +
> + asm volatile (
> + ".LFlashLoop:\n"
> + " subs\t%0, #8\n"
> + " ldrpld\tr2, [%2]\n" // Read 2 words
> + " strpld\tr2, [%1], #8\n" // Read 2 words
> + " bpl\t.LFlashLoop\n" // This results in one additional loop if len%8 <> 0
> + " addne\t%0, #8\n"
> + : "+&r" (len), "+&r" (buf)
> + : "r" (chip->IO_ADDR_R)
> + : "r2", "r3", "memory", "cc"
> + );

Use a real tab (or a space) rather than \t (which only helps
readability in the asm output, rather than the C source that people
actually look at).

Should probably use a numeric label to avoid any possibility of
conflict.

Would this make more sense as a more generic optimized memcpy_fromio()
or similar?
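
To make that question concrete, here is a rough portable-C sketch of the
same head/bulk/tail structure. The name fifo_read_buf() and its signature
are made up for illustration (this is not an existing U-Boot or Linux
interface), and it assumes the device returns fresh data on every access
to one fixed, 8-byte-aligned address and accepts both 8-bit and 64-bit
reads there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing U-Boot API): drain len bytes
 * from a fixed FIFO-style I/O address into buf.
 * Assumption: io is 8-byte aligned and the device tolerates both
 * 8-bit and 64-bit reads of that same address.
 */
static void fifo_read_buf(const volatile void *io, uint8_t *buf, size_t len)
{
	const volatile uint8_t *io8 = io;
	const volatile uint64_t *io64 = io;

	/* Head: byte reads until buf is 8-byte aligned. */
	while (len && ((uintptr_t)buf & 7)) {
		*buf++ = *io8;
		len--;
	}

	/* Bulk: one 64-bit read from the device per 8 bytes stored. */
	while (len >= 8) {
		uint64_t v = *io64;

		memcpy(buf, &v, sizeof(v));
		buf += 8;
		len -= 8;
	}

	/* Tail: the remaining 0..7 bytes, one at a time. */
	while (len) {
		*buf++ = *io8;
		len--;
	}
}
```

Whether the compiler turns the 64-bit access into a single ldrd is not
guaranteed, so an arch-specific version might still want the asm loop
for the bulk step.
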
-Scott