[U-Boot] [PATCH v2 3/4] mtd: nand: add Freescale NFC driver

Bill Pringlemeir bpringlemeir at nbsps.com
Mon Aug 18 18:38:43 CEST 2014


On 18 Aug 2014, stefan at agner.ch wrote:

> Am 2014-08-14 23:12, schrieb Bill Pringlemeir:
>>> On 14 Aug 2014, stefan at agner.ch wrote:
>>>
>>> This adds initial support for Freescale NFC (NAND Flash Controller)
>>> found in ARM Vybrid SoC's, Power Architecture MPC5125 and others.
>>> However, this driver is only tested on Vybrid.
>>
>> This is only to expand on the nand controller register and SRAM use.
>>
>> [snip]
>>
>>> diff --git a/drivers/mtd/nand/vf610_nfc.c
>>> b/drivers/mtd/nand/vf610_nfc.c new file mode 100644 index
>>> 0000000..3150ac1 --- /dev/null +++ b/drivers/mtd/nand/vf610_nfc.c
>>
>> [snip]
>>
>>> +static inline u32 vf610_nfc_read(struct mtd_info *mtd, uint reg) +{

>> Ok, we always use readl/writel.  This is fine, but a little slower
>> and bigger.  I may try a register cache if I resubmit to the Linux
>> MTD as per Scott's suggestion.  Especially, this version is good for
>> an incremental patch.

> I measured the difference and get 1MB/s
> Full pages, readl/writel:
> NAND read: device 0 offset 0x200000, size 0x800000
> 8388608 bytes read in 772 ms (10.4 MiB/s): OK

> Full pages, __raw_readl/__raw_writel
> NAND read: device 0 offset 0x200000, size 0x800000
> 8388608 bytes read in 696 ms (11.5 MiB/s): OK

> Ok, this is actually quite a lot. Especially since I already optimized
> the C code (by not using the helper functions like nfc_set/nfc_clear
> in vf610_nfc_send_command), one would think there is now almost no
> optimization potential. I looked into the disassembled code and could
> narrow down the issue. Due to the memory barriers, all offsets were
> calculated on each register access (nfc base to reg base, and add reg
> offset), multiple instances of:

>  20:	e59cc120 	ldr	ip, [ip, #288]	; 0x120
>  24:	e59cc134 	ldr	ip, [ip, #308]	; 0x134

> I optimized the code again and calculate the offsets manually and
> access __raw_readl/__raw_writel rather then vf610_nfc_read/write in
> the vf610_nfc_send_command(s) function, I get the full speed again:

> NAND read: device 0 offset 0x200000, size 0x800000
> 8388608 bytes read in 687 ms (11.6 MiB/s): OK

I think what you have is fine.  The 10BM/s versus 11MB/s is not
insignificant, but it is not a huge difference.  I expect that the
driver is already better than the others; especially the Imx-25 was only
7.7MB/s read and 4.6Mb/s write from Linux mtd tests.  Although the end
user might like to wait 10% less for an image to load.

Also, probably a lot of people care about code size.  This is not so
much a case for U-Boot as it is usually machine specific and doesn't
support several SOCs afaik.

However, the more specific you make the optimization to the platform,
the less likely it is to extend well.  We also wish to have this work
well with different gcc versions and CPUs (PowerPC, etc).  The
'readl/writel' handicap the compiler.  Although they are more likely to
work with a wide variety of buses.

[snip]

> On Wed, 2014-08-13 at 22:32, Scott Wood wrote:
>> raw_writel() is itself something that should only be used for
>> hand-optimized sections.  For non-performance-critical code you
>> should use normal writel() so that you don't need to worry about
>> manually adding I/O barriers.
>
> The reason I choosed readl/writel instead of the raw variants is to
> preserve align with other drivers...

>> care.  Maybe a comment is fine?  It seems the Vybrid is safe for
>> different access sizes, but it is possible that some other CPU might
>> not be able to access this memory via 32/16/8 bit accesses and
>> 'memcpy()' may not be appropriate.  It seems that 'natural' size of
>> the NFC controller itself is 32bits and the CPU interface does lane
>> masking.  Ie, boot mode documentation talks about remapping
>> 'sram_physical_addr[13:3] = {cpu_addr[11:3],cpu_addr[13:12]}' saying
>> that bits 2,1 are not used (hopefully one based numbers).  This is
>> just my guess...

> What assumptions do you make how memcpy accesses memory? This latest
> patch now uses the optimized versions from the kernel... Maybe they
> even try to access 64-bit width (the NIC interconnect supports 64-bit
> access)

The memcpy() itself could use anything. 64bits is possible on AXI/NIC.
The 'PBRIDGE' is 64bit, but I think the AIPS/IPS (apparently AIPS means
'AHB-lite to IPS) are 32bit.  At least that is the case on the Imx25
which has a different AIPS version.  I assumed the 'memcpy()' was using
32bits but this certainly isn't explicit in the code.

The majority of the register banks are non-volatile with this
controller.  Instead of running multiple NAND programming sequences, the
controller runs them all for us.  Most registers are are mainly like
SRAM.

My only point is that the SRAM buffers use the same interface as the
main Nand controller register banks.  So using 'readl/writel' for the
register, but not the SRAM buffers seems inconsistent.

So to address this inconsistency, I was thinking that we should at least
have a comment?

 /* BUS: This assumes the BUS is 32 bit accessible.  If you are porting
    to other systems, this may not be the case.
  */
 memcpy(nfc->regs + NFC_MAIN_AREA(0) + c, buf, l);

Or we could implement our own version of memcpy that did 32bit aligned
transfers with a similar comment.  In theory, we need a barrier after
the memcpy(), in case anyone modified the code to touch the
NFC_FLASH_CMD2's START_BIT directly after the memcpy() or custom
function.  But all the paranoia adds some code and has potential to slow
things down.

Doing a barrier after every single byte read as the ARM Linux's
memcpy_fromio() does will surely make significant performance
differences.  Instead of being double the Imx25, it was half.  A user
waiting 400% longer won't be too happy.

Fwiw,
Bill Pringlemeir.


More information about the U-Boot mailing list