[U-Boot-Users] MPC83xx data cache lock?

Liu Dave-r63238 DaveLiu at freescale.com
Fri May 26 07:33:45 CEST 2006


Hi Wolfgang,

Here is a patch that improves DMA performance by about 6x.
It makes the DMA controller use cache-line burst reads and burst
writes from/to DDR memory. With the patch applied I measure:

DMA, ECC on
ddr init duration: 1335 ms
 
DMA, ECC off
ddr init duration: 966 ms
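
As a quick check of the "about 6x" figure, the sketch below divides the DMA init durations quoted further down in this message (6945 ms with ECC on, 6558 ms with ECC off) by the patched durations above. The helper name speedup() is only illustrative and is not part of U-Boot.

```c
#include <stdio.h>

/* Ratio of the old duration to the new one; both in milliseconds. */
static double speedup(unsigned old_ms, unsigned new_ms)
{
    return (double)old_ms / (double)new_ms;
}

/* Print one result line, e.g. for the ECC on or ECC off case. */
static void print_speedup(const char *label, unsigned old_ms, unsigned new_ms)
{
    printf("%s: %u ms -> %u ms (%.1fx)\n",
           label, old_ms, new_ms, speedup(old_ms, new_ms));
}
```

Calling print_speedup("ECC on", 6945, 1335) and print_speedup("ECC off", 6558, 966) gives ratios a little above 5 and a little below 7, so "6x" is a fair round number for the two cases together.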

====================================
diff -r u-boot/cpu/mpc83xx/cpu.c u-boot-b/cpu/mpc83xx/cpu.c
260c260,262
<       dmamr0 = (DMA_CHANNEL_TRANSFER_MODE_DIRECT);
---
>       dmamr0 = (DMA_CHANNEL_TRANSFER_MODE_DIRECT |
>                       DMA_CHANNEL_SOURCE_ADDRESS_HOLD_8B |
>                       DMA_CHANNEL_SOURCE_ADRESSS_HOLD_EN);
diff -r u-boot/cpu/mpc83xx/cpu_init.c u-boot-b/cpu/mpc83xx/cpu_init.c
72,73d71
<       /* Set CSB bus pipeline depth */
<       im->arbiter.acr = 0x00030000;
diff -r u-boot/cpu/mpc83xx/spd_sdram.c u-boot-b/cpu/mpc83xx/spd_sdram.c
429c429
< #define CONFIG_DDR_ECC_INIT_VIA_DMA
---
> /* #define CONFIG_DDR_ECC_INIT_VIA_DMA */
======================================

Regards,
Dave


> -----Original Message-----
> From: u-boot-users-admin at lists.sourceforge.net 
> [mailto:u-boot-users-admin at lists.sourceforge.net] On Behalf 
> Of Liu Dave-r63238
> Sent: Wednesday, May 24, 2006 5:57 PM
> To: 'wd at denx.de'
> Cc: u-boot-users at lists.sourceforge.net
> Subject: RE: [U-Boot-Users] MPC83xx data cache lock? 
> 
> 
> 
> > -----Original Message-----
> > Just measure the time it takes to initialize ECC memory
> > either using the cache or DMA methods; here is a short
> > summary (don't complain - you asked for it!):
> > 
> > ----- quote begin -----
> > 
> > 1. Read vs. write performance
> > 
> > Writing to DDR memory is *much* slower than reading it.
> > 
> > ECC off
> > read  duration: 509 ms
> > write duration: 1546 ms
> > 
> > ECC on
> > read  duration: 509 ms
> > write duration: 5703 ms
> > 
> I ran a test; my read vs. write performance is:
> 
> ECC off
> read duration: 4124 ms
> write duration: 1516 ms
> 
> ECC on
> read duration: 4634 ms
> write duration: 5703 ms
> 
> Because all ways of the data cache are locked, the data cache
> behaves as if accesses were cache-inhibited. We access memory with
> two instructions: stw for 32-bit writes and lwz for 32-bit reads.
> 
> My write performance matches yours, but our read performance is
> very different.
> 
> How did you perform the read accesses?
> 
> If you only read from memory into a variable and never reference
> that variable afterwards, the compiler will optimize the load
> instruction away, unless you declare the variable volatile.
> 
> I suggest you check the assembler code to make sure the load
> instruction is in the loop and that there are no other memory
> access instructions in the loop.
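
A minimal sketch of a read loop that the compiler cannot drop, assuming a plain C test routine. The name read_all_words() is hypothetical, and this is not the code that was actually measured. The volatile qualifier on the pointer forces one load per iteration. It is still worth checking the disassembly for a single lwz in the loop body.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read every 32-bit word in [base, base + words) exactly once.
 * Because base points to volatile data, the compiler must emit a
 * load for each iteration even though the values are only summed.
 */
static uint32_t read_all_words(const volatile uint32_t *base, size_t words)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < words; i++)
        sum += base[i];

    return sum;
}
```

Returning the sum gives the caller something to use, so the loop also has a visible result to check in a test.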
> 
> With ECC enabled, the write duration is about 4x that with ECC
> off. I think each sub-double-word write causes a read-modify-write
> bus operation, which makes the write access take more time.
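
The read-modify-write effect can be illustrated with a deliberately simplified counting model. It assumes ECC is kept per 64-bit double word and that each narrower store costs one extra read. The function name and the model are assumptions for illustration, not a description of the controller's real timing, and the model does not by itself account for the full measured 4x.

```c
#include <stdint.h>

#define ECC_WORD_BYTES 8u   /* assumption: ECC covers one 64-bit double word */

/*
 * Count memory transactions needed to write total_bytes using stores
 * of access_bytes each. With ECC on, a store narrower than a double
 * word is modelled as a read followed by a write.
 */
static uint64_t write_transactions(uint64_t total_bytes,
                                   unsigned access_bytes, int ecc_on)
{
    uint64_t stores = total_bytes / access_bytes;

    if (ecc_on && access_bytes < ECC_WORD_BYTES)
        return stores * 2;   /* read-modify-write per store */

    return stores;           /* plain write per store */
}
```

In this model, 32-bit stores with ECC on need four transactions per double word, against one for a single 64-bit store.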
> 
> Why is the read time about three times the write time in my test?
> I will look into this.
> 
> > There's no clear indication in both DDR (8349) docs and
> > Micron specification of our module on if and how read vs. 
> > write operations differ in timing. There is one pointer for 
> > the ECC case, which suggests writes can take three stages 
> > (full read-modify-write cycle) instead of just one:
> > 
> > "9.5.4 SDRAM Interface Timing - If ECC is disabled, writes smaller
> > than double words are performed by appropriately activating the
> > data mask. If ECC is enabled, the controller performs a
> > read-modify write."
> > 
> > The problem is we see 3x difference when the ECC is off, and 10x
> > when on. We also did a series of tests with various chunk sizes of
> > data written, so as to be sure we do not do the indicated
> > sub-double word writes, but the results were the same.
> > 
> Are you sure you are not doing sub-double-word writes?
> 
> I also ran a 64-bit read/write access test over the full memory space.
> 
> It accesses memory with double-precision floating-point load/store
> instructions: lfd for 64-bit reads and stfd for 64-bit writes.
> 
> The code is in the attachment. The result is:
> 
> ECC off
> read duration: 2317 ms
> write duration: 774 ms
> 
> ECC on
> read duration: 2317 ms
> write duration: 774 ms
> 
> With ECC on we do double-word writes, so RMW cycles don't happen.
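
The attachment itself is not reproduced here. As a rough sketch of the idea only, 64-bit accesses can be written in C through a volatile double pointer. This assumes a hard-float PowerPC build with floating point enabled in the MSR, where the compiler is expected to emit stfd and lfd for these accesses (check the disassembly to confirm). The function names are hypothetical.

```c
#include <stddef.h>

/* Fill count double words with value, one 64-bit store each. */
static void fill_dwords(volatile double *base, size_t count, double value)
{
    size_t i;

    for (i = 0; i < count; i++)
        base[i] = value;
}

/*
 * Read count double words, one 64-bit load each, and return how
 * many of them compare equal to expect.
 */
static size_t read_dwords(const volatile double *base, size_t count,
                          double expect)
{
    size_t i, matches = 0;

    for (i = 0; i < count; i++)
        if (base[i] == expect)
            matches++;

    return matches;
}
```

Timing fill_dwords() and read_dwords() separately over the same region would give write and read durations comparable to the figures above.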
> 
> 
> > This is really strange, although at least read operations are not
> > affected by enabling ECC (which is according to the book - there
> > should be minimal overhead put on read operations while ECC on,
> > see 3. below).
> > 
> > 2. DMA (low) performance
> > 
> > Using DMA for transfers proves very inefficient. As mentioned
> > earlier, the DMA module in 8349 is different than seen in other
> > families, and it occurred to us a bit "alien" when compared with
> > the rest of the chip (DMA documentation part is rather limited,
> > and different in style etc.), as if taken from elsewhere. It is
> > also peculiar in technical aspects: endianness used is different,
> > so we need to convert the order explicitly in s/w.
> > 
> > We tried increasing the local bus clocking but to no avail.
> > 
> The local bus clock does not affect CSB or DDR performance.
> 
> > Given that low performance it doesn't make much difference
> > whether ECC is enabled or not:
> > 
> > DMA, ECC on
> > ddr init duration: 6947 ms
> > 
> > DMA, ECC off
> > ddr init duration: 6721 ms
> >
>  
> My test data is:
> 
> DMA, ECC on
> ddr init duration: 6945 ms
> 
> DMA, ECC off
> ddr init duration: 6558 ms
> 
> Only slightly different from yours.
> 
> > There seems something broken with the DMA operations in general
> > as they are way slower than just plain read/write to memory, which
> > is somehow confirmed by your recent communication from the
> > customer.
> >
> When all of memory is initialized with the DMA method, as in the
> U-Boot code, the DMA controller reads from memory and writes to
> memory in a loop.
> 
> This generates a lot of read accesses to memory and consumes more time.
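
To make the extra read traffic concrete, here is an illustrative host-side model, not the U-Boot code. A first chunk is seeded by the CPU and then copied chunk by chunk over the rest of the region, with memcpy() standing in for a DMA transfer. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct traffic {
    size_t bytes_read;      /* bytes fetched from memory by the copies */
    size_t bytes_written;   /* bytes stored to memory in total */
};

/*
 * Initialize mem[0..size) by seeding one chunk with pattern and then
 * copying that chunk to every following chunk. Each copy reads one
 * chunk and writes one chunk, so nearly every byte written also
 * costs a byte read.
 */
static struct traffic init_by_copy(unsigned char *mem, size_t size,
                                   size_t chunk, unsigned char pattern)
{
    struct traffic t = { 0, 0 };
    size_t off;

    memset(mem, pattern, chunk);          /* CPU seeds the first chunk */
    t.bytes_written += chunk;

    for (off = chunk; off + chunk <= size; off += chunk) {
        memcpy(mem + off, mem, chunk);    /* stand-in for one DMA transfer */
        t.bytes_read += chunk;
        t.bytes_written += chunk;
    }

    return t;
}
```

For a 64-byte region and 16-byte chunks the model counts 64 bytes written and 48 bytes read, which is the read overhead described above.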
>  
> > 
> > 3. ECC penalty
> > 
> > As can be seen in results given in 1. enabling ECC puts a huge
> > burden on write access, which is contrary to 8349 UM:
> > 
> > p. 9-27 (above figure 9-24) "When ECC is enabled, one clock cycle
> > is added to the read path to check ECC and correct single-bit
> > errors. ECC generation does not add a cycle to the write path."
> > 
> > ----- quote end -----
> > 
> > 
> > Can you explain why writing to ECC memory is 10 times
> > slower than reading?
> > 
> Could you tell me how you measured the read time? Thanks.
> 
> 
> Regards,
> Dave 
> 
> 



