[U-Boot-Users] MPC83xx data cache lock?
Liu Dave-r63238
DaveLiu at freescale.com
Fri May 26 07:33:45 CEST 2006
Hi Wolfgang,
Here is a patch that improves DMA performance 6x.
It makes the DMA engine use cache-line burst reads and burst writes
from/to DDR memory.
DMA, ECC on
ddr init duration: 1335 ms
DMA, ECC off
ddr init duration: 966 ms
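
As a rough check of the 6x figure, the timings above can be compared with the earlier DMA timings quoted further down (6945 ms with ECC on, 6558 ms with ECC off). A minimal sketch of that arithmetic; the helper name speedup_x100 is hypothetical and only used for illustration:

```c
#include <assert.h>

/* Hypothetical helper: speedup of new_ms over old_ms, scaled by 100
 * so that integer arithmetic is enough (520 means 5.20x). */
static unsigned int speedup_x100(unsigned int old_ms, unsigned int new_ms)
{
	assert(new_ms != 0);
	return (old_ms * 100u) / new_ms;
}
```

speedup_x100(6945, 1335) gives 520 (ECC on) and speedup_x100(6558, 966) gives 678 (ECC off), i.e. roughly 5.2x and 6.8x.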
====================================
diff -r u-boot/cpu/mpc83xx/cpu.c u-boot-b/cpu/mpc83xx/cpu.c
260c260,262
< dmamr0 = (DMA_CHANNEL_TRANSFER_MODE_DIRECT);
---
> dmamr0 = (DMA_CHANNEL_TRANSFER_MODE_DIRECT |
> DMA_CHANNEL_SOURCE_ADDRESS_HOLD_8B |
> DMA_CHANNEL_SOURCE_ADRESSS_HOLD_EN);
diff -r u-boot/cpu/mpc83xx/cpu_init.c u-boot-b/cpu/mpc83xx/cpu_init.c
72,73d71
< /* Set CSB bus pipeline depth */
< im->arbiter.acr = 0x00030000;
diff -r u-boot/cpu/mpc83xx/spd_sdram.c u-boot-b/cpu/mpc83xx/spd_sdram.c
429c429
< #define CONFIG_DDR_ECC_INIT_VIA_DMA
---
> /* #define CONFIG_DDR_ECC_INIT_VIA_DMA */
======================================
Regards,
Dave
> -----Original Message-----
> From: u-boot-users-admin at lists.sourceforge.net
> [mailto:u-boot-users-admin at lists.sourceforge.net] On Behalf
> Of Liu Dave-r63238
> Sent: Wednesday, May 24, 2006 5:57 PM
> To: 'wd at denx.de'
> Cc: u-boot-users at lists.sourceforge.net
> Subject: RE: [U-Boot-Users] MPC83xx data cache lock?
>
>
>
> > -----Original Message-----
> > Just measure the time it takes to initialize ECC memory
> > either using the cache or DMA methods; here is a short
> > summary (don't complain - you asked for it!):
> >
> > ----- quote begin -----
> >
> > 1. Read vs. write performance
> >
> > Writing to DDR memory is *much* slower than reading it.
> >
> > ECC off
> > read duration: 509 ms
> > write duration: 1546 ms
> >
> > ECC on
> > read duration: 509 ms
> > write duration: 5703 ms
> >
> I ran a test; my read vs. write performance is:
>
> ECC off
> read duration: 4124 ms
> write duration: 1516 ms
>
> ECC on
> read duration: 4634 ms
> write duration: 5703 ms
>
> Because all ways of the data cache are locked, the data cache
> behaves as if it were cache-inhibited. We access memory with
> two instructions: stw for 32-bit writes and lwz for 32-bit reads.
>
> My write performance is the same as yours, but our read
> performance is very different.
>
> How did you perform the read accesses to memory?
>
> If you only read from memory into a variable and never
> reference that variable later, the compiler will optimize the
> load instruction away, unless you declare the variable
> volatile.
>
> I suggest you check the assembler code to make sure the load
> instruction is in the loop and that there are no other memory
> access instructions in the loop.
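
A minimal sketch of a read loop that the compiler cannot drop, assuming the test region is passed in as a plain pointer. The function name read_all_words is hypothetical, and this is not the code used for the measurements in this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read every 32-bit word in [base, base + words). The volatile
 * qualifier forces the compiler to emit one load per word even
 * though the value is only accumulated, not otherwise used. */
static uint32_t read_all_words(const volatile uint32_t *base, size_t words)
{
	uint32_t sum = 0;
	size_t i;

	assert(base != NULL);
	for (i = 0; i < words; i++)
		sum += base[i];	/* expected to become one lwz on PowerPC */
	return sum;
}
```

Even with volatile, the generated assembler is still worth checking, since the loop counter and the sum could be spilled to the stack and add extra memory accesses.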
>
> With ECC enabled, the write duration is about 4x what it is
> with ECC off. I think sub-double-word writes cause
> read-modify-write bus operations, which make each write
> access take more time.
>
> Why is the read time about triple the write time in my test?
> I will look into this.
>
> > There's no clear indication in either the DDR (8349) docs or the
> > Micron specification of our module on if and how read vs. write
> > operations differ in timing. There is one pointer for the ECC
> > case, which suggests writes can take three stages (full
> > read-modify-write cycle) instead of just one:
> >
> > "9.5.4 SDRAM Interface Timing - If ECC is disabled, writes smaller
> > than double words are performed by appropriately activating the
> > data mask. If ECC is enabled, the controller performs a
> > read-modify write."
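
The rule in the quoted manual text can be written down as a very simplified model. This sketch only restates the quote; it is not a model of the real controller, and the function name is hypothetical:

```c
#include <assert.h>

/* Simplified model of the quoted manual text, not of the real
 * controller: number of DRAM accesses needed for one write of
 * `bytes` bytes (1..8) within one 64-bit double word. */
static unsigned int dram_accesses_per_write(unsigned int bytes, int ecc_on)
{
	assert(bytes >= 1 && bytes <= 8);
	if (ecc_on && bytes < 8)
		return 2;	/* read-modify-write: one read plus one write */
	return 1;		/* full double word, or data mask with ECC off */
}
```

Under this model a 32-bit stw with ECC on costs two accesses, while a full 64-bit store costs one regardless of the ECC setting.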
> >
> > The problem is we see 3x difference when the ECC is off, and 10x
> > when on. We also did a series of tests with various chunk sizes of
> > data written, so as to be sure we do not do the indicated
> > sub-double word writes, but the results were the same.
> >
> Are you sure you are not doing sub-double-word writes?
>
> I also ran a 64-bit read/write access test over the full memory space.
>
> It accesses memory with double-precision floating-point load/store
> instructions: lfd for 64-bit reads and stfd for 64-bit writes.
>
> See the attachment for the code. The result is:
>
> ECC off
> read duration: 2317 ms
> write duration: 774 ms
>
> ECC on
> read duration: 2317 ms
> write duration: 774 ms
>
> With ECC on, we do double-word writes, so RMW cycles don't
> happen.
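
A minimal sketch of a 64-bit fill loop in C. This is not the attached test code; the function name fill_dwords is hypothetical, and it assumes a build with hardware floating point so that the store becomes stfd:

```c
#include <assert.h>
#include <stddef.h>

/* Fill `count` double words with `pattern` using 64-bit stores.
 * On a 32-bit PowerPC with an FPU, a store through a volatile
 * double pointer is expected to compile to stfd (assumption:
 * hardware floating point is enabled in the build). */
static void fill_dwords(volatile double *base, size_t count, double pattern)
{
	size_t i;

	assert(base != NULL);
	for (i = 0; i < count; i++)
		base[i] = pattern;
}
```

The base address should be 8-byte aligned so that each store covers exactly one double word.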
>
>
> > This is really strange, although at least read operations are not
> > affected by enabling ECC (which is according to the book - there
> > should be minimal overhead put on read operations while ECC on,
> > see 3. below).
> >
> > 2. DMA (low) performance
> >
> > Using DMA for transfers proves very inefficient. As mentioned
> > earlier, the DMA module in 8349 is different than seen in other
> > families, and it seemed a bit "alien" to us when compared with the
> > rest of the chip (DMA documentation part is rather limited, and
> > different in style etc.), as if taken from elsewhere. It is also
> > peculiar in technical aspects: endianness used is different, so we
> > need to convert the order explicitly in s/w.
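
The explicit byte-order conversion mentioned above can be sketched as follows. Both function names are hypothetical, and the sketch assumes the DMA registers use the opposite byte order from the CPU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/* Hypothetical helper: store a CPU-order value into a register
 * that is assumed to use the opposite byte order. */
static void dma_reg_write(volatile uint32_t *reg, uint32_t cpu_value)
{
	assert(reg != NULL);
	*reg = swap32(cpu_value);
}
```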
> >
> > We tried increasing the local bus clocking but to no avail.
> >
> The local bus clock does not affect CSB or DDR performance.
>
> > Given that low performance it doesn't make much difference
> > whether ECC is enabled or not:
> >
> > DMA, ECC on
> > ddr init duration: 6947 ms
> >
> > DMA, ECC off
> > ddr init duration: 6721 ms
> >
>
> My test data is:
>
> DMA, ECC on
> ddr init duration: 6945 ms
>
> DMA, ECC off
> ddr init duration: 6558 ms
>
> Only a small difference from yours.
>
> > There seems something broken with the DMA operations in general as
> > they are way slower than just plain read/write to memory, which is
> > somehow confirmed by your recent communication from the customer.
> >
> When all of memory is initialized with the DMA method, as in the
> U-Boot code, the DMA controller reads from memory and then writes
> to memory, in a loop.
>
> This gives rise to a lot of read accesses from memory, which
> consume more time.
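
To illustrate why a copy-based init generates so many reads, here is a host-side sketch. Everything in it is an assumption: memcpy stands in for a DMA transfer, and the doubling scheme and both function names are made up for illustration rather than taken from the U-Boot source:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for one DMA transfer (assumption: plain memcpy here).
 * Every byte moved costs one memory read and one memory write. */
static void fake_dma_copy(unsigned char *dst, const unsigned char *src,
			  size_t len, size_t *bytes_read)
{
	memcpy(dst, src, len);
	*bytes_read += len;
}

/* Initialise `size` bytes: the first `seed` bytes are assumed to be
 * already written by the CPU, the rest is filled by copying the
 * initialised part forward, doubling the covered area each time.
 * Returns the number of bytes the copy engine had to read. */
static size_t init_by_copy(unsigned char *mem, size_t size, size_t seed)
{
	size_t done = seed, bytes_read = 0;

	assert(seed > 0 && seed <= size);
	while (done < size) {
		size_t len = (size - done < done) ? size - done : done;

		fake_dma_copy(mem + done, mem, len, &bytes_read);
		done += len;
	}
	return bytes_read;
}
```

In this model almost every initialised byte is preceded by a read of the same size, whereas a pure CPU store loop would not need to read memory at all.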
>
> >
> > 3. ECC penalty
> >
> > As can be seen in results given in 1. enabling ECC puts a huge
> > burden on write access, which is contrary to 8349 UM:
> >
> > p. 9-27 (above figure 9-24) "When ECC is enabled, one clock cycle is
> > added to the read path to check ECC and correct single-bit errors.
> > ECC generation does not add a cycle to the write path."
> >
> > ----- quote end -----
> >
> >
> > Can you explain why writing to ECC memory is 10 times
> > slower than reading?
> >
> I hope you can tell me how you measured the read time. Thanks.
>
>
> Regards,
> Dave
>
>