[U-Boot] SoCFPGA PL330 DMA driver and ECC scrubbing
Jason Rush
jarush at gmail.com
Tue Jul 10 12:10:48 UTC 2018
On 7/9/2018 3:08 AM, Marek Vasut wrote:
> On 07/07/2018 12:56 AM, Jason Rush wrote:
>> On 7/5/2018 6:10 PM, Marek Vasut wrote:
>>> On 07/06/2018 01:11 AM, Jason Rush wrote:
>>>> On 7/4/2018 2:23 AM, Marek Vasut wrote:
>>>>> On 07/04/2018 01:45 AM, Jason Rush wrote:
>>>>>> On 7/3/2018 9:08 AM, Marek Vasut wrote:
>>>>>>> On 07/03/2018 03:58 PM, Jason Rush wrote:
>>>>>>>> On 6/29/2018 10:17 AM, Marek Vasut wrote:
>>>>>>>>> On 06/29/2018 05:06 PM, Jason Rush wrote:
>>>>>>>>>> On 6/29/2018 9:52 AM, Marek Vasut wrote:
>>>>>>>>>>> On 06/29/2018 04:44 PM, Jason Rush wrote:
>>>>>>>>>>>> On 6/29/2018 9:34 AM, Marek Vasut wrote:
>>>>>>>>>>>>> On 06/29/2018 04:31 PM, Jason Rush wrote:
>>>>>>>>>>>>>> Dinh,
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> A while ago, you posted the following patchset for SoCFPGA to add the PL330
>>>>>>>>>>>>>> DMA driver, and updated the SoCFPGA SDRAM init to write zeros to SDRAM to
>>>>>>>>>>>>>> initialize the ECC bits if ECC was enabled:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2016-October/269643.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know it's been a long time, so I'll summarize some of the conversation...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the time, you had a problem with the patchset causing the SPL to fail to
>>>>>>>>>>>>>> find the MMC. You had tracked it down to an issue with the following commit
>>>>>>>>>>>>>> "a78cd8613204 ARM: Rework and correct barrier definitions". You and Marek
>>>>>>>>>>>>>> discussed it a bit, but I don't think there was a real conclusion. You
>>>>>>>>>>>>>> submitted a second version of the patchset asking for advice on debugging
>>>>>>>>>>>>>> the issue:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2016-December/275822.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No real conversation came from the second patchset, and that was the end of
>>>>>>>>>>>>>> the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was hoping we could revisit adding your patchset again. I am working on a
>>>>>>>>>>>>>> custom SoCFPGA board with a Cyclone V and ECC SDRAM. I rebased your patchset
>>>>>>>>>>>>>> against v2018.05 and it is working on my custom board (although I don't have
>>>>>>>>>>>>>> an MMC). I also tested it on a SoCKit booting from an MMC (I forced it to
>>>>>>>>>>>>>> scrub the SDRAM on the SoCKit, because it doesn't have ECC RAM), and the
>>>>>>>>>>>>>> SoCKit finds the MMC and boots.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't have any suggestions on why it is working now on my board and not
>>>>>>>>>>>>>> back when you first submitted the patchset. Maybe something else was fixed
>>>>>>>>>>>>>> in the MMC? I was hoping you and Marek could test this patch again on some
>>>>>>>>>>>>>> different SoCFPGA boards to see if you get the same results.
>>>>>>>>>>>>> Look at this patch
>>>>>>>>>>>>> http://git.denx.de/?p=u-boot/u-boot-socfpga.git;a=commit;h=9bb8a249b292d26f152c20e3641600b3d7b3924b
>>>>>>>>>>>>>
>>>>>>>>>>>>> You likely want a similar approach; it's faster than the DMA and much simpler.
>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Marek. I'll give it a try. Would you be interested in a similar patch for the Gen 5?
>>>>>>>>>>> I don't have any Gen5 board which uses ECC, do you ?
>>>>>>>>>>> If so, yes, prepare a patch, it should be very similar.
>>>>>>>>>>>
>>>>>>>>>>> Make sure to measure how long it takes to scrub the memory and how much
>>>>>>>>>>> memory you have, I'd be interested in the numbers.
>>>>>>>>>>>
>>>>>>>>>> Looking at the master branch, it doesn't look like that code is ever being called?
>>>>>>>>>> The sdram_init_ecc_bits() function is called from the ddr_calibration_sequence() function,
>>>>>>>>>> but I can't find where ddr_calibration_sequence() is called.
>>>>>>>>> git grep for it, it's called from somewhere in the arch/arm/mach-socfpga/
>>>>>>>>>
>>>>>>>>>> Either way, I can test it. I have a custom Cyclone V board with ECC, and the Intel Arria V SoC
>>>>>>>>>> Dev Kit I can test it on too which I think has ECC.
>>>>>>>>> Please do.
>>>>>>>>>
>>>>>>>> I implemented a similar memset approach for the gen 5 socfpga. It's basically the same
>>>>>>>> code as in that patch; however, when I performed a single memset the processor would
>>>>>>>> reset for some reason. I changed it to loop over the entire address range, calling
>>>>>>>> memset with a size of 32MB each time, and that worked, as opposed to doing a single
>>>>>>>> memset on the RAM.
>>>>>>> Can you do grep MEMSET .config in your U-Boot build dir ? The arch
>>>>>>> memset is implemented in assembler and doesn't trigger WDT , so if it
>>>>>>> takes too long, it could be that the WDT resets the platform.
>>>>>> Both CONFIG_USE_ARCH_MEMSET and CONFIG_SPL_USE_ARCH_MEMSET
>>>>>> are set in my .config, so it must be the WDT triggering as you suspect.
>>>>>>
>>>>>>>> I started on a SoCKit because it was handy, I know it doesn't have ECC
>>>>>>> It doesn't by default.
>>>>>>>
>>>>>>>> , but I forced it to
>>>>>>>> initialize the RAM as a quick test. It seems much slower than the DMA approach. It
>>>>>>>> should be noted, I didn't implement any code to time the scrubbing, but rather just
>>>>>>>> roughly monitored the time to get a rough idea of how long it took.
>>>>>>>>
>>>>>>>> On the SoCKit, which has 1GB of RAM, the memset takes around 8 seconds to complete,
>>>>>>>> and the DMA takes under 2 seconds.
>>>>>>> Did you enable i/d cache in the SPL ? It's mandatory, otherwise it's
>>>>>>> slow.
>>>>>> I have calls to icache_enable() and dcache_enable() just as you do in
>>>>>> the Arria 10 sdram_init_ecc_bits() function.
>>>>>>
>>>>>> I did double check that both these enable functions call the versions
>>>>>> of the functions in the ./arch/arm/lib/cache-cp15.c file that are
>>>>>> implemented in the SPL. So I believe that both icache and dcache are
>>>>>> enabled.
>>>>> Are you sure it's not just the stubs that are called ? Or that the code
>>>>> doesn't skip the dcache enabling due to some funny stuff, like MMU being
>>>>> already enabled ?
>>>> I added prints to ensure it is calling the real icache_enable()/dcache_enable()
>>>> functions, and not the stubs.
>>>>
>>>>>> I probably should have added a print of icache_status() and
>>>>>> dcache_status() to verify the caches are enabled. I'll add that
>>>>>> tomorrow.
>>>>> Yes, you really should verify that the dcache was enabled.
>>>>>
>>>>>>> Just be careful about the MMU tables placement, they are big and
>>>>>>> if you place them in RAM, make sure you don't overwrite them with the
>>>>>>> memset. The trick might be to memset the first 1 MiB of RAM, then put
>>>>>>> MMU tables at some offset therein (since 0x0 can be used for ARM
>>>>>>> vectors) and then turn on i/d cache and memset the rest.
>>>>>> That is essentially what I am doing I believe, with the exception that I
>>>>>> am only clearing the first 32KiB before initializing the MMU table (which
>>>>>> is what you did in the Arria 10 version).
>>>>>>
>>>>>> I modeled my code almost identically to yours with the exception that
>>>>>> I loop over the memset calls 32MiB at a time. Here's the order of
>>>>>> operations I perform:
>>>>>>
>>>>>> 1. icache_enable()
>>>>>> 2. memset the first 0x8000 bytes to zero
>>>>>> 3. set up gd->arch.tlb_addr and gd->arch.tlb_size
>>>>>> 4. dcache_enable()
>>>>>> 5. loop over remaining memory, memsetting 32MiB at a time to zero
>>>>>> 6. flush_dcache_all()
>>>>>> 7. dcache_disable()
>>>>>>
>>>>>> It looks like the call to dcache_enable is what sets up the MMU tables.
>>>>>> I suspect that's why you did a memset of the first 32KiB before enabling
>>>>>> the dcache on the Arria 10. I think the MMU is initialized okay since the
>>>>>> SPL keeps executing, u-boot loads, and Linux boots after running the
>>>>>> above (maybe that's not a fair assumption).
>>>>> I had to write zeroes to the first 32kiB to init the ECC counters before
>>>>> putting MMU tables there.
>>>>>
>>>>> You really should double check if the MMU and dcache are enabled, 8
>>>>> seconds to scrub the memory is too long I think.
>>>> I added checks to verify that the MMU, icache, and dcache are all set up and
>>>> enabled.
>>>>
>>>> Calling icache_enable() set the CR_I bit (Icache enable) in the CR (control
>>>> register). Then calling dcache_enable() called the mmu_setup() function,
>>>> which set up the MMU and set the CR_M bit (MMU enable) in the CR, and
>>>> finally dcache_enable() set the CR_C bit (Dcache enable) in the CR.
>>>>
>>>> I also printed out the control register before the memset calls, and it
>>>> indicated that the mmu, icache, and dcache were enabled.
>>> Is the DRAM area set as cacheable in the MMU tables ?
>>>
>> Good news bad news... The MMU tables weren't being set up because the
>> bd->bi_dram[bank].start and bd->bi_dram[bank].size weren't set up. As a quick
>> test, I hardcoded start to 0 and size to 1GiB. After that, the memset was
>> really quick, U-Boot loads, Linux loads, and everything seems to work great.
> Good.
>
>> However, if I press the HPS_RST push button on the SoCKit (which is connected
>> to power on reset), occasionally U-Boot will lock up while booting. It always
>> boots and operates correctly from the initial power on, but it almost always
>> fails to boot after pressing the HPS_RST button.
>>
>> Usually after pressing the HPS_RST button, U-Boot makes it past the SPL, and
>> hangs somewhere after the call to setup_reloc() in ./common/board_f.c. Once
>> it hangs there, pressing the HPS_RST button again usually causes the SPL to
>> hang while setting up the MMU (before my call to memset). Eventually the
>> WDT kicks in, and it just keeps hanging up in the same place. Once it gets in
>> this mode, the only way to recover it is by toggling power on the board.
>>
>> I spent a bunch of time today trying to track down where it was hanging, but
>> I couldn't pinpoint anything. The MMU tables looked correct. The MMU
>> registers looked good. I'm not sure of the best way to debug what's going on.
> Try triggering warm reset and cold reset via the reset register:
>
> mw 0xffd05004 1
> mw 0xffd05004 2
>
> Does it hang in one case and not in the other ?
>
It hangs in both cases.

I did find that if I do not memset the last 1MiB of DRAM with the cache on,
both warm and cold resets work.

I changed the ECC scrubbing to zero out the first 0x8000 bytes and the last
0x10000 bytes before the MMU is set up and I enable dcache. Then with
the dcache enabled, I zero out the rest of memory. The resets work in this
case as well. So there seems to be some side effect of clearing out the
relocate address space with the cache on.
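
For reference, here is a minimal sketch of the ordering described above,
written as host-runnable C rather than the actual SPL code. scrub_ram(), the
SCRUB_* macros and the kick callback are hypothetical names used only for
illustration. The 0x8000-byte head and 0x10000-byte tail are the sizes from
this message. On the target, the marked points are where dcache_enable() and
flush_dcache_all()/dcache_disable() would go, and the callback would service
the watchdog between chunks (an assumption based on the WDT resets seen with
a single large memset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRUB_HEAD_SIZE ((size_t)0x8000)   /* zeroed before the MMU tables are placed */
#define SCRUB_TAIL_SIZE ((size_t)0x10000)  /* top of RAM, zeroed with the dcache off */

/*
 * Zero [base, base + size) in three phases and return the number of
 * chunks written in the cached phase. "kick" is an optional hook called
 * after every chunk (on hardware it would service the watchdog).
 */
static size_t scrub_ram(uint8_t *base, size_t size, size_t chunk,
                        void (*kick)(void))
{
    size_t off = SCRUB_HEAD_SIZE;
    size_t end = size - SCRUB_TAIL_SIZE;
    size_t chunks = 0;

    assert(chunk > 0);
    assert(size >= SCRUB_HEAD_SIZE + SCRUB_TAIL_SIZE);

    /* Phase 1: head and tail, before the MMU/dcache are enabled. */
    memset(base, 0, SCRUB_HEAD_SIZE);
    memset(base + end, 0, SCRUB_TAIL_SIZE);

    /* On the target: set up the TLB address/size and call dcache_enable() here. */

    /* Phase 2: everything in between, one chunk at a time. */
    while (off < end) {
        size_t n = (end - off < chunk) ? end - off : chunk;

        memset(base + off, 0, n);
        if (kick)
            kick();
        off += n;
        chunks++;
    }

    /* On the target: flush_dcache_all() and dcache_disable() here. */

    return chunks;
}
```

On a board this would be called with the DRAM base, the DRAM size and a
32MiB chunk size. Splitting the range this way keeps the region that is
written with the cache on away from both the MMU tables and the top of RAM.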