[U-Boot] [PATCH 0/9] arm64: Unify MMU code

Stephen Warren swarren at wwwdotorg.org
Wed Feb 24 17:57:59 CET 2016


On 02/24/2016 03:19 AM, Alexander Graf wrote:
>
>
> On 22.02.16 21:15, york sun wrote:
>> On 02/22/2016 12:09 PM, Alexander Graf wrote:
>>>
>>>
>>> On 22.02.16 20:52, york sun wrote:
>>>> On 02/22/2016 11:42 AM, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 22.02.16 19:39, york sun wrote:
>>>>>> On 02/22/2016 10:31 AM, Alexander Graf wrote:
>>>>>>>
>>>>>>> On Feb 22, 2016, at 7:12 PM, york sun <york.sun at nxp.com> wrote:
>>>>>>>
>>>>>>>> On 02/22/2016 10:02 AM, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Am 22.02.2016 um 18:37 schrieb york sun <york.sun at nxp.com>:
>>>>>>>>>>
>>>>>>>>>>> On 02/21/2016 05:57 PM, Alexander Graf wrote:
>>>>>>>>>>> Howdy,
>>>>>>>>>>>
>>>>>>>>>>> Currently on arm64 there is a big pile of mess when it comes to MMU
>>>>>>>>>>> support and page tables. Each board does its own little thing and the
>>>>>>>>>>> generic code is pretty dumb and nobody actually uses it.
>>>>>>>>>>>
>>>>>>>>>>> This patch set tries to clean that up. After this series is applied,
>>>>>>>>>>> all boards except for the FSL Layerscape ones are converted to the
>>>>>>>>>>> new generic page table logic and have icache+dcache enabled.
>>>>>>>>>>>
>>>>>>>>>>> The new code always uses 4k page size. It dynamically allocates 1G or
>>>>>>>>>>> 2M pages for ranges that fit. When a dcache attribute request comes in
>>>>>>>>>>> that requires a smaller granularity than our previous allocation could
>>>>>>>>>>> fulfill, pages get automatically split.
>>>>>>>>>>>
>>>>>>>>>>> I have tested and verified the code works on HiKey (bare metal),
>>>>>>>>>>> vexpress64 (Foundation Model) and zynqmp (QEMU). The TX1 target is
>>>>>>>>>>> untested, but given the simplicity of the maps I doubt it'll break.
>>>>>>>>>>> ThunderX in theory should also work, but I haven't tested it. I would
>>>>>>>>>>> be very happy if people with access to those systems could give the patch
>>>>>>>>>>> set a try.
>>>>>>>>>>>
>>>>>>>>>>> With this we're a big step closer to a good base line for EFI payload
>>>>>>>>>>> support, since we can now just require that all boards always have dcache
>>>>>>>>>>> enabled.
>>>>>>>>>>>
>>>>>>>>>>> I would also be incredibly happy if some Freescale people could look
>>>>>>>>>>> at their MMU code and try to unify it into the now cleaned up generic
>>>>>>>>>>> code. I don't think we're far off here.
>>>>>>>>>>
>>>>>>>>>> Alex,
>>>>>>>>>>
>>>>>>>>>> Unified MMU will be great for all of us. The reason we started with our own MMU
>>>>>>>>>> table was size and performance. I don't know much about other ARMv8 SoCs. For
>>>>>>>>>> our use, we enable cache very early to speed up running, especially for
>>>>>>>>>> pre-silicon development on emulators. We don't have DDR to use for the early
>>>>>>>>>> stage and we have very limited on-chip SRAM. I believe we can use the unified
>>>>>>>>>> structure for our 2nd stage MMU when DDR is up.
>>>>>>>>>
>>>>>>>>> Yup, and I think it should be fairly doable to move the early generation into the same table format - maybe even fully reuse the generic code.
>>>>>>>>
>>>>>>>> What's the size for the MMU tables? I think it may be simpler to use static
>>>>>>>> tables for our early stage.
>>>>>>>
>>>>>>> The size is determined dynamically from the memory map using some code that (as Stephen found) is not 100% sound, but works well enough so far :).
>>>>>>
>>>>>> That's the part I can't live with. Since we have very limited on-chip RAM, we
>>>>>> have to limit the size. But again, I do see the benefit of using the unified
>>>>>> structure for the 2nd stage.
>>>>>
>>>>> I'm not quite sure I see how your current code works any differently.
>>>>> While the code to determine the page table pool size is dynamic, the
>>>>> outcome is static depending on your memory map. So the same memory map
>>>>> always means the same page table pool size.
>>>>>
>>>>> We could also just hard code the size for the early phase for you I guess.
>>>>
>>>> We can definitely try.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The thing that I tripped over while attempting conversion was that you don't always map phys==virt, unlike other boards, and I didn't fully understand why.
>>>>>>>>>
>>>>>>>> True. We have some complication on the address mapping. For compatibility, each
>>>>>>>> device is mapped (partially) under 32-bit space. If the device is too large to
>>>>>>>
>>>>>>> Compatibility with what? Do we really need this in an AArch64 world?
>>>>>>
>>>>>> It's not up to me. The SoC was designed this way. By the way, this SoC can work
>>>>>> in AArch32 mode.
>>>>>
>>>>> I think I'm slowly grasping what the problem is.
>>>>>
>>>>> The fact that the SoC can run in AArch32 mode doesn't actually make a
>>>>> difference here though, since we're talking about U-Boot internal memory
>>>>> maps. The only reason to keep things mapped reachable from 32bits is if
>>>>> you want to run 32bit code with the U-Boot maps. I don't think you'd
>>>>> want to do that, no? :)
>>>>
>>>> I don't really want to run 32-bit code. My point is the SoC was designed that
>>>> way. We have DDR under 32-bit space, and in high region. We have the same for
>>>> flash controller where NOR is connected. Explained later below.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> For 32bit code I can definitely understand why you'd want to have phys != virt. But in a pure 64bit world (which this target really is, no?) I see little benefit on it.
>>>>>>>
>>>>>>>> fit, the rest is mapped to high regions. I remember one particular case off the
>>>>>>>> top of my head. It is the NOR flash we use for environment variables. U-Boot uses
>>>>>>>> that address for saving, but also uses that for loading during booting. For our
>>>>>>>> case, the NOR flash doesn't fit well in the low region, so it is remapped to
>>>>>>>> high region after booting. To make the environment variables accessible during
>>>>>>>> boot, we mapped the high region phys with different virt, so U-Boot doesn't have
>>>>>>>> to know the low region address.
>>>>>>>
>>>>>>> I might be missing the obvious, but why can't the environment variables live in high regions?
>>>>>>>
>>>>>>
>>>>>> It is in high region. But as I tried to explain, the default physical mapping of
>>>>>> NOR flash (not MMU) is in low region out of reset.
>>>>>
>>>>> I see. So the problem is during the transitioning phase from uncached to
>>>>> MMU enabled, where we'd end up at a different address.
>>>>
>>>> Not exactly. We enable cache very early for a performance boost on the emulator.
>>>> It may sound trivial but it makes a big difference when debugging software on
>>>> emulators. Since we still use emulators for new products, I am not ready to drop
>>>> the early MMU approach.
>>>
>>> I'm surprised it is that slow for you. Running the Foundation model
>>> (which doesn't do early mmu FWIW) seemed to be fast enough.
>>
>> Foundation model is a simulator, not an emulator. Our emulator runs on hardware.
>> It is much, much slower than a simulator, but more accurate at the lower level.
>
> Ah, I remember the confusion in terminology from the PPC times :).
>
>>
>>>
>>>> But you get the idea, the difference is before and after relocation. After
>>>> u-boot relocates itself into DDR, we remap flash controller physical address to
>>>> high region.
>>>>
>>>>>
>>>>> Could we just configure NOR to be in high memory in early asm init code,
>>>>> then always use the high physical NOR address range and jump to it from
>>>>> asm very early on? Then we could ignore the 32bit map and everything
>>>>> could just stay 1:1 mapped.
>>>>>
>>>>
>>>> Out of reset, if booting from NOR flash, the flash controller is pre-configured
>>>> to use low region address. We can only reprogram the controller when u-boot is
>>>> not running on it.
>>>
>>> I see, so you keep the low map alive until you make the switch-over to
>>> DDR. Makes a lot of sense.
>>>
>>> I guess I can give the conversion another stab now whenever I get a free
>>> night :). If I understand you correctly we'd only need to do non-1:1
>>> maps for the early code, right?
>>
>> So far, yes. But we don't want to block ourselves from using non-1:1 mapping
>> down the road, do we?
>
> We're not blocking ourselves at all if we stick to the verbose struct
> definition. We can just add a va field later on and default to 1:1 if
> it's not set.

Well, that rather precludes a VA of 0 being valid. Still, we should be 
able to easily find all instances of the table, and simply edit them to 
set the VA field to the current PA value, rather than relying on 
comparing the VA field against 0.
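
As a rough sketch of that idea, assuming a region table along these lines
(the struct, macro, and function names are made up for illustration and are
not taken from the series, and the addresses are arbitrary), every entry
would spell out its VA, so 0 stays an ordinary value:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; field names are illustrative only. */
struct mem_region {
	uint64_t pa;	/* physical base address */
	uint64_t va;	/* virtual base address, always set explicitly */
	uint64_t size;	/* length in bytes */
};

/* 1:1 entry: VA is written out as equal to PA rather than left at 0. */
#define REGION_IDENTITY(base, len) \
	{ .pa = (base), .va = (base), .size = (len) }

/* Non-1:1 entry: any VA is allowed, including 0. */
#define REGION_REMAP(phys, virt, len) \
	{ .pa = (phys), .va = (virt), .size = (len) }

/* Example table with arbitrary, made-up addresses. */
static const struct mem_region example_map[] = {
	REGION_IDENTITY(0x80000000ULL, 0x40000000ULL),
	REGION_REMAP(0x580000000ULL, 0x0ULL, 0x10000000ULL),
};

/*
 * Translate a virtual address through the table.
 * Returns 0 and stores the result in *pa on success, -1 if va is unmapped.
 * No "va == 0 means 1:1" special case is needed.
 */
static int region_virt_to_phys(const struct mem_region *map, size_t count,
			       uint64_t va, uint64_t *pa)
{
	for (size_t i = 0; i < count; i++) {
		if (va >= map[i].va && va - map[i].va < map[i].size) {
			*pa = map[i].pa + (va - map[i].va);
			return 0;
		}
	}
	return -1;
}
```

With something like this, converting the existing tables would be a
mechanical edit: each current entry becomes an identity entry, and only the
boards that really need a non-1:1 mapping use the remap form.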

