[U-Boot] [PATCH 02/10] arm64: Make full va map code more dynamic

Alexander Graf agraf at suse.de
Mon Feb 29 22:05:53 CET 2016



On 29.02.16 17:52, Stephen Warren wrote:
> On 02/27/2016 05:09 AM, Alexander Graf wrote:
>>
>>
>> On 26.02.16 20:25, Stephen Warren wrote:
>>> On 02/25/2016 09:36 AM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 24.02.16 19:14, Stephen Warren wrote:
>>>>> On 02/24/2016 05:11 AM, Alexander Graf wrote:
>>>>>> The idea to generate our pages tables from an array of memory ranges
>>>>>> is very sound. However, instead of hard coding the code to create up
>>>>>> to 2 levels of 64k granule page tables, we really should just create
>>>>>> normal 4k page tables that allow us to set caching attributes on 2M
>>>>>> or 4k level later on.
>>>>>>
>>>>>> So this patch moves the full_va mapping code to 4k page size and
>>>>>> makes it fully flexible to dynamically create as many levels as
>>>>>> necessary for a map (including dynamic 1G/2M pages). It also adds
>>>>>> support to dynamically split a large map into smaller ones when
>>>>>> some code wants to set dcache attributes.
>>>>>>
>>>>>> With all this in place, there is very little reason to create your
>>>>>> own page tables in board specific files.
>>>>>>
>>>>>> Signed-off-by: Alexander Graf <agraf at suse.de>
>>>>>
>>>>>> +/*
>>>>>> + * This is a recursively called function to count the number of
>>>>>> + * page tables we need to cover a particular PTE range. If you
>>>>>> + * call this with level = -1 you basically get the full 48 bit
>>>>>> + * coverage.
>>>>>> + */
>>>>>> +static int count_required_pts(u64 addr, int level, u64 maxaddr)
>>>>>
>>>>> I think this looks correct now. Nits below if a respin is needed for
>>>>> other reasons.
>>>>>
>>>>>> +{
>>>>>> +    int levelshift = level2shift(level);
>>>>>> +    u64 levelsize = 1ULL << levelshift;
>>>>>> +    u64 levelmask = levelsize - 1;
>>>>>> +    u64 levelend = addr + levelsize;
>>>>>> +    int r = 0;
>>>>>> +    int i;
>>>>>> +    bool is_level = false;
>>>>>
>>>>> I might suggest renaming that is_level_needed. It's not obvious to me
>>>>> exactly what the name "is_level" is intended to represent; the name
>>>>> seems to represent whether something (unspecified) is a level or not.
>>>>
>>>> We're basically asking the function whether a PTE for address <addr> at
>>>> level <level> would be an inval/block/level PTE. is_level marks it as a
>>>> level pte.
>>>>
>>>> I could maybe rename this into pte_type and create an enum that is
>>>> either PTE_INVAL, PTE_BLOCK or PTE_LEVEL.
>>>>
>>>>>
>>>>>> +
>>>>>> +    for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
>>>>>> +        struct mm_region *map = &mem_map[i];
>>>>>> +        u64 start = map->base;
>>>>>> +        u64 end = start + map->size;
>>>>>> +
>>>>>> +        /* Check if the PTE would overlap with the map */
>>>>>> +        if (max(addr, start) <= min(levelend, end)) {
>>>>>> +            start = max(addr, start);
>>>>>> +            end = min(levelend, end);
>>>>>> +
>>>>>> +            /* We need a sub-pt for this level */
>>>>>> +            if ((start & levelmask) || (end & levelmask)) {
>>>>>> +                is_level = true;
>>>>>> +                break;
>>>>>> +            }
>>>>>> +
>>>>>> +            /* Lv0 can not do block PTEs, so do levels here too */
>>>>>> +            if (level <= 0) {
>>>>>> +                is_level = true;
>>>>>> +                break;
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Block PTEs at this level are already covered by the parent
>>>>>> page
>>>>>> +     * table, so we only need to count sub page tables.
>>>>>> +     */
>>>>>> +    if (is_level) {
>>>>>> +        int sublevel = level + 1;
>>>>>> +        u64 sublevelsize = 1ULL << level2shift(sublevel);
>>>>>> +
>>>>>> +        /* Account for the new sub page table ... */
>>>>>> +        r = 1;
>>>>>
>>>>> "Account for the page table at /this/ level"? This may represent the
>>>>> top-level (0/1) page table, not just sub page tables. The sub-level
>>>>> accounting is via the recursive call to count_required_pts() I
>>>>> believe.
>>>>
>>>> I think the easiest way to visualize what's going on is if we start
>>>> with
>>>> level -1.
>>>>
>>>> We basically ask the function at level -1 whether a PTE at level -1 (48
>>>> bits) would fit into a block PTE at level -1 (so the PTE spans all 48
>>>> bits) or whether we need to create a new level entry plus page table
>>>> for it.
>>>
>>> Hmm, I had overlooked that the call to count_required_pts() passed
>>> "start_level - 1" not "start_level". Now that I see that, your
>>> explanation makes more sense to me.
>>>
>>> It'd be nice if some/all of this explanation were comments in the code
>>> to aid readers.
>>>
>>> That said, I would have expected a slightly more direct calculation; why
>>> not:
>>>
>>> int start_level = 0; // or 1 if small VA space
>>> total_pts = count_required_pts(start_level);
>>
>> We need to account for the count twice, to have an (immutable) copy of
>> our page tables around when we split them.
> 
> I think that's just a hard-coded "* 2" there, unless I'm missing your
> point?

Yes :).

> 
>>> total_pts += 4; // "random" number to account for later splits
>>>
>>> int count_required_pts(int level, ...) {
>>
>> ... includes the address, as I have it today, I guess?
>>
>>>      int npts = 1; // the table at this level
>>>      for each pte slot at this level:
>>>          if a mem_map entry starts/stops within the pte slot:
>>>              npts += count_required_pts(level + 1, ...);
>>
>> This means we would count some levels multiple times. Imagine two
>> entries from
>>
>>     1G - 1.25G
>>     1.25G - 1.5G
>>
>> With your pseudo-code, we would count for both cases while checking for
>> 1G page table entries. Hence we need to jump out of the loop here and do
>> the counting outside.
> 
> I don't think so; "if a mem_map entry starts/stops within the pte slot"
> is a single decision for that PT entry at the current level, not a loop
> that recurses once per memory block. Hence, you'd only ever recurse
> once, so there's no double-counting.

Ah, "if a" means another loop over all mem_maps, eventually saying "yes,
I am overlapping".

Either way, I am not fully convinced the code would end up much more
readable than it is today, but I'll be happy to get persuaded otherwise :).

And yes, that's definitely something we can still work on down the road
in future follow-up patches.


Alex


More information about the U-Boot mailing list