[U-Boot] [Patch v4 2/5] ARMv8: Adjust MMU setup
Mark Rutland
mark.rutland at arm.com
Thu Jun 5 12:09:26 CEST 2014
On Wed, Jun 04, 2014 at 05:27:30PM +0100, York Sun wrote:
> On 06/02/2014 11:01 AM, Mark Rutland wrote:
> > On Mon, Jun 02, 2014 at 05:06:13PM +0100, York Sun wrote:
> >> On 06/02/2014 04:34 AM, Mark Rutland wrote:
> >>> On Thu, May 29, 2014 at 09:49:05PM +0100, York Sun wrote:
> >>>> Make MMU functions reusable. Platform code can setup its own MMU tables.
> >>>
> >>> What exactly does platform code need to setup its own tables for?
> >>
> >> The general ARMv8 MMU table is not detail enough to control memory attribute
> >> like cache for all addresses. We have devices mapping to addresses with
> >> different requirement for cache control.
> >
> > And there are no APIs for creating device mappings rather than exporting
> > the raw pagetable accessors and hard-coding them differently in every
> > board file?
> >
>
> That's a good question. At this point, only two platforms are using ARMv8 code.
> I am expecting FSL ARMv8 implementation will stay similar, i.e. covered by the
> file I added. If that's not the case, or more ARMv8 SoCs need special MMU table,
> we then should introduce such API. Having a full function MMU API may be an
> overkill for U-boot. We don't need dynamic MMU anyway.
Maybe. It just seems to me that it would be possible to pre-allocate an
empty table that we could place device (nGnRnE?) mappings in. Then all
you'd need to call from board code is a function to map a range, rather
than having to duplicate logic for creating the tables you want.
>
> >>
> >>>
> >>>> Also fix a typo of "TCR_EL3_IPS_BITS" in cache_v8.c.
> >>>>
> >>>> Signed-off-by: York Sun <yorksun at freescale.com>
> >>>> CC: David Feng <fenghua at phytium.com.cn>
> >>>> ---
> >>>> Change log:
> >>>> v4: new patch, splitted from v3 2/4
> >>>> Revise set_pgtable_section() to be reused by platform MMU code
> >>>> Add inline function set_ttbr_tcr_mair() to be used by this and platform mmu code
> >>>>
> >>>> arch/arm/cpu/armv8/cache_v8.c | 49 ++++++++++++++++----------------------
> >>>> arch/arm/include/asm/armv8/mmu.h | 23 ++++++++++++++++++
> >>>> 2 files changed, 43 insertions(+), 29 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> >>>> index a96ecda..67dbd46 100644
> >>>> --- a/arch/arm/cpu/armv8/cache_v8.c
> >>>> +++ b/arch/arm/cpu/armv8/cache_v8.c
> >>>> @@ -12,15 +12,14 @@
> >>>> DECLARE_GLOBAL_DATA_PTR;
> >>>>
> >>>> #ifndef CONFIG_SYS_DCACHE_OFF
> >>>> -
> >>>> -static void set_pgtable_section(u64 section, u64 memory_type)
> >>>> +void set_pgtable_section(u64 *page_table, u64 index, u64 section,
> >>>> + u64 memory_type)
> >>>> {
> >>>> - u64 *page_table = (u64 *)gd->arch.tlb_addr;
> >>>> u64 value;
> >>>>
> >>>> - value = (section << SECTION_SHIFT) | PMD_TYPE_SECT | PMD_SECT_AF;
> >>>> + value = section | PMD_TYPE_SECT | PMD_SECT_AF;
> >>>> value |= PMD_ATTRINDX(memory_type);
> >>>> - page_table[section] = value;
> >>>> + page_table[index] = value;
> >>>> }
> >>>>
> >>>> /* to activate the MMU we need to set up virtual memory */
> >>>> @@ -28,10 +27,13 @@ static void mmu_setup(void)
> >>>> {
> >>>> int i, j, el;
> >>>> bd_t *bd = gd->bd;
> >>>> + u64 *page_table = (u64 *)gd->arch.tlb_addr;
> >>>>
> >>>> /* Setup an identity-mapping for all spaces */
> >>>> - for (i = 0; i < (PGTABLE_SIZE >> 3); i++)
> >>>> - set_pgtable_section(i, MT_DEVICE_NGNRNE);
> >>>> + for (i = 0; i < (PGTABLE_SIZE >> 3); i++) {
> >>>> + set_pgtable_section(page_table, i, i << SECTION_SHIFT,
> >>>> + MT_DEVICE_NGNRNE);
> >>>> + }
> >>>>
> >>>> /* Setup an identity-mapping for all RAM space */
> >>>> for (i = 0; i < CONFIG_NR_DRAM_BANKS; i++) {
> >>>> @@ -39,36 +41,25 @@ static void mmu_setup(void)
> >>>> ulong end = bd->bi_dram[i].start + bd->bi_dram[i].size;
> >>>> for (j = start >> SECTION_SHIFT;
> >>>> j < end >> SECTION_SHIFT; j++) {
> >>>> - set_pgtable_section(j, MT_NORMAL);
> >>>> + set_pgtable_section(page_table, j, j << SECTION_SHIFT,
> >>>> + MT_NORMAL);
> >>>> }
> >>>> }
> >>>>
> >>>> /* load TTBR0 */
> >>>> el = current_el();
> >>>> if (el == 1) {
> >>>> - asm volatile("msr ttbr0_el1, %0"
> >>>> - : : "r" (gd->arch.tlb_addr) : "memory");
> >>>> - asm volatile("msr tcr_el1, %0"
> >>>> - : : "r" (TCR_FLAGS | TCR_EL1_IPS_BITS)
> >>>> - : "memory");
> >>>> - asm volatile("msr mair_el1, %0"
> >>>> - : : "r" (MEMORY_ATTRIBUTES) : "memory");
> >>>> + set_ttbr_tcr_mair(el, gd->arch.tlb_addr,
> >>>> + TCR_FLAGS | TCR_EL1_IPS_BITS,
> >>>> + MEMORY_ATTRIBUTES);
> >>>> } else if (el == 2) {
> >>>> - asm volatile("msr ttbr0_el2, %0"
> >>>> - : : "r" (gd->arch.tlb_addr) : "memory");
> >>>> - asm volatile("msr tcr_el2, %0"
> >>>> - : : "r" (TCR_FLAGS | TCR_EL2_IPS_BITS)
> >>>> - : "memory");
> >>>> - asm volatile("msr mair_el2, %0"
> >>>> - : : "r" (MEMORY_ATTRIBUTES) : "memory");
> >>>> + set_ttbr_tcr_mair(el, gd->arch.tlb_addr,
> >>>> + TCR_FLAGS | TCR_EL2_IPS_BITS,
> >>>> + MEMORY_ATTRIBUTES);
> >>>> } else {
> >>>> - asm volatile("msr ttbr0_el3, %0"
> >>>> - : : "r" (gd->arch.tlb_addr) : "memory");
> >>>> - asm volatile("msr tcr_el3, %0"
> >>>> - : : "r" (TCR_FLAGS | TCR_EL2_IPS_BITS)
> >>>> - : "memory");
> >>>> - asm volatile("msr mair_el3, %0"
> >>>> - : : "r" (MEMORY_ATTRIBUTES) : "memory");
> >>>> + set_ttbr_tcr_mair(el, gd->arch.tlb_addr,
> >>>> + TCR_FLAGS | TCR_EL3_IPS_BITS,
> >>>> + MEMORY_ATTRIBUTES);
> >>>> }
> >>>>
> >>>> /* enable the mmu */
> >>>> diff --git a/arch/arm/include/asm/armv8/mmu.h b/arch/arm/include/asm/armv8/mmu.h
> >>>> index 1193e76..7de4ff9 100644
> >>>> --- a/arch/arm/include/asm/armv8/mmu.h
> >>>> +++ b/arch/arm/include/asm/armv8/mmu.h
> >>>> @@ -108,4 +108,27 @@
> >>>> TCR_IRGN_WBWA | \
> >>>> TCR_T0SZ(VA_BITS))
> >>>>
> >>>> +#ifndef __ASSEMBLY__
> >>>> +void set_pgtable_section(u64 *page_table, u64 index,
> >>>> + u64 section, u64 memory_type);
> >>>> +static inline void set_ttbr_tcr_mair(int el, u64 table, u64 tcr, u64 attr)
> >>>> +{
> >>>> + asm volatile("dsb sy;isb");
> >>>
> >>> Huh? This wasn't anywhere before. Is the isb necessary?
> >>
> >> Probably not, but to make sure all previous instructions finish.
> >
> > Which instructions do you care about completing?
> >
> > You'll certainly want any page table writes to complete, but is anything
> > else in-flight at this point in time?
>
> Before calling this inline function, MMU tables were created. We want the
> writing completes before setting these registers.
Sure, but the dsb sy guarantees that.
The dsb sy will ensure that all reads and writes before it have become
visible to all observers in the full system shareability domain
(including the MMU) before any subsequent instructions (in program
order) can execute.
I'd only expect to see a "dsb $DOMAIN; isb" sequence is when we're
changing things that affect the instruction stream (I-cache maintenance,
changing the mapping underlying the currently executing stream).
> >
> >>>
> >>> When is this function expected to be called? Is the MMU expected to be
> >>> on?
> >>
> >> The general ARMv8 code calls this when MMU is off. For the SoC I am enabling, it
> >> is called when MMU is off for the first time, but MMU on for the 2nd time.
> >
> > Ok.
> >
> >>
> >>>
> >>>> + if (el == 1) {
> >>>> + asm volatile("msr ttbr0_el1, %0" : : "r" (table) : "memory");
> >>>> + asm volatile("msr tcr_el1, %0" : : "r" (tcr) : "memory");
> >>>> + asm volatile("msr mair_el1, %0" : : "r" (attr) : "memory");
> >>>
> >>> If the MMU is on, this looks really scary -- what are you expecting to
> >>> change in a single invocation?
> >>
> >> It is not scary for general ARMv8 code. MMU is off then this is called. For FSL
> >> SoC, I have these called for the 2nd time, when MMU is on. The 2nd time call
> >> points to a new MMU table.
> >
> > Oh but it is ;)
> >
> >>
> >>>
> >>>> + } else if (el == 2) {
> >>>> + asm volatile("msr ttbr0_el2, %0" : : "r" (table) : "memory");
> >>>> + asm volatile("msr tcr_el2, %0" : : "r" (tcr) : "memory");
> >>>> + asm volatile("msr mair_el2, %0" : : "r" (attr) : "memory");
> >>>> + } else if (el == 3) {
> >>>> + asm volatile("msr ttbr0_el3, %0" : : "r" (table) : "memory");
> >>>> + asm volatile("msr tcr_el3, %0" : : "r" (tcr) : "memory");
> >>>> + asm volatile("msr mair_el3, %0" : : "r" (attr) : "memory");
> >>>> + } else {
> >>>> + hang();
> >>>> + }
> >>>
> >>> And no synchronisation to ensure that these writes are complete or even
> >>> ordered w.r.t. each other?
> >>>
> >>
> >> That's why I added asm volatile("dsb sy;isb") before them. The order of these
> >> write doesn't matter. See the code before my change
> >> http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/cpu/armv8/cache_v8.c;h=a96ecda7e3146996d20a2e46b975dced2769255c;hb=HEAD
> >
> > Ok, so that orders them against everything _before_ them, but not with
> > respect to each other, or anything _after_ them.
> >
> > When the MMU is off, you are correct that the ordering of these
> > operations w.r.t. each other doesn't matter. With the MMU on that's not
> > true as the CPU can rerder the operations and can walk the page tables
> > asynchronously.
> >
> > Changing the MAIR at runtime is somewhat scary because you may be
> > changing attributes for entries which are active in the cache and/or
> > TLBs. I'm not sure I follow why you need to change it once the MMU is on
> > -- there are plenty of subfields in the MAIR that you could configure
> > before turning the MMU on for use later.
> >
> > Likewise changing the TCR at runtime is somewhat scary because you can
> > change attributes of active cache and/or TLB entries, change fields that
> > affect the way page tables are walked (TG*. T*SZ), all asynchronously
> > w.r.t. the logic that walsk the page tables.
> >
> > Changing the TTBR is fine, except that we didn't invalidate old entries
> > with a TLBI, so we may even have partial translations cahced in the
> > TLBs, and we can get odd translations generated from the combination of
> > the old and new tables.
>
>
> Maybe a more proper way to do so is to change TTBR only. That's actually what I
> really need. Or if I really need to change all of them, I should turn off MMU first.
In fact, if you had an API for creating device mappings, you wouldn't
even need to change the TTBR -- you'd just need to replace some existing
invalid entries with the new mapping, make them visible to the MMU with
a dsb, and then carry on. The TLBs aren't allowed to cache invalid
mappings, so on the next access to those mappings, the appropriate page
table entry will be read into the TLBs.
If you alter multiple levels of a live table then you'll need barriers
(DMBs) between populating those levels. If you back device mappings by a
single level of table then you can avoid that. Even better, with an API
you can do the simple thing today then make it more advanced in future
if necessary without having to change your board code.
> >> For the 2nd write used by FSL SoC, I need to flush the cache to make sure the
> >> table is in main DDR and perform TLB invalidation. That's when the loading of
> >> new MMU table happens.
> >
> > I missed the TLB invaliation -- where is that meant to happen?
> >
> After flushing the new MMU table into DDR. To your point above, I will send a
> new version with updating TTBR only.
Unfortunately, that's too early. The MMU is asynchronous w.r.t. the CPU
and so it can re-read the existing tables into the TLBs _before_ it has
been programmed with the new table address. The TLBI has to happen
_after_ the old tables are no longer pointed to by the TTBR.
What you need to do to replace the active set of tables (assuming that
the new mapping has the instruction stream mapped in an identical way)
is:
- Write the tables.
- DSB to make them visible to the MMU.
- Write to the appropriate TTBR_*.
- ISB to complete the write to the TTBR_*.
- TLBI to invalidate the old mappings the the TLBs.
- DSB to complete the TLBI.
- If you've changed the instruction stream or system state that will
affect the instruction stream, ISB to flush the CPU pipeline.
Cheers,
Mark.
More information about the U-Boot
mailing list