[U-Boot] [PATCH 02/10] arm64: Make full va map code more dynamic

Mark Rutland mark.rutland at arm.com
Wed Feb 24 15:52:16 CET 2016


Hi,

This is a good cleanup!

On Wed, Feb 24, 2016 at 01:11:36PM +0100, Alexander Graf wrote:
> The idea to generate our page tables from an array of memory ranges
> is very sound. However, instead of hard coding the code to create up
> to 2 levels of 64k granule page tables, we really should just create
> normal 4k page tables that allow us to set caching attributes on 2M
> or 4k level later on.
> 
> So this patch moves the full_va mapping code to 4k page size and
> makes it fully flexible to dynamically create as many levels as
> necessary for a map (including dynamic 1G/2M pages). It also adds
> support to dynamically split a large map into smaller ones when
> some code wants to set dcache attributes.

This latter part scares me a bit. It's _very_ difficult to get that
right, and bugs in this kind of code tend to stay latent, with
asynchronous triggers.
We had to rework the Linux page table creation to avoid issues with
splitting/creating sections [1,2] early in boot.

For Linux we can't split arbitrary sections dynamically (in the kernel
mapping) because that risks unmapping code/data in active use as part of
the required Break-Before-Make sequence.

I suspect the same applies to U-Boot, and that splitting sections
dynamically is unsafe once the MMU is on and the mappings are active.

In the ARM ARM (ARM DDI 0487A.i), see D4.7.1, "General TLB maintenance
requirements", which has a sub-section "Using break-before-make when
updating translation table entries".

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401434.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/386209.html


> +static void set_pte_table(u64 *pte, u64 *table)
> +{
> +	/* Point *pte to the new table */
> +	debug("Setting %p to addr=%p\n", pte, table);
> +	*pte = PTE_TYPE_TABLE | (ulong)table;
> +}

When the MMU is on, you will need barriers between creating/updating
tables and plumbing them into the next level. Otherwise, the stores may
not be observed in order by the page table walker, and stale (garbage)
entries might be allocated into the TLB.

Either the barrier needs to live at the end of the table creation
function, or at the beginning of this one.
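
Roughly, and purely as a sketch (whether "dsb sy" or a tighter
shareability domain is appropriate depends on how the tables are
mapped):

static void set_pte_table(u64 *pte, u64 *table)
{
	/*
	 * Ensure the stores that populated the new table are visible
	 * to the table walker before the table becomes reachable via
	 * *pte.
	 */
	asm volatile("dsb sy" ::: "memory");

	/* Point *pte to the new table */
	debug("Setting %p to addr=%p\n", pte, table);
	*pte = PTE_TYPE_TABLE | (ulong)table;
}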

[...]

> +/* Splits a block PTE into a table with subpages spanning the old block */
> +static void split_block(u64 *pte, int level)
> +{
> +	u64 old_pte = *pte;
> +	u64 *new_table;
> +	u64 i = 0;
> +	/* level describes the parent level, we need the child ones */
> +	int levelshift = level2shift(level + 1);
> +
> +	if (pte_type(pte) != PTE_TYPE_BLOCK)
> +		panic("PTE %p (%llx) is not a block. Some driver code wants to "
> +		      "modify dcache settings for a range not covered in "
> +		      "mem_map.", pte, old_pte);
> +
> +	new_table = create_table();
> +	debug("Splitting pte %p (%llx) into %p\n", pte, old_pte, new_table);
> +
> +	for (i = 0; i < MAX_PTE_ENTRIES; i++) {
> +		new_table[i] = old_pte | (i << levelshift);
> +		debug("Setting new_table[%lld] = %llx\n", i, new_table[i]);
> +	}

You need a barrier here to ensure that the modifications above are
ordered before the entry is plumbed into the next-level table below.

> +
> +	/* Set the new table into effect */
> +	set_pte_table(pte, new_table);

Splitting blocks in this manner requires Break-Before-Make. You must go
via an invalid entry.
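
As a sketch of the shape this could take (bbm_set_pte() is a made-up
name, and reusing __asm_invalidate_tlb_all() is the heavyweight option;
a by-VA invalidation would also do):

static void bbm_set_pte(u64 *pte, u64 new_val)
{
	/* Break: replace the live entry with an invalid one */
	*pte = 0;
	asm volatile("dsb sy" ::: "memory");

	/* Invalidate any TLB entries that hold the old mapping */
	__asm_invalidate_tlb_all();
	asm volatile("dsb sy" ::: "memory");

	/* Make: install the new (table or block) entry */
	*pte = new_val;
	asm volatile("dsb sy" ::: "memory");
}

split_block() would then finish with something like:

	bbm_set_pte(pte, PTE_TYPE_TABLE | (ulong)new_table);

and the DSBs in the helper also give you the ordering against the
new_table stores mentioned above.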

[...]

> +static u64 set_one_region(u64 start, u64 size, u64 attrs, int level)
> +{
> +	int levelshift = level2shift(level);
> +	u64 levelsize = 1ULL << levelshift;
> +	u64 *pte = find_pte(start, level);
> +
> +	/* Can we just modify the current level block PTE? */
> +	if (is_aligned(start, size, levelsize)) {
> +		*pte &= ~PMD_ATTRINDX_MASK;
> +		*pte |= attrs;
> +		debug("Set attrs=%llx pte=%p level=%d\n", attrs, pte, level);
> +
> +		return levelsize;
> +	}

When the MMU is on, I believe you need Break-Before-Make here due to the
change of memory attributes.
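
With a helper along the lines of the bbm_set_pte() sketched above, the
aligned case would become something like:

	if (is_aligned(start, size, levelsize)) {
		u64 new_pte = (*pte & ~PMD_ATTRINDX_MASK) | attrs;

		/* Go via an invalid entry rather than rewriting in place */
		bbm_set_pte(pte, new_pte);
		debug("Set attrs=%llx pte=%p level=%d\n", attrs, pte, level);

		return levelsize;
	}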

> +
> +	/* Unaligned or doesn't fit, maybe split block into table */
> +	debug("addr=%llx level=%d pte=%p (%llx)\n", start, level, pte, *pte);
> +
> +	/* Maybe we need to split the block into a table */
> +	if (pte_type(pte) == PTE_TYPE_BLOCK)
> +		split_block(pte, level);
> +
> +	/* And then double-check it became a table or already is one */
> +	if (pte_type(pte) != PTE_TYPE_TABLE)
> +		panic("PTE %p (%llx) for addr=%llx should be a table",
> +		      pte, *pte, start);
> +
> +	/* Roll on to the next page table level */
> +	return 0;
> +}
> +
> +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
> +				     enum dcache_option option)
> +{
> +	u64 attrs = PMD_ATTRINDX(option);
> +	u64 real_start = start;
> +	u64 real_size = size;
> +
> +	debug("start=%lx size=%lx\n", (ulong)start, (ulong)size);
> +
> +	/*
> +	 * Loop through the address range until we find a page granule that fits
> +	 * our alignment constraints, then set it to the new cache attributes
> +	 */
> +	while (size > 0) {
> +		int level;
> +		u64 r;
> +
> +		for (level = 1; level < 4; level++) {
> +			r = set_one_region(start, size, attrs, level);
> +			if (r) {
> +				/* PTE successfully replaced */
> +				size -= r;
> +				start += r;
> +				break;
> +			}
> +		}
> +
> +	}
> +
> +	asm volatile("dsb sy");
> +	__asm_invalidate_tlb_all();
> +	asm volatile("dsb sy");

This all needs to happen above, as part of each Break-Before-Make
sequence. Because the TLB can allocate entries asynchronously, doing the
maintenance only here at the end is too late.

> +	asm volatile("isb");

I'm not sure this ISB is necessary.

> +	flush_dcache_range(real_start, real_start + real_size);
> +	asm volatile("dsb sy");

Does flush_dcache_range not have this barrier implicitly?

Thanks,
Mark.

