[PATCH v3 20/36] mach-snapdragon: carve out no-map regions

Sumit Garg sumit.garg at linaro.org
Fri Feb 2 07:03:03 CET 2024


On Thu, 1 Feb 2024 at 20:20, Caleb Connolly <caleb.connolly at linaro.org> wrote:
>
>
>
> On 01/02/2024 12:07, Sumit Garg wrote:
> > On Tue, 30 Jan 2024 at 19:35, Caleb Connolly <caleb.connolly at linaro.org> wrote:
> >>
> >> On Qualcomm platforms, the TZ may already have certain memory regions
> >> under protection by the time U-Boot starts. There is a rare case on some
> >> platforms where the prefetcher might speculatively access one of these
> >> regions resulting in a board crash (TZ traps and then resets the board).
> >>
> >> We shouldn't be accessing these regions from within U-Boot anyway, so
> >> let's mark them all with PTE_TYPE_FAULT to prevent any speculative
> >> access and correctly trap in EL1 rather than EL3.
> >>
> >> This is quite costly with caches off (takes ~2 seconds on SDM845 vs 35ms
> >> with caches on). So to minimise the impact this is only enabled on
> >> QCS404 for now (where the issue is known to occur).
> >
> > It only took 65ms with caches off on QCS404 as per below print:
> >
> > carveout time: 65ms
> >
> > It's probably due to some missing bits as pointed below to get it
> > working fast on SDM845 too..
>
> Ah I didn't consider that the difference might really just be the 2M vs
> 4K alignment.
>
> [snip]
>
> >> +
> >> +/* Mark all no-map regions as PTE_TYPE_FAULT to prevent speculative access.
> >> + * On some platforms this is enough to trigger a security violation and trap
> >> + * to EL3.
> >> + */
> >> +static void carve_out_reserved_memory(void)
> >> +{
> >> +       static struct fdt_resource res[N_RESERVED_REGIONS] = { 0 };
> >> +       int parent, rmem, count, i = 0;
> >> +       phys_addr_t start;
> >> +       size_t size;
> >> +
> >> +       /* Some reserved nodes must be carved out, as the cache-prefetcher may otherwise
> >> +        * attempt to access them, causing a security exception.
> >> +        */
> >> +       parent = fdt_path_offset(gd->fdt_blob, "/reserved-memory");
> >> +       if (parent <= 0) {
> >> +               log_err("No reserved memory regions found\n");
> >> +               return;
> >> +       }
> >> +       count = fdtdec_get_child_count(gd->fdt_blob, parent);
> >> +       if (count > N_RESERVED_REGIONS) {
> >> +               log_err("Too many reserved memory regions!\n");
> >> +               count = N_RESERVED_REGIONS;
> >> +       }
> >> +
> >> +       /* Collect the reserved memory regions */
> >> +       fdt_for_each_subnode(rmem, gd->fdt_blob, parent) {
> >> +               if (!fdt_getprop(gd->fdt_blob, rmem, "no-map", NULL))
> >> +                       continue;
> >> +               if (fdt_getprop(gd->fdt_blob, rmem, "compatible", NULL))
> >> +                       continue;
> >> +               /* Skip regions that don't have a fixed address/size */
> >> +               if (fdt_get_resource(gd->fdt_blob, rmem, "reg", 0, &res[i++]))
> >> +                       i--;
> >> +       }
> >> +
> >> +       /* Sort the reserved memory regions by address */
> >> +       count = i;
> >> +       qsort(res, count, sizeof(struct fdt_resource), fdt_cmp_res);
> >> +
> >> +       /* Now set the right attributes for them. Often a lot of the regions are tightly packed together
> >> +        * so we can optimise the number of calls to mmu_change_region_attr() by combining adjacent
> >> +        * regions.
> >> +        */
> >> +       start = ALIGN_DOWN(res[0].start, SZ_2M);
> >> +       size = ALIGN(res[0].end, SZ_4K) - start;
> >> +       for (i = 1; i < count; i++) {
> >
> > I suppose this loop fails to configure attributes for the last no-map
> > region like uefi_mem region on QCS404.
>
> Right.
> >
> >> +               /* We ideally want to 2M align everything for more efficient pagetables, but we must avoid
> >> +                * overwriting reserved memory regions which shouldn't be mapped as FAULT (like those with
> >> +                * compatible properties).
> >> +                * If within 2M of the previous region, bump the size to include this region. Otherwise
> >> +                * start a new region.
> >> +                */
> >> +               if (start + size < res[i].start - SZ_2M) {
> >> +                       debug("  0x%016llx - 0x%016llx FAULT\n",
> >> +                             start, start + size);
> >> +                       mmu_change_region_attr(start, size, PTE_TYPE_FAULT);
> >
> > Here the size won't always be 2M aligned which is the case for SDM845.
> > I would suggest you do:
> >
> >                             mmu_change_region_attr(start, ALIGN(size,
> > SZ_2M), PTE_TYPE_FAULT);
>
> This isn't an option as I want to avoid unmapping reserved memory
> regions which have a compatible property; on SDM845 for example this
> includes the cmd-db and rmtfs regions. These are not 2M aligned and
> might be directly adjacent to other regions, so doing an align here
> could result in unmapping part of these regions.

The regions with a compatible property and "no-map" should be mapped
by corresponding drivers later on when caches are enabled. The default
memory attributes or cache policy may not make sense for them for eg.
a shared memory region with coprocessors which has to be marked
un-cached etc.

>
> This will be especially relevant for SM8250 where I'll be enabling
> cmd-db and RPMh drivers in order to turn on the USB VBUS regulator on RB5.

So the cmd-db and RPMh drivers should map corresponding reserved
memory appropriately as per their requirements via
mmu_change_region_attr(). Since we have the caches enabled at that
point it won't be an expensive operation.

-Sumit

> >
> >> +                       start = ALIGN_DOWN(res[i].start, SZ_2M);
> >> +                       size = ALIGN(res[i].end, SZ_4K) - start;
> >> +               } else {
> >> +                       /* Bump size if this region is immediately after the previous one */
> >> +                       size = ALIGN(res[i].end, SZ_4K) - start;
> >> +               }
> >> +       }
> >> +}
> >> +
> >> +/* This function open-codes setup_all_pgtables() so that we can
> >> + * insert additional mappings *before* turning on the MMU.
> >> + */
> >>  void enable_caches(void)
> >>  {
> >> +       u64 tlb_addr = gd->arch.tlb_addr;
> >> +       u64 tlb_size = gd->arch.tlb_size;
> >> +       ulong carveout_start;
> >> +
> >> +       gd->arch.tlb_fillptr = tlb_addr;
> >> +
> >>         build_mem_map();
> >>
> >>         icache_enable();
> >> +
> >> +       /* Create normal system page tables */
> >> +       setup_pgtables();
> >> +
> >> +       /* Create emergency page tables */
> >> +       gd->arch.tlb_size -= (uintptr_t)gd->arch.tlb_fillptr -
> >> +                            (uintptr_t)gd->arch.tlb_addr;
> >> +       gd->arch.tlb_addr = gd->arch.tlb_fillptr;
> >> +       setup_pgtables();
> >> +       gd->arch.tlb_emerg = gd->arch.tlb_addr;
> >> +       gd->arch.tlb_addr = tlb_addr;
> >> +       gd->arch.tlb_size = tlb_size;
> >> +
> >> +       /* Doing this has a noticeable impact on boot time, so until we can
> >> +        * find a more efficient solution, just enable the workaround for
> >> +        * the one board where it's necessary. Without this there seems to
> >> +        * be a speculative access to a region which is protected by the TZ.
> >> +        */
> >> +       if (of_machine_is_compatible("qcom,qcs404")) {
> >> +               carveout_start = get_timer(0);
> >> +               carve_out_reserved_memory();
> >> +               debug("carveout time: %lums\n", get_timer(carveout_start));
> >> +       }
> >> +
> >>         dcache_enable();
> >>  }
> >>
> >> --
> >> 2.43.0
> >>
>
> --
> // Caleb (they/them)


More information about the U-Boot mailing list