[PATCH v3 20/36] mach-snapdragon: carve out no-map regions

Caleb Connolly caleb.connolly at linaro.org
Thu Feb 1 15:50:01 CET 2024



On 01/02/2024 12:07, Sumit Garg wrote:
> On Tue, 30 Jan 2024 at 19:35, Caleb Connolly <caleb.connolly at linaro.org> wrote:
>>
>> On Qualcomm platforms, the TZ may already have certain memory regions
>> under protection by the time U-Boot starts. There is a rare case on some
>> platforms where the prefetcher might speculatively access one of these
>> regions resulting in a board crash (TZ traps and then resets the board).
>>
>> We shouldn't be accessing these regions from within U-Boot anyway, so
>> let's mark them all with PTE_TYPE_FAULT to prevent any speculative
>> access and correctly trap in EL1 rather than EL3.
>>
>> This is quite costly with caches off (takes ~2 seconds on SDM845 vs 35ms
>> with caches on). So to minimise the impact this is only enabled on
>> QCS404 for now (where the issue is known to occur).
> 
> It only took 65ms with caches off on QCS404 as per below print:
> 
> carveout time: 65ms
> 
> It's probably due to some missing bits as pointed below to get it
> working fast on SDM845 too..

Ah I didn't consider that the difference might really just be the 2M vs
4K alignment.

[snip]

>> +
>> +/* Mark all no-map regions as PTE_TYPE_FAULT to prevent speculative access.
>> + * On some platforms this is enough to trigger a security violation and trap
>> + * to EL3.
>> + */
>> +static void carve_out_reserved_memory(void)
>> +{
>> +       static struct fdt_resource res[N_RESERVED_REGIONS] = { 0 };
>> +       int parent, rmem, count, i = 0;
>> +       phys_addr_t start;
>> +       size_t size;
>> +
>> +       /* Some reserved nodes must be carved out, as the cache-prefetcher may otherwise
>> +        * attempt to access them, causing a security exception.
>> +        */
>> +       parent = fdt_path_offset(gd->fdt_blob, "/reserved-memory");
>> +       if (parent <= 0) {
>> +               log_err("No reserved memory regions found\n");
>> +               return;
>> +       }
>> +       count = fdtdec_get_child_count(gd->fdt_blob, parent);
>> +       if (count > N_RESERVED_REGIONS) {
>> +               log_err("Too many reserved memory regions!\n");
>> +               count = N_RESERVED_REGIONS;
>> +       }
>> +
>> +       /* Collect the reserved memory regions */
>> +       fdt_for_each_subnode(rmem, gd->fdt_blob, parent) {
>> +               if (!fdt_getprop(gd->fdt_blob, rmem, "no-map", NULL))
>> +                       continue;
>> +               if (fdt_getprop(gd->fdt_blob, rmem, "compatible", NULL))
>> +                       continue;
>> +               /* Skip regions that don't have a fixed address/size */
>> +               if (fdt_get_resource(gd->fdt_blob, rmem, "reg", 0, &res[i++]))
>> +                       i--;
>> +       }
>> +
>> +       /* Sort the reserved memory regions by address */
>> +       count = i;
>> +       qsort(res, count, sizeof(struct fdt_resource), fdt_cmp_res);
>> +
>> +       /* Now set the right attributes for them. Often a lot of the regions are tightly packed together
>> +        * so we can optimise the number of calls to mmu_change_region_attr() by combining adjacent
>> +        * regions.
>> +        */
>> +       start = ALIGN_DOWN(res[0].start, SZ_2M);
>> +       size = ALIGN(res[0].end, SZ_4K) - start;
>> +       for (i = 1; i < count; i++) {
> 
> I suppose this loop fails to configure attributes for the last no-map
> region like uefi_mem region on QCS404.

Right.
> 
>> +               /* We ideally want to 2M align everything for more efficient pagetables, but we must avoid
>> +                * overwriting reserved memory regions which shouldn't be mapped as FAULT (like those with
>> +                * compatible properties).
>> +                * If within 2M of the previous region, bump the size to include this region. Otherwise
>> +                * start a new region.
>> +                */
>> +               if (start + size < res[i].start - SZ_2M) {
>> +                       debug("  0x%016llx - 0x%016llx FAULT\n",
>> +                             start, start + size);
>> +                       mmu_change_region_attr(start, size, PTE_TYPE_FAULT);
> 
> Here the size won't always be 2M aligned which is the case for SDM845.
> I would suggest you do:
> 
>                             mmu_change_region_attr(start, ALIGN(size,
> SZ_2M), PTE_TYPE_FAULT);

This isn't an option as I want to avoid unmapping reserved memory
regions which have a compatible property; on SDM845 for example this
includes the cmd-db and rmtfs regions. These are not 2M aligned and
might be directly adjacent to other regions, so doing an align here
could result in unmapping part of these regions.

This will be especially relevant for SM8250 where I'll be enabling
cmd-db and RPMh drivers in order to turn on the USB VBUS regulator on RB5.
> 
> -Sumit
> 
>> +                       start = ALIGN_DOWN(res[i].start, SZ_2M);
>> +                       size = ALIGN(res[i].end, SZ_4K) - start;
>> +               } else {
>> +                       /* Bump size if this region is immediately after the previous one */
>> +                       size = ALIGN(res[i].end, SZ_4K) - start;
>> +               }
>> +       }
>> +}
>> +
>> +/* This function open-codes setup_all_pgtables() so that we can
>> + * insert additional mappings *before* turning on the MMU.
>> + */
>>  void enable_caches(void)
>>  {
>> +       u64 tlb_addr = gd->arch.tlb_addr;
>> +       u64 tlb_size = gd->arch.tlb_size;
>> +       ulong carveout_start;
>> +
>> +       gd->arch.tlb_fillptr = tlb_addr;
>> +
>>         build_mem_map();
>>
>>         icache_enable();
>> +
>> +       /* Create normal system page tables */
>> +       setup_pgtables();
>> +
>> +       /* Create emergency page tables */
>> +       gd->arch.tlb_size -= (uintptr_t)gd->arch.tlb_fillptr -
>> +                            (uintptr_t)gd->arch.tlb_addr;
>> +       gd->arch.tlb_addr = gd->arch.tlb_fillptr;
>> +       setup_pgtables();
>> +       gd->arch.tlb_emerg = gd->arch.tlb_addr;
>> +       gd->arch.tlb_addr = tlb_addr;
>> +       gd->arch.tlb_size = tlb_size;
>> +
>> +       /* Doing this has a noticeable impact on boot time, so until we can
>> +        * find a more efficient solution, just enable the workaround for
>> +        * the one board where it's necessary. Without this there seems to
>> +        * be a speculative access to a region which is protected by the TZ.
>> +        */
>> +       if (of_machine_is_compatible("qcom,qcs404")) {
>> +               carveout_start = get_timer(0);
>> +               carve_out_reserved_memory();
>> +               debug("carveout time: %lums\n", get_timer(carveout_start));
>> +       }
>> +
>>         dcache_enable();
>>  }
>>
>> --
>> 2.43.0
>>

-- 
// Caleb (they/them)


More information about the U-Boot mailing list