[PATCH v2] pinctrl: single: parse gpio-range as a raw array (O(N^2) -> O(N))

Thu Jun 25 19:47:12 CEST 2026

Hi Simon,

Thanks for the thorough review.

> phandle == 0 used to terminate the scan; now it is silently skipped.

You're right, good catch. The original code breaks on the first
ofnode_parse_phandle_with_args() error, and a zero phandle is one such
error, so it ended the list. v1 turned that into a skip, which is a
behaviour change. v2 restores the terminate-on-zero-phandle behaviour:

ofnode_read_u32_index(node, propname, base, &phandle);
if (!phandle)
break;

> These two [args_count check, size-mismatch] turn a silent no-op into a
> hard probe failure.

Agreed. v2 drops both -EINVAL paths: it keeps the lenient behaviour and
does not fail the probe on malformed or partial data. args_count is pinned
by #pinctrl-single,gpio-range-cells, and a non-multiple size simply leaves
the trailing partial entry unprocessed.

> Did you consider ofnode_read_u32_array() [...] instead of be32_to_cpup()?

Switched to ofnode_read_u32_index() for the argument cells. There is no
open-coded __be32 walk, and it works for OF_LIVE too. I read only the
three argument cells per entry, since the phandle target is never
dereferenced, so there is no phandle resolution in the loop. That was the
O(N^2) cost.

> The ret/ret_total split is unnecessary - just reuse ret.

Done, single ret, used only for the one phandle lookup.

> Please drop the extra spacing.

Done, single space.

> +1 to avoiding alloc. You can create a variable-length local var.

v2 coalesces the per-entry allocation into one devm_kcalloc(dev, entries,
...). The ranges themselves cannot be local or a VLA: they are linked into
priv->gpiofuncs and consumed later by single_get_gpio_func(), so they have
to outlive this function. The single allocation removes the repeated
malloc and NULL check you and Anshul flagged.

> Is every entry in the property always for the same device? I'm not sure
> where that is required in the schema.

You may well be right that the schema does not formally require it. In
every pinctrl-single,gpio-range instance I have come across, all entries
point at the controller's own gpio-range provider, so
#pinctrl-single,gpio-range-cells, and therefore the per-entry stride, is
uniform. I have noted that assumption in a comment.

If a DT ever mixed providers with different cell counts, the stride would
have to be recomputed per entry, which would reintroduce a phandle resolve
per entry. I am happy to do that if you would rather keep full generality
over the fast path, but I am not aware of any such DT today.

v2 with all of the above follows.

Thanks again to you and Anshul.

Jordi

Missatge de Jordi T <yordy1902 at gmail.com> del dia dj., 25 de juny 2026
a les 19:46:
>
> Hi Anshul,
>
> Thanks for the review and for trying to reproduce it.
>
> > I'm unable to reproduce the delays you have observed with the J722s EVM.
> > Are you using a custom DT perhaps?
>
> You're right that the default EVM build doesn't show a large delay, and it took
> a board test on my side to understand why. No custom gpio-range property is
> involved - it is the SoC's stock 7-entry main-domain gpio-range. What changes
> is when pinctrl at f4000 (main_pmx0) is probed.
>
> On the stock EVM defconfig it is first probed after relocation (caches on).
> There, single_add_gpio_func() is only ~18 ms, and ~1 ms with this patch - easy
> to miss in the overall boot unless you time the function directly. If you put a
> get_timer() around single_add_gpio_func() on a stock EVM, you should see that
> ~18 ms -> ~1 ms.
>
> The 723 ms in the commit came from a product configuration where the node is
> probed before relocation (caches off). This happens during the pre-relocation
> console bring-up: setting up the main UART applies its pin mux, which probes
> main_pmx0 with caches off. That board also runs its LPDDR4 clocked down, as it
> is an early sample and needed that for stability. The two factors decompose
> cleanly from real measurements of the same unpatched function:
>
>     caches off vs on, same board   : ~17x  (723 ms vs 41 ms post-reloc)
>     reduced DDR vs EVM, caches on  : ~2x   (41 ms vs the EVM's 18 ms)
>
> i.e. 723 ms = O(N^2) x caches-off x reduced-DDR. On the EVM's stock DDR, a
> pre-relocation probe would land around ~300 ms. Either way, the patch removes a
> genuinely quadratic full-FDT scan per entry and produces an identical
> gpio-range list.
>
> On the code points, all addressed in v2:
>
> > Why rename 'ret' to 'ret_total' here?
>
> Leftover, dropped. A single 'ret' is used only for the one phandle lookup.
>
> > The size check is redundant
>
> Agreed, dropped; v2 uses just if (!list).
>
> > You could just do if (!phandle_i) ... is the validation required?
>
> Correct, a valid gpio-range never has a null phandle. v2 reads the entries
> directly; a zero phandle terminates the list, as in the original loop.
>
> > I wonder if we can just allocate a pool of ranges[entries] ...
>
> Done: v2 does a single devm_kcalloc(dev, entries, ...) and walks the array,
> removing the per-entry allocation and its null check.
>
> v2 follows shortly.
>
> Thanks again.
>
> Jordi