[PATCH 1/3] arm64: Use FEAT_HAFDBS to track dirty pages when available

Chris Packham judge.packham at gmail.com
Tue Oct 17 06:22:23 CEST 2023


On Tue, Oct 17, 2023 at 12:21 AM Marc Zyngier <maz at kernel.org> wrote:
>
> On Mon, 16 Oct 2023 02:42:08 +0100,
> Chris Packham <judge.packham at gmail.com> wrote:
> >
> > On Sun, Oct 15, 2023 at 10:29 AM Chris Packham <judge.packham at gmail.com> wrote:
> > >
> > >
> > >
> > > On Sat, 14 Oct 2023, 11:04 am Marc Zyngier, <maz at kernel.org> wrote:
> > >>
> > >> On 2023-10-13 03:40, Chris Packham wrote:
> > >> > Hi Marc, Paul,
> > >> >
> > >> > On Sat, Mar 18, 2023 at 5:23 AM Ying-Chun Liu (PaulLiu)
> > >> > <paul.liu at linaro.org> wrote:
> > >> >>
> > >> >> From: Marc Zyngier <maz at kernel.org>
> > >> >>
> > >> >> Some recent arm64 cores have a facility that allows the page
> > >> >> table walker to track the dirty state of a page. This makes it
> > >> >> really efficient to perform CMOs by VA as we only need to look
> > >> >> at dirty pages.
> > >> >>
> > >> >> Signed-off-by: Marc Zyngier <maz at kernel.org>
> > >> >> [ Paul: pick from the Android tree. Rebase to the upstream ]
> > >> >> Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu at linaro.org>
> > >> >> Cc: Tom Rini <trini at konsulko.com>
> > >> >> Link:
> > >> >> https://android.googlesource.com/platform/external/u-boot/+/3c433724e6f830a6b2edd5ec3d4a504794887263
> > >> >
> > >> > I think this may have caused a regression for the Marvell AC5X
> > >> > board(s). I found that v2023.07 locked up at boot but v2023.01 was
> > >> > fine. The lockup seemed to be in the 'Net:' init probably just as the
> > >> > mvneta driver was being initialised.
> > >> >
> > >> > A git bisect led me to this change although for this specific change
> > >> > instead of the lockup I get a crash so maybe I'm actually hitting a
> > >> > different issue.
> > >> >
> > >> > Any thoughts as to why this may have caused problems?
> > >>
> > >> Not really. What CPUs does this platform have? What is the offending
> > >> driver doing to trigger the issue? Can you provide some level of
> > >> tracing?
> > >
> > >
> > > The Marvell AC5X is a network switch ASIC with an integrated ARMv8 CPU (8.1 specifically I think).
> > >
> > > I think there is something that the mvneta driver is doing triggering the issue. I have another AC5X based board without an Ethernet port that boots just fine (this is also why I didn't notice earlier).
> > >
> > > I'll try and get some more debug out when I'm back in the office
> > >
> >
> > The thing the mvneta driver does that upsets things appears to be
> >
> >     mmu_set_region_dcache_behaviour((phys_addr_t)bd_space, BD_SPACE,
> >                                                           DCACHE_OFF);
> >
> > I can comment that line out and everything works.
>
> This leads to two questions:
>
> - is the device cache coherent, in which case it doesn't need the
>   memory being non-cacheable? If everything is OK, then why the switch
>   to device memory?

I'll be honest and say I understand less than 50% of that. The network
transfer does seem to work without the call, so perhaps the device is
cache coherent. But this call is common in many drivers, so I'd assume
it should be innocuous on such platforms. It's totally possible I
haven't done a good job of setting up the CPU or informing the rest of
the system about it. I took a lot of the code from the Marvell SDK and
cleaned it up without really understanding what most of it did.

>
> - what goes wrong when these attributes are applied? do we have to
>   split a block mapping?
>
> Instrumenting the MMU code would certainly help understanding what
> goes wrong here.

I did do that a little bit. At first I thought there was a possible
infinite loop in mmu_set_region_dcache_behaviour(). Squinting at
things, you could naively say that if set_one_region() failed to find
an entry it would loop forever, but if that happened I'd have some
debug output saying that it failed. Things seem to go south after
__asm_switch_ttbr(gd->arch.tlb_emerg), which got me thinking that
perhaps the emergency tables aren't set up (or at least aren't set up
in a way that allows debug output). That's about as far as I got
debugging-wise. I'll try to spend some more time digging into the MMU
code.
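
To make the infinite-loop worry concrete, here is a minimal standalone
sketch of the loop shape I had in mind. It is not the U-Boot code:
walk_region(), region_fn and the two toy callbacks are hypothetical
names for illustration, and the only assumption carried over from the
real function is that a set_one_region()-style helper returns the
number of bytes it handled, or 0 when no entry at that level fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a set_one_region()-style helper: returns the
 * number of bytes remapped at the given level, or 0 if nothing fits there.
 */
typedef uint64_t (*region_fn)(uint64_t start, uint64_t size, int level);

/*
 * Walk [start, start + size), trying levels 1..3 for each chunk.
 * Returns 0 on success, -1 if a full pass over all levels makes no
 * progress. Without that check the while loop would never terminate.
 */
int walk_region(uint64_t start, uint64_t size, region_fn set_one)
{
	assert(set_one != NULL);

	while (size > 0) {
		uint64_t done = 0;

		for (int level = 1; level < 4; level++) {
			done = set_one(start, size, level);
			if (done)
				break;
		}

		if (!done)
			return -1;	/* no level matched: bail out */

		if (done > size)	/* guard against unsigned underflow */
			done = size;

		start += done;
		size -= done;
	}

	return 0;
}

/* Toy callback: never finds an entry at any level. */
uint64_t always_fails(uint64_t start, uint64_t size, int level)
{
	(void)start; (void)size; (void)level;
	return 0;
}

/* Toy callback: only a 4 KiB page at level 3 ever matches. */
uint64_t page_at_level3(uint64_t start, uint64_t size, int level)
{
	(void)start; (void)size;
	return level == 3 ? 4096 : 0;
}
```

With page_at_level3 the walk finishes normally. With always_fails it
returns -1 instead of spinning, which is the case I was worried about.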

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

