[U-Boot] ARMv7 MMU shareability issue

Albert ARIBAUD albert.u.boot at aribaud.net
Mon Dec 14 08:25:19 CET 2015


Hello Marek,

On Thu, 10 Dec 2015 04:53:01 +0100, Marek Vasut <marex at denx.de> wrote:
> On Thursday, December 10, 2015 at 03:27:08 AM, Tom Rini wrote:
> > On Wed, Dec 09, 2015 at 03:09:13PM +0100, Marek Vasut wrote:
> > > On Wednesday, December 09, 2015 at 02:02:03 PM, Stefan Roese wrote:
> > > > Hi All!
> > > > 
> > > > On 09.12.2015 09:09, Albert ARIBAUD wrote:
> > > > > On Tue,  8 Dec 2015 14:43:42 +0100, Marek Vasut <marex at denx.de> wrote:
> > > > >> The arch/arm/lib/cache-cp15.c checks for CONFIG_ARMV7 and if this
> > > > >> macro is set, it configures TTBR0 register. This register must be
> > > > >> configured for the cache on ARMv7 to operate correctly.
> > > > >> 
> > > > >> The problem is that noone actually sets the CONFIG_ARMV7 macro and
> > > > >> thus the TTBR0 is not configured at all. On SoCFPGA, this produces
> > > > >> all sorts of minor issues which are hard to replicate, for example
> > > > >> certain USB sticks are not detected or QSPI NOR sometimes fails to
> > > > >> write pages completely.
> > > > >> 
> > > > >> The solution is to replace CONFIG_ARMV7 test with CONFIG_CPU_V7 one.
> > > > >> This is correct because the code which added the test(s) for
> > > > >> CONFIG_ARMV7 was added shortly after CONFIG_ARMV7 was replaced by
> > > > >> CONFIG_CPU_V7 and this code was not adjusted correctly to reflect
> > > > >> that change.
> > > > > 
> > > > > Note:
> > > > > 
> > > > > As discussed with Marek on IRC, this patch enables what is supposed
> > > > > to be the correct MMU settings for ARMv7, which causes a sharp
> > > > > Ethernet performance drop (40%) but also a strong general memory
> > > > > access performance hit (a copy of 4 MB is almost instantaneous
> > > > > without the patch and takes 2-3 seconds with it).
> > > > 
> > > > I've tested Marek's patch on my current Armada XP platform (also
> > > > ARMv7). And also noticed a performance drop by about 30-40%.
> > > > The dhrystone command can be used for this testing as well
> > > > (CONFIG_CMD_DHRYSTONE).
> > > > 
> > > > So this is definitely a NAK from me to this current patch.
> > > 
> > > This is a wrong approach, this patch just fixes another patch which was
> > > broken from the getgo and the TTBR0 reg was not programmed correctly due
> > > to that. This patch must be applied, but what we need to decide is
> > > whether we need _another_ patch which removes the S-bit from the
> > > pagetable or how to deal with that part.
> > 
> > No, we don't have to apply this patch.  The original patch here
> > (97840b5) was likely not run-time tested when submitted as it was using
> > the then-defunct CONFIG_ARMV7 symbol, and was likely a bugfix in an
> > internal tree that was pushed upstream (which is on the whole, good).
> 
> I find it surprising that such a patch, which modifies common code even,
> was not scrutinized enough.
> 
> > So in sum U-Boot has always been as broken in some specific regard
> > before that patch as it is after that patch.  So we have time to see
> > what we need to do about enabling this feature correctly and even
> > ensuring that we don't happen to say break things once Linux is up too.
> 
> In that case, I am looking forward to better suggestions.
> 
> I still disagree that this patch should not be applied, it corrects code
> which was broken. The slowdown caused by the original patch is a separate
> issue and should be corrected by a separate patch. If these two patches
> get applied at once, that is fine.

This patch has several effects:

- it selects proper ARMv7 translation table level 1 bit definitions;
- it provides proper ARMv7 definitions for WT/WB/WA;
- it selects proper ARMv7 settings for TTBR0.

All these are correct as per the docs I have (although I may have missed
something during the readings (and cross-readings with Marek) of these
last hours/days.

Now, one specific effect goes against performance, and it is the
setting of bit S in all TT entries. This bit makes the corresponding
region shareable, but for all I know, in U-Boot we don't have more than
one core accessing the same memory or registers sets so -- at least for
the major part of its execution -- there is no reason for any region to
be shareable.

/That/ effect I certainly don't want.

The rest I am fine with.

So, if we apply this patch minus the inclusion of the S bit in the
definition of DCACHE_OFF (and hence of all DCACHE_* enum members after
it), then we get code that is more correct from a theoretical
standpoint, and does not degrade performance (which is a hint,
admittedly weak, to me that it is not incorrect from a practical
standpoint.

It does not fix the USB and QSPI issues in the socfpga case, but I am
not sure these are caused by the S bit being zero.

> Best regards,
> Marek Vasut

Amicalement,
-- 
Albert.


More information about the U-Boot mailing list