[PATCH v5 10/13] video: Only dcache flush damaged lines

Simon Glass sjg at chromium.org
Tue Aug 22 01:03:19 CEST 2023


Hi Alex,

On Mon, 21 Aug 2023 at 16:44, Alexander Graf <agraf at csgraf.de> wrote:
>
>
> On 22.08.23 00:10, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 13:59, Alexander Graf <agraf at csgraf.de> wrote:
> >>
> >> On 21.08.23 21:11, Simon Glass wrote:
> >>> Hi Alper,
> >>>
> >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak at gmail.com> wrote:
> >>>> From: Alexander Graf <agraf at csgraf.de>
> >>>>
> >>>> Now that we have a damage area tells us which parts of the frame buffer
> >>>> actually need updating, let's only dcache flush those on video_sync()
> >>>> calls. With this optimization in place, frame buffer updates - especially
> >>>> on large screen such as 4k displays - speed up significantly.
> >>>>
> >>>> Signed-off-by: Alexander Graf <agraf at csgraf.de>
> >>>> Reported-by: Da Xue <da at libre.computer>
> >>>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
> >>>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak at gmail.com>
> >>>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak at gmail.com>
> >>>> ---
> >>>>
> >>>> Changes in v5:
> >>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>
> >>>> Changes in v2:
> >>>> - Fix dcache range; we were flushing too much before
> >>>> - Remove ifdefs
> >>>>
> >>>>    drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
> >>>>    1 file changed, 36 insertions(+), 5 deletions(-)
> >>> This is a little strange, since flushing the whole cache will only
> >>> actually write out data that was actually written (to the display). Is
> >>> there a benefit to this patch, in terms of performance?
> >>
> >> I'm happy to see you go through the same thought process I went through
> >> when writing these: "This surely can't be the problem, can it?". The
> >> answer is "simple" in hindsight:
> >>
> >> Have a look at the ARMv8 cache flush function. It does the only "safe"
> >> thing you can expect it to do: Clean+Invalidate to POC because we use it
> >> for multiple things, clearing modified code among others:
> >>
> >> ENTRY(__asm_flush_dcache_range)
> >>           mrs     x3, ctr_el0
> >>           ubfx    x3, x3, #16, #4
> >>           mov     x2, #4
> >>           lsl     x2, x2, x3              /* cache line size */
> >>
> >>           /* x2 <- minimal cache line size in cache system */
> >>           sub     x3, x2, #1
> >>           bic     x0, x0, x3
> >> 1:      dc      civac, x0       /* clean & invalidate data or unified
> >> cache */
> >>           add     x0, x0, x2
> >>           cmp     x0, x1
> >>           b.lo    1b
> >>           dsb     sy
> >>           ret
> >> ENDPROC(__asm_flush_dcache_range)
> >>
> >>
> >> Looking at the "dc civac" call, we find this documentation page from
> >> ARM:
> >> https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
> >>
> >> This says we're writing any dirtyness of this cache line up to the POC
> >> and then invalidate (remove the cache line) also up to POC. That means
> >> when you look at a typical SBC, this will either be L2 or system level
> >> cache. Every read afterwards needs to go and pull it all the way back to
> >> L1 to modify it (or not) on the next character write and then flush it
> >> again.
> >>
> >> Even worse: Because of the invalidate, we may even evict it from caches
> >> that the display controller uses to read the frame buffer. So depending
> >> on the SoC's cache topology and implementation, we may force the display
> >> controller to refetch the full FB content on its next screen refresh cycle.
> >>
> >> I faintly remember that I tried to experiment with CVAC instead to only
> >> flush without invalidating. I don't fully remember the results anymore
> >> though. I believe CVAC just behaved identical to CIVAC on the A53
> >> platform I was working on. And then I looked at Cortex-A53 errata like
> >> [1] and just accepted that doing anything but restricting the flushing
> >> range is a waste of time :)
> > Yuck I didn't know it was invalidating too. That is horrible. Is there
> > no way to fix it?
>
>
> Before building all of this damage logic, I tried, but failed. I'd
> welcome anyone else to try again :). I'm not even convinced yet that it
> is actually fixable: Depending on the SoC's internal cache logic, it may
> opt to always invalidate I think.

Wow, that is crazy! How is anyone supposed to make the system run well
with logic like that??!

>
> That said, this patch set really also makes sense outside of the
> particular invalidate problem. It creates a generic abstraction between
> the copy and non-copy code path and allows us to reduce the amount of
> work spent for both, generically for any video sync operation.

Sure...my question was really why it helps so much, given what I
understood the caches to be doing. If they are invalidating, then it
is amazing anything gets done...

Regards,
SImon


More information about the U-Boot mailing list