[PATCH v5 10/13] video: Only dcache flush damaged lines
Alexander Graf
agraf at csgraf.de
Mon Aug 21 21:59:50 CEST 2023
On 21.08.23 21:11, Simon Glass wrote:
> Hi Alper,
>
> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak at gmail.com> wrote:
>> From: Alexander Graf <agraf at csgraf.de>
>>
>> Now that we have a damage area tells us which parts of the frame buffer
>> actually need updating, let's only dcache flush those on video_sync()
>> calls. With this optimization in place, frame buffer updates - especially
>> on large screen such as 4k displays - speed up significantly.
>>
>> Signed-off-by: Alexander Graf <agraf at csgraf.de>
>> Reported-by: Da Xue <da at libre.computer>
>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak at gmail.com>
>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak at gmail.com>
>> ---
>>
>> Changes in v5:
>> - Use xstart, ystart, xend, yend as names for damage region
>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>
>> Changes in v2:
>> - Fix dcache range; we were flushing too much before
>> - Remove ifdefs
>>
>> drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
>> 1 file changed, 36 insertions(+), 5 deletions(-)
> This is a little strange, since flushing the whole cache will only
> actually write out data that was actually written (to the display). Is
> there a benefit to this patch, in terms of performance?
I'm happy to see you go through the same thought process I went through
when writing these: "This surely can't be the problem, can it?". The
answer is "simple" in hindsight:
Have a look at the ARMv8 cache flush function. It does the only "safe"
thing you can expect it to do: Clean+Invalidate to POC because we use it
for multiple things, clearing modified code among others:
ENTRY(__asm_flush_dcache_range)
mrs x3, ctr_el0
ubfx x3, x3, #16, #4
mov x2, #4
lsl x2, x2, x3 /* cache line size */
/* x2 <- minimal cache line size in cache system */
sub x3, x2, #1
bic x0, x0, x3
1: dc civac, x0 /* clean & invalidate data or unified
cache */
add x0, x0, x2
cmp x0, x1
b.lo 1b
dsb sy
ret
ENDPROC(__asm_flush_dcache_range)
Looking at the "dc civac" call, we find this documentation page from
ARM:
https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
This says we're writing any dirtyness of this cache line up to the POC
and then invalidate (remove the cache line) also up to POC. That means
when you look at a typical SBC, this will either be L2 or system level
cache. Every read afterwards needs to go and pull it all the way back to
L1 to modify it (or not) on the next character write and then flush it
again.
Even worse: Because of the invalidate, we may even evict it from caches
that the display controller uses to read the frame buffer. So depending
on the SoC's cache topology and implementation, we may force the display
controller to refetch the full FB content on its next screen refresh cycle.
I faintly remember that I tried to experiment with CVAC instead to only
flush without invalidating. I don't fully remember the results anymore
though. I believe CVAC just behaved identical to CIVAC on the A53
platform I was working on. And then I looked at Cortex-A53 errata like
[1] and just accepted that doing anything but restricting the flushing
range is a waste of time :)
Alex
[1]
https://patchwork.kernel.org/project/xen-devel/patch/1462466065-30212-14-git-send-email-julien.grall@arm.com/
More information about the U-Boot
mailing list