[PATCH v5 00/13] Add video damage tracking

Alexander Graf agraf at csgraf.de
Mon Aug 21 22:20:05 CEST 2023


On 21.08.23 21:57, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf at csgraf.de> wrote:
>>
>> On 21.08.23 21:11, Simon Glass wrote:
>>> Hi Alper,
>>>
>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak at gmail.com> wrote:
>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>> some tests and other changes. The original cover letter is as follows:
>>>>
>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>
>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>> but need it accessible by the display controller which reads directly
>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>> DRAM on every change. The full frame buffer.
>>> It should not, see below.
>>>
>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>>>> printed space triggers a cache flush.
>>> That is a bug somewhere in EFI.
>>
>> Unfortunately not :). You may call it a bug in grub: It literally prints
>> over space characters for every character in its menu that it wants
>> cleared. On every text screen draw.
>>
>> This wouldn't be a big issue if we only flush the reactangle that gets
>> modified. But without this patch set, we're flushing the full DRAM
>> buffer on every u-boot text console character write, which means for
>> every character (as that's the only API UEFI has).
>>
>> As a nice side effect, we speed up the normal U-Boot text console as
>> well with this patch set, because even "normal" text prints that write
>> for example a single line of text on the screen today flush the full
>> frame buffer to DRAM.
> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> after every character. It doesn't do that for normal character output
> and I don't think it makes sense to do it for EFI either.


I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print 
character" level, so it calls UEFI's sopisticated "output_string" 
callback with single characters at a time, which means we do a full 
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165


>
>>
>>>>> This patch set implements the easiest mitigation against this problem:
>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>> touched since the last video_sync() call and only flush that. The most
>>>>> typical writer to the frame buffer is the video console, which always
>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>
>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>
>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>> with this now :).
>>>>>
>>>>>
>>>>> Alternatives considered:
>>>>>
>>>>>     1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>>>        so often. We are missing timers to do this generically.
>>>>>
>>>>>     2) Double buffering - We could try to identify whether anything changed
>>>>>        at all and only draw to the FB if it did. That would require
>>>>>        maintaining a second buffer that we need to scan.
>>>>>
>>>>>     3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>        respective location. Don't write if the old and new character are
>>>>>        identical. This would limit applicability to text only and is an
>>>>>        optimization on top of this patch set.
>>>>>
>>>>>     4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>        changes. Only flush when it does. I'm not sure if this would waste
>>>>>        more time, memory and cache than the current approach. It would make
>>>>>        full screen updates much more expensive.
>>> 5) Fix the bug mentioned above?
>>>
>>>> Changes in v5:
>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>> - Document damage struct and fields in struct video_priv comment
>>>> - Return void from video_damage()
>>>> - Fix undeclared priv error in video_sync()
>>>> - Drop unused headers from video-uclass.c
>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>> - Call video_damage() also in video_fill_part()
>>>> - Use met->baseline instead of priv->baseline
>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>> - Document new vdev field in struct efi_gop_obj comment
>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>> - Fix memmove() calls by removing the extra dev argument
>>>> - Call video_sync() before checking copy_fb in video tests
>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>
>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>
>>>> Changes in v4:
>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>> - Simplify first damage logic
>>>> - Remove VIDEO_DAMAGE default for ARM
>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>> - Add patch "video: Always compile cache flushing code"
>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>
>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>
>>>> Changes in v3:
>>>> - Adapt to always assume DM is used
>>>> - Adapt to always assume DM is used
>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>
>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>
>>>> Changes in v2:
>>>> - Remove ifdefs
>>>> - Fix ranges in truetype target
>>>> - Limit rotate to necessary damage
>>>> - Remove ifdefs from gop
>>>> - Fix dcache range; we were flushing too much before
>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>
>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>
>>>> Alexander Graf (9):
>>>>     dm: video: Add damage tracking API
>>>>     dm: video: Add damage notification on display fills
>>>>     vidconsole: Add damage notifications to all vidconsole drivers
>>>>     video: Add damage notification on bmp display
>>>>     efi_loader: GOP: Add damage notification on BLT
>>>>     video: Only dcache flush damaged lines
>>>>     video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>     video: Always compile cache flushing code
>>>>     video: Enable VIDEO_DAMAGE for drivers that need it
>>>>
>>>> Alper Nebi Yasak (4):
>>>>     video: test: Split copy frame buffer check into a function
>>>>     video: test: Support checking copy frame buffer contents
>>>>     video: test: Test partial updates of hardware frame buffer
>>>>     video: test: Test video damage tracking via vidconsole
>>>>
>>>>    arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>    arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>    drivers/video/Kconfig             |  26 +++
>>>>    drivers/video/console_normal.c    |  27 ++--
>>>>    drivers/video/console_rotate.c    |  94 +++++++----
>>>>    drivers/video/console_truetype.c  |  37 +++--
>>>>    drivers/video/exynos/Kconfig      |   1 +
>>>>    drivers/video/imx/Kconfig         |   1 +
>>>>    drivers/video/meson/Kconfig       |   1 +
>>>>    drivers/video/rockchip/Kconfig    |   1 +
>>>>    drivers/video/stm32/Kconfig       |   1 +
>>>>    drivers/video/tegra20/Kconfig     |   1 +
>>>>    drivers/video/tidss/Kconfig       |   1 +
>>>>    drivers/video/vidconsole-uclass.c |  16 --
>>>>    drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>    drivers/video/video_bmp.c         |   7 +-
>>>>    include/video.h                   |  59 +++----
>>>>    include/video_console.h           |  52 ------
>>>>    lib/efi_loader/efi_gop.c          |   7 +
>>>>    test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>    20 files changed, 483 insertions(+), 297 deletions(-)
>>> It is good to see this tidied up into something that can be applied!
>>>
>>> I am unsure what is going on with the EFI performance, though. It
>>> should not flush the cache after every character, only after a new
>>> line. Is there something wrong in here? If so, we should fix that bug
>>> first and it should be patch 1 of this series.
>>
>> Before I came up with this series, I was trying to identify the UEFI bug
>> in question as well, because intuition told me surely this is a bug in
>> UEFI :). Turns out it really isn't this time around.
> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> implementation. Where did you look for the bug?


The "real" bug is in grub. But given that it's reasonably simple to work 
around in U-Boot and even with it "fixed" in grub we would still see 
performance benefits from flushing only parts of the screen, I think 
it's worth living with the grub deficiency.


Alex




More information about the U-Boot mailing list