[PATCH v5 00/13] Add video damage tracking

Simon Glass sjg at chromium.org
Mon Aug 28 19:54:58 CEST 2023


Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf at csgraf.de> wrote:
>
> Hey Simon,
>
> On 22.08.23 20:56, Simon Glass wrote:
> > Hi Alex,
> >
> > On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf at csgraf.de> wrote:
> >>
> >> On 22.08.23 01:03, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf at csgraf.de> wrote:
> >>>> On 22.08.23 00:10, Simon Glass wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf at csgraf.de> wrote:
> >>>>>> On 21.08.23 21:57, Simon Glass wrote:
> >>>>>>> Hi Alex,
> >>>>>>>
> >>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf at csgraf.de> wrote:
> >>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
> >>>>>>>>> Hi Alper,
> >>>>>>>>>
> >>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak at gmail.com> wrote:
> >>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
> >>>>>>>>>> some tests and other changes. The original cover letter is as follows:
> >>>>>>>>>>
> >>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>>>>>>>>>
> >>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>>>>>>>>>> but need it accessible by the display controller which reads directly
> >>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
> >>>>>>>>>>> DRAM on every change. The full frame buffer.
> >>>>>>>>> It should not, see below.
> >>>>>>>>>
> >>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>>>>>>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
> >>>>>>>>>>> printed space triggers a cache flush.
> >>>>>>>>> That is a bug somewhere in EFI.
> >>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
> >>>>>>>> over space characters for every character in its menu that it wants
> >>>>>>>> cleared. On every text screen draw.
> >>>>>>>>
> >>>>>>>> This wouldn't be a big issue if we only flush the reactangle that gets
> >>>>>>>> modified. But without this patch set, we're flushing the full DRAM
> >>>>>>>> buffer on every u-boot text console character write, which means for
> >>>>>>>> every character (as that's the only API UEFI has).
> >>>>>>>>
> >>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
> >>>>>>>> well with this patch set, because even "normal" text prints that write
> >>>>>>>> for example a single line of text on the screen today flush the full
> >>>>>>>> frame buffer to DRAM.
> >>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>>>>>> after every character. It doesn't do that for normal character output
> >>>>>>> and I don't think it makes sense to do it for EFI either.
> >>>>>> I see. Let's trace the calls:
> >>>>>>
> >>>>>> efi_cout_output_string()
> >>>>>> -> fputs()
> >>>>>> -> vidconsole_puts()
> >>>>>> -> video_sync()
> >>>>>> -> flush_dcache_range()
> >>>>>>
> >>>>>> Unfortunately grub abstracts character backends down to the "print
> >>>>>> character" level, so it calls UEFI's sopisticated "output_string"
> >>>>>> callback with single characters at a time, which means we do a full
> >>>>>> dcache flush for every character that we print:
> >>>>>>
> >>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>>>>>
> >>>>>>
> >>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
> >>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
> >>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
> >>>>>>>>>>> typical writer to the frame buffer is the video console, which always
> >>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
> >>>>>>>>>>>
> >>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
> >>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
> >>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>>>>>>>
> >>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
> >>>>>>>>>>> with this now :).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Alternatives considered:
> >>>>>>>>>>>
> >>>>>>>>>>>        1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >>>>>>>>>>>           so often. We are missing timers to do this generically.
> >>>>>>>>>>>
> >>>>>>>>>>>        2) Double buffering - We could try to identify whether anything changed
> >>>>>>>>>>>           at all and only draw to the FB if it did. That would require
> >>>>>>>>>>>           maintaining a second buffer that we need to scan.
> >>>>>>>>>>>
> >>>>>>>>>>>        3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>>>>>>>>>           respective location. Don't write if the old and new character are
> >>>>>>>>>>>           identical. This would limit applicability to text only and is an
> >>>>>>>>>>>           optimization on top of this patch set.
> >>>>>>>>>>>
> >>>>>>>>>>>        4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>>>>>>>>>           changes. Only flush when it does. I'm not sure if this would waste
> >>>>>>>>>>>           more time, memory and cache than the current approach. It would make
> >>>>>>>>>>>           full screen updates much more expensive.
> >>>>>>>>> 5) Fix the bug mentioned above?
> >>>>>>>>>
> >>>>>>>>>> Changes in v5:
> >>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
> >>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
> >>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>>>>>>>> - Document damage struct and fields in struct video_priv comment
> >>>>>>>>>> - Return void from video_damage()
> >>>>>>>>>> - Fix undeclared priv error in video_sync()
> >>>>>>>>>> - Drop unused headers from video-uclass.c
> >>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>>>>>>> - Call video_damage() also in video_fill_part()
> >>>>>>>>>> - Use met->baseline instead of priv->baseline
> >>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
> >>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
> >>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
> >>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
> >>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
> >>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
> >>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>>>>>>>
> >>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Changes in v4:
> >>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
> >>>>>>>>>> - Simplify first damage logic
> >>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
> >>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>>>>>>>> - Add patch "video: Always compile cache flushing code"
> >>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>>>>>>>>>
> >>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Changes in v3:
> >>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>>>>>>>
> >>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Changes in v2:
> >>>>>>>>>> - Remove ifdefs
> >>>>>>>>>> - Fix ranges in truetype target
> >>>>>>>>>> - Limit rotate to necessary damage
> >>>>>>>>>> - Remove ifdefs from gop
> >>>>>>>>>> - Fix dcache range; we were flushing too much before
> >>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>>>>>>>
> >>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Alexander Graf (9):
> >>>>>>>>>>        dm: video: Add damage tracking API
> >>>>>>>>>>        dm: video: Add damage notification on display fills
> >>>>>>>>>>        vidconsole: Add damage notifications to all vidconsole drivers
> >>>>>>>>>>        video: Add damage notification on bmp display
> >>>>>>>>>>        efi_loader: GOP: Add damage notification on BLT
> >>>>>>>>>>        video: Only dcache flush damaged lines
> >>>>>>>>>>        video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>>>>>>>        video: Always compile cache flushing code
> >>>>>>>>>>        video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>>>>>>>
> >>>>>>>>>> Alper Nebi Yasak (4):
> >>>>>>>>>>        video: test: Split copy frame buffer check into a function
> >>>>>>>>>>        video: test: Support checking copy frame buffer contents
> >>>>>>>>>>        video: test: Test partial updates of hardware frame buffer
> >>>>>>>>>>        video: test: Test video damage tracking via vidconsole
> >>>>>>>>>>
> >>>>>>>>>>       arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>>>>>>>       arch/arm/mach-sunxi/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/Kconfig             |  26 +++
> >>>>>>>>>>       drivers/video/console_normal.c    |  27 ++--
> >>>>>>>>>>       drivers/video/console_rotate.c    |  94 +++++++----
> >>>>>>>>>>       drivers/video/console_truetype.c  |  37 +++--
> >>>>>>>>>>       drivers/video/exynos/Kconfig      |   1 +
> >>>>>>>>>>       drivers/video/imx/Kconfig         |   1 +
> >>>>>>>>>>       drivers/video/meson/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/rockchip/Kconfig    |   1 +
> >>>>>>>>>>       drivers/video/stm32/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/tegra20/Kconfig     |   1 +
> >>>>>>>>>>       drivers/video/tidss/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/vidconsole-uclass.c |  16 --
> >>>>>>>>>>       drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>>>>>>>>>       drivers/video/video_bmp.c         |   7 +-
> >>>>>>>>>>       include/video.h                   |  59 +++----
> >>>>>>>>>>       include/video_console.h           |  52 ------
> >>>>>>>>>>       lib/efi_loader/efi_gop.c          |   7 +
> >>>>>>>>>>       test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>>>>>>>>>       20 files changed, 483 insertions(+), 297 deletions(-)
> >>>>>>>>> It is good to see this tidied up into something that can be applied!
> >>>>>>>>>
> >>>>>>>>> I am unsure what is going on with the EFI performance, though. It
> >>>>>>>>> should not flush the cache after every character, only after a new
> >>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
> >>>>>>>>> first and it should be patch 1 of this series.
> >>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
> >>>>>>>> in question as well, because intuition told me surely this is a bug in
> >>>>>>>> UEFI :). Turns out it really isn't this time around.
> >>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> >>>>>>> implementation. Where did you look for the bug?
> >>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
> >>>>>> around in U-Boot and even with it "fixed" in grub we would still see
> >>>>>> performance benefits from flushing only parts of the screen, I think
> >>>>>> it's worth living with the grub deficiency.
> >>>>> OK thanks for digging into it. I suggest we add a param to
> >>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
> >>>>> can indicate this and try to be a bit smarter about it.
> >>>> It doesn't know when to sync either. From its point of view, any
> >>>> "console output" could be the last one. There is no API in UEFI that
> >>>> says "please flush console output now".
> >>> Yes, I understand. I was not suggesting we were missing an API. But
> >>> some sort of heuristic would do, e.g. only flush on a newline, flush
> >>> every 50 chars, etc.
> >> I can't think of any heuristic that would reliably work. Relevant for
> >> this conversation, UEFI provides 2 calls:
> >>
> >>     * Write string to screen (efi_cout_output_string)
> >>     * Set text cursor position to X, Y (efi_cout_set_cursor_position)
> >>
> >> It's perfectly legal for a UEFI application to do something like
> >>
> >> efi_cout_set_cursor_position(10, 10);
> >> efi_cout_output_string("f");
> >> efi_cout_output_string("o");
> >> efi_cout_output_string("o") ;
> >>
> >> to update contents of a virtual text box on the screen. Where in this
> >> chain of events would we call video_sync(), but on every call to
> >> efi_cout_output_string()?
> > Actually U-Boot has the same problem, but we have managed to work out something.
>
>
> U-Boot as a code base has a much easier stance: It can add APIs when it
> needs them in places that require them. With UEFI (as well as the U-Boot
> native API), we're stuck with what's there.
>
> I also don't understand what you mean by "we have managed to work out
> something". This patch set is not a UEFI fix - it fixes generic U-Boot
> behavior and speeds up non-UEFI boots as well. The improvement there is
> just not as impressive as with grub :).

We are still not quite on the same page...

U-Boot does have video_sync() but it doesn't know when to call it. If
it does not call it, then any amount of single-threaded code can run
after that, which may update the framebuffer. In other words, U-Boot
is in exactly the same boat as UEFI. It has to decide whether to call
video_sync() based on some sort of heuristic.

That is the only point I am trying to make here. Does that make sense?

>
>
> > I do think it is still to flush the cache on every char. I suspect you
> > will find that even a simple heuristic like I mentioned would be good
> > enough.
> >
> > Also I notice that EFI calls notify? all the time, so U-Boot probably
> > does have the ability to sync the video every 10ms if we wanted to.
>
>
> I fail to see how invalidating the frame buffer for the screen every
> 10ms is any better than doing flush+invalidate operations only on screen
> areas that changed? It's more fragile, more difficult to understand and
> also significantly more expensive given that most of the time very
> little on the screen actually changes.

I am not suggesting it is better, at all. I am just trying to explain
that the U-Boot EFI implementation should not be calling video_sync()
after every character, before or after this series.

>
>
> >
> > It seems from this discussion that we have made great the enemy of the good.
>
>
> I agree. Damage tracking in this patch set is elegant, simple,
> predictable, low overhead and basically just abstracts the video copy
> code path to a generic solution. All while it pretty much solves the
> issue for good. I don't understand what's not to like about it :)

I think it is a really nice feature and I hope it can be applied soon.

Regards,
Simon


More information about the U-Boot mailing list