[PATCH v5 00/13] Add video damage tracking

Alexander Graf agraf at csgraf.de
Tue Aug 29 08:27:09 CEST 2023


Hi Simon,

On 29.08.23 00:08, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 28 Aug 2023 at 14:24, Alexander Graf <agraf at csgraf.de> wrote:
>>
>> On 28.08.23 19:54, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf at csgraf.de> wrote:
>>>> Hey Simon,
>>>>
>>>> On 22.08.23 20:56, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak at gmail.com> wrote:
>>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>>>>>>>>>>>> some tests and other changes. The original cover letter is as follows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>>>>>>>>>>>> but need it accessible by the display controller which reads directly
>>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>>>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>>>>>>>>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
>>>>>>>>>>>> over space characters for every character in its menu that it wants
>>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>>
>>>>>>>>>>>> This wouldn't be a big issue if we only flush the reactangle that gets
>>>>>>>>>>>> modified. But without this patch set, we're flushing the full DRAM
>>>>>>>>>>>> buffer on every u-boot text console character write, which means for
>>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>>
>>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
>>>>>>>>>>>> well with this patch set, because even "normal" text prints that write
>>>>>>>>>>>> for example a single line of text on the screen today flush the full
>>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
>>>>>>>>>>> after every character. It doesn't do that for normal character output
>>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>>
>>>>>>>>>> efi_cout_output_string()
>>>>>>>>>> -> fputs()
>>>>>>>>>> -> vidconsole_puts()
>>>>>>>>>> -> video_sync()
>>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>>
>>>>>>>>>> Unfortunately grub abstracts character backends down to the "print
>>>>>>>>>> character" level, so it calls UEFI's sopisticated "output_string"
>>>>>>>>>> callback with single characters at a time, which means we do a full
>>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>>
>>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
>>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>>>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
>>>>>>>>>>>>>>> typical writer to the frame buffer is the video console, which always
>>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>>>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>>>>>>>>>>>>>             so often. We are missing timers to do this generically.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          2) Double buffering - We could try to identify whether anything changed
>>>>>>>>>>>>>>>             at all and only draw to the FB if it did. That would require
>>>>>>>>>>>>>>>             maintaining a second buffer that we need to scan.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>>>>>>>>>>>             respective location. Don't write if the old and new character are
>>>>>>>>>>>>>>>             identical. This would limit applicability to text only and is an
>>>>>>>>>>>>>>>             optimization on top of this patch set.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>>>>>>>>>>>             changes. Only flush when it does. I'm not sure if this would waste
>>>>>>>>>>>>>>>             more time, memory and cache than the current approach. It would make
>>>>>>>>>>>>>>>             full screen updates much more expensive.
>>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv comment
>>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>>>>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>>          dm: video: Add damage tracking API
>>>>>>>>>>>>>>          dm: video: Add damage notification on display fills
>>>>>>>>>>>>>>          vidconsole: Add damage notifications to all vidconsole drivers
>>>>>>>>>>>>>>          video: Add damage notification on bmp display
>>>>>>>>>>>>>>          efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>>          video: Only dcache flush damaged lines
>>>>>>>>>>>>>>          video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>>          video: Always compile cache flushing code
>>>>>>>>>>>>>>          video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>>          video: test: Split copy frame buffer check into a function
>>>>>>>>>>>>>>          video: test: Support checking copy frame buffer contents
>>>>>>>>>>>>>>          video: test: Test partial updates of hardware frame buffer
>>>>>>>>>>>>>>          video: test: Test video damage tracking via vidconsole
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>>>>>         arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/Kconfig             |  26 +++
>>>>>>>>>>>>>>         drivers/video/console_normal.c    |  27 ++--
>>>>>>>>>>>>>>         drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>>>>>>>>>         drivers/video/console_truetype.c  |  37 +++--
>>>>>>>>>>>>>>         drivers/video/exynos/Kconfig      |   1 +
>>>>>>>>>>>>>>         drivers/video/imx/Kconfig         |   1 +
>>>>>>>>>>>>>>         drivers/video/meson/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/rockchip/Kconfig    |   1 +
>>>>>>>>>>>>>>         drivers/video/stm32/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/tegra20/Kconfig     |   1 +
>>>>>>>>>>>>>>         drivers/video/tidss/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>>>>>         drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>>>>>>>>>>>         drivers/video/video_bmp.c         |   7 +-
>>>>>>>>>>>>>>         include/video.h                   |  59 +++----
>>>>>>>>>>>>>>         include/video_console.h           |  52 ------
>>>>>>>>>>>>>>         lib/efi_loader/efi_gop.c          |   7 +
>>>>>>>>>>>>>>         test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>>>>>>>>>>>         20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>>> It is good to see this tidied up into something that can be applied!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>>>>>>>>>> should not flush the cache after every character, only after a new
>>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>>>>>>>>>> in question as well, because intuition told me surely this is a bug in
>>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
>>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would still see
>>>>>>>>>> performance benefits from flushing only parts of the screen, I think
>>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
>>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>>> "console output" could be the last one. There is no API in UEFI that
>>>>>>>> says "please flush console output now".
>>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>>> some sort of heuristic would do, e.g. only flush on a newline, flush
>>>>>>> every 50 chars, etc.
>>>>>> I can't think of any heuristic that would reliably work. Relevant for
>>>>>> this conversation, UEFI provides 2 calls:
>>>>>>
>>>>>>       * Write string to screen (efi_cout_output_string)
>>>>>>       * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>>>>>
>>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>>
>>>>>> efi_cout_set_cursor_position(10, 10);
>>>>>> efi_cout_output_string("f");
>>>>>> efi_cout_output_string("o");
>>>>>> efi_cout_output_string("o") ;
>>>>>>
>>>>>> to update contents of a virtual text box on the screen. Where in this
>>>>>> chain of events would we call video_sync(), but on every call to
>>>>>> efi_cout_output_string()?
>>>>> Actually U-Boot has the same problem, but we have managed to work out something.
>>>> U-Boot as a code base has a much easier stance: It can add APIs when it
>>>> needs them in places that require them. With UEFI (as well as the U-Boot
>>>> native API), we're stuck with what's there.
>>>>
>>>> I also don't understand what you mean by "we have managed to work out
>>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>>> behavior and speeds up non-UEFI boots as well. The improvement there is
>>>> just not as impressive as with grub :).
>>> We are still not quite on the same page...
>>>
>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>> it does not call it, then any amount of single-threaded code can run
>>> after that, which may update the framebuffer. In other words, U-Boot
>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>> video_sync() based on some sort of heuristic.
>>>
>>> That is the only point I am trying to make here. Does that make sense?
>>
>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>> "has it solved already". I agree - it's in the same boat and the only
>> safe thing it can really do today that is fully cross-platform
>> compatible is to call video_sync() after every character.
>>
>> I don't understand what you mean by "any amount of single-threaded code
>> can run after that, which may update the framebuffer". Any framebuffer
>> modification is U-Boot internal code which then again can apply
>> video_sync() to tell the system "I want what I wrote to screen actually
>> be on screen now". I don't think that's necessarily bad design. A bit
>> clunky, but we're in a pre-boot environment after all.
>>
>> Since we're aligned now: What exactly did you refer to with "but we have
>> managed to work out something"?
> So now we are on the same page about that point. The next step is my
> heuristic point:
>
> vidconsole_putc_xy() - does not call video_sync()
> vidconsole_newline() - does
>
> I am simply suggesting that UEFI should do the same thing.


I think the better analogy is with

vidconsole_puts() - does

and that's exactly the function that the UEFI code calls. The UEFI 
interface is "write this long string to screen". All the UEFI code does 
is call vidconsole_puts() to do that which then (rightfully) calls 
video_sync().

The reason we flush after every character with grub is grub: Grub abuses 
the "write long string to screen" function and instead only writes a 
single character on each call, which then means we flush on every 
character write.


>
>>
>>>>> I do think it is still to flush the cache on every char. I suspect you
>>>>> will find that even a simple heuristic like I mentioned would be good
>>>>> enough.
>>>>>
>>>>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>>>>> does have the ability to sync the video every 10ms if we wanted to.
>>>> I fail to see how invalidating the frame buffer for the screen every
>>>> 10ms is any better than doing flush+invalidate operations only on screen
>>>> areas that changed? It's more fragile, more difficult to understand and
>>>> also significantly more expensive given that most of the time very
>>>> little on the screen actually changes.
>>> I am not suggesting it is better, at all. I am just trying to explain
>>> that the U-Boot EFI implementation should not be calling video_sync()
>>> after every character, before or after this series.
>>
>> Ah, it doesn't :). It just calls the normal U-Boot "write character on
>> screen" function. What that does is up to U-Boot internals - and those
>> basically today dictate that something needs to call video_sync() to
>> make FB modifications actually pop up on screen.
> Hmmm, so what function is it calling, then?  I think we are getting
> closer to the 'fix' I am trying to tease out.


It literally calls vidconsole_puts():

https://github.com/u-boot/u-boot/blob/master/lib/efi_loader/efi_console.c#L185


Alex




More information about the U-Boot mailing list