[PATCH v5 00/13] Add video damage tracking
Alexander Graf
agraf at csgraf.de
Tue Aug 29 08:20:49 CEST 2023
On 28.08.23 23:54, Heinrich Schuchardt wrote:
> On 8/28/23 22:24, Alexander Graf wrote:
>>
>> On 28.08.23 19:54, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf at csgraf.de> wrote:
>>>> Hey Simon,
>>>>
>>>> On 22.08.23 20:56, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf at csgraf.de>
>>>>>>> wrote:
>>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf at csgraf.de>
>>>>>>>>> wrote:
>>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf at csgraf.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
>>>>>>>>>>>>> <alpernebiyasak at gmail.com> wrote:
>>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking
>>>>>>>>>>>>>> series, with
>>>>>>>>>>>>>> some tests and other changes. The original cover letter is
>>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a
>>>>>>>>>>>>>>> factor of 60x.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map
>>>>>>>>>>>>>>> it as cached,
>>>>>>>>>>>>>>> but need it accessible by the display controller which
>>>>>>>>>>>>>>> reads directly
>>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the
>>>>>>>>>>>>>>> frame buffer to
>>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are
>>>>>>>>>>>>>>> seeing frame buffers
>>>>>>>>>>>>>>> that can take a while to flush out. This was reported by
>>>>>>>>>>>>>>> Da Xue with grub,
>>>>>>>>>>>>>>> which happily print 1000s of spaces on the screen to draw
>>>>>>>>>>>>>>> a menu. Every
>>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It
>>>>>>>>>>>> literally prints
>>>>>>>>>>>> over space characters for every character in its menu that it
>>>>>>>>>>>> wants
>>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>>
>>>>>>>>>>>> This wouldn't be a big issue if we only flush the reactangle
>>>>>>>>>>>> that gets
>>>>>>>>>>>> modified. But without this patch set, we're flushing the full
>>>>>>>>>>>> DRAM
>>>>>>>>>>>> buffer on every u-boot text console character write, which
>>>>>>>>>>>> means for
>>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>>
>>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text
>>>>>>>>>>>> console as
>>>>>>>>>>>> well with this patch set, because even "normal" text prints
>>>>>>>>>>>> that write
>>>>>>>>>>>> for example a single line of text on the screen today flush
>>>>>>>>>>>> the full
>>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes
>>>>>>>>>>> the cache
>>>>>>>>>>> after every character. It doesn't do that for normal character
>>>>>>>>>>> output
>>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>>
>>>>>>>>>> efi_cout_output_string()
>>>>>>>>>> -> fputs()
>>>>>>>>>> -> vidconsole_puts()
>>>>>>>>>> -> video_sync()
>>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>>
>>>>>>>>>> Unfortunately grub abstracts character backends down to the
>>>>>>>>>> "print
>>>>>>>>>> character" level, so it calls UEFI's sopisticated
>>>>>>>>>> "output_string"
>>>>>>>>>> callback with single characters at a time, which means we do a
>>>>>>>>>> full
>>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>>
>>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set implements the easiest mitigation against
>>>>>>>>>>>>>>> this problem:
>>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator
>>>>>>>>>>>>>>> region that was
>>>>>>>>>>>>>>> touched since the last video_sync() call and only flush
>>>>>>>>>>>>>>> that. The most
>>>>>>>>>>>>>>> typical writer to the frame buffer is the video console,
>>>>>>>>>>>>>>> which always
>>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs
>>>>>>>>>>>>>>> afterwards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large
>>>>>>>>>>>>>>> grub menu (with
>>>>>>>>>>>>>>> serial console attached for size information) on an
>>>>>>>>>>>>>>> RK3399-ROC system
>>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism,
>>>>>>>>>>>>>>> reducing its
>>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems
>>>>>>>>>>>>>>> should be faster
>>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Lazy sync - Sandbox does this. It only calls
>>>>>>>>>>>>>>> video_sync(true) ever
>>>>>>>>>>>>>>> so often. We are missing timers to do this
>>>>>>>>>>>>>>> generically.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Double buffering - We could try to identify
>>>>>>>>>>>>>>> whether anything changed
>>>>>>>>>>>>>>> at all and only draw to the FB if it did. That
>>>>>>>>>>>>>>> would require
>>>>>>>>>>>>>>> maintaining a second buffer that we need to
>>>>>>>>>>>>>>> scan.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) Text buffer - Maintain a buffer of all text
>>>>>>>>>>>>>>> printed on the screen with
>>>>>>>>>>>>>>> respective location. Don't write if the old and
>>>>>>>>>>>>>>> new character are
>>>>>>>>>>>>>>> identical. This would limit applicability to
>>>>>>>>>>>>>>> text only and is an
>>>>>>>>>>>>>>> optimization on top of this patch set.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4) Hash screen lines - Create a hash (sha256?)
>>>>>>>>>>>>>>> over every line when it
>>>>>>>>>>>>>>> changes. Only flush when it does. I'm not sure
>>>>>>>>>>>>>>> if this would waste
>>>>>>>>>>>>>>> more time, memory and cache than the current
>>>>>>>>>>>>>>> approach. It would make
>>>>>>>>>>>>>>> full screen updates much more expensive.
>>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check
>>>>>>>>>>>>>> into a function"
>>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame
>>>>>>>>>>>>>> buffer contents"
>>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware
>>>>>>>>>>>>>> frame buffer"
>>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv
>>>>>>>>>>>>>> comment
>>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>>> - Use fontdata->height/width instead of
>>>>>>>>>>>>>> VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass
>>>>>>>>>>>>>> video tests
>>>>>>>>>>>>>> - Remove mention about not having minimal damage for
>>>>>>>>>>>>>> console_rotate.c
>>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via
>>>>>>>>>>>>>> vidconsole"
>>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(),
>>>>>>>>>>>>>> video_fill_part()
>>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of
>>>>>>>>>>>>>> selecting it
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v4:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage
>>>>>>>>>>>>>> tracking API"
>>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that
>>>>>>>>>>>>>> need it"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v3:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v1:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>> dm: video: Add damage tracking API
>>>>>>>>>>>>>> dm: video: Add damage notification on display fills
>>>>>>>>>>>>>> vidconsole: Add damage notifications to all
>>>>>>>>>>>>>> vidconsole drivers
>>>>>>>>>>>>>> video: Add damage notification on bmp display
>>>>>>>>>>>>>> efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>> video: Only dcache flush damaged lines
>>>>>>>>>>>>>> video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>> video: Always compile cache flushing code
>>>>>>>>>>>>>> video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>> video: test: Split copy frame buffer check into a
>>>>>>>>>>>>>> function
>>>>>>>>>>>>>> video: test: Support checking copy frame buffer
>>>>>>>>>>>>>> contents
>>>>>>>>>>>>>> video: test: Test partial updates of hardware frame
>>>>>>>>>>>>>> buffer
>>>>>>>>>>>>>> video: test: Test video damage tracking via
>>>>>>>>>>>>>> vidconsole
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> arch/arm/mach-omap2/omap3/Kconfig | 1 +
>>>>>>>>>>>>>> arch/arm/mach-sunxi/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/Kconfig | 26 +++
>>>>>>>>>>>>>> drivers/video/console_normal.c | 27 ++--
>>>>>>>>>>>>>> drivers/video/console_rotate.c | 94 +++++++----
>>>>>>>>>>>>>> drivers/video/console_truetype.c | 37 +++--
>>>>>>>>>>>>>> drivers/video/exynos/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/imx/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/meson/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/rockchip/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/stm32/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/tegra20/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/tidss/Kconfig | 1 +
>>>>>>>>>>>>>> drivers/video/vidconsole-uclass.c | 16 --
>>>>>>>>>>>>>> drivers/video/video-uclass.c | 190
>>>>>>>>>>>>>> ++++++++++++----------
>>>>>>>>>>>>>> drivers/video/video_bmp.c | 7 +-
>>>>>>>>>>>>>> include/video.h | 59 +++----
>>>>>>>>>>>>>> include/video_console.h | 52 ------
>>>>>>>>>>>>>> lib/efi_loader/efi_gop.c | 7 +
>>>>>>>>>>>>>> test/dm/video.c | 256
>>>>>>>>>>>>>> ++++++++++++++++++++++++------
>>>>>>>>>>>>>> 20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>>> It is good to see this tidied up into something that can be
>>>>>>>>>>>>> applied!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am unsure what is going on with the EFI performance,
>>>>>>>>>>>>> though. It
>>>>>>>>>>>>> should not flush the cache after every character, only after
>>>>>>>>>>>>> a new
>>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix
>>>>>>>>>>>>> that bug
>>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>>> Before I came up with this series, I was trying to identify
>>>>>>>>>>>> the UEFI bug
>>>>>>>>>>>> in question as well, because intuition told me surely this is
>>>>>>>>>>>> a bug in
>>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>>> The "real" bug is in grub. But given that it's reasonably
>>>>>>>>>> simple to work
>>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would
>>>>>>>>>> still see
>>>>>>>>>> performance benefits from flushing only parts of the screen, I
>>>>>>>>>> think
>>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the
>>>>>>>>> EFI code
>>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>>> "console output" could be the last one. There is no API in UEFI
>>>>>>>> that
>>>>>>>> says "please flush console output now".
>>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>>> some sort of heuristic would do, e.g. only flush on a newline,
>>>>>>> flush
>>>>>>> every 50 chars, etc.
>>>>>> I can't think of any heuristic that would reliably work. Relevant
>>>>>> for
>>>>>> this conversation, UEFI provides 2 calls:
>>>>>>
>>>>>> * Write string to screen (efi_cout_output_string)
>>>>>> * Set text cursor position to X, Y
>>>>>> (efi_cout_set_cursor_position)
>>>>>>
>>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>>
>>>>>> efi_cout_set_cursor_position(10, 10);
>>>>>> efi_cout_output_string("f");
>>>>>> efi_cout_output_string("o");
>>>>>> efi_cout_output_string("o") ;
>>>>>>
>>>>>> to update contents of a virtual text box on the screen. Where in
>>>>>> this
>>>>>> chain of events would we call video_sync(), but on every call to
>>>>>> efi_cout_output_string()?
>>>>> Actually U-Boot has the same problem, but we have managed to work
>>>>> out something.
>>>>
>>>> U-Boot as a code base has a much easier stance: It can add APIs
>>>> when it
>>>> needs them in places that require them. With UEFI (as well as the
>>>> U-Boot
>>>> native API), we're stuck with what's there.
>>>>
>>>> I also don't understand what you mean by "we have managed to work out
>>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>>> behavior and speeds up non-UEFI boots as well. The improvement
>>>> there is
>>>> just not as impressive as with grub :).
>>> We are still not quite on the same page...
>>>
>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>> it does not call it, then any amount of single-threaded code can run
>>> after that, which may update the framebuffer. In other words, U-Boot
>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>> video_sync() based on some sort of heuristic.
>>>
>>> That is the only point I am trying to make here. Does that make sense?
>>
>>
>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>> "has it solved already". I agree - it's in the same boat and the only
>> safe thing it can really do today that is fully cross-platform
>> compatible is to call video_sync() after every character.
>>
>> I don't understand what you mean by "any amount of single-threaded code
>> can run after that, which may update the framebuffer". Any framebuffer
>> modification is U-Boot internal code which then again can apply
>> video_sync() to tell the system "I want what I wrote to screen actually
>> be on screen now". I don't think that's necessarily bad design. A bit
>> clunky, but we're in a pre-boot environment after all.
>>
>> Since we're aligned now: What exactly did you refer to with "but we have
>> managed to work out something"?
>
> Should we set PixelBltOnly to indicate to UEFI applications that they
> are not allowed to directly write to the framebuffer but always have to
> use BitBlt? GRUB seems to be using a shadow buffer by default which it
> copies via BitBlt.
If we do that, OSs will no longer be able to carry the frame buffer
address over and continue to use it with to draw on the screen natively
(like Linux's efifb).
So no, I don't think we should indicate PixelBltOnly. The frame buffer
is usually available to applications, you just need to adhere to the
architecture's caching constraints.
Alex
More information about the U-Boot
mailing list