[PATCH v5 00/13] Add video damage tracking
Alexander Graf
agraf at csgraf.de
Wed Aug 30 21:55:00 CEST 2023
On 29.08.23 11:19, Mark Kettenis wrote:
>> Date: Tue, 29 Aug 2023 08:20:49 +0200
>> From: Alexander Graf <agraf at csgraf.de>
>>
>> On 28.08.23 23:54, Heinrich Schuchardt wrote:
>>> On 8/28/23 22:24, Alexander Graf wrote:
>>>> On 28.08.23 19:54, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>> Hey Simon,
>>>>>>
>>>>>> On 22.08.23 20:56, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf at csgraf.de> wrote:
>>>>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf at csgraf.de>
>>>>>>>>> wrote:
>>>>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf at csgraf.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf at csgraf.de>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
>>>>>>>>>>>>>>> <alpernebiyasak at gmail.com> wrote:
>>>>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking
>>>>>>>>>>>>>>>> series, with
>>>>>>>>>>>>>>>> some tests and other changes. The original cover letter is
>>>>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a
>>>>>>>>>>>>>>>>> factor of 60x.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map
>>>>>>>>>>>>>>>>> it as cached,
>>>>>>>>>>>>>>>>> but need it accessible by the display controller which
>>>>>>>>>>>>>>>>> reads directly
>>>>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the
>>>>>>>>>>>>>>>>> frame buffer to
>>>>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are
>>>>>>>>>>>>>>>>> seeing frame buffers
>>>>>>>>>>>>>>>>> that can take a while to flush out. This was reported by
>>>>>>>>>>>>>>>>> Da Xue with grub,
>>>>>>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw
>>>>>>>>>>>>>>>>> a menu. Every
>>>>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It
>>>>>>>>>>>>>> literally prints
>>>>>>>>>>>>>> over space characters for every character in its menu that it
>>>>>>>>>>>>>> wants
>>>>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle
>>>>>>>>>>>>>> that gets
>>>>>>>>>>>>>> modified. But without this patch set, we're flushing the full
>>>>>>>>>>>>>> DRAM
>>>>>>>>>>>>>> buffer on every u-boot text console character write, which
>>>>>>>>>>>>>> means for
>>>>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text
>>>>>>>>>>>>>> console as
>>>>>>>>>>>>>> well with this patch set, because even "normal" text prints
>>>>>>>>>>>>>> that write
>>>>>>>>>>>>>> for example a single line of text on the screen today flush
>>>>>>>>>>>>>> the full
>>>>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes
>>>>>>>>>>>>> the cache
>>>>>>>>>>>>> after every character. It doesn't do that for normal character
>>>>>>>>>>>>> output
>>>>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>>>>
>>>>>>>>>>>> efi_cout_output_string()
>>>>>>>>>>>> -> fputs()
>>>>>>>>>>>> -> vidconsole_puts()
>>>>>>>>>>>> -> video_sync()
>>>>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately grub abstracts character backends down to the
>>>>>>>>>>>> "print
>>>>>>>>>>>> character" level, so it calls UEFI's sophisticated
>>>>>>>>>>>> "output_string"
>>>>>>>>>>>> callback with single characters at a time, which means we do a
>>>>>>>>>>>> full
>>>>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>>>>
>>>>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch set implements the easiest mitigation against
>>>>>>>>>>>>>>>>> this problem:
>>>>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator
>>>>>>>>>>>>>>>>> region that was
>>>>>>>>>>>>>>>>> touched since the last video_sync() call and only flush
>>>>>>>>>>>>>>>>> that. The most
>>>>>>>>>>>>>>>>> typical writer to the frame buffer is the video console,
>>>>>>>>>>>>>>>>> which always
>>>>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs
>>>>>>>>>>>>>>>>> afterwards.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large
>>>>>>>>>>>>>>>>> grub menu (with
>>>>>>>>>>>>>>>>> serial console attached for size information) on an
>>>>>>>>>>>>>>>>> RK3399-ROC system
>>>>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism,
>>>>>>>>>>>>>>>>> reducing its
>>>>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems
>>>>>>>>>>>>>>>>> should be faster
>>>>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Lazy sync - Sandbox does this. It only calls
>>>>>>>>>>>>>>>>> video_sync(true) ever
>>>>>>>>>>>>>>>>> so often. We are missing timers to do this
>>>>>>>>>>>>>>>>> generically.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) Double buffering - We could try to identify
>>>>>>>>>>>>>>>>> whether anything changed
>>>>>>>>>>>>>>>>> at all and only draw to the FB if it did. That
>>>>>>>>>>>>>>>>> would require
>>>>>>>>>>>>>>>>> maintaining a second buffer that we need to
>>>>>>>>>>>>>>>>> scan.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 3) Text buffer - Maintain a buffer of all text
>>>>>>>>>>>>>>>>> printed on the screen with
>>>>>>>>>>>>>>>>> respective location. Don't write if the old and
>>>>>>>>>>>>>>>>> new character are
>>>>>>>>>>>>>>>>> identical. This would limit applicability to
>>>>>>>>>>>>>>>>> text only and is an
>>>>>>>>>>>>>>>>> optimization on top of this patch set.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 4) Hash screen lines - Create a hash (sha256?)
>>>>>>>>>>>>>>>>> over every line when it
>>>>>>>>>>>>>>>>> changes. Only flush when it does. I'm not sure
>>>>>>>>>>>>>>>>> if this would waste
>>>>>>>>>>>>>>>>> more time, memory and cache than the current
>>>>>>>>>>>>>>>>> approach. It would make
>>>>>>>>>>>>>>>>> full screen updates much more expensive.
>>>>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check
>>>>>>>>>>>>>>>> into a function"
>>>>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame
>>>>>>>>>>>>>>>> buffer contents"
>>>>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware
>>>>>>>>>>>>>>>> frame buffer"
>>>>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv
>>>>>>>>>>>>>>>> comment
>>>>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>>>>> - Use fontdata->height/width instead of
>>>>>>>>>>>>>>>> VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass
>>>>>>>>>>>>>>>> video tests
>>>>>>>>>>>>>>>> - Remove mention about not having minimal damage for
>>>>>>>>>>>>>>>> console_rotate.c
>>>>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via
>>>>>>>>>>>>>>>> vidconsole"
>>>>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(),
>>>>>>>>>>>>>>>> video_fill_part()
>>>>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of
>>>>>>>>>>>>>>>> selecting it
>>>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v4:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage
>>>>>>>>>>>>>>>> tracking API"
>>>>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that
>>>>>>>>>>>>>>>> need it"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v3:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v2:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v1:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>>>> dm: video: Add damage tracking API
>>>>>>>>>>>>>>>> dm: video: Add damage notification on display fills
>>>>>>>>>>>>>>>> vidconsole: Add damage notifications to all
>>>>>>>>>>>>>>>> vidconsole drivers
>>>>>>>>>>>>>>>> video: Add damage notification on bmp display
>>>>>>>>>>>>>>>> efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>>>> video: Only dcache flush damaged lines
>>>>>>>>>>>>>>>> video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>>>> video: Always compile cache flushing code
>>>>>>>>>>>>>>>> video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>>>> video: test: Split copy frame buffer check into a
>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>> video: test: Support checking copy frame buffer
>>>>>>>>>>>>>>>> contents
>>>>>>>>>>>>>>>> video: test: Test partial updates of hardware frame
>>>>>>>>>>>>>>>> buffer
>>>>>>>>>>>>>>>> video: test: Test video damage tracking via
>>>>>>>>>>>>>>>> vidconsole
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> arch/arm/mach-omap2/omap3/Kconfig | 1 +
>>>>>>>>>>>>>>>> arch/arm/mach-sunxi/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/Kconfig | 26 +++
>>>>>>>>>>>>>>>> drivers/video/console_normal.c | 27 ++--
>>>>>>>>>>>>>>>> drivers/video/console_rotate.c | 94 +++++++----
>>>>>>>>>>>>>>>> drivers/video/console_truetype.c | 37 +++--
>>>>>>>>>>>>>>>> drivers/video/exynos/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/imx/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/meson/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/rockchip/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/stm32/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/tegra20/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/tidss/Kconfig | 1 +
>>>>>>>>>>>>>>>> drivers/video/vidconsole-uclass.c | 16 --
>>>>>>>>>>>>>>>> drivers/video/video-uclass.c | 190
>>>>>>>>>>>>>>>> ++++++++++++----------
>>>>>>>>>>>>>>>> drivers/video/video_bmp.c | 7 +-
>>>>>>>>>>>>>>>> include/video.h | 59 +++----
>>>>>>>>>>>>>>>> include/video_console.h | 52 ------
>>>>>>>>>>>>>>>> lib/efi_loader/efi_gop.c | 7 +
>>>>>>>>>>>>>>>> test/dm/video.c | 256
>>>>>>>>>>>>>>>> ++++++++++++++++++++++++------
>>>>>>>>>>>>>>>> 20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>>>>> It is good to see this tidied up into something that can be
>>>>>>>>>>>>>>> applied!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am unsure what is going on with the EFI performance,
>>>>>>>>>>>>>>> though. It
>>>>>>>>>>>>>>> should not flush the cache after every character, only after
>>>>>>>>>>>>>>> a new
>>>>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix
>>>>>>>>>>>>>>> that bug
>>>>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>>>>> Before I came up with this series, I was trying to identify
>>>>>>>>>>>>>> the UEFI bug
>>>>>>>>>>>>>> in question as well, because intuition told me surely this is
>>>>>>>>>>>>>> a bug in
>>>>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>>>>> The "real" bug is in grub. But given that it's reasonably
>>>>>>>>>>>> simple to work
>>>>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would
>>>>>>>>>>>> still see
>>>>>>>>>>>> performance benefits from flushing only parts of the screen, I
>>>>>>>>>>>> think
>>>>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the
>>>>>>>>>>> EFI code
>>>>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>>>>> "console output" could be the last one. There is no API in UEFI
>>>>>>>>>> that
>>>>>>>>>> says "please flush console output now".
>>>>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>>>>> some sort of heuristic would do, e.g. only flush on a newline,
>>>>>>>>> flush
>>>>>>>>> every 50 chars, etc.
>>>>>>>> I can't think of any heuristic that would reliably work. Relevant
>>>>>>>> for
>>>>>>>> this conversation, UEFI provides 2 calls:
>>>>>>>>
>>>>>>>> * Write string to screen (efi_cout_output_string)
>>>>>>>> * Set text cursor position to X, Y
>>>>>>>> (efi_cout_set_cursor_position)
>>>>>>>>
>>>>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>>>>
>>>>>>>> efi_cout_set_cursor_position(10, 10);
>>>>>>>> efi_cout_output_string("f");
>>>>>>>> efi_cout_output_string("o");
>>>>>>>> efi_cout_output_string("o");
>>>>>>>>
>>>>>>>> to update contents of a virtual text box on the screen. Where in
>>>>>>>> this
>>>>>>>> chain of events would we call video_sync(), but on every call to
>>>>>>>> efi_cout_output_string()?
>>>>>>> Actually U-Boot has the same problem, but we have managed to work
>>>>>>> out something.
>>>>>> U-Boot as a code base has a much easier stance: It can add APIs
>>>>>> when it
>>>>>> needs them in places that require them. With UEFI (as well as the
>>>>>> U-Boot
>>>>>> native API), we're stuck with what's there.
>>>>>>
>>>>>> I also don't understand what you mean by "we have managed to work out
>>>>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>>>>> behavior and speeds up non-UEFI boots as well. The improvement
>>>>>> there is
>>>>>> just not as impressive as with grub :).
>>>>> We are still not quite on the same page...
>>>>>
>>>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>>>> it does not call it, then any amount of single-threaded code can run
>>>>> after that, which may update the framebuffer. In other words, U-Boot
>>>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>>>> video_sync() based on some sort of heuristic.
>>>>>
>>>>> That is the only point I am trying to make here. Does that make sense?
>>>>
>>>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>>>> "has it solved already". I agree - it's in the same boat and the only
>>>> safe thing it can really do today that is fully cross-platform
>>>> compatible is to call video_sync() after every character.
>>>>
>>>> I don't understand what you mean by "any amount of single-threaded code
>>>> can run after that, which may update the framebuffer". Any framebuffer
>>>> modification is U-Boot internal code which then again can apply
>>>> video_sync() to tell the system "I want what I wrote to screen actually
>>>> be on screen now". I don't think that's necessarily bad design. A bit
>>>> clunky, but we're in a pre-boot environment after all.
>>>>
>>>> Since we're aligned now: What exactly did you refer to with "but we have
>>>> managed to work out something"?
>>> Should we set PixelBltOnly to indicate to UEFI applications that they
>>> are not allowed to directly write to the framebuffer but always have to
>>> use BitBlt? GRUB seems to be using a shadow buffer by default which it
>>> copies via BitBlt.
>>
>> If we do that, OSs will no longer be able to carry the frame buffer
>> address over and continue to use it to draw on the screen natively
>> (like Linux's efifb).
>>
>> So no, I don't think we should indicate PixelBltOnly. The frame buffer
>> is usually available to applications, you just need to adhere to the
>> architecture's caching constraints.
> Right. That would probably kill any reasonable way we can have an
> early framebuffer console in most OSes after ExitBootServices() has
> been called.
>
> I'm late to the game but isn't the real solution to have U-Boot map
> the framebuffer in a cache-coherent way? This is what typically
> happens on x86 where VRAM is mapped as "write-combining". That is,
> uncached but going through the store buffer to speed up writes.
> Reading from the framebuffer will be slow in that case (which probably
> is the real reason why grub uses a shadow framebuffer). So U-Boot
> still needs some cleverness to make sure it only ever writes to the
> framebuffer.
Yeah, those are the 2 options: a write-back (WB) mapping plus flush, or a
write-combining (WC) mapping plus a shadow buffer for reads. I don't really
see much benefit in doing the latter over the former: It occupies more
valuable RAM and adds additional complexity on
FB reads. I believe the main reason x86 went that route was that it had
no choice: It didn't have a cache line flush instruction for a long time.
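
For reference, the WB plus flush route with damage tracking can be sketched
roughly as below. This is only an illustration of the idea from the cover
letter, not the code in the series: the struct and function names
(sketch_fb, sketch_damage, sketch_sync) are made up, the xstart/ystart/
xend/yend field names follow the v5 changelog, and the byte counter stands
in for the real flush_dcache_range() call. It assumes whole lines are
flushed, as in "video: Only dcache flush damaged lines".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical damage rectangle; xend/yend are exclusive, xend == 0 means empty. */
struct sketch_damage_rect {
	int xstart, ystart, xend, yend;
};

/* Hypothetical frame buffer state, loosely modelled on struct video_priv. */
struct sketch_fb {
	int xsize, ysize;              /* resolution in pixels */
	int bpp;                       /* bytes per pixel (assumption) */
	struct sketch_damage_rect dmg; /* region touched since the last sync */
};

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

/* Grow the pending damage to the bounding box of the old damage and the new rectangle. */
void sketch_damage(struct sketch_fb *fb, int x, int y, int w, int h)
{
	int xend = min_i(x + w, fb->xsize);
	int yend = min_i(y + h, fb->ysize);

	assert(x >= 0 && y >= 0 && w > 0 && h > 0);
	if (x >= xend || y >= yend)
		return; /* entirely off screen, nothing to track */

	if (fb->dmg.xend == 0) { /* first damage since the last sync */
		fb->dmg = (struct sketch_damage_rect){ x, y, xend, yend };
		return;
	}
	fb->dmg.xstart = min_i(fb->dmg.xstart, x);
	fb->dmg.ystart = min_i(fb->dmg.ystart, y);
	fb->dmg.xend = max_i(fb->dmg.xend, xend);
	fb->dmg.yend = max_i(fb->dmg.yend, yend);
}

/*
 * Return how many bytes a sync would flush (whole damaged lines only) and
 * clear the damage. A real implementation would call flush_dcache_range()
 * on that range instead of just counting.
 */
size_t sketch_sync(struct sketch_fb *fb)
{
	size_t line_len = (size_t)fb->xsize * fb->bpp;
	size_t bytes = 0;

	if (fb->dmg.xend != 0)
		bytes = (size_t)(fb->dmg.yend - fb->dmg.ystart) * line_len;
	fb->dmg = (struct sketch_damage_rect){ 0, 0, 0, 0 };
	return bytes;
}

/* Bytes flushed by a sync that has no damage information. */
size_t sketch_full_size(const struct sketch_fb *fb)
{
	return (size_t)fb->xsize * fb->ysize * fb->bpp;
}
```

Assuming a 2560x1440 buffer at 4 bytes per pixel and an 8x16 glyph, one
character damages 16 lines, so the sync covers 163840 bytes rather than the
full 14745600. That is the per-character saving when grub calls
output_string one character at a time.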
Alex