[PATCH 0/6] Add video damage tracking

Alexander Graf agraf at csgraf.de
Thu Jun 9 21:04:37 CEST 2022


On 07.06.22 10:28, Heinrich Schuchardt wrote:
> On 6/7/22 01:43, Alexander Graf wrote:
>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>
>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>> but need it accessible by the display controller which reads directly
>> from a later point of consistency. Hence, we flush the frame buffer to
>> DRAM on every change. The full frame buffer.
>
> Isn't a similar problem already solved by CONFIG_VIDEO_COPY?
>
> Leaving the frame buffer uncached would convert the ARM problem into the
> X86 case?


It solves a similar problem, yes. However, it requires us to allocate 
the frame buffer size twice, and we would need to dynamically toggle the 
MMU mappings of the frame buffer to WC instead of cached. That's code we 
don't have today.

VIDEO_COPY is also terribly inefficient in the most common case: Drawing 
one or multiple characters. It basically copies every line that contains 
the character, for every character printed. The damage code in this 
patch set only flushes the relevant rectangles after a string is fully 
printed.

I think overall, damage tracking with cached memory is simple enough 
that it gives us the best of all worlds.


>
>>
>> Unfortunately, with the advent of 4k displays, we are seeing frame 
>> buffers
>> that can take a while to flush out. This was reported by Da Xue with 
>> grub,
>> which happily print 1000s of spaces on the screen to draw a menu. Every
>> printed space triggers a cache flush.
>>
>> This patch set implements the easiest mitigation against this problem:
>> Damage tracking. We remember the lowest common denominator region 
>> that was
>> touched since the last video_sync() call and only flush that.
>
> If by "lowest common denominator region" you should mean a rectangle,
> drawing a point in the upper left corner and another in the lower right
> corner would require a full flush. So nothing gained in this case.


Glad you asked! :)

While theoretically possible, this is a case that just never happens in 
U-Boot's code flow. All code that draws to the screen is either blt 
based (like gop, character drawing or logo display) or moves large 
portions of the screen (scrolling). The largest granularity we have 
between syncs is when printing strings. So the worst case you'll have 
today is a wrap around where you'd end up flushing full lines.


>
>>
>> With this patch set applied, we reduce drawing a large grub menu (with
>> serial console attached for size information) on an RK3399-ROC system
>> at 1440p from 55 seconds to less than 1 second.
>>
>>
>> Alternatives considered:
>>
>>    1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>       so often. We are missing timers to do this generically.
>>
>>    2) Double buffering - We could try to identify whether anything 
>> changed
>>       at all and only draw to the FB if it did. That would require
>>       maintaining a second buffer that we need to scan.
>>
>>    3) Text buffer - Maintain a buffer of all text printed on the 
>> screen with
>>       respective location. Don't write if the old and new character are
>>       identical. This would limit applicability to text only and is an
>>       optimization on top of this patch set.
>>
>>    4) Hash screen lines - Create a hash (sha256?) over every line 
>> when it
>>       changes. Only flush when it does. I'm not sure if this would waste
>>       more time, memory and cache than the current approach. It would 
>> make
>>       full screen updates much more expensive.
>>
>> Alexander Graf (6):
>>    dm: video: Add damage tracking API
>>    dm: video: Add damage notification on display clear
>>    vidconsole: Add damage notifications to all vidconsole drivers
>>    video: Add damage notification on bmp display
>>    efi_loader: GOP: Add damage notification on BLT
>>    video: Only dcache flush damaged lines
>
> We need documentation describing the difference between
> CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.


Hm, maybe we should implement CONFIG_VIDEO_COPY as a flush mechanism 
behind CONFIG_VIDEO_DAMAGE? That way we only have a single code path for 
producers left and in addition also optimize drawing individual 
characters. It would also make the feature useful beyond ARM dcache 
flushing.


Alex




More information about the U-Boot mailing list