[U-Boot] [PATCH 11/15] efi_loader: capitalization table

Mike FABIAN maiku.fabian at gmail.com
Mon Aug 27 08:30:28 UTC 2018


Alexander Graf <agraf at suse.de> wrote:

> On 26.08.18 21:00, Heinrich Schuchardt wrote:
>> On 08/26/2018 08:22 PM, Alexander Graf wrote:
>>>
>>>
>>> On 11.08.18 17:28, Heinrich Schuchardt wrote:
>>>> This patch provides a define to initialize a table that maps lower to
>>>> capital letters for Unicode code points 0x0000 - 0xffff.
>>>>
>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>> ---
>>>>  MAINTAINERS              |    1 +
>>>>  include/capitalization.h | 1909 ++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 1910 insertions(+)
>>>>  create mode 100644 include/capitalization.h
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index a324139471..0a543309f2 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -368,6 +368,7 @@ F:	doc/DocBook/efi.tmpl
>>>>  F:	doc/README.uefi
>>>>  F:	doc/README.iscsi
>>>>  F:	Documentation/efi.rst
>>>> +F:	include/capitalization.h
>>>>  F:	include/efi*
>>>>  F:	include/pe.h
>>>>  F:	include/asm-generic/pe.h
>>>> diff --git a/include/capitalization.h b/include/capitalization.h
>>>> new file mode 100644
>>>> index 0000000000..50d5108f98
>>>> --- /dev/null
>>>> +++ b/include/capitalization.h
>>>> @@ -0,0 +1,1909 @@
>>>> +/* SPDX-License-Identifier: Unicode-DFS-2016 */
>>>> +/*
>>>> + * Correspondence table for small and capital Unicode letters in the range of
>>>> + * 0x0000 - 0xffff based on http://www.unicode.org/Public/UCA/11.0.0/allkeys.txt
>>>> + */
>>>> +
>>>> +struct capitalization_table {
>>>> +	u16 upper;
>>>> +	u16 lower;
>>>> +};
>>>> +
>>>> +#define UNICODE_CAPITALIZATION_TABLE { \
>>>
>>> Ugh, that is a *lot* of data. How much does the binary size grow with
>>> the table compiled in?

That data is also in glibc. I don’t know whether you use glibc, though...
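For illustration, here is a minimal sketch of how a table shaped like `struct capitalization_table` could be consulted. The following are assumptions and do not come from the patch: the array is sorted by `lower` (the excerpt does not show its ordering), the four entries are a hand-picked excerpt rather than the real table, and `cap_table_to_upper()` is a hypothetical helper. Each entry holds two `u16` values, so the 3792-byte figure given later in the thread works out to 948 pairs.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16; /* U-Boot provides u16; defined here to keep the sketch self-contained */

struct capitalization_table {
	u16 upper;
	u16 lower;
};

/* Illustrative excerpt only, NOT the real table; assumed sorted by .lower */
static const struct capitalization_table example_table[] = {
	{0x0041, 0x0061}, /* LATIN A */
	{0x00c0, 0x00e0}, /* LATIN A WITH GRAVE */
	{0x0391, 0x03b1}, /* GREEK ALPHA */
	{0x0410, 0x0430}, /* CYRILLIC A */
};

/*
 * Hypothetical helper: binary search on .lower.
 * Returns the capital letter, or c unchanged if the table has no entry for it.
 */
static u16 cap_table_to_upper(const struct capitalization_table *tab,
			      size_t n, u16 c)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].lower == c)
			return tab[mid].upper;
		if (tab[mid].lower < c)
			lo = mid + 1;
		else
			hi = mid;
	}
	return c;
}
```

Because the mapping follows no simple pattern, a lookup is a search rather than arithmetic. Binary search needs at most about 10 comparisons for 948 entries. A plain linear scan would also work and avoids depending on the sort order.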

>>> Is there any slightly more sophisticated pattern in the table maybe that
>>> we could just express as code? Would that turn out smaller maybe?
>> 
>> This is 3792 bytes of data. Unicode capitalization is quite random in
>> arranging lower and upper letters.
>> 
>> We could resort to zlib or gzip. But these libraries are not built by
>> default.
>
> Yeah, and that only adds more overhead.
>
>> Most urgently we will need the capitalization table for generating and
>> checking short FAT filenames, so we could create a configuration switch
>> that would reduce this table to codepage 437 or codepage 1250 letters
>> depending on the chosen native character set.
>
> I think that's a great idea. There probably is a lot of overlap even
> between the two, so maybe just make it a config option for "non-latin
> upper/lower case conversion".
>
>> In EDK2 I only found code for codepage 1250.
>
> Yeah, I'd be surprised if people really needed more. In fact, how about
> you just default the config option to =n?
>
>
> Alex
>
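Here is a minimal sketch of the split suggested above. It assumes a hypothetical config symbol `CONFIG_SKETCH_NONLATIN_CAPITALIZATION` and a hypothetical `full_table_lookup()`. ASCII and the Latin-1 Supplement letters follow a fixed offset, so they can be expressed as code. Everything else would come from the full table only when the option is enabled. The Latin-1 range is chosen only for illustration: it overlaps with the letters of codepage 437 and codepage 1250 but is not identical to either.

```c
#include <stdint.h>

typedef uint16_t u16; /* U-Boot provides u16; defined here to keep the sketch self-contained */

/* ASCII and Latin-1 Supplement letters: handled by arithmetic, no table needed */
static u16 latin_to_upper(u16 c)
{
	/* ASCII a..z sit exactly 0x20 above A..Z */
	if (c >= 'a' && c <= 'z')
		return c - 0x20;
	/* U+00E0..U+00FE also sit 0x20 above their capitals, except U+00F7 (division sign) */
	if (c >= 0x00e0 && c <= 0x00fe && c != 0x00f7)
		return c - 0x20;
	/* U+00FF (y with diaeresis) has its capital outside Latin-1, at U+0178 */
	if (c == 0x00ff)
		return 0x0178;
	/* Everything else, including U+00DF (sharp s), is returned unchanged here */
	return c;
}

/* Hypothetical entry point: consults the full table only when configured */
static u16 sketch_to_upper(u16 c)
{
	u16 u = latin_to_upper(c);

#ifdef CONFIG_SKETCH_NONLATIN_CAPITALIZATION
	/* Hypothetical helper that would search UNICODE_CAPITALIZATION_TABLE */
	if (u == c)
		u = full_table_lookup(c);
#endif
	return u;
}
```

With the option left at =n, only the comparisons above are compiled in. For example, `sketch_to_upper(0x00e9)` (e with acute) returns `0x00c9`, while Greek or Cyrillic letters pass through unchanged.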

-- 
📧 Mike FABIAN   <mike.fabian at gmx.de>
Lack of sleep is the enemy of good work.

