[U-Boot] [PATCH 11/15] efi_loader: capitalization table
Mike FABIAN
maiku.fabian at gmail.com
Mon Aug 27 08:30:28 UTC 2018
Alexander Graf <agraf at suse.de> wrote:
> On 26.08.18 21:00, Heinrich Schuchardt wrote:
>> On 08/26/2018 08:22 PM, Alexander Graf wrote:
>>>
>>>
>>> On 11.08.18 17:28, Heinrich Schuchardt wrote:
>>>> This patch provides a define to initialize a table that maps lower to
>>>> capital letters for Unicode code point 0x0000 - 0xffff.
>>>>
>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>> ---
>>>> MAINTAINERS | 1 +
>>>> include/capitalization.h | 1909 ++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 1910 insertions(+)
>>>> create mode 100644 include/capitalization.h
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index a324139471..0a543309f2 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -368,6 +368,7 @@ F: doc/DocBook/efi.tmpl
>>>> F: doc/README.uefi
>>>> F: doc/README.iscsi
>>>> F: Documentation/efi.rst
>>>> +F: include/capitalization.h
>>>> F: include/efi*
>>>> F: include/pe.h
>>>> F: include/asm-generic/pe.h
>>>> diff --git a/include/capitalization.h b/include/capitalization.h
>>>> new file mode 100644
>>>> index 0000000000..50d5108f98
>>>> --- /dev/null
>>>> +++ b/include/capitalization.h
>>>> @@ -0,0 +1,1909 @@
>>>> +/* SPDX-License-Identifier: Unicode-DFS-2016 */
>>>> +/*
>>>> + * Correspondence table for small and capital Unicode letters in the range of
>>>> + * 0x0000 - 0xffff based on http://www.unicode.org/Public/UCA/11.0.0/allkeys.txt
>>>> + */
>>>> +
>>>> +struct capitalization_table {
>>>> + u16 upper;
>>>> + u16 lower;
>>>> +};
>>>> +
>>>> +#define UNICODE_CAPITALIZATION_TABLE { \
>>>
>>> Ugh, that is a *lot* of data. How much does the binary size grow with
>>> the table compiled in?
That data is also in glibc. I don’t know whether you use glibc, though.
>>> Is there any slightly more sophisticated pattern in the table maybe that
>>> we could just express as code? Would that turn out smaller maybe?
>>
>> This is 3792 bytes of data. Unicode capitalization is quite random in
>> arranging lower and upper letters.
>>
>> We could resort to zlib or gzip. But these libraries are not built by
>> default.
>
> Yeah, and that only adds to more overhead.
>
>> Most urgently we will need the capitalization table for generating and
>> checking short FAT filenames, so we could create a configuration switch
>> that would reduce this table to codepage 437 or codepage 1250 letters
>> depending on the chosen native character set.
>
> I think that's a great idea. There probably is a lot of overlap even
> between the two, so maybe just make it a config option for "non-latin
> upper/lower case conversion".
>
>> In EDK2 I only found code for codepage 1250.
>
> Yeah, I'd be surprised if people really needed more. In fact, how about
> you just default the config option to =n by default?
>
>
> Alex
>
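For reference, a lookup over a table of this shape could be sketched as
below. This is only a sketch under my own assumptions, not code from the
patch: the five entries are an illustrative excerpt (the real table, at
3792 bytes and 4 bytes per pair, would hold 948 pairs), the table is
assumed to be sorted by the lower-case code point, the u16 typedef stands
in for U-Boot's own type, and sketch_toupper() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;	/* stand-in for U-Boot's u16 */

struct capitalization_table {
	u16 upper;
	u16 lower;
};

/*
 * Illustrative excerpt only, NOT the table from the patch.
 * Assumption: entries are sorted by .lower so they can be bisected.
 */
static const struct capitalization_table cap_excerpt[] = {
	{ 0x00C4, 0x00E4 },	/* A/a with diaeresis */
	{ 0x00D6, 0x00F6 },	/* O/o with diaeresis */
	{ 0x00DC, 0x00FC },	/* U/u with diaeresis */
	{ 0x0391, 0x03B1 },	/* Greek alpha */
	{ 0x0410, 0x0430 },	/* Cyrillic a */
};

/*
 * Hypothetical helper: return the capital letter for c, or c itself
 * if the excerpt has no mapping for it.
 */
static u16 sketch_toupper(u16 c)
{
	size_t lo = 0;
	size_t hi = sizeof(cap_excerpt) / sizeof(cap_excerpt[0]);

	/* ASCII is regular enough to handle without the table */
	if (c >= 'a' && c <= 'z')
		return c - ('a' - 'A');

	/* Binary search on the lower-case column */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cap_excerpt[mid].lower == c)
			return cap_excerpt[mid].upper;
		if (cap_excerpt[mid].lower < c)
			lo = mid + 1;
		else
			hi = mid;
	}
	return c;
}
```

A reduced table for a single codepage, as discussed above, would only
change the contents of the array; the lookup itself would stay the same.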
--
📧 Mike FABIAN <mike.fabian at gmx.de>
Lack of sleep is the enemy of good work.