[U-Boot] [PATCH 11/15] efi_loader: capitalization table
Alexander Graf
agraf at suse.de
Sun Aug 26 22:06:40 UTC 2018
On 26.08.18 21:00, Heinrich Schuchardt wrote:
> On 08/26/2018 08:22 PM, Alexander Graf wrote:
>>
>>
>> On 11.08.18 17:28, Heinrich Schuchardt wrote:
>>> This patch provides a define to initialize a table that maps lower to
>>> capital letters for Unicode code points 0x0000 - 0xffff.
>>>
>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>> ---
>>> MAINTAINERS | 1 +
>>> include/capitalization.h | 1909 ++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 1910 insertions(+)
>>> create mode 100644 include/capitalization.h
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index a324139471..0a543309f2 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -368,6 +368,7 @@ F: doc/DocBook/efi.tmpl
>>> F: doc/README.uefi
>>> F: doc/README.iscsi
>>> F: Documentation/efi.rst
>>> +F: include/capitalization.h
>>> F: include/efi*
>>> F: include/pe.h
>>> F: include/asm-generic/pe.h
>>> diff --git a/include/capitalization.h b/include/capitalization.h
>>> new file mode 100644
>>> index 0000000000..50d5108f98
>>> --- /dev/null
>>> +++ b/include/capitalization.h
>>> @@ -0,0 +1,1909 @@
>>> +/* SPDX-License-Identifier: Unicode-DFS-2016 */
>>> +/*
>>> + * Correspondence table for small and capital Unicode letters in the range of
>>> + * 0x0000 - 0xffff based on http://www.unicode.org/Public/UCA/11.0.0/allkeys.txt
>>> + */
>>> +
>>> +struct capitalization_table {
>>> + u16 upper;
>>> + u16 lower;
>>> +};
>>> +
>>> +#define UNICODE_CAPITALIZATION_TABLE { \
>>
>> Ugh, that is a *lot* of data. How much does the binary size grow with
>> the table compiled in?
>>
>> Is there any slightly more sophisticated pattern in the table maybe that
>> we could just express as code? Would that turn out smaller maybe?
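Here is a minimal sketch of how an array of `struct capitalization_table` entries could be consulted to upper-case a single UCS-2 code unit. The helper `to_upper_ucs2` and the four-entry `sample_table` are hypothetical and are not taken from the patch. The real `UNICODE_CAPITALIZATION_TABLE` contents are not reproduced here. `u16` is typedef'd to `uint16_t` only so that the snippet compiles outside U-Boot.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for U-Boot's u16 so the sketch compiles on its own. */
typedef uint16_t u16;

/* Same layout as the struct declared in the patch. */
struct capitalization_table {
	u16 upper;
	u16 lower;
};

/*
 * Hypothetical excerpt for illustration only: a handful of real Unicode
 * pairs, NOT the contents of UNICODE_CAPITALIZATION_TABLE.
 */
static const struct capitalization_table sample_table[] = {
	{ 0x0100, 0x0101 }, /* A with macron */
	{ 0x0178, 0x00ff }, /* Y with diaeresis */
	{ 0x0391, 0x03b1 }, /* Greek alpha */
	{ 0x0410, 0x0430 }, /* Cyrillic A */
};

/* Return the capital form of @c, or @c itself if the table has no entry. */
static u16 to_upper_ucs2(u16 c)
{
	size_t i;

	for (i = 0; i < sizeof(sample_table) / sizeof(sample_table[0]); ++i) {
		if (sample_table[i].lower == c)
			return sample_table[i].upper;
	}
	return c;
}
```

For example, `to_upper_ucs2(0x00ff)` yields `0x0178` (ÿ to Ÿ). A code unit with no entry, such as `0x0041`, is returned unchanged. A linear scan is the simplest choice. For a table of this size, keeping the entries sorted by `lower` and using `bsearch()` would be a plausible refinement.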
>
> This is 3792 bytes of data. Unicode capitalization is quite random in
> arranging lower and upper letters.
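The 3792-byte figure follows from the struct layout. Each entry holds two `u16` values, which is 4 bytes assuming no padding (true on common ABIs), so 3792 / 4 = 948 upper/lower pairs. The sketch below checks that arithmetic. It also uses a few sample pairs to show why no single offset rule can replace the table. The distance from capital to small letter is +0x20 for ASCII, +0x01 for Latin Extended-A, +0x50 for some Cyrillic letters, and negative for Ÿ/ÿ. The helpers `pairs_in` and `case_offset` and the `samples` array are hypothetical illustrations, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for U-Boot's u16 so the sketch compiles on its own. */
typedef uint16_t u16;

/* Same layout as the struct declared in the patch. */
struct capitalization_table {
	u16 upper;
	u16 lower;
};

/* Hypothetical sample pairs showing that the case offset varies. */
static const struct capitalization_table samples[] = {
	{ 0x0041, 0x0061 }, /* A / a: +0x20 */
	{ 0x0100, 0x0101 }, /* A / a with macron: +0x01 */
	{ 0x0400, 0x0450 }, /* Cyrillic IE with grave: +0x50 */
	{ 0x0178, 0x00ff }, /* Y / y with diaeresis: -0x79 */
};

/* Number of upper/lower pairs that fit in @bytes of table data. */
static size_t pairs_in(size_t bytes)
{
	return bytes / sizeof(struct capitalization_table);
}

/* Signed distance from the capital to the small letter of one entry. */
static int case_offset(const struct capitalization_table *e)
{
	return (int)e->lower - (int)e->upper;
}
```

With these definitions, `pairs_in(3792)` evaluates to 948. The four `case_offset()` values all differ, so an arithmetic rule would need per-range exceptions, which is roughly what the table already encodes.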
>
> We could resort to zlib or gzip. But these libraries are not built by
> default.
Yeah, and that would only add more overhead.
> Most urgently we will need the capitalization table for generating and
> checking short FAT filenames, so we could create a configuration switch
> that would reduce this table to codepage 437 or codepage 1250 letters
> depending on the chosen native character set.
I think that's a great idea. There probably is a lot of overlap even
between the two, so maybe just make it a config option for "non-latin
upper/lower case conversion".
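The sketch below shows one possible reading of this suggestion, taking "non-latin" to mean anything outside ASCII. ASCII letters are always converted arithmetically. A table lookup for everything else is compiled in only when a config switch is enabled, and that switch is off by default. The macro `CONFIG_NONLATIN_CASE_CONVERSION`, the helper `to_upper`, and the three-entry `case_table` excerpt (a few code page 437 and 1250 letters) are hypothetical names chosen for illustration. They are not existing U-Boot symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for U-Boot's u16 so the sketch compiles on its own. */
typedef uint16_t u16;

/*
 * Hypothetical config switch, not an existing U-Boot option.
 * When 0 (the default here), only ASCII letters are converted.
 */
#ifndef CONFIG_NONLATIN_CASE_CONVERSION
#define CONFIG_NONLATIN_CASE_CONVERSION 0
#endif

struct capitalization_table {
	u16 upper;
	u16 lower;
};

#if CONFIG_NONLATIN_CASE_CONVERSION
/* Illustrative excerpt standing in for a codepage or Unicode table. */
static const struct capitalization_table case_table[] = {
	{ 0x0141, 0x0142 }, /* L with stroke (code page 1250) */
	{ 0x0160, 0x0161 }, /* S with caron (code page 1250) */
	{ 0x00c7, 0x00e7 }, /* C with cedilla (code page 437) */
};
#endif

/* Upper-case one UCS-2 code unit; unknown characters pass through. */
static u16 to_upper(u16 c)
{
	/* ASCII needs no table: 'a'..'z' sit 0x20 above 'A'..'Z'. */
	if (c >= 'a' && c <= 'z')
		return c - 0x20;
#if CONFIG_NONLATIN_CASE_CONVERSION
	for (size_t i = 0; i < sizeof(case_table) / sizeof(case_table[0]); ++i)
		if (case_table[i].lower == c)
			return case_table[i].upper;
#endif
	return c;
}
```

With the switch left at 0, `to_upper('a')` returns `'A'` and `to_upper(0x0142)` (ł) comes back unchanged. Building with `-DCONFIG_NONLATIN_CASE_CONVERSION=1` would map ł to `0x0141` (Ł), at the cost of linking in the table.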
> In EDK2 I only found code for codepage 1250.
Yeah, I'd be surprised if people really needed more. In fact, how about
you just default the config option to =n?
Alex