[U-Boot] [PATCH 11/15] efi_loader: capitalization table

Alexander Graf agraf at suse.de
Sun Aug 26 22:06:40 UTC 2018



On 26.08.18 21:00, Heinrich Schuchardt wrote:
> On 08/26/2018 08:22 PM, Alexander Graf wrote:
>>
>>
>> On 11.08.18 17:28, Heinrich Schuchardt wrote:
>>> This patch provides a define to initialize a table that maps lower to
>>> capital letters for Unicode code point 0x0000 - 0xffff.
>>>
>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>> ---
>>>  MAINTAINERS              |    1 +
>>>  include/capitalization.h | 1909 ++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 1910 insertions(+)
>>>  create mode 100644 include/capitalization.h
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index a324139471..0a543309f2 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -368,6 +368,7 @@ F:	doc/DocBook/efi.tmpl
>>>  F:	doc/README.uefi
>>>  F:	doc/README.iscsi
>>>  F:	Documentation/efi.rst
>>> +F:	include/capitalization.h
>>>  F:	include/efi*
>>>  F:	include/pe.h
>>>  F:	include/asm-generic/pe.h
>>> diff --git a/include/capitalization.h b/include/capitalization.h
>>> new file mode 100644
>>> index 0000000000..50d5108f98
>>> --- /dev/null
>>> +++ b/include/capitalization.h
>>> @@ -0,0 +1,1909 @@
>>> +/* SPDX-License-Identifier: Unicode-DFS-2016 */
>>> +/*
>>> + * Correspondence table for small and capital Unicode letters in the range of
>>> + * 0x0000 - 0xffff based on http://www.unicode.org/Public/UCA/11.0.0/allkeys.txt
>>> + */
>>> +
>>> +struct capitalization_table {
>>> +	u16 upper;
>>> +	u16 lower;
>>> +};
>>> +
>>> +#define UNICODE_CAPITALIZATION_TABLE { \
>>
>> Ugh, that is a *lot* of data. How much does the binary size grow with
>> the table compiled in?
>>
>> Is there any slightly more sophisticated pattern in the table maybe that
>> we could just express as code? Would that turn out smaller maybe?
> 
> This is 3792 bytes of data. Unicode capitalization is quite random in
> arranging lower and upper letters.
> 
> We could resort to zlib or gzip. But these libraries are not built by
> default.

Yeah, and that would only add more overhead.
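
For a sense of scale, here is a rough sketch of the arithmetic. It assumes
the 3792 bytes quoted above are exactly an array of struct
capitalization_table with no padding. example_entry_count() is a
hypothetical helper, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

/* Same layout as the struct in the patch: two 16-bit code points. */
struct capitalization_table {
	u16 upper;
	u16 lower;
};

/* Assumption: no padding, so each entry occupies 4 bytes. */
_Static_assert(sizeof(struct capitalization_table) == 4,
	       "unexpected padding in capitalization_table");

/* Hypothetical helper: how many entries fit into a given data size. */
static size_t example_entry_count(size_t table_bytes)
{
	return table_bytes / sizeof(struct capitalization_table);
}
```

Under that assumption, example_entry_count(3792) works out to 948
upper/lower pairs.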

> Most urgently we will need the capitalization table for generating and
> checking short FAT filenames, so we could create a configuration switch
> that would reduce this table to codepage 437 or codepage 1250 letters
> depending on the chosen native character set.

I think that's a great idea. There probably is a lot of overlap even
between the two, so maybe just make it a config option for "non-latin
upper/lower case conversion".
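
Something along these lines is what I have in mind. This is only a sketch.
CONFIG_EXAMPLE_UNICODE_CAPITALIZATION, example_table and example_toupper()
are made-up names, and the reduced table lists just a few pairs of the kind
found in code page 437, not a complete set.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

#ifdef CONFIG_EXAMPLE_UNICODE_CAPITALIZATION	/* hypothetical switch */
/* Full table from the patch (the header also defines the struct). */
#include <capitalization.h>
static const struct capitalization_table example_table[] =
	UNICODE_CAPITALIZATION_TABLE;
#else
struct capitalization_table {
	u16 upper;
	u16 lower;
};

/* Reduced, illustrative table: a handful of Latin-1 letters only. */
static const struct capitalization_table example_table[] = {
	{ 0x00c4, 0x00e4 },	/* A with diaeresis */
	{ 0x00c5, 0x00e5 },	/* A with ring above */
	{ 0x00c6, 0x00e6 },	/* AE ligature */
	{ 0x00c7, 0x00e7 },	/* C with cedilla */
	{ 0x00c9, 0x00e9 },	/* E with acute */
	{ 0x00d1, 0x00f1 },	/* N with tilde */
	{ 0x00d6, 0x00f6 },	/* O with diaeresis */
	{ 0x00dc, 0x00fc },	/* U with diaeresis */
};
#endif

/*
 * Hypothetical lookup: ASCII is handled arithmetically, everything else
 * by a linear scan of whichever table was compiled in. Code points
 * without an entry are returned unchanged.
 */
static u16 example_toupper(u16 c)
{
	size_t i;

	if (c >= 'a' && c <= 'z')
		return c - ('a' - 'A');

	for (i = 0; i < sizeof(example_table) / sizeof(example_table[0]); i++) {
		if (example_table[i].lower == c)
			return example_table[i].upper;
	}

	return c;
}
```

With the switch off, anything outside the small table (Cyrillic, Greek, ...)
simply passes through unconverted, which should be fine for the short FAT
filename case.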

> In EDK2 I only found code for codepage 1250.

Yeah, I'd be surprised if people really needed more. In fact, how about
you just make the config option default to =n?


Alex

