[U-Boot] [U-Boot, v0, 06/20] common: add some utf16 handling helpers
Rob Clark
robdclark at gmail.com
Tue Aug 8 23:21:11 UTC 2017
On Tue, Aug 8, 2017 at 6:50 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> On 08/04/2017 09:31 PM, Rob Clark wrote:
>> We'll eventually want these in a few places in efi_loader, and also
>> vsprintf.
>>
>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>> ---
>> common/Makefile | 1 +
>> common/charset.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
>> include/charset.h | 18 ++++++++++
>> lib/efi_loader/efi_console.c | 17 ++--------
>> 4 files changed, 103 insertions(+), 14 deletions(-)
>> create mode 100644 common/charset.c
>> create mode 100644 include/charset.h
>>
>> diff --git a/common/Makefile b/common/Makefile
>> index 60681c845c..44c8e1ba52 100644
>> --- a/common/Makefile
>> +++ b/common/Makefile
>> @@ -175,5 +175,6 @@ obj-$(CONFIG_CMD_DFU) += dfu.o
>> obj-y += command.o
>> obj-y += s_record.o
>> obj-y += xyzModem.o
>> +obj-y += charset.o
>>
>> CFLAGS_env_embedded.o := -Wa,--no-warn -DENV_CRC=$(shell tools/envcrc 2>/dev/null)
>> diff --git a/common/charset.c b/common/charset.c
>> new file mode 100644
>> index 0000000000..eaff2e542e
>> --- /dev/null
>> +++ b/common/charset.c
>> @@ -0,0 +1,81 @@
>> +/*
>> + * charset conversion utils
>> + *
>> + * Copyright (c) 2017 Rob Clark
>> + *
>> + * SPDX-License-Identifier: GPL-2.0+
>> + */
>> +
>> +#include <common.h>
>> +#include <charset.h>
>> +
>> +/*
>> + * utf8/utf16 conversion mostly lifted from grub
>> + */
>> +
>> +size_t utf16_strlen(uint16_t *in)
>> +{
>> + size_t i;
>> + for (i = 0; in[i]; i++);
>> + return i;
>> +}
>> +
>> +size_t utf16_strnlen(const uint16_t *in, size_t count)
>> +{
>> + size_t i;
>> + for (i = 0; count-- && in[i]; i++);
>> + return i;
>> +}
>> +
>> +/* Convert UTF-16 to UTF-8. */
>> +uint8_t *utf16_to_utf8(uint8_t *dest, const uint16_t *src, size_t size)
>> +{
>> + uint32_t code_high = 0;
>> +
>> + while (size--) {
>
> We should not read past the trailing null word. Check *src == 0 somewhere.
So, in all the places this is used in u-boot, and all the places I've
seen it used in grub (which is where this code comes from; I won't
claim to be a utfN expert, this is the first time I've looked at this
sort of thing), you already know the string length, either via
"protocol" (i.e. you know the size of the file-path efi_device_path
element) or because you have to call one of the utf16_strlen() variants
before calling this anyway, to know the size of the output string.
So utf16_to_utf8() shouldn't rely on null terminators; needing them
would just be a sign that the caller is doing something wrong.
Not sure if there is an equivalent of WARN_ON() in u-boot; maybe just
assert()? But checking for null should be more of an
assert()/WARN_ON() sort of thing, imho.
I'll add an assert(size == utf16_strnlen(src, size)) (unless someone
has something better to suggest).
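A minimal standalone sketch of the precondition proposed above, written for a hosted C toolchain rather than against U-Boot's headers. utf16_strnlen() and MAX_UTF8_PER_UTF16 are copied from the patch; utf16_size_ok() and utf8_buf_size() are hypothetical helper names used only for illustration. With them, the proposed check would read assert(utf16_size_ok(src, size)).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UTF8_PER_UTF16 4 /* same bound as include/charset.h in the patch */

/* From the patch: length of in, looking at no more than count units. */
static size_t utf16_strnlen(const uint16_t *in, size_t count)
{
	size_t i;

	for (i = 0; count-- && in[i]; i++)
		;
	return i;
}

/*
 * Hypothetical helper: nonzero when the first size units of src contain
 * no NUL, i.e. when assert(size == utf16_strnlen(src, size)) would pass.
 */
static int utf16_size_ok(const uint16_t *src, size_t size)
{
	return utf16_strnlen(src, size) == size;
}

/*
 * Hypothetical helper: worst-case UTF-8 output size for utf16_units
 * code units, plus one byte for a terminating NUL (utf16_to_utf8() in
 * the patch does not write one).
 */
static size_t utf8_buf_size(size_t utf16_units)
{
	return utf16_units * MAX_UTF8_PER_UTF16 + 1;
}
```

For a fixed-size field holding { 'a', 'b', 0, 'c' }, utf16_strnlen(field, 4) returns 2, so the check passes for size 2 but would fire if a caller passed the raw field size of 4.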
>> + uint32_t code = *src++;
>> +
>> + if (code_high) {
>> + if (code >= 0xDC00 && code <= 0xDFFF) {
>> + /* Surrogate pair. */
>> + code = ((code_high - 0xD800) << 10) + (code - 0xDC00) + 0x10000;
>> +
>> + *dest++ = (code >> 18) | 0xF0;
>> + *dest++ = ((code >> 12) & 0x3F) | 0x80;
>> + *dest++ = ((code >> 6) & 0x3F) | 0x80;
>> + *dest++ = (code & 0x3F) | 0x80;
>> + } else {
>> + /* Error... */
>> + *dest++ = '?';
>> + /* *src may be valid. Don't eat it. */
>> + src--;
>> + }
>> +
>> + code_high = 0;
>> + } else {
>> + if (code <= 0x007F) {
>> + *dest++ = code;
>> + } else if (code <= 0x07FF) {
>> + *dest++ = (code >> 6) | 0xC0;
>> + *dest++ = (code & 0x3F) | 0x80;
>> + } else if (code >= 0xD800 && code <= 0xDBFF) {
>> + code_high = code;
>> + continue;
>> + } else if (code >= 0xDC00 && code <= 0xDFFF) {
>> + /* Error... */
>> + *dest++ = '?';
>
> The error handling is somewhat inconsistent:
>
> No output if code 0xD800-0xDBFF is the last word.
> Output '?' for 0xDC00-0xDFFF where not expected.
> Output extraneous '?' for 0xD800-0xDBFF not followed by 0xDC00-0xDFFF.
If you have something better you'd suggest I look at, let me know. It
seems like this should only happen with a malformed utf16 string? (I'm
assuming the grub code can't be too bad, since it boots a whole lot of
Linux systems every day.)
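One possible way to make the error handling uniform, sketched here as an assumption rather than as grub's or U-Boot's behavior: treat every unpaired surrogate (a lone low surrogate, a high surrogate followed by anything other than a low surrogate, or a high surrogate in the last position) as exactly one '?', and decode the following unit normally. The name utf16_to_utf8_sketch() is hypothetical; it keeps the patch's signature and, like the patch, does not NUL-terminate.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical variant of utf16_to_utf8(): every unpaired surrogate,
 * wherever it appears, becomes exactly one '?'. Returns a pointer one
 * past the last byte written; no NUL terminator is added.
 */
static uint8_t *utf16_to_utf8_sketch(uint8_t *dest, const uint16_t *src,
				     size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		uint32_t code = src[i];

		if (code >= 0xD800 && code <= 0xDBFF) {
			/* High surrogate: valid only if a low surrogate follows. */
			if (i + 1 < size && src[i + 1] >= 0xDC00 &&
			    src[i + 1] <= 0xDFFF) {
				code = ((code - 0xD800) << 10) +
				       (src[i + 1] - 0xDC00) + 0x10000;
				i++; /* the low surrogate is consumed as well */
			} else {
				*dest++ = '?'; /* next unit is decoded normally */
				continue;
			}
		} else if (code >= 0xDC00 && code <= 0xDFFF) {
			/* Low surrogate without a preceding high surrogate. */
			*dest++ = '?';
			continue;
		}

		if (code <= 0x7F) {
			*dest++ = code;
		} else if (code <= 0x7FF) {
			*dest++ = (code >> 6) | 0xC0;
			*dest++ = (code & 0x3F) | 0x80;
		} else if (code <= 0xFFFF) {
			*dest++ = (code >> 12) | 0xE0;
			*dest++ = ((code >> 6) & 0x3F) | 0x80;
			*dest++ = (code & 0x3F) | 0x80;
		} else {
			*dest++ = (code >> 18) | 0xF0;
			*dest++ = ((code >> 12) & 0x3F) | 0x80;
			*dest++ = ((code >> 6) & 0x3F) | 0x80;
			*dest++ = (code & 0x3F) | 0x80;
		}
	}
	return dest;
}
```

With this rule { 'A', 0xD800 } produces "A?" instead of just "A", while { 0xD83D, 0xDE00 } still decodes to U+1F600 (F0 9F 98 80). Looking ahead one unit, instead of carrying code_high across iterations, also means no state is left over when the loop ends.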
(for the record, my only contribution to this code is utf16_strnlen()
and comments (and correcting grub2's painful indentation style :-P))
BR,
-R
> Best regards
>
> Heinrich
>
>> + } else if (code < 0x10000) {
>> + *dest++ = (code >> 12) | 0xE0;
>> + *dest++ = ((code >> 6) & 0x3F) | 0x80;
>> + *dest++ = (code & 0x3F) | 0x80;
>> + } else {
>> + *dest++ = (code >> 18) | 0xF0;
>> + *dest++ = ((code >> 12) & 0x3F) | 0x80;
>> + *dest++ = ((code >> 6) & 0x3F) | 0x80;
>> + *dest++ = (code & 0x3F) | 0x80;
>> + }
>> + }
>> + }
>> +
>> + return dest;
>> +}
>> diff --git a/include/charset.h b/include/charset.h
>> new file mode 100644
>> index 0000000000..2ee1172182
>> --- /dev/null
>> +++ b/include/charset.h
>> @@ -0,0 +1,18 @@
>> +/*
>> + * charset conversion utils
>> + *
>> + * Copyright (c) 2017 Rob Clark
>> + *
>> + * SPDX-License-Identifier: GPL-2.0+
>> + */
>> +
>> +#ifndef __CHARSET_H_
>> +#define __CHARSET_H_
>> +
>> +#define MAX_UTF8_PER_UTF16 4
>> +
>> +size_t utf16_strlen(uint16_t *in);
>> +size_t utf16_strnlen(const uint16_t *in, size_t count);
>> +uint8_t *utf16_to_utf8(uint8_t *dest, const uint16_t *src, size_t size);
>> +
>> +#endif /* __CHARSET_H_ */
>> diff --git a/lib/efi_loader/efi_console.c b/lib/efi_loader/efi_console.c
>> index 5ebce4b544..3fc82b8726 100644
>> --- a/lib/efi_loader/efi_console.c
>> +++ b/lib/efi_loader/efi_console.c
>> @@ -7,6 +7,7 @@
>> */
>>
>> #include <common.h>
>> +#include <charset.h>
>> #include <efi_loader.h>
>>
>> static bool console_size_queried;
>> @@ -138,20 +139,8 @@ static efi_status_t EFIAPI efi_cout_reset(
>>
>> static void print_unicode_in_utf8(u16 c)
>> {
>> - char utf8[4] = { 0 };
>> - char *b = utf8;
>> -
>> - if (c < 0x80) {
>> - *(b++) = c;
>> - } else if (c < 0x800) {
>> - *(b++) = 192 + c / 64;
>> - *(b++) = 128 + c % 64;
>> - } else {
>> - *(b++) = 224 + c / 4096;
>> - *(b++) = 128 + c / 64 % 64;
>> - *(b++) = 128 + c % 64;
>> - }
>> -
>> + char utf8[MAX_UTF8_PER_UTF16] = { 0 };
>> + utf16_to_utf8((u8 *)utf8, &c, 1);
>> puts(utf8);
>> }
>>
>>
>
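A quick way to check that the efi_console.c change preserves output: the sketch below reproduces the removed arithmetic (192 + c / 64 and so on) and the shift-and-mask form that utf16_to_utf8() uses for a single BMP code unit. old_encode() and new_encode() are hypothetical names, and both write uint8_t rather than char to avoid implementation-defined conversions of values above 127.

```c
#include <stdint.h>
#include <string.h>

/* The removed print_unicode_in_utf8() arithmetic, writing into out[4]. */
static void old_encode(uint16_t c, uint8_t out[4])
{
	uint8_t *b = out;

	memset(out, 0, 4);
	if (c < 0x80) {
		*(b++) = c;
	} else if (c < 0x800) {
		*(b++) = 192 + c / 64;
		*(b++) = 128 + c % 64;
	} else {
		*(b++) = 224 + c / 4096;
		*(b++) = 128 + c / 64 % 64;
		*(b++) = 128 + c % 64;
	}
}

/*
 * The shift-and-mask branches utf16_to_utf8() uses for one BMP unit.
 * Only meaningful for non-surrogates: the patch routes 0xD800-0xDFFF
 * through its surrogate handling instead.
 */
static void new_encode(uint16_t c, uint8_t out[4])
{
	uint8_t *b = out;

	memset(out, 0, 4);
	if (c <= 0x7F) {
		*b++ = c;
	} else if (c <= 0x7FF) {
		*b++ = (c >> 6) | 0xC0;
		*b++ = (c & 0x3F) | 0x80;
	} else {
		*b++ = (c >> 12) | 0xE0;
		*b++ = ((c >> 6) & 0x3F) | 0x80;
		*b++ = (c & 0x3F) | 0x80;
	}
}
```

Adding 192, 224 or 128 is the same as OR-ing in 0xC0, 0xE0 or 0x80 here, because the added quantity never reaches those high bits. Judging from the quoted code, the two only disagree for surrogate units 0xD800-0xDFFF: the old code emitted a three-byte sequence, whereas the new path prints nothing for a lone high surrogate and '?' for a lone low one. Since at most three bytes are written for one unit, the zero-initialized four-byte buffer stays NUL-terminated for puts().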
More information about the U-Boot mailing list