[U-Boot] [PATCH] vsprintf.c: add wide string (%ls) support

Heinrich Schuchardt xypron.glpk at gmx.de
Wed Aug 2 17:05:57 UTC 2017


On 08/02/2017 11:38 AM, Rob Clark wrote:
> On Tue, Aug 1, 2017 at 10:22 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>> On 07/31/2017 02:42 PM, Rob Clark wrote:
>>> This is convenient for efi_loader which deals a lot with utf16.
>>>
>>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>>> ---
>>>  lib/vsprintf.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>>> index 874a2951f7..84e157ecb1 100644
>>> --- a/lib/vsprintf.c
>>> +++ b/lib/vsprintf.c
>>> @@ -270,6 +270,35 @@ static char *string(char *buf, char *end, char *s, int field_width,
>>>       return buf;
>>>  }
>>>
>>> +static size_t strnlen16(const u16* s, size_t count)
>>> +{
>>> +     const u16 *sc;
>>> +
>>> +     for (sc = s; count-- && *sc; ++sc)
>>> +             /* nothing */;
>>> +     return sc - s;
>>> +}
>>> +
>>> +static char *string16(char *buf, char *end, u16 *s, int field_width,
>>> +             int precision, int flags)
>>> +{
>>> +     int len, i;
>>> +
>>> +     if (s == NULL)
>>> +             s = L"<NULL>";
>>
>> The L notation creates a wchar_t string. The width of wchar_t depends on
>> gcc compiler flag -fshort-wchar.
>>
>> vsprintf.c is not compiled with -fshort-wchar. So change this to
>>
>> const u16 null[] = { '<', 'N', 'U', 'L', 'L', '>', 0};
>> s = null;
> 
> oh, I have another patch that adds -fshort-wchar globally, which I
> probably should have split out and sent along with this one.
> 
> The problem is that mixing objects built with -fshort-wchar and ones
> built without it triggers a compiler warning.  Travis would complain a
> lot more, but I guess BOOTEFI_HELLO is not normally enabled.
> 
> With the addition of efi_bootmgr.c we really want L"string" literals to
> be u16, and I don't think U-Boot has any good reason to use a 32-bit
> wchar_t.
> 
> But maybe for this code I should use wchar_t instead of u16.
> 
> BR,
> -R

ext4 filenames may contain characters with Unicode code points above
U+FFFF, e.g. the Takri letters 𑚀𑚁𑚂.

So ext4ls should probably be able to display these on a Unicode console.
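Because such characters lie above U+FFFF, no single 16-bit unit can hold them: UTF-16 needs a surrogate pair, and the capped 16-bit Unicode (UCS-2) used by EFI cannot represent them at all. A minimal C sketch of the UTF-16 encoding step, assuming the first letter above is U+11680; the function name utf16_encode is hypothetical and not a U-Boot API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16 into out[0..1].
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * a surrogate value or lies beyond U+10FFFF.
 */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
	assert(out != NULL);

	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;		/* surrogate halves are not characters */
	if (cp < 0x10000) {
		out[0] = (uint16_t)cp;	/* Basic Multilingual Plane: one unit */
		return 1;
	}
	if (cp > 0x10FFFF)
		return 0;		/* beyond the Unicode code space */

	cp -= 0x10000;			/* leaves a 20-bit value */
	out[0] = (uint16_t)(0xD800 | (cp >> 10));	/* high surrogate */
	out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));	/* low surrogate */
	return 2;
}
```

With this, U+11680 becomes the two units 0xD805 0xDE80, so code that treats every u16 as one character would see two surrogate halves instead of one letter.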

Using -fshort-wchar globally is not necessary. Only UEFI requires a
16-bit wchar_t, and we should not impose the UEFI convention on the rest
of the code.
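If the goal is only to have 16-bit string literals, C11 offers a route that needs no global flag: a u"..." literal is an array of char16_t (from <uchar.h>), whose width does not depend on -fshort-wchar. A sketch, assuming the tree can be built in a C11 dialect (not verified here); null16 and strlen16 are illustrative names, not existing U-Boot symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/*
 * A C11 u"..." literal is an array of char16_t no matter how wchar_t
 * is configured, so no -fshort-wchar is needed to get 16-bit units.
 */
static const char16_t null16[] = u"<NULL>";

/* Count 16-bit units up to (not including) the terminating zero. */
static size_t strlen16(const char16_t *s)
{
	const char16_t *p = s;

	assert(s != NULL);
	while (*p)
		++p;
	return (size_t)(p - s);
}
```

Such a literal could stand in for L"<NULL>" in string16() while the rest of the tree keeps the default wchar_t.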

> 
>>> +
>>> +     len = strnlen16(s, precision);
>>> +
>>> +     if (!(flags & LEFT))
>>> +             while (len < field_width--)
>>> +                     ADDCH(buf, ' ');
>>> +     for (i = 0; i < len; ++i)
>>> +             ADDCH(buf, *s++);

I would prefer to see a conversion to UTF-8 here.

Converting from 32-bit Unicode (or the capped 16-bit Unicode of EFI) is
quite easy. This is what I used in another project:

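        #include <cstdint>
        #include <stdexcept>
        #include <string>

        // Append the UTF-8 encoding of code point u to str,
        // called per character, e.g. append_utf8(str, s[i]);
        static void append_utf8(std::string &str, uint32_t u)
        {
            char c[5];
            if (u < 0x80) {              // 0xxxxxxx
                c[0] = u & 0x7F;
                c[1] = 0;
                str.append(c);
            } else if (u < 0x800) {      // 110xxxxx 10xxxxxx
                c[1] = 0x80 | (u & 0x3F);
                u >>= 6;
                c[0] = 0xC0 | (u & 0x1F);
                c[2] = 0;
                str.append(c);
            } else if (u < 0x10000) {    // 1110xxxx 10xxxxxx 10xxxxxx
                c[2] = 0x80 | (u & 0x3F);
                u >>= 6;
                c[1] = 0x80 | (u & 0x3F);
                u >>= 6;
                c[0] = 0xE0 | (u & 0x0F);
                c[3] = 0;
                str.append(c);
            } else if (u < 0x110000) {   // 11110xxx + 3 x 10xxxxxx; Unicode ends at U+10FFFF
                c[3] = 0x80 | (u & 0x3F);
                u >>= 6;
                c[2] = 0x80 | (u & 0x3F);
                u >>= 6;
                c[1] = 0x80 | (u & 0x3F);
                u >>= 6;
                c[0] = 0xF0 | (u & 0x07);
                c[4] = 0;
                str.append(c);
            } else {
                throw std::invalid_argument("not a Unicode code point");
            }
        }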
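The snippet above is C++ from another project; inside lib/vsprintf.c the same idea would have to write into a bounded char buffer and, for UTF-16 input, first recombine surrogate pairs, which a per-unit ADDCH loop cannot do. A minimal C sketch under an snprintf-like contract; put_byte and utf16_to_utf8 are hypothetical names, not part of U-Boot or this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one output byte only if it leaves room for the final NUL. */
static void put_byte(char *dst, size_t size, size_t pos, uint32_t byte)
{
	if (pos + 1 < size)
		dst[pos] = (char)byte;
}

/*
 * Convert at most count UTF-16 units from src (stopping early at a zero
 * unit) into UTF-8 in dst.  A surrogate pair becomes one 4-byte sequence;
 * an unpaired surrogate is replaced by U+FFFD.  Like snprintf, returns
 * the number of bytes the full result needs, excluding the NUL, and
 * always terminates dst when size > 0 (truncation may split a sequence).
 */
static size_t utf16_to_utf8(char *dst, size_t size,
			    const uint16_t *src, size_t count)
{
	size_t pos = 0;

	assert(size == 0 || dst != NULL);

	while (count-- && *src) {
		uint32_t cp = *src++;

		if (cp >= 0xD800 && cp <= 0xDBFF && count &&
		    *src >= 0xDC00 && *src <= 0xDFFF) {
			/* high + low surrogate: recombine into U+10000..U+10FFFF */
			cp = 0x10000 + ((cp - 0xD800) << 10) + (*src++ - 0xDC00u);
			count--;
		} else if (cp >= 0xD800 && cp <= 0xDFFF) {
			cp = 0xFFFD;	/* unpaired surrogate */
		}

		if (cp < 0x80) {
			put_byte(dst, size, pos++, cp);
		} else if (cp < 0x800) {
			put_byte(dst, size, pos++, 0xC0 | (cp >> 6));
			put_byte(dst, size, pos++, 0x80 | (cp & 0x3F));
		} else if (cp < 0x10000) {
			put_byte(dst, size, pos++, 0xE0 | (cp >> 12));
			put_byte(dst, size, pos++, 0x80 | ((cp >> 6) & 0x3F));
			put_byte(dst, size, pos++, 0x80 | (cp & 0x3F));
		} else {
			put_byte(dst, size, pos++, 0xF0 | (cp >> 18));
			put_byte(dst, size, pos++, 0x80 | ((cp >> 12) & 0x3F));
			put_byte(dst, size, pos++, 0x80 | ((cp >> 6) & 0x3F));
			put_byte(dst, size, pos++, 0x80 | (cp & 0x3F));
		}
	}
	if (size)
		dst[pos < size ? pos : size - 1] = '\0';
	return pos;
}
```

Passing (size_t)-1 as count converts up to the terminating zero. Whether a %ls precision should limit UTF-16 units (as count does here) or output bytes is a separate design decision.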

Best regards

Heinrich

>>> +     while (len < field_width--)
>>> +             ADDCH(buf, ' ');
>>> +     return buf;
>>> +}
>>> +
>>>  #ifdef CONFIG_CMD_NET
>>>  static const char hex_asc[] = "0123456789abcdef";
>>>  #define hex_asc_lo(x)        hex_asc[((x) & 0x0f)]
>>> @@ -528,8 +557,14 @@ repeat:
>>>                       continue;
>>>
>>>               case 's':
>>> -                     str = string(str, end, va_arg(args, char *),
>>> -                                  field_width, precision, flags);
>>> +                     if (qualifier == 'l') {
>>
>> According to ISO 9899:1999, %ls denotes a wchar_t string, which may
>> be u32 * or u16 * depending on the GCC flag -fshort-wchar.
>>
>> Wouldn't it make sense to use some other notation, e.g. %S, to indicate
>> that we explicitly mean u16 *?
>>
>> Please add a comment to the code explaining why we need u16 * support,
>> with a reference to the UEFI spec.
>>
>> Best regards
>>
>> Heinrich
>>
>>> +                             str = string16(str, end, va_arg(args, u16 *),
>>> +                                            field_width, precision, flags);
>>> +
>>> +                     } else {
>>> +                             str = string(str, end, va_arg(args, char *),
>>> +                                          field_width, precision, flags);
>>> +                     }
>>>                       continue;
>>>
>>>               case 'p':
>>>
>>
> 


