[U-Boot] [PATCH] vsprintf.c: add wide string (%ls) support

Rob Clark robdclark at gmail.com
Wed Aug 2 22:25:50 UTC 2017


On Wed, Aug 2, 2017 at 5:40 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> On 08/02/2017 08:15 PM, Rob Clark wrote:
>> On Wed, Aug 2, 2017 at 1:05 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>> On 08/02/2017 11:38 AM, Rob Clark wrote:
>>>> On Tue, Aug 1, 2017 at 10:22 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>> On 07/31/2017 02:42 PM, Rob Clark wrote:
>>>>>> This is convenient for efi_loader which deals a lot with utf16.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>>>>>> ---
>>>>>>  lib/vsprintf.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>>>  1 file changed, 37 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>>>>>> index 874a2951f7..84e157ecb1 100644
>>>>>> --- a/lib/vsprintf.c
>>>>>> +++ b/lib/vsprintf.c
>>>>>> @@ -270,6 +270,35 @@ static char *string(char *buf, char *end, char *s, int field_width,
>>>>>>       return buf;
>>>>>>  }
>>>>>>
>>>>>> +static size_t strnlen16(const u16* s, size_t count)
>>>>>> +{
>>>>>> +     const u16 *sc;
>>>>>> +
>>>>>> +     for (sc = s; count-- && *sc; ++sc)
>>>>>> +             /* nothing */;
>>>>>> +     return sc - s;
>>>>>> +}
>>>>>> +
>>>>>> +static char *string16(char *buf, char *end, u16 *s, int field_width,
>>>>>> +             int precision, int flags)
>>>>>> +{
>>>>>> +     int len, i;
>>>>>> +
>>>>>> +     if (s == NULL)
>>>>>> +             s = L"<NULL>";
>>>>>
>>>>> The L notation creates a wchar_t string. The width of wchar_t depends on
>>>>> gcc compiler flag -fshort-wchar.
>>>>>
>>>>> vsprintf.c is not compiled with -fshort-wchar. So change this to
>>>>>
>>>>> const u16 null[] = { '<', 'N', 'U', 'L', 'L', '>', 0};
>>>>> s = null;
>>>>
>>>> oh, I have another patch that adds -fshort-wchar globally.. which I
>>>> probably should have split out and sent with this.
>>>>
>>>> The problem is we cannot mix objects using short-wchar and ones that
>>>> don't without a compiler warning.  Travis would complain a lot more
>>>> but I guess BOOTEFI_HELLO is not normally enabled.
>>>>
>>>> With addition of efi_bootmgr.c we really want to be able to use
>>>> L"string" to be u16.. and I don't think u-boot has any good reason to
>>>> use 32b wchar.
>>>>
>>>> But maybe for this code I should use wchar_t instead of u16.
>>>>
>>>> BR,
>>>> -R
>>>
>>> ext4 filenames may contain letters with Unicode values > 2**16,
>>> e.g. using Takri letters: 𑚀𑚁𑚂
>>>
>>> So ext4ls probably should be enabled to display these on a Unicode console.
>>>
>>> Using -fshort-wchar globally is not necessary. Only UEFI requires 16 bit
>>> wchar_t. We should rather not enforce the UEFI standard on the rest of
>>> the code.
>>
>> The alternative is disabling a gcc warning about mixing 32b and 16b
>> wchar.. and really mixing 32b and 16b wchar seems like a bad idea.
>>
>> We could use -fshort-wchar only if EFI_LOADER is enabled.  Technically
>> if we are a UEFI implementation, we do not need to have ext2/ext4 (or
>> really anything other than fat/vfat).
>
> You can avoid the problem of variable width wchar by using constants
> starting with u (e.g. u"Hello world") which are char16_t (introduced
> with C11, #include <uchar.h>) and converting to utf-8 for console output.
>
> This way we do not need -fshort-wchar at all.

Oh, that would simplify things a *lot*. Is there a corresponding
printf() modifier?

And is there any reason to fear requiring C11 (I have none myself)?

If not, I'll switch my patches over to using u"string" and fix up efi hello too.
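
For reference, a minimal sketch of what the u"string" approach could
look like, assuming a C11 toolchain that ships <uchar.h>. The names
null_str16, strnlen_c16() and c16_or_null() are hypothetical and only
mirror the strnlen16()/NULL handling in the patch; they are not part of
it. Standard C11 printf() has %ls for wchar_t strings only and no
conversion for char16_t, so a custom specifier would still be needed.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t, NULL */
#include <uchar.h>   /* char16_t (C11) */

/* char16_t is at least 16 bits wide no matter how wchar_t is configured. */
static_assert(sizeof(char16_t) * CHAR_BIT >= 16,
	      "char16_t must hold a UTF-16 code unit");

/* u"..." always yields char16_t elements, with or without -fshort-wchar. */
static const char16_t null_str16[] = u"<NULL>";

/* Hypothetical counterpart of strnlen16() from the patch, typed as char16_t. */
static size_t strnlen_c16(const char16_t *s, size_t count)
{
	const char16_t *sc;

	for (sc = s; count-- && *sc; ++sc)
		/* nothing */;
	return sc - s;
}

/* Hypothetical helper: substitute a placeholder string for a NULL pointer. */
static const char16_t *c16_or_null(const char16_t *s)
{
	return s ? s : null_str16;
}
```

With this, string16() could take a const char16_t * and would not depend
on the width of wchar_t at all.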

BR,
-R


> Best regards
>
> Heinrich
>
>>
>>>>
>>>>>> +
>>>>>> +     len = strnlen16(s, precision);
>>>>>> +
>>>>>> +     if (!(flags & LEFT))
>>>>>> +             while (len < field_width--)
>>>>>> +                     ADDCH(buf, ' ');
>>>>>> +     for (i = 0; i < len; ++i)
>>>>>> +             ADDCH(buf, *s++);
>>>
>>> I would prefer to see a conversion to UTF-8 here.
>>>
>>> Conversion from 32-bit Unicode (or the capped 16-bit Unicode of EFI) is
>>> quite easy. This is what I used in another project:
>>>
>>>         uint32_t u = s[i];
>>>         char c[5];
>>>         if (u < 0x80) {
>>>             c[0] = u & 0x7F;
>>>             c[1] = 0;
>>>             str.append(c);
>>>         } else if (u < 0x800) {
>>>             c[1] = 0x80 | (u & 0x3F);
>>>             u >>= 6;
>>>             c[0] = 0xC0 | (u & 0x1F);
>>>             c[2] = 0;
>>>             str.append(c);
>>>         } else if (u < 0x10000) {
>>>             c[2] = 0x80 | (u & 0x3F);
>>>             u >>= 6;
>>>             c[1] = 0x80 | (u & 0x3F);
>>>             u >>= 6;
>>>             c[0] = 0xE0 | (u & 0x0F);
>>>             c[3] = 0;
>>>             str.append(c);
>>>         } else if (u < 0x200000) {
>>>             c[3] = 0x80 | (u & 0x3F);
>>>             u >>= 6;
>>>             c[2] = 0x80 | (u & 0x3F);
>>>             u >>= 6;
>>>             c[1] = 0x80 | (u & 0x3F);
>>>             u >>= 6;
>>>             c[0] = 0xF0 | (u & 0x07);
>>>             c[4] = 0;
>>>             str.append(c);
>>>         } else {
>>>             throw invalid;
>>>         }
>>
>> I did add a utf16_to_utf8() (based on code from grub) as part of the
>> efi-variables patch, since there we are dealing with utf16 coming from
>> outside of u-boot.  I guess I could use that.  I think that mostly
>> matters if we end up printing strings that originate outside of
>> u-boot, but I guess that will be the case for filenames in a
>> device-path.
>>
>> BR,
>> -R
>>
>>> Best regards
>>>
>>> Heinrich
>>>
>>>>>> +     while (len < field_width--)
>>>>>> +             ADDCH(buf, ' ');
>>>>>> +     return buf;
>>>>>> +}
>>>>>> +
>>>>>>  #ifdef CONFIG_CMD_NET
>>>>>>  static const char hex_asc[] = "0123456789abcdef";
>>>>>>  #define hex_asc_lo(x)        hex_asc[((x) & 0x0f)]
>>>>>> @@ -528,8 +557,14 @@ repeat:
>>>>>>                       continue;
>>>>>>
>>>>>>               case 's':
>>>>>> -                     str = string(str, end, va_arg(args, char *),
>>>>>> -                                  field_width, precision, flags);
>>>>>> +                     if (qualifier == 'l') {
>>>>>
>>>>> According to ISO 9899:1999 %ls is used to indicate a wchar_t string,
>>>>> which may be u32 * or u16 * depending on GCC flag -fshort-wchar.
>>>>>
>>>>> Wouldn't it make sense to use some other notation, e.g. %S, to indicate
>>>>> that we explicitly mean u16 *?
>>>>>
>>>>> Please, add a comment into the code indicating why we need u16 * support
>>>>> referring to the UEFI spec.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Heinrich
>>>>>
>>>>>> +                             str = string16(str, end, va_arg(args, u16 *),
>>>>>> +                                            field_width, precision, flags);
>>>>>> +
>>>>>> +                     } else {
>>>>>> +                             str = string(str, end, va_arg(args, char *),
>>>>>> +                                          field_width, precision, flags);
>>>>>> +                     }
>>>>>>                       continue;
>>>>>>
>>>>>>               case 'p':
>>>>>>
>>>>>
>>>>
>>>
>>
>
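
The UTF-8 conversion discussed above could be sketched in plain C
roughly as follows. This is not the utf16_to_utf8() from the
efi-variables patch; utf8_encode() and c16_to_utf8() are hypothetical
names, and replacing invalid input with '?' is an assumption made for
the example. Unlike the quoted snippet, it combines surrogate pairs, so
code points above 0xFFFF (such as the Takri letters) survive.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t */
#include <uchar.h>   /* char16_t (C11) */

/*
 * Hypothetical helper: encode one code point as UTF-8 into out[], which
 * must have room for 4 bytes. Returns the number of bytes written, or 0
 * if cp is a lone surrogate or beyond U+10FFFF.
 */
static size_t utf8_encode(uint32_t cp, char *out)
{
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* lone surrogate */
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp < 0x110000) {
		out[0] = (char)(0xF0 | (cp >> 18));
		out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/*
 * Convert a 0-terminated UTF-16 string to UTF-8. Writes at most size
 * bytes including the terminating 0 and returns the number of bytes
 * written, excluding the terminator. Output is truncated at a character
 * boundary if dst is too small.
 */
static size_t c16_to_utf8(char *dst, size_t size, const char16_t *src)
{
	size_t pos = 0;

	if (size == 0)
		return 0;
	while (*src) {
		uint32_t cp = *src++;
		char tmp[4];
		size_t n, i;

		/* Combine a high/low surrogate pair into one code point. */
		if (cp >= 0xD800 && cp <= 0xDBFF &&
		    *src >= 0xDC00 && *src <= 0xDFFF)
			cp = 0x10000 + ((cp - 0xD800) << 10) + (*src++ - 0xDC00);

		n = utf8_encode(cp, tmp);
		if (n == 0) {
			tmp[0] = '?';	/* assumption: replace invalid input */
			n = 1;
		}
		if (pos + n >= size)
			break;		/* keep room for the terminator */
		for (i = 0; i < n; i++)
			dst[pos++] = tmp[i];
	}
	dst[pos] = '\0';
	return pos;
}
```

In string16() the converted bytes would go through ADDCH() instead of
being collected in a separate buffer; the buffer version is used here
only to keep the sketch self-contained.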

