[U-Boot] [PATCH] vsprintf.c: add wide string (%ls) support
Rob Clark
robdclark at gmail.com
Wed Aug 2 18:15:34 UTC 2017
On Wed, Aug 2, 2017 at 1:05 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> On 08/02/2017 11:38 AM, Rob Clark wrote:
>> On Tue, Aug 1, 2017 at 10:22 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>> On 07/31/2017 02:42 PM, Rob Clark wrote:
>>>> This is convenient for efi_loader which deals a lot with utf16.
>>>>
>>>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>>>> ---
>>>> lib/vsprintf.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>>>> index 874a2951f7..84e157ecb1 100644
>>>> --- a/lib/vsprintf.c
>>>> +++ b/lib/vsprintf.c
>>>> @@ -270,6 +270,35 @@ static char *string(char *buf, char *end, char *s, int field_width,
>>>>  	return buf;
>>>>  }
>>>>
>>>> +static size_t strnlen16(const u16* s, size_t count)
>>>> +{
>>>> +	const u16 *sc;
>>>> +
>>>> +	for (sc = s; count-- && *sc; ++sc)
>>>> +		/* nothing */;
>>>> +	return sc - s;
>>>> +}
>>>> +
>>>> +static char *string16(char *buf, char *end, u16 *s, int field_width,
>>>> +		int precision, int flags)
>>>> +{
>>>> +	int len, i;
>>>> +
>>>> +	if (s == NULL)
>>>> +		s = L"<NULL>";
>>>
>>> The L notation creates a wchar_t string. The width of wchar_t depends on
>>> gcc compiler flag -fshort-wchar.
>>>
>>> vsprintf.c is not compiled with -fshort-wchar. So change this to
>>>
>>> const u16 null[] = { '<', 'N', 'U', 'L', 'L', '>', 0};
>>> s = null;
>>
>> oh, I have another patch that adds -fshort-wchar globally.. which I
>> probably should have split out and sent with this.
>>
>> The problem is we cannot mix objects using short-wchar and ones that
>> don't without a compiler warning. Travis would complain a lot more
>> but I guess BOOTEFI_HELLO is not normally enabled.
>>
>> With addition of efi_bootmgr.c we really want to be able to use
>> L"string" to be u16.. and I don't think u-boot has any good reason to
>> use 32b wchar.
>>
>> But maybe for this code I should use wchar_t instead of u16.
>>
>> BR,
>> -R
>
> ext4 filenames may contain letters with Unicode values > 2**16,
> e.g. using Takri letters: 𑚀𑚁𑚂
>
> So ext4ls probably should be enabled to display these on a Unicode console.
>
> Using -fshort-wchar globally is not necessary. Only UEFI requires 16 bit
> wchar_t. We should rather not enforce the UEFI standard on the rest of
> the code.

The alternative is disabling the gcc warning about mixing 32b and 16b
wchar_t, and really, mixing 32b and 16b wchar_t seems like a bad idea.

We could use -fshort-wchar only if EFI_LOADER is enabled.  Technically,
if we are a UEFI implementation, we do not need ext2/ext4 (or really
anything other than fat/vfat).
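
To make the wchar_t width concern concrete, here is a minimal sketch, not taken from the patch. It assumes uint16_t as a stand-in for u-boot's u16, and the helper wide_literals_are_16bit() is hypothetical. It shows the explicit-array form of "<NULL>", which does not depend on -fshort-wchar, next to a check of whether L"..." literals are 16 bits wide in the current build:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "<NULL>" as explicit 16-bit code units; independent of -fshort-wchar. */
static const uint16_t null_str16[] = { '<', 'N', 'U', 'L', 'L', '>', 0 };

/* Length of a NUL-terminated 16-bit string, capped at count code units. */
static size_t strnlen16(const uint16_t *s, size_t count)
{
	const uint16_t *sc;

	assert(s != NULL);
	for (sc = s; count-- && *sc; ++sc)
		/* nothing */;
	return sc - s;
}

/*
 * Hypothetical helper: nonzero only when wchar_t is 16 bits wide, i.e.
 * when an L"..." literal has the same element size as a u16 string.
 */
static int wide_literals_are_16bit(void)
{
	return sizeof(wchar_t) == sizeof(uint16_t);
}
```

On a typical Linux host build without -fshort-wchar, wchar_t is 32 bits, so the check would be expected to return 0 there, while the array form works either way.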
>>
>>>> +
>>>> +	len = strnlen16(s, precision);
>>>> +
>>>> +	if (!(flags & LEFT))
>>>> +		while (len < field_width--)
>>>> +			ADDCH(buf, ' ');
>>>> +	for (i = 0; i < len; ++i)
>>>> +		ADDCH(buf, *s++);
>
> I would prefer to see a conversion to UTF-8 here.
>
> Conversion from 32bit Unicode (Or the capped 16bit Unicode of EFI) is
> quite easy. This is what I used in another project:
>
>     uint32_t u = s[i];
>     char c[5];
>     if (u < 0x80) {
>         c[0] = u & 0x7F;
>         c[1] = 0;
>         str.append(c);
>     } else if (u < 0x800) {
>         c[1] = 0x80 | (u & 0x3F);
>         u >>= 6;
>         c[0] = 0xC0 | (u & 0x1F);
>         c[2] = 0;
>         str.append(c);
>     } else if (u < 0x10000) {
>         c[2] = 0x80 | (u & 0x3F);
>         u >>= 6;
>         c[1] = 0x80 | (u & 0x3F);
>         u >>= 6;
>         c[0] = 0xE0 | (u & 0x0F);
>         c[3] = 0;
>         str.append(c);
>     } else if (u < 0x200000) {
>         c[3] = 0x80 | (u & 0x3F);
>         u >>= 6;
>         c[2] = 0x80 | (u & 0x3F);
>         u >>= 6;
>         c[1] = 0x80 | (u & 0x3F);
>         u >>= 6;
>         c[0] = 0xF0 | (u & 0x07);
>         c[4] = 0;
>         str.append(c);
>     } else {
>         throw invalid;
>     }

I did add a utf16_to_utf8() (based on code from grub) as part of the
efi-variables patch, since there we are dealing with utf16 coming from
outside of u-boot.  I guess I could use that.  I think that mostly
matters if we end up printing strings that originate outside of
u-boot, but I guess that will be the case for filenames in a
device-path.

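
For illustration, a UTF-16 to UTF-8 conversion that also handles surrogate pairs (needed for code points above 0xFFFF, such as the Takri letters mentioned earlier) could look roughly like the sketch below. This is not the utf16_to_utf8() from the efi-variables patch or from grub. The name utf16_to_utf8_sketch(), its signature, and its error convention are assumptions made for this example, and uint16_t stands in for u16:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert up to n UTF-16 code units from src into
 * NUL-terminated UTF-8 in dst (capacity dstlen bytes).  Returns the number
 * of bytes written, excluding the NUL, or -1 on an unpaired surrogate or
 * if dst is too small.
 */
static int utf16_to_utf8_sketch(char *dst, size_t dstlen,
				const uint16_t *src, size_t n)
{
	static const unsigned char lead[] = { 0, 0xC0, 0xE0, 0xF0 };
	size_t out = 0;

	assert(dst != NULL && src != NULL);

	for (size_t i = 0; i < n && src[i]; i++) {
		uint32_t u = src[i];
		int extra;	/* number of continuation bytes */

		if (u >= 0xD800 && u <= 0xDBFF) {
			/* High surrogate: a low surrogate must follow. */
			if (i + 1 >= n || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
				return -1;
			u = 0x10000 + ((u - 0xD800) << 10) + (src[++i] - 0xDC00);
		} else if (u >= 0xDC00 && u <= 0xDFFF) {
			return -1;	/* unpaired low surrogate */
		}

		extra = u < 0x80 ? 0 : u < 0x800 ? 1 : u < 0x10000 ? 2 : 3;
		if (out + extra + 2 > dstlen)	/* lead byte + extra + NUL */
			return -1;

		if (extra == 0) {
			dst[out++] = (char)u;
		} else {
			/* Lead byte is 110xxxxx, 1110xxxx or 11110xxx. */
			dst[out++] = (char)(lead[extra] | (u >> (6 * extra)));
			/* Continuation bytes are 10xxxxxx, high bits first. */
			while (extra--)
				dst[out++] = (char)(0x80 | ((u >> (6 * extra)) & 0x3F));
		}
	}
	if (out >= dstlen)
		return -1;
	dst[out] = '\0';
	return (int)out;
}
```

A caller printing a wide string through a %ls-style path would convert into a temporary buffer first and then emit the resulting bytes; whether that buffer lives on the stack or the conversion is done one code point at a time is a design choice this sketch leaves open.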
BR,
-R
> Best regards
>
> Heinrich
>
>>>> +	while (len < field_width--)
>>>> +		ADDCH(buf, ' ');
>>>> +	return buf;
>>>> +}
>>>> +
>>>>  #ifdef CONFIG_CMD_NET
>>>>  static const char hex_asc[] = "0123456789abcdef";
>>>>  #define hex_asc_lo(x)	hex_asc[((x) & 0x0f)]
>>>> @@ -528,8 +557,14 @@ repeat:
>>>>  			continue;
>>>>
>>>>  		case 's':
>>>> -			str = string(str, end, va_arg(args, char *),
>>>> -					field_width, precision, flags);
>>>> +			if (qualifier == 'l') {
>>>
>>> According to ISO 9899:1999 %ls is used to indicate a wchar_t string,
>>> which may be u32 * or u16 * depending on GCC flag -fshort-wchar.
>>>
>>> Wouldn't it make sense to use some other notation, e.g. %S, to indicate
>>> that we explicitly mean u16 *?
>>>
>>> Please, add a comment into the code indicating why we need u16 * support
>>> referring to the UEFI spec.
>>>
>>> Best regards
>>>
>>> Heinrich
>>>
>>>> +				str = string16(str, end, va_arg(args, u16 *),
>>>> +						field_width, precision, flags);
>>>> +
>>>> +			} else {
>>>> +				str = string(str, end, va_arg(args, char *),
>>>> +						field_width, precision, flags);
>>>> +			}
>>>>  			continue;
>>>>
>>>>  		case 'p':
>>>>
>>>
>>
>