[U-Boot] [U-Boot, v0, 07/20] vsprintf.c: add wide string (%ls) support

Tue Aug 8 23:08:00 UTC 2017

On 08/09/2017 12:44 AM, Rob Clark wrote:
> On Tue, Aug 8, 2017 at 6:03 PM, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>> On 08/04/2017 09:31 PM, Rob Clark wrote:
>>> This is convenient for efi_loader which deals a lot with utf16.
>>>
>>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>>
>> Please, put this patch together with
>> [PATCH] vsprintf.c: add GUID printing
>> https://patchwork.ozlabs.org/patch/798362/
>> and
>> [PATCH v0 06/20] common: add some utf16 handling helpers
>> https://patchwork.ozlabs.org/patch/797968/
>> into a separate patch series.
>>
>> These three patches can be reviewed independently of the efi_loader
>> patches and probably will not be integrated via the efi-next tree.
> 
> I'll resend these as a separate patchset, and just not in next
> revision of efi_loader patchset that it is a dependency
> 
>>> ---
>>>  lib/vsprintf.c | 30 ++++++++++++++++++++++++++++--
>>>  1 file changed, 28 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>>> index 874a2951f7..0c40f852ce 100644
>>> --- a/lib/vsprintf.c
>>> +++ b/lib/vsprintf.c
>>> @@ -17,6 +17,7 @@
>>>  #include <linux/ctype.h>
>>>
>>>  #include <common.h>
>>> +#include <charset.h>
>>>
>>>  #include <div64.h>
>>>  #define noinline __attribute__((noinline))
>>> @@ -270,6 +271,26 @@ static char *string(char *buf, char *end, char *s, int field_width,
>>>       return buf;
>>>  }
>>>
>>> +static char *string16(char *buf, char *end, u16 *s, int field_width,
>>> +             int precision, int flags)
>>> +{
>>> +     u16 *str = s ? s : L"<NULL>";
>> Please, do not use the L-notation here as it requires -fshort-wchar.
>> As we currently cannot switch the complete project to C11 you cannot use
>> the u-notation either.
> 
> current plan was to either switch whole project to -fshort-wchar or
> c11 and rework these patches (as well as a few patches in the
> efi_loader patchset).  (In the c11 case, I'm not sure what we'll use
> as the fmt string, since afaict that isn't specified.  We could use %S
> although that seems to be a deprecated way to do %ls, or something
> different like %A, I guess)..
> 
> how far are we from c11?  If there is stuff I can do to help let me
> know.  If feasible, I'd rather do that first rather than have a bunch
> of stuff in vsprintf and elsewhere that needs to be cleaned up later
> after the switch.

buildman downloads very old compilers (gcc < 4.8) from kernel.org which
do not support C11.
Travis CI uses Ubuntu 14.04 with gcc 4.8.4 which incorrectly throws an
error for disk/part.c in C11 mode.

To get things right we would have to
* build our own cross tool chains based on a current gcc version
* use our own tool chain in Travis for x86-64 or use a docker
  container with a current gcc version.

In the long run heading for C11 would be the right thing to do.
Until then use an initializer { '<', 'N', 'U', 'L', 'L', '>' }.
It looks ugly but does not consume more bytes once compiled.

Regards

Heinrich

> 
>>
>>> +     int len = utf16_strnlen(str, precision);
>>> +     u8 utf8[len * MAX_UTF8_PER_UTF16];
>>
>> Didn't you forget 1 byte for \0 here?
>>
>> This is what strlnlen does:
>>
>> The strnlen() function returns the number of characters in the string
>> pointed to by s, **excluding** the terminating null byte ('\0'), but at
>> most maxlen.
>>
>> I would expect the exclusion of the terminating null word by an
>> utf16_strnlen function.
> 
> you are right, but fixing the wrong problem.. the code is definitely
> wrong since length of a utf16 string != length of a utf8 string, and
> we don't need to append a null terminator.. so my logic below using
> 'len' is wrong.  I'll fix that in the next version.
> 
>>> +     int i;
>>> +
>>> +     *utf16_to_utf8(utf8, str, len) = '\0';
>>> +
>>> +     if (!(flags & LEFT))
>>> +             while (len < field_width--)
>>> +                     ADDCH(buf, ' ');
>>> +     for (i = 0; i < len; ++i)
>>> +             ADDCH(buf, utf8[i]);
>>> +     while (len < field_width--)
>>> +             ADDCH(buf, ' ');
>>> +     return buf;
>>> +}
>>> +
>>>  #ifdef CONFIG_CMD_NET
>>>  static const char hex_asc[] = "0123456789abcdef";
>>>  #define hex_asc_lo(x)        hex_asc[((x) & 0x0f)]
>>> @@ -528,8 +549,13 @@ repeat:
>>>                       continue;
>>>
>>>               case 's':
>>> -                     str = string(str, end, va_arg(args, char *),
>>> -                                  field_width, precision, flags);
>>> +                     if (qualifier == 'l') {
>>
>> %ls refers to wchar with implementation dependent width in the C standard.
>> There is no qualifier for 16-bit wchar. Couldn't we use %us here in
>> reference to the u-notation ( u'MyString' ). This would leave the path
>> open for a standard compliant '%ls'.
> 
> hmm, yeah, that is a clever idea, and I like it better than %A or %S..
> so if we go the c11 route I'll do that.  The c11 committee should have
> thought of that ;-)
> 
> BR,
> -R
> 
>> Best regards
>>
>> Heinrich
>>
>>> +                             str = string16(str, end, va_arg(args, u16 *),
>>> +                                            field_width, precision, flags);
>>> +                     } else {
>>> +                             str = string(str, end, va_arg(args, char *),
>>> +                                          field_width, precision, flags);
>>> +                     }
>>>                       continue;
>>>
>>>               case 'p':
>>>
>