[U-Boot] [PATCH 2/6] efi_loader: support Unicode text input

Mon Sep 10 13:01:59 UTC 2018

On 09/10/2018 12:05 PM, Alexander Graf wrote:
> On 09/09/2018 07:57 AM, Heinrich Schuchardt wrote:
>> Up to now the EFI_TEXT_INPUT_PROTOCOL only supported ASCII characters.
>>
>> With the patch it can consume UTF-8 from the serial console or
>> codepage 437 special characters from the local keyboard.
>>
>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>> ---
>>   lib/efi_loader/efi_console.c | 80 ++++++++++++++++++++++++++++++++++--
>>   1 file changed, 76 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/efi_loader/efi_console.c b/lib/efi_loader/efi_console.c
>> index 3ca6fe536c..8c45290b2e 100644
>> --- a/lib/efi_loader/efi_console.c
>> +++ b/lib/efi_loader/efi_console.c
>> @@ -15,12 +15,18 @@
>>   #define EFI_COUT_MODE_2 2
>>   #define EFI_MAX_COUT_MODE 3
>>   +/* Keyboard layouts */
>> +#define KBD_US        0    /* default US layout */
>> +#define KBD_GER        1    /* German layout */
>> +
>>   struct cout_mode {
>>       unsigned long columns;
>>       unsigned long rows;
>>       int present;
>>   };
>>   +static int keymap = KBD_US;
>> +
>>   static struct cout_mode efi_cout_modes[] = {
>>       /* EFI Mode 0 is 80x25 and always present */
>>       {
>> @@ -390,6 +396,19 @@ struct efi_simple_text_output_protocol 
>> efi_con_out = {
>>       .mode = (void*)&efi_con_mode,
>>   };
>>   +static void efi_set_keymap(void)
>> +{
>> +    char *penv;
>> +
>> +    /* Init keyboard device (default US layout) */
>> +    keymap = KBD_US;
>> +    penv = env_get("keymap");
>> +    if (penv) {
>> +        if (strncmp(penv, "de", 3) == 0)
>> +            keymap = KBD_GER;
>
> I'm not terribly happy with this. It's very i8042 specific. Currently 
> U-Boot only implements the keymap variable there. Couldn't we add an 
> extended getc() that reads a full 32bit unicode value from the target 
> device? That way we could add an interim layer that everyone - not 
> just efi_loader - can use to extract keycodes coherently from any input.

Or actually maybe using UTF-8 internally is the right way to go. Then we 
have less conversion and overhead to do in the default (serial) case.

Marek, what's your take on the idea? We could just make the i8042 driver 
emit UTF-8 codes on getc(). Then everything internally would just be 
based on UTF-8 and we don't need to push target input awareness into 
layers that really shouldn't care.

>
>> +    }
>> +}
>> +
>>   static efi_status_t EFIAPI efi_cin_reset(
>>               struct efi_simple_text_input_protocol *this,
>>               bool extended_verification)
>> @@ -453,17 +472,16 @@ static efi_status_t EFIAPI 
>> efi_cin_read_key_stroke(
>>           .scan_code = 0,
>>           .unicode_char = 0,
>>       };
>> -    char ch;
>> +    int ch;
>>         EFI_ENTRY("%p, %p", this, key);
>>         /* We don't do interrupts, so check for timers cooperatively */
>>       efi_timer_check();
>>   -    if (!tstc()) {
>> +    if (!tstc())
>>           /* No key pressed */
>> -        return EFI_EXIT(EFI_NOT_READY);
>> -    }
>> +        goto error;
>>         ch = getc();
>>       if (ch == cESC) {
>> @@ -550,12 +568,63 @@ static efi_status_t EFIAPI 
>> efi_cin_read_key_stroke(
>>       } else if (ch == 0x7f) {
>>           /* Backspace */
>>           ch = 0x08;
>> +    } else if (keymap == KBD_US && ch >= 0xc2 && ch <= 0xf4) {
>> +        /*
>> +         * Unicode
>> +         *
>> +         * We assume here that the serial console is using UTF-8.
>> +         * This of cause depends on the terminal settings.
>> +         */
>> +        int code = 0;
>> +
>> +        if (ch >= 0xe0) {
>> +            if (ch >= 0xf0) {
>> +                /* 0xf0 - 0xf4 */
>> +                ch &= 0x07;
>> +                code = ch << 18;
>> +                ch = getc();
>> +                if (ch < 0x80 || ch > 0xbf)
>> +                    goto error;
>> +                ch &= 0x3f;
>> +            } else {
>> +                /* 0xe0 - 0xef */
>> +                ch &= 0x0f;
>> +            }
>> +            code += ch << 12;
>> +            if ((code >= 0xD800 && code <= 0xDFFF) ||
>> +                code >= 0x110000)
>> +                goto error;
>> +            ch = getc();
>> +            if (ch < 0x80 || ch > 0xbf)
>> +                goto error;
>> +        }
>> +        /* 0xc0 - 0xdf or continuation byte (0x80 - 0xbf) */
>> +        ch &= 0x3f;
>> +        code += ch << 6;
>> +        ch = getc();
>> +        if (ch < 0x80 || ch > 0xbf)
>> +            goto error;
>> +        ch &= 0x3f;
>> +        ch += code;
>
> All of the logic above for example really shouldn't live inside of 
> efi_loader. It belongs somewhere more generic.

In fact, this probably should be a call in charset.c or so. Something 
like get_utf8() with a function pointer argument that basically just 
passes getc().

Alex