[PATCH] tbot_contrib/utils.py: convert a string into a dictionary via a pattern

Tue May 19 13:10:25 CEST 2020

Hello Harald,

On 19.05.2020 at 12:55, Harald Seiler wrote:
> Hello Heiko,
> 
> sorry for the slow responses to this ...

No problem, thanks for your time!

> On Thu, 2020-05-07 at 14:25 +0200, Heiko Schocher wrote:
>> Hello Harald,
>>
>> On 05.05.2020 at 17:02, Harald Seiler wrote:
>>> Hello Heiko,
>>>
>>> On Mon, 2020-04-20 at 15:51 +0200, Heiko Schocher wrote:
>>>> introduce function string_to_dict(), which converts
>>>> a string into dictionary via a pattern.
>>>
>>> Can you quickly explain where this helper is useful to you?  IMO the name
>>> string_to_dict() doesn't really hint at what it does.  If I can see this
>>> correctly, this is kind of a reverse string formatting.  So maybe
>>> match_format(), parse_format(), or unformat() would be more appropriate?
>>
>> I used it, for example, to parse an output line of the latency
>> command.
>>
>> The output lines have the following format:
>>
>>       # RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
>>       # RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>       # RTD|     11.458|     16.702|     39.250|       0|     0|     11.458|     39.250
>>
>> so if a line contains RTD I used the format:
>>
>>       rtd_format = r'RTD\|{min}\|{avg}\|{max}\|{overrun}\|{msw}\|{best}\|{worst}'
>>
>> and did:
>>
>>       for l in ret:
>>           if 'RTD' in l["out"]:
>>               rtd_dict = string_to_dict(l["out"], rtd_format)
>>
>>
>> OK, we can rename it to "parse_format"?
>>
>> But maybe there is some easier trick with Python 3?
> 
> Hm, I see your use-case though I am a bit sceptical whether this is the
> right solution for your problem.  The main advantage of your function over
> regex is that the format/pattern string is easier to read.  But then you
> still need to regex-escape the characters in between match-groups which
> kind of nullifies that advantage again.  For reference, your example could
> be solved using just Python's regex like this:
> 
>      import re
> 
>      format = r"RTD\|(?P<min>.+)\|(?P<avg>.+)\|(?P<max>.+)\|(?P<overrun>.+)\|(?P<msw>.+)\|(?P<best>.+)\|(?P<worst>.+)"
>      match = re.match(format, output)
>      rtd_dict = match.groupdict()
> 
> Now fixing this would mean the whole thing comes very close to the
> existing `parse` module [1] so at that point I'd argue it is more sane to
> just use that:
> 
>      import parse
> 
>      fmt = "RTD|{min:^g}|{avg:^g}|{max:^g}|{overrun:^g}|{msw:^g}|{best:^g}|{worst:^g}"
>      res = parse.parse(fmt, output)
>      rtd_dict = res.named
> 
> OTOH for your specific case a simple split could work as well:
> 
>      names = ["min", "avg", "max", "overrun", "msw", "best", "worst"]
>      rtd_dict = {name: float(value) for name, value in zip(names, output.split("|")[1:])}
> 
> Not sure what would suit you best.
> 
> [1]: https://pypi.org/project/parse/
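
For reference, a minimal stdlib-only sketch of the regex and split variants
discussed above, run against the RTD sample line from earlier in this thread.
It assumes the line starts with "RTD" (no leading "# "), and it narrows the
groups to `[^|]+` instead of `.+`, which is my own choice here and not part of
the suggestion above. The variable names are only for illustration.

```python
import re

# Sample line taken from the latency output quoted earlier in this thread
# (assumed to start directly with "RTD").
line = "RTD|     11.458|     16.702|     39.250|       0|     0|     11.458|     39.250"

names = ["min", "avg", "max", "overrun", "msw", "best", "worst"]

# Variant 1: one named group per column, separated by a literal "|".
# [^|]+ keeps each group inside its own column.
pattern = r"RTD\|" + r"\|".join(rf"(?P<{name}>[^|]+)" for name in names)
match = re.match(pattern, line)
if match is None:
    raise ValueError(f"line does not look like an RTD line: {line!r}")
# groupdict() returns strings; float() ignores the padding whitespace.
by_regex = {name: float(value) for name, value in match.groupdict().items()}

# Variant 2: plain split on "|", dropping the leading "RTD" field.
by_split = {name: float(value) for name, value in zip(names, line.split("|")[1:])}

print(by_regex)
print(by_split)
```

Both variants should produce the same dictionary for this line. The split
variant is shorter but does not verify that the line really is an RTD line.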

Hah, I knew that you would know a better way to do this!

Thanks for the hint ... so I would say, ignore my patch.

bye,
Heiko
> 
>>>> Signed-off-by: Heiko Schocher <hs at denx.de>
>>>> ---
>>>>
>>>>    tbot_contrib/utils.py | 25 +++++++++++++++++++++++++
>>>>    1 file changed, 25 insertions(+)
>>>>
>>>> diff --git a/tbot_contrib/utils.py b/tbot_contrib/utils.py
>>>> index 8073f89..1afb65b 100644
>>>> --- a/tbot_contrib/utils.py
>>>> +++ b/tbot_contrib/utils.py
>>>> @@ -15,6 +15,7 @@
>>>>    # along with this program.  If not, see <https://www.gnu.org/licenses/>.
>>>>    
>>>>    from tbot.machine import linux
>>>> +import re
>>>>    
>>>>    
>>>>    def check_systemd_services_running(lnx: linux.LinuxShell, services: list) -> None:
>>>> @@ -29,3 +30,27 @@ def check_systemd_services_running(lnx: linux.LinuxShell, services: list) -> Non
>>>>            ret = lnx.exec("systemctl", "status", s, "--no-pager")
>>>>            if ret[0] != 0:
>>>>                lnx.test("sudo", "systemctl", "start", s)
>>>> +
>>>> +
>>>> +def string_to_dict(string: str, pattern: str) -> dict:
>>>
>>> Annotation for the return type needs to be typing.Dict[str, str].
>>>
>>> Also, to keep consistent with the rest of Python (e.g. the `re` module),
>>> the arguments should be switched so that pattern comes first.
>>
>> OK, I can clean this up...
>>
>>>> +    """
>>>> +    convert a string into a dictionary via a pattern
>>>> +
>>>> +    example pattern:
>>>> +    'hello, my name is {name} and I am a {age} year old {what}'
>>>> +
>>>> +    string:
>>>> +    'hello, my name is dan and I am a 33 year old developer'
>>>> +
>>>> +    returned dict:
>>>> +    {'age': '33', 'name': 'dan', 'what': 'developer'}
>>>> +    from:
>>>> +    https://stackoverflow.com/questions/11844986/convert-or-unformat-a-string-to-variables-like-format-but-in-reverse-in-p
>>>> +    """
>>>> +    regex = re.sub(r"{(.+?)}", r"(?P<_\1>.+)", pattern)
>>>
>>> The match-pattern used here, `(?P<_group>.+)`, matches greedily.  This means
>>> for your above example that the following string
>>>
>>>       hello, my name is dan and I am a 33 year old developer and I am a 33 year old developer
>>>
>>> will yield
>>>
>>>       {'name': 'dan and I am a 33 year old developer', 'age': '33',
>>>        'what': 'developer'}
>>>
>>> Not sure if this is a problem though.  It could be changed by using
>>> `(?P<_group>.+?)` instead if non-greedy behavior is more desirable.
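
A small sketch to make the greedy versus non-greedy difference concrete, using
the example pattern from the docstring and the doubled string above. The
helper `to_regex()` is hypothetical and only mirrors the substitution from the
patch (without the `_` prefix on the group names). One thing worth noting:
because the patch uses re.search(), a non-greedy last group is not forced to
run to the end of the string.

```python
import re

pattern = "hello, my name is {name} and I am a {age} year old {what}"
text = (
    "hello, my name is dan and I am a 33 year old developer"
    " and I am a 33 year old developer"
)


def to_regex(pattern: str, group: str) -> str:
    """Hypothetical helper: replace each {name} with a named group using `group`."""
    return re.sub(r"\{(.+?)\}", lambda m: f"(?P<{m.group(1)}>{group})", pattern)


# Greedy groups: the first field swallows as much as it can.
greedy = re.search(to_regex(pattern, ".+"), text).groupdict()

# Non-greedy groups: every field takes as little as it can.
lazy = re.search(to_regex(pattern, ".+?"), text).groupdict()

print(greedy)
print(lazy)
```

With the greedy groups, `name` comes back as the long string shown above. With
the non-greedy groups, `name` is 'dan' but `what` is cut down to the single
character 'd', since nothing anchors the end of the match. Anchoring the
pattern (for example with re.fullmatch()) would avoid that.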
>>
>> Yes, I will change this.
>>
>> Thanks!
>>
>> bye,
>> Heiko
>>>> +    match = re.search(regex, string)
>>>> +    assert match is not None, f"The pattern {regex!r} was not found!"
>>>> +    values = list(match.groups())
>>>> +    keys = re.findall(r"{(.+?)}", pattern)
>>>> +    _dict = dict(zip(keys, values))
>>>> +    return _dict
> 
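
For completeness, a sketch of how the helper might look with the review
comments applied: pattern first, typing.Dict[str, str] as return type,
non-greedy groups, and the literal text between fields passed through
re.escape() so that characters like "|" need no manual escaping. The name
parse_format() is one of the names suggested in this thread. Using
re.fullmatch() and raising ValueError instead of the assert are my own
assumptions, not something that was agreed on. Field names have to be valid
regex group names.

```python
import re
import typing


def parse_format(pattern: str, string: str) -> typing.Dict[str, str]:
    """Sketch: reverse of str.format() for simple {name} fields."""
    # re.split() with a capturing group alternates literal text (even
    # indices) and field names (odd indices).
    parts = re.split(r"\{(.+?)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text, escaped for regex use
        else:
            regex += f"(?P<{part}>.+?)"  # named, non-greedy field
    # fullmatch() anchors both ends, so the last non-greedy field still
    # extends to the end of the string.
    match = re.fullmatch(regex, string)
    if match is None:
        raise ValueError(f"{string!r} does not match pattern {pattern!r}")
    return match.groupdict()


person = parse_format(
    "hello, my name is {name} and I am a {age} year old {what}",
    "hello, my name is dan and I am a 33 year old developer",
)
print(person)

rtd = parse_format(
    "RTD|{min}|{avg}|{max}|{overrun}|{msw}|{best}|{worst}",
    "RTD|     11.458|     16.702|     39.250|       0|     0|     11.458|     39.250",
)
# Values are returned as strings, including the column padding.
print({key: float(value) for key, value in rtd.items()})
```

This is essentially a small subset of what the `parse` module already does, so
for anything beyond simple fields that module is probably the better choice.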

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: hs at denx.de