[PATCH] tbot_contrib/utils.py: convert a string into a dictionary via a pattern
Harald Seiler
hws at denx.de
Tue May 19 12:55:12 CEST 2020
Hello Heiko,
sorry for the slow responses to this ...
On Thu, 2020-05-07 at 14:25 +0200, Heiko Schocher wrote:
> Hello Harald,
>
> Am 05.05.2020 um 17:02 schrieb Harald Seiler:
> > Hello Heiko,
> >
> > On Mon, 2020-04-20 at 15:51 +0200, Heiko Schocher wrote:
> > > introduce function string_to_dict(), which converts
> > > a string into dictionary via a pattern.
> >
> > Can you quickly explain where this helper is useful to you? IMO the name
> > string_to_dict() doesn't really hint at what it does. If I can see this
> > correctly, this is kind of a reverse string formatting. So maybe
> > match_format(), parse_format(), or unformat() would be more appropriate?
>
> I used it, for example, to parse an output line of the latency
> command:
>
> The output lines have the following format:
>
> # RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
> # RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> # RTD| 11.458| 16.702| 39.250| 0| 0| 11.458| 39.250
>
> so if a line contains RTD I used the format:
>
> rtd_format = 'RTD\|{min}\|{avg}\|{max}\|{overrun}\|{msw}\|{best}\|{worst}'
>
> and did:
>
> for l in ret:
>     if 'RTD' in l["out"]:
>         rtd_dict = string_to_dict(l["out"], rtd_format)
>
>
> Ok, we can rename it to "parse_format" ?
>
> But maybe there is some easier trick with Python 3?
Hm, I see your use-case though I am a bit sceptical whether this is the
right solution for your problem. The main advantage of your function over
regex is that the format/pattern string is easier to read. But then you
still need to regex-escape the characters in between match-groups which
kind of nullifies that advantage again. For reference, your example could
be solved using just Python's regex like this:
    import re
    format = r"RTD\|(?P<min>.+)\|(?P<avg>.+)\|(?P<max>.+)\|(?P<overrun>.+)\|(?P<msw>.+)\|(?P<best>.+)\|(?P<worst>.+)"
    match = re.match(format, output)
    rtd_dict = match.groupdict()
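A slightly more defensive variant, as a minimal sketch only: it assumes
one RTD line looks exactly like the sample in your mail (copied below),
uses re.search() so the leading "# " does not matter, and strips the
padding around each value. The names RTD_RE and parse_rtd() are made up
for illustration and are not part of tbot.

```python
import re
from typing import Dict, Optional

# One named group per column. [^|]+ cannot run across a column separator,
# so no backtracking over "|" is needed.
RTD_RE = re.compile(
    r"RTD\|(?P<min>[^|]+)\|(?P<avg>[^|]+)\|(?P<max>[^|]+)\|(?P<overrun>[^|]+)"
    r"\|(?P<msw>[^|]+)\|(?P<best>[^|]+)\|(?P<worst>[^|]+)"
)


def parse_rtd(line: str) -> Optional[Dict[str, str]]:
    """Return the RTD columns as strings, or None if the line is not an RTD line."""
    match = RTD_RE.search(line)
    if match is None:
        return None
    # Drop the whitespace padding that surrounds each value.
    return {name: value.strip() for name, value in match.groupdict().items()}


# Sample line taken from the latency output quoted above.
sample = "# RTD| 11.458| 16.702| 39.250| 0| 0| 11.458| 39.250"
rtd_dict = parse_rtd(sample)
```

Returning None for non-matching lines (instead of failing on
match.groupdict()) lets the caller skip the RTT and RTH lines without
a separate 'RTD' in ... check.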
Now fixing this would mean the whole thing comes very close to the
existing `parse` module [1] so at that point I'd argue it is more sane to
just use that:
    import parse
    fmt = "RTD|{min:^g}|{avg:^g}|{max:^g}|{overrun:^g}|{msw:^g}|{best:^g}|{worst:^g}"
    res = parse.parse(fmt, output)
    rtd_dict = res.named
OTOH for your specific case a simple split could work as well:
    names = ["min", "avg", "max", "overrun", "msw", "best", "worst"]
    rtd_dict = {name: float(value) for name, value in zip(names, output.split("|")[1:])}
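If you go the split route, a small column-count check avoids silently
truncated dictionaries, because zip() stops at the shorter input. This is
just a sketch under the assumption that an RTD line always has exactly
seven columns; NAMES and split_rtd() are hypothetical names.

```python
from typing import Dict

NAMES = ["min", "avg", "max", "overrun", "msw", "best", "worst"]


def split_rtd(line: str) -> Dict[str, float]:
    """Split one RTD line at "|" and convert every column to float."""
    # The first element is the "# RTD" tag, so it is dropped.
    fields = line.split("|")[1:]
    if len(fields) != len(NAMES):
        raise ValueError(
            f"expected {len(NAMES)} columns, got {len(fields)}: {line!r}"
        )
    # float() ignores the whitespace padding around each value.
    return {name: float(value) for name, value in zip(NAMES, fields)}


rtd_dict = split_rtd("# RTD| 11.458| 16.702| 39.250| 0| 0| 11.458| 39.250")
```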
Not sure what would suit you best.
[1]: https://pypi.org/project/parse/
> > > Signed-off-by: Heiko Schocher <hs at denx.de>
> > > ---
> > >
> > > tbot_contrib/utils.py | 25 +++++++++++++++++++++++++
> > > 1 file changed, 25 insertions(+)
> > >
> > > diff --git a/tbot_contrib/utils.py b/tbot_contrib/utils.py
> > > index 8073f89..1afb65b 100644
> > > --- a/tbot_contrib/utils.py
> > > +++ b/tbot_contrib/utils.py
> > > @@ -15,6 +15,7 @@
> > > # along with this program. If not, see <https://www.gnu.org/licenses/>.
> > >
> > > from tbot.machine import linux
> > > +import re
> > >
> > >
> > > def check_systemd_services_running(lnx: linux.LinuxShell, services: list) -> None:
> > > @@ -29,3 +30,27 @@ def check_systemd_services_running(lnx: linux.LinuxShell, services: list) -> Non
> > > ret = lnx.exec("systemctl", "status", s, "--no-pager")
> > > if ret[0] != 0:
> > > lnx.test("sudo", "systemctl", "start", s)
> > > +
> > > +
> > > +def string_to_dict(string: str, pattern: str) -> dict:
> >
> > Annotation for the return type needs to be typing.Dict[str, str].
> >
> > Also, to keep consistent with the rest of Python (e.g. the `re` module),
> > the arguments should be switched so that pattern comes first.
>
> Ok, can clean this up...
>
> > > +    """
> > > +    convert a string into a dictionary via a pattern
> > > +
> > > +    example pattern:
> > > +    'hello, my name is {name} and I am a {age} year old {what}'
> > > +
> > > +    string:
> > > +    'hello, my name is dan and I am a 33 year old developer'
> > > +
> > > +    returned dict:
> > > +    {'age': '33', 'name': 'dan', 'what': 'developer'}
> > > +    from:
> > > +    https://stackoverflow.com/questions/11844986/convert-or-unformat-a-string-to-variables-like-format-but-in-reverse-in-p
> > > +    """
> > > +    regex = re.sub(r"{(.+?)}", r"(?P<_\1>.+)", pattern)
> >
> > The match-pattern used here, `(?P<_group>.+)`, matches greedily. For
> > your example pattern above, this means that the following string
> >
> > hello, my name is dan and I am a 33 year old developer and I am a 33 year old developer
> >
> > will yield
> >
> > {'name': 'dan and I am a 33 year old developer', 'age': '33',
> > 'what': 'developer'}
> >
> > Not sure if this is a problem though. It could be changed by using
> > `(?P<_group>.+?)` instead if non-greedy behavior is more desirable.
>
> Yes, I will change this.
>
> Thanks!
>
> bye,
> Heiko
> > > +    match = re.search(regex, string)
> > > +    assert match is not None, f"The pattern {regex!r} was not found!"
> > > +    values = list(match.groups())
> > > +    keys = re.findall(r"{(.+?)}", pattern)
> > > +    _dict = dict(zip(keys, values))
> > > +    return _dict
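
As a rough sketch of where the review comments above lead (pattern first,
typing.Dict[str, str] as return type, non-greedy groups, literal text
escaped automatically so the pattern needs no manual `\|`), something like
the following could work. It is not a drop-in replacement for the patch:
the name parse_format() and the `greedy` parameter are only suggestions,
and the use of re.fullmatch() instead of re.search() is an assumption on
my side. Without an anchor at the end, a trailing non-greedy `.+?` group
would match just a single character.

```python
import re
from typing import Dict

# Matches one "{name}" placeholder and captures the name.
_FIELD = re.compile(r"\{(.+?)\}")


def parse_format(pattern: str, string: str, greedy: bool = False) -> Dict[str, str]:
    """Reverse of str.format(): extract the {fields} of pattern from string."""
    group = "(.+)" if greedy else "(.+?)"
    keys = _FIELD.findall(pattern)
    # re.split() with a capturing group alternates literal text (even
    # indices) and field names (odd indices).
    parts = _FIELD.split(pattern)
    regex = "".join(
        re.escape(part) if index % 2 == 0 else group
        for index, part in enumerate(parts)
    )
    # fullmatch: the pattern has to describe the whole string.
    match = re.fullmatch(regex, string)
    if match is None:
        raise ValueError(f"pattern {pattern!r} does not match {string!r}")
    return dict(zip(keys, match.groups()))


pattern = "hello, my name is {name} and I am a {age} year old {what}"
text = (
    "hello, my name is dan and I am a 33 year old developer"
    " and I am a 33 year old developer"
)
lazy = parse_format(pattern, text)
eager = parse_format(pattern, text, greedy=True)
```

With the doubled string from my earlier comment, the two modes differ only
in where the repeated part ends up: in `name` for the greedy variant, in
`what` for the non-greedy one. Because of fullmatch(), a pattern for the
RTD lines would have to include the leading "# " as well.
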
--
Harald
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-62 Fax: +49-8142-66989-80 Email: hws at denx.de