[RFC PATCH 00/28] cli: Add a new shell

Thu Jul 1 22:21:55 CEST 2021

On Thu, Jul 01, 2021 at 02:15:43AM -0400, Sean Anderson wrote:

> Well, this has been sitting on my hard drive for too long without feedback
> ("Release early, release often"), so here's the first RFC. This is not ready to
> merge (see the "Future work" section below), but the shell is functional and at
> least partially tested.
> 
> The goal is to have 0 bytes gained over Hush. Currently we are around 800 bytes
> over on sandbox.

A good goal, but perhaps slightly too strict?

> 
> add/remove: 90/54 grow/shrink: 3/7 up/down: 12834/-12042 (792)
> 
> = Getting started
> 
> Enable CONFIG_LIL. If you would like to run tests, enable CONFIG_LIL_FULL. Note
> that dm_test_acpi_cmd_dump and setexpr_test_str_oper will fail. CONFIG_LIL_POOLS
> is currently broken (with what appears to be a double free).
> 
> For an overview of the language as a whole, refer to the original readme [1].
> 
> [1] http://runtimeterror.com/tech/lil/readme.txt
> 
> == Key patches
> 
> The following patches are particularly significant for reviewing and
> understanding this series:
> 
> cli: Add LIL shell
> 	This contains the LIL shell as originally written by Kostas with some
> 	major deletions and some minor additions.
> cli: lil: Wire up LIL to the rest of U-Boot
> 	This allows you to use LIL as a shell just like Hush.
> cli: lil: Document structures
> 	This adds documentation for the major structures of LIL. It is a good
> 	place to start looking at the internals.
> test: Add tests for LIL
> 	This adds some basic integration tests and provides some examples of
> 	LIL code.
> cli: lil: Add a distinct parsing step
> 	This adds a parser separate from the interpreter. This patch is the
> 	largest original work in this series.
> cli: lil: Load procs from the environment
> 	This allows procedures to be saved and loaded like variables.
> 
> = A new shell
> 
> This series adds a new shell for U-Boot. The aim is to eventually replace Hush
> as the primary shell for all boards which currently use it. Hush should be
> replaced because it has several major problems:
> 
> - It has not had a major update in two decades, resulting in duplication of
>   effort in finding bugs. Regarding a bug in variable setting, Wolfgang remarks
> 
>     So the specific problem has (long) been fixed in upstream, and
>     instead of adding a patch to our old version, thus cementing the
>     broken behaviour, we should upgrade hush to recent upstream code.
> 
>     -- Wolfgang Denk [2]
> 
>   This lack of updates is further compounded by a significant amount of
>   ifdef-ing in the Hush code. This makes the shell hard to read and debug.
>   Further, the original purpose of such ifdef-ing (upgrading to a newer Hush)
>   has never been realized.
> 
> - It was designed for a preempting OS which supports pipes and processes. This
>   fundamentally does not match the computing model of U-Boot where there is
>   exactly one thread (and every other CPU is spinning or sleeping). Working
>   around these design differences is a significant cause of the
>   aforementioned ifdef-ing.
> 
> - It lacks many major features expected of even the most basic shells, such
>   as functions and command substitution ($() syntax). This makes it difficult
>   to script with Hush. While it is desirable to write some code in C, much code
>   *must* be written in C because there is no way to express the logic in Hush.
> 
> I believe that U-Boot should have a shell which is more featureful, has cleaner
> code, and which is the same size as Hush (or less). The ergonomic advantages
> afforded by a new shell will make U-Boot easier to use and customize.
> 
> [2] https://lore.kernel.org/u-boot/872080.1614764732@gemini.denx.de/

First, great!  Thanks for doing this.  A new shell really is the only
viable path forward here, and I appreciate you taking the time to
evaluate several and implement one.

> = Open questions
> 
> While the primary purpose of this series is of course to get feedback on the
> code I have already written, there are several decisions where I am not sure
> what the best course of action is.
> 
> - What should be done about 'expr'? The 'expr' command is a significant portion
>   of the final code size. It cannot be removed outright, because it is used by
>   several builtin functions like 'if', 'while', 'for', etc. The way I see it,
>   there are two general approaches to take:
> 
>   - Rewrite expr to parse expressions and then evaluate them. The parsing could
>     re-use several of the existing parse functions like how parse_list does.
>     This could reduce code, as instead of many functions each with their own
>     while/switch statements, we could have two while/switch statements (one to
>     parse, and one to evaluate). However, this may end up increasing code size
>     (such as when the main language had evaluation split from parsing).
> 
>   - Don't parse infix expressions, and just make arithmetic operators normal
>     functions. This would affect ergonomics a bit. For example, instead of
> 
> 	if {$i < 10} { ... }
> 
>     one would need to write
> 
> 	if {< $i 10} { ... }
> 
>     and instead of
> 
> 	if {$some_bool} { ... }
> 
>     one would need to write
> 
> 	if {quote $some_bool} { ... }
> 
>     Though, given how much setexpr is used (not much), this may not be such a
>     big price to pay. This route is almost certain to reduce code size.

So, this is a question because we have cmd/setexpr.c that provides
"expr" today?  Or because this is a likely place to reclaim some of that
800 byte growth?
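
To make the second option concrete, here is a minimal sketch of what
"operators as normal functions" could look like in C. Nothing here comes
from the series: binop_fn, binops and call_binop are hypothetical names,
and real LIL functions take lil_values rather than longs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a two-operand operator function. */
typedef long (*binop_fn)(long lhs, long rhs);

static long op_lt(long lhs, long rhs)  { return lhs < rhs; }
static long op_eq(long lhs, long rhs)  { return lhs == rhs; }
static long op_add(long lhs, long rhs) { return lhs + rhs; }

/* One table entry per operator, looked up by name like any other function. */
static const struct {
	const char *name;
	binop_fn fn;
} binops[] = {
	{ "<", op_lt },
	{ "==", op_eq },
	{ "+", op_add },
};

/*
 * Apply the operator called @name to @lhs and @rhs.
 * Returns 0 and stores the value in @result, or -1 if @name is unknown.
 */
int call_binop(const char *name, long lhs, long rhs, long *result)
{
	size_t i;

	for (i = 0; i < sizeof(binops) / sizeof(binops[0]); i++) {
		if (!strcmp(binops[i].name, name)) {
			*result = binops[i].fn(lhs, rhs);
			return 0;
		}
	}
	return -1;
}
```

With something like this, "if {< $i 10}" reduces to a single table lookup
and call, with no infix parser involved, which is where the size saving
would be expected to come from.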

> - How should LIL functions integrate with the rest of U-Boot? At the moment, lil
>   functions and procedures exist in a completely separate world from normal
>   commands. I would like to integrate them more closely, but I am not sure the
>   best way to go about this. At the very minimum, each LIL builtin function
>   needs to get its hands on the LIL interpreter somehow. I'd rather this didn't
>   happen through gd_t or similar so that it is easier to unit test.
>   Additionally, LIL functions expect an array of lil_values instead of strings.
>   We could strip them out, but I worry that might start to impact performance
>   (from all the copying).

I might be missing something here.  But whenever we have C code run
around and generate a string to then pass to the interpreter, someone
asks why we don't just make API calls directly.  Given that, perhaps the
answer is that we don't need to?

> 
>   The other half of this is adding LIL features into regular commands. The most
>   important feature here is being able to return a string result. I took an
>   initial crack at it [3], but I think with this series there is a stronger
>   motivating factor (along with things like [4]).
> 
> [3] https://patchwork.ozlabs.org/project/uboot/list/?series=231377
> [4] https://patchwork.ozlabs.org/project/uboot/list/?series=251013
> 
> = Future work
> 
> The series as presented today is incomplete. The following are the major issues
> I see with it at the moment. I would like to address all of these issues, but
> some of them might be postponed until after first merging this series.
> 
> - There is a serious error handling problem. Most original LIL code never
>   checked errors. In almost every case, errors were silently ignored, even
>   malloc failures! While I have designed new code to handle errors properly,
>   there still remains a significant amount of original code which just ignores
>   errors. In particular, I would like to ensure that the following categories of
>   error conditions are handled:
> 
>   - Running out of memory.
>   - Access to a nonexistent variable.
>   - Passing the wrong number of arguments to a function.
>   - Interpreting a value as the wrong type (e.g. "foo" should have no numeric
>     representation, rather than being silently treated as 1).
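
As an illustration of the last category, a minimal sketch of a strict
string-to-integer conversion follows. It uses the hosted C strtol() for
brevity, which is an assumption for the sake of the example; the name
strict_to_long is hypothetical and is not part of LIL or this series.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Convert @str to an integer, rejecting strings that are not numeric.
 * Leading whitespace is accepted, as strtol() allows it.
 * Returns 0 and stores the value in @out, or -1 on error.
 */
int strict_to_long(const char *str, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(str, &end, 0);

	/* No digits consumed ("foo", "") or trailing junk ("12abc") */
	if (end == str || *end != '\0')
		return -1;
	/* Value does not fit in a long */
	if (errno == ERANGE)
		return -1;

	*out = val;
	return 0;
}
```

The caller gets an explicit failure for "foo" and can raise an error,
instead of quietly receiving some default value.
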
> 
> - There are many deviations from TCL with no purpose. For example, the list
>   indexing function is named "index" and not "lindex". It is perfectly fine to
>   drop features or change semantics to reduce code size, make parsing easier,
>   or make execution easier. But changing things for the sake of it should be
>   avoided.
> 
> - The test suite is rather anemic compared with the amount of code this
>   series introduces. I would like to expand it significantly. In particular,
>   error conditions are not well tested (only the "happy path" is tested).
> 
> - While I have documented all new functions I have written, there are many
>   existing functions which remain to be documented. In addition, there is no
>   user documentation, which is critical in driving adoption of any new
>   programming language. Some of this cover letter might be integrated with any
>   documentation written.
> 
> - Some shell features such as command repetition and secondary shell prompts
>   have not been implemented.
> 
> - Arguments to native lil functions are incompatible with U-Boot functions. For
>   example, the command
> 
> 	foo bar baz
> 
>   would be passed to a U-Boot command as
> 
> 	{ "foo", "bar", "baz", NULL }
> 
>   but would be passed to a LIL function as
> 
> 	{ "bar", "baz" }
> 
>   This makes it more difficult to use the same function to parse several
>   different commands. At the moment this is solved by passing the command name
>   in lil->env->proc, but I would like to switch to the U-Boot argument list
>   style.
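
For illustration, the difference between the two conventions can be
bridged with a small adapter. This is only a sketch under assumptions:
lil_args_to_argv is a hypothetical name, and plain strings stand in for
lil_values.

```c
#include <stddef.h>

/*
 * Build a U-Boot style argument vector from a LIL style one.
 * @name becomes element 0, the @nargs entries of @args follow, and the
 * vector is terminated with NULL. @out must hold at least @nargs + 2
 * pointers. Returns the resulting argc, or -1 if @out is too small.
 */
int lil_args_to_argv(const char *name, const char *const *args, size_t nargs,
		     const char **out, size_t out_len)
{
	size_t i;

	/* Need room for the name, every argument, and the trailing NULL */
	if (out_len < nargs + 2)
		return -1;

	out[0] = name;
	for (i = 0; i < nargs; i++)
		out[i + 1] = args[i];
	out[nargs + 1] = NULL;

	return (int)(nargs + 1);
}
```

Given "foo" and { "bar", "baz" }, this fills the output with
{ "foo", "bar", "baz", NULL } and returns 3. Only pointers are copied,
not the strings themselves.
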
> 
> - Several existing tests break when using LIL because they expect no output on
>   failure, but LIL produces some output notifying the user of the failure.
> 
> - Implement DISTRO_BOOT in LIL. I think this is an important proof-of-concept to
>   show what can be done with LIL, and to determine which features should be
>   moved to LIL_FULL.
> 
> = Why Lil?
> 
> When looking for a suitable replacement shell, I evaluated implementations using
> the following criteria:
> 
> - It must have a GPLv2-compatible license.
> - It must be written in C, and have no major external dependencies.
> - It must support bare function calls. That is, a script such as 'foo bar'
>   should invoke the function 'foo' with the argument 'bar'. This preserves the
>   shell-like syntax we expect.
> - It must be small. The eventual target is that it compiles to around 10KiB with
>   -Os and -ffunction-sections.
> - There should be good tests. Any tests at all are good, but a functioning suite
>   is better.
> - There should be good documentation.
> - There should be comments in the source.
> - It should be "finished" or have only slow development. This will hopefully
>   make it easier to port changes.

On this last point, I believe this is based on lil20190821, and the
current release is now lil20210502.  From a quick diff between them, the
upstream changes are small enough that, even though you've introduced a
number of changes here, it would be a very easy update.

> Notably absent from the above list is performance. Most scripts in U-Boot will
> be run once on boot. As long as the time spent evaluating scripts is kept under
> a reasonable threshold (a fraction of the time spent initializing hardware or
> reading data from persistent storage), there is no need to optimize for speed.
> 
> In addition, I did not consider updating Hush from Busybox. The mismatch in
> computing environment expectations (as noted in the "A new shell" section above)
> still applies. IMO, this mismatch is the biggest reason that things like
> functions and command substitution have been excluded from U-Boot's Hush.
> 
> == lil
> 
> - zLib
> - TCL
> - Compiles to around 10k with no builtins. To 25k with builtins.
> - Some tests, but not organized into a suite with expected output. Some evidence
>   that the author ran AFL, but no harness.
> - Some architectural documentation. Some for each function, but not much.
> - No comments :l
> - 3.5k LoC
> 
> == picol
> 
> - 2-clause BSD
> - TCL
> - Compiles to around 25k with no builtins. To 80k with builtins.
> - Tests with suite (in-language). No evidence of fuzzing.
> - No documentation :l
> - No comments :l
> - 5k LoC
> 
> == jimtcl
> 
> - 2-clause BSD
> - TCL
> - Compiles to around 95k with no builtins. To 140k with builtins. Too big...
> 
> == boron
> 
> - LGPLv3+ (so this is right out)
> - REBOL
> - Compiles to around 125k with no builtins. To 190k with builtins. Too big...
> 
> == libmawk
> 
> - GPLv2
> - Awk
> - Compiles to around 225k. Too big...
> 
> == libfawk
> 
> - 3-clause BSD
> - Uses bison+yacc...
> - Awk; As it turns out, this has parentheses for function calls.
> - Compiles to around 24-30k. Not sure how to remove builtins.
> - Test suite (in-language). No fuzzing.
> - Tutorial book. No function reference.
> - No comments
> - Around 2-4k LoC
> 
> == MicroPython
> 
> - MIT
> - Python (but included for completeness)
> - Compiles to around 300k. Too big...
> 
> == mruby/c
> 
> - 3-clause BSD
> - Ruby
> - Compiles to around 85k without builtins and 120k with. Too big...
> 
> == eLua
> 
> - MIT
> - Lua
> - Build system is a royal pain (custom and written in Lua with external deps)
> - Base binary is around 250KiB and I don't want to deal with reducing it
> 
> So the interesting/viable ones are
> - lil
> - picol
> - libfawk (maybe)
> 
> I started with LIL because it was the smallest. I have found several
> issues with LIL along the way. Some of these are addressed in this series
> already, while others remain unaddressed (see the section "Future Work").

Thanks for the evaluations.  Of these, lil does make the most sense.

-- 
Tom