[PATCH u-boot 0/3] Revert broken Bootmenu commits

Pali Rohár pali at kernel.org
Mon Jul 10 16:10:39 CEST 2023


On Sunday 25 June 2023 21:08:19 Tom Rini wrote:
> On Sun, Jun 25, 2023 at 05:15:34PM +0200, Pali Rohár wrote:
> > On Sunday 25 June 2023 10:52:13 Tom Rini wrote:
> > > On Sun, Jun 25, 2023 at 09:50:39AM +0200, Pali Rohár wrote:
> > > > On Saturday 24 June 2023 12:58:07 Tom Rini wrote:
> > > > > On Sat, Jun 24, 2023 at 10:50:41AM +0200, Pali Rohár wrote:
> > > > > > On Tuesday 20 June 2023 11:20:35 Simon Glass wrote:
> > > > > > > Hi Tom, Pali,
> > > > > > > 
> > > > > > > On Wed, 14 Jun 2023 at 20:51, Tom Rini <trini at konsulko.com> wrote:
> > > > > > > >
> > > > > > > > On Sun, Jun 11, 2023 at 02:53:21PM +0200, Pali Rohár wrote:
> > > > > > > >
> > > > > > > > > These 3 commits broke support for U-Boot Bootmenu on Nokia N900.
> > > > > > > > > Issues were reported more than month ago, but nobody reacted to them.
> > > > > > > > > So revert these broken commits for now, to fix U-Boot Bootmenu support.
> > > > > > > > >
> > > > > > > > > With these revered commits, U-Boot Bootmenu from master branch is
> > > > > > > > > working fine again on Nokia N900.
> > > > > > > > >
> > > > > > > > > Pali Rohár (3):
> > > > > > > > >   Revert "video: Enable VIDEO_ANSI by default only with EFI"
> > > > > > > > >   Revert "menu: Factor out menu-keypress decoding"
> > > > > > > > >   Revert "menu: Make use of CLI character processing"
> > > > > > > > >
> > > > > > > > >  cmd/bootmenu.c        |   9 ++--
> > > > > > > > >  cmd/eficonfig.c       |  12 ++---
> > > > > > > > >  common/cli_getch.c    |  12 ++---
> > > > > > > > >  common/menu.c         | 122 ++++++++++++++++++++++++++----------------
> > > > > > > > >  drivers/video/Kconfig |   7 +--
> > > > > > > > >  include/cli.h         |   4 +-
> > > > > > > > >  include/menu.h        |  17 +-----
> > > > > > > > >  7 files changed, 91 insertions(+), 92 deletions(-)
> > > > > > > >
> > > > > > > > Following up over here, while
> > > > > > > > https://patchwork.ozlabs.org/project/uboot/patch/20230612201434.861700-1-sjg@chromium.org/
> > > > > > > > addresses some of the issues, but not all (as it clearly isn't working
> > > > > > > > in all of the cases Pali has explained), looking in to the VIDEO_ANSI
> > > > > > > > part of this too, all three of these reverts are related seemingly to
> > > > > > > > something escape-character related.  I'm not taking any of the revert
> > > > > > > > commits in just yet, but will by the next -rc if we don't have something
> > > > > > > > that fixes all of the issues.
> > > > > > > 
> > > > > > > I did send a series [1] with a fix for the nokia_rx51 keyboard driver,
> > > > > > > including the previous ansi-code patch. Given that:
> > > > > > > 
> > > > > > > - this problem doesn't seem to manifest on other boards
> > > > > > > - it does not cause any CI test to fail
> > > > > > > - there seem to be bugs in the nokia_rx51 implementation which are a
> > > > > > > possible/likely cause
> > > > > > > - the nokia_rx51 CI uses a 10-year-old out-of-tree QEMU
> > > > > > > - the problem goes away if debugging is added, suggesting it is
> > > > > > > related to timing
> > > > > > > 
> > > > > > > ...I don't think a revert is appropriate.
> > > > > > 
> > > > > > Unfortunately it does not fix this issue :-( New patch series from [1]
> > > > > > applied on top of the master branch has still issue with DOWN key on
> > > > > > emulated UART terminal.
> > > > > > 
> > > > > > > Pali, can you please take a look and see if you can debug what is
> > > > > > > actually going on? Here is my guess:
> > > > > > > 
> > > > > > > 1. When an arrow key is pressed, cli_ch_process() handles it by being
> > > > > > > passed the codes in sequence: \e [ B
> > > > > > > 2. Calling cli_ch_process() with ichar = 0 causes it to think the
> > > > > > > sequence is finished: \e [ \0 B   (this doesn't work since the \0 ends
> > > > > > > the sequence)
> > > > > > > 3. nokia_rx51 keyboard driver is returning '\0' even when key codes
> > > > > > > are pending, causing cli_ch_process() to be told that the sequence is
> > > > > > > done
> > > > > > 
> > > > > > I can look at it later, but I'm loosing motivation to do whole debugging
> > > > > > for another issue again, because for the last year my fixes and other
> > > > > > patches were stalled and u-boot devs show me that they are not
> > > > > > interested even in commenting them. I do not want to work again on
> > > > > > something which would be ignored.
> > > > > 
> > > > > Well, unless your changes here break something else, I don't think a fix
> > > > > for your problem will be ignored.
> > > > 
> > > > This is something which I read here more times in the past... and
> > > > reality was different.
> > > > 
> > > > Should I remind you that there are waiting eMMC fixes for mvebu and
> > > > again discussion about them stalled? Or rather should I say that it is
> > > > again ignored?
> > > 
> > > No, you should just say they're still waiting for review.  Since they're
> > > waiting for review. The MMC custodian has been asked to review them, and
> > > hasn't yet. And they don't appear to be super critical changes, and they
> > > conflict with other series, so yes, they'll get picked up when the
> > > custodian has time to review everything.
> > 
> > And what is "critical change" if it is not fixing booting (from eMMC)?
> 
> The when and where, and since when. Since re-reading everything in
> https://patchwork.ozlabs.org/project/uboot/list/?submitter=78810 that's
> not this revert series doesn't say "fix booting on $platform that was
> broken by ...".  It says clean up.  Clean up can wait.

"Fix eMMC boot" - This is not clear that it is a "fix"???

Or are you again going to play a game "I do not see your patches"?
Ok, then here is direct link:

https://lore.kernel.org/u-boot/20230413205750.10641-1-pali@kernel.org/

> > > > I have already spend time on this and have done everything needed to
> > > > make bootmenu working again. I have also prepared patches which are
> > > > fixing this problem and which were also tested.
> > > 
> > > Note that "I reverted the commit" is not a fix.
> > 
> > And what is the "change which makes feature X working again after it was
> > broken" if not a fix? Of course, it is _not_ fix because it is fixing
> > the problem.
> 
> The part where you're reverting changes that were made intentionally
> rather than fixing whatever problem they introduce or expose. There is a
> problem in here, and Simon's patch that I have since merged fixes that.
> There's seemingly something else further going on with N900 and only
> N900

It is not "only N900", see below.

> and you're saying the fix is to revert changes rather than explain
> what's wrong in any of the changes, still. The first change 'Revert
> "video: Enable VIDEO_ANSI by default only with EFI"' as a fix would be
> to add whatever conditional should also cause this to be "default y".
> And I tried doing that, but after reading the code, it doesn't make any
> sense why a few functions change on N900 when I do so. It's also the
> only platform where there's a change.
> 
> > > > And if you want something more from me then you show me why this time it
> > > > would be different, and again empty promises.
> > > 
> > > What I'd like is for you to not assume worst-faith.
> > 
> > Of course not. I'm just assuming that same thing happen as in the past.
> > From realistic point of view, this is the best approximation what to
> > assume.
> > 
> > Also you just few lines above wrote that fixing broken booting is not a
> > critical change at all, so you are just expressed that you are not going
> > to take fixes.
> 
> The only time your patches aren't accepted in a timely manner are when
> either (a) they get delayed because of custodians who don't have a lot
> of time to review an area or (b) you refuse to take feedback on a
> change. The feedback right now on this 3 part revert series is to take
> what Simon posted after trying things in QEMU and use that additional
> debug information to figure it out or post more detailed failure logs.
> 
> > > If you can fix the
> > > problem quickly at this point it'll get reviewed (assuming Simon has
> > > time, next week is EOSS) and merged if it's a safe enough looking fix.
> > > Otherwise it'll get in v2023.10.
> > 
> > And here you wrote that "change maybe land in v2023.10 version". But at
> > that time u-boot would move and again it would not land because change
> > from this time wound not applied on top of the master branch which would
> > be in the future.
> 
> Yes, if the change to fix exactly one platform is 10 lines, I'm going to
> be iffy about taking it a week before release.  If the change is 1 or 2
> obviously correct lines, I'll likely take the gamble. But since at this
> point there's exactly one known broken platform I'm not going to delay
> the release. There were more broken platforms when I said I'd take the 3
> part revert but Simon came along and fixed the problems everyone else
> reported and made a good faith attempt and fixing your platform too.
> 
> > I'm not newbie here, I see what is happening for more years with my
> > patches.
> > 
> > Or should I remind you what is the average time how long rx51 patches
> > have been waiting on the list until they were merged?
> 
> I mean, what's the average over the last year or two? Yes, they got
> delayed when I delegated them to the TI custodian who then had no real
> time to deal with U-Boot.
> 
> > I have already provided a fix for this problem. What happened? Nothing,
> > I was just ask to do another fix. Do you think, that I'm a total idiot,
> > who is going to prepare a new patch, which would be ignored or rejected
> > like the previous one, or other other work which I done?
> 
> Narrowing down a problem to a few commits and reverting them is a debug
> aide, unless the commits themselves are simply wholly broken and wrong.
> So no, you didn't fix the issue.  You narrowed it down.  I was willing
> to take that even if Simon was unable to figure out what was wrong as it
> was on multiple platforms, even.  So now you've been asked to dig more
> on your platform as it doesn't fail for anyone else anywhere else.

I have not found any issue on rx51 code, see below...

> > > > > > Hence, now I did only smaller - but still important - task: Locate exact
> > > > > > commits which broke particular feature and prepare reverts on top of the
> > > > > > master branch which _really_ fix the broken functionality - and make
> > > > > > code working again.
> > > > > 
> > > > > Unfortunately you're also the only person able to replicate the problem,
> > > > > and Simon has tried (and posted fixes to your platform for other issues,
> > > > > and debugging patches to hopefully making finding the problem easier).
> > > > 
> > > > Have somebody else even tried to reproduce this issue on any other HW
> > > > board or any other qemu board?
> > > 
> > > The summary of Simon's thread where he tried to fix this problem is that
> > > only your platform still has a problem here.
> > 
> > This is not answer to the question.
> 
> Simon.  Simon tested this on N900 in QEMU, found some problems, reported
> them, was unable to find further problems and pushed it back on you to
> review the patches for your platform to fix the problem he did see and
> was able to fix.
> 
> -- 
> Tom

I take some time, starting debugging this issue and...

1) I have not found any issue in n900 code. So if there is one, please
exactly show where and how.

2) This problem is possible to reproduce _without_ n900 board code,
without n900 keyboard driver.

So I'm asking again question:

> Have somebody else even tried to reproduce this issue on any other HW
> board or any other qemu board?

Because you have not answered to it, I will do it. Answer is NO.

Nobody tried to reproduce it, otherwise would be able to hit this issue.
And your arguments "I'm lazy in building n900 qemu version" is pointless
because this can be reproduce without n900 keyboard driver. Just on the
UART terminal.


Normally people who introduced regression should take care of them. And
if are not able to do it in time, then revert of the broken
functionality must be applied.

Nobody took any action here, I wrote some details where can be problems,
everybody completely ignored them; so I sent those reverts.


Now I have also look at those 3 problematic commits. First one is
obviously broken, I think it does not need more comments that what I
already wrote.

Second one is probably correct but it somehow interference with the
third one (and disallow to revert the third one). And the third one is
the problem. I have tried to trace what that original commit is doing,
but I have not understand. So I cannot debug this code.

So I'm again asking, revert these 3 commits, or show me proof of their
validity. I have already showed you they they are incorrect by simple
regression testing.


I'm really disappointed, that even after getting explicit information
about breakage, regression and identification what is wrong; no steps
were taken and you intentionally released broken u-boot for rx51.

Should I take it as a clear "we do not care about reporting
regressions" or "fuck you we do not need anymore, go way"?

Its pretty obvious now, that you do not want to do anything with it, so
I'm NOT going to debug & send patches.


More information about the U-Boot mailing list