sandbox trace errors in CI loop

Simon Glass sjg at chromium.org
Mon Sep 11 00:36:50 CEST 2023


Hi Michal,

On Fri, 8 Sept 2023 at 09:00, Michal Simek <monstr at monstr.eu> wrote:
>
> Hi Tom,
>
> pá 8. 9. 2023 v 15:42 odesílatel Michal Simek <monstr at monstr.eu> napsal:
> >
> >
> >
> > On 9/8/23 15:34, Tom Rini wrote:
> > > On Fri, Sep 08, 2023 at 12:24:10PM +0200, Michal Simek wrote:
> > >>
> > >>
> > >> On 9/7/23 20:14, Tom Rini wrote:
> > >>> On Thu, Sep 07, 2023 at 05:30:03PM +0200, Michal Simek wrote:
> > >>>> Hi Simon and Tom,
> > >>>>
> > >>>> I am preparing pull request and I see that CI loop is reporting issue with
> > >>>> sandbox trace and I have no idea what's going wrong here.
> > >>>>
> > >>>> This is the first problematic commit but have no clue what it has to do with trace.
> > >>>>
> > >>>> https://source.denx.de/u-boot/custodians/u-boot-microblaze/-/commit/8c5f80d11b29536979ac41a6087fb8938f45bbf2
> > >>>>
> > >>>> If you have an access you can take a look at my queue to see that only this
> > >>>> job is failing.
> > >>>> https://source.denx.de/u-boot/custodians/u-boot-microblaze/-/pipelines
> > >>>>
> > >>>> Thanks,
> > >>>> Michal
> > >>>>
> > >>>> collected 1152 items / 1151 deselected / 1 selected
> > >>>> test/py/tests/test_trace.py F                                            [100%]
> > >>>> =================================== FAILURES ===================================
> > >>>> __________________________________ test_trace __________________________________
> > >>>> test/py/tests/test_trace.py:292: in test_trace
> > >>>>       check_function(cons, fname, proftool, map_fname, trace_dat)
> > >>>> test/py/tests/test_trace.py:117: in check_function
> > >>>>       out = util.run_and_log(cons, ['sh', '-c', cmd])
> > >>>> test/py/u_boot_utils.py:181: in run_and_log
> > >>>>       output = runner.run(cmd, ignore_errors=ignore_errors, stdin=stdin, env=env)
> > >>>> test/py/multiplexed_log.py:183: in run
> > >>>>       raise exception
> > >>>> E   ValueError: Exit code: 1
> > >>>> ---------------------------- Captured stdout setup -----------------------------
> > >>>> /u-boot
> > >>>> trace: early enable at 00100000
> > >>>> sandbox_serial serial: pinctrl_select_state_full:
> > >>>> uclass_get_device_by_phandle_id: err=-19
> > >>>> =>
> > >>>
> > >>> I don't get it either since I see this on master with trace options
> > >>> enabled per CI:
> > >>> $ ./u-boot -T arch/sandbox/dts/test.dtb
> > >>> trace: early enable at 00100000
> > >>> sandbox_serial serial: pinctrl_select_state_full: uclass_get_device_by_phandle_id: err=-19
> > >>> ... rest of boot proceeds ...
> > >>>
> > >>> So the test should be failing already if that was the problem.
> > >>>
> > >>
> > >> I don't thin it is the problematic part.
> > >>
> > >> When I look at
> > >> https://source.denx.de/u-boot/u-boot/-/blob/master/tools/docker/Dockerfile?ref_type=heads#L225
> > >>
> > >> clone is done from
> > >> git clone https://github.com/rostedt/trace-cmd.git /tmp/trace/trace-cmd && \
> > >>
> > >> https://github.com/rostedt/trace-cmd/blob/master/README#L5
> > >>
> > >> we should be using
> > >> git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git instead.
> > >>
> > >> I installed the latest version on my PC and run it via pytest and this is
> > >> what I am getting.
> > >> You see that trace-cmd report is returning "failed to init data".
> > >>
> > >> Are we wired to any specific trace-cmd version?
> > >
> > > Wait, no, what's going on exactly? The github repository is a mirror of
> > > the kernel.org one. So yes, we should correct to point to the main one,
> > > but the top of tree in both cases is commit
> > > 354dccca52e805ce1b22e2b62cbae8c265886c27.
> > >
> >
> > I dig into it more.
> > What it is happening is that when sandbox u-boot is bigger trace.dat file is
> > placing information to different offset then it is recorded somewhere in header.
> >
> > Wrong placement
> >
> > 0005b000  20 70 65 72 66 20 6d 6f  6e 6f 20 6d 6f 6e 6f 5f  | perf mono mono_|
> > 0005b010  72 61 77 20 62 6f 6f 74  20 78 38 36 2d 74 73 63  |raw boot x86-tsc|
> > 0005b020  0a 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > 0005b030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > *
> > 0005c000  89 df f6 1d 00 00 00 00  dc 0f 00 00 00 00 00 00  |................|
> > 0005c010  06 00 00 00 01 00 00 00  01 00 00 00 af e5 0a 00  |................|
> > 0005c020  00 00 00 00 ef dc 09 00  00 00 00 00 06 00 00 00  |................|
> > 0005c030  01 00 00 00 01 00 00 00  af e5 0a 00 00 00 00 00  |................|
> >
> > Correct placement for trace-cmd
> >
> > 0005aee0  75 70 74 69 6d 65 20 70  65 72 66 20 6d 6f 6e 6f  |uptime perf mono|
> > 0005aef0  20 6d 6f 6e 6f 5f 72 61  77 20 62 6f 6f 74 20 78  | mono_raw boot x|
> > 0005af00  38 36 2d 74 73 63 0a 00  00 00 00 00 00 00 00 00  |86-tsc..........|
> > 0005af10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > *
> > 0005b000  36 f8 92 3f 00 00 00 00  e0 0f 00 00 00 00 00 00  |6..?............|
> >
> > And trace-cmd want to decode data available at 0x5b000 offset in both cases.
> >
> > It means I expect proftool is not correctly generating offset where data is
> > present. I don't understand how it works but I clearly see based on trace-cmd
> > that it points to incorrect location.
>
> diff --git a/tools/proftool.c b/tools/proftool.c
> index 101bcb63334e..815e01efb5d1 100644
> --- a/tools/proftool.c
> +++ b/tools/proftool.c
> @@ -1500,7 +1500,7 @@ static int write_flyrecord(struct twriter *tw,
> enum out_format_t out_format,
>         tw->ptr += fprintf(fout, "flyrecord%c", 0);
>
>         /* trace data */
> -       start = ALIGN(tw->ptr + 16, TRACE_PAGE_SIZE);
> +       start = ALIGN(tw->ptr + 16, 0x4000);
>         tw->ptr += tputq(fout, start);
>
>         /* use a placeholder for the size */
>
> This is the problematic code. In flyrecord section in header there is
> the calculation where data start.
> And there is tw->ptr +16 used (not sure why +16) but actual flyrecord
> is much bigger than 16.
>
> In my case start of flyrecord is 0x5afc1 + flyrecordX + 16 (mentioned
> above) is 0x5afdb.
> Then there is page alignment which gives 0x5b000 which is recorded but
> actual size is till 0x5b021.
>
> The actual size of the header is 0x60 not  flyrecordX + 16 (based on
> location where tw->ptr is taken).
> Simon: Do you agree with this logic?
>
> It was introduced by commit be16fc81b2ed ("trace: Update proftool to
> use new binary format").

If you can fix it then yes that is fine and please send a patch. The
test is there to catch things going wrong. The intent of proftool is
to mimic the flyrecord format used by Linux. Your analysis seems right
to me.

Regards,
Simon


More information about the U-Boot mailing list