[U-Boot] [PATCH v3 1/6] spl: add Kconfig option to clear bss early

Simon Goldschmidt simon.k.r.goldschmidt at gmail.com
Wed May 22 08:05:28 UTC 2019


On Wed, May 22, 2019 at 2:53 AM Simon Glass <sjg at chromium.org> wrote:
>
> Hi Andreas,
>
> On Tue, 21 May 2019 at 15:01, Andreas Dannenberg <dannenberg at ti.com> wrote:
> >
> > Hi Simon (Glass),
> >
> > On Sat, May 18, 2019 at 10:08:19AM -0600, Simon Glass wrote:
> > > Hi Andreas,
> > >
> > > On Mon, 6 May 2019 at 22:49, Andreas Dannenberg <dannenberg at ti.com> wrote:
> > > >
> > > > Hi Simon,
> > > >
> > > > On Mon, May 06, 2019 at 09:51:56PM -0600, Simon Glass wrote:
> > > > > Hi Andreas,
> > > > >
> > > > > On Fri, 3 May 2019 at 14:25, Andreas Dannenberg <dannenberg at ti.com> wrote:
> > > > > >
> > > > > > Simon,
> > > > > >
> > > > > > On Sat, Mar 30, 2019 at 09:18:08PM +0100, Simon Goldschmidt wrote:
> > > > > > > Simon Glass <sjg at chromium.org> schrieb am Sa., 30. März 2019, 21:06:
> > > > > > >
> > > > > > > > Hi Simon,
> > > > > > > >
> > > > > > > > On Wed, 27 Mar 2019 at 13:40, Simon Goldschmidt
> > > > > > > > <simon.k.r.goldschmidt at gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > This introduces a new Kconfig option SPL_CLEAR_BSS_F. If enabled, it
> > > > > > > > clears
> > > > > > > > > the bss before calling board_init_f() instead of clearing it before
> > > > > > > > calling
> > > > > > > > > board_init_r().
> > > > > > > > >
> > > > > > > > > This also ensures that variables placed in BSS can be shared between
> > > > > > > > > board_init_f() and board_init_r() in SPL.
> > > > > > > > >
> > > > > > > > > Such global variables are used, for example, when loading things from FAT
> > > > > > > > > before SDRAM is available: the full heap required for FAT uses global
> > > > > > > > > variables and clearing BSS after board_init_f() would reset the heap
> > > > > > > > state.
> > > > > > > > > An example for such a usage is socfpa_arria10 where an FPGA configuration
> > > > > > > > > is required before SDRAM can be used.
> > > > > > > > >
> > > > > > > > > Make the new option depend on ARM for now until more implementations
> > > > > > > > follow.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I still have objections to this series and I think we should discuss
> > > > > > > > other ways of solving this problem.
> > > > > > > >
> > > > > > > > Does socfgpa have SRAM that could be used before SDRAM is available?
> > > > > > > > If so, can we not use that for the configuration? What various are
> > > > > > > > actually in BSS that are needed before board_init_r() is called? Can
> > > > > > > > they not be in a struct created from malloc()?
> > > > > > > >
> > > > > > >
> > > > > > > The problem is the board needs to load an FPGA configuration from FAT
> > > > > > > before SDRAM is available. Yes, this is loaded into SRAM of course, but the
> > > > > > > whole code until that is done uses so many malloc/free iterations that The
> > > > > > > simple mall of implementation would require too much memory.
> > > > > > >
> > > > > > > And it's the full malloc state variables only that use BSS, not the FAT
> > > > > > > code.
> > > > > >
> > > > > > I've actually faced very similar issues working on our TI AM654x "System
> > > > > > Firmware Loader" implementation (will post upstream soon), where I need
> > > > > > to load this firmware and other files from media such as MMC/FAT in a very
> > > > > > memory-constrained SPL pre-relocation environment *before* I can bring up
> > > > > > DDR.
> > > > > >
> > > > > > Initially, I modified the fat.c driver to re-use memory so it is not as
> > > > > > wasteful during SYS_MALLOC_SIMPLE. While I'm not proud of this solution [1]
> > > > > > this allowed us to get going, allowing to load multiple files without
> > > > > > issues in pre-relocation SPL.
> > > > >
> > > > > That seems to point the way to a useful solution I think. We could
> > > > > have a struct containing allocated pointers which is private to FAT,
> > > > > and just allocate them the first time.
> > > >
> > > > The board_init_f()-based loader solution we use extends beyond MMC/FAT,
> > > > but also for OSPI, X/Y-Modem, and (later) USB, network, etc.
> > > >
> > > > Background:
> > > > On our "TI K3" devices we need to do a whole bunch of stuff before
> > > > DDR is up with limited memory, namely loading and installing a firmware
> > > > that controls the entire SoC called "System Firmware". It is only after
> > > > this FW is loaded from boot media and successfully started that I can
> > > > bring up DDR. So all this is done in SPL board_init_f(), which as the
> > > > last step brings up DDR.
> > > >
> > > > Not having BSS available to carry over certain state to the
> > > > board_init_r() world would lead to a bunch of hacky changes across
> > > > the board I'm afraid, more below.
> > >
> > > This is really unfortunate.
> > >
> > > It seems to me that we have two choises:
> > >
> > > 1. Hack around with board_init_f() such as to remove the distinction
> > > between this and board_init_r().
> > >
> > > 2. Enter board_init_r() without DRAM ready, and deal with setting it up there.
> > >
> > > I feel that the second solution is worth exploring. We could have some
> > > board-specific init in board_init_r(). We already have
> > > spl_board_init() so perhaps we could have spl_early_board_init()
> > > called right near the top?
> > >
> > > We can refactor a few of the functions in spl/spl.c so they can be
> > > called from board-specific code if necessary. We could also add new
> > > flags to global_data to control the behaviour of the SPL code, and the
> > > board code could set these.
> >
> > Let me explore this option. I can probably make something work but I
> > don't yet see how to do it cleanly, maybe it becomes clearer once I put
> > some code down. Currently by definition board_init_r() has DDR
> > available, and much of the code is geared towards it (for example the
> > calling of spl_relocate_stack_gd() before entering board_init_r() which
> > will already switch over to DDR).
> >
> > Also, and not to discourage that we can't improve upon the status quo,
> > but there is already a ton of boards using such an "early BSS" scheme...
> >
> > $ git grep --show-function 'memset.*bss' | grep board_init_f | wc -l
> > 35
>
> Yes I know :-(
>
> We should migrate these boards to use the generic SPL framework.

socfpga_gen5 is one of the architectures listed here. I'm not even sure
whether that's actually needed. However, it's hard to test, isn't it? How
do you actually tell BSS isn't used before entering board_init_r?

To be sure, we'd need to initialize unused memory to some magic
constant and check that it has been left untouched later (on boards
where BSS is available in board_init_r and remains in place when
moving on).

Regards,
Simon

>
> >
> > > > > I wonder if that would be enough for
> > > > >
> > > > > >
> > > > > > In the quest of creating something more upstream-friendly I had then
> > > > > > switched to using full malloc in pre-relocation SPL so that I didn't
> > > > > > have to hack the FAT driver, encountering similar issues like you
> > > > > > brought up and got this working, but ultimately abandoned this
> > > > > > approach after bundling all files needed to get loaded into a single
> > > > > > image tree blob which no longer required any of those solutions.
> > > > > >
> > > > > > What remained till today however is a need to preserve specific BSS
> > > > > > state from pre-relocation SPL over to post-relocation SPL environment,
> > > > > > namely flags set to avoid the (expensive) re-probing of peripheral
> > > > > > drivers by the SPL loader. For that I introduced a Kconfig option that
> > > > > > allows skipping the automatic clearing of BSS during relocation [2].
> > > > > >
> > > > > > Seeing this very related discussion here got me thinking about how else
> > > > > > I can carry over this "state" from pre- to post relocation but that's
> > > > > > probably a discussion to be had once I post my "System Firmware Loader
> > > > > > Series", probably next week.
> > > > >
> > > > > Since this is SPL I don't you mean 'relocation' here. I think you mean
> > > > > board_init_f() to board_init_r()?
> > > >
> > > > Yes that's what I mean. AFAIK relocation in SPL is still called relocation
> > > > from what I have seen working on U-boot, it just relocates gd and stack
> > > > but not the actual code (personally I find it misleading calling what SPL
> > > > does "relocation", but I got used to it).
> > > >
> > > > > You can use global_data to store state,
> > > >
> > > > I thought the idea was to stay away from gd, so we can eventually get
> > > > rid of it altogether?
> > >
> > > Not that I know of. It is how we communicate state before we have BSS.
> >
> > Oh ok I see now, I guess I have taken the comment [1] from
> > arch/arm/lib/spl.c out of context.
> >
> > [1] https://github.com/u-boot/u-boot/blob/master/arch/arm/lib/spl.c#L22
>
> Ah yes. That is referring to putting global_data in the data section.
> Perhaps we can delete that code now and see what breaks?
>
> >
> > > It is how we communicate state before we have BSS.
> >
> > Understood.
> >
> > > > > or malloc() to allocate memory and put things there.
> > > >
> > > > The challenge with potentially having to incorporate such a custom
> > > > solution for state preservation into several drivers as explained
> > > > earlier (SPI, USB, network, etc.) is that it does not appear to scale
> > > > well. I think using BSS instead would make all those additions cleaner
> > > > and simpler.
> > > >
> > > > > But using BSS seems wrong to me.
> > > >
> > > > I've seen you saying this a few times :)
> > > > Why?
> > >
> > > Driver model does its own allocate of memory and this is all attached
> > > to the DM structures.
> > >
> > > Drivers themselves cannot assume they are the only instance running,
> > > so data should be attached to their private-data pointers. Similarly
> > > for uclasses, if we put everything in the uclass-private data, then we
> > > don't need BSS and don't have any problems dealing with whether it is
> > > available yet. In general, BSS creates a lot of problems early in
> > > U-Boot's execution, and we don't actually need to use it.
> >
> > Yes, understood. The DM concept with private-data pointers is quite
> > clean from an OOP point of view.
> >
> > > If you look at the DM design you'll see that we try to avoid malloc()
> > > and BSS as much as possible. I suppose this series is another example
> > > of why :-)
> > >
> > > >
> > > > > If you are doing something in board_init_f() in SPL that needs BSS,
> > > > > can you not just move that code to board_init_r()?
> > > >
> > > > I need to access media drivers in board_init_f(), for which currently
> > > > I'm using BSS so I can preserve some limited state, such as that the
> > > > peripheral init was done, so that it doesn't get re-done by the actual
> > > > SPL loader later. board_init_r() requires DDR to be available which I
> > > > can't use without doing all that work in board_init_f() first to
> > > > load/start the system controller firmware, so it's a bit of a chicken
> > > > and egg issue here.
> > >
> > > Let's try moving the egg into board_init_r()  and putting the chicken
> > > after it, as mentioned above.
> >
> > Well I'll give this a shot.
> >
> > > >
> > > > My system firmware loader patch series is about ready and I was planning
> > > > on posting it tomorrow. How about with the entire approach being in the
> > > > open we use this as an opportunity to re-look at potential alternative
> > > > solutions...
> > >
> > > Sure. I hope I've explained my POV above.
> >
> > Yes thanks for the review and comments.
>
> You're welcome, and good luck with it.
>
> Regards,
> Simon


More information about the U-Boot mailing list