[PATCH] Revert "spl: Drop bd_info in the data section"
    Tom Rini 
    trini at konsulko.com
       
    Mon Apr 12 20:18:18 CEST 2021
    
    
  
On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> Hi Tom, Alex,
> 
> On Tue, 13 Apr 2021 at 04:26, Tom Rini <trini at konsulko.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> > >
> > >
> > > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > > >
> > > > >
> > > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini at konsulko.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me at gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Simon
> > > > > > > > > >
> > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > > Hi Alexandru,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me at gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > > >
> > > > > > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > > > > > which very few boards do.
> > > > > > > > > > > >
> > > > > > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > > > > >
> > > > > > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > > > > >
> > > > > > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > > > > >
> > > > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > > >
> > > > > > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > > > > > last release, which is four days old t the time of this writing.
> > > > > > > >
> > > > > > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > > > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > > > > > this was just before the release, yes, revert would be the right answer.
> > > > > > > > It's also the case the original commit fixes some cases while also
> > > > > > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > > >
> > > > > > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > > > > > to revert.
> > > > > > > > > >
> > > > > > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > > > > >
> > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > > > > > will hold your platform hostage either.
> > > > > > > > >
> > > > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > > > > > can make falcon mode imply it.
> > > > > > > >
> > > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > > >
> > > > > > >
> > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > > >
> > > > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > > > boards that use SPL.
> > > > > > >
> > > > > > > I don't quite know what the best solution is... we now have a v2021.04
> > > > > > > that is broken for several or many boards and I dont' know if its
> > > > > > > clear what cases break.
> > > > > >
> > > > > > Looking forward, I think we need to rework this.  Simon, I gather you
> > > > > > have some platforms where we need to set gd->bd to something that we
> > > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > > > architectures / platforms could override as needed, but also move it
> > > > > > just past mem_malloc_init().
> > > > >
> > > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> > > > > Kconfig and no __weak__ required.
> > > >
> > > > Well, who needs something different than what we had before exactly?
> > >
> > > I'm not sure. From reading the commit message of the broken change, it seems
> > > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > > codename for a platform.
> >
> > Yes, coral is an x86 chromebook.  It also says:
> >     This uses up space in the SPL binary but it always starts as zero.
> >     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
> >
> > and defaults the new feature to enabled on all x86.  And there's
> > a comment about x86 in the function (that with the changes made isn't so
> > helpful/correct?) now too.  So yes, I want to see what Simon
> >
> > > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > > "every architecture must define..." is OK.  Or maybe it's really just "a
> > > > few unusual platforms need" and so a weak function makes sense.
> > >
> > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> > > weak functins, as I see them as the problems that "made sense at the time".
> >
> > I see weak functions as a good solution when the problem is something
> > where the majority of the (cross-platform) cases require the same
> > solution, but there's one or two oddities out there.  But regardless, we
> > need to have the problem better defined before we can propose a
> > solution.
> 
> So how about enabling the Kconfig for iMX? I am not sure how many
> platforms do actually need this feature?
That was an OK answer for the first group we hit this on.
> Another option might be to (just as a test) take the 'bd' out of
> global_data when the Kconfig is not defined, to find out which
> platforms use it?
And that might help us figure out the second grouping of platforms, yes.
> As to a weak function, what would the default behaviour be? If we can
> define that, then it would be better IMO.
> 
> When we try to refactor things, weak functions and undefined (or
> arch-specific behaviour) can really make things tough.
Well, what was the problem you were trying to solve here?  I assumed
there was a case where things actively broke, with the previous design,
and it's not just the 64byte savings in some cases.  But is it?
-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: not available
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20210412/d9e538d9/attachment.sig>
    
    
More information about the U-Boot
mailing list