[U-Boot] [PATCH] Revert "fdt: Fix fdtdec_get_addr_size() for 64-bit"

Fri Aug 14 10:32:07 CEST 2015

On Fri, Aug 14, 2015 at 04:10:32PM +0800, Bin Meng wrote:
> Hi,
> 
> On Sun, Aug 9, 2015 at 11:08 PM, Simon Glass <sjg at chromium.org> wrote:
> > Hi Stephen,
> >
> > On 6 August 2015 at 13:03, Stephen Warren <swarren at wwwdotorg.org> wrote:
> >> On 08/05/2015 05:45 PM, Simon Glass wrote:
> >>>
> >>> Hi Stephen,
> >>>
> >>> On 5 August 2015 at 12:22, Stephen Warren <swarren at wwwdotorg.org> wrote:
> >>>>
> >>>> On 08/04/2015 10:08 PM, Simon Glass wrote:
> >>>>>
> >>>>>
> >>>>> Hi Stephen,
> >>>>>
> >>>>> On 3 August 2015 at 12:20, Stephen Warren <swarren at wwwdotorg.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 08/03/2015 09:52 AM, Simon Glass wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi Stephen,
> >>>>>>>
> >>>>>>> On 3 August 2015 at 09:12, Stephen Warren <swarren at wwwdotorg.org>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 08/02/2015 06:13 PM, Simon Glass wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This reverts commit 5b34436035fc862b5e8d0d2c3eab74ba36f1a7f4.
> >>>>>>>>>
> >>>>>>>>> This function has a few problems. It calls fdt_parent_offset() which
> >>>>>>>>> as
> >>>>>>>>> mentioned in code review is very slow.
> >>>>>>>>>
> >>>>>>>>> https://patchwork.ozlabs.org/patch/499482/
> >>>>>>>>> https://patchwork.ozlabs.org/patch/452604/
> >>>>>>>>>
> >>>>>>>>> It also happens to break SPI flash on Minnowboard max which is how I
> >>>>>>>>> noticed
> >>>>>>>>> that this was applied. I can send a patch to tidy that up, but in
> >>>>>>>>> any
> >>>>>>>>> case
> >>>>>>>>> I think we should consider a revert until the function is better
> >>>>>>>>> implemented.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The fact that the function needs to perform a slow operation is not a
> >>>>>>>> good
> >>>>>>>> reason for a revert. The slowness of the operation is just a matter
> >>>>>>>> of
> >>>>>>>> fact
> >>>>>>>> with DT not having uplinks in its data structure, and U-Boot using
> >>>>>>>> those
> >>>>>>>> data structures directly.
> >>>>>>>>
> >>>>>>>> You'd requested during review that I additionally implement a faster
> >>>>>>>> version
> >>>>>>>> of the function in the case where the parent node is already known,
> >>>>>>>> and
> >>>>>>>> said
> >>>>>>>> it was fine if that happened in a later patch. I have this on my TODO
> >>>>>>>> list,
> >>>>>>>> but it's only been a couple of days.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> I didn't expect this to go to mainline before your new patch.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I didn't get that message from the thread; you wrote:
> >>>>>>
> >>>>>> Simon Glass wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Stephen Warren wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Simon Glass wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Also, how about (in addition) a
> >>>>>>>>> version of this function that works for devices? Like:
> >>>>>>>>>
> >>>>>>>>> device_get_addr_size(struct udevice *dev, ...)
> >>>>>>>>>
> >>>>>>>>> so that it can handle this for you.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> That sounds like a separate patch?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Yes, but I think we should get it in there so that people don't start
> >>>>>>> using this (wildly inefficient) function when they don't need to. I
> >>>>>>> think by passing in the parent node we force people to think about the
> >>>>>>> cost.
> >>>>>>>
> >>>>>>> Yes the driver model function can be a separate patch, but let's get
> >>>>>>> it in at about the same time. We have dev_get_addr() so could have
> >>>>>>> dev_get_addr_size().
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> That sounds to like you'd like a followup patch soon after this patch
> >>>>>> (my
> >>>>>> assumption would be within a few days or weeks) to solve your comments,
> >>>>>> not
> >>>>>> that you wanted a replacement patch.
> >>>>>
> >>>>>
> >>>>>
> >>>>> I will take that feedback to be a little more forceful in future, sorry.
> >>>>>
> >>>>>>
> >>>>>>>> Why not just fix the bug since you said you could? That seems far
> >>>>>>>> better
> >>>>>>>> than breaking the newly added Tegra210 support.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> I do have a patch but I'm not 100% comfortable with the approach. I
> >>>>>>> don't see a good reason to move away from the idea of fdt_addr_t and
> >>>>>>> fdt_addr_t being set correctly for the platform. Or maybe I
> >>>>>>> misunderstand the problem the patch was trying to fix. As I said it
> >>>>>>> did not have a commit message, so who knows :-)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> The size of fdt_addr_t isn't relevant *at all* when parsing DT. The
> >>>>>> only
> >>>>>> thing that matters is #address-cells/#size-cells. Those properties are
> >>>>>> what
> >>>>>> tell the parsing code how many bytes to read from the reg property.
> >>>>>> Whether
> >>>>>> the resultant value fits into the code's internal representation is an
> >>>>>> internal detail of the code, not part of the semantics of DT itself or
> >>>>>> how
> >>>>>> to parse it.
> >>>>>>
> >>>>>> If code assumes that #address-cells == sizeof(fdt_addr_t), it is indeed
> >>>>>> quite likely that everything will just happen to work most of the time.
> >>>>>> However, this is purely an accident and not something that anything
> >>>>>> should
> >>>>>> rely upon.
> >>>>>>
> >>>>>> (I think Tegra210 support still has CONFIG_PHYS_64BIT undefined which
> >>>>>> is
> >>>>>> admittedly a little /unexpected/ for a 64-bit U-Boot build, but in
> >>>>>> practice
> >>>>>> is perfectly /legal/ and will work out just fine since, except perhaps
> >>>>>> for
> >>>>>> RAM sizes, I don't believe any value in DT will actually require more
> >>>>>> than
> >>>>>> 32-bits to represent)
> >>>>>
> >>>>>
> >>>>>
> >>>>> I would like to have the types match the platform where possible. At
> >>>>> least the types should not be smaller than the platform - e.g. if
> >>>>> CONFIG_PHYS_64BIT is not defined we should not support 64-bit
> >>>>
> >>>>
> >>>>
> >>>> In general, there's no "should not" here; we "cannot". If there's a
> >>>> 64-bit
> >>>> value in the DT (with bits above bit 31 set), then it can't be stored in
> >>>> a
> >>>> 32-bit variable.
> >>>>
> >>>> That said, if a DT has #address-cells=<2>, but the actual values stored
> >>>> in
> >>>> all reg values have 0 for the MS word, that'll actually work just fine.
> >>>> Note
> >>>> that it's 100% legal to set #address-cells=<100> and just write a bunch
> >>>> of
> >>>> extra zeros into the property. Silly perhaps, but perfectly legal. Since
> >>>> the
> >>>> function should adapt to whatever #address-cells value is in the DT,
> >>>> supporting that case isn't any more work, so there's no reason we
> >>>> shouldn't.
> >>>>
> >>>>> address/size in the device tree. This is for efficiency. We don't want
> >>>>> to force all the U-Boot code to 64-bit suddenly. This will bloat
> >>>>> things for no benefit.
> >>>>
> >>>>
> >>>>
> >>>> We could and likely should set CONFIG_PHYS_64BIT for Tegra210. However,
> >>>> that's unrelated to using the correct algorithm to parse DT.
> >>>>
> >>>>>>>> P.S. What is the issue with SPI flash? The commit description doesn't
> >>>>>>>> mention this at all.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> It calls that function expecting it to pick up an address and size
> >>>>>>> from two consecutive cells. With this patch, that fails (unless the
> >>>>>>> property happens to be "reg").
> >>>>
> >>>>
> >>>> ...
> >>>>>>
> >>>>>>
> >>>>>> I think this is all stemming from drivers/mtd/spi/sf_probe.c
> >>>>>> spi_flash_decode_fdt()'s call to fdtdec_get_addr_size() for property
> >>>>>> "memory-map"? If so, then looking at arch/x86/dts/minnowmax.dts's /spi
> >>>>>> node,
> >>>>>> the code and DT content are clearly inconsistent; For this node
> >>>>>> #address-cells=<1>, #size-cells=<0> which makes sense given that the
> >>>>>> address
> >>>>>> is a chip-select and hence has no size. So, the code should not assume
> >>>>>> that
> >>>>>> the memory-map property can be parsed in the same way as a reg
> >>>>>> property.
> >>>>>>
> >>>>>> I note that the memory-map property doesn't exist in the Linux kernel's
> >>>>>> DT
> >>>>>> binding documentation database, or code, hence hasn't been reviewed by
> >>>>>> the
> >>>>>> DT binding maintainers.
> >>>>>
> >>>>>
> >>>>>
> >>>>> The function comment says:
> >>>>>
> >>>>>    * Look up an address property in a node and return it as an address.
> >>>>>    * The property must hold one address with a length. This is only
> >>>>> tested
> >>>>>    * on 32-bit machines.
> >>>>
> >>>>
> >>>>
> >>>> And Thierry fixed it for systems with #address-cells > 1. Perhaps that
> >>>> part
> >>>> of the function comment should have been removed in the commit.
> >>>>
> >>>>> My intention was that this would be an efficient way to decode an
> >>>>> address and size from a device tree. To some extent regmap may take
> >>>>> over this role (IMO we should turn to drop fdtdec one day one we have
> >>>>> more infrastructure). But I'd like it to work efficiently for 32-bit
> >>>>> machines. The new function could hardly be less efficient.
> >>>>
> >>>>
> >>>>
> >>>> Again, there's no way in general to make it more efficient. The
> >>>> efficiency
> >>>> issue is directly implied by the DT data structures.
> >>>>
> >>>> In the special case where the parent node is already known, we can
> >>>> certainly
> >>>> introduce an alternate function that is more efficient. You've already
> >>>> asked
> >>>> for that, and as I said, it's in my TODO list.
> >>>>
> >>>>> I think we should consider the case where the physical address size of
> >>>>> U-Boot and the device tree do not match as a corner case. I certainly
> >>>>> don't want device tree to add loads of pointless code for 'normal'
> >>>>> platforms.
> >>>>
> >>>>
> >>>>
> >>>> It's not a corner case. It's a fundamental part of the DT schema. If
> >>>> U-Boot
> >>>> is going to use DT, it should actually use *DT*, not some-very
> >>>> similar-but-not-quite-DT format with all kinds of implicit and unstated
> >>>> exceptions, limitations, and assumptions. This implies fully honoring all
> >>>> aspects of how DT works when parsing it, not picking and choosing
> >>>> features
> >>>> because some are inconvenient or annoying. If U-Boot doesn't want to
> >>>> correctly implement DT support, it should just drop it completely.
> >>>>
> >>>> As an aside, when instantiating devices, I hope one day that U-Boot will
> >>>> parse DT top-down in a hierarchical way, rather than simply searching for
> >>>> any node anywhere in the DT without regard for whether any parent node is
> >>>> enabled, has a driver, has had the driver initialize, etc. U-Boot ignores
> >>>> this right now, and is only getting away with this accidentally. Without
> >>>> hacky workarounds in drivers, this won't continue to work for all Tegra
> >>>> HW
> >>>> (e.g. host1x graphics/display sub-modules, AHUB's audio-related
> >>>> sub-modules), since parent drivers must initialize before child drivers
> >>>> in
> >>>> order to enable various register buses, including clocks/resets affecting
> >>>> those buses etc.
> >>>>
> >>>> Again, if it's simply too inconvenient or bloated to implement DT
> >>>> properly
> >>>> in U-Boot, let's just drop it entirely. A halfway solution is the worst
> >>>> of
> >>>> both worlds. I'm not convinced the full implications of how to (and the
> >>>> need
> >>>> to) correctly and fully support DT have were well thought through before
> >>>> (control) DT support was added to U-Boot.
> >>>>
> >>>>> Re memory-map, yes it doesn't seem to be possible to do what it is
> >>>>> trying to do (and Thierry says the same below). It is quite weird to
> >>>>> have a SPI peripheral which is also memory mapped.
> >>>>>
> >>>>> Here's my question - if you fix the CONFIG_PHYS_64BIT problem with
> >>>>> 64-bit Tegra, what is actually wrong with the way the function was?
> >>>>
> >>>>
> >>>>
> >>>> With the DT files now checked into U-Boot, I think it would accidentally
> >>>> work, since we just happen to have set #address-cells=<2>,
> >>>> #size-cells=<2>,
> >>>> and that would just happen to match sizeof(fdt_addr_t).
> >>>>
> >>>> However note this is an accident on a couple of levels:
> >>>>
> >>>> a) This is because the code assumes that sizeof(fdt_addr_t) ==
> >>>> #address-cells * 4. This is an invalid assumption since it does not
> >>>> correctly honor the DT schema. It hard-codes the size of values whereas
> >>>> DT
> >>>> schema says the size is defined by the #xxx-cells properties.
> >>>>
> >>>> b) The original Tegra210 DTs that TomW posted had #address-cells=<1>,
> >>>> #size-cells=<1>. I asked him to change that to match what I expected to
> >>>> be
> >>>> in the Linux kernel's Tegra210 DTs. However, if he'd rejected my request
> >>>> or
> >>>> I hadn't noticed that, then with CONFIG_PHYS_64BIT set,
> >>>> fdtdec_get_addr_size() would have attempted to read twice as many bytes
> >>>> as
> >>>> it should have from the property. It's entirely plausible that someone
> >>>> could
> >>>> have come along later and realized CONFIG_PHYS_64BIT was set incorrectly
> >>>> and
> >>>> enabled it.
> >>>>
> >>>>> This is a boot loader so we should be willing to make some
> >>>>> simplifications.
> >>>>
> >>>>
> >>>>
> >>>> When dealing with internal bootloader details, sure assumptions,
> >>>> simplifications, etc. can be made.
> >>>>
> >>>> However, DT is an externally defined standard. The content of DT must be
> >>>> identical across all OSs (SW stacks, bootloader) and not influenced by
> >>>> requirements/... of any specific individual OS's (SW stack, bootloader)
> >>>> quirks. We can't just pick and choose which parts of it we care about.
> >>>> Well,
> >>>> perhaps if we stop calling it DT we could.
> >>>
> >>>
> >>> So I think in summary:
> >>>
> >>> - 64-bit machines should have CONFIG_PHYS_64BIT set correctly
> >>
> >>
> >> It turns out that arch/arm/include/asm/config.h already enables this for all
> >> ARM64 platforms.
> >>
> >> As such, we can in fact go ahead with reverting this patch, and U-Boot will
> >> still function on Tegra210 boards.
> >>
> >> In the short term, I think that means TomR should just apply this revert
> >> patch, and we don't need to send any additional patches. In the slightly
> >> longer term, we should add some comments to fdtdec_get_addr_size()
> >> describing its problems, and slightly longer term, add back Thierry's patch,
> >> but in a way that lets callers specify whether #address-cells/#size-cells
> >> should be used, or whether caller-supplied hard-coded values should be used.
> >
> > OK great. But I see your new patch so I think we can apply both at the
> > same time.
> >
> >>
> >> I apologize for not noticing this earlier; I'd assumed that since none of
> >> the Tegra-specific files in include/configs/ set this flag, nor any of the
> >> Tegra-specific Kconfig files, it wasn't set, and hence a revert of the patch
> >> would break Tegra210 support.
> >
> > No problem - I assumed it would also.
> >
> >>
> >>> - then fdtdec_get_addr_size() would work as expected
> >>
> >>
> >> I take issue with *works* as expected", since I would expect the function to
> >> implement DT parsing rules completely.
> >>
> >> However, it's certainly true to say that it will generate the desired
> >> results, even if it doesn't /work/ (isn't implemented) as expected.
> >>
> >> (I suppose this depends on whether you're talking about "works" w.r.t to
> >> correctness of the returned results or side-effects, or w.r.t. how the
> >> internal implementation works.)
> >>
> >>> - I want to make this cases as efficient as possible since it will be
> >>> called in SPL
> >>> - You are concerned that making assumptions like this violates the DT spec
> >>>
> >>> One option is to split the functions into two - one that works in SPL
> >>> and makes assumptions, and one that does not and does things the hard
> >>> way.
> >>
> >>
> >> Why should SPL be any different? U-Boot should either parse DT correctly or
> >> not at all. SPL shouldn't be a special case. Admittedly SPL on Tegra does
> >> very little (and isn't even present on Tegra210), but in general, can't
> >> someone use storage drivers, filesystems, etc. in SPL to choose the next
> >> stage to load, read GPIOs or other data sources to select between different
> >> boot paths, perhaps even interact with a user? If so, then assuming that SPL
> >> can somehow implements a reduced set of features, and hence can make
> >> assumptions that non-SPL can't, seems quite dangerous. We could only do that
> >> if we put some active checks in the U-Boot makefiles to ensure that nobody
> >> enabled anything in the SPL besides a set of strictly audited features.
> >
> > Yes we could do that and it would avoid pain later. I suppose SPL on
> > Tegra is a bit of a special case since there is really no size limit.
> > For some chips the limits are pretty severe so I am quite sensitive to
> > code size even at the expense of extra debug time when something
> > breaks.
> >
> >>
> >>> I suppose we could also add checks/warnings that the DT is
> >>>
> >>> 'well-formed' and that the address size matches the machine it is
> >>> running on.
> >>
> >>
> >> Yes, we certainly should do that.
> >>
> >>> In any case we do need to get rid of the parent lookup in this
> >>> function. So can either you or Thierry submit a patch to do this? The
> >>> parent should instead be a parameter to the function. You may need to
> >>> create a stub function which looks it up before calling
> >>> fdtdec_get_addr_size().
> >>
> >>
> >> Yes, we can't remove the parent lookup in all cases. However, we can avoid
> >> it in the cases where the caller can supply it. I think that's most, but not
> >> quite all.
> >
> > My main concern is dev_get_addr() since that is the official way of
> > reading a device address now. See also regmap_init_mem() which does
> > things its own way.
> >
> >>
> >>
> >>> I'll see how things look in SPL.
> >>>
> >>> Regards,
> >>> Simon
> >>>
> >>
> >
> 
> Do we have any conclusion about commit 5b34436? Today I started to
> check the pre-relocatoin DM PCI UART issue, but found it is now broken
> due to this commit. The broken part is at
> ns16550_serial_ofdata_to_platdata() in drivers/serial/ns16550.c, in
> which it has:
> 
>     addr = fdtdec_get_addr(gd->fdt_blob, dev->of_offset, "reg");
> #ifdef CONFIG_PCI
>     if (addr == FDT_ADDR_T_NONE) {
> ...
> 
> Before commit 5b34436, the old behavior is that the call to
> fdtdec_get_addr() returns FDT_ADDR_T_NONE so that we can trap into the
> PCI logic. But with commit 5b34436, addr is now zero which just bypass
> this logic.
> 
> As for why addr is now zero, this is because fdtdec_get_number() can
> only handle a 64-bit number at most. However for PCI reg, it has 3
> cells. So if I have the following encoding:
> 
> reg = <0x00025100 0x0 0x0 0x0 0x0
>            0x01025110 0x0 0x0 0x0 0x0>;
> 
> The addr will be assigned to zero after two rounds of left shift by 32-bit.
> 
> I can certainly change ns16550 driver to test the return value against
> 0 now, but I think this fdtdec_get_addr() does not cover all cases.
> Please advise.

This type of simple address parsing breaks down in the face of PCI. PCI
requires 3 address cells because it has different types of address
spaces. fdtdec_get_address() actually does the right thing here. It
returns the "address" associated with the entry. The address is the
64-bit value you get by concatenating cells 0 and 1. Cell 2 contains
additional meta data such as the PCI address and the type of memory that
you are dealing with (configuration space, I/O, memory-mapped).

I'd suggest that we add code to properly check whether this is a PCI
device.

Also, why isn't this code obtaining the memory address from the base
address registers?

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150814/edb27db6/attachment.sig>