[U-Boot] [PATCH] Revert "fdt: Fix fdtdec_get_addr_size() for 64-bit"

Fri Aug 14 10:10:32 CEST 2015

Hi,

On Sun, Aug 9, 2015 at 11:08 PM, Simon Glass <sjg at chromium.org> wrote:
> Hi Stephen,
>
> On 6 August 2015 at 13:03, Stephen Warren <swarren at wwwdotorg.org> wrote:
>> On 08/05/2015 05:45 PM, Simon Glass wrote:
>>>
>>> Hi Stephen,
>>>
>>> On 5 August 2015 at 12:22, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>>>
>>>> On 08/04/2015 10:08 PM, Simon Glass wrote:
>>>>>
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> On 3 August 2015 at 12:20, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>>>>>
>>>>>>
>>>>>> On 08/03/2015 09:52 AM, Simon Glass wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 3 August 2015 at 09:12, Stephen Warren <swarren at wwwdotorg.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/02/2015 06:13 PM, Simon Glass wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This reverts commit 5b34436035fc862b5e8d0d2c3eab74ba36f1a7f4.
>>>>>>>>>
>>>>>>>>> This function has a few problems. It calls fdt_parent_offset() which
>>>>>>>>> as
>>>>>>>>> mentioned in code review is very slow.
>>>>>>>>>
>>>>>>>>> https://patchwork.ozlabs.org/patch/499482/
>>>>>>>>> https://patchwork.ozlabs.org/patch/452604/
>>>>>>>>>
>>>>>>>>> It also happens to break SPI flash on Minnowboard max which is how I
>>>>>>>>> noticed
>>>>>>>>> that this was applied. I can send a patch to tidy that up, but in
>>>>>>>>> any
>>>>>>>>> case
>>>>>>>>> I think we should consider a revert until the function is better
>>>>>>>>> implemented.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The fact that the function needs to perform a slow operation is not a
>>>>>>>> good
>>>>>>>> reason for a revert. The slowness of the operation is just a matter
>>>>>>>> of
>>>>>>>> fact
>>>>>>>> with DT not having uplinks in its data structure, and U-Boot using
>>>>>>>> those
>>>>>>>> data structures directly.
>>>>>>>>
>>>>>>>> You'd requested during review that I additionally implement a faster
>>>>>>>> version
>>>>>>>> of the function in the case where the parent node is already known,
>>>>>>>> and
>>>>>>>> said
>>>>>>>> it was fine if that happened in a later patch. I have this on my TODO
>>>>>>>> list,
>>>>>>>> but it's only been a couple of days.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I didn't expect this to go to mainline before your new patch.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I didn't get that message from the thread; you wrote:
>>>>>>
>>>>>> Simon Glass wrote:
>>>>>>>
>>>>>>>
>>>>>>> Stephen Warren wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Simon Glass wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, how about (in addition) a
>>>>>>>>> version of this function that works for devices? Like:
>>>>>>>>>
>>>>>>>>> device_get_addr_size(struct udevice *dev, ...)
>>>>>>>>>
>>>>>>>>> so that it can handle this for you.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> That sounds like a separate patch?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes, but I think we should get it in there so that people don't start
>>>>>>> using this (wildly inefficient) function when they don't need to. I
>>>>>>> think by passing in the parent node we force people to think about the
>>>>>>> cost.
>>>>>>>
>>>>>>> Yes the driver model function can be a separate patch, but let's get
>>>>>>> it in at about the same time. We have dev_get_addr() so could have
>>>>>>> dev_get_addr_size().
>>>>>>
>>>>>>
>>>>>>
>>>>>> That sounds to like you'd like a followup patch soon after this patch
>>>>>> (my
>>>>>> assumption would be within a few days or weeks) to solve your comments,
>>>>>> not
>>>>>> that you wanted a replacement patch.
>>>>>
>>>>>
>>>>>
>>>>> I will take that feedback to be a little more forceful in future, sorry.
>>>>>
>>>>>>
>>>>>>>> Why not just fix the bug since you said you could? That seems far
>>>>>>>> better
>>>>>>>> than breaking the newly added Tegra210 support.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I do have a patch but I'm not 100% comfortable with the approach. I
>>>>>>> don't see a good reason to move away from the idea of fdt_addr_t and
>>>>>>> fdt_addr_t being set correctly for the platform. Or maybe I
>>>>>>> misunderstand the problem the patch was trying to fix. As I said it
>>>>>>> did not have a commit message, so who knows :-)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The size of fdt_addr_t isn't relevant *at all* when parsing DT. The
>>>>>> only
>>>>>> thing that matters is #address-cells/#size-cells. Those properties are
>>>>>> what
>>>>>> tell the parsing code how many bytes to read from the reg property.
>>>>>> Whether
>>>>>> the resultant value fits into the code's internal representation is an
>>>>>> internal detail of the code, not part of the semantics of DT itself or
>>>>>> how
>>>>>> to parse it.
>>>>>>
>>>>>> If code assumes that #address-cells == sizeof(fdt_addr_t), it is indeed
>>>>>> quite likely that everything will just happen to work most of the time.
>>>>>> However, this is purely an accident and not something that anything
>>>>>> should
>>>>>> rely upon.
>>>>>>
>>>>>> (I think Tegra210 support still has CONFIG_PHYS_64BIT undefined which
>>>>>> is
>>>>>> admittedly a little /unexpected/ for a 64-bit U-Boot build, but in
>>>>>> practice
>>>>>> is perfectly /legal/ and will work out just fine since, except perhaps
>>>>>> for
>>>>>> RAM sizes, I don't believe any value in DT will actually require more
>>>>>> than
>>>>>> 32-bits to represent)
>>>>>
>>>>>
>>>>>
>>>>> I would like to have the types match the platform where possible. At
>>>>> least the types should not be smaller than the platform - e.g. if
>>>>> CONFIG_PHYS_64BIT is not defined we should not support 64-bit
>>>>
>>>>
>>>>
>>>> In general, there's no "should not" here; we "cannot". If there's a
>>>> 64-bit
>>>> value in the DT (with bits above bit 31 set), then it can't be stored in
>>>> a
>>>> 32-bit variable.
>>>>
>>>> That said, if a DT has #address-cells=<2>, but the actual values stored
>>>> in
>>>> all reg values have 0 for the MS word, that'll actually work just fine.
>>>> Note
>>>> that it's 100% legal to set #address-cells=<100> and just write a bunch
>>>> of
>>>> extra zeros into the property. Silly perhaps, but perfectly legal. Since
>>>> the
>>>> function should adapt to whatever #address-cells value is in the DT,
>>>> supporting that case isn't any more work, so there's no reason we
>>>> shouldn't.
>>>>
>>>>> address/size in the device tree. This is for efficiency. We don't want
>>>>> to force all the U-Boot code to 64-bit suddenly. This will bloat
>>>>> things for no benefit.
>>>>
>>>>
>>>>
>>>> We could and likely should set CONFIG_PHYS_64BIT for Tegra210. However,
>>>> that's unrelated to using the correct algorithm to parse DT.
>>>>
>>>>>>>> P.S. What is the issue with SPI flash? The commit description doesn't
>>>>>>>> mention this at all.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It calls that function expecting it to pick up an address and size
>>>>>>> from two consecutive cells. With this patch, that fails (unless the
>>>>>>> property happens to be "reg").
>>>>
>>>>
>>>> ...
>>>>>>
>>>>>>
>>>>>> I think this is all stemming from drivers/mtd/spi/sf_probe.c
>>>>>> spi_flash_decode_fdt()'s call to fdtdec_get_addr_size() for property
>>>>>> "memory-map"? If so, then looking at arch/x86/dts/minnowmax.dts's /spi
>>>>>> node,
>>>>>> the code and DT content are clearly inconsistent; For this node
>>>>>> #address-cells=<1>, #size-cells=<0> which makes sense given that the
>>>>>> address
>>>>>> is a chip-select and hence has no size. So, the code should not assume
>>>>>> that
>>>>>> the memory-map property can be parsed in the same way as a reg
>>>>>> property.
>>>>>>
>>>>>> I note that the memory-map property doesn't exist in the Linux kernel's
>>>>>> DT
>>>>>> binding documentation database, or code, hence hasn't been reviewed by
>>>>>> the
>>>>>> DT binding maintainers.
>>>>>
>>>>>
>>>>>
>>>>> The function comment says:
>>>>>
>>>>>    * Look up an address property in a node and return it as an address.
>>>>>    * The property must hold one address with a length. This is only
>>>>> tested
>>>>>    * on 32-bit machines.
>>>>
>>>>
>>>>
>>>> And Thierry fixed it for systems with #address-cells > 1. Perhaps that
>>>> part
>>>> of the function comment should have been removed in the commit.
>>>>
>>>>> My intention was that this would be an efficient way to decode an
>>>>> address and size from a device tree. To some extent regmap may take
>>>>> over this role (IMO we should turn to drop fdtdec one day one we have
>>>>> more infrastructure). But I'd like it to work efficiently for 32-bit
>>>>> machines. The new function could hardly be less efficient.
>>>>
>>>>
>>>>
>>>> Again, there's no way in general to make it more efficient. The
>>>> efficiency
>>>> issue is directly implied by the DT data structures.
>>>>
>>>> In the special case where the parent node is already known, we can
>>>> certainly
>>>> introduce an alternate function that is more efficient. You've already
>>>> asked
>>>> for that, and as I said, it's in my TODO list.
>>>>
>>>>> I think we should consider the case where the physical address size of
>>>>> U-Boot and the device tree do not match as a corner case. I certainly
>>>>> don't want device tree to add loads of pointless code for 'normal'
>>>>> platforms.
>>>>
>>>>
>>>>
>>>> It's not a corner case. It's a fundamental part of the DT schema. If
>>>> U-Boot
>>>> is going to use DT, it should actually use *DT*, not some-very
>>>> similar-but-not-quite-DT format with all kinds of implicit and unstated
>>>> exceptions, limitations, and assumptions. This implies fully honoring all
>>>> aspects of how DT works when parsing it, not picking and choosing
>>>> features
>>>> because some are inconvenient or annoying. If U-Boot doesn't want to
>>>> correctly implement DT support, it should just drop it completely.
>>>>
>>>> As an aside, when instantiating devices, I hope one day that U-Boot will
>>>> parse DT top-down in a hierarchical way, rather than simply searching for
>>>> any node anywhere in the DT without regard for whether any parent node is
>>>> enabled, has a driver, has had the driver initialize, etc. U-Boot ignores
>>>> this right now, and is only getting away with this accidentally. Without
>>>> hacky workarounds in drivers, this won't continue to work for all Tegra
>>>> HW
>>>> (e.g. host1x graphics/display sub-modules, AHUB's audio-related
>>>> sub-modules), since parent drivers must initialize before child drivers
>>>> in
>>>> order to enable various register buses, including clocks/resets affecting
>>>> those buses etc.
>>>>
>>>> Again, if it's simply too inconvenient or bloated to implement DT
>>>> properly
>>>> in U-Boot, let's just drop it entirely. A halfway solution is the worst
>>>> of
>>>> both worlds. I'm not convinced the full implications of how to (and the
>>>> need
>>>> to) correctly and fully support DT have were well thought through before
>>>> (control) DT support was added to U-Boot.
>>>>
>>>>> Re memory-map, yes it doesn't seem to be possible to do what it is
>>>>> trying to do (and Thierry says the same below). It is quite weird to
>>>>> have a SPI peripheral which is also memory mapped.
>>>>>
>>>>> Here's my question - if you fix the CONFIG_PHYS_64BIT problem with
>>>>> 64-bit Tegra, what is actually wrong with the way the function was?
>>>>
>>>>
>>>>
>>>> With the DT files now checked into U-Boot, I think it would accidentally
>>>> work, since we just happen to have set #address-cells=<2>,
>>>> #size-cells=<2>,
>>>> and that would just happen to match sizeof(fdt_addr_t).
>>>>
>>>> However note this is an accident on a couple of levels:
>>>>
>>>> a) This is because the code assumes that sizeof(fdt_addr_t) ==
>>>> #address-cells * 4. This is an invalid assumption since it does not
>>>> correctly honor the DT schema. It hard-codes the size of values whereas
>>>> DT
>>>> schema says the size is defined by the #xxx-cells properties.
>>>>
>>>> b) The original Tegra210 DTs that TomW posted had #address-cells=<1>,
>>>> #size-cells=<1>. I asked him to change that to match what I expected to
>>>> be
>>>> in the Linux kernel's Tegra210 DTs. However, if he'd rejected my request
>>>> or
>>>> I hadn't noticed that, then with CONFIG_PHYS_64BIT set,
>>>> fdtdec_get_addr_size() would have attempted to read twice as many bytes
>>>> as
>>>> it should have from the property. It's entirely plausible that someone
>>>> could
>>>> have come along later and realized CONFIG_PHYS_64BIT was set incorrectly
>>>> and
>>>> enabled it.
>>>>
>>>>> This is a boot loader so we should be willing to make some
>>>>> simplifications.
>>>>
>>>>
>>>>
>>>> When dealing with internal bootloader details, sure assumptions,
>>>> simplifications, etc. can be made.
>>>>
>>>> However, DT is an externally defined standard. The content of DT must be
>>>> identical across all OSs (SW stacks, bootloader) and not influenced by
>>>> requirements/... of any specific individual OS's (SW stack, bootloader)
>>>> quirks. We can't just pick and choose which parts of it we care about.
>>>> Well,
>>>> perhaps if we stop calling it DT we could.
>>>
>>>
>>> So I think in summary:
>>>
>>> - 64-bit machines should have CONFIG_PHYS_64BIT set correctly
>>
>>
>> It turns out that arch/arm/include/asm/config.h already enables this for all
>> ARM64 platforms.
>>
>> As such, we can in fact go ahead with reverting this patch, and U-Boot will
>> still function on Tegra210 boards.
>>
>> In the short term, I think that means TomR should just apply this revert
>> patch, and we don't need to send any additional patches. In the slightly
>> longer term, we should add some comments to fdtdec_get_addr_size()
>> describing its problems, and slightly longer term, add back Thierry's patch,
>> but in a way that lets callers specify whether #address-cells/#size-cells
>> should be used, or whether caller-supplied hard-coded values should be used.
>
> OK great. But I see your new patch so I think we can apply both at the
> same time.
>
>>
>> I apologize for not noticing this earlier; I'd assumed that since none of
>> the Tegra-specific files in include/configs/ set this flag, nor any of the
>> Tegra-specific Kconfig files, it wasn't set, and hence a revert of the patch
>> would break Tegra210 support.
>
> No problem - I assumed it would also.
>
>>
>>> - then fdtdec_get_addr_size() would work as expected
>>
>>
>> I take issue with *works* as expected", since I would expect the function to
>> implement DT parsing rules completely.
>>
>> However, it's certainly true to say that it will generate the desired
>> results, even if it doesn't /work/ (isn't implemented) as expected.
>>
>> (I suppose this depends on whether you're talking about "works" w.r.t to
>> correctness of the returned results or side-effects, or w.r.t. how the
>> internal implementation works.)
>>
>>> - I want to make this cases as efficient as possible since it will be
>>> called in SPL
>>> - You are concerned that making assumptions like this violates the DT spec
>>>
>>> One option is to split the functions into two - one that works in SPL
>>> and makes assumptions, and one that does not and does things the hard
>>> way.
>>
>>
>> Why should SPL be any different? U-Boot should either parse DT correctly or
>> not at all. SPL shouldn't be a special case. Admittedly SPL on Tegra does
>> very little (and isn't even present on Tegra210), but in general, can't
>> someone use storage drivers, filesystems, etc. in SPL to choose the next
>> stage to load, read GPIOs or other data sources to select between different
>> boot paths, perhaps even interact with a user? If so, then assuming that SPL
>> can somehow implements a reduced set of features, and hence can make
>> assumptions that non-SPL can't, seems quite dangerous. We could only do that
>> if we put some active checks in the U-Boot makefiles to ensure that nobody
>> enabled anything in the SPL besides a set of strictly audited features.
>
> Yes we could do that and it would avoid pain later. I suppose SPL on
> Tegra is a bit of a special case since there is really no size limit.
> For some chips the limits are pretty severe so I am quite sensitive to
> code size even at the expense of extra debug time when something
> breaks.
>
>>
>>> I suppose we could also add checks/warnings that the DT is
>>>
>>> 'well-formed' and that the address size matches the machine it is
>>> running on.
>>
>>
>> Yes, we certainly should do that.
>>
>>> In any case we do need to get rid of the parent lookup in this
>>> function. So can either you or Thierry submit a patch to do this? The
>>> parent should instead be a parameter to the function. You may need to
>>> create a stub function which looks it up before calling
>>> fdtdec_get_addr_size().
>>
>>
>> Yes, we can't remove the parent lookup in all cases. However, we can avoid
>> it in the cases where the caller can supply it. I think that's most, but not
>> quite all.
>
> My main concern is dev_get_addr() since that is the official way of
> reading a device address now. See also regmap_init_mem() which does
> things its own way.
>
>>
>>
>>> I'll see how things look in SPL.
>>>
>>> Regards,
>>> Simon
>>>
>>
>

Do we have any conclusion about commit 5b34436? Today I started to
check the pre-relocatoin DM PCI UART issue, but found it is now broken
due to this commit. The broken part is at
ns16550_serial_ofdata_to_platdata() in drivers/serial/ns16550.c, in
which it has:

    addr = fdtdec_get_addr(gd->fdt_blob, dev->of_offset, "reg");
#ifdef CONFIG_PCI
    if (addr == FDT_ADDR_T_NONE) {
...

Before commit 5b34436, the old behavior is that the call to
fdtdec_get_addr() returns FDT_ADDR_T_NONE so that we can trap into the
PCI logic. But with commit 5b34436, addr is now zero which just bypass
this logic.

As for why addr is now zero, this is because fdtdec_get_number() can
only handle a 64-bit number at most. However for PCI reg, it has 3
cells. So if I have the following encoding:

reg = <0x00025100 0x0 0x0 0x0 0x0
           0x01025110 0x0 0x0 0x0 0x0>;

The addr will be assigned to zero after two rounds of left shift by 32-bit.

I can certainly change ns16550 driver to test the return value against
0 now, but I think this fdtdec_get_addr() does not cover all cases.
Please advise.

Regards,
Bin