[U-Boot] [PATCH] sunxi: Machine id hack to prevent loading buggy sunxi-3.4 kernels

Sat Feb 21 10:41:48 CET 2015

Hi,

On 20-02-15 19:33, Siarhei Siamashka wrote:
> On Fri, 20 Feb 2015 15:11:04 +0100
> Hans de Goede <hdegoede at redhat.com> wrote:
>
>> Hi,
>>
>> On 20-02-15 11:36, Siarhei Siamashka wrote:
>>> On Fri, 20 Feb 2015 10:19:51 +0100
>>> Hans de Goede <hdegoede at redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 20-02-15 09:08, Siarhei Siamashka wrote:
>>>>> Store the 'compatibility revision' number in the top 4 bits of the
>>>>> machine id and pass it to the kernel. The old buggy kernels will
>>>>> fail to load with a very much googlable error message on the serial
>>>>> console:
>>>>>
>>>>>      "Error: unrecognized/unsupported machine ID (r1 = 0x100010bb)"
>>>>>
>>>>> This error message can be documented in the linux-sunxi wiki with
>>>>> proper explanations about how to resolve this situation and where
>>>>> to get the necessary bugfixes for the sunxi-3.4 kernel.
>>>>>
>>>>> The fixed sunxi-3.4 kernels can implement a revision compatibility
>>>>> check and clear the top 4 bits of the machine id if everything is
>>>>> alright.
>>>>>
>>>>> Signed-off-by: Siarhei Siamashka <siarhei.siamashka at gmail.com>
>>>>
>>>> TBH I'm not a big fan of this.
>>>>
>>>>> ---
>>>>>
>>>>> To be used together with:
>>>>>        https://groups.google.com/forum/#!topic/linux-sunxi/LOAxP3kAYs8
>>>>
>>>> Nor of this one.
>>>>
>>>> What I would prefer is for CONFIG_OLD_SUNXI_KERNEL_COMPAT to go away
>>>
>>> Yes, the CONFIG_OLD_SUNXI_KERNEL_COMPAT option is barely useful for
>>> anything right now. And it would be great to get rid of it.
>>>
>>>> and for u-boot to automatically do the right thing when booting an old
>>>> kernel.
>>>
>>> Sure, but first we need to define what is the "right thing".
>>>
>>>> Recently some patches were merged to make "bootm" work without an
>>>> fdt even when built with fdt support.
>>>>
>>>> Specifically in common/image-fdt.c line 437 there is:
>>>>
>>>>            if (!select && ok_no_fdt) {
>>>>                    debug("Continuing to boot without FDT\n");
>>>>                    return 0;
>>>>            }
>>>>
>>>> So we know when executing the bootm command that we do not have an fdt.
>>>>
>>>> If this is the case, and only when this (no fdt) is the case then in
>>>> arch/arm/lib/bootm.c: boot_prep_linux()
>>>>
>>>> The board specific setup_board_tags() function gets called, we could
>>>> define our own version of this (it has an empty weak default), and in
>>>> our own version fixup things for the old kernel to just work.
>>>
>>> Sounds like we are coming back to
>>>
>>>       http://lists.denx.de/pipermail/u-boot/2014-October/191697.html
>>
>> Yes.
>>
>>> As you may guess, I'm in favour of having everything working
>>> automatically without the need to set extra Kconfig options or
>>> environment variables :-)
>>
>> So am I.
>>
>>>
>>>> That means:
>>>>
>>>> Halving PLL5, then waiting for it to settle, then reprogramming the
>>>> DRAM clk divider, I'm assuming that this will work if done in this
>>>> order, but we obviously need to test this thoroughly.
>>>>
>>>> Halving PLL6, then waiting for it to settle, then reprogramming the
>>>> MBUS divider, and checking and updating the mmc mod clock dividers.
>>>
>>> This already looks somewhat more complicated than necessary...
>>
>> So how about we just halve the frequency of the 2 PLL-s
>> (PLL6 only on sun5i) everywhere and be done with it?
>
> Yes, that's what I also asked in
>
>      http://lists.denx.de/pipermail/u-boot/2015-February/205637.html
>
>> It seems to me that the advantage of having them doubled so that we
>> can get more clocks is mostly theoretical in both cases anyways.
>
> It is not so theoretical for PLL5P though.
>
>> I'm fine with halving PLL6 on sun5i, but I believe we should then really
>> also halve PLL5 everywhere, taking us back to the original allwinner /
>> u-boot-sunxi settings.
>>
>> I seriously wonder what exactly, concretely having PLL5 doubled gains
>> us. Yes we could run MBUS at 2/3 of the DRAM speed which we cannot
>> do otherwise, but AFAIK we want MBUS at 300 for optimal performance,
>> and we can get MBUS at exactly 300 by using pll6 as its parent
>> (even if we halve PLL6).
>
> We might potentially want to set MBUS higher than 300. Yes, the
> A13 manual says that 300MHz is the limit. But in practice it can be
> increased as high as 576MHz on more than one board (with some VDD-DLL
> voltage increase). There seems to be really a lot of headroom and going
> slightly beyond 300MHz might be OK.
>
>> Do you have any non-theoretical example where having PLL5 doubled helps?
>
> Sigh. I believe, that I had already explained it multiple times in
> various e-mail posts. But here we go again:
>
> 1. The G2D (Mixer Processor) clock speed has a certain limit. We don't
> know what it is exactly, but the sunxi-3.4 kernel used to hardcode it
> to 1/2 of the PLL5P speed, which means that we can assume at least
> 240MHz for the boards with 480MHz DRAM clock speed.
>
> Now let's see what happens if we try to increase the DRAM clock speed
> to 504MHz. In the case of the doubled PLL5P and divisor 5, we can get
> 1008MHz / 5 = 201.6 MHz clock speed for the G2D. But in the case
> of the same PLL5P as the DRAM clock speed, we have to use the divisor
> 3 and only get 504MHz / 3 = 168MHz. Obviously, 201.6MHz is faster than
> 168MHz.
>
> The erratic G2D clock speed drops when gradually increasing the DRAM
> clock speed are not very nice. People are going to have a lot of
> headache finding a reasonable compromise between the DRAM and G2D
> performance. Doubling PLL5 reduces the performance penalties.

Hmm, I see that is a nasty problem, esp. with the older kernels
not dealing well with the doubled pll.
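
To make the arithmetic above concrete, here is a minimal C sketch, not
actual u-boot or kernel code. It assumes the G2D divider is a plain
integer and that 240MHz (the figure derived above from the old 1/2 PLL5P
setting) is the cap. The helper names g2d_pick_divisor() and
g2d_rate_khz() and that cap are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the smallest integer divisor that keeps the
 * G2D clock at or below limit_khz for a given parent (PLL5P) rate.
 * Real hardware may restrict which divisors are available.
 */
static inline uint32_t g2d_pick_divisor(uint32_t parent_khz, uint32_t limit_khz)
{
	uint32_t div;

	assert(limit_khz > 0);
	/* Round up, so that parent_khz / div never exceeds limit_khz. */
	div = (parent_khz + limit_khz - 1) / limit_khz;
	return div ? div : 1;
}

/* Resulting G2D rate in kHz for the divisor chosen above. */
static inline uint32_t g2d_rate_khz(uint32_t parent_khz, uint32_t limit_khz)
{
	return parent_khz / g2d_pick_divisor(parent_khz, limit_khz);
}
```

With a 240000 kHz cap, a 1008000 kHz parent (doubled PLL5P at 504MHz
DRAM) ends up with divisor 5 and 201600 kHz, while a 504000 kHz parent
ends up with divisor 3 and 168000 kHz, matching the numbers quoted above.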

> And yes, G2D is used today in the sunxi-3.4 kernel. Hopefully it
> will be also supported in the mainline kernel soon.
>
> 2. If we configure the DRAM clock speed as 648MHz, then the PLL5
> clock speed already becomes rather high. IMHO, it is not healthy
> for the peripherals even without further doubling it. Tsvetan
> reported LCD flicker problems at 768MHz, which is not terribly
> far away from 648MHz. The real solution is to use the bugfixed
> sunxi-3.4 kernels, and the fixes are not very invasive (which
> means that they can be cherry picked to any potential fork).

Ok, but then we should really also fix the PLL6 issue on sun5i in
the older kernels. That should be as simple as removing the call
setting the PLL6 speed from the code you pointed to; everything
else can pretty much be kept as is. But I'm afraid I've other priorities
than fixing this.

> 3. Not quite related to PLL5, but we have to deal with the voltage
> regulators too. Old buggy kernels are also trying to change the DCDC3
> voltage in the case of AXP209 and DCDC4 voltage in the case of AXP152.
> Which means that u-boot can't safely change these voltages if the
> compatibility with old buggy kernels is desired. Without increasing
> the VDD-DLL voltage, we can't reach very high MBUS and DRAM clock
> speeds. This is yet another PITA, which holds us back.

And that is something which we could fix by increasing the number you
want to store in the high bits of the machine id, and then too old
kernels would simply not work rather than be unreliable. So yes, I can
see why you want what you want.
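
For reference, the scheme under discussion boils down to something like
the following sketch. The function and macro names are hypothetical and
not taken from the patch; only the "top 4 bits" layout and the 0x100010bb
example come from the patch description. I am also assuming that a fixed
kernel accepts any revision up to the highest one it knows about.

```c
#include <assert.h>
#include <stdint.h>

#define COMPAT_REV_SHIFT	28
#define COMPAT_REV_MASK		(0xFu << COMPAT_REV_SHIFT)

/* Bootloader side: put the compatibility revision in the top 4 bits. */
static inline uint32_t machid_with_rev(uint32_t machid, uint32_t rev)
{
	assert(rev <= 0xF);
	assert((machid & COMPAT_REV_MASK) == 0);
	return machid | (rev << COMPAT_REV_SHIFT);
}

/* Extract the compatibility revision from the value passed in r1. */
static inline uint32_t machid_get_rev(uint32_t r1)
{
	return r1 >> COMPAT_REV_SHIFT;
}

/*
 * Kernel side (assumed policy): if the revision is one this kernel
 * supports, return the plain machine id with the top bits cleared,
 * otherwise return 0 to signal that the boot should be refused.
 */
static inline uint32_t machid_strip_if_compatible(uint32_t r1, uint32_t max_rev)
{
	if (machid_get_rev(r1) > max_rev)
		return 0;
	return r1 & ~COMPAT_REV_MASK;
}
```

An unpatched kernel does none of this and simply sees an unknown id such
as 0x100010bb, which is what produces the error message on the console.
Bumping the revision later (e.g. for the voltage regulator issue) would
only need a new max_rev in the fixed kernels.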

> 4. Currently running DRAM reliability tests requires the sunxi-3.4
> kernel (see https://github.com/ssvb/lima-memtester/ for more details).
> I don't really like having different DRAM clock speed setup for the
> mainline kernel and for the sunxi-3.4 kernel, that's why using the
> CONFIG_OLD_SUNXI_KERNEL_COMPAT option is not a very good idea in
> general. It's much better to use the bugfixed sunxi-3.4 kernel
> without CONFIG_OLD_SUNXI_KERNEL_COMPAT.
>
>> Because if not then I'm becoming more and more in favor of just halving
>> it
>
> The sun4i/sun5i/sun7i DRAM controller code in u-boot is ready for much
> faster DRAM clock speeds since the v2014.10 release. We are only
> missing the appropriate 'dram_para' settings for the boards, which can
> be prepared/verified according to the instructions from the linux-sunxi
> wiki. But there does not seem to be much interest in the performance
> and reliability for the sunxi boards yet. And participation of the
> hardware vendors (for doing large scale tests on many boards) is
> missing too.
>
> Maybe now after the introduction of the Raspberry Pi 2, the Allwinner
> based devboard manufacturers might become a bit more interested in
> tweaking the performance in order to remain competitive.
>
> I believe that every Cubietruck user had more than enough time to
> try my 'highspeedtruck' branches posted at
>
>      http://lists.denx.de/pipermail/u-boot/2014-July/183981.html
>
> That's "the proof of the pudding", which demonstrates what is
> possible with this hardware :-)

I still believe that the only way to get anywhere wrt getting better
DRAM speeds is to just make the change. As said before if you submit
patches to increase DRAM speed on some boards I'll put them in
my personal sunxi-wip and the official u-boot-sunxi/next asap, and
then we can ask people to test that, and once the merge window for
v2015.07 opens we can land those changes and see from there.

What would also be welcome is a wiki page for reading DRAM chip
markings, so that people can figure out what their board should
be able to handle theoretically (assuming the pcb is not holding
things back).

> But now you are telling me that you are in favour of crippling it
> before it really takes off the ground...

When I was asking why not just halve pll5 I was sincerely asking, and
your g2d story is a compelling reason not to do that. If only the g2d
could use pll6 as a parent, then we could just run it at 300MHz (which I
believe should be a safe speed, as that is what the limit seems to
be for the backend and frontend bits), but alas, like the backend /
frontend, it has an IMHO weird set of parent choices. Given that
weird set of parent choices I see the logic in not wanting to halve
pll5, so let's try to find another solution for this.

>> leaving us with only the A20 sec vs nonsec boot thing for which the
>> solution which I've outlined in my previous mail should work nicely.
>
> Well, this is not really news. We could have fixed the sec vs nonsec
> boot thing months ago (if there only was a consensus about it). But
> better late than never.

Fixing this only becomes interesting IMHO once we've the other bits sorted
out. Also I know the involved code a lot better now than back then, which
has resulted in me seeing a possible way to fix this relatively cleanly.

>>>> Modifying armv7_boot_nonsec so that we can override the default. We
>>>> can do this e.g. by adding a global armv7_boot_nonsec_default variable
>>>> and setting that from setup_board_tags().
>>>>
>>>> This way we can just do the right thing automatically,
>>>
>>> By the "right thing", do you mean booting the sunxi-3.4 kernel with
>>> weird PLL5 and PLL6 settings just because some *older* versions of
>>> this kernel used to have bugs in the past?

By the right thing I mean having things just work for the end user;
whatever I'm doing, having things just work is always my end goal.
This is also why I'm mostly focussing upstream, which has resulted in
Fedora 21 and the upcoming Debian release supporting allwinner devices
ootb (albeit only headless) and the upcoming F-22 release will have
much improved allwinner device support, supporting more devices, and
supporting video output, etc.

>>>> and this has
>>>> the added advantage that if we later find out that we're doing
>>>> something in u-boot which is not good for the older kernels we can
>>>> fix it in u-boot without needing to coordinate with the sunxi-3.4
>>>> kernels. In my experience the version check for compatibility style
>>>> solution you are proposing brings a large maintenance burden,
>>>
>>> I don't expect any maintenance burden at all. We only ever need this
>>> coordination to take care of very serious showstopper bugs. And such
>>> bugs have been already fixed.
>>
>> Been there done that did not like it, this is going to suck big time,
>> trust me I've done enough of this kinda stuff to know.
>
> I would say, that it would be a good idea to trust me this time for a
> change :-)  You were given a chance to enforce your own solution, and
> it quite predictably sucks. Too bad that Tsvetan and Lars had to
> encounter problems in order to (hopefully) make this clear for everyone.
>
> In fact, I have a better suggestion. We can have this versioned
> machine id hack active only when CONFIG_OLD_SUNXI_KERNEL_COMPAT
> is not defined. So that the normal build does not have non-obvious
> runtime failures with unpatched sunxi-3.4 kernels anymore. And the
> CONFIG_OLD_SUNXI_KERNEL_COMPAT still works just like before. I'll
> send a v2 patch shortly.

Ok, I think I can live with that as a solution, esp. since it will also
work to ensure that we don't run old kernels which muck with voltages,
making things run unreliably in an unpredictable manner.

I'm not 100% happy about overloading the upper id bits though, that
is not how machine ids are supposed to work officially, I guess we can
get away with this since we are keeping all related changes confined
to sunxi code only. But I'm not 100% enthusiastic about this.

Ian, can you live with overloading the high machine-id bits to
deliberately break old kernels when not compiled with old kernel
compatibility?

Regards,

Hans