[U-Boot] [PATCH 8/8] x86: quark: Optimize MRC execution time

Simon Glass sjg at chromium.org
Tue Sep 1 01:12:43 CEST 2015


Hi Bin,

On 31 August 2015 at 08:04, Bin Meng <bmeng.cn at gmail.com> wrote:
> Hi Simon,
>
> On Mon, Aug 31, 2015 at 9:54 PM, Simon Glass <sjg at chromium.org> wrote:
>> Hi Bin,
>>
>> On 31 August 2015 at 07:43, Bin Meng <bmeng.cn at gmail.com> wrote:
>>> Hi Simon,
>>>
>>> On Mon, Aug 31, 2015 at 9:20 PM, Simon Glass <sjg at chromium.org> wrote:
>>>> Hi Bin,
>>>>
>>>> On 31 August 2015 at 03:52, Bin Meng <bmeng.cn at gmail.com> wrote:
>>>>> Boot time performance degradation is observed with the conversion
>>>>> to use dm pci. Intel Quark SoC has a low end x86 processor with
>>>>> only 400MHz frequency and the most time consuming part is with MRC.
>>>>> Each MRC register programming requires indirect access via pci bus.
>>>>> With dm pci, accessing pci configuration space has some overhead.
>>>>> Unfortunately this single access overhead gets accumulated in the
>>>>> whole MRC process, and finally leads to twice boot time (25 seconds)
>>>>> than before (12 seconds).
>>>>>
>>>>> To speed up the boot, create an optimized version of pci config
>>>>> read/write routines without bothering to go through driver model.
>>>>> Now it only takes about 3 seconds to finish MRC, which is really
>>>>> fast (8 times faster than dm pci, or 4 times faster than before).
>>>>>
>>>>> Signed-off-by: Bin Meng <bmeng.cn at gmail.com>
>>>>> ---
>>>>>
>>>>>  arch/x86/cpu/quark/msg_port.c | 59 +++++++++++++++++++++++++++----------------
>>>>>  1 file changed, 37 insertions(+), 22 deletions(-)
>>>>
>>>> Before I delve into the patch - with driver model we are using the I/O
>>>> method - see pci_x86_read_config(). Is that the source of the slowdown
>>>> or is it just general driver model overhead.
>>>
>>> The MRC calls APIs in arch/x86/cpu/quark/msg_port.c to program DDR
>>> controller. Inside msg_port.c, pci_write_config_dword() and
>>> pci_read_config_dword() are called.
>>>
>>> With driver model, the overhead is:
>>>
>>> pci_write_config_dword() -> pci_write_config32() -> pci_write_config()
>>> -> uclass_get_device_by_seq() then pci_bus_write_config() will finally
>>> call pci_x86_read_config().
>>>
>>> Without driver model, there is still some overhead (so previously the
>>> MRC time was about 12 seconds)
>>>
>>> pci_write_config_dword() -> pci_hose_write_config_dword() ->
>>> TYPE1_PCI_OP(write, dword, u32, outl, 0)
>>>
>>> With my optimized version, pci_write_config_dword() directly calls a
>>> hardcoded dword size pci config access, without the need to consider
>>> offset and mask, and dereferencing hose->cfg_addr/cfg->data.
>>
>> What about if we use dm_pci_read_config32()? We should try to move PCI
>> access to driver model to avoid the uclass_get_device_by_seq()
>> everywhere.
>
> I don't think that helps. dm_pci_read_config32() requires a dm driver.
> MRC is just something that program a bunch of registers with pci
> config rw call.

My question is really what takes the time? It's not clear whether it
is the driver model overhead or something else. The code you add in
qrk_pci_write_config_dword() looks very similar to
pci_x86_read_config().

>
>>
>>>
>>>>
>>>> If the former then perhaps we should change this. If the latter then
>>>> we have work to do...
>>>>
>
> Also for this patch, I just realized that not only it helps to reduce
> the boot time, but also it helps to support PCIe root port in future
> patches.
>
> By checking the Quark SoC manual, I found something that needs to be
> done to get the two PCIe root ports on Quark SoC to work. In order to
> get PCIe root ports show up in the PCI configuration space, we need
> program some registers in the message bus (using APIs in
> arch/x86/cpu/quark/msg_port.c). With driver model, if we still call
> pci_write_config_dword() in the msg_port.c, we end up triggering PCI
> enumeration process first before we actually write something to the
> configuration space, however the enumeration process will hang when
> scanning to the PCIe root port if we don't properly initialize the
> message bus registers. This is a chicken-egg problem.

Sure, although I see that as a separate problem.

We can't have a driver model implementation that is really slow, so
I'd like to clear that up first. Of course your patch makes sense for
other reasons, but I don't want to play whack-a-mole here.

Also we should start to move things away from the non-driver-model pci
functions.

Regards,
Simon


More information about the U-Boot mailing list