[U-Boot] driver model is not smp safe

Simon Glass sjg at chromium.org
Fri Aug 7 21:09:02 CEST 2015


Hi Bin,

On 5 August 2015 at 02:43, Bin Meng <bmeng.cn at gmail.com> wrote:
> Hi Simon, Tom,
>
> On Tue, Aug 4, 2015 at 3:27 AM, Simon Glass <sjg at chromium.org> wrote:
>> Hi Tom,
>>
>> On 3 August 2015 at 13:06, Tom Rini <trini at konsulko.com> wrote:
>>> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>>>> Hi Tom,
>>>>
>>>> On 31 July 2015 at 08:31, Tom Rini <trini at konsulko.com> wrote:
>>>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>>>> >
>>>> >> Hi Simon,
>>>> >>
>>>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>>> >>
>>>> >> => cpu list
>>>> >>   0: cpu@0                  Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>   1: cpu@1                  Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>   2: cpu@2                  Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>   2: cpu@3                  Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>
>>>> >> cpu@2 and cpu@3 have the same sequence number, which indicates that they
>>>> >> ran in parallel while obtaining their sequence numbers. The call chain
>>>> >> on an AP is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>>> >> Apparently AP2 and AP3 ran at the same time and got the same
>>>> >> number.
>>>> >>
>>>> >> Note that so far, all x86 boards on which we have enabled x86 multi-CPU
>>>> >> initialization have only 2 cores, which does not expose this issue:
>>>> >> with a single AP there is no parallel execution among APs.
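The duplicate number fits a classic check-then-act race. Below is a minimal C model, assuming (as the call chain above suggests) that sequence resolution is a scan for the lowest free number followed by a separate assignment. `find_free_seq()`, `claim_seq()` and `replay_race()` are hypothetical names; this is not U-Boot's uclass_resolve_seq() implementation. It replays, step by step and without threads, the interleaving that would make AP2 and AP3 both report number 2:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SEQ 8

/* Hypothetical model of one uclass: seq_in_use[s] is true once a device holds s. */
static bool seq_in_use[MAX_SEQ];

/* Step 1 (check): return the lowest seq not currently in use, or -1 if none. */
int find_free_seq(void)
{
	for (int s = 0; s < MAX_SEQ; s++)
		if (!seq_in_use[s])
			return s;
	return -1;
}

/* Step 2 (act): record that a device now holds this seq. */
void claim_seq(int seq)
{
	assert(seq >= 0 && seq < MAX_SEQ);
	seq_in_use[seq] = true;
}

/*
 * Replay the reported interleaving: cpu@0 and cpu@1 already hold 0 and 1,
 * then two APs both finish step 1 before either performs step 2.
 */
void replay_race(int *ap2_seq, int *ap3_seq)
{
	claim_seq(find_free_seq());	/* cpu@0 -> 0 */
	claim_seq(find_free_seq());	/* cpu@1 -> 1 */

	*ap2_seq = find_free_seq();	/* AP2 scans: 2 is free */
	*ap3_seq = find_free_seq();	/* AP3 scans: 2 is still free */
	claim_seq(*ap2_seq);
	claim_seq(*ap3_seq);		/* both APs now hold seq 2 */
}
```

Because both scans complete before either claim, each AP sees 2 as free; the scan and the claim would have to become one indivisible step to prevent this.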
>>>> >
>>>> > So what exactly are we doing with these additional cores?  My
>>>> > recollection of what we do on other arches when we even deal with other
>>>> > cores is that we bring them "up" and then usually put them in a holding
>>>> > pattern for the real OS to deal with _or_ it's one of those cases where
>>>> > we have multiple OSes running and we do what we need to load and release
>>>> > those other OSes.
>>>>
>>>> In this case each of them ends up in stop_this_cpu(), which is just
>>>> a hlt instruction.
>>>
>>> So do we really have to be doing anything here?  Or is this just
>>> pre-emptive work for an async MP type setup down the road?  We could
>>> probably live with this with a big comment noting why we know it's
>>> misbehaving.
>>
>> I think we should fix it - I suggested some options above and Bin may
>> have ideas also. Bin may be able to send a patch since he can repeat
>> the problem.
>>
>
> Yes, we should fix it. But IMHO, just fixing the seq number only
> resolves the surface problem. What concerns me is that multiple CPUs
> run the same piece of code (in this case, the DM core code)
> without any protection. I have no idea whether the core structures
> (like the device list) are still consistent from the DM core's perspective.
> Although right now only the seq number issue has shown up,
> we don't know whether there are other latent DM issues. So I was
> thinking that fundamentally we are using the DM CPU uclass in the wrong way.
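One way to close the window between scanning and claiming is to make both a single critical section. The sketch below is a host-side illustration, not a proposed U-Boot patch: POSIX threads stand in for the APs, and the lock is a plain spinlock built on C11 `atomic_flag`, chosen because an AP running firmware has no scheduler to block on. `resolve_seq_locked()` and `simulate_aps()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 8

/* Hypothetical model: seq_taken[s] is true once some device holds seq s. */
static bool seq_taken[MAX_CPUS];
static atomic_flag seq_lock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *lock)
{
	/* Busy-wait: an AP in firmware has no scheduler to sleep on. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Scan for the lowest free seq and claim it inside one critical section,
 * so no other caller can observe the same number as free in between.
 * Returns -1 if every seq is taken.
 */
int resolve_seq_locked(void)
{
	int seq = -1;

	spin_lock(&seq_lock);
	for (int s = 0; s < MAX_CPUS; s++) {
		if (!seq_taken[s]) {
			seq_taken[s] = true;
			seq = s;
			break;
		}
	}
	spin_unlock(&seq_lock);
	return seq;
}

static void *ap_entry(void *arg)
{
	*(int *)arg = resolve_seq_locked();
	return NULL;
}

/* Start n simulated APs concurrently; seqs[i] receives what AP i obtained. */
void simulate_aps(int n, int *seqs)
{
	pthread_t tid[MAX_CPUS];

	assert(n > 0 && n <= MAX_CPUS);
	for (int i = 0; i < n; i++)
		pthread_create(&tid[i], NULL, ap_entry, &seqs[i]);
	for (int i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
}
```

Whatever order the threads run in, once the boot CPU has taken 0 the three simulated APs end up with 1, 2 and 3 in some order. A lock only protects code paths that take it, so this addresses the duplicate seq without answering which other DM structures the APs may safely touch.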

We don't add devices when running on the AP CPUs - we only scan lists.
So long as the boot CPU creates all the devices and then waits for
them to populate, we are OK. I don't see any fundamental problem.
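This argument relies on a publish-then-read discipline: the boot CPU performs every write to the shared device state, then releases the APs, which only read it. Below is a minimal host-side sketch of that discipline. It assumes, for illustration, that sequence numbers are also assigned on the boot CPU before release; in the report above they are resolved on the AP inside device_probe(), which is exactly the kind of write this pattern rules out. `struct cpu_dev`, `bsp_create_devices()` and `boot_cpu_run()` are hypothetical, and pthreads again stand in for the APs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_CPUS 4	/* one boot CPU (cpu@0) plus three APs */

/* Hypothetical, heavily simplified stand-in for a CPU device. */
struct cpu_dev {
	int seq;
};

static struct cpu_dev cpus[NUM_CPUS];
static atomic_bool devices_ready = false;

struct ap_arg {
	int index;	/* which cpus[] entry this AP looks up */
	int seq_seen;	/* the seq the AP read back */
};

/* Boot CPU: do every write to shared state, then publish with release. */
static void bsp_create_devices(void)
{
	for (int i = 0; i < NUM_CPUS; i++)
		cpus[i].seq = i;
	atomic_store_explicit(&devices_ready, true, memory_order_release);
}

/* AP: spin until the devices are published, then only read. */
static void *ap_entry(void *p)
{
	struct ap_arg *a = p;

	assert(a->index > 0 && a->index < NUM_CPUS);
	while (!atomic_load_explicit(&devices_ready, memory_order_acquire))
		;
	a->seq_seen = cpus[a->index].seq;	/* read-only access */
	return NULL;
}

/* Start the APs first (they spin), then create and publish the devices. */
void boot_cpu_run(int seen[NUM_CPUS - 1])
{
	pthread_t tid[NUM_CPUS - 1];
	struct ap_arg args[NUM_CPUS - 1];

	for (int i = 0; i < NUM_CPUS - 1; i++) {
		args[i].index = i + 1;	/* APs are cpu@1 .. cpu@3 */
		pthread_create(&tid[i], NULL, ap_entry, &args[i]);
	}
	bsp_create_devices();
	for (int i = 0; i < NUM_CPUS - 1; i++) {
		pthread_join(tid[i], NULL);
		seen[i] = args[i].seq_seen;
	}
}
```

The release store on `devices_ready`, paired with the acquire load in each AP, guarantees that an AP which sees the flag set also sees every `seq` written before it, so no lock is needed as long as the APs never write.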

Regards,
Simon

