[U-Boot] driver model is not smp safe

Bin Meng bmeng.cn at gmail.com
Wed Aug 5 10:43:05 CEST 2015


Hi Simon, Tom,

On Tue, Aug 4, 2015 at 3:27 AM, Simon Glass <sjg at chromium.org> wrote:
> Hi Tom,
>
> On 3 August 2015 at 13:06, Tom Rini <trini at konsulko.com> wrote:
>> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>>> Hi Tom,
>>>
>>> On 31 July 2015 at 08:31, Tom Rini <trini at konsulko.com> wrote:
>>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>>> >
>>> >> Hi Simon,
>>> >>
>>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>> >>
>>> >> => cpu list
>>> >>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>
>>> >> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>>> >> are running parallelly to get the same sequence number. The call chain
>>> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>> >> Apparently ap2 and ap3 are running at the same time to get the same
>>> >> number.
>>> >>
>>> >> Note so far all x86 boards that we have enabled x86 multi-cpu
>>> >> initialization on only have 2 cores, which will not expose such issue
>>> >> as there is no parallel execution among aps.
>>> >
>>> > So what exactly are we doing with these additional cores?  My
>>> > recollection of what we do on other arches when we even deal with other
>>> > cores is that we bring them "up" and then usually put them in a holding
>>> > pattern for the real OS to deal with _or_ it's one of those cases where
>>> > we have multiple OSes running and we do what we need to load and release
>>> > those other OSes.
>>>
>>> In this case they end up at stop_this_cpu() which is just a hlt
>>> instruction in each case.
>>
>> So do we really have to be doing anything here?  Or is this just
>> pre-emptive work for an async MP type setup down the road?  We could
>> probably live with this with a big comment noting why we know it's
>> misbehaving.
>
> I think we should fix it - I suggested some options above and Bin may
> have ideas also. Bin may be able to send a patch since he can repeat
> the problem.
>

Yes, we should fix it. But IMHO, fixing just the seq number only
resolves the surface problem. What concerns me is that multiple CPUs
are running the same code (in this case, the DM core code) without any
protection. I have no idea whether the core structures (like the
device list) are still consistent from the DM core perspective.
Although right now only the seq number issue is exposed, we don't know
whether there are other potential DM issues. Thus I think we are
fundamentally using the DM CPU uclass in the wrong way.
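
As an illustration of the concern, here is a minimal host-side sketch
(plain C with pthreads, not U-Boot code) of a "find the first free
sequence number, then claim it" step run by several threads standing in
for the APs. All names here (dev_seq, seq_lock, find_free_seq,
probe_cpu, run_probe_demo) are hypothetical and only loosely modelled
on what uclass_resolve_seq() is described as doing in this thread.
Without the lock, two threads can both see the same number as free,
which matches the duplicate "2:" in the cpu list output; with it, the
find and the claim happen as one step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_CPUS 4

/* Hypothetical stand-in for each device's sequence number; -1 = unassigned */
static int dev_seq[NUM_CPUS];

/* Assumed protection: a single lock around the find-and-claim step */
static pthread_mutex_t seq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the lowest number no device has claimed yet (caller holds seq_lock) */
static int find_free_seq(void)
{
	for (int seq = 0; seq < NUM_CPUS; seq++) {
		bool used = false;

		for (int i = 0; i < NUM_CPUS; i++)
			if (dev_seq[i] == seq)
				used = true;
		if (!used)
			return seq;
	}
	return -1;
}

/* Simulated per-AP probe: claim a sequence number for device 'idx' */
static void *probe_cpu(void *arg)
{
	int idx = *(int *)arg;

	assert(idx >= 0 && idx < NUM_CPUS);

	/*
	 * Without this lock, two threads could both scan dev_seq[] before
	 * either one stores its result, and both would pick the same number.
	 */
	pthread_mutex_lock(&seq_lock);
	dev_seq[idx] = find_free_seq();
	pthread_mutex_unlock(&seq_lock);

	return NULL;
}

/* Returns 0 if every simulated CPU got a unique sequence number, else -1 */
int run_probe_demo(void)
{
	pthread_t threads[NUM_CPUS];
	int idx[NUM_CPUS];
	bool seen[NUM_CPUS] = { false };
	int started = 0;

	for (int i = 0; i < NUM_CPUS; i++)
		dev_seq[i] = -1;

	for (int i = 0; i < NUM_CPUS; i++) {
		idx[i] = i;
		if (pthread_create(&threads[i], NULL, probe_cpu, &idx[i]))
			break;
		started++;
	}

	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	if (started != NUM_CPUS)
		return -1;

	/* Check that the assigned numbers are all in range and distinct */
	for (int i = 0; i < NUM_CPUS; i++) {
		int seq = dev_seq[i];

		if (seq < 0 || seq >= NUM_CPUS || seen[seq])
			return -1;
		seen[seq] = true;
	}
	return 0;
}
```

Calling run_probe_demo() from a small main() and building with
-pthread is enough to try it. Note that this only covers the seq
number; it says nothing about whether the other DM core structures
would need similar protection.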

Regards,
Bin

