[ELDK] Problem with pthread_cond_wait on 8xx?

Frank Svendsbøe frank.svendsboe at gmail.com
Wed Sep 23 17:20:28 CEST 2009


On Wed, Sep 23, 2009 at 3:55 PM, Detlev Zundel <dzu at denx.de> wrote:
> Hi Frank,
>
>> On Wed, Sep 23, 2009 at 2:13 PM, Detlev Zundel <dzu at denx.de> wrote:
>>> Hi Frank,
>>>
>>>> I have a problem with high CPU load on 8xx when using pthread_cond_wait.
>>>> I'm using the glibc v2.6 that came with ELDK 4.2, compiled with NPTL
>>>> support. The same code compiled for x86 works fine (but then using
>>>> glibc 2.9). I'm running torvalds mainline kernel v2.6.26-rc2.
>>>>
>>>> Can anyone here using ELDK 4.2 on 8xx or another PowerPC target try to
>>>> compile the program listed below and check the CPU load when running
>>>> it? Or pinpoint a problem with the code.
>>>>
>>>> My current assumption is that the glibc pthread_cond_wait call is
>>>> implemented using busy-waiting/spinlocking on 8xx, or something is
>>>> wrong in the kernel.
>>>
>>> Actually I can reproduce both of your behaviours on my x86 and on a
>>> PowerPC system ;)
>>>
>>> I don't understand why this is the case, but if I compile the program
>>> without "-lpthread" I have ~100% CPU utilization on both systems.  If I
>>> do "the right thing" and compile with "-lpthread" everything is fine on
>>> both systems.
>>>
>>
>> Thanks for testing the code ;-)
>>
>> What PPC target did you test it on, what version of the standard C library
>> do you use, and what kernel?
>
> I used an MPC5200B system for testing with our ELDK 4.2 toolchain on a
> 2.6.29.2 kernel.  Out of laziness, I compiled the test program natively.
>
>> If I don't link it against libpthread, I get link errors (unresolved
>> functions). I assume you're using the same toolchain (ELDK 4.2), so
>> this is strange. IOW, I have to link with -lpthread, and you
>> don't. How come?
>
> How should I know? ;) This was really an error on my side forgetting to
> specify "-lpthread" on the first shot (on x86 gcc 4.3.3-8 glibc 2.9
> linux 2.6.30.1) and I was kind of perplexed to see no link errors.  So I
> tried the same on native ELDK 4.2 and got the same results.
>
> So let me retest with the cross-compiler:
>
> [dzu at pollux dzu]$ eldk-switch tqm5200
> [ tqm5200 is using MPC5200 ]
> Setup for ppc_6xx (using ELDK 4.2)
> Adjusted /home/dzu/target-root pointing to /opt/eldk-4.2/ppc_6xx
> [dzu at pollux dzu]$ ${CROSS_COMPILE}gcc -o cond_wait_test cond_wait_test.c
> [dzu at pollux dzu]$
>
> Works....
>
>>> Maybe someone here on this list can explain this?
>>
>> Yes I really hope so. The best would be that someone could pinpoint
>> a problem with the example code, and why the behaviour is different on
>> different targets.
>
> Reading the code, I see absolutely no problem.
>
> What perplexed me however is that I could see the same thing on my x86
> host.  There I even took a look with ltrace and saw that without
> "-lpthread" there is really a continuous stream of library calls, whereas
> with "-lpthread", it is one single blocking call...
>

Thanks for sharing your observation. I can reproduce the same results
here on x86-64 too, which is good.

>> In case I don't find any solution, I will probably need to
>> upgrade/downgrade the kernel, switch to using uclibc/eglibc, or upgrade
>> glibc to v2.9. Do you plan to upgrade the ELDK anytime soon (ie. newer
>> glibc for instance)?
>
> We do not currently have a project which would finance such an upgrade,
> I'm afraid.
>

Well, ELDK is free software, right? So maybe someone will volunteer for
that... I wish.

>> FYI: I've used getconf and verified that I'm using the NPTL enabled pthreads
>> library on the 8xx.
>
> In your position, I'd try to reproduce my findings on the x86 host and
> get an explanation there first why two completely different behaviours
> occur.  When you have understood why this happens, maybe you can verify
> this for the PowerPC environment.
>

I haven't studied the libc implementation on either platform, so this may
take a while to figure out.

Using ltrace, the test program compiled with -lpthread gives:

signal(2, 0x004008d9)                            = NULL
pthread_mutex_lock(0x601080, 0, 0, 0x7fffe2482b18, 0x7fffe2482bb0) = 0
pthread_cond_wait(0x6010a8, 0x601080, 0x7fedda059fc0, 0x7fffe2482b18, 0x7fffe2482bb0 <unfinished ...>
--- SIGINT (Interrupt) ---
pthread_mutex_lock(0x601080, 0x7fffe24828b0, 0x7fffe2482780, -1, 0x601080) = 0
pthread_cond_signal(0x6010a8, 0, 0x7fedda059fc0, -1, 0x601080) = 0

... while without -lpthread specified, I get:

pthread_cond_wait(0x6010a8, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_cond_wait(0x6010a8, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_cond_wait(0x6010a8, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_cond_wait(0x6010a8, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_cond_wait(0x6010a8, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0 <unfinished ...>
--- SIGINT (Interrupt) ---
pthread_cond_signal(0x6010a8, 0x7fffcd1755f0, 0x7fffcd1754c0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_mutex_unlock(0x601080, 0x7fffcd1755f0, 0x7fffcd1754c0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_cond_wait(0x6010a8, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0) = 0
pthread_mutex_unlock(0x601080, 0x601080, 0, 0x7fffcd175808, 0x7fffcd1758a0) = 0

Now compare the third argument of pthread_cond_wait in the former and
latter dumps. The first dump passes a pointer that seems valid, while in
the latter it is zero.
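
The repeated pthread_cond_wait() lines returning 0 are what one would
expect from a wait loop around a predicate when the wait call itself never
blocks. A minimal sketch of such a loop follows. The test program is not
reproduced in this mail, so the names (count_wakeups, signaller, ready) are
hypothetical, and the wake-up comes from a second thread rather than from a
SIGINT handler as the traces suggest. It counts how often
pthread_cond_wait() returns before the predicate becomes true.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready;                 /* predicate, protected by lock */

/* Hypothetical signaller: sleep for the given number of milliseconds,
 * then set the predicate and signal the condition variable. */
static void *signaller(void *arg)
{
    unsigned delay_ms = *(const unsigned *)arg;
    struct timespec ts;

    ts.tv_sec  = delay_ms / 1000;
    ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);

    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Wait for the predicate and return how many times pthread_cond_wait()
 * returned before it became true, or -1 if no thread could be started. */
long count_wakeups(unsigned delay_ms)
{
    pthread_t tid;
    long wakeups = 0;
    int rc;

    ready = 0;                    /* no other thread exists yet */
    if (pthread_create(&tid, NULL, signaller, &delay_ms) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!ready) {              /* re-check the predicate after each wake-up */
        pthread_cond_wait(&cond, &lock);
        wakeups++;
    }
    pthread_mutex_unlock(&lock);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);              /* joining our own thread should not fail */
    (void)rc;
    return wakeups;
}
```

With a wait that really blocks, count_wakeups(50) should return a very small
number (typically 1, spurious wake-ups aside). A count in the thousands
would mean the loop is spinning, as in the second dump. Because the sketch
calls pthread_create(), it has to be linked with -lpthread on the
toolchains discussed here.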

Dumps of the same exercise using strace:

With -lpthread:

rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8720000, rlim_max=RLIM_INFINITY}) = 0
rt_sigaction(SIGINT, {0x4008d9, [INT], SA_RESTORER|SA_RESTART, 0x7fa7db11d040}, {SIG_DFL}, 8) = 0
futex(0x6010ac, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>

Without -lpthread:

mprotect(0x7fa628fea000, 4096, PROT_READ) = 0
munmap(0x7fa628fd1000, 89354)           = 0
rt_sigaction(SIGINT, {0x400879, [INT], SA_RESTORER|SA_RESTART, 0x7fa628a8c040}, {SIG_DFL}, 8) = 0
--- SIGINT (Interrupt) @ 0 (0) ---

The former shows a futex system call. Why is there none in the latter?

Best regards,
Frank


> Cheers
>  Detlev
>
> --
> Today people don't go to rock concerts to listen to the music, because
> you can't. They go there to be part of the environment.
>                                       -- Peter Eisenman
> --
> DENX Software Engineering GmbH,      MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-40 Fax: (+49)-8142-66989-80 Email: dzu at denx.de
>

