On Wed, Mar 09, 2011 at 09:38:07PM +0100, hubert depesz lubaczewski wrote:
> So, every now and then (a couple of times per day at most), I see
> hundreds (800-900) of connections in "PARSE" state.
>
> I did notice one thing.
>
> we do log output of ps axo user,pid,ppid,pgrp,%cpu,%mem,rss,lstart,nice,nlwp,sgi_p,cputime,tty,wchan:25,args
> every 15 seconds or so.
>
> And based on its output, I was able to get stats of "wchan" of all PARSE
> pg processes when the problem was logged.
> Results:
>
> 805 x semtimedop
> 10 x stext
>
> Any ideas on what could be wrong? The machine was definitely not loaded
> most of the times it happened.
>
> The problem usually goes away in ~ 10-15 seconds.
Could you have your monitoring process detect this condition and capture stack
traces from several of these processes, preferably using a gdb that has access
to debug information? That will probably make the specific contention point
clear.
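
A rough sketch of what that detection step might look like, in Python. It
assumes the backend process title ends in "PARSE" as it appears in your ps
logs, that gdb is installed and the script runs as a user allowed to attach to
the backends (postgres or root), and that briefly stopping a backend while gdb
attaches is acceptable. THRESHOLD, SAMPLES and the function names are made up
for illustration; adjust to taste.

```python
import subprocess

# Subset of the columns already being logged; args is last so it may contain spaces.
PS_CMD = ["ps", "axo", "pid,wchan:25,args"]
THRESHOLD = 100  # hypothetical: how many PARSE backends count as "the condition"
SAMPLES = 5      # hypothetical: how many backends to take a backtrace from


def parse_backends(ps_output):
    """Return [(pid, wchan)] for postgres backends whose title ends in PARSE."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line or something unexpected
        pid, wchan, args = fields
        # Assumption: the title looks like "postgres: user db host(port) PARSE".
        if args.startswith("postgres:") and args.rstrip().endswith("PARSE"):
            found.append((int(pid), wchan))
    return found


def gdb_command(pid):
    """Attach to pid, print one backtrace, then detach and exit."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def capture_if_stuck(ps_output, threshold=THRESHOLD, samples=SAMPLES,
                     run=subprocess.run):
    """If enough backends are in PARSE, return [(pid, wchan, backtrace)]."""
    backends = parse_backends(ps_output)
    if len(backends) < threshold:
        return []
    traces = []
    for pid, wchan in backends[:samples]:
        result = run(gdb_command(pid), capture_output=True, text=True)
        traces.append((pid, wchan, result.stdout))
    return traces


def main():
    """One polling pass: run ps, and dump backtraces if the condition holds."""
    ps_output = subprocess.run(PS_CMD, capture_output=True, text=True).stdout
    for pid, wchan, trace in capture_if_stuck(ps_output):
        print(f"=== pid {pid} (wchan {wchan}) ===")
        print(trace)
```

Calling main() from the same job that already runs ps every 15 seconds should
be enough. Since the problem clears up in 10-15 seconds, a shorter polling
interval may be needed to catch it while the backends are still waiting.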