Thread: Re: postgres processes spending most of their time in the

Re: postgres processes spending most of their time in the

From

"Jeffrey W. Baker"

Date:

28 December 2001, 16:06:55

On Fri, 28 Dec 2001, Tom Lane wrote:

> Hmm, those look perfectly normal, except one would like to think it's
> the exception not the rule to be blocking on LWLocks.  If you're seeing
> a lot of semops coming from these code paths then something unhappy is
> going on (and I would imagine that 7.1 would've been worse, since it
> would busy-wait under similar conditions).
>
> Could we get some details about your application?  How many active
> backends do you have, and what are they doing?  A query trace
> (postmaster log with debug_print_query on) might be helpful.

Well, let's don't get too far off track.  The problem, I think, is using
SysV semaphores instead of CPU-specific test-and-set locking.  Unless I
misunderstand something.....

Here's some more stacks:

#0  semop (semid=1277952, sops=0xbfffeae0, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1  0x8150424 in IpcSemaphoreLock (semId=1277952, sem=4, interruptOK=0
'\000') at ipc.c:422
#2  0x8157bfb in LWLockAcquire (lockid=WALWriteLock, mode=LW_EXCLUSIVE) at
lwlock.c:271
#3  0x8095dc2 in XLogFlush (record={xlogid = 5, xrecoff = 194128600}) at
xlog.c:1230
#4  0x814af12 in BufferReplace (bufHdr=0x40401ba8) at bufmgr.c:1061
#5  0x814a231 in BufferAlloc (reln=0x4054e240, blockNum=1395,
foundPtr=0xbfffec03 "") at bufmgr.c:424
#6  0x8149dc5 in ReadBufferInternal (reln=0x4054e240, blockNum=1395,
bufferLockHeld=0 '\000')
    at bufmgr.c:184
#7  0x814a802 in ReleaseAndReadBuffer (buffer=55, relation=0x4054e240,
blockNum=1395) at bufmgr.c:694
#8  0x807a3cd in heapgettup (relation=0x4054e240, dir=1, tuple=0x82af4a4,
buffer=0x82af4bc,
    snapshot=0x82afb58, nkeys=0, key=0x0) at heapam.c:369
#9  0x807aad4 in heap_getnext (scan=0x82af4a0, backw=0) at heapam.c:806
#10 0x80fad1f in SeqNext (node=0x82afa30) at nodeSeqscan.c:96
#11 0x80f27bc in ExecScan (node=0x82afa30, accessMtd=0x80fac30 <SeqNext>)
at execScan.c:95
#12 0x80fad6b in ExecSeqScan (node=0x82afa30) at nodeSeqscan.c:133
#13 0x80ef84c in ExecProcNode (node=0x82afa30, parent=0x0) at
execProcnode.c:285
#14 0x80ee0e6 in ExecutePlan (estate=0x82afad0, plan=0x82afa30,
operation=CMD_SELECT, numberTuples=0,

#0  semop (semid=1277952, sops=0xbfffead4, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1  0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=2) at ipc.c:456
#2  0x8157fa6 in LWLockRelease (lockid=WALWriteLock) at lwlock.c:455
#3  0x8095e90 in XLogFlush (record={xlogid = 5, xrecoff = 194128600}) at
xlog.c:1242
#4  0x814af12 in BufferReplace (bufHdr=0x40401ba8) at bufmgr.c:1061
#5  0x814a231 in BufferAlloc (reln=0x4054e240, blockNum=1395,
foundPtr=0xbfffec03 "") at bufmgr.c:424
#6  0x8149dc5 in ReadBufferInternal (reln=0x4054e240, blockNum=1395,
bufferLockHeld=0 '\000')
    at bufmgr.c:184
#7  0x814a802 in ReleaseAndReadBuffer (buffer=55, relation=0x4054e240,
blockNum=1395) at bufmgr.c:694
#8  0x807a3cd in heapgettup (relation=0x4054e240, dir=1, tuple=0x82af4a4,
buffer=0x82af4bc,
    snapshot=0x82afb58, nkeys=0, key=0x0) at heapam.c:369
#9  0x807aad4 in heap_getnext (scan=0x82af4a0, backw=0) at heapam.c:806
#10 0x80fad1f in SeqNext (node=0x82afa30) at nodeSeqscan.c:96
#11 0x80f27bc in ExecScan (node=0x82afa30, accessMtd=0x80fac30 <SeqNext>)
at execScan.c:95
#12 0x80fad6b in ExecSeqScan (node=0x82afa30) at nodeSeqscan.c:133
#13 0x80ef84c in ExecProcNode (node=0x82afa30, parent=0x0) at
execProcnode.c:285
#14 0x80ee0e6 in ExecutePlan (estate=0x82afad0, plan=0x82afa30,
operation=CMD_SELECT, numberTuples=0,

#0  semop (semid=1277952, sops=0xbfffeb24, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1  0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=1) at ipc.c:456
#2  0x8157fa6 in LWLockRelease (lockid=101) at lwlock.c:455
#3  0x814c05d in TerminateBufferIO (buf=0x40401ba8) at bufmgr.c:2139
#4  0x814a338 in BufferAlloc (reln=0x4054e240, blockNum=1395,
foundPtr=0xbfffec03 "") at bufmgr.c:470
#5  0x8149dc5 in ReadBufferInternal (reln=0x4054e240, blockNum=1395,
bufferLockHeld=0 '\000')
    at bufmgr.c:184
#6  0x814a802 in ReleaseAndReadBuffer (buffer=55, relation=0x4054e240,
blockNum=1395) at bufmgr.c:694
#7  0x807a3cd in heapgettup (relation=0x4054e240, dir=1, tuple=0x82af4a4,
buffer=0x82af4bc,
    snapshot=0x82afb58, nkeys=0, key=0x0) at heapam.c:369
#8  0x807aad4 in heap_getnext (scan=0x82af4a0, backw=0) at heapam.c:806
#9  0x80fad1f in SeqNext (node=0x82afa30) at nodeSeqscan.c:96
#10 0x80f27bc in ExecScan (node=0x82afa30, accessMtd=0x80fac30 <SeqNext>)
at execScan.c:95
#11 0x80fad6b in ExecSeqScan (node=0x82afa30) at nodeSeqscan.c:133
#12 0x80ef84c in ExecProcNode (node=0x82afa30, parent=0x0) at
execProcnode.c:285
#13 0x80ee0e6 in ExecutePlan (estate=0x82afad0, plan=0x82afa30,
operation=CMD_SELECT, numberTuples=0,

I've got 5-10 backends running, always busy.  I think if Pg could keep up
my application's parallelism would increase to 15-20.  All the backends
are connected to the same database, with 10 tables + indexes, and the
cycle looks like such:

begin
select
insert approx 20 rows on four tables
update 1 table
commit
goto begin

I think it's pretty basic.  It used to run right quickly before I moved to
7.2beta4.  My application machine saw loads of 25+, now it only sees loads
of about 3.  Postgres is throttling it.

-jwb

Re: postgres processes spending most of their time in the kernel

From

Tom Lane

Date:

28 December 2001, 16:13:02

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> On Fri, 28 Dec 2001, Tom Lane wrote:
>> Hmm, those look perfectly normal, except one would like to think it's
>> the exception not the rule to be blocking on LWLocks.

> Well, let's don't get too far off track.  The problem, I think, is using
> SysV semaphores instead of CPU-specific test-and-set locking.

No, I've abandoned that theory.  You've shown us half a dozen backtraces
now, and they all are coming from LWLock acquire/release, not spinlock
acquire/release.  If you were using SysV semaphores for spinlocks then
I'd expect to see a lot of traces leading back to spin.c.

> I think it's pretty basic.  It used to run right quickly before I moved to
> 7.2beta4.  My application machine saw loads of 25+, now it only sees loads
> of about 3.  Postgres is throttling it.

Hmm.  And what was your actual *throughput*?  The backtraces you've
shown us all correspond to places where 7.1 would have busy-waited
rather than blocking on a semaphore.  Reduction of nominal CPU load
is exactly what I'd expect, and is not in itself bad.  The real question
is how many transactions can you process per second.

            regards, tom lane

Re: postgres processes spending most of their time in the

From

"Jeffrey W. Baker"

Date:

28 December 2001, 16:43:03

On Fri, 28 Dec 2001, Tom Lane wrote:

> Hmm.  And what was your actual *throughput*?

Throughput is down noticeably, but unfortunately I don't have any
controlled instrumentation to demonstrate that.  A process which took
approximately 9 minutes to complete with 7.1 now takes approximately 17
minutes to complete, for the same amount of work.  Further, I can't really
compare the two versions because the required dump/reload 1) may be
changing something that isn't obvious, and 2) takes too long.

> The backtraces you've shown us all correspond to places where 7.1
> would have busy-waited rather than blocking on a semaphore.

But the behavior seems to me to be more like busy-waiting than blocking on
locks.  For example, both of my CPUs are now pegged whenever the database
is busy, and I never used to see tens of thousands of contexts switches
per second.  I also never used to spend half of my CPU time in the kernel,
but now I do.

Also the straces show control bouncing back and forth between processes
stuck in semop().  In a trace of 576000 syscalls, 79% were semop().

I'd believe that the problem is in the kernel's scheduler or SysV, but
they didn't seem to think so on linux-kernel.

> Reduction of nominal CPU load is exactly what I'd expect, and is not
> in itself bad.  The real question is how many transactions can you
> process per second.

What I meant to say was that the CPU load on my application machine was
much lower, which is to say that machine is just waiting around all the
time for the database machine to do something.  The CPUs on the database
machine are pinned all the time.

-jwb

Re: postgres processes spending most of their time in the kernel

From

Tom Lane

Date:

28 December 2001, 17:13:02

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
>> The backtraces you've shown us all correspond to places where 7.1
>> would have busy-waited rather than blocking on a semaphore.

> But the behavior seems to me to be more like busy-waiting than blocking on
> locks.  For example, both of my CPUs are now pegged whenever the database
> is busy, and I never used to see tens of thousands of contexts switches
> per second.  I also never used to spend half of my CPU time in the kernel,
> but now I do.

[ scratches head ... ]  Well, the LWLock stuff is new code, and it's not
out of the question that there's a bug in it.  I can't see where though,
and I don't have enough info to proceed further.

We need to understand what the dynamic behavior is in your situation.
Can you poke into it further, or perhaps grant access to your machine
to someone who can?

One thing that would be quite useful is a more-detailed strace that
would let us determine whether each semop is a lock or unlock.  Can you
get strace to record the user-space PC of the semop() caller?  If no
joy there, maybe beefing up the LWDEBUG log printouts would produce
a useful trace.

            regards, tom lane

Re: postgres processes spending most of their time in the

From

"Jeffrey W. Baker"

Date:

28 December 2001, 17:16:59

On Fri, 28 Dec 2001, Tom Lane wrote:

> [ scratches head ... ]  Well, the LWLock stuff is new code, and it's not
> out of the question that there's a bug in it.  I can't see where though,
> and I don't have enough info to proceed further.

Thanks for all your attention so far.

> We need to understand what the dynamic behavior is in your situation.
> Can you poke into it further, or perhaps grant access to your machine
> to someone who can?

I can provide as much dumping, logging, and tracing as you want, with the
single constraint of upstream network bandwith.  I don't think files
larger than 100MB will be practical.  I don't know what logging will be
useful, so someone will have to tell me what to do.

I don't think I can let anybody have access to this particular machine but
if I can reduce things to a simple testcase on another machine, I'll grant
access to that.

> One thing that would be quite useful is a more-detailed strace that
> would let us determine whether each semop is a lock or unlock.  Can you
> get strace to record the user-space PC of the semop() caller?  If no
> joy there, maybe beefing up the LWDEBUG log printouts would produce
> a useful trace.

strace unfortunately doesn't deref the sembuf structure in semop.

-jwb

Re: postgres processes spending most of their time in the kernel

From

Tom Lane

Date:

28 December 2001, 17:37:00

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> strace unfortunately doesn't deref the sembuf structure in semop.

Right, but the PC would be enough to let us tell whether semop was being
called from IpcSemaphoreLock or IpcSemaphoreUnlock.

However, if the info isn't available then no use crying over it.  What
we'll need to do instead is add some debugging printouts to lwlock.c,
and hope that the extra logfile I/O calls don't change the timing enough
to hide the misbehavior.  I'll think a little about this and send you
an updated lwlock.c to try out.

            regards, tom lane

Re: postgres processes spending most of their time in the

From

"Jeffrey W. Baker"

Date:

28 December 2001, 17:43:20


On Fri, 28 Dec 2001, Tom Lane wrote:

> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > strace unfortunately doesn't deref the sembuf structure in semop.
>
> Right, but the PC would be enough to let us tell whether semop was being
> called from IpcSemaphoreLock or IpcSemaphoreUnlock.
>
> However, if the info isn't available then no use crying over it.  What
> we'll need to do instead is add some debugging printouts to lwlock.c,
> and hope that the extra logfile I/O calls don't change the timing enough
> to hide the misbehavior.  I'll think a little about this and send you
> an updated lwlock.c to try out.

Is there any use in frobbing these frobs?

#
#       Lock Tracing
#
#trace_notify = false
#ifdef LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0
#endif

-jwb