Gavin Hamill <gdh@laterooms.com> writes:
> On Fri, 07 Apr 2006 17:56:49 -0400
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This is not good. Did the semop storms coincide with visible
>> slowdown? (I'd assume so, but you didn't actually say...)
> Yes, there's a definite correlation here.. I attached truss to the
> main postmaster..
> ...
> And when I saw a flood of semop's for any particular PID, a second later
> the 'topas' process list would show that PID at 100% CPU ...
So apparently we've still got a problem with multiprocess contention for
an LWLock somewhere. It's not the BufMgrLock because that's gone in 8.1.
It could be one of the finer-grained locks that are still there, or it
could be someplace else.
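
(For anyone not familiar with why semop traffic points at LWLocks: when a
backend can't get an LWLock immediately, it goes to sleep on its per-process
SysV semaphore, and waiters are put to sleep and woken via semop(), so a
storm of semop calls is the signature of lock contention rather than of
anything semaphore-specific.  The toy program below is only an illustration,
not PostgreSQL code --- it forks a few processes that fight over one SysV
semaphore; attach truss or strace to the children and you see the same flood
of semop() calls Gavin reported.  Note that the real LWLock code only takes
this path when it actually has to wait; an uncontended acquire never touches
the semaphore.)

/* toy_semop.c --- illustrative only, not PostgreSQL source.
 * A few forked processes contend on one SysV semaphore, the same primitive
 * a PostgreSQL backend sleeps on while waiting for an LWLock.  Run truss
 * (or strace) against the children to see the flood of semop() calls.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

#define NPROCS 8
#define NITERS 100000

static void
sem_change(int semid, int delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };

    if (semop(semid, &op, 1) < 0)
    {
        perror("semop");
        exit(1);
    }
}

int
main(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
    {
        perror("semget");
        return 1;
    }
    sem_change(semid, 1);                   /* "lock" starts out free */

    for (int i = 0; i < NPROCS; i++)
    {
        if (fork() == 0)
        {
            for (int j = 0; j < NITERS; j++)
            {
                sem_change(semid, -1);      /* acquire: may block in semop() */
                sem_change(semid, 1);       /* release: wakes one waiter */
            }
            _exit(0);
        }
    }
    for (int i = 0; i < NPROCS; i++)
        wait(NULL);
    semctl(semid, 0, IPC_RMID);             /* clean up the semaphore set */
    return 0;
}
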
Are you in a position to try your workload using PG CVS tip? There's a
nontrivial possibility that we've already fixed this --- a couple months
ago I did some work to reduce contention in the lock manager:
2005-12-11 16:02  tgl

    * src/: backend/access/transam/twophase.c,
    backend/storage/ipc/procarray.c, backend/storage/lmgr/README,
    backend/storage/lmgr/deadlock.c, backend/storage/lmgr/lock.c,
    backend/storage/lmgr/lwlock.c, backend/storage/lmgr/proc.c,
    include/storage/lock.h, include/storage/lwlock.h,
    include/storage/proc.h: Divide the lock manager's shared state into
    'partitions', so as to reduce contention for the former single
    LockMgrLock.  Per my recent proposal.  I set it up for 16
    partitions, but on a pgbench test this gives only a marginal
    further improvement over 4 partitions --- we need to test more
    scenarios to choose the number of partitions.
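
(In case it helps to visualize what that change does: each heavyweight-lock
tag is hashed, the hash selects one of N partitions, and only that
partition's LWLock is held while its slice of the shared lock table is
touched --- so backends locking unrelated objects no longer queue up behind
a single LockMgrLock.  The sketch below is purely illustrative: the struct
names, the hash, and the pthread mutexes standing in for LWLocks are all
made up for the example; it is not the actual lock.c code.)

/* partition_sketch.c --- illustrative sketch of lock-table partitioning,
 * not the real lock.c code.  A lock tag hashes to one of 16 partitions,
 * each guarded by its own lock, so unrelated lock requests don't collide.
 */
#include <stdint.h>
#include <pthread.h>            /* mutexes stand in for LWLocks here */

#define NUM_LOCK_PARTITIONS 16

typedef struct LockTag          /* simplified: real tags carry more fields */
{
    uint32_t dbOid;
    uint32_t relOid;
} LockTag;

typedef struct LockPartition
{
    pthread_mutex_t lock;       /* one "LWLock" per partition */
    /* ... this partition's share of the lock hash table lives here ... */
} LockPartition;

static LockPartition partitions[NUM_LOCK_PARTITIONS];

static uint32_t
tag_hash_toy(const LockTag *tag)
{
    /* toy hash; the real code hashes the whole tag with its generic hash */
    return (tag->dbOid * 2654435761u) ^ (tag->relOid * 40503u);
}

static LockPartition *
partition_for(const LockTag *tag)
{
    return &partitions[tag_hash_toy(tag) % NUM_LOCK_PARTITIONS];
}

static void
acquire_heavyweight_lock(const LockTag *tag)
{
    LockPartition *p = partition_for(tag);

    pthread_mutex_lock(&p->lock);
    /* ... find or insert the lock entry in p's hash table ... */
    pthread_mutex_unlock(&p->lock);
}

int
main(void)
{
    for (int i = 0; i < NUM_LOCK_PARTITIONS; i++)
        pthread_mutex_init(&partitions[i].lock, NULL);

    LockTag a = { .dbOid = 1, .relOid = 16384 };
    LockTag b = { .dbOid = 1, .relOid = 16385 };

    acquire_heavyweight_lock(&a);
    acquire_heavyweight_lock(&b);   /* usually hits a different partition */
    return 0;
}

(The tradeoff is just how many partitions to use: more partitions mean less
chance of two backends colliding, but as the commit message says, going from
4 to 16 already showed diminishing returns on pgbench.)
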
This is unfortunately not going to help you as far as getting that
machine into production now (unless you're brave enough to run CVS tip
as production, which I certainly am not). I'm afraid you're most likely
going to have to ship that pSeries back at the end of the month, but
while you've got it, it'd be awfully nice if we could use it as a testbed
...
regards, tom lane