Thread: ProcArrayLock contention

ProcArrayLock contention

From: Robert Haas
I've been playing with the attached patch, which adds an additional
light-weight lock mode, LW_SHARED2.  LW_SHARED2 conflicts with
LW_SHARED and LW_EXCLUSIVE, but not with itself.  The patch changes
ProcArrayEndTransaction() to use this new mode.  IOW, multiple
processes can commit at the same time, and multiple processes can take
snapshots at the same time, but nobody can take a snapshot while
someone else is committing.
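
To make the conflict rules concrete, here's a minimal sketch; the
struct and counters below are invented for illustration and are not
the actual patch:

#include <stdbool.h>

typedef enum { LW_EXCLUSIVE, LW_SHARED, LW_SHARED2 } LWLockMode;

typedef struct
{
    int exclusive;      /* 0 or 1 */
    int shared_count;   /* LW_SHARED holders (snapshot takers) */
    int shared2_count;  /* LW_SHARED2 holders (committers) */
} LockState;

/* true if acquiring "mode" must wait, given the current holders */
static bool
mode_conflicts(const LockState *s, LWLockMode mode)
{
    switch (mode)
    {
        case LW_EXCLUSIVE:
            /* conflicts with everything */
            return s->exclusive || s->shared_count || s->shared2_count;
        case LW_SHARED:
            /* conflicts with exclusive and with committers */
            return s->exclusive || s->shared2_count;
        case LW_SHARED2:
            /* conflicts with exclusive and with snapshot takers,
             * but any number of committers can hold it together */
            return s->exclusive || s->shared_count;
    }
    return true;        /* not reached */
}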

Needless to say, I don't think we'd really want to apply this, because
adding an LW_SHARED2 mode that's probably only useful for ProcArrayLock
would be a pretty ugly wart.  But the results are interesting.
pgbench, scale factor 100, unlogged tables, Nate Boley's 32-core AMD
box, shared_buffers = 8GB, maintenance_work_mem = 1GB,
synchronous_commit = off, checkpoint_segments = 300,
checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
wal_writer_delay = 20ms, results are median of three five-minute runs:

#clients     tps(master)   tps(lwshared2)
       1      657.984859       683.251582
       8     4748.906750      4946.069238
      32    10695.160555     17530.390578
      80     7727.563437     16099.549506

That's a pretty impressive speedup, but there's trouble in paradise.
With 80 clients (but not 32 or fewer), I occasionally get the
following error:

ERROR:  t_xmin is uncommitted in tuple to be updated

So it seems that there's some way in which this locking is actually
incorrect, though I'm not seeing what it is at the moment.  Either
that, or there's some bug in the existing code that happens to be
exposed by this change.

The patch also produces a (much smaller) speedup with regular tables,
but it's hard to know how seriously to take that until the locking
issue is debugged.

Any ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ProcArrayLock contention

From: Simon Riggs
On Tue, Nov 8, 2011 at 4:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> With 80 clients (but not 32 or fewer), I occasionally get the
> following error:
>
> ERROR:  t_xmin is uncommitted in tuple to be updated
>
> So it seems that there's some way in which this locking is actually
> incorrect, though I'm not seeing what it is at the moment.  Either
> that, or there's some bug in the existing code that happens to be
> exposed by this change.

The semantics of shared locks are that they jump the existing queue, so
this patch allows locks to be held in sequences not previously seen
when using exclusive locks.

For me, the second kind of lock should queue up normally, but then be
released en masse when possible: queue like an exclusive, but wake
like a shared. I vaguely remember shared_queued.v1.patch doing this.

That can then produce flip-flop lock parties. A slight problem there
is that when shared locks queue they don't all queue together; that is
the problem the attached patch, written long ago, addresses.
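
To sketch that wakeup rule (the queue types and helpers here are
hypothetical, and this is not shared_queued.v1.patch itself): waiters
go to the back of the queue in arrival order, but when the waiter at
the front holds a shared-type mode, the whole consecutive run of
same-mode waiters is woken together.

static void
wake_next(WaitQueue *q)
{
    Waiter *w = queue_head(q);

    if (w == NULL)
        return;

    if (w->mode == LW_EXCLUSIVE)
    {
        dequeue_and_wake(q, w);     /* exclusive waiters wake alone */
        return;
    }

    /* shared-type waiter at the front: release the whole run of
     * consecutive same-mode waiters en masse */
    LWLockMode mode = w->mode;
    while (w != NULL && w->mode == mode)
    {
        Waiter *next = w->next;
        dequeue_and_wake(q, w);
        w = next;
    }
}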

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ProcArrayLock contention

From: Robert Haas
On Tue, Nov 8, 2011 at 2:24 AM, YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp> wrote:
> Could latestCompletedXid have gone backwards due to concurrent
> updates and fooled TransactionIdIsInProgress?

Ah ha!  I bet that's it.

I think this could be avoided by a more sophisticated locking scheme.
Instead of waking up all the people trying to do
ProcArrayEndTransaction() and letting them all run simultaneously,
wake up one of them.  That one guy goes and clears all the XID fields
and updates latestCompletedXid, and then wakes up all the others (who
now don't even need to reacquire the spinlock to "release" the lock,
because they never really held it in the first place, and yet the work
they needed done is done).
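
Roughly like this, as a sketch only: ProcArrayLock and the
LWLock/SpinLock calls are the real primitives, but every other name
below is invented for illustration.

typedef struct ClearRequest
{
    TransactionId        xid;
    volatile bool        done;
    struct ClearRequest *next;
} ClearRequest;

static slock_t       pending_lock;  /* protects "pending" */
static ClearRequest *pending;       /* backends awaiting XID clearing */

static void
GroupEndTransaction(ClearRequest *me)
{
    bool        leader;

    /* add ourselves to the pending list; first in becomes the leader */
    SpinLockAcquire(&pending_lock);
    me->done = false;
    me->next = pending;
    leader = (pending == NULL);
    pending = me;
    SpinLockRelease(&pending_lock);

    if (!leader)
    {
        /* followers never touch ProcArrayLock at all */
        while (!me->done)
            WaitForLeader();
        return;
    }

    /* the leader clears every pending XID under one acquisition */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    SpinLockAcquire(&pending_lock);
    ClearRequest *batch = pending;
    pending = NULL;
    SpinLockRelease(&pending_lock);

    for (ClearRequest *r = batch; r != NULL; r = r->next)
    {
        ClearProcXidFields(r);              /* per-backend cleanup */
        AdvanceLatestCompletedXid(r->xid);  /* strictly monotonic */
    }
    LWLockRelease(ProcArrayLock);

    /* tell the followers their work has been done for them */
    for (ClearRequest *r = batch; r != NULL; r = r->next)
    {
        r->done = true;
        WakeFollower(r);
    }
}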

The trick is to make something like that work within the confines of
the LWLock mechanism.  It strikes me that we have a number of places
in the system where it would be useful to leverage the queuing and
error handling facilities that the lwlock mechanism provides, but have
different rules for handling lock conflicts - either different lock
modes, or request combining, or whatever.  lwlock.c is an awfully big
chunk of code to cut-and-paste if you need an lwlock with three modes,
or some primitive that has behavior similar to an lwlock overall but
with some differences in detail.  I wonder if there's a way that we
could usefully refactor things to make that sort of thing easier.
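
One shape such a refactoring might take, purely as a sketch (slock_t
and the SpinLock calls are real; everything else here is invented):
keep the queueing, sleeping, and error handling generic, and let each
lock supply its own conflict policy.

#define MAX_MODES 4

typedef bool (*ConflictFn)(const int *hold_counts, int want_mode);

typedef struct GenericLWLock
{
    slock_t     mutex;                  /* protects the fields below */
    int         hold_counts[MAX_MODES]; /* current holders, per mode */
    WaitQueue   waiters;
    ConflictFn  conflicts;              /* caller-supplied policy */
} GenericLWLock;

static void
GenericLockAcquire(GenericLWLock *lock, int mode)
{
    for (;;)
    {
        SpinLockAcquire(&lock->mutex);
        if (!lock->conflicts(lock->hold_counts, mode))
        {
            lock->hold_counts[mode]++;
            SpinLockRelease(&lock->mutex);
            return;
        }
        EnqueueSelf(&lock->waiters, mode);
        SpinLockRelease(&lock->mutex);
        SleepUntilWoken();  /* the same semaphore dance lwlock.c does */
    }
}

A three-mode lock like LW_SHARED2 then becomes just a conflict
function, and request combining could hide behind the same interface.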

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: ProcArrayLock contention

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
hi,

> I've been playing with the attached patch, which adds an additional
> light-weight lock mode, LW_SHARED2.  LW_SHARED2 conflicts with
> LW_SHARED and LW_EXCLUSIVE, but not with itself.  The patch changes
> ProcArrayEndTransaction() to use this new mode.  IOW, multiple
> processes can commit at the same time, and multiple processes can take
> snapshots at the same time, but nobody can take a snapshot while
> someone else is committing.
> 
> Needless to say, I don't think we'd really want to apply this, because
> adding an LW_SHARED2 mode that's probably only useful for ProcArrayLock
> would be a pretty ugly wart.  But the results are interesting.
> pgbench, scale factor 100, unlogged tables, Nate Boley's 32-core AMD
> box, shared_buffers = 8GB, maintenance_work_mem = 1GB,
> synchronous_commit = off, checkpoint_segments = 300,
> checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
> wal_writer_delay = 20ms, results are median of three five-minute runs:
> 
> #clients     tps(master)   tps(lwshared2)
>        1      657.984859       683.251582
>        8     4748.906750      4946.069238
>       32    10695.160555     17530.390578
>       80     7727.563437     16099.549506
> 
> That's a pretty impressive speedup, but there's trouble in paradise.
> With 80 clients (but not 32 or fewer), I occasionally get the
> following error:
> 
> ERROR:  t_xmin is uncommitted in tuple to be updated
> 
> So it seems that there's some way in which this locking is actually
> incorrect, though I'm not seeing what it is at the moment.  Either
> that, or there's some bug in the existing code that happens to be
> exposed by this change.
> 
> The patch also produces a (much smaller) speedup with regular tables,
> but it's hard to know how seriously to take that until the locking
> issue is debugged.
> 
> Any ideas?

Could latestCompletedXid have gone backwards due to concurrent updates
and fooled TransactionIdIsInProgress?
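
For reference, the update in ProcArrayEndTransaction() is roughly this
check-then-store:

    if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
                              latestXid))
        ShmemVariableCache->latestCompletedXid = latestXid;

Under LW_EXCLUSIVE that read-modify-write is serialized; under
LW_SHARED2 two committers can interleave it:

    /* backend A (xid 100)            backend B (xid 101)
     *   reads latestCompletedXid=99
     *                                  reads latestCompletedXid=99
     *                                  99 < 101, stores 101
     *   99 < 100, stores 100    <-- latestCompletedXid goes backwards
     */

TransactionIdIsInProgress() treats any xid beyond latestCompletedXid
as surely still running, so a regression like this could make an
already-committed xmin look uncommitted.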

YAMAMOTO Takashi

> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company