Thread: Is the unfair lwlock behavior intended?

Is the unfair lwlock behavior intended?

From
"Tsunakawa, Takayuki"
Date:
Hello,

I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.  That appears to apply to 9.6, too, as far as I
examinethe code.  Could you tell me if the behavior is intended or needs fix?
 

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive
later.


PROBLEM
====================

Under a heavy read/write workload on a big machine with dozens of CPUs and hundreds of GBs of RAM, psql sometimes took
morethan 30 seconds to connect to the database (and actually, it failed to connect due to our connect_timeout setting.)
The backend corresponding to the psql was waiting to acquire exclusive mode lock on ProcArrayLock.  Some other backends
tookmore than 10 seconds to commit their transactions, waiting for exclusive mode lock on ProcArrayLock.
 

At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on
ProcArrayLock,most of which were from TransactionIsInProgress().
 


CAUSE
====================

Going into the 9.2 code, I realized that those who request share mode don't pay attention to the wait queue.  That is,
ifsome processes hold share mode lock and someone is waiting for exclusive mode in the wait queue, other processes who
comelater can get share mode overtaking those who are already waiting.  If many processes repeatedly request share
mode,the waiters can't get exclusive mode for a long time.
 

Is this intentional, or should we make the later share-lockers if someone is in the wait queue?

Regards
Takayuki Tsunakawa




Re: Is the unfair lwlock behavior intended?

From
Alexander Korotkov
Date:
Hi!

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.  That appears to apply to 9.6, too, as far as I examine the code.  Could you tell me if the behavior is intended or needs fix?

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.


PROBLEM
====================

Under a heavy read/write workload on a big machine with dozens of CPUs and hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect to the database (and actually, it failed to connect due to our connect_timeout setting.)  The backend corresponding to the psql was waiting to acquire exclusive mode lock on ProcArrayLock.  Some other backends took more than 10 seconds to commit their transactions, waiting for exclusive mode lock on ProcArrayLock.

At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock, most of which were from TransactionIsInProgress().


CAUSE
====================

Going into the 9.2 code, I realized that those who request share mode don't pay attention to the wait queue.  That is, if some processes hold share mode lock and someone is waiting for exclusive mode in the wait queue, other processes who come later can get share mode overtaking those who are already waiting.  If many processes repeatedly request share mode, the waiters can't get exclusive mode for a long time.

Is this intentional, or should we make the later share-lockers if someone is in the wait queue?

I've already observed such behavior, see [1].  I think that now there is no consensus on how to fix that.  For instance, Andres express opinion that this shouldn't be fixed from LWLock side [2].
FYI, I'm planning to pickup work on CSN patch [3] for 10.0.  CSN should fix various scalability issues including high ProcArrayLock contention.

References.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: Is the unfair lwlock behavior intended?

From
Ants Aasma
Date:
On Tue, May 24, 2016 at 1:29 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> Is this intentional, or should we make the later share-lockers if someone
>> is in the wait queue?
>
> I've already observed such behavior, see [1].  I think that now there is no
> consensus on how to fix that.  For instance, Andres express opinion that
> this shouldn't be fixed from LWLock side [2].
> FYI, I'm planning to pickup work on CSN patch [3] for 10.0.  CSN should fix
> various scalability issues including high ProcArrayLock contention.

Some amount of non-fairness is ok, but degrading to the point of
complete denial of service is not very graceful. I don't think it's
realistic to hope that all lwlock contention issues will be fixed any
time soon. Some fallback mechanism would be extremely nice until then.

It seems to me that we could fix complete starvation with no
noticeable effect on common cases. The general idea would be to have a
separate flag bit for starving exclusive lockers. When lock is taken
in shared mode an exclusive locker puts itself on the wait queue like
it does now, and then proceeds to wait on its semaphore. But
differently from now, the semaphore wait has a (configurable?) timeout
after which the exclusive locker wakes up and sets the new
LW_FLAG_STARVING flag. When this flag is set new shared lockers will
now consider the lock taken and will queue up to wait. The exclusive
locker that marked itself as starving will get a chance to run once
the existing shared lockers complete and wake it up. If there are
other nearly starved exclusive lockers they are likely to be at the
head of the wait queue and will also get a chance to run. If the wait
queue is empty or has a shared locker at the head the starvation flag
gets reset and all of the queued up shared lockers get their turn.

With the proposed scheme non-starved cases will execute pretty much
exactly what they do now, except for a slightly different check for
shared lock availability. In starvation scenarios the exclusive
lockers will get a chance to run once per starvation grace period.
That might still not be ideal or "fair", but it is a lot better than
the status quo of indefinitely blocking.

PS: if/when you are picking up the CSN work, ping me to write up some
of the insights and ideas I have had while thinking about this topic.

Regards,
Ants Aasma



Re: Is the unfair lwlock behavior intended?

From
Ants Aasma
Date:
On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.  That appears to apply to 9.6, too, as far as
Iexamine the code.  Could you tell me if the behavior is intended or needs fix?
 
>
> Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive
later.

9.5 had significant LWLock scalability improvements. This might
improve performance enough so that exclusive lockers don't get
completely starved. It would be helpful if you could test if it's
still possible to trigger starvation with the new code.

Regards,
Ants Aasma



Re: Is the unfair lwlock behavior intended?

From
Robert Haas
Date:
On Tue, May 24, 2016 at 4:39 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.  That appears to apply to 9.6, too, as far
asI examine the code.  Could you tell me if the behavior is intended or needs fix?
 
>>
>> Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive
later.
>
> 9.5 had significant LWLock scalability improvements. This might
> improve performance enough so that exclusive lockers don't get
> completely starved. It would be helpful if you could test if it's
> still possible to trigger starvation with the new code.

9.5 didn't just increase the scalability; it also whacked the fairness
aspects of this code around.

Author: Andres Freund <andres@anarazel.de>
Branch: master Release: REL9_5_BR [7882c3b0b] 2014-12-25 17:24:30 +0100
   Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it.
   Besides being shorter and much easier to read it changes the logic in   LWLockRelease() to release all shared
lockerswhen waking up any. This   can yield some significant performance improvements - and the fairness   isn't really
muchworse than before, as we always allowed new shared   lockers to jump the queue.
 

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is the unfair lwlock behavior intended?

From
Andres Freund
Date:
On 2016-05-24 17:20:48 -0400, Robert Haas wrote:
> On Tue, May 24, 2016 at 4:39 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> > On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
> > <tsunakawa.takay@jp.fujitsu.com> wrote:
> >> I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.  That appears to apply to 9.6, too, as far
asI examine the code.  Could you tell me if the behavior is intended or needs fix?
 
> >>
> >> Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive
later.

Are you sure you're actually queued behind share locks, and not
primarily behind the lwlock's spinlocks? The latter is what I've seen in
similar cases.


> > 9.5 had significant LWLock scalability improvements. This might
> > improve performance enough so that exclusive lockers don't get
> > completely starved. It would be helpful if you could test if it's
> > still possible to trigger starvation with the new code.
> 
> 9.5 didn't just increase the scalability; it also whacked the fairness
> aspects of this code around.

True, but the difference isn't that big. As the commit says:


> and the fairness isn't really much worse than before, as we always
> allowed new shared lockers to jump the queue.

if you have lots of incoming locks, as it appears to be the case for th
OP, the fact that queued share locks are woken up earlier doesn't make
much of a difference.


Regards,

Andres



Re: Is the unfair lwlock behavior intended?

From
Peter Geoghegan
Date:
On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
>> I've already observed such behavior, see [1].  I think that now there is no
>> consensus on how to fix that.  For instance, Andres express opinion that
>> this shouldn't be fixed from LWLock side [2].
>> FYI, I'm planning to pickup work on CSN patch [3] for 10.0.  CSN should fix
>> various scalability issues including high ProcArrayLock contention.
>
> Some amount of non-fairness is ok, but degrading to the point of
> complete denial of service is not very graceful. I don't think it's
> realistic to hope that all lwlock contention issues will be fixed any
> time soon. Some fallback mechanism would be extremely nice until then.

Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:

http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf

I could believe that there's a case to be made for per-LWLock fairness
characteristics, which may be roughly what Andres meant.

-- 
Peter Geoghegan



Re: Is the unfair lwlock behavior intended?

From
Andres Freund
Date:
On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> >> I've already observed such behavior, see [1].  I think that now there is no
> >> consensus on how to fix that.  For instance, Andres express opinion that
> >> this shouldn't be fixed from LWLock side [2].
> >> FYI, I'm planning to pickup work on CSN patch [3] for 10.0.  CSN should fix
> >> various scalability issues including high ProcArrayLock contention.
> >
> > Some amount of non-fairness is ok, but degrading to the point of
> > complete denial of service is not very graceful. I don't think it's
> > realistic to hope that all lwlock contention issues will be fixed any
> > time soon. Some fallback mechanism would be extremely nice until then.
> 
> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
> 
> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
> 
> I could believe that there's a case to be made for per-LWLock fairness
> characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks, which are frequently acquired
both in shared and exclusive mode, have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking on object level (after considerable work re fast-path
locking), because nearly all access are non-conflicting.  But
prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
ProcArrayLock waiter, can reduce throughput dramatically.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock is not
conflicting, and that's what can prohibit exclusive locks from ever
succeeding, because you essentially can get repetitions of:

S1: acq(SHARED) -> shared = 1
S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 <block>
...
S3: acq(SHARED) -> shared = 2
S1: rel(SHARED) -> shared = 1
S1: acq(SHARED) -> shared = 2
S3: rel(SHARED) -> shared = 1
...

Now we potentially could mark individual lwlocks as being fair
locks. But which ones would those be? Certainly not ProcArrayLock, it's
way too heavily contended.

Regards,

Andres



Re: Is the unfair lwlock behavior intended?

From
Peter Geoghegan
Date:
On Tue, May 24, 2016 at 3:50 PM, Andres Freund <andres@anarazel.de> wrote:
>> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
>>
>> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
>>
>> I could believe that there's a case to be made for per-LWLock fairness
>> characteristics, which may be roughly what Andres meant.
>
> The problem is that half-way fair locks, which are frequently acquired
> both in shared and exclusive mode, have really bad throughput
> characteristics on modern multi-socket systems. We mostly get away with
> fair locking on object level (after considerable work re fast-path
> locking), because nearly all access are non-conflicting.  But
> prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
> ProcArrayLock waiter, can reduce throughput dramatically.

That's a useful example of where things are different for different,
well known cases.

> I don't think randomly processing the wait queue - which is what the
> quoted paper essentially describes - is really useful here. We
> intentionally *ignore* the wait queue entirely if a lock is not
> conflicting, and that's what can prohibit exclusive locks from ever
> succeeding, because you essentially can get repetitions of:

My point about the paper is that it made me reconsider my assumption
that fairness, according to some definition, was essential for
something like LWLocks. The paper made me aware that it's perfectly
acceptable practice to make informed trade-offs around fairness for
what it calls "latches" (what we'd call LWLocks). Maybe that was
obvious to you, but it wasn't to me.

-- 
Peter Geoghegan



Re: Is the unfair lwlock behavior intended?

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alexander Korotkov
I've already observed such behavior, see [1].  I think that now there is no consensus on how to fix that.  For
instance,Andres express opinion that this shouldn't be fixed from LWLock side [2].
 


Thank you for nice pointers.  I understood.


> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ants Aasma
> 9.5 had significant LWLock scalability improvements. This might
> improve performance enough so that exclusive lockers don't get
> completely starved. It would be helpful if you could test if it's
> still possible to trigger starvation with the new code.

Unfortunately, we cannot test anymore because the customer's system is now in production.  The heavy ProcArray
contentionwas caused mainly by too many tuple visibility tests, which in turn were caused by unintended sequential
scans. Then the customer avoided the contention problem by adding an index and reducing the number of concurrent active
sessions.

> From: Andres Freund [mailto:andres@anarazel.de]
> Are you sure you're actually queued behind share locks, and not primarily
> behind the lwlock's spinlocks? The latter is what I've seen in similar cases.

I think so, because the stack trace showed that the backends were waiting in TransactionIsInProgress (or some function
inthe commit processing) -> LWLockAcquire -> PGSemaphoreLock -> semop(), not including spinlock-related functions.
 


> The problem is that half-way fair locks, which are frequently acquired both
> in shared and exclusive mode, have really bad throughput characteristics
> on modern multi-socket systems. We mostly get away with fair locking on
> object level (after considerable work re fast-path locking), because nearly
> all access are non-conflicting.  But prohibiting any snapshot acquisitions
> when there's a single LW_EXCLUSIVE ProcArrayLock waiter, can reduce
> throughput dramatically.

Thanks, I understood that you chose total throughput over stable response time.  I feel empathetic with the decision,
andI think it's the way to go.
 

OTOH, maybe I'll object if I'm the pitiful waiter... I'll get out of the Disneyland if their staff said "Please stay in
theline as long as there are efficient guests behind you.  That's the benefit for the whole Disneyland."
 

Regards
Takayuki Tsunakawa




Re: Is the unfair lwlock behavior intended?

From
Robert Haas
Date:
On Tue, May 24, 2016 at 6:50 PM, Andres Freund <andres@anarazel.de> wrote:
> Now we potentially could mark individual lwlocks as being fair
> locks. But which ones would those be? Certainly not ProcArrayLock, it's
> way too heavily contended.

I think this is looking at the problem from the wrong angle.  The OP's
complaint is pretty fair: a 30-second wait for ProcArrayLock is
horrendous, and if that's actually something that is happening with
any significant regularity on well-configured systems, we need to fix
it somehow.  Whether FIFO queueing on LWLocks is the right way to fix
it is a separate question, and I agree that the answer is probably
"no" ... although it does strike me that Amit's work on group XID
clearing might make that a lot more palatable, since you won't
normally have more than one exclusive waiter in the queue at a time.
But regardless of that, I don't think we can just say "oh, well,
sometimes the system becomes totally unresponsive for more than 30
seconds, but we don't care".  We have to care about that.

What I think we need to understand better from is Tsunakawa-san is (1)
whether this behavior still occurs in 9.6 and (2) what sort of test
case is required to produce it.  Depending on how extreme that test
case is, we can decide how much of a problem we think this is.  And we
can test potential solutions to see whether and to what degree they
address it, as well as how they affect throughput.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is the unfair lwlock behavior intended?

From
Andres Freund
Date:
On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
> I think this is looking at the problem from the wrong angle.  The OP's
> complaint is pretty fair: a 30-second wait for ProcArrayLock is
> horrendous, and if that's actually something that is happening with
> any significant regularity on well-configured systems, we need to fix
> it somehow.

No disagreement there.


> I don't think we can just say "oh, well,
> sometimes the system becomes totally unresponsive for more than 30
> seconds, but we don't care".  We have to care about that.

I don't think anybody was doing that? The first questions on this thread
were about upgrading and retesting...


Andres



Re: Is the unfair lwlock behavior intended?

From
Andres Freund
Date:
On 2016-05-25 11:15:37 -0700, Andres Freund wrote:
> On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
> I don't think anybody was doing that? The first questions on this thread
> were about upgrading and retesting...

Something I've repeatedly wondered about around this topic is whether we
could split ProcArrayLock into one that governs entering or leaving the
procarray from the one that's for consistent snapshots.  I think there's
no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
CountUserBackends()/CountOtherDBBackends() (and potentially some more)
to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
as long as we're careful to ensure that by the time a entry is removed
ProcArrayEndTransaction() has been called.

Greetings,

Andres Freund



Re: Is the unfair lwlock behavior intended?

From
Andres Freund
Date:
Hi,

On 2016-05-24 06:03:07 +0000, Tsunakawa, Takayuki wrote:
> At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on
ProcArrayLock,most of which were from TransactionIsInProgress().
 

FWIW, I suspect that 9.6 might be a fair bit better here, due to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8a7d0701814a4e293efad22091d6f6fb441bbe1c

Regards,

Andres



Re: Is the unfair lwlock behavior intended?

From
Robert Haas
Date:
On Wed, May 25, 2016 at 3:26 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-25 11:15:37 -0700, Andres Freund wrote:
>> On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
>> I don't think anybody was doing that? The first questions on this thread
>> were about upgrading and retesting...
>
> Something I've repeatedly wondered about around this topic is whether we
> could split ProcArrayLock into one that governs entering or leaving the
> procarray from the one that's for consistent snapshots.  I think there's
> no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
> CountUserBackends()/CountOtherDBBackends() (and potentially some more)
> to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
> TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
> as long as we're careful to ensure that by the time a entry is removed
> ProcArrayEndTransaction() has been called.

I'm doubtful about how much that would reduce contention, because when
I've used perf or inserted instrumentation to see which actual call
sides are the problem, it's always been entirely down to
GetSnapshotData() and ProcArrayEndTransaction().  However, I think it
might be worth doing anyway, because redesigning the whole mechanism
might be easier if that lock weren't doing so many only-semi-related
things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is the unfair lwlock behavior intended?

From
Andres Freund
Date:
On 2016-05-25 17:24:22 -0400, Robert Haas wrote:
> On Wed, May 25, 2016 at 3:26 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-05-25 11:15:37 -0700, Andres Freund wrote:
> >> On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
> >> I don't think anybody was doing that? The first questions on this thread
> >> were about upgrading and retesting...
> >
> > Something I've repeatedly wondered about around this topic is whether we
> > could split ProcArrayLock into one that governs entering or leaving the
> > procarray from the one that's for consistent snapshots.  I think there's
> > no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
> > CountUserBackends()/CountOtherDBBackends() (and potentially some more)
> > to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
> > TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
> > as long as we're careful to ensure that by the time a entry is removed
> > ProcArrayEndTransaction() has been called.
> 
> I'm doubtful about how much that would reduce contention,

Yea, I don't think it'll usually will make a huge difference.  It'd
likely have solved this specific complaint though; and it'd remove
nearly all other exclusive acquisitions of ProcArrayLock other than
ProcArrayEndTransaction().


> However, I think it might be worth doing anyway, because redesigning
> the whole mechanism might be easier if that lock weren't doing so many
> only-semi-related things.

Very much agreed upon this.  Having a distinct 'SnapshotLock', for
EndTransaction() and GetSnapshotData() would be a first step in making
the locking more granular.

Andres