Thread: Is the unfair lwlock behavior intended?
Hello, I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as I examinethe code. Could you tell me if the behavior is intended or needs fix? Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later. PROBLEM ==================== Under a heavy read/write workload on a big machine with dozens of CPUs and hundreds of GBs of RAM, psql sometimes took morethan 30 seconds to connect to the database (and actually, it failed to connect due to our connect_timeout setting.) The backend corresponding to the psql was waiting to acquire exclusive mode lock on ProcArrayLock. Some other backends tookmore than 10 seconds to commit their transactions, waiting for exclusive mode lock on ProcArrayLock. At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock,most of which were from TransactionIsInProgress(). CAUSE ==================== Going into the 9.2 code, I realized that those who request share mode don't pay attention to the wait queue. That is, ifsome processes hold share mode lock and someone is waiting for exclusive mode in the wait queue, other processes who comelater can get share mode overtaking those who are already waiting. If many processes repeatedly request share mode,the waiters can't get exclusive mode for a long time. Is this intentional, or should we make the later share-lockers if someone is in the wait queue? Regards Takayuki Tsunakawa
Hi!
On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as I examine the code. Could you tell me if the behavior is intended or needs fix?
Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.
PROBLEM
====================
Under a heavy read/write workload on a big machine with dozens of CPUs and hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect to the database (and actually, it failed to connect due to our connect_timeout setting.) The backend corresponding to the psql was waiting to acquire exclusive mode lock on ProcArrayLock. Some other backends took more than 10 seconds to commit their transactions, waiting for exclusive mode lock on ProcArrayLock.
At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock, most of which were from TransactionIsInProgress().
CAUSE
====================
Going into the 9.2 code, I realized that those who request share mode don't pay attention to the wait queue. That is, if some processes hold share mode lock and someone is waiting for exclusive mode in the wait queue, other processes who come later can get share mode overtaking those who are already waiting. If many processes repeatedly request share mode, the waiters can't get exclusive mode for a long time.
Is this intentional, or should we make the later share-lockers if someone is in the wait queue?
I've already observed such behavior, see [1]. I think that now there is no consensus on how to fix that. For instance, Andres express opinion that this shouldn't be fixed from LWLock side [2].
FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix various scalability issues including high ProcArrayLock contention.
References.
------
On Tue, May 24, 2016 at 1:29 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: >> Is this intentional, or should we make the later share-lockers if someone >> is in the wait queue? > > I've already observed such behavior, see [1]. I think that now there is no > consensus on how to fix that. For instance, Andres express opinion that > this shouldn't be fixed from LWLock side [2]. > FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix > various scalability issues including high ProcArrayLock contention. Some amount of non-fairness is ok, but degrading to the point of complete denial of service is not very graceful. I don't think it's realistic to hope that all lwlock contention issues will be fixed any time soon. Some fallback mechanism would be extremely nice until then. It seems to me that we could fix complete starvation with no noticeable effect on common cases. The general idea would be to have a separate flag bit for starving exclusive lockers. When lock is taken in shared mode an exclusive locker puts itself on the wait queue like it does now, and then proceeds to wait on its semaphore. But differently from now, the semaphore wait has a (configurable?) timeout after which the exclusive locker wakes up and sets the new LW_FLAG_STARVING flag. When this flag is set new shared lockers will now consider the lock taken and will queue up to wait. The exclusive locker that marked itself as starving will get a chance to run once the existing shared lockers complete and wake it up. If there are other nearly starved exclusive lockers they are likely to be at the head of the wait queue and will also get a chance to run. If the wait queue is empty or has a shared locker at the head the starvation flag gets reset and all of the queued up shared lockers get their turn. With the proposed scheme non-starved cases will execute pretty much exactly what they do now, except for a slightly different check for shared lock availability. In starvation scenarios the exclusive lockers will get a chance to run once per starvation grace period. That might still not be ideal or "fair", but it is a lot better than the status quo of indefinitely blocking. PS: if/when you are picking up the CSN work, ping me to write up some of the insights and ideas I have had while thinking about this topic. Regards, Ants Aasma
On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as Iexamine the code. Could you tell me if the behavior is intended or needs fix? > > Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later. 9.5 had significant LWLock scalability improvements. This might improve performance enough so that exclusive lockers don't get completely starved. It would be helpful if you could test if it's still possible to trigger starvation with the new code. Regards, Ants Aasma
On Tue, May 24, 2016 at 4:39 PM, Ants Aasma <ants.aasma@eesti.ee> wrote: > On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: >> I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far asI examine the code. Could you tell me if the behavior is intended or needs fix? >> >> Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later. > > 9.5 had significant LWLock scalability improvements. This might > improve performance enough so that exclusive lockers don't get > completely starved. It would be helpful if you could test if it's > still possible to trigger starvation with the new code. 9.5 didn't just increase the scalability; it also whacked the fairness aspects of this code around. Author: Andres Freund <andres@anarazel.de> Branch: master Release: REL9_5_BR [7882c3b0b] 2014-12-25 17:24:30 +0100 Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it. Besides being shorter and much easier to read it changes the logic in LWLockRelease() to release all shared lockerswhen waking up any. This can yield some significant performance improvements - and the fairness isn't really muchworse than before, as we always allowed new shared lockers to jump the queue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-05-24 17:20:48 -0400, Robert Haas wrote: > On Tue, May 24, 2016 at 4:39 PM, Ants Aasma <ants.aasma@eesti.ee> wrote: > > On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki > > <tsunakawa.takay@jp.fujitsu.com> wrote: > >> I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far asI examine the code. Could you tell me if the behavior is intended or needs fix? > >> > >> Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later. Are you sure you're actually queued behind share locks, and not primarily behind the lwlock's spinlocks? The latter is what I've seen in similar cases. > > 9.5 had significant LWLock scalability improvements. This might > > improve performance enough so that exclusive lockers don't get > > completely starved. It would be helpful if you could test if it's > > still possible to trigger starvation with the new code. > > 9.5 didn't just increase the scalability; it also whacked the fairness > aspects of this code around. True, but the difference isn't that big. As the commit says: > and the fairness isn't really much worse than before, as we always > allowed new shared lockers to jump the queue. if you have lots of incoming locks, as it appears to be the case for th OP, the fact that queued share locks are woken up earlier doesn't make much of a difference. Regards, Andres
On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote: >> I've already observed such behavior, see [1]. I think that now there is no >> consensus on how to fix that. For instance, Andres express opinion that >> this shouldn't be fixed from LWLock side [2]. >> FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix >> various scalability issues including high ProcArrayLock contention. > > Some amount of non-fairness is ok, but degrading to the point of > complete denial of service is not very graceful. I don't think it's > realistic to hope that all lwlock contention issues will be fixed any > time soon. Some fallback mechanism would be extremely nice until then. Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later: http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf I could believe that there's a case to be made for per-LWLock fairness characteristics, which may be roughly what Andres meant. -- Peter Geoghegan
On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote: > On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote: > >> I've already observed such behavior, see [1]. I think that now there is no > >> consensus on how to fix that. For instance, Andres express opinion that > >> this shouldn't be fixed from LWLock side [2]. > >> FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix > >> various scalability issues including high ProcArrayLock contention. > > > > Some amount of non-fairness is ok, but degrading to the point of > > complete denial of service is not very graceful. I don't think it's > > realistic to hope that all lwlock contention issues will be fixed any > > time soon. Some fallback mechanism would be extremely nice until then. > > Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later: > > http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf > > I could believe that there's a case to be made for per-LWLock fairness > characteristics, which may be roughly what Andres meant. The problem is that half-way fair locks, which are frequently acquired both in shared and exclusive mode, have really bad throughput characteristics on modern multi-socket systems. We mostly get away with fair locking on object level (after considerable work re fast-path locking), because nearly all access are non-conflicting. But prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE ProcArrayLock waiter, can reduce throughput dramatically. I don't think randomly processing the wait queue - which is what the quoted paper essentially describes - is really useful here. We intentionally *ignore* the wait queue entirely if a lock is not conflicting, and that's what can prohibit exclusive locks from ever succeeding, because you essentially can get repetitions of: S1: acq(SHARED) -> shared = 1 S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 <block> ... S3: acq(SHARED) -> shared = 2 S1: rel(SHARED) -> shared = 1 S1: acq(SHARED) -> shared = 2 S3: rel(SHARED) -> shared = 1 ... Now we potentially could mark individual lwlocks as being fair locks. But which ones would those be? Certainly not ProcArrayLock, it's way too heavily contended. Regards, Andres
On Tue, May 24, 2016 at 3:50 PM, Andres Freund <andres@anarazel.de> wrote: >> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later: >> >> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf >> >> I could believe that there's a case to be made for per-LWLock fairness >> characteristics, which may be roughly what Andres meant. > > The problem is that half-way fair locks, which are frequently acquired > both in shared and exclusive mode, have really bad throughput > characteristics on modern multi-socket systems. We mostly get away with > fair locking on object level (after considerable work re fast-path > locking), because nearly all access are non-conflicting. But > prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE > ProcArrayLock waiter, can reduce throughput dramatically. That's a useful example of where things are different for different, well known cases. > I don't think randomly processing the wait queue - which is what the > quoted paper essentially describes - is really useful here. We > intentionally *ignore* the wait queue entirely if a lock is not > conflicting, and that's what can prohibit exclusive locks from ever > succeeding, because you essentially can get repetitions of: My point about the paper is that it made me reconsider my assumption that fairness, according to some definition, was essential for something like LWLocks. The paper made me aware that it's perfectly acceptable practice to make informed trade-offs around fairness for what it calls "latches" (what we'd call LWLocks). Maybe that was obvious to you, but it wasn't to me. -- Peter Geoghegan
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alexander Korotkov I've already observed such behavior, see [1]. I think that now there is no consensus on how to fix that. For instance,Andres express opinion that this shouldn't be fixed from LWLock side [2]. Thank you for nice pointers. I understood. > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ants Aasma > 9.5 had significant LWLock scalability improvements. This might > improve performance enough so that exclusive lockers don't get > completely starved. It would be helpful if you could test if it's > still possible to trigger starvation with the new code. Unfortunately, we cannot test anymore because the customer's system is now in production. The heavy ProcArray contentionwas caused mainly by too many tuple visibility tests, which in turn were caused by unintended sequential scans. Then the customer avoided the contention problem by adding an index and reducing the number of concurrent active sessions. > From: Andres Freund [mailto:andres@anarazel.de] > Are you sure you're actually queued behind share locks, and not primarily > behind the lwlock's spinlocks? The latter is what I've seen in similar cases. I think so, because the stack trace showed that the backends were waiting in TransactionIsInProgress (or some function inthe commit processing) -> LWLockAcquire -> PGSemaphoreLock -> semop(), not including spinlock-related functions. > The problem is that half-way fair locks, which are frequently acquired both > in shared and exclusive mode, have really bad throughput characteristics > on modern multi-socket systems. We mostly get away with fair locking on > object level (after considerable work re fast-path locking), because nearly > all access are non-conflicting. But prohibiting any snapshot acquisitions > when there's a single LW_EXCLUSIVE ProcArrayLock waiter, can reduce > throughput dramatically. Thanks, I understood that you chose total throughput over stable response time. I feel empathetic with the decision, andI think it's the way to go. OTOH, maybe I'll object if I'm the pitiful waiter... I'll get out of the Disneyland if their staff said "Please stay in theline as long as there are efficient guests behind you. That's the benefit for the whole Disneyland." Regards Takayuki Tsunakawa
On Tue, May 24, 2016 at 6:50 PM, Andres Freund <andres@anarazel.de> wrote: > Now we potentially could mark individual lwlocks as being fair > locks. But which ones would those be? Certainly not ProcArrayLock, it's > way too heavily contended. I think this is looking at the problem from the wrong angle. The OP's complaint is pretty fair: a 30-second wait for ProcArrayLock is horrendous, and if that's actually something that is happening with any significant regularity on well-configured systems, we need to fix it somehow. Whether FIFO queueing on LWLocks is the right way to fix it is a separate question, and I agree that the answer is probably "no" ... although it does strike me that Amit's work on group XID clearing might make that a lot more palatable, since you won't normally have more than one exclusive waiter in the queue at a time. But regardless of that, I don't think we can just say "oh, well, sometimes the system becomes totally unresponsive for more than 30 seconds, but we don't care". We have to care about that. What I think we need to understand better from is Tsunakawa-san is (1) whether this behavior still occurs in 9.6 and (2) what sort of test case is required to produce it. Depending on how extreme that test case is, we can decide how much of a problem we think this is. And we can test potential solutions to see whether and to what degree they address it, as well as how they affect throughput. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-05-25 14:09:43 -0400, Robert Haas wrote: > I think this is looking at the problem from the wrong angle. The OP's > complaint is pretty fair: a 30-second wait for ProcArrayLock is > horrendous, and if that's actually something that is happening with > any significant regularity on well-configured systems, we need to fix > it somehow. No disagreement there. > I don't think we can just say "oh, well, > sometimes the system becomes totally unresponsive for more than 30 > seconds, but we don't care". We have to care about that. I don't think anybody was doing that? The first questions on this thread were about upgrading and retesting... Andres
On 2016-05-25 11:15:37 -0700, Andres Freund wrote: > On 2016-05-25 14:09:43 -0400, Robert Haas wrote: > I don't think anybody was doing that? The first questions on this thread > were about upgrading and retesting... Something I've repeatedly wondered about around this topic is whether we could split ProcArrayLock into one that governs entering or leaving the procarray from the one that's for consistent snapshots. I think there's no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/ CountUserBackends()/CountOtherDBBackends() (and potentially some more) to conflict with GetSnapshotData()/ProcArrayEndTransaction()/ TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/... as long as we're careful to ensure that by the time a entry is removed ProcArrayEndTransaction() has been called. Greetings, Andres Freund
Hi, On 2016-05-24 06:03:07 +0000, Tsunakawa, Takayuki wrote: > At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock,most of which were from TransactionIsInProgress(). FWIW, I suspect that 9.6 might be a fair bit better here, due to http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8a7d0701814a4e293efad22091d6f6fb441bbe1c Regards, Andres
On Wed, May 25, 2016 at 3:26 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-05-25 11:15:37 -0700, Andres Freund wrote: >> On 2016-05-25 14:09:43 -0400, Robert Haas wrote: >> I don't think anybody was doing that? The first questions on this thread >> were about upgrading and retesting... > > Something I've repeatedly wondered about around this topic is whether we > could split ProcArrayLock into one that governs entering or leaving the > procarray from the one that's for consistent snapshots. I think there's > no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/ > CountUserBackends()/CountOtherDBBackends() (and potentially some more) > to conflict with GetSnapshotData()/ProcArrayEndTransaction()/ > TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/... > as long as we're careful to ensure that by the time a entry is removed > ProcArrayEndTransaction() has been called. I'm doubtful about how much that would reduce contention, because when I've used perf or inserted instrumentation to see which actual call sides are the problem, it's always been entirely down to GetSnapshotData() and ProcArrayEndTransaction(). However, I think it might be worth doing anyway, because redesigning the whole mechanism might be easier if that lock weren't doing so many only-semi-related things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-05-25 17:24:22 -0400, Robert Haas wrote: > On Wed, May 25, 2016 at 3:26 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-05-25 11:15:37 -0700, Andres Freund wrote: > >> On 2016-05-25 14:09:43 -0400, Robert Haas wrote: > >> I don't think anybody was doing that? The first questions on this thread > >> were about upgrading and retesting... > > > > Something I've repeatedly wondered about around this topic is whether we > > could split ProcArrayLock into one that governs entering or leaving the > > procarray from the one that's for consistent snapshots. I think there's > > no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/ > > CountUserBackends()/CountOtherDBBackends() (and potentially some more) > > to conflict with GetSnapshotData()/ProcArrayEndTransaction()/ > > TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/... > > as long as we're careful to ensure that by the time a entry is removed > > ProcArrayEndTransaction() has been called. > > I'm doubtful about how much that would reduce contention, Yea, I don't think it'll usually will make a huge difference. It'd likely have solved this specific complaint though; and it'd remove nearly all other exclusive acquisitions of ProcArrayLock other than ProcArrayEndTransaction(). > However, I think it might be worth doing anyway, because redesigning > the whole mechanism might be easier if that lock weren't doing so many > only-semi-related things. Very much agreed upon this. Having a distinct 'SnapshotLock', for EndTransaction() and GetSnapshotData() would be a first step in making the locking more granular. Andres