Re: Is the unfair lwlock behavior intended? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Is the unfair lwlock behavior intended?
Msg-id 20160524225015.4zj5hkdizuzedcjj@alap3.anarazel.de
In response to Re: Is the unfair lwlock behavior intended?  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Is the unfair lwlock behavior intended?
Re: Is the unfair lwlock behavior intended?
Re: Is the unfair lwlock behavior intended?
List pgsql-hackers
On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> >> I've already observed such behavior, see [1].  I think that there is
> >> currently no consensus on how to fix that.  For instance, Andres expressed
> >> the opinion that this shouldn't be fixed from the LWLock side [2].
> >> FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN should
> >> fix various scalability issues, including high ProcArrayLock contention.
> >
> > Some amount of non-fairness is ok, but degrading to the point of
> > complete denial of service is not very graceful. I don't think it's
> > realistic to hope that all lwlock contention issues will be fixed any
> > time soon. Some fallback mechanism would be extremely nice until then.
> 
> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
> 
> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
> 
> I could believe that there's a case to be made for per-LWLock fairness
> characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks that are frequently acquired
in both shared and exclusive mode have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking at the object level (after considerable work on fast-path
locking), because nearly all accesses are non-conflicting.  But
prohibiting any snapshot acquisition while there's a single LW_EXCLUSIVE
ProcArrayLock waiter can reduce throughput dramatically.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock request is not
conflicting, and that's what can prevent an exclusive acquisition from
ever succeeding, because you can essentially get repetitions of:

S1: acq(SHARED) -> shared = 1
S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 <block>
...
S3: acq(SHARED) -> shared = 2
S1: rel(SHARED) -> shared = 1
S1: acq(SHARED) -> shared = 2
S3: rel(SHARED) -> shared = 1
...
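
The interleaving above can be replayed with a small single-threaded model.
This is only a sketch under stated assumptions, not PostgreSQL's lwlock.c:
the ToyLWLock class, its `fair` switch, and run_trace() are hypothetical
names made up for illustration. With fair=False a shared request ignores the
wait queue whenever it does not conflict with the current holders; with
fair=True it queues behind any existing waiter.

```python
from collections import deque

SHARED, EXCLUSIVE = "shared", "exclusive"


class ToyLWLock:
    """Single-threaded toy model of a reader/writer lock (hypothetical)."""

    def __init__(self, fair):
        self.fair = fair        # assumption: per-lock fairness switch
        self.shared = 0         # number of shared holders
        self.exclusive = None   # session holding the lock exclusively
        self.waiters = deque()  # FIFO of (session, mode) that had to block

    def _grantable(self, mode):
        if self.exclusive is not None:
            return False
        return mode == SHARED or self.shared == 0

    def _grant(self, session, mode):
        if mode == SHARED:
            self.shared += 1
        else:
            self.exclusive = session

    def acquire(self, session, mode):
        """Return True if granted at once, False if the session was queued."""
        # The unfair policy looks only at the holders; the fair policy also
        # refuses to overtake anybody already sitting in the wait queue.
        if self._grantable(mode) and not (self.fair and self.waiters):
            self._grant(session, mode)
            return True
        self.waiters.append((session, mode))
        return False

    def release_shared(self):
        self.shared -= 1
        # Wake waiters in FIFO order for as long as the head is grantable.
        while self.waiters and self._grantable(self.waiters[0][1]):
            self._grant(*self.waiters.popleft())


def run_trace(fair, rounds=1000):
    """Replay the S1/S2/S3 trace; return (exclusive holder, shared, waiters)."""
    lock = ToyLWLock(fair)
    lock.acquire("S1", SHARED)      # shared = 1
    lock.acquire("S2", EXCLUSIVE)   # blocks: waiters = 1
    for _ in range(rounds):
        for reader in ("S3", "S1"):
            lock.acquire(reader, SHARED)  # the next reader arrives ...
            lock.release_shared()         # ... before the previous one leaves
            if lock.exclusive is not None:
                return lock.exclusive, lock.shared, len(lock.waiters)
    return lock.exclusive, lock.shared, len(lock.waiters)


if __name__ == "__main__":
    print("unfair:", run_trace(fair=False))
    print("fair:  ", run_trace(fair=True))
```

In this model the unfair run ends with S2 still queued after 1000 rounds,
because the shared count never drops to zero. The fair run grants S2 the
lock on the first release, but only by making S3's non-conflicting shared
request wait behind it, which is the throughput cost described earlier.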

Now, we could potentially mark individual lwlocks as being fair
locks. But which ones would those be? Certainly not ProcArrayLock; it's
way too heavily contended.

Regards,

Andres


