On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> >> I've already observed such behavior, see [1]. I think that now there is no
> >> consensus on how to fix that. For instance, Andres express opinion that
> >> this shouldn't be fixed from LWLock side [2].
> >> FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix
> >> various scalability issues including high ProcArrayLock contention.
> >
> > Some amount of non-fairness is ok, but degrading to the point of
> > complete denial of service is not very graceful. I don't think it's
> > realistic to hope that all lwlock contention issues will be fixed any
> > time soon. Some fallback mechanism would be extremely nice until then.
>
> Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:
>
> http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf
>
> I could believe that there's a case to be made for per-LWLock fairness
> characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks, which are frequently acquired
both in shared and exclusive mode, have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking at the object level (after considerable work re fast-path
locking), because nearly all accesses are non-conflicting. But
prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
ProcArrayLock waiter can reduce throughput dramatically.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock request is not
conflicting, and that's what can prevent exclusive lock requests from
ever succeeding, because you can essentially get repetitions of:

S1: acq(SHARED) -> shared = 1
S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 <block>
...
S3: acq(SHARED) -> shared = 2
S1: rel(SHARED) -> shared = 1
S1: acq(SHARED) -> shared = 2
S3: rel(SHARED) -> shared = 1
...

Now we potentially could mark individual lwlocks as being fair
locks. But which ones would those be? Certainly not ProcArrayLock; it's
way too heavily contended.
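
To make the interleaving above concrete, here is a minimal toy model in
Python. It is only a sketch and not the lwlock.c implementation: the
ToyLock class, its "fair" flag and rounds_until_exclusive() are all
hypothetical names invented for illustration. It replays the S1/S2/S3
sequence and reports whether the exclusive waiter ever gets the lock.

```python
from collections import deque


class ToyLock:
    """Toy shared/exclusive lock. Hypothetical model, not PostgreSQL code."""

    def __init__(self, fair):
        self.fair = fair        # assumed per-lock flag: never bypass the queue
        self.shared = 0         # number of shared holders
        self.exclusive = None   # name of the exclusive holder, if any
        self.waiters = deque()  # FIFO of (name, mode) that had to block

    def _compatible(self, mode):
        # A request conflicts with an exclusive holder; an exclusive
        # request additionally conflicts with any shared holder.
        if self.exclusive is not None:
            return False
        return mode == "SHARED" or self.shared == 0

    def _grant(self, name, mode):
        if mode == "SHARED":
            self.shared += 1
        else:
            self.exclusive = name

    def acquire(self, name, mode):
        """Grant immediately if allowed, otherwise queue. Returns True if granted."""
        # Unfair policy: the wait queue is ignored for non-conflicting requests.
        # Fair policy: any existing waiter forces newcomers to queue behind it.
        jump_ok = not (self.fair and self.waiters)
        if jump_ok and self._compatible(mode):
            self._grant(name, mode)
            return True
        self.waiters.append((name, mode))
        return False

    def release_shared(self):
        self.shared -= 1
        # Wake waiters in FIFO order for as long as the head is grantable.
        while self.waiters and self._compatible(self.waiters[0][1]):
            self._grant(*self.waiters.popleft())


def rounds_until_exclusive(fair, rounds=1000):
    """Replay the S1/S2/S3 pattern; return the round in which S2 wins, or None."""
    lock = ToyLock(fair)
    lock.acquire("S1", "SHARED")
    lock.acquire("S2", "EXCLUSIVE")       # blocks behind S1
    for i in range(rounds):
        lock.acquire("S3", "SHARED")      # S3: acq(SHARED)
        lock.release_shared()             # S1: rel(SHARED)
        if lock.exclusive == "S2":
            return i
        lock.acquire("S1", "SHARED")      # S1: acq(SHARED)
        lock.release_shared()             # S3: rel(SHARED)
        if lock.exclusive == "S2":
            return i
    return None


print(rounds_until_exclusive(fair=False))  # None: S2 never gets the lock
print(rounds_until_exclusive(fair=True))   # 0: S2 wins in the first round
```

With fair=False the shared count never drops to zero, so S2 starves for
as long as the pattern repeats. With fair=True S2 wins immediately, but
only because S3 had to queue behind it, and that is exactly the
throughput cost described above for a lock like ProcArrayLock.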

Regards,

Andres