Re: Condition variable live lock - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Condition variable live lock
Date
Msg-id CA+TgmobTOp154RTcM7HNoUhaT3-m1DCsjEx1oVNqvhuz7PGUbA@mail.gmail.com
Whole thread Raw
In response to Re: Condition variable live lock  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Condition variable live lock
List pgsql-hackers
On Sun, Jan 7, 2018 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually ... perhaps a better design would be to have
>> ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for
>> a different condition variable, analogously to what we just did in
>> ConditionVariableBroadcast, on the same theory that whenever control
>> returns to the other CV wait loop it can re-establish the relevant
>> state easily enough.  I have to think that if the use of CVs grows
>> much, the existing restriction is going to become untenable anyway,
>> so why not just get rid of it?
>
> Concretely, as per attached.

I guess my aversion to converting the existing Assert-test into an
elog test was really a concern that we'd be countenancing the use of
CVs in any coding pattern more complicated than a very simple
test-and-wait loop.  Suppose someone were to propose adding runtime
checks that when we release a spinlock, it is held by the process that
tried to release it.  Someone might reply that such a check ought to
be unnecessary because the code protected by a spinlock ought to be a
short, straight-line critical section and therefore we should be able
to spot any such coding error by inspection.

And I wonder if the same isn't true here.  Is it really sensible,
within a CV-wait loop, to call some other function that contains its
own CV-wait loop?  Is that really a use case we want to support?  If
you do have such a thing, with the present coding, you don't *need* an
Assert() at runtime; you just need to run the code with assertions
enabled AT LEAST ONCE.  If the code flow is so complex that it doesn't
reliably fail an assertion in test environments, maybe it's just too
complex.  Perhaps our answer to that, or so my thinking went, ought to
be "don't write code like that in the first place".

Now, that may be myopic on my part.  If we want to support complex
control flows involving CVs, the approach here has a lot to recommend
it.  Instead of making it the caller's problem to work it out, we do
it automatically.  There is some loss of efficiency, perhaps, since
when control returns to the outer CV-wait loop it will have to recheck
the condition twice before potentially waiting, but maybe that doesn't
matter.  It's often the case that mechanisms like this end up getting
used profitably in a lot of places not imagined by their original
creator, and that might be the case here.

I think an extremely *likely* programming error when programming with
CVs is to have a code path where ConditionVariableCancelSleep() does
not get called.  The proposed change could make such mistakes much
less noticeable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Parallel append plan instability/randomness
Next
From: Andrew Dunstan
Date:
Subject: Re: portal pinning