Re: condition variables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: condition variables
Date
Msg-id CA+TgmobhMRTJz8bSGxTnkEX87DhxbZZghjxF7ESO5E930Z2b1g@mail.gmail.com
Whole thread Raw
In response to Re: condition variables  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: condition variables  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Aug 15, 2016 at 1:58 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> [condition-variable-v1.patch]
>>
>> Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?
>
> I poked at this a bit... OK, a lot... and have some feedback:
>
> 1.  As above, we need to clear cvSleeping before setting the latch.

Right, OK.

> 2.  The following schedule corrupts the waitlist by trying to delete
> something from it that isn't in it:
>
> P1: ConditionVariablePrepareToSleep: push self onto waitlist
> P2:   ConditionVariableSignal: pop P1 from waitlist
> P1: <check user condition, decide to break without ever sleeping>
> P3:     ConditionVariablePrepareToSleep: push self onto waitlist
> P1: ConditionVariableCancelSleep: delete self from waitlist (!)
>
> Without P3 coming along you probably wouldn't notice because the
> waitlist will be happily cleared and look valid, but P3's entry gets
> lost and then it hangs forever.
>
> One solution is to teach ConditionVariableCancelSleep to check if
> we're still actually there first.  That can be done in O(1) time by
> clearing nodes' next and prev pointers when deleting, so that we can
> rely on that in a new proclist_contains() function.  See attached.

How about instead using cvSleeping to test this?  Suppose we make a
rule that cvSleeping can be changed from false to true only by the
process whose PGPROC it is, and thus no lock is needed, but changing
it from true to false always requires the spinlock.

> 3.  The following schedule corrupts the waitlist by trying to insert
> something into it that is already in it:
>
> P1: ConditionVariablePrepareToSleep: push self onto waitlist
> P1: <check user condition, decide to sleep>
> P1: ConditionVariableSleep
> P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)
>
> Nodes before and after self's pre-existing position can be forgotten
> when self's node is pushed to the front of the list.  That can be
> fixed by making ConditionVariablePrepareToSleep also check if we're
> already in the list.

OK.

> 4.  The waitlist is handled LIFO-ly.  Would it be better for the first
> guy to start waiting to be woken up first, like we do for locks?  The
> Pthreads equivalent says that it depends on "scheduling policy".  I
> don't know if it matters much, just an observation.

I don't know whether this matters.  It's possible that FIFO is a
better policy; I don't really care.

> 5.  The new proclist function you added is the first to work in terms
> of PGPROC* rather than procno.  Maybe the whole interface should work
> with either PGPROC pointers or procnos?  No strong view.

Hmm, maybe so.  But wouldn't any caller translate to a PGPROC * straight off?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: patch proposal
Next
From: Shay Rojansky
Date:
Subject: Re: Slowness of extended protocol