Re: Introduce timeout capability for ConditionVariableSleep - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Introduce timeout capability for ConditionVariableSleep
Date
Msg-id CA+hUKGJxFAEg5A0B55JXOdRetYbP0Ng3vFL2An6mL0i0yAYJKA@mail.gmail.com
Whole thread Raw
In response to Re: Introduce timeout capability for ConditionVariableSleep  (Shawn Debnath <sdn@amazon.com>)
Responses Re: Introduce timeout capability for ConditionVariableSleep  (Shawn Debnath <sdn@amazon.com>)
Re: Introduce timeout capability for ConditionVariableSleep  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Jul 12, 2019 at 6:08 PM Shawn Debnath <sdn@amazon.com> wrote:
> On Tue, Jul 09, 2019 at 11:03:18PM +1200, Thomas Munro wrote:
> > On Sun, Jul 7, 2019 at 3:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > +               /* Timed out */
> > > +               if (rc == 0)
> > > +                       return true;
> >
> > Here's a version without that bit, because I don't think we need it.
>
> This works. Agree that letting it fall through covers the first gap.

Pushed, like that (with the now unused rc variable also removed).
Thanks for the patch!

> > > That still leaves the danger that the CV can be signalled some time
> > > after ConditionVariableTimedSleep() returns.  So now I'm wondering if
> > > ConditionVariableCancelSleep() should signal the CV if it discovers
> > > that this process is not in the proclist, on the basis that that must
> > > indicate that we've been signalled even though we're not interested in
> > > the message anymore, and yet some other process else might be
> > > interested, and that might have been the only signal that is ever
> > > going to be delivered by ConditionVariableSignal().
> >
> > And a separate patch to do that.  Thoughts?
>
> I like it. This covers the gap all the way till cancel is invoked and it
> manipulates the list to remove itself or realizes that it needs to
> forward the signal to some other process.

I pushed this too.  It's a separate commit, because I think there is
at least a theoretical argument that it should be back-patched.  I'm
not going to do that today though, because I doubt anyone is relying
on ConditionVariableSignal() working that reliably yet, and it's
really with timeouts that it becomes a likely problem.

I thought about this edge case because I have long wanted to propose a
pair of functions that provide a simplified payloadless blocking
alternative to NOTIFY, that would allow for just the right number of
waiting sessions to wake up to handle SKIP LOCKED-style job queues.
Otherwise you sometimes get thundering herds of wakeups fighting over
crumbs.  That made me think about the case where a worker session
decides to time out and shut down due to being idle for too long, but
eats a wakeup on its way out.  Another question that comes up in that
use case is whether CV wakeup queues should be LIFO or FIFO.  I think
the answer is LIFO, to support class worker pool designs that
stabilise at the right size using a simple idle timeout rule.  They're
currently FIFO (proclist_pop_head_node() to wake up, but
proclist_push_tail() to sleep).  I understand why Robert didn't care
about that last time I mentioned it: all our uses of CVs today are
"broadcast" wakeups.  But a productised version of the "poke" hack I
showed earlier that supports poking just one waiter would care about
the thing this patch fixed, and also the wakeup queue order.

-- 
Thomas Munro
https://enterprisedb.com



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Check-out mutable functions in check constraints
Next
From: Thomas Munro
Date:
Subject: Re: Tab completion for CREATE TYPE