Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)
Date
Msg-id 20230811202607.l5evqsryq4ic2jlw@awork3.anarazel.de
Whole thread Raw
In response to Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi,

On 2023-08-12 07:51:09 +1200, Thomas Munro wrote:
> Oh, I see what's happening.  Maybe commit b91dd9de wasn't the best
> idea, but bc971f4025c broke an assumption, since it doesn't use
> ConditionVariableSleep().  That is confusing the signal forwarding
> logic which expects to find our entry in the wait list in the common
> case.

Hm, I guess I got confused by the cv code once more. I thought that
ConditionVariableCancelSleep() would wake us up anyway, because
once we return from ConditionVariableSleep(), we'd be off the list. But I now
realize (and I think not for the first time), that ConditionVariableSleep()
always puts us *back* on the list.


Leaving aside the issue in this thread, isn't always adding us back into the
list bad from a contention POV alone - it doubles the write traffic on the CV
and is guaranteed to cause contention for ConditionVariableBroadcast().  I
wonder if this explains some performance issues I've seen in the past.

What if we instead reset cv_sleep_target once we've been taken off the list? I
think it'd not be too hard to make it safe to do the proclist_contains()
without the spinlock.  Lwlocks have something similar, there we solve it by
this sequence:

        proclist_delete(&wakeup, iter.cur, lwWaitLink);

        /*
         * Guarantee that lwWaiting being unset only becomes visible once the
         * unlink from the link has completed. Otherwise the target backend
         * could be woken up for other reason and enqueue for a new lock - if
         * that happens before the list unlink happens, the list would end up
         * being corrupted.
         *
         * The barrier pairs with the LWLockWaitListLock() when enqueuing for
         * another lock.
         */
        pg_write_barrier();
        waiter->lwWaiting = LW_WS_NOT_WAITING;
        PGSemaphoreUnlock(waiter->sem);

I guess this means we'd need something like lwWaiting for CVs as well.


> From a85b2827f4500bc2e7c533feace474bc47086dfa Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Sat, 12 Aug 2023 07:06:08 +1200
> Subject: [PATCH] De-pessimize ConditionVariableCancelSleep().
>
> Commit b91dd9de was concerned with a theoretical problem with our
> non-atomic condition variable operations.  If you stop sleeping, and
> then cancel the sleep in a separate step, you might be signaled in
> between, and that could be lost.  That doesn't matter for callers of
> ConditionVariableBroadcast(), but callers of ConditionVariableSignal()
> might be upset if a signal went missing like this.

FWIW I suspect at least some of the places that'd have a problem without that
forwarding, might also be racy with it....


> New idea: ConditionVariableCancelSleep() can just return true if you've
> been signaled.  Hypothetical users of ConditionVariableSignal() would
> then still have a way to deal with rare lost signals if they are
> concerned about that problem.

Sounds like a plan to me.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: AssertLog instead of Assert in some places
Next
From: Peter Geoghegan
Date:
Subject: Re: run pgindent on a regular basis / scripted manner