Re: walsender performance regression due to logical decoding on standby changes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: walsender performance regression due to logical decoding on standby changes
Date
Msg-id 20230517195315.yavg4gwzk6pp6zuo@awork3.anarazel.de
Whole thread Raw
In response to Re: walsender performance regression due to logical decoding on standby changes  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: walsender performance regression due to logical decoding on standby changes
Re: walsender performance regression due to logical decoding on standby changes
List pgsql-hackers
Hi,

Thanks for working on the patch!

On 2023-05-15 20:09:00 +0530, Bharath Rupireddy wrote:
> > [1]
> > max_wal_senders = 100
> > before regression(ms)                after regression(ms)    v2 patch(ms)
> > 13394.4013                          14141.2615              13455.2543
> > Compared with before regression     5.58%                   0.45%
> >
> > max_wal_senders = 200
> > before regression(ms)                after regression(ms)     v2 patch(ms)
> > 13280.8507                          14597.1173              13632.0606
> > Compared with before regression     9.91%                   1.64%
> >
> > max_wal_senders = 300
> > before regression(ms)                after regression(ms)     v2 patch(ms)
> > 13535.0232                          16735.7379              13705.7135
> > Compared with before regression     23.65%                  1.26%
> 
> Yes, the numbers with v2 patch look close to where we were before.
> Thanks for confirming. Just wondering, where does this extra
> 0.45%/1.64%/1.26% coming from?

We still do more work for each WAL record than before, so I'd expect something
small. I'd say right now the main overhead with the patch comes from the
spinlock acquisitions in ConditionVariableBroadcast(), which happen even when
nobody is waiting.

I'll try to come up with a benchmark without the issues I pointed out in
https://postgr.es/m/20230517194331.ficfy5brpfq5lrmz%40awork3.anarazel.de


> +        ConditionVariableInit(&WalSndCtl->physicalWALSndCV);
> +        ConditionVariableInit(&WalSndCtl->logicalWALSndCV);

It's not obvious to me that it's worth having two CVs, because it's more
expensive to find no waiters in two CVs than to find no waiters in one CV.


> +    /*
> +     * We use condition variable (CV) to efficiently wake up walsenders in
> +     * WalSndWakeup().
> +     *
> +     * Every walsender prepares to sleep on a shared memory CV. Note that it
> +     * just prepares to sleep on the CV (i.e., adds itself to the CV's
> +     * waitlist), but not actually waits on the CV (IOW, it never calls
> +     * ConditionVariableSleep()). It still uses WaitEventSetWait() for waiting,
> +     * because CV infrastructure doesn't handle FeBe socket events currently.
> +     * The processes (startup process, walreceiver etc.) wanting to wake up
> +     * walsenders use ConditionVariableBroadcast(), which in turn calls
> +     * SetLatch(), helping walsenders come out of WaitEventSetWait().
> +     *
> +     * This approach is simple and efficient because, one doesn't have to loop
> +     * through all the walsenders slots, with a spinlock acquisition and
> +     * release for every iteration, just to wake up only the waiting
> +     * walsenders. It makes WalSndWakeup() callers' life easy.
> +     *
> +     * XXX: When available, WaitEventSetWait() can be replaced with its CV's
> +     * counterpart.

I don't really understand that XXX - the potential bright future would be to
add support for CVs into wait event sets, not to replace WES with a CV?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PostgreSQL 16 Beta 1 release date
Next
From: Matthias van de Meent
Date:
Subject: Inconsistent behavior with locale definition in initdb/pg_ctl init