Re: walsender performance regression due to logical decoding on standby changes - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: walsender performance regression due to logical decoding on standby changes
Date
Msg-id CALj2ACVZyh++dgq5REgmO5-wNtZt=tN4zv=kMqPEARJ4zOpfVg@mail.gmail.com
In response to walsender performance regression due to logical decoding on standby changes  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, May 10, 2023 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
>
> Unfortunately I have found the following commit to have caused a performance
> regression:
>
> commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
>
> The problem is that, on a standby, after the change - as needed for the
> approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens for
> every record, instead of only happening when the timeline is changed (or WAL
> is flushed or ...).
>
> WalSndWakeup() iterates over all walsender slots, regardless of whether in
> use. For each of the walsender slots it acquires a spinlock.
>
> When replaying a lot of small-ish WAL records I found the startup process to
> spend the majority of the time in WalSndWakeup(). I've not measured it very
> precisely yet, but the overhead is significant (~35% slowdown), even with the
> default max_wal_senders. If that's increased substantially, it obviously gets
> worse.

I reproduced this with a simple primary -> standby1 -> standby2 setup. I ran
a pgbench script [1] on the primary and counted the number of times
WalSndWakeup() gets called from ApplyWalRecord(), and the number of times
the spinlock is acquired/released in WalSndWakeup(). The spinlock is
acquired/released a whopping ~21.7 million times on each of standby 1 and
standby 2 during less than 5 minutes of pgbench on the primary:

standby 1:
2023-05-10 05:32:43.249 UTC [1595600] LOG:  FOO WalSndWakeup() in
ApplyWalRecord() was called 2176352 times
2023-05-10 05:32:43.249 UTC [1595600] LOG:  FOO spinlock
acquisition/release count in WalSndWakeup() is 21763530

standby 2:
2023-05-10 05:32:43.249 UTC [1595625] LOG:  FOO WalSndWakeup() in
ApplyWalRecord() was called 2176352 times
2023-05-10 05:32:43.249 UTC [1595625] LOG:  FOO spinlock
acquisition/release count in WalSndWakeup() is 21763530
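
(The FOO lines come from throwaway debug instrumentation; a hypothetical
sketch of what is enough to produce such counts - counter names invented
here - is below. Both counters live in the startup process, one bumped per
WalSndWakeup() call from ApplyWalRecord() and one bumped per slot visited
inside WalSndWakeup(), dumped with elog() on shutdown.)

/* Hypothetical throwaway counters, both living in the startup process */
static uint64 wakeup_calls = 0;   /* ++ once per WalSndWakeup() call from ApplyWalRecord() */
static uint64 spinlock_ops = 0;   /* ++ once per slot visited inside WalSndWakeup() */

/* ... dumped on shutdown: */
elog(LOG, "FOO WalSndWakeup() in ApplyWalRecord() was called " UINT64_FORMAT " times", wakeup_calls);
elog(LOG, "FOO spinlock acquisition/release count in WalSndWakeup() is " UINT64_FORMAT, spinlock_ops);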

In this case there is no timeline switch and no logical decoding on the
standby, yet WalSndWakeup() is called for every applied record, and the
standby can't tell whether a walsender slot is for logical or physical
replication without acquiring that slot's spinlock - hence roughly 10
spinlock acquisitions per call, matching the default max_wal_senders of 10.
Before e101dfac3a, WalSndWakeup() was called only when there was a
timeline switch.
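
To make the cost concrete, here is a simplified sketch of the shape of that
wakeup loop (illustration only, not the exact source; field and constant
names are approximate):

void
WalSndWakeup(bool physical, bool logical)
{
    for (int i = 0; i < max_wal_senders; i++)
    {
        WalSnd     *walsnd = &WalSndCtl->walsnds[i];
        Latch      *latch;
        ReplicationKind kind;

        /*
         * The slot's latch pointer and replication kind are read under the
         * per-walsender spinlock, so the lock is paid for even when the
         * slot is not in use at all.
         */
        SpinLockAcquire(&walsnd->mutex);
        latch = walsnd->latch;
        kind = walsnd->kind;
        SpinLockRelease(&walsnd->mutex);

        if (latch == NULL)
            continue;           /* slot not in use */

        if ((physical && kind == REPLICATION_KIND_PHYSICAL) ||
            (logical && kind == REPLICATION_KIND_LOGICAL))
            SetLatch(latch);
    }
}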

> The only saving grace is that this is not an issue on the primary.

Yeah.

> I don't think the approach of not having any sort of "registry" of whether
> anybody is waiting for the replay position to be updated is
> feasible. Iterating over all walsenders slots is just too expensive -
> WalSndWakeup() shows up even if I remove the spinlock (which we likely could,
> especially when just checking if the walsender is connected).

Right.

> My current guess is that mis-using a condition variable is the best bet. I
> think it should work to use ConditionVariablePrepareToSleep() before a
> WalSndWait(), and then ConditionVariableCancelSleep(). I.e. to never use
> ConditionVariableSleep(). The latch set from ConditionVariableBroadcast()
> would still cause the necessary wakeup.

How about something like the attached? Recovery and subscription tests
don't complain with the patch.
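
For the archives, the shape of the idea is roughly the following (a sketch
only; it assumes a new ConditionVariable in WalSndCtlData, here called
wal_replay_cv - the actual names are in the attached patch and may differ):

/* Walsender side: register on the CV before waiting, deregister after.
 * ConditionVariableSleep() is never called; WalSndWait() keeps waiting on
 * the process latch, and the SetLatch() done by
 * ConditionVariableBroadcast() provides the wakeup. */
ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv);
WalSndWait(wakeEvents, sleeptime, wait_event);
ConditionVariableCancelSleep();

/* Startup process side, once a record has been applied: wake only the
 * walsenders that registered themselves, instead of iterating over every
 * walsender slot and taking its spinlock. */
ConditionVariableBroadcast(&WalSndCtl->wal_replay_cv);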

[1]
./pg_ctl -D data -l logfile stop
./pg_ctl -D sbdata1 -l logfile1 stop
./pg_ctl -D sbdata2 -l logfile2 stop

cd /home/ubuntu/postgres
rm -rf inst
make distclean
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j 8 install > install.log 2>&1 &

primary -> standby1 -> standby2

free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m

cd inst/bin
./initdb -D data
mkdir archived_wal

cat << EOF >> data/postgresql.conf
shared_buffers = '8GB'
wal_buffers = '1GB'
max_wal_size = '16GB'
max_connections = '5000'
archive_mode = 'on'
archive_command='cp %p /home/ubuntu/postgres/inst/bin/archived_wal/%f'
EOF

./pg_ctl -D data -l logfile start
./pg_basebackup -D sbdata1
./psql -c "select pg_create_physical_replication_slot('sb1_slot', true, false)" postgres

cat << EOF >> sbdata1/postgresql.conf
port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu application_name=sb1'
primary_slot_name='sb1_slot'
restore_command='cp /home/ubuntu/postgres/inst/bin/archived_wal/%f %p'
EOF

touch sbdata1/standby.signal

./pg_ctl -D sbdata1 -l logfile1 start
./pg_basebackup -D sbdata2 -p 5433
./psql -p 5433 -c "select pg_create_physical_replication_slot('sb2_slot', true, false)" postgres

cat << EOF >> sbdata2/postgresql.conf
port=5434
primary_conninfo='host=localhost port=5433 dbname=postgres user=ubuntu application_name=sb2'
primary_slot_name='sb2_slot'
restore_command='cp /home/ubuntu/postgres/inst/bin/archived_wal/%f %p'
EOF

touch sbdata2/standby.signal

./pg_ctl -D sbdata2 -l logfile2 start
./psql -c "select pid, backend_type from pg_stat_activity where backend_type in ('walreceiver', 'walsender');" postgres
./psql -p 5433 -c "select pid, backend_type from pg_stat_activity where backend_type in ('walreceiver', 'walsender');" postgres
./psql -p 5434 -c "select pid, backend_type from pg_stat_activity where backend_type in ('walreceiver', 'walsender');" postgres

ulimit -S -n 5000
./pgbench --initialize --scale=300 postgres
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n "$c "; ./pgbench -n -M prepared -U ubuntu postgres -b tpcb-like -c$c -j$c -T5 2>&1 | grep '^tps' | awk '{print $3}'; done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
