Re: walsender performance regression due to logical decoding on standby changes - Mailing list pgsql-hackers
From: Bharath Rupireddy
Subject: Re: walsender performance regression due to logical decoding on standby changes
Date:
Msg-id: CALj2ACVZyh++dgq5REgmO5-wNtZt=tN4zv=kMqPEARJ4zOpfVg@mail.gmail.com
In response to: walsender performance regression due to logical decoding on standby changes (Andres Freund <andres@anarazel.de>)
Responses:
  Re: walsender performance regression due to logical decoding on standby changes
  RE: walsender performance regression due to logical decoding on standby changes
List: pgsql-hackers
On Wed, May 10, 2023 at 12:33 AM Andres Freund <andres@anarazel.de> wrote:
>
> Unfortunately I have found the following commit to have caused a performance
> regression:
>
> commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
>
> The problem is that, on a standby, after the change - as needed for the
> approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens for
> every record, instead of only happening when the timeline is changed (or WAL
> is flushed or ...).
>
> WalSndWakeup() iterates over all walsender slots, regardless of whether they
> are in use. For each of the walsender slots it acquires a spinlock.
>
> When replaying a lot of small-ish WAL records I found the startup process to
> spend the majority of the time in WalSndWakeup(). I've not measured it very
> precisely yet, but the overhead is significant (~35% slowdown), even with the
> default max_wal_senders. If that's increased substantially, it obviously gets
> worse.

I tried it with a simple primary -> standby1 -> standby2 setup. I ran a pgbench script [1] on the primary and counted the number of times WalSndWakeup() gets called from ApplyWalRecord(), and the number of times the spinlock is acquired/released in WalSndWakeup().
It's a whopping 21 million spinlock acquisitions/releases on standby 1 and standby 2 for just a < 5 min pgbench run on the primary:

standby 1:
2023-05-10 05:32:43.249 UTC [1595600] LOG: FOO WalSndWakeup() in ApplyWalRecord() was called 2176352 times
2023-05-10 05:32:43.249 UTC [1595600] LOG: FOO spinlock acquisition/release count in WalSndWakeup() is 21763530

standby 2:
2023-05-10 05:32:43.249 UTC [1595625] LOG: FOO WalSndWakeup() in ApplyWalRecord() was called 2176352 times
2023-05-10 05:32:43.249 UTC [1595625] LOG: FOO spinlock acquisition/release count in WalSndWakeup() is 21763530

(21,763,530 is consistent with roughly 10 spinlock cycles per call, i.e. one per slot at the default max_wal_senders = 10.)

In this case there is no timeline switch, no logical decoding on the standby, or the like, but WalSndWakeup() still gets called because the standby can't tell whether a slot is for logical or physical replication without acquiring its spinlock. Before e101dfac3a, WalSndWakeup() was called only when there was a timeline switch.

> The only saving grace is that this is not an issue on the primary.

Yeah.

> I don't think the approach of not having any sort of "registry" of whether
> anybody is waiting for the replay position to be updated is
> feasible. Iterating over all walsenders slots is just too expensive -
> WalSndWakeup() shows up even if I remove the spinlock (which we likely could,
> especially when just checking if the walsender is connected).

Right.

> My current guess is that mis-using a condition variable is the best bet. I
> think it should work to use ConditionVariablePrepareToSleep() before a
> WalSndWait(), and then ConditionVariableCancelSleep(). I.e. to never use
> ConditionVariableSleep(). The latch set from ConditionVariableBroadcast()
> would still cause the necessary wakeup.

How about something like the attached? The recovery and subscription tests don't complain with the patch.
[1]
./pg_ctl -D data -l logfile stop
./pg_ctl -D sbdata1 -l logfile1 stop
./pg_ctl -D sbdata2 -l logfile2 stop
cd /home/ubuntu/postgres
rm -rf inst
make distclean
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j 8 install > install.log 2>&1 &

# primary -> standby1 -> standby2
free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m
cd inst/bin
./initdb -D data
mkdir archived_wal
cat << EOF >> data/postgresql.conf
shared_buffers = '8GB'
wal_buffers = '1GB'
max_wal_size = '16GB'
max_connections = '5000'
archive_mode = 'on'
archive_command='cp %p /home/ubuntu/postgres/inst/bin/archived_wal/%f'
EOF
./pg_ctl -D data -l logfile start
./pg_basebackup -D sbdata1
./psql -c "select pg_create_physical_replication_slot('sb1_slot', true, false)" postgres
cat << EOF >> sbdata1/postgresql.conf
port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu application_name=sb1'
primary_slot_name='sb1_slot'
restore_command='cp /home/ubuntu/postgres/inst/bin/archived_wal/%f %p'
EOF
touch sbdata1/standby.signal
./pg_ctl -D sbdata1 -l logfile1 start
./pg_basebackup -D sbdata2 -p 5433
./psql -p 5433 -c "select pg_create_physical_replication_slot('sb2_slot', true, false)" postgres
cat << EOF >> sbdata2/postgresql.conf
port=5434
primary_conninfo='host=localhost port=5433 dbname=postgres user=ubuntu application_name=sb2'
primary_slot_name='sb2_slot'
restore_command='cp /home/ubuntu/postgres/inst/bin/archived_wal/%f %p'
EOF
touch sbdata2/standby.signal
./pg_ctl -D sbdata2 -l logfile2 start
./psql -c "select pid, backend_type from pg_stat_activity where backend_type in ('walreceiver', 'walsender');" postgres
./psql -p 5433 -c "select pid, backend_type from pg_stat_activity where backend_type in ('walreceiver', 'walsender');" postgres
./psql -p 5434 -c "select pid, backend_type from pg_stat_activity where backend_type in ('walreceiver', 'walsender');" postgres
ulimit -S -n 5000
./pgbench --initialize --scale=300 postgres
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do
    echo -n "$c "
    ./pgbench -n -M prepared -U ubuntu postgres -b tpcb-like -c$c -j$c -T5 2>&1 | grep '^tps' | awk '{print $3}'
done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com