Re: old_snapshot_threshold bottleneck on replica - Mailing list pgsql-hackers
From:           Pavel Borisov
Subject:        Re: old_snapshot_threshold bottleneck on replica
Date:
Msg-id:         CALT9ZEEwfZkBC8LHddkigJC_MUQtKsxUNN-veVDVWmmBvKn_oQ@mail.gmail.com
In response to: old_snapshot_threshold bottleneck on replica (Maxim Orlov <orlovmg@gmail.com>)
List:           pgsql-hackers
Hi, Maxim!

On Mon, 23 Jan 2023 at 18:40, Maxim Orlov <orlovmg@gmail.com> wrote:
>
> Hi!
>
> One of our customers stumbled onto a significant performance degradation
> while running multiple OLAP-like queries on a replica. After some
> investigation, it became clear that the problem is in accessing the
> old_snapshot_threshold parameter.
>
> Access to the old_snapshot_threshold parameter is guarded by
> mutex_threshold. This is not a problem on the primary server, since we
> rarely call GetOldSnapshotThresholdTimestamp:
>
> void
> TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
> {
>     if (RelationAllowsEarlyPruning(relation)
>         && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
>         ereport(ERROR,
>                 (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
>                  errmsg("snapshot too old")));
>
> But on a replica, we have to call GetOldSnapshotThresholdTimestamp much
> more often, so this becomes a bottleneck. The customer solved the issue
> by setting old_snapshot_threshold to 0, but I think we can do something
> about it.
>
> Some more investigation:
>
> -- On primary --
> $ ./bin/psql postgres -c "create database benchmark"
> CREATE DATABASE
> $ ./bin/pgbench -i -Uorlov -s300 benchmark
> dropping old tables...
> NOTICE:  table "pgbench_accounts" does not exist, skipping
> ...
> creating tables...
> generating data (client-side)...
> 30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s).
>
> -- On secondary --
> $ touch 1.sql
> $ vim 1.sql
> $ cat 1.sql
> \set bid random(1, 300)
> BEGIN;
> SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid;
> END;
> $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark
> pgbench (16devel)
> progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> ...
> progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>
> $ perf record -F 99 -a -g --call-graph=dwarf sleep 5
> $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file
> $ grep s_lock file | wc -l
> 3486
>
> My proposal is to use atomics for threshold_timestamp and threshold_xid.
> PFA the 0001 patch. With patch 0001 we get:
>
> $ grep s_lock file2 | wc -l
> 8
>
> Maybe we should go further and remove mutex_threshold here? This would
> lead to inconsistency between threshold_timestamp and threshold_xid, but
> is that really a problem?
>
> Thoughts?

I think optimizing locking and switching to atomics wherever it improves
performance is a good direction. If the performance improvement could be
demonstrated in a more direct way, it would be a good argument for
committing the improvement. Personally, I like TPS plots like those in [1].

[1] https://www.postgresql.org/message-id/CALT9ZEHSX1Hpz5xjDA62yHAHtpinkA6hg8Zt-odyxqppmKbQFA%40mail.gmail.com

Kind regards,
Pavel Borisov,
Supabase
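For illustration, here is a minimal sketch of the atomics direction discussed
above; it is not the actual 0001 patch. It assumes the struct and symbol names
visible in snapmgr.c (OldSnapshotControlData, oldSnapshotControl,
SetOldSnapshotThresholdTimestamp), and the real patch may differ in detail.

/*
 * Sketch only: replace the spinlock-guarded fields with PostgreSQL
 * atomics from port/atomics.h.  TimestampTz is a 64-bit integer and
 * TransactionId is 32-bit, so both fit native atomic types.
 */
#include "postgres.h"
#include "port/atomics.h"
#include "utils/timestamp.h"

typedef struct OldSnapshotControlData
{
    /* ... other members elided ... */
    pg_atomic_uint64    threshold_timestamp;    /* was TimestampTz + mutex_threshold */
    pg_atomic_uint32    threshold_xid;          /* was TransactionId + mutex_threshold */
} OldSnapshotControlData;

static volatile OldSnapshotControlData *oldSnapshotControl;

/*
 * Reader side -- the hot path on a replica.  A single atomic read
 * replaces the SpinLockAcquire/SpinLockRelease pair that showed up
 * as s_lock samples in the perf profile above.
 */
TimestampTz
GetOldSnapshotThresholdTimestamp(void)
{
    return (TimestampTz)
        pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);
}

/*
 * Writer side -- updates are comparatively rare, so two plain atomic
 * writes are cheap.  Without the mutex, a concurrent reader can
 * observe the two fields out of sync, which is exactly the open
 * question at the end of the quoted mail.
 */
void
SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit)
{
    pg_atomic_write_u64(&oldSnapshotControl->threshold_timestamp, (uint64) ts);
    pg_atomic_write_u32(&oldSnapshotControl->threshold_xid, (uint32) xlimit);
}

A caller that needs a consistent (timestamp, xid) pair would still have to
take a lock or retry on a version counter; for the single-field read in
TestForOldSnapshot_impl, the atomic read alone is sufficient.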