Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept - Mailing list pgsql-hackers
From: Ivan Kartyshov
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Msg-id: 4ae82b354af9a20624e1e1d05932087e@postgrespro.ru
In response to: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
List: pgsql-hackers
Hello, thank you for your review.

Alexander Korotkov wrote on 2018-06-20 20:54:
> Thinking about that more, I found that adding a vacuum mark as an extra
> argument to LockAcquireExtended is also wrong. It would still be hard
> to determine whether we should log the lock in LogStandbySnapshot(). We
> would have to add that flag to the shared memory lock table, and that
> looks like a kludge.
>
> Therefore, I'd like to propose another approach: introduce a new lock
> level. AccessExclusiveLock would be split in two:
> AccessExclusiveLocalLock and AccessExclusiveLock. Unlike
> AccessExclusiveLock, AccessExclusiveLocalLock would not be logged to
> the standby, and would be used for heap truncation.
>
> I expect some resistance to my proposal, because fixing this "small
> bug" doesn't deserve a new lock level. But the current behavior of
> hot_standby_feedback is buggy. From the user's perspective,
> hot_standby_feedback simply doesn't work: it causes the master to
> bloat without protecting standby queries from cancellation. We need to
> fix that, but I don't see any proper way to do so except adding a new
> lock level...

Your proposal is very interesting; it made the patch smaller and cleaner. I have updated the patch and made changes in accordance with the proposed concept of a special AccessExclusiveLocalLock.

> I don't yet understand what the problem is here and why this patch
> should solve it. As I understand this patch, GetOldestXmin() will
> initialize the xmin of the walsender with the lowest xmin of a
> replication slot. But the xmin of replication slots will be taken into
> account anyway by GetSnapshotData() called by vacuum.
>
>> How to test:
>> Make an async replica, turn on feedback, reduce
>> max_standby_streaming_delay,
>> and make autovacuum aggressive.
>
> You forgot to mention that one should set up the replication using a
> replication slot. Naturally, if a replication slot doesn't exist, the
> master shouldn't keep dead tuples for a disconnected standby,
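As an aside, the replication-slot setup Korotkov refers to can be reproduced with the standard commands; this is a hedged sketch, and the slot name 'standby_slot' is an arbitrary example:

```sql
-- On the master: create a physical replication slot for the standby.
SELECT pg_create_physical_replication_slot('standby_slot');

-- On the standby, point the walreceiver at that slot
-- (recovery.conf on older releases):
--   primary_slot_name = 'standby_slot'

-- With the slot in place, the master retains the xmin reported via
-- hot_standby_feedback even while the standby is disconnected:
SELECT slot_name, active, xmin FROM pg_replication_slots;
```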
> because the master doesn't know whether the standby will reconnect or
> is gone forever.

I tried to make hot_standby_feedback work without replication slots, and found a small window in which vacuum on the master can truncate heap pages that the standby needs for a just-started transaction.

How to test:
Make an async replica, turn on feedback, and reduce max_standby_streaming_delay. Make autovacuum more aggressive:

    autovacuum = on
    autovacuum_max_workers = 1
    autovacuum_naptime = 1s
    autovacuum_vacuum_threshold = 1
    autovacuum_vacuum_cost_delay = 0

Test1:
Here we put load on the master and simulate a long transaction with repeated 1-second seqscans on the replica (by calling pg_sleep with a 1-second duration every 6 seconds).

Replica settings:

    hot_standby = on
    max_standby_streaming_delay = 1s
    hot_standby_feedback = on

On the master:

    CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
    pgbench -T600 -P2 -n --file=master.sql postgres
    (master.sql: update test set value = value;)

On the replica:

    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SELECT pg_sleep(value) FROM test;
    \watch 6

Autovacuum truncates the pages at the end. Result on the replica:

    FATAL: terminating connection due to conflict with recovery
    DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passes without fatal errors. Occasionally an error still occurs, because this test is too synthetic:

    ERROR: canceling statement due to conflict with recovery
    DETAIL: User was holding shared buffer pin for too long.

This comes from ResolveRecoveryConflictWithSnapshot being raised while redoing some visibility flags; to avoid this conflict we can use Test2 or increase max_standby_streaming_delay.
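As a side note (not part of the original recipe), the kind of recovery conflict the replica is hitting in tests like these can be checked from the standard statistics views:

```sql
-- On the replica: counts of queries canceled due to recovery
-- conflicts, broken down by conflict type.
SELECT datname, confl_lock, confl_snapshot, confl_bufferpin
FROM pg_stat_database_conflicts
WHERE datname = 'postgres';

-- Also on the replica: whether the startup process has applied an
-- AccessExclusiveLock on the test table (the lock that, unpatched,
-- is shipped to the standby by heap truncation).
SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'test'::regclass;
```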
Test2:
Here we put load on the master and simulate a long transaction on the replica (by taking a lock on the table). Replica settings are the same as in Test1:

    hot_standby = on
    max_standby_streaming_delay = 1s
    hot_standby_feedback = on

On the master:

    CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
    pgbench -T600 -P2 -n --file=master.sql postgres
    (master.sql: update test set value = value;)

On the replica:

    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    LOCK TABLE test IN ACCESS SHARE MODE;
    select * from test;
    \watch 6

Autovacuum truncates the pages at the end. Result on the replica:

    FATAL: terminating connection due to conflict with recovery
    DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passes without fatal errors.

I would like to hear your opinion on this implementation.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment