Thread: Test 031_recovery_conflict.pl is not immune to autovacuum
Hello hackers,

Among the many recoveryCheck (more concretely, 027_stream_regress) failures that occurred on a number of buildfarm animals after switching to meson, which can be explained by timeouts, I saw a different failure on adder:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-03-18%2023%3A43%3A00

[23:48:52.521](9.831s) ok 13 - startup deadlock: cursor holding conflicting pin, also waiting for lock, established
[23:55:13.749](381.228s) # poll_query_until timed out executing this query:
#
# SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
#
# expecting this output:
# waiting
# last actual query output:
#
# with stderr:
[23:55:13.763](0.013s) not ok 14 - startup deadlock: lock acquisition is waiting

and I suspect that it might be caused by autovacuum.

I've managed to reproduce it locally (running 10 tests in parallel on a 2-core VM with disk bandwidth limited to 80 MB/sec, I get failures on iterations 10, 1, 3) and observed the following (with wal_debug = on):

031_recovery_conflict_standby.log:
2024-03-20 04:12:06.519 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT a FROM test_recovery_conflict_table1;
2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: FETCH FORWARD FROM test_recovery_conflict_cursor;
2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: SELECT * FROM test_recovery_conflict_table2;
...
2024-03-20 04:12:07.073 UTC|||65fa620d.111ec8|LOG: REDO @ 0/3438360; LSN 0/3438460: prev 0/3438338; xid 0; len 9; blkref #0: rel 1663/16385/16392, blk 0 - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|LOG: recovery still waiting after 11.241 ms: recovery conflict on buffer pin
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|CONTEXT: WAL redo at 0/3438360 for Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]; blkref #0: rel 1663/16385/16392, blk 0
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|ERROR: canceling statement due to conflict with recovery at character 15
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|DETAIL: User transaction caused buffer deadlock with recovery.
...
2024-03-20 04:12:08.093 UTC|vagrant|postgres|65fa6216.111f1a|LOG: statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;

031_recovery_conflict_primary.log:
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG: Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG: Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|LOG: INSERT @ 0/3438460: - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
...
2024-03-20 04:12:05.981 UTC|||65fa6215.111f02|LOG: automatic vacuum of table "test_db.public.test_recovery_conflict_table1": index scans: 0

The corresponding fragment of 031_recovery_conflict.pl:

$res = $psql_standby->query_until(
	qr/^1$/m, qq[
BEGIN;
-- hold pin
DECLARE $cursor1 CURSOR FOR SELECT a FROM $table1;
FETCH FORWARD FROM $cursor1;
-- wait for lock held by prepared transaction
SELECT * FROM $table2;
	]);
ok(1,
	"$sect: cursor holding conflicting pin, also waiting for lock, established"
);

# just to make sure we're waiting for lock already
ok( $node_standby->poll_query_until(
		'postgres', qq[
SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
], 'waiting'),
	"$sect: lock acquisition is waiting");

# VACUUM FREEZE will prune away rows, causing a buffer pin conflict, while
# standby psql is waiting on lock
$node_primary->safe_psql($test_db, qq[VACUUM FREEZE $table1;]);

So if autovacuum happens to process "$table1" before SELECT ... FROM pg_locks, a buffer pin conflict occurs before the manual VACUUM FREEZE and poll_query_until() fails.

With autovacuum = off in TEMP_CONFIG, 50 iterations passed for me in the same environment.

Best regards,
Alexander
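A minimal sketch of the same experiment done inside the test itself, rather than via TEMP_CONFIG, assuming the standard PostgreSQL::Test::Cluster helpers and the test's existing $node_primary handle; this is only a reproduction aid, not the fix discussed below:

# Sketch only: turn autovacuum off cluster-wide on the primary, mirroring the
# "autovacuum = off in TEMP_CONFIG" experiment; autovacuum is a sighup-level
# GUC, so a reload is enough.
$node_primary->append_conf('postgresql.conf', 'autovacuum = off');
$node_primary->reload;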
On Wed, Mar 20, 2024 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello hackers,
>
> Among the many recoveryCheck (more concretely, 027_stream_regress) failures that occurred on a number of buildfarm animals after switching to meson, which can be explained by timeouts, I saw a different failure on adder:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-03-18%2023%3A43%3A00
>
> [23:48:52.521](9.831s) ok 13 - startup deadlock: cursor holding conflicting pin, also waiting for lock, established
> [23:55:13.749](381.228s) # poll_query_until timed out executing this query:
> #
> # SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> #
> # expecting this output:
> # waiting
> # last actual query output:
> #
> # with stderr:
> [23:55:13.763](0.013s) not ok 14 - startup deadlock: lock acquisition is waiting
>
> and I suspect that it might be caused by autovacuum.
>
> I've managed to reproduce it locally (running 10 tests in parallel on a 2-core VM with disk bandwidth limited to 80 MB/sec, I get failures on iterations 10, 1, 3) and observed the following (with wal_debug = on):
>
> 031_recovery_conflict_standby.log:
> 2024-03-20 04:12:06.519 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT a FROM test_recovery_conflict_table1;
> 2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: FETCH FORWARD FROM test_recovery_conflict_cursor;
> 2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: SELECT * FROM test_recovery_conflict_table2;
> ...
> 2024-03-20 04:12:07.073 UTC|||65fa620d.111ec8|LOG: REDO @ 0/3438360; LSN 0/3438460: prev 0/3438338; xid 0; len 9; blkref #0: rel 1663/16385/16392, blk 0 - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
> 2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|LOG: recovery still waiting after 11.241 ms: recovery conflict on buffer pin
> 2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|CONTEXT: WAL redo at 0/3438360 for Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]; blkref #0: rel 1663/16385/16392, blk 0
> 2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|ERROR: canceling statement due to conflict with recovery at character 15
> 2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|DETAIL: User transaction caused buffer deadlock with recovery.
> ...
> 2024-03-20 04:12:08.093 UTC|vagrant|postgres|65fa6216.111f1a|LOG: statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
>
> 031_recovery_conflict_primary.log:
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG: Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG: Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no)
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|LOG: INSERT @ 0/3438460: - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
> ...
> 2024-03-20 04:12:05.981 UTC|||65fa6215.111f02|LOG: automatic vacuum of table "test_db.public.test_recovery_conflict_table1": index scans: 0
>
> The corresponding fragment of 031_recovery_conflict.pl:
>
> $res = $psql_standby->query_until(
> 	qr/^1$/m, qq[
> BEGIN;
> -- hold pin
> DECLARE $cursor1 CURSOR FOR SELECT a FROM $table1;
> FETCH FORWARD FROM $cursor1;
> -- wait for lock held by prepared transaction
> SELECT * FROM $table2;
> 	]);
> ok(1,
> 	"$sect: cursor holding conflicting pin, also waiting for lock, established"
> );
>
> # just to make sure we're waiting for lock already
> ok( $node_standby->poll_query_until(
> 		'postgres', qq[
> SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> ], 'waiting'),
> 	"$sect: lock acquisition is waiting");
>
> # VACUUM FREEZE will prune away rows, causing a buffer pin conflict, while
> # standby psql is waiting on lock
> $node_primary->safe_psql($test_db, qq[VACUUM FREEZE $table1;]);
>
> So if autovacuum happens to process "$table1" before SELECT ... FROM pg_locks, a buffer pin conflict occurs before the manual VACUUM FREEZE and poll_query_until() fails.
>
> With autovacuum = off in TEMP_CONFIG, 50 iterations passed for me in the same environment.

Hmmm. Thanks for finding this and taking the time to reproduce it. I don't know why I didn't think of this.

Seems like we could just add autovacuum_enabled=false to the table like this:

diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823fd..65bc858c02d 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
 my $table2 = "test_recovery_conflict_table2";
 $node_primary->safe_psql(
 	$test_db, qq[
-CREATE TABLE ${table1}(a int, b int);
+CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
 INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
 CREATE TABLE ${table2}(a int, b int);
 ]);

- Melanie
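The same storage parameter can also be applied to an already-created table; a minimal sketch using the test's existing handles, in case adjusting the CREATE TABLE is inconvenient (a hypothetical alternative, not part of the posted diff):

# Sketch only: set the reloption after creation instead of in CREATE TABLE;
# this is the ALTER TABLE ... SET form of the same autovacuum_enabled option.
$node_primary->safe_psql($test_db,
	qq[ALTER TABLE $table1 SET (autovacuum_enabled = false);]);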
Hello Melanie,

20.03.2024 16:15, Melanie Plageman wrote:
> Seems like we could just add autovacuum_enabled=false to the table like this:
> diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
> index d87efa823fd..65bc858c02d 100644
> --- a/src/test/recovery/t/031_recovery_conflict.pl
> +++ b/src/test/recovery/t/031_recovery_conflict.pl
> @@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
>  my $table2 = "test_recovery_conflict_table2";
>  $node_primary->safe_psql(
>  	$test_db, qq[
> -CREATE TABLE ${table1}(a int, b int);
> +CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
>  INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
>  CREATE TABLE ${table2}(a int, b int);
>  ]);

Thanks for paying attention to it!

With that modification applied, I've got another failure (on iteration 2):

[13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
done
timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318.
# Postmaster PID for node "primary" is 1523036
### Stopping node "primary" using mode immediate

031_recovery_conflict_standby.log really doesn't contain the expected message. I can share log files from a successful and a failed test run, if they would be helpful, or I'll investigate this case today/tomorrow.

Best regards,
Alexander
On Wed, Mar 20, 2024 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello Melanie,
>
> 20.03.2024 16:15, Melanie Plageman wrote:
> > Seems like we could just add autovacuum_enabled=false to the table like this:
> > diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
> > index d87efa823fd..65bc858c02d 100644
> > --- a/src/test/recovery/t/031_recovery_conflict.pl
> > +++ b/src/test/recovery/t/031_recovery_conflict.pl
> > @@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
> >  my $table2 = "test_recovery_conflict_table2";
> >  $node_primary->safe_psql(
> >  	$test_db, qq[
> > -CREATE TABLE ${table1}(a int, b int);
> > +CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
> >  INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
> >  CREATE TABLE ${table2}(a int, b int);
> >  ]);
>
> Thanks for paying attention to it!
>
> With that modification applied, I've got another failure (on iteration 2):

Thanks for trying it out!

> [13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
> Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
> done
> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318.
> # Postmaster PID for node "primary" is 1523036
> ### Stopping node "primary" using mode immediate
>
> 031_recovery_conflict_standby.log really doesn't contain the expected message. I can share log files from a successful and a failed test run, if they would be helpful, or I'll investigate this case today/tomorrow.

Hmm. The log file from the failed test run with (autovacuum_enabled=false) would be helpful. I can't tell without the log if it hit a different type of conflict. Unfortunately, it was very difficult to trigger the specific type of recovery conflict we were trying to test without hitting one of the other recovery conflict types first. It'll take me some time to swap this back in my head, though.

- Melanie
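One way to narrow down which conflict type was actually hit, without digging through the server logs, is to read the standby's cumulative conflict counters; a minimal sketch, assuming the standard pg_stat_database_conflicts view and the test's existing $node_standby handle:

# Sketch only: dump the standby's per-database recovery-conflict counters;
# they show whether cancellations were counted as lock, snapshot, buffer-pin,
# or deadlock conflicts.
my $counters = $node_standby->safe_psql(
	$test_db,
	qq[SELECT confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
	   FROM pg_stat_database_conflicts
	   WHERE datname = current_database();]);
note("standby recovery-conflict counters: $counters");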
20.03.2024 22:24, Melanie Plageman wrote:
>> [13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
>> Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
>> done
>> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318.
>> # Postmaster PID for node "primary" is 1523036
>> ### Stopping node "primary" using mode immediate
>>
>> 031_recovery_conflict_standby.log really doesn't contain the expected message. I can share log files from a successful and a failed test run, if they would be helpful, or I'll investigate this case today/tomorrow.
> Hmm. The log file from the failed test run with (autovacuum_enabled=false) would be helpful. I can't tell without the log if it hit a different type of conflict. Unfortunately, it was very difficult to trigger the specific type of recovery conflict we were trying to test without hitting one of the other recovery conflict types first. It'll take me some time to swap this back in my head, though.

Please look at the attached logs.

For a successful run, I see in 031_recovery_conflict_standby.log:
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|CONTEXT: WAL redo at 0/347F9B8 for Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]; blkref #0: rel 1663/16385/16396, blk 0
2024-03-20 13:28:08.084 UTC|vagrant|test_db|65fae467.174503|ERROR: canceling statement due to conflict with recovery at character 15
2024-03-20 13:28:08.084 UTC|vagrant|test_db|65fae467.174503|DETAIL: User transaction caused buffer deadlock with recovery.
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|LOG: recovery finished waiting after 10.432 ms: recovery conflict on buffer pin
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|CONTEXT: WAL redo at 0/347F9B8 for Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]; blkref #0: rel 1663/16385/16396, blk 0
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|LOG: REDO @ 0/347FAB8; LSN 0/347FB00: prev 0/347F9B8; xid 0; len 7; blkref #0: rel 1663/16385/16396, blk 0 - Heap2/FREEZE_PAGE: snapshotConflictHorizon: 762, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]

And the corresponding fragment from 031_recovery_conflict_primary.log:
2024-03-20 13:28:07.846 UTC|vagrant|test_db|65fae467.174549|LOG: xlog flush request 0/347DCB0; write 0/0; flush 0/0
2024-03-20 13:28:07.846 UTC|vagrant|test_db|65fae467.174549|CONTEXT: writing block 16 of relation base/16385/1249
2024-03-20 13:28:07.847 UTC|vagrant|test_db|65fae467.174549|LOG: statement: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/347FAB8: - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/347FB00: - Heap2/FREEZE_PAGE: snapshotConflictHorizon: 762, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3481B50: - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3483BA0: - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT: while scanning relation "public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485BF0: - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT: while scanning relation "public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485CB0: - Heap/INPLACE: off: 4
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485D10: - Standby/INVALIDATIONS: ; inval msgs: catcache 55 catcache 54 relcache 16396
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;

Whilst 031_recovery_conflict_primary.log for the failed run contains:
2024-03-20 13:27:39.042 UTC|vagrant|test_db|65fae44b.17419a|LOG: statement: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E718: - Heap2/FREEZE_PAGE: snapshotConflictHorizon: 751, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|CONTEXT: while scanning block 0 of relation "public.test_recovery_conflict_table1"
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E7D8: - Heap/INPLACE: off: 4
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E838: - Standby/INVALIDATIONS: ; inval msgs: catcache 55 catcache 54 relcache 16392
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT: VACUUM FREEZE test_recovery_conflict_table1;

(there is no Heap2/PRUNE record)

I've modified the test as follows:

--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
 my $table2 = "test_recovery_conflict_table2";
 $node_primary->safe_psql(
 	$test_db, qq[
-CREATE TABLE ${table1}(a int, b int);
+CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
 INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
 CREATE TABLE ${table2}(a int, b int);
 ]);

Best regards,
Alexander
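The "no Heap2/PRUNE record" observation can also be double-checked directly against the primary's WAL rather than through wal_debug output; a minimal sketch, assuming pg_waldump is on PATH and using an illustrative start LSN that would have to be taken from the actual run:

# Sketch only: list Heap2 records written by the primary; in a failed run the
# manual VACUUM FREEZE should show FREEZE_PAGE but no PRUNE for the test table.
# The start LSN 0/3400000 is a placeholder, not taken from a real run.
my $wal_dir = $node_primary->data_dir . '/pg_wal';
system('pg_waldump', "--path=$wal_dir", '--rmgr=Heap2', '--start=0/3400000');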