BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC |
Date | |
Msg-id | 18426-2d18da6586f152d6@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18426 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 16.2 Operating system: Ubuntu 22.04 Description: When running test 027_stream_regress on a rather slow ARMv7 device, I came upon the following test failure: [21:31:30.789](489.962s) # poll_query_until timed out executing this query: # SELECT '0/16952DE0' <= replay_lsn AND state = 'streaming' # FROM pg_catalog.pg_stat_replication # WHERE application_name IN ('standby_1', 'walreceiver') # expecting this output: # t # last actual query output: # # with stderr: timed out waiting for catchup at t/027_stream_regress.pl line 100. log/027_stream_regress_standby_1.log ends with: ... 2024-04-04 21:04:01.946 UTC [17903] LOG: recovery restart point at 0/11C71430 2024-04-04 21:04:01.946 UTC [17903] DETAIL: Last completed transaction was at log time 2024-04-04 21:02:19.456471+03. 2024-04-04 21:06:16.014 UTC [17905] WARNING: page 7 of relation base/16384/2606 does not exist 2024-04-04 21:06:16.014 UTC [17905] CONTEXT: WAL redo at 0/13841DE8 for Heap2/PRUNE: snapshotConflictHorizon: 13050, nredirected: 0, ndead: 14, nunused: 1, redirected: [], dead: [1, 2, 3, 4, 5, 6, 28, 31, 32, 33, 34, 35, 36, 37], unused: [7]; blkref #0: rel 1663/16384/2606, blk 7 2024-04-04 21:06:16.014 UTC [17905] PANIC: WAL contains references to invalid pages 2024-04-04 21:06:16.014 UTC [17905] CONTEXT: WAL redo at 0/13841DE8 for Heap2/PRUNE: snapshotConflictHorizon: 13050, nredirected: 0, ndead: 14, nunused: 1, redirected: [], dead: [1, 2, 3, 4, 5, 6, 28, 31, 32, 33, 34, 35, 36, 37], unused: [7]; blkref #0: rel 1663/16384/2606, blk 7 2024-04-04 21:06:16.927 UTC [17888] LOG: startup process (PID 17905) was terminated by signal 6: Aborted t_027_stream_regress_standby_1_data/pgdata/pg_wal/000000010000000000000013 contains: ... rmgr: XLOG len (rec/tot): 49/ 8241, tx: 13103, lsn: 0/137C3538, prev 0/137C14E8, desc: FPI , blkref #0: rel 1663/16384/2606 fork vm blk 0 FPW rmgr: Storage len (rec/tot): 46/ 46, tx: 13103, lsn: 0/137C5588, prev 0/137C3538, desc: TRUNCATE base/16384/2606 to 7 blocks flags 7 ... rmgr: Transaction len (rec/tot): 38/ 38, tx: 13103, lsn: 0/137C7588, prev 0/137C7540, desc: ABORT 2024-04-04 21:05:31.176869 UTC ### That is, the relation 2606 (pg_constraint) was truncated in ### transaction 13103, and then the transaction was aborted. ... rmgr: Heap2 len (rec/tot): 85/ 85, tx: 0, lsn: 0/13841DE8, prev 0/13841DA8, desc: PRUNE snapshotConflictHorizon: 13050, nredirected: 0, ndead: 14, nunused: 1, redirected: [], dead: [1, 2, 3, 4, 5, 6, 28, 31, 32, 33, 34, 35, 36, 37], unused: [7], blkref #0: rel 1663/16384/2606 blk 7 ### The record that standby could not replay because it did truncate ### pg_constraint when replaying 0/137C5588. 027_stream_regress_primary.log contains: 2024-04-04 21:05:31.176 UTC [29329] ERROR: canceling autovacuum task 2024-04-04 21:05:31.176 UTC [29329] CONTEXT: while truncating relation "pg_catalog.pg_constraint" to 7 blocks automatic vacuum of table "regression.pg_catalog.pg_constraint" As far as I can see, the TRUNCATE record was WAL-logged by the autovacuum process in table_relation_vacuum() -> heap_vacuum_rel() -> lazy_truncate_heap() -> RelationTruncate(), and then that process was interrupted inside RelationTruncate() -> smgrtruncate() -> DropRelationBuffers() -> InvalidateBuffer() -> WaitIO(). A demo test for the issue to follow...
pgsql-bugs by date: