BUG #18811: PANIC,XX000,"WAL contains references to invalid pages" - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18811: PANIC,XX000,"WAL contains references to invalid pages" |
Date | |
Msg-id | 18811-dbd06bbde2609075@postgresql.org |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference:      18811
Logged by:          Polina Bungina
Email address:      bungina@gmail.com
PostgreSQL version: 16.6
Operating system:   Ubuntu 22.04
Description:

We have just encountered this problem the second time within a month.
Standby starts to panic after the following sequence of events:

[primary]
2025-02-12 06:22:44.160 UTC,,,3268046,,67ac3e15.31ddce,63,,2025-02-12 06:22:13 UTC,614/47144,1969937340,ERROR,57014,"canceling autovacuum task",,,,,"while truncating relation ""data.slice_trigger_13"" to 0 blocks automatic vacuum of table ""db1.data.slice_trigger_13""",,,,"","autovacuum worker",,0
2025-02-12 06:22:44.160 UTC,"user_app","db1",3263558,"10.2.30.104:42868",67ac3c0c.31cc46,75,"BIND waiting",2025-02-12 06:13:32 UTC,412/2708655,1969937332,LOG,00000,"process 3263558 acquired RowExclusiveLock on relation 35182 of database 16710 after 1184.572 ms",,,,,,"delete from ""data"".""slice_trigger"" where ""data"".""slice_trigger"".""st_id"" in ($1)",,,"PostgreSQL JDBC Driver","client backend",,-7230379448131803312
2025-02-12 06:22:44.160 UTC,"user_app","db1",3263678,"10.2.30.104:56256",67ac3c1a.31ccbe,55,"PARSE waiting",2025-02-12 06:13:46 UTC,427/2212958,0,LOG,00000,"process 3263678 acquired AccessShareLock on relation 35182 of database 16710 after 1042.391 ms",,,,,,"with ""first_select"" as (select * from data.slice_trigger_13 order by ""st_occurred_at"" limit $1), ""windowed_select"" as (select rank() over (partition by ""st_unit_identifier"", ""st_reservation_identifier"" order by ""st_occurred_at"", ""st_id"") as ""row_rank"", * from first_select) select * from windowed_select where ""row_rank"" = $2",39,,"PostgreSQL JDBC Driver","client backend",,0
2025-02-12 06:22:44.160 UTC,"user_app","db1",3263678,"10.2.30.104:56256",67ac3c1a.31ccbe,56,"PARSE",2025-02-12 06:13:46 UTC,427/2212958,0,LOG,00000,"duration: 1042.822 ms parse <unnamed>: with ""first_select"" as (select * from data.slice_trigger_13 order by ""st_occurred_at"" limit $1), ""windowed_select"" as (select rank() over (partition by ""st_unit_identifier"", ""st_reservation_identifier"" order by ""st_occurred_at"", ""st_id"") as ""row_rank"", * from first_select) select * from windowed_select where ""row_rank"" = $2",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,-4561709464811390503

[replica]
2025-02-12 06:23:25.302 UTC,,,413151,,6790b32a.64ddf,184,,2025-01-22 08:58:18 UTC,1/0,0,WARNING,01000,"page 1 of relation base/16710/35182 is uninitialized",,,,,"WAL redo at 1732/47969908 for Heap2/VISIBLE: snapshotConflictHorizon: 0, flags: 0x03; blkref #0: rel 1663/16710/35182, fork 2, blk 0 FPW; blkref #1: rel 1663/16710/35182, blk 1",,,,"","startup",,0
2025-02-12 06:23:25.302 UTC,,,413151,,6790b32a.64ddf,185,,2025-01-22 08:58:18 UTC,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"WAL redo at 1732/47969908 for Heap2/VISIBLE: snapshotConflictHorizon: 0, flags: 0x03; blkref #0: rel 1663/16710/35182, fork 2, blk 0 FPW; blkref #1: rel 1663/16710/35182, blk 1",,,,"","startup",,0
2025-02-12 06:23:26.233 UTC,,,413146,,6790b32a.64dda,7,,2025-01-22 08:58:18 UTC,,0,LOG,00000,"startup process (PID 413151) was terminated by signal 6: Aborted",,,,,,,,,"","postmaster",,0

[wal records for the problematic table around 1732/47969908]
rmgr: Heap len (rec/tot): 54/ 54, tx: 1969939280, lsn: 1732/47306DF0, prev 1732/47306DC8, desc: DELETE xmax: 1969939280, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 54/ 54, tx: 1969939280, lsn: 1732/47306E28, prev 1732/47306DF0, desc: DELETE xmax: 1969939280, off: 2, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939315, lsn: 1732/4738B258, prev 1732/4738B230, desc: INSERT off: 7, flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939352, lsn: 1732/4749E4E0, prev 1732/4749E488, desc: INSERT off: 8, flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939352, lsn: 1732/474BF8F0, prev 1732/474BF898, desc: INSERT off: 9, flags: 0x00, blkref #0: rel 1663/16710/35182 blk 21
rmgr: Heap2 len (rec/tot): 64/ 186, tx: 0, lsn: 1732/47969908, prev 1732/479698C8, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0 FPW, blkref #1: rel 1663/16710/35182 blk 1
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1732/4796A440, prev 1732/4796A3E8, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0, blkref #1: rel 1663/16710/35182 blk 2
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1732/4796A480, prev 1732/4796A440, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0, blkref #1: rel 1663/16710/35182 blk 3
<..>
rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 1732/4796FCC0, prev 1732/4796FC80, desc: VISIBLE snapshotConflictHorizon: 0, flags: 0x03, blkref #0: rel 1663/16710/35182 fork vm blk 0, blkref #1: rel 1663/16710/35182 blk 25
rmgr: Standby len (rec/tot): 90/ 90, tx: 0, lsn: 1732/4797E240, prev 1732/4797E0F8, desc: INVALIDATIONS ; inval msgs: catcache 55 catcache 54 relcache 35182
rmgr: Heap len (rec/tot): 173/ 173, tx: 1969939561, lsn: 1732/479C0688, prev 1732/479C0630, desc: INSERT+INIT off: 1, flags: 0x01, blkref #0: rel 1663/16710/35182 blk 1

[the table structure]
\d+ data.slice_trigger
Partitioned table "data.slice_trigger"
          Column           |           Type           | Collation | Nullable |                         Default                         | Storage  | Compression | Stats target | Description
---------------------------+--------------------------+-----------+----------+---------------------------------------------------------+----------+-------------+--------------+-------------
 st_id                     | bigint                   |           | not null | nextval('data.slice_trigger_st_id_seq'::regclass)       | plain    |             |              |
 st_site                   | text                     |           | not null |                                                         | extended |             |              |
 st_event_type             | text                     |           | not null |                                                         | extended |             |              |
 st_data_type              | text                     |           |          |                                                         | extended |             |              |
 st_storing_state          | text                     |           |          |                                                         | extended |             |              |
 st_event_id               | bigint                   |           |          | nextval('data.slice_trigger_st_event_id_seq'::regclass) | plain    |             |              |
 st_unit_identifier        | text                     |           | not null |                                                         | extended |             |              |
 st_reservation_identifier | text                     |           |          |                                                         | extended |             |              |
 st_occurred_at            | timestamp with time zone |           | not null | clock_timestamp()                                       | plain    |             |              |
 st_event_json             | text                     |           |          |                                                         | extended |             |              |
 st_process                | text                     |           |          |                                                         | extended |             |              |
Partition key: HASH (st_site, st_unit_identifier)
Indexes:
    "slice_trigger_pkey" PRIMARY KEY, btree (st_id, st_site, st_unit_identifier)
Partitions: data.slice_trigger_0 FOR VALUES WITH (modulus 32, remainder 0),
            data.slice_trigger_1 FOR VALUES WITH (modulus 32, remainder 1),
            data.slice_trigger_10 FOR VALUES WITH (modulus 32, remainder 10),
            data.slice_trigger_11 FOR VALUES WITH (modulus 32, remainder 11),
            data.slice_trigger_12 FOR VALUES WITH (modulus 32, remainder 12),
            data.slice_trigger_13 FOR VALUES WITH (modulus 32, remainder 13),
            ...

There are plenty of "canceling autovacuum task while truncating relation" log entries for other partitions of this table and other partitioned tables before and after this happening, which did not cause the same issue.
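
For anyone who wants to find the WAL segment that holds the failing record, the sketch below converts an LSN such as 1732/47969908 into a byte position and a segment file name. It is an illustration only: the helper names parse_lsn and wal_segment_name are hypothetical, and both the 16 MB segment size (the PostgreSQL default) and timeline ID 1 are assumptions, since the report states neither.

```python
# Minimal sketch: map an LSN from the report to a WAL segment file name.
# ASSUMPTIONS (not stated in the report): default 16 MB wal_segment_size
# and timeline ID 1. parse_lsn / wal_segment_name are hypothetical helpers.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size in bytes


def parse_lsn(lsn: str) -> int:
    """Convert 'XXXXXXXX/YYYYYYYY' (two hex halves) to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-hex-digit segment file name that contains the given LSN."""
    segments_per_xlogid = 0x100000000 // segment_size
    segno = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


if __name__ == "__main__":
    failing = "1732/47969908"    # Heap2/VISIBLE record that triggers the PANIC
    init_rec = "1732/479C0688"   # later INSERT+INIT record for blk 1
    print(wal_segment_name(failing))
    print(parse_lsn(init_rec) - parse_lsn(failing))  # distance in bytes
```

With the real timeline ID substituted, the resulting file name can be passed to pg_waldump to inspect the records around the failure; the byte distance shows how far the INSERT+INIT record for block 1 trails the VISIBLE record that the standby could not replay.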