Streaming replica failure - Mailing list pgsql-general

From Aleš Zelený
Subject Streaming replica failure
Date
Msg-id CAODqTUYjAdr-qHkWEgDWhtYf0Bkz__so8zxVfqU9bT44HzaOBw@mail.gmail.com
Whole thread Raw
Responses Re: Streaming replica failure  (Vijaykumar Jain <vijaykumarjain.github@gmail.com>)
List pgsql-general
Hello,

we are using PostgreSQL 12.4 on CentOS 7. The hot standby failed:

2021-04-24 09:19:27 CEST [20956]: [747-1] user=,db=,host=,app= LOG:  recovery restart point at 3D8C/352B4CE8
2021-04-24 09:19:27 CEST [20956]: [748-1] user=,db=,host=,app= DETAIL:  Last completed transaction was at log time 2021-04-24 09:19:27.221313+02.
2021-04-24 09:20:57 CEST [20956]: [749-1] user=,db=,host=,app= LOG:  restartpoint starting: time
2021-04-24 09:24:27 CEST [20956]: [750-1] user=,db=,host=,app= LOG:  restartpoint complete: wrote 171233 buffers (13.4%); 0 WAL file(s) added, 68 removed, 0 recycled; write
=209.663 s, sync=0.012 s, total=209.963 s; sync files=283, longest=0.001 s, average=0.000 s; distance=684762 kB, estimate=684762 kB
2021-04-24 09:24:27 CEST [20956]: [751-1] user=,db=,host=,app= LOG:  recovery restart point at 3D8C/5EF6B858
2021-04-24 09:24:27 CEST [20956]: [752-1] user=,db=,host=,app= DETAIL:  Last completed transaction was at log time 2021-04-24 09:24:27.288115+02.
2021-04-24 09:25:15 CEST [20955]: [11-1] user=,db=,host=,app= WARNING:  page 366603 of relation base/20955/10143636 is uninitialized
2021-04-24 09:25:15 CEST [20955]: [12-1] user=,db=,host=,app= CONTEXT:  WAL redo at 3D8C/D79F6500 for Heap2/VISIBLE: cutoff xid 3806260577 flags 0x01
2021-04-24 09:25:15 CEST [20955]: [13-1] user=,db=,host=,app= PANIC:  WAL contains references to invalid pages
2021-04-24 09:25:15 CEST [20955]: [14-1] user=,db=,host=,app= CONTEXT:  WAL redo at 3D8C/D79F6500 for Heap2/VISIBLE: cutoff xid 3806260577 flags 0x01
2021-04-24 09:25:16 CEST [20953]: [11-1] user=,db=,host=,app= LOG:  startup process (PID 20955) was terminated by signal 6: Neúspěšně ukončen (SIGABRT)
2021-04-24 09:25:16 CEST [20953]: [12-1] user=,db=,host=,app= LOG:  terminating any other active server processes

The relation base/20955/10143636 is a standard table.

Version details:
postgres=# select version();
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

Settings:
postgres=# select name, setting, category from pg_settings where category = 'Write-Ahead Log / Settings';
          name          |  setting  |          category          
------------------------+-----------+----------------------------
 commit_delay           | 0         | Write-Ahead Log / Settings
 commit_siblings        | 5         | Write-Ahead Log / Settings
 fsync                  | on        | Write-Ahead Log / Settings
 full_page_writes       | on        | Write-Ahead Log / Settings
 synchronous_commit     | on        | Write-Ahead Log / Settings
 wal_buffers            | 2048      | Write-Ahead Log / Settings
 wal_compression        | off       | Write-Ahead Log / Settings
 wal_init_zero          | on        | Write-Ahead Log / Settings
 wal_level              | logical   | Write-Ahead Log / Settings
 wal_log_hints          | off       | Write-Ahead Log / Settings
 wal_recycle            | on        | Write-Ahead Log / Settings
 wal_sync_method        | fdatasync | Write-Ahead Log / Settings
 wal_writer_delay       | 200       | Write-Ahead Log / Settings
 wal_writer_flush_after | 128       | Write-Ahead Log / Settings
(14 rows)

pg_waldump output:
-bash-4.2$ /usr/pgsql-12/bin/pg_waldump 0000000100003D8C000000D7 2>&1 | tail
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5A98, prev 3D8C/D7CF5A58, desc: INSERT_LEAF off 360, blkref #0: rel 1663/20955/11280092 blk 19
509
rmgr: Transaction len (rec/tot):     46/    46, tx: 3812802557, lsn: 3D8C/D7CF5AD8, prev 3D8C/D7CF5A98, desc: COMMIT 2021-04-24 09:25:16.160687 CEST
rmgr: Heap        len (rec/tot):    159/   159, tx: 3812802559, lsn: 3D8C/D7CF5B08, prev 3D8C/D7CF5AD8, desc: INSERT off 8 flags 0x08, blkref #0: rel 1663/20955/11280066 bl
k 165603
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5BA8, prev 3D8C/D7CF5B08, desc: INSERT_LEAF off 317, blkref #0: rel 1663/20955/11280073 blk 15
340
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5BE8, prev 3D8C/D7CF5BA8, desc: INSERT_LEAF off 130, blkref #0: rel 1663/20955/11280091 blk 22
003
rmgr: Heap        len (rec/tot):     80/    80, tx: 3812802559, lsn: 3D8C/D7CF5C28, prev 3D8C/D7CF5BE8, desc: INSERT off 61 flags 0x08, blkref #0: rel 1663/20955/11280082 b
lk 38917
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5C78, prev 3D8C/D7CF5C28, desc: INSERT_LEAF off 73, blkref #0: rel 1663/20955/11280088 blk 421
25
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5CB8, prev 3D8C/D7CF5C78, desc: INSERT_LEAF off 361, blkref #0: rel 1663/20955/11280092 blk 19
509
rmgr: Transaction len (rec/tot):     46/    46, tx: 3812802559, lsn: 3D8C/D7CF5CF8, prev 3D8C/D7CF5CB8, desc: COMMIT 2021-04-24 09:25:16.161036 CEST
pg_waldump: fatal: error in WAL record at 3D8C/D7CF5CF8: invalid record length at 3D8C/D7CF5D28: wanted 24, got 0
-bash-4.2$

Emails like "BUG #13822: Slave terminated - WAL contains references to invalid page" in the archive are quite old (2015).

While we can create a new replica, ios there something we can do to diagnose the root cause of this failure?

Thanks for any hints or advice.

Kind regards Ales Zeleny


pgsql-general by date:

Previous
From: Matteo Bonardi
Date:
Subject: Re: Temporary files usage in explain
Next
From: Sanjay Minni
Date:
Subject: Approach to creating users in Database