Re: Streaming replica failure - Mailing list pgsql-general

From Vijaykumar Jain
Subject Re: Streaming replica failure
Date
Msg-id CAM+6J94qefLYTuAy7uki7vs-LZ+c3eGi6ccimfGPZ8OsOUH1Fw@mail.gmail.com
Whole thread Raw
In response to Streaming replica failure  (Aleš Zelený <zeleny.ales@gmail.com>)
List pgsql-general
Were there any issues with hardware ?
Memory/storage ?
I am not sure but it look like data loss to me.

If you have admin access?

Can you run dmesg -a,  system logs, hyperion logs etc and see if there are any errors related to memory corruption.
Coz if there are hardware issues, this will happen again IMHO.
But if this is not a hardware issue, then I guess one of the experts may have to chime in.

Thanks,
Vijay



On Tue, Apr 27, 2021, 3:31 PM Aleš Zelený <zeleny.ales@gmail.com> wrote:
Hello,

we are using PostgreSQL 12.4 on CentOS 7. The hot standby failed:

2021-04-24 09:19:27 CEST [20956]: [747-1] user=,db=,host=,app= LOG:  recovery restart point at 3D8C/352B4CE8
2021-04-24 09:19:27 CEST [20956]: [748-1] user=,db=,host=,app= DETAIL:  Last completed transaction was at log time 2021-04-24 09:19:27.221313+02.
2021-04-24 09:20:57 CEST [20956]: [749-1] user=,db=,host=,app= LOG:  restartpoint starting: time
2021-04-24 09:24:27 CEST [20956]: [750-1] user=,db=,host=,app= LOG:  restartpoint complete: wrote 171233 buffers (13.4%); 0 WAL file(s) added, 68 removed, 0 recycled; write
=209.663 s, sync=0.012 s, total=209.963 s; sync files=283, longest=0.001 s, average=0.000 s; distance=684762 kB, estimate=684762 kB
2021-04-24 09:24:27 CEST [20956]: [751-1] user=,db=,host=,app= LOG:  recovery restart point at 3D8C/5EF6B858
2021-04-24 09:24:27 CEST [20956]: [752-1] user=,db=,host=,app= DETAIL:  Last completed transaction was at log time 2021-04-24 09:24:27.288115+02.
2021-04-24 09:25:15 CEST [20955]: [11-1] user=,db=,host=,app= WARNING:  page 366603 of relation base/20955/10143636 is uninitialized
2021-04-24 09:25:15 CEST [20955]: [12-1] user=,db=,host=,app= CONTEXT:  WAL redo at 3D8C/D79F6500 for Heap2/VISIBLE: cutoff xid 3806260577 flags 0x01
2021-04-24 09:25:15 CEST [20955]: [13-1] user=,db=,host=,app= PANIC:  WAL contains references to invalid pages
2021-04-24 09:25:15 CEST [20955]: [14-1] user=,db=,host=,app= CONTEXT:  WAL redo at 3D8C/D79F6500 for Heap2/VISIBLE: cutoff xid 3806260577 flags 0x01
2021-04-24 09:25:16 CEST [20953]: [11-1] user=,db=,host=,app= LOG:  startup process (PID 20955) was terminated by signal 6: Neúspěšně ukončen (SIGABRT)
2021-04-24 09:25:16 CEST [20953]: [12-1] user=,db=,host=,app= LOG:  terminating any other active server processes

The relation base/20955/10143636 is a standard table.

Version details:
postgres=# select version();
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

Settings:
postgres=# select name, setting, category from pg_settings where category = 'Write-Ahead Log / Settings';
          name          |  setting  |          category          
------------------------+-----------+----------------------------
 commit_delay           | 0         | Write-Ahead Log / Settings
 commit_siblings        | 5         | Write-Ahead Log / Settings
 fsync                  | on        | Write-Ahead Log / Settings
 full_page_writes       | on        | Write-Ahead Log / Settings
 synchronous_commit     | on        | Write-Ahead Log / Settings
 wal_buffers            | 2048      | Write-Ahead Log / Settings
 wal_compression        | off       | Write-Ahead Log / Settings
 wal_init_zero          | on        | Write-Ahead Log / Settings
 wal_level              | logical   | Write-Ahead Log / Settings
 wal_log_hints          | off       | Write-Ahead Log / Settings
 wal_recycle            | on        | Write-Ahead Log / Settings
 wal_sync_method        | fdatasync | Write-Ahead Log / Settings
 wal_writer_delay       | 200       | Write-Ahead Log / Settings
 wal_writer_flush_after | 128       | Write-Ahead Log / Settings
(14 rows)

pg_waldump output:
-bash-4.2$ /usr/pgsql-12/bin/pg_waldump 0000000100003D8C000000D7 2>&1 | tail
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5A98, prev 3D8C/D7CF5A58, desc: INSERT_LEAF off 360, blkref #0: rel 1663/20955/11280092 blk 19
509
rmgr: Transaction len (rec/tot):     46/    46, tx: 3812802557, lsn: 3D8C/D7CF5AD8, prev 3D8C/D7CF5A98, desc: COMMIT 2021-04-24 09:25:16.160687 CEST
rmgr: Heap        len (rec/tot):    159/   159, tx: 3812802559, lsn: 3D8C/D7CF5B08, prev 3D8C/D7CF5AD8, desc: INSERT off 8 flags 0x08, blkref #0: rel 1663/20955/11280066 bl
k 165603
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5BA8, prev 3D8C/D7CF5B08, desc: INSERT_LEAF off 317, blkref #0: rel 1663/20955/11280073 blk 15
340
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5BE8, prev 3D8C/D7CF5BA8, desc: INSERT_LEAF off 130, blkref #0: rel 1663/20955/11280091 blk 22
003
rmgr: Heap        len (rec/tot):     80/    80, tx: 3812802559, lsn: 3D8C/D7CF5C28, prev 3D8C/D7CF5BE8, desc: INSERT off 61 flags 0x08, blkref #0: rel 1663/20955/11280082 b
lk 38917
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5C78, prev 3D8C/D7CF5C28, desc: INSERT_LEAF off 73, blkref #0: rel 1663/20955/11280088 blk 421
25
rmgr: Btree       len (rec/tot):     64/    64, tx: 3812802559, lsn: 3D8C/D7CF5CB8, prev 3D8C/D7CF5C78, desc: INSERT_LEAF off 361, blkref #0: rel 1663/20955/11280092 blk 19
509
rmgr: Transaction len (rec/tot):     46/    46, tx: 3812802559, lsn: 3D8C/D7CF5CF8, prev 3D8C/D7CF5CB8, desc: COMMIT 2021-04-24 09:25:16.161036 CEST
pg_waldump: fatal: error in WAL record at 3D8C/D7CF5CF8: invalid record length at 3D8C/D7CF5D28: wanted 24, got 0
-bash-4.2$

Emails like "BUG #13822: Slave terminated - WAL contains references to invalid page" in the archive are quite old (2015).

While we can create a new replica, ios there something we can do to diagnose the root cause of this failure?

Thanks for any hints or advice.

Kind regards Ales Zeleny


pgsql-general by date:

Previous
From: Wolfgang Rißler
Date:
Subject: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)
Next
From: Thomas Munro
Date:
Subject: Re: -1/0 virtualtransaction