inconsistent page found on STANDBY server - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject inconsistent page found on STANDBY server
Msg-id CAE9k0PnmPDXfvf8HDObme7q_Ewc4E26ukHXUBPySoOs0ObqqaQ@mail.gmail.com
List pgsql-hackers
Hi All,

When running make installcheck on a master with WAL consistency checking
enabled, an inconsistent page is detected on the standby. I could see the
following FATAL message in the standby server logfile:

2017-03-30 07:31:10.101 BST [27994] LOG:  entering standby mode
2017-03-30 07:31:10.106 BST [27994] LOG:  redo starts at 0/2000024
2017-03-30 07:31:10.108 BST [27994] LOG:  consistent recovery state
reached at 0/20000E4
2017-03-30 07:31:10.108 BST [27992] LOG:  database system is ready to
accept read only connections
2017-03-30 07:31:10.113 BST [27998] LOG:  started streaming WAL from
primary at 0/3000000 on timeline 1
2017-03-30 07:33:19.040 BST [27994] FATAL:  inconsistent page found,
rel 1663/13157/16391, forknum 0, blkno 0
2017-03-30 07:33:19.040 BST [27994] CONTEXT:  WAL redo at 0/351CF03C
for Hash/UPDATE_META_PAGE: ntuples -nan
2017-03-30 07:33:19.041 BST [27992] LOG:  startup process (PID 27994)
exited with exit code 1
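
For anyone scanning standby logs for this failure, here is a minimal sketch of pulling the page coordinates out of the FATAL line. It assumes the "rel" triple is tablespace OID / database OID / relfilenode and that forknum 0 is the main fork; the page_ref struct and parse_inconsistent_page() are hypothetical helpers written for this illustration, not part of PostgreSQL.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of the FATAL message
 * (illustration only, not a PostgreSQL type). */
typedef struct page_ref
{
    unsigned int spc_oid;       /* tablespace OID (assumed meaning) */
    unsigned int db_oid;        /* database OID (assumed meaning) */
    unsigned int relfilenode;   /* relation file node (assumed meaning) */
    unsigned int forknum;       /* fork number, 0 assumed to be the main fork */
    unsigned int blkno;         /* block number within the fork */
} page_ref;

/* Returns 1 and fills *out if the line carries the message, else 0. */
static int
parse_inconsistent_page(const char *line, page_ref *out)
{
    const char *msg = strstr(line, "inconsistent page found");

    if (msg == NULL)
        return 0;

    /* Whitespace in the format matches any run of blanks or newlines,
     * so a line wrapped by a mail client still parses. */
    return sscanf(msg,
                  "inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u",
                  &out->spc_oid, &out->db_oid, &out->relfilenode,
                  &out->forknum, &out->blkno) == 5;
}
```

Applied to the line above, this would yield relfilenode 16391, block 0, which matches the hash index meta page named in the CONTEXT line.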

Steps to reproduce:
===================
1) Build PG v10 from sources.
2) Set up master/standby replication, with WAL consistency checking
   enabled on the master.
3) Run make installcheck on the master.
4) Check the database logs generated in the standby's directory.

Please note that the above issue is observed only on a 32-bit Linux machine
and was reported to me off-list by Tushar Ahuja. Tushar also allowed me
to use his 32-bit Linux machine to analyse this problem. I also had a
short off-list discussion with Amit (included in this mail) while
analysing it.

RCA:
====
After debugging the hash index deletion code, I found that while
registering data for the xlog record 'XLOG_HASH_UPDATE_META_PAGE' we
are not passing the correct length of the data being registered.
As a result, the data (xl_hash_update_meta_page) is not completely
recorded in the WAL record.
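
To illustrate the kind of failure described above, here is a minimal, self-contained sketch. It does not use the real PostgreSQL structures or XLogRegisterData(); demo_meta_update, demo_register_data() and demo_round_trip_ok() are hypothetical stand-ins, and the payload is assumed to be a single double (like the ntuples value printed in the log). It only shows that registering fewer bytes than the struct size loses part of the value. A length that happened to equal the platform's size_t width would be 4 bytes on a 32-bit build but 8 on a 64-bit build, which is one way such a bug could stay hidden on 64-bit machines; that is a guess, not something this mail states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the record payload: one double.
 * This is NOT the real xl_hash_update_meta_page definition. */
typedef struct demo_meta_update
{
    double ntuples;
} demo_meta_update;

/* Hypothetical stand-in for registering record data: copies only
 * the first len bytes of src into a zeroed record buffer. */
static void
demo_register_data(unsigned char *record, size_t record_size,
                   const void *src, size_t len)
{
    assert(len <= record_size);
    memset(record, 0, record_size);
    memcpy(record, src, len);
}

/* Returns 1 if the bytes stored in the record match the original
 * struct when only len bytes were registered, else 0. */
static int
demo_round_trip_ok(double ntuples, size_t len)
{
    unsigned char    record[sizeof(demo_meta_update)];
    demo_meta_update in;

    in.ntuples = ntuples;
    demo_register_data(record, sizeof(record), &in, len);
    return memcmp(&in, record, sizeof(in)) == 0;
}
```

With these stand-ins, demo_round_trip_ok(4294967297.0, sizeof(demo_meta_update)) succeeds, while passing a length of 4 drops half of the double, so whatever is reconstructed from the record no longer matches the original value.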

Fix:
====
Attached patch fixes this issue.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
