Re: Production block comparison facility - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Production block comparison facility
Date
Msg-id CAB7nPqR4vxdKijP+Du82vOcOnGMvutq-gfqiU2dsH4bsM77hYg@mail.gmail.com
In response to Re: Production block comparison facility  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Production block comparison facility  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Production block comparison facility  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: Production block comparison facility  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers



On Tue, Jul 22, 2014 at 4:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
Then, looking at the code, we would need to tweak XLogInsert for the
WAL record construction to always do a FPW and to update
XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
do some extra operations in the area of
RestoreBackupBlock/RestoreBackupBlockContents, including masking
operations before comparing the content of the FPW and the current
page.

Does that sound right?
 
I spent some more time digging into this idea and ended up with the attached patch, which adds a consistency check when an FPW is restored and applied to a given page.

The consistency check consists of two phases:
- Apply a mask to both the FPW and the current page to eliminate expected differences, such as hint bits.
- Check that the FPW is consistent with the current page, that is, that the current page does not contain any information that the FPW lacks. This is done by comparing the masked FPW against the masked current page.
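
A minimal sketch of the two phases, under simplifying assumptions: every name here (SKETCH_PAGE_SIZE, SketchMaskRange, sketch_mask_page, sketch_pages_consistent) is hypothetical and not taken from the patch, and masking is done on whole byte ranges, whereas real hint bits are individual bits inside page-type-specific structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size; mirrors the default 8 kB block size. */
#define SKETCH_PAGE_SIZE 8192

/* Hypothetical: a byte range whose content may legitimately differ. */
typedef struct SketchMaskRange
{
    size_t      offset;
    size_t      length;
} SketchMaskRange;

/* Phase 1: overwrite every maskable range with zeroes. */
static void
sketch_mask_page(uint8_t *page, const SketchMaskRange *ranges, size_t nranges)
{
    for (size_t i = 0; i < nranges; i++)
    {
        /* Each range must lie entirely within the page. */
        assert(ranges[i].offset <= SKETCH_PAGE_SIZE &&
               ranges[i].length <= SKETCH_PAGE_SIZE - ranges[i].offset);
        memset(page + ranges[i].offset, 0, ranges[i].length);
    }
}

/*
 * Phase 2: mask private copies of both pages, then compare them byte by
 * byte.  The inputs are left untouched.  Returns true when the masked
 * pages are identical.
 */
static bool
sketch_pages_consistent(const uint8_t *fpw, const uint8_t *current,
                        const SketchMaskRange *ranges, size_t nranges)
{
    static uint8_t fpw_copy[SKETCH_PAGE_SIZE];
    static uint8_t cur_copy[SKETCH_PAGE_SIZE];

    memcpy(fpw_copy, fpw, SKETCH_PAGE_SIZE);
    memcpy(cur_copy, current, SKETCH_PAGE_SIZE);

    sketch_mask_page(fpw_copy, ranges, nranges);
    sketch_mask_page(cur_copy, ranges, nranges);

    return memcmp(fpw_copy, cur_copy, SKETCH_PAGE_SIZE) == 0;
}
```

Masking copies rather than the pages themselves keeps the check free of side effects on the buffer being replayed; whether the patch does the same is not stated here.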
Some more details:
- If an inconsistency is found, a WARNING is simply logged.
- The consistency check is done only if the current page is not empty and the database has reached a consistent state.
- The page masking API is taken from the WAL replay patch that was submitted in CF1 and plugged in as an independent set of APIs.
- In the masking code, to make it easier to detect whether a page is used by a sequence relation, SEQ_MAGIC and its opaque data structure are renamed and moved into sequence.h.
- To facilitate debugging and comparison, the masked FPW and the masked current page are also converted into hex.
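
To illustrate two of the points above, here is a small sketch of the gating condition and of the hex conversion. The function names (sketch_should_check, sketch_page_to_hex) and their signatures are assumptions made for this example, not the ones used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical gate: run the check only when the current page is not
 * empty and recovery has reached a consistent state.
 */
static bool
sketch_should_check(bool page_is_empty, bool reached_consistency)
{
    return !page_is_empty && reached_consistency;
}

/*
 * Hypothetical helper: convert len bytes of a (masked) page into a
 * NUL-terminated lowercase hex string.  The output buffer must hold at
 * least 2 * len + 1 bytes.
 */
static void
sketch_page_to_hex(const uint8_t *page, size_t len, char *out, size_t outsize)
{
    static const char digits[] = "0123456789abcdef";

    assert(outsize >= 2 * len + 1);

    for (size_t i = 0; i < len; i++)
    {
        out[2 * i] = digits[page[i] >> 4];          /* high nibble */
        out[2 * i + 1] = digits[page[i] & 0x0F];    /* low nibble */
    }
    out[2 * len] = '\0';
}
```

With both pages rendered as hex strings, a mismatch reported by the WARNING can be inspected with ordinary text-diffing tools.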
Things could certainly be refactored and improved, but this patch is already useful as-is, so I am going to add it to the next commit fest.

Comments are welcome.
Regards,
--
Michael
