Re: WAL consistency check facility - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: WAL consistency check facility |
Date | |
Msg-id | CAB7nPqR=OcojLCP=1Ho6Zo312CKzUZU8d4aJO+VvpUYV-waU_Q@mail.gmail.com Whole thread Raw |
In response to | Re: WAL consistency check facility (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: WAL consistency check facility
Re: WAL consistency check facility |
List | pgsql-hackers |
On Thu, Oct 27, 2016 at 5:08 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Sep 29, 2016 at 12:49 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Seeing nothing happening, I have moved the patch to next CF as there >> is a new version, but no reviews for it. > > Just a note for anybody potentially looking at this patch. I am > currently looking at it in depth, and will post a new version of the > patch in a couple of days with review comments. Thanks. And here we go. Here is a review as well as a large brush-up for this patch. A couple of things: - wal_consistency is using a list of RMGRs, at the cost of being PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I have been thinking hard about that, and still I don't see the point). It is rather easy to for example default it to false, and enable it to true to check if a certain code path is correctly exercised or not for WAL consistency. Note that this simplification reduces the patch size by 100~150 lines. I know, I know, I'd expect some complains about that.... - Looking for wal_consistency at replay has no real value. What if on a standby the parameter value is inconsistent than the one on the master? This logic adds a whole new level of complications and potential bugs. So instead my suggestion is to add a marker at WAL record level to check if this record should be checked for consistency at replay or not. This is also quite flexible if you think about it, the standby is made independent of the WAL generated on the master and just applies, or checks what it sees is fit checking for. The best match here is to add a flag for XLogRecord->xl_info and make use of one of the low 4 bits and only one is used now for XLR_SPECIAL_REL_UPDATE. An interesting side effect of this approach is that callers of XLogInsert can set XLR_CHECK_CONSISTENCY to enforce a consistency check even if wal_consistency is off. It is true that we could register such a data via XLogRegisterBuffer() instead, though the 4 bits with the BKPBLOCK_* flags are already occupied so that would induce a record penalty length and I have a hard time believing that one would like to check the consistency of a record in particular. - Speaking of which using BKPIMAGE_IS_REQUIRED_FOR_REDO stored in the block definition is sort of weird because we want to know if consistency should be checked at a higher level. - in maskPage, the new rmgr routine, there is no need for the info and blkno arguments. info is not used at all to begin with. blkno is used for gin pages to detect meta pages but this can be guessed using the opaque pointer. For heap pages and speculative inserts, masking the blkno would be fine. That's not worth it. - Instead of palloc'ing the old and new pages to compare, it would be more performant to keep around two static buffers worth of BLCKSZ and just use that. This way there is no need as well to perform any palloc calls in the masking functions, limiting the risk of errors (those code paths had better avoid errors IMO). It would be also less costly to just pass to the masking function a pointer to a buffer of size BLCKSZ and just do the masking on it. - The masking routine names can be more generic, like XXX_mask(char *page). No need to say page, we already know they work on it via the argument provided. - mask_xlog_page and mask_generic_page are useless as the block restored comes directly from a FPW, so you are comparing basically a FPW with itself. - In checkConsistency, there is no need to allocate the old page. As RestorebackupImage stores the data in an already allocated buffer, you can reuse the same location as the buffer masked afterwards. - Removed comparePages(), using memcmp instead for simplicity(). This does not show up the exact location of the inconsistency, still that won't be a win as there could be more than one inconsistency across a page. So this gives an invitation to user to look at the exact context. memcmp can be used anyway to understand where is the inconsistency if need be. - I have noticed that mask_common_page is meaningfull just for the sequence RMGR, and just that does not justify its existence so I ripped it off. - PostgresNode.pm enables wal_consistency. Seeing the high amount of WAL this produces, I feel cold about doing that, the patch does include it btw... - Standbys now stop with FATAL when an inconsistency is found. This makes error detection easier on buildfarm machines. - A couple of masking functions still use 0xFFFFFF or similar marks. Those should be replaced by MASK_MARKING. Not done that yet. - Some of the masking routines should be refined, particularly the heap and GIn functions. I did not spend time yet to do it. On top of that, I have done a fair amount of testing, creating manually some inconsistencies in the REDO routines to trigger failures on standbys. And that was sort of fun to break things intentionally. Another fun thing is the large amount of WAL that this generates (!), so anyone willing to enable that in production would be crazy. Enabling it for development and/or session is something that would clearly help. I am sending back the patch as waiting on author. Attached is what I have up to now. -- Michael
Attachment
pgsql-hackers by date: