Re: WAL replay bugs - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: WAL replay bugs |
Date | |
Msg-id | CAB7nPqTm9Xx5rHY6uSjfreBvLe4cKmDn2Ngh8wXCmaPw+HLdBg@mail.gmail.com |
In response to | Re: WAL replay bugs (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: WAL replay bugs
(Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: WAL replay bugs (Michael Paquier <michael.paquier@gmail.com>) |
List | pgsql-hackers |
On Mon, Jun 2, 2014 at 9:55 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
> Perhaps there are parts of what is proposed here that could be made
> more generalized, like the masking functions. So do not hesitate if
> you have any opinion on the matter.

OK, attached is the result of this hacking:

Buffer capture facility: check WAL replay consistency

It is a tool intended for developers and buildfarm machines, used to check consistency at page level when replaying WAL files among several nodes of a cluster (generally a master and a standby node).

This facility is made of two parts:
- A server part, where all the changes happening at page level are captured and inserted in a file called buffer_captures located at the root of PGDATA. Each buffer entry is masked to make the comparison across nodes consistent (flags such as hint bits, for example), and then each buffer is captured as a single line of the output file with the following format:
    LSN: %08X/%08X page: PAGE_IN_HEXA
  Hexadecimal format makes it easier to detect differences between pages, and the format is chosen to facilitate comparison between buffer entries.
- A client part, located in contrib/buffer_capture_cmp, that can be used to compare buffer captures between nodes.

The footprint on core code is minimal and is controlled by a symbol called BUFFER_CAPTURE that needs to be set at build time to enable the buffer capture at server level. If this symbol is not enabled, both server and client parts are idle and generate nothing.

Note that this facility can generate a lot of output (11G when running regression tests, counting double when using both master and standby).

contrib/buffer_capture_cmp contains a regression test facility easing testing with buffer captures. The user just needs to run "make check" in this folder... There is a default set of tests saved in test-default.sh, but the user is free to set up custom tests by creating a file called test-custom.sh; if this file is present, the test facility runs it instead of the defaults.

Patch will be added to the first commit fest as well. Note that the footprint on core code is limited, so even if there are more than 1k lines of code, review is simpler than it looks.

A couple of things to note though:
1) In order to detect if a page is used for a sequence, SEQ_MAGIC needs to be exposed in sequence.h. This is included in the patch attached, but perhaps this should be split out as a separate patch.
2) The regression test facility uses some useful parts taken from pg_upgrade. I think that we should gather those parts in a common place (contrib/common?). This can facilitate the integration of other modules using regression tests based on bash scripts.
3) While hacking this facility, I noticed that some ItemId entries in btree pages could be inconsistent between master and standby. Those items are masked in the current patch, but it looks like a bug of Postgres itself.

Documentation is added in the code itself; I didn't feel any need to expose this facility to ordinary users in doc/src/sgml...

Regards,
--
Michael
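
For readers who want to picture the capture line format described in the message, here is a minimal Python sketch. It is not taken from the patch: the function names (format_capture_line, parse_capture_line, diff_captures) are hypothetical, the LSN is assumed to be printed as two 32-bit halves, the page is assumed to be plain hex-encoded bytes, and the entry-by-entry comparison is only one plausible way a client could compare two capture files.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line layout, copied from the description in the message:
#   LSN: %08X/%08X page: PAGE_IN_HEXA
CAPTURE_RE = re.compile(
    r"^LSN: ([0-9A-F]{8})/([0-9A-F]{8}) page: ([0-9A-Fa-f]*)$"
)


def format_capture_line(lsn: int, page: bytes) -> str:
    """Render one capture entry (hypothetical helper).

    The 64-bit LSN is split into its high and low 32-bit halves, and the
    already-masked page content is written as uppercase hexadecimal.
    """
    return "LSN: %08X/%08X page: %s" % (
        lsn >> 32,
        lsn & 0xFFFFFFFF,
        page.hex().upper(),
    )


def parse_capture_line(line: str) -> Tuple[int, bytes]:
    """Parse one capture entry back into (lsn, page_bytes)."""
    match = CAPTURE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a buffer capture line: %r" % line)
    high, low, page_hex = match.groups()
    return (int(high, 16) << 32) | int(low, 16), bytes.fromhex(page_hex)


def diff_captures(master: Iterable[str], standby: Iterable[str]) -> List[int]:
    """Return 1-based entry numbers where the two captures disagree.

    Assumption: entries are compared pairwise in file order, and an entry
    differs if either its LSN or its page content differs.
    """
    mismatches = []
    for number, (m_line, s_line) in enumerate(zip(master, standby), start=1):
        if parse_capture_line(m_line) != parse_capture_line(s_line):
            mismatches.append(number)
    return mismatches
```

Because each entry is a single line of hexadecimal text, a mismatch between master and standby can also be inspected with ordinary text tools once the entry number is known.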