Re: pg_rewind, a tool for resynchronizing an old master after failover - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: pg_rewind, a tool for resynchronizing an old master after failover
Date
Msg-id 519F85FF.7090601@nasby.net
In response to Re: pg_rewind, a tool for resynchronizing an old master after failover  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On 5/23/13 12:51 PM, Pavan Deolasee wrote:
>
>
>
> On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>     On 23.05.2013 07:55, Robert Haas wrote:
>
>         On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>             1. Scan the WAL log of the old cluster, starting from the point where
>             the new cluster's timeline history forked off from the old cluster. For each
>             WAL record, make a note of the data blocks that are touched. This yields a
>             list of all the data blocks that were changed in the old cluster, after the
>             new cluster forked off.
>
>
>         Suppose that a transaction is open and has written tuples at the point
>         where WAL forks.  After WAL forks, the transaction commits.  Then, it
>         hints some of the tuples that it wrote.  There is no record in WAL
>         that those blocks are changed, but failing to revert them leads to
>         data corruption.
>
>
>     Bummer, you're right. Hmm, if you have checksums enabled, however, we'll WAL log a full-page every time a page is
>     dirtied for setting a hint bit, which fixes the problem. So, there's a caveat with pg_rewind; you must have checksums
>     enabled.
>
>
> I was quite impressed with the idea, but hint bits indeed are problem. I realised the same issue also applies to the
> other idea that Fujii-san and others have suggested about waiting for dirty buffers to be written until the WAL is
> received at the standby. But since that idea would anyways need to be implemented in the core, we could teach
> SetHintBits() to return false unless the corresponding commit WAL records are written to the standby first.
 

Would it be useful to turn this problem around? Heikki's proposal is based on being able to track (without fail) all
blocks that have been modified; could we instead track blocks that we know for certain have NOT been modified? The
difference there is that we can be more conservative in stating "we know this block is the same"; worst case we just do
some extra copying.
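
To make the inversion concrete, here is a rough sketch. It assumes we somehow have a set of block numbers that are known for certain to be unmodified; the names all_blocks and known_unchanged are hypothetical and do not correspond to anything in PostgreSQL today.

```python
def blocks_to_copy(all_blocks, known_unchanged):
    """Return every block we cannot prove is unchanged, in sorted order."""
    # Anything missing from known_unchanged is treated as possibly modified.
    return sorted(set(all_blocks) - set(known_unchanged))


# Blocks 0..7 exist in the relation; we can only vouch for 0, 1 and 5.
print(blocks_to_copy(range(8), {0, 1, 5}))  # -> [2, 3, 4, 6, 7]
```

The point of the set difference is that a block we forget to list as unchanged only costs an extra copy; it can never cause a modified block to be skipped.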
 

<thinking out loud...>
One possibility would be to use file timestamps. For files that are past a certain age on both master and slave, if we
force the timestamp on the slave to match the timestamp from the master, rsync will be able to safely ignore that file.
I realize that's not as good as block-level detection, but it's probably a tremendous improvement over what we have
today. The critical thing in this case would be to *guarantee* that the timestamps did not match on modified files.
 

Of course, screwing around with FS timestamps in this manner is pretty grotty, at least on a live system. Perhaps
there's some way to track that info separately and then use it to change file timestamps before running rsync. Or if we
are able to define a list of files that we think may have changed, we just feed that list to rsync and let it do the
heavy lifting.
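
A minimal sketch of the "feed a list to rsync" variant, assuming file modification times are trustworthy and that a cutoff safely before the fork point is known. The function names and the cutoff parameter are made up for illustration; the only real interface relied on is rsync's --files-from option, which reads one path per line relative to the source directory.

```python
import os


def possibly_changed_files(data_dir, cutoff):
    """Return paths (relative to data_dir) whose mtime is at or after cutoff.

    cutoff is a Unix timestamp chosen conservatively before the failover,
    so a file is skipped only if it has not been written since then.
    """
    changed = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_mtime >= cutoff:
                changed.append(os.path.relpath(path, data_dir))
    return sorted(changed)


def write_rsync_list(paths, list_path):
    """Write one relative path per line, as rsync --files-from expects."""
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
```

The resulting file could then be passed along the lines of rsync -a --files-from=changed.txt SRC/ DEST/, with SRC and DEST standing in for the two data directories. Files changed on either node would need to end up in the list, so in practice the scan would have to be run on both sides and the results merged.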
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


