pg_rewind, a tool for resynchronizing an old master after failover - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject pg_rewind, a tool for resynchronizing an old master after failover
Date
Msg-id 519DF910.4020609@vmware.com
Responses Re: pg_rewind, a tool for resynchronizing an old master after failover  (Robert Haas <robertmhaas@gmail.com>)
Re: pg_rewind, a tool for resynchronizing an old master after failover  (Simon Riggs <simon@2ndQuadrant.com>)
Re: pg_rewind, a tool for resynchronizing an old master after failover  (Thom Brown <thom@linux.com>)
Re: pg_rewind, a tool for resynchronizing an old master after failover  (Amit kapila <amit.kapila@huawei.com>)
List pgsql-hackers
Hi,

I've been hacking on a tool to allow resynchronizing an old master 
server after failover. The need to do a full backup/restore has been a 
common complaint ever since we've had streaming replication. I saw on 
the wiki that this was discussed in the dev meeting; too bad I couldn't 
make it.

In a nutshell, the idea is to copy everything that has changed 
between the clusters, like rsync does, but instead of reading through all 
files, use the WAL to determine what has changed. Here's a somewhat more 
detailed explanation, from the README:

Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to the old one, 
except for the blocks that we know to be the same.

1. Scan the WAL of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For 
each WAL record, make a note of the data blocks that are touched. This 
yields a list of all the data blocks that were changed in the old 
cluster after the new cluster forked off.

2. Copy all those changed blocks from the new master to the old master.

3. Copy all other files, such as clog and configuration files, from the
new cluster to the old one: everything except the relation files.

4. Apply the WAL from the new master, starting from the checkpoint
created at failover. (pg_rewind doesn't actually apply the WAL; it just 
creates a backup label file so that when PostgreSQL is started, it 
starts replay from that checkpoint and applies all the required WAL.)
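
The first three steps can be sketched in Python. This is a minimal illustration under stated assumptions, not pg_rewind's code: it assumes the old cluster's WAL has already been decoded into hypothetical WalRecord objects carrying only an LSN and the (relation file, block number) pairs each record touches, and it models both data directories as in-memory dicts. The names changed_blocks, build_copy_plan, apply_plan and the is_relation_file predicate are illustrative, not part of any real API.

```python
# Sketch of pg_rewind steps 1-3 over hypothetical, already-decoded WAL
# records. Decoding real WAL is out of scope: each record is reduced to
# its LSN and the (relation file, block number) pairs it touches.
from dataclasses import dataclass, field
from typing import Callable, Iterable

BLCKSZ = 8192  # PostgreSQL's default block size


@dataclass(frozen=True)
class WalRecord:
    lsn: int                             # position of the record in the WAL
    blocks: tuple[tuple[str, int], ...]  # (relation file, block number) pairs


@dataclass
class CopyPlan:
    blocks: dict[str, set[int]] = field(default_factory=dict)  # relation file -> blocks to copy
    whole_files: list[str] = field(default_factory=list)       # non-relation files, copied whole


def changed_blocks(old_wal: Iterable[WalRecord], fork_lsn: int) -> dict[str, set[int]]:
    """Step 1: note every block the old cluster touched after the fork point."""
    changed: dict[str, set[int]] = {}
    for rec in old_wal:
        if rec.lsn < fork_lsn:
            continue  # part of the history both clusters share
        for relfile, blkno in rec.blocks:
            changed.setdefault(relfile, set()).add(blkno)
    return changed


def build_copy_plan(old_wal: Iterable[WalRecord], fork_lsn: int,
                    new_cluster_files: Iterable[str],
                    is_relation_file: Callable[[str], bool]) -> CopyPlan:
    """Steps 2 and 3: copy the changed blocks, plus every non-relation file."""
    return CopyPlan(
        blocks=changed_blocks(old_wal, fork_lsn),
        whole_files=sorted(f for f in new_cluster_files if not is_relation_file(f)),
    )


def apply_plan(plan: CopyPlan, new_cluster: dict[str, bytes],
               old_cluster: dict[str, bytearray], blcksz: int = BLCKSZ) -> None:
    """Execute the plan on in-memory clusters (path -> file contents).

    Assumes every changed relation file exists in both clusters with the
    same length; created, dropped or truncated relations are not handled.
    """
    for relfile, blknos in plan.blocks.items():
        src, dst = new_cluster[relfile], old_cluster[relfile]
        for blkno in sorted(blknos):
            start = blkno * blcksz
            dst[start:start + blcksz] = src[start:start + blcksz]
    for path in plan.whole_files:
        old_cluster[path] = bytearray(new_cluster[path])
```

Because only the WAL written by the old master after the fork point is read, the work scales with that WAL rather than with the size of the cluster, which is what distinguishes this approach from an rsync-style comparison of every file. Blocks that only the new cluster modified are deliberately left out of the plan; in this scheme, the WAL replay in step 4 is what brings them up to date.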

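Step 4 amounts to writing a small text file named backup_label into the old cluster's data directory. The sketch below assumes the default 16 MB WAL segment size and emits only the two lines that determine where replay begins, START WAL LOCATION and CHECKPOINT LOCATION; a real backup label carries more fields, and exactly which ones pg_rewind writes is not covered here. The helper names and the timeline and LSN values are hypothetical.

```python
# Sketch of step 4: build backup_label contents telling PostgreSQL where
# to start WAL replay. Assumes the default 16 MB WAL segment size.
WAL_SEG_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Parse an LSN in PostgreSQL's 'X/Y' notation (two hex halves)."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format a 64-bit LSN back into 'X/Y' notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEG_SIZE) -> str:
    """Name of the WAL segment file that contains lsn on the given timeline."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB of WAL
    return f"{timeline:08X}{segno // segs_per_xlogid:08X}{segno % segs_per_xlogid:08X}"


def backup_label(timeline: int, redo_lsn: int, checkpoint_lsn: int) -> str:
    """Replay starts at the checkpoint's redo pointer (redo_lsn); the
    checkpoint record itself is read from checkpoint_lsn."""
    return (
        f"START WAL LOCATION: {format_lsn(redo_lsn)} "
        f"(file {wal_file_name(timeline, redo_lsn)})\n"
        f"CHECKPOINT LOCATION: {format_lsn(checkpoint_lsn)}\n"
    )


if __name__ == "__main__":
    # Hypothetical values: new master on timeline 2, checkpoint at 0/3000060.
    print(backup_label(2, parse_lsn("0/3000028"), parse_lsn("0/3000060")), end="")
```

Run as a script, it prints "START WAL LOCATION: 0/3000028 (file 000000020000000000000003)" followed by "CHECKPOINT LOCATION: 0/3000060". The file name is the timeline, the high 32 bits of the LSN, and the segment number within that 4 GB range, each as eight hex digits.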

Please take a look: https://github.com/vmware/pg_rewind

- Heikki