Hi,
I've been hacking on a tool to allow resynchronizing an old master
server after failover. The need to do a full backup/restore has been a
common complaint ever since we've had streaming replication. I saw on
the wiki that this was discussed in the dev meeting; too bad I couldn't
make it.
In a nutshell, the idea is to copy everything that has changed between
the clusters, like rsync does, but instead of reading through all files,
use the WAL to determine what has changed. Here's a somewhat more
detailed explanation, from the README:
Theory of operation
-------------------
The basic idea is to copy everything from the new cluster to the old
one, except for the blocks that we know to be the same.
1. Scan the WAL of the old cluster, starting from the point where the
new cluster's timeline history forked off from the old cluster. For
each WAL record, make a note of the data blocks that are touched. This
yields a list of all the data blocks that were changed in the old
cluster after the new cluster forked off.
2. Copy all those changed blocks from the new master to the old master.
3. Copy all other files, such as clog and configuration files, from the
new cluster to the old one: everything except the relation files.
4. Apply the WAL from the new master, starting from the checkpoint
created at failover. (pg_rewind doesn't actually apply the WAL; it just
creates a backup label file indicating that when PostgreSQL is started,
it should start replay from that checkpoint and apply all the required
WAL.)
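The four steps can be sketched as a toy, in-memory Python model. This is an illustration under stated assumptions, not pg_rewind's actual code. `WalRecord`, `changed_blocks`, `rewind`, `format_lsn` and `backup_label` are hypothetical names. Each cluster is modelled as a dict mapping file paths to contents. Decoded WAL is modelled as a list of records, each carrying an LSN and the blocks it touches. Only `base/<db>/<relfilenode>` paths count as relation files, with no forks, segments or tablespaces. The label writes only the START WAL LOCATION and CHECKPOINT LOCATION fields (a real backup label has more), and the WAL file name assumes the default 16 MB segment size.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

BLCKSZ = 8192  # PostgreSQL's default block size

# Simplification (assumption): only "base/<dboid>/<relfilenode>" is treated
# as a relation file; forks, 1 GB segments and tablespaces are ignored.
RELATION_FILE = re.compile(r"^base/\d+/\d+$")


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical decoded WAL record: its LSN and the blocks it touches."""
    lsn: int
    blocks: Tuple[Tuple[str, int], ...]  # (relation file path, block number)


def changed_blocks(old_wal: Iterable[WalRecord], fork_lsn: int) -> Dict[str, Set[int]]:
    """Step 1: blocks touched by old-cluster WAL at or after the fork point."""
    touched: Dict[str, Set[int]] = {}
    for rec in old_wal:
        if rec.lsn < fork_lsn:
            continue  # shared history: both clusters already agree on this change
        for path, blkno in rec.blocks:
            touched.setdefault(path, set()).add(blkno)
    return touched


def rewind(old: Dict[str, bytes], new: Dict[str, bytes],
           touched: Dict[str, Set[int]], blcksz: int = BLCKSZ) -> Dict[str, bytes]:
    """Steps 2 and 3: rebuild the old cluster's files from the new cluster."""
    result: Dict[str, bytes] = {}
    for path, new_data in new.items():
        old_data = old.get(path)
        if not RELATION_FILE.match(path) or old_data is None:
            # Step 3 (and relation files missing on the old side): copy whole file.
            result[path] = new_data
            continue
        # Keep the old contents, truncated or extended to the new file's length...
        buf = bytearray(old_data[:len(new_data)])
        buf += new_data[len(buf):]
        # ...then step 2: overwrite only the blocks the old cluster changed.
        for blkno in touched.get(path, ()):
            start = blkno * blcksz
            if start < len(new_data):
                buf[start:start + blcksz] = new_data[start:start + blcksz]
        result[path] = bytes(buf)
    return result  # files that exist only in the old cluster are dropped


def format_lsn(lsn: int) -> str:
    """Print a 64-bit LSN in PostgreSQL's usual X/X notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def backup_label(checkpoint_lsn: int, redo_lsn: int, tli: int) -> str:
    """Step 4 (minimal): tell recovery where to start replaying WAL."""
    # WAL segment file name, assuming the default 16 MB segment size.
    walfile = f"{tli:08X}{redo_lsn >> 32:08X}{(redo_lsn & 0xFFFFFFFF) >> 24:08X}"
    return (f"START WAL LOCATION: {format_lsn(redo_lsn)} (file {walfile})\n"
            f"CHECKPOINT LOCATION: {format_lsn(checkpoint_lsn)}\n")
```

In this sketch, a block that was modified only on the new master after the fork keeps its old contents. It needs no copying, because replaying the new master's WAL from the checkpoint (step 4) brings it up to date. Only blocks the old cluster changed after the fork have to be fetched, which is what makes this cheaper than a full rsync-style scan.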
Please take a look: https://github.com/vmware/pg_rewind
- Heikki