On Tue, Aug 2, 2011 at 2:17 PM, Pedro Sam <pesam@rim.com> wrote:
> I've been trying to use repmgr for just that purpose. Looks like it simply
> creates/modifies a recovery.conf pointing primary_conninfo to the new master,
> and then restart. It does not seem to have the ability to resolve any
> timeline conflicts at all.

It does not -- however it does simplify the process and trims the
downtime a little bit. Reading the README:
"And if a previously failed node becomes available again, such as the
lost node1 above, you can get it to resynchronize by only copying over
changes made while it was down using. That happens with what's called a
forced clone, which overwrites existing data rather than assuming it
starts with an empty database directory tree:
    repmgr -D /var/lib/pgsql/9.0 --force standby clone node1
This can be much faster than creating a brand new node that must copy
over every file in the database."
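
To make the README's procedure concrete, here is a rough sketch of the
re-sync wrapped in a small shell function. Only the repmgr line comes from
the README; the pg_ctl stop/start steps around it, the RUN dry-run switch,
and the function name resync_standby are my own assumptions about how you
would wrap it, so treat this as an illustration rather than repmgr's
documented workflow. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: re-sync a previously failed node as a standby.
# PGDATA and MASTER mirror the values in the README's example command.
PGDATA=/var/lib/pgsql/9.0
MASTER=node1          # host name passed to "standby clone" in the README

# Assumption: a dry-run switch. Unset RUN prints each command;
# invoke with RUN= (empty) to actually execute them.
RUN=${RUN-echo}

resync_standby() {
    # Assumed step: stop the local server before its data directory
    # is overwritten.
    $RUN pg_ctl -D "$PGDATA" stop -m fast

    # From the README: forced clone that overwrites the existing data.
    $RUN repmgr -D "$PGDATA" --force standby clone "$MASTER"

    # Assumed step: bring the node back up as a standby.
    $RUN pg_ctl -D "$PGDATA" start
}
```

Calling resync_standby as-is lists the three commands so you can check
them before running anything against a real data directory.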
Basically, repmgr is formalizing good practice for failing over nodes and
re-syncing to a promoted master. I will say though that one
unfortunate side effect of using HS/SR for HA is that you need *four*
servers to really protect yourself against data loss -- one master and
three standbys. With a master and two standbys, you face a risk of
significant loss if the promoted master dies while the remaining
standby is syncing up to it. What you are looking for is a 'hot sync'
so that standbys could be promoted in such a way that does not require
a full sync -- that doesn't exist right now AFAIK.
merlin