Thread: Timeline Conflict

Timeline Conflict

From

senthilnathan

Date:

02 August 2011, 10:00:26

We have a system (cluster) with a master replicating to two standby servers.

i.e.

M   |-------> S1

      |-------> S2

If the master fails, we create a trigger file on S1 so that it takes over as
master. We then need to re-point standby S2 as a slave of the new master (i.e. S1).

While trying to start standby S2, there is a conflict in timelines, since on
recovery it generates a new timeline.

Is there any way to solve this issue?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Timeline-Conflict-tp4657611p4657611.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Timeline Conflict

From

Merlin Moncure

Date:

02 August 2011, 10:55:17

On Tue, Aug 2, 2011 at 12:59 AM, senthilnathan
<senthilnathan.t@gmail.com> wrote:
> We have system(Cluster) with Master replicating to 2 stand by servers.
>
> i.e
>
> M   |-------> S1
>
>      |-------> S2
>
> If master failed, we do a trigger file at S1 to take over as master. Now we
> need to re-point the standby S2 as slave for the new master (i.e S1)
>
> While trying to start standby S2,there is a conflict in timelines, since on
> recovery it generates a new line.
>
> Is there any way to solve this issue?

AFAIK, the only solution is to follow the initial standby setup
process to bring the standby up to sync with the new master.  One
small comfort is that since the standby is mostly in the state it
needs to be, an rsync based process might happen fairly quickly.  This
of course means that if you lose the new master before the standby is
up to speed you are facing data loss.  I'm really curious if anyone
has figured out a potential solution to this problem.
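
A minimal sketch of the rsync-based resync described above, written as a
Python helper that only builds the command lines and does not run anything.
The host name, data directory, backup label, the "postgres" user and SSH
access from the standby to the new master are all assumptions for
illustration; pg_start_backup() and pg_stop_backup() are the backup functions
of the PostgreSQL releases current in this thread. It also assumes the standby
is already stopped, and that recovery.conf is recreated afterwards, since
rsync --delete will remove it.

```python
import shlex


def resync_commands(new_master, data_dir, label="resync_standby"):
    """Return the shell commands, in order, for an rsync-based resync.

    Meant to be run on the (stopped) standby. Nothing is executed here;
    the caller decides how and whether to run each command.
    """
    start_sql = "SELECT pg_start_backup('%s');" % label
    stop_sql = "SELECT pg_stop_backup();"
    steps = [
        # 1. Put the new master into backup mode.
        ["psql", "-h", new_master, "-U", "postgres", "-c", start_sql],
        # 2. Pull only the files that differ; most of the standby's
        #    data is already in place, so this is usually quick.
        ["rsync", "-a", "--delete", "--exclude=postmaster.pid",
         "%s:%s/" % (new_master, data_dir), "%s/" % data_dir],
        # 3. Leave backup mode on the new master.
        ["psql", "-h", new_master, "-U", "postgres", "-c", stop_sql],
    ]
    return [" ".join(shlex.quote(arg) for arg in step) for step in steps]


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    for cmd in resync_commands("s1.example.com", "/var/lib/pgsql/9.0/data"):
        print(cmd)
```

Until the last step finishes and the standby has caught up, the new master is
the only complete copy, which is the exposure window mentioned above.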

merlin

Re: Timeline Conflict

From

Simon Riggs

Date:

02 August 2011, 13:13:30

On Tue, Aug 2, 2011 at 2:55 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Aug 2, 2011 at 12:59 AM, senthilnathan
> <senthilnathan.t@gmail.com> wrote:
>> We have system(Cluster) with Master replicating to 2 stand by servers.
>>
>> i.e
>>
>> M   |-------> S1
>>
>>      |-------> S2
>>
>> If master failed, we do a trigger file at S1 to take over as master. Now we
>> need to re-point the standby S2 as slave for the new master (i.e S1)
>>
>> While trying to start standby S2,there is a conflict in timelines, since on
>> recovery it generates a new line.
>>
>> Is there any way to solve this issue?
>
> AFAIK, the only solution is to follow the initial standby setup
> process to bring the standby up to sync with the new master.  One
> small comfort is that since the standby is mostly in the state it
> needs to be, an rsync based process might happen fairly quickly.  This
> of course means that if you lose the new master before the standby is
> up to speed you are facing data loss.  I'm really curious if anyone
> has figured out a potential solution to this problem.

http://projects.2ndquadrant.com/repmgr

solves the problem

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Timeline Conflict

From

Pedro Sam

Date:

02 August 2011, 16:18:10

I've been trying to use repmgr for just that purpose.  Looks like it simply creates/modifies a recovery.conf pointing
primary_conninfo to the new master, and then restart.  It does not seem to have the ability to resolve any timeline
conflicts at all.

Am I using repmgr incorrectly?

Re: Timeline Conflict

From

Merlin Moncure

Date:

02 August 2011, 16:41:36

On Tue, Aug 2, 2011 at 2:17 PM, Pedro Sam <pesam@rim.com> wrote:
> I've been trying to use repmgr for just that purpose. Looks like it simply creates/modifies a recovery.conf pointing
> primary_conninfo to the new master, and then restart. It does not seem to have the ability to resolve any timeline
> conflicts at all.

It does not -- however it does simplify the process and optimizes the
downtime a little bit. Reading the README:

"And if a previously failed node becomes available again, such as the
lost node1 above, you can get it to resynchronize by only copying over
changes made while it was down. That happens with what's called a
forced clone, which overwrites existing data rather than assuming it
starts with an empty database directory tree:

repmgr -D /var/lib/pgsql/9.0 --force standby clone node1

This can be much faster than creating a brand new node that must copy
over every file in the database."

Basically this is formalizing good practice for failing over nodes and
re-syncing to a promoted master. I will say though that one
unfortunate side effect of using HS/SR for HA is that you need *four*
servers to really protect yourself against data loss -- one master and
three standbys. With a master and two standbys, you face a risk of
significant loss if the promoted master dies while the remaining
standby is syncing up to it. What you are looking for is a 'hot sync'
so that standbys could be promoted in such a way that does not require
a full sync -- that doesn't exist right now AFAIK.

merlin

Re: Timeline Conflict

From

Simon Riggs

Date:

02 August 2011, 17:19:06

On Tue, Aug 2, 2011 at 8:17 PM, Pedro Sam <pesam@rim.com> wrote:
> I've been trying to use repmgr for just that purpose.  Looks like it simply creates/modifies a recovery.conf pointing
> primary_conninfo to the new master, and then restart.  It does not seem to have the ability to resolve any timeline
> conflicts at all.
>
> Am I using repmgr incorrectly?

It would appear so.

repmgr is not a fix for a problem situation, it is a management system
that will avoid the problems in the first place.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Timeline Conflict

From

Simon Riggs

Date:

02 August 2011, 17:22:01

On Tue, Aug 2, 2011 at 8:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

> Basically this is formalizing good practice for failing over nodes and
> re-syncing to a promoted master.  I will say though that one
> unfortunate side effect of using HS/SR for HA is that you need *four*
> servers to really protect yourself against data loss -- one master and
> three standbys.  With a master and two standbys, you face a risk of
> significant loss if the promoted master dies while the remaining
> standby is syncing up to it.  What you are looking for is a 'hot sync'
> so that standbys could be promoted in such a way that does not require
> a full sync -- that doesn't exist right now AFAIK.

repmgr is specifically designed to reduce the time for a "follow"
action to a very small amount.

There is no risk of significant loss.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Timeline Conflict

From

Fujii Masao

Date:

02 August 2011, 22:38:35

On Tue, Aug 2, 2011 at 2:59 PM, senthilnathan <senthilnathan.t@gmail.com> wrote:
> We have system(Cluster) with Master replicating to 2 stand by servers.
>
> i.e
>
> M   |-------> S1
>
>      |-------> S2
>
> If master failed, we do a trigger file at S1 to take over as master. Now we
> need to re-point the standby S2 as slave for the new master (i.e S1)
>
> While trying to start standby S2,there is a conflict in timelines, since on
> recovery it generates a new line.
>
> Is there any way to solve this issue?

Basically you need to take a fresh backup from the new master and restart the
standby using it. But if S1 and S2 share the archive, S1 is ahead of S2 (i.e.,
the replay location of S1 is greater than or equal to that of S2), and
recovery_target_timeline is set to 'latest' in S2's recovery.conf, you can skip
taking a fresh backup from the new master. In this case, you can re-point S2 as
a standby just by changing primary_conninfo in S2's recovery.conf and
restarting S2. When S2 restarts, it reads the timeline history file which was
created by S1 at failover and adjusts its timeline ID to S1's. So the timeline
conflict doesn't happen.
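
The three conditions above can be sketched as a small Python check, together
with a helper that renders a matching recovery.conf for S2. This is only an
illustration under assumptions: the function names are hypothetical, the
replay locations are assumed to be the "X/Y" hexadecimal strings reported by
each server (for example from pg_last_xlog_replay_location()), and the
conninfo and restore_command values are placeholders to adapt.

```python
def parse_xlog_location(text):
    """Parse an 'X/Y' hexadecimal WAL location into a comparable tuple."""
    high, low = text.strip().split("/")
    return int(high, 16), int(low, 16)


def can_follow_without_backup(s1_replay, s2_replay, shared_archive,
                              target_timeline):
    """True if S2 can be re-pointed at S1 without a fresh base backup.

    Mirrors the three conditions: a shared archive, S1 at or ahead of S2,
    and recovery_target_timeline = 'latest' on S2.
    """
    s1_ahead = parse_xlog_location(s1_replay) >= parse_xlog_location(s2_replay)
    return bool(shared_archive and target_timeline == "latest" and s1_ahead)


def _quote(value):
    # recovery.conf values are single-quoted; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def render_recovery_conf(primary_conninfo, restore_command):
    """Return recovery.conf text pointing a standby at the new master."""
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = " + _quote(primary_conninfo),
        # restore_command must read from the archive shared with S1, so
        # that S2 can fetch the timeline history file S1 wrote at failover.
        "restore_command = " + _quote(restore_command),
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    if can_follow_without_backup("0/3000158", "0/3000100", True, "latest"):
        print(render_recovery_conf("host=s1 port=5432 user=replication",
                                   "cp /mnt/archive/%f %p"))
```

If the check returns False, the fallback is the fresh backup from the new
master described at the start of this message.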

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Timeline Conflict

From

Simon Riggs

Date:

03 August 2011, 00:18:35

On Wed, Aug 3, 2011 at 2:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Aug 2, 2011 at 2:59 PM, senthilnathan <senthilnathan.t@gmail.com> wrote:
>> We have system(Cluster) with Master replicating to 2 stand by servers.
>>
>> i.e
>>
>> M   |-------> S1
>>
>>      |-------> S2
>>
>> If master failed, we do a trigger file at S1 to take over as master. Now we
>> need to re-point the standby S2 as slave for the new master (i.e S1)
>>
>> While trying to start standby S2,there is a conflict in timelines, since on
>> recovery it generates a new line.
>>
>> Is there any way to solve this issue?
>
> Basically you need to take a fresh backup from new master and restart the
> standby using it. But, if S1 and S2 share the archive, S1 is ahead of S2
> (i.e., the replay location of S1 is bigger than or equal to that of S2), and
> recovery_target_timeline is set to 'latest' in S2's recovery.conf, you can
> skip taking a fresh backup from new master.
> In this case, you can re-point S2 as a standby just by changing
> primary_conninfo in S2's recovery.conf and restarting S2. When S2 restarts,
> S2 reads the timeline history file which was created by S1 at failover and
> adjust its timeline ID to S1's. So timeline conflict doesn't happen.

Though this relies upon a shared archive which gives a single point of failure.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services