Thread: Timeline Conflict
We have a system (cluster) with a master replicating to two standby servers, i.e.:

M |-------> S1
  |-------> S2

If the master fails, we create a trigger file on S1 so that it takes over as master. We then need to re-point standby S2 as a slave of the new master (i.e. S1).

While trying to start standby S2, there is a conflict in timelines, since on recovery it generates a new timeline.

Is there any way to solve this issue?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Timeline-Conflict-tp4657611p4657611.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Tue, Aug 2, 2011 at 12:59 AM, senthilnathan <senthilnathan.t@gmail.com> wrote:
> We have system(Cluster) with Master replicating to 2 stand by servers.
>
> i.e
>
> M |-------> S1
>   |-------> S2
>
> If master failed, we do a trigger file at S1 to take over as master. Now we
> need to re-point the standby S2 as slave for the new master (i.e S1)
>
> While trying to start standby S2,there is a conflict in timelines, since on
> recovery it generates a new line.
>
> Is there any way to solve this issue?

AFAIK, the only solution is to follow the initial standby setup process to bring the standby back in sync with the new master. One small comfort is that, since the standby is mostly in the state it needs to be in, an rsync-based process might finish fairly quickly. This of course means that if you lose the new master before the standby is up to speed, you are facing data loss. I'm really curious whether anyone has figured out a potential solution to this problem.

merlin
On Tue, Aug 2, 2011 at 2:55 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> AFAIK, the only solution is to follow the initial standby setup
> process to bring the standby up to sync with the new master. One
> small comfort is that since the standby is mostly in the state it
> needs to be, an rsync based process might happen fairly quickly. This
> of course means that if you lose the new master before the standby is
> up to speed you are facing data loss. I'm really curious if anyone
> has figured out a potential solution to this problem.

http://projects.2ndquadrant.com/repmgr solves the problem

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I've been trying to use repmgr for just that purpose. It looks like it simply creates or modifies a recovery.conf, pointing primary_conninfo at the new master, and then restarts the standby. It does not seem to have the ability to resolve any timeline conflicts at all.

Am I using repmgr incorrectly?

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Simon Riggs
Sent: Tuesday, August 02, 2011 12:07 PM
To: Merlin Moncure
Cc: senthilnathan; pgsql-general
Subject: Re: [GENERAL] Timeline Conflict

http://projects.2ndquadrant.com/repmgr solves the problem
On Tue, Aug 2, 2011 at 2:17 PM, Pedro Sam <pesam@rim.com> wrote:
> I've been trying to use repmgr for just that purpose. Looks like it simply
> creates/modifies a recovery.conf pointing primary_conninfo to the new master,
> and then restart. It does not seem to have the ability to resolve any
> timeline conflicts at all.

It does not. However, it does simplify the process and reduce the downtime a little. Reading the README:

"And if a previously failed node becomes available again, such as the lost node1 above, you can get it to resynchronize by only copying over changes made while it was down using. That happens with what's called a forced clone, which overwrites existing data rather than assuming it starts with an empty database directory tree:

    repmgr -D /var/lib/pgsql/9.0 --force standby clone node1

This can be much faster than creating a brand new node that must copy over every file in the database."

Basically, this formalizes good practice for failing over nodes and re-syncing to a promoted master. I will say, though, that one unfortunate side effect of using HS/SR for HA is that you need *four* servers to really protect yourself against data loss: one master and three standbys. With a master and two standbys, you face a risk of significant loss if the promoted master dies while the remaining standby is still syncing up to it. What you are looking for is a 'hot sync', so that standbys could be promoted in a way that does not require a full sync. That doesn't exist right now AFAIK.

merlin
On Tue, Aug 2, 2011 at 8:17 PM, Pedro Sam <pesam@rim.com> wrote:
> I've been trying to use repmgr for just that purpose. Looks like it simply
> creates/modifies a recovery.conf pointing primary_conninfo to the new master,
> and then restart. It does not seem to have the ability to resolve any
> timeline conflicts at all.
>
> Am I using repmgr incorrectly?

It would appear so.

repmgr is not a fix for a problem situation; it is a management system that will avoid the problems in the first place.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 2, 2011 at 8:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Basically this is formalizing good practice for failing over nodes and
> re-syncing to a promoted master. I will say though that one
> unfortunate side effect of using HS/SR for HA is that you need *four*
> servers to really protect yourself against data loss -- one master and
> three standbys. With a master and two standbys, you face a risk of
> significant loss if the promoted master dies while the remaining
> standby is syncing up to it. What you are looking for is a 'hot sync'
> so that standbys could be promoted in such a way that does not require
> a full sync -- that doesn't exist right now AFAIK.

repmgr is specifically designed to reduce the time for a "follow" action to a very small amount. There is no risk of significant loss.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 2, 2011 at 2:59 PM, senthilnathan <senthilnathan.t@gmail.com> wrote:
> We have system(Cluster) with Master replicating to 2 stand by servers.
>
> i.e
>
> M |-------> S1
>   |-------> S2
>
> If master failed, we do a trigger file at S1 to take over as master. Now we
> need to re-point the standby S2 as slave for the new master (i.e S1)
>
> While trying to start standby S2,there is a conflict in timelines, since on
> recovery it generates a new line.
>
> Is there any way to solve this issue?

Basically, you need to take a fresh backup from the new master and restart the standby from it. However, if S1 and S2 share the archive, S1 is ahead of S2 (i.e., the replay location of S1 is greater than or equal to that of S2), and recovery_target_timeline is set to 'latest' in S2's recovery.conf, you can skip taking a fresh backup from the new master.

In that case, you can re-point S2 as a standby just by changing primary_conninfo in S2's recovery.conf and restarting S2. When S2 restarts, it reads the timeline history file that S1 created at failover and adjusts its timeline ID to S1's, so the timeline conflict does not occur.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
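The three conditions in the message above can be sketched as a small pre-flight check, followed by the recovery.conf that S2 would need. This is only an illustration under stated assumptions: the helper names, the host name s1.example.com, and the archive path /mnt/archive are made up, and the replay locations are assumed to be text values of the form "hi/lo" (hexadecimal), for example as reported by pg_last_xlog_replay_location() while both servers were still standbys. The cp-based restore_command is just the simplest form of a shared archive and may not match a real setup.

```python
# Sketch only: helper names, host name, and archive path are hypothetical.

def parse_xlog_location(text):
    """Turn a location such as '0/3000020' into a comparable (high, low) tuple."""
    high, low = text.strip().split("/")
    # Both halves are hexadecimal, so compare them as numbers, not as strings.
    return (int(high, 16), int(low, 16))


def can_repoint_without_backup(s1_replay, s2_replay, shared_archive, target_timeline):
    """Check the three conditions: shared archive, S1 >= S2, and 'latest'."""
    s1_ahead = parse_xlog_location(s1_replay) >= parse_xlog_location(s2_replay)
    return bool(shared_archive and s1_ahead and target_timeline == "latest")


def render_recovery_conf(new_master_host, archive_dir):
    """Build recovery.conf text for S2 that follows the promoted S1."""
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = 'host={new_master_host} port=5432'",
        # %f and %p are placeholders expanded by the server, not by Python.
        f"restore_command = 'cp {archive_dir}/%f %p'",
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


# Example values (assumed): S1 had replayed further than S2 at failover.
if can_repoint_without_backup("0/3000158", "0/3000020", True, "latest"):
    print(render_recovery_conf("s1.example.com", "/mnt/archive"))
else:
    print("Take a fresh base backup from the new master instead.")
```

If the check fails on any condition, the safe path remains the one described at the start of the message: take a fresh backup from the new master.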
On Wed, Aug 3, 2011 at 2:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Basically you need to take a fresh backup from new master and restart
> the standby using it. But, if S1 and S2 share the archive, S1 is ahead
> of S2 (i.e., the replay location of S1 is bigger than or equal to that
> of S2), and recovery_target_timeline is set to 'latest' in S2's
> recovery.conf, you can skip taking a fresh backup from new master.
> In this case, you can re-point S2 as a standby just by changing
> primary_conninfo in S2's recovery.conf and restarting S2. When S2
> restarts, S2 reads the timeline history file which was created by S1
> at failover and adjust its timeline ID to S1's. So timeline conflict
> doesn't happen.

Though this relies upon a shared archive, which gives a single point of failure.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
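To see why the archive matters here, it may help to look at how timelines show up in file names. WAL segment files have 24-hexadecimal-digit names whose first 8 digits are the timeline ID, and a promotion writes a history file named after the new timeline (for example 00000002.history). The sketch below, with hypothetical helper names and a made-up archive listing, picks out the newest timeline visible in an archive directory. It illustrates the naming scheme only and is not how the server itself is implemented.

```python
import re

# WAL segment: 8 hex digits of timeline ID, then 16 more identifying the segment.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")
# Timeline history file: 8 hex digits of timeline ID plus ".history".
HISTORY_RE = re.compile(r"^([0-9A-F]{8})\.history$")


def timeline_of(filename):
    """Return the timeline ID encoded in a file name, or None if it has none."""
    if WAL_SEGMENT_RE.match(filename):
        return int(filename[:8], 16)
    match = HISTORY_RE.match(filename)
    if match:
        return int(match.group(1), 16)
    return None


def latest_timeline(archive_listing):
    """Return the highest timeline ID seen in a list of archive file names."""
    ids = [t for t in map(timeline_of, archive_listing) if t is not None]
    return max(ids) if ids else None


# Made-up listing: S1 was promoted while segment ...05 was in progress.
listing = [
    "000000010000000000000004",
    "000000010000000000000005",
    "00000002.history",
    "000000020000000000000005",
]
print(latest_timeline(listing))  # prints 2
```

A standby that cannot reach 00000002.history has no record of where timeline 2 branched off timeline 1, which is why the approach quoted above depends on S2 being able to read the archive that S1 writes to.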