Thread: How should pg_standby get over the gap of timeline?
Hi, In the current Synch Rep patch, the standby cannot catch up with the primary which has a bigger timeline. So, whenever making the standby catch up, a fresh base backup is required. This is obviously undesirable, and I'd like to get rid of this restriction. Postgres itself can recover up to a bigger timeline without a base backup. The remaining problem is that pg_standby cannot get over the gap of timeline. It continues waiting for the XLOG file with out-of-date timeline, and redo doesn't progress. My idea is that introducing a new option into pg_standby, which makes the restoring fail if there is the XLOG file with the same logid and segid even if the target file doesn't exist. Once failing to restore, the startup process can switch the timeline and try to restore the XLOG file with new timeline. Is this idea reasonable? Any comments welcome! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
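The option proposed above can be sketched as follows. This is a minimal illustration and not code from pg_standby or the Synch Rep patch: the function names (`parse_wal_name`, `restore_should_fail`) are hypothetical, and treating a file on any other timeline as a match is an assumption about the intended check. It relies only on the standard XLOG file naming, which is 24 uppercase hex digits: 8 for the timeline, 8 for the logid, 8 for the segid.

```python
import re

# XLOG segment file names: TTTTTTTT LLLLLLLL SSSSSSSS (timeline, logid, segid), all hex.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Return (timeline, logid, segid) or None if name is not an XLOG segment."""
    match = WAL_NAME.match(name)
    if match is None:
        return None
    timeline, logid, segid = (int(group, 16) for group in match.groups())
    return timeline, logid, segid


def restore_should_fail(wanted, available_names):
    """Hypothetical check: fail the restore instead of waiting forever.

    Returns True when the wanted file is absent but a file with the same
    logid and segid exists on a different timeline, so that the startup
    process could switch timelines and retry.
    """
    target = parse_wal_name(wanted)
    if target is None:
        raise ValueError(f"not an XLOG segment name: {wanted!r}")
    names = set(available_names)
    if wanted in names:
        return False  # the requested file is there; restore it normally
    timeline, logid, segid = target
    for name in names:
        other = parse_wal_name(name)
        if other is None:
            continue  # skip .history, .backup and other non-segment files
        if other[1:] == (logid, segid) and other[0] != timeline:
            return True
    return False
```

With an archive holding `000000020000000000000007`, a request for `000000010000000000000007` would fail immediately under this check rather than wait, while a request for a segment that exists on no timeline would keep waiting as pg_standby does today.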
Fujii Masao wrote: > In the current Synch Rep patch, the standby cannot catch up with the > primary which has a bigger timeline. That would only happen if you've performed an archive recovery in the primary. If you've done PITR in the primary, I don't think there's any guarantee that it's even possible to catch up the standby. The standby might already have replayed a WAL file from an earlier timeline, that isn't part of the history of the bigger timeline. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, Heikki. Thanks for the comment! On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> In the current Synch Rep patch, the standby cannot catch up with the >> primary which has a bigger timeline. > > That would only happen if you've performed an archive recovery in the > primary. If you've done PITR in the primary, I don't think there's any > guarantee that it's even possible to catch up the standby. The standby might > already have replayed a WAL file from an earlier timeline, that isn't part > of the history of the bigger timeline. I assume the situation of making the standby (the original primary) catch up with the primary (the original standby) after failover. Since a timeline is incremented when a failover finishes archive recovery on a standby, the timelines differ between two servers. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > Hi, Heikki. Thanks for the comment! > > On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Fujii Masao wrote: >>> In the current Synch Rep patch, the standby cannot catch up with the >>> primary which has a bigger timeline. >> That would only happen if you've performed an archive recovery in the >> primary. If you've done PITR in the primary, I don't think there's any >> guarantee that it's even possible to catch up the standby. The standby might >> already have replayed a WAL file from an earlier timeline, that isn't part >> of the history of the bigger timeline. > > I assume the situation of making the standby (the original primary) catch up > with the primary (the original standby) after failover. Since a timeline is > incremented when a failover finishes archive recovery on a standby, the > timelines differ between two servers. That seems like a dangerous assumption. What if the standby had fallen behind before the failover? It's not safe to failover back to the original primary in that case. We'd need some kind of safeguards against that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Nov 20, 2008 at 8:36 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > That seems like a dangerous assumption. What if the standby had fallen > behind before the failover? It's not safe to failover back to the original > primary in that case. We'd need some kind of safeguards against that. For synchronous replication, what if we ensure that the standby has received the WAL (at least in its buffers) before writing it to disk on the primary? If we do that, I think the old standby can never fall behind the primary and it would be easy for the old primary to join back the replication without a fresh backup. Of course, this doesn't work for async replication. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
On Fri, Nov 21, 2008 at 12:06 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> Hi, Heikki. Thanks for the comment! >> >> On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> Fujii Masao wrote: >>>> >>>> In the current Synch Rep patch, the standby cannot catch up with the >>>> primary which has a bigger timeline. >>> >>> That would only happen if you've performed an archive recovery in the >>> primary. If you've done PITR in the primary, I don't think there's any >>> guarantee that it's even possible to catch up the standby. The standby >>> might >>> already have replayed a WAL file from an earlier timeline, that isn't >>> part >>> of the history of the bigger timeline. >> >> I assume the situation of making the standby (the original primary) catch >> up >> with the primary (the original standby) after failover. Since a timeline >> is >> incremented when a failover finishes archive recovery on a standby, the >> timelines differ between two servers. > > That seems like a dangerous assumption. What if the standby had fallen > behind before the failover? It's not safe to failover back to the original > primary in that case. We'd need some kind of safeguards against that. Yeah, it's a legitimate concern. As the safeguard, I'm going to delete the XLOG files which may be inconsistent from the standby before making it catch up. The XLOG file including the recovery starting point and the subsequent ones may be inconsistent. Then, they need to be copied from the primary. I'm writing down the draft of this procedure at wiki. http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Procedure But, it's overkill to overwrite all the XLOG files which may be inconsistent. In the future, I'm going to provide the tool to compare the content of XLOG between two servers and tell the user which files should be overwritten. 
-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
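The safeguard described in the message above (discard the XLOG file containing the recovery starting point and every later one, then copy them again from the primary) could look roughly like this. It is a sketch under stated assumptions, not the procedure from the wiki page: `segment_position` and `files_to_recopy` are hypothetical names, and comparing segments by (logid, segid) while ignoring the timeline is an assumption.

```python
import re

# XLOG segment file names are 24 uppercase hex digits:
# 8 for the timeline, 8 for the logid, 8 for the segid.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def segment_position(name):
    """Return (logid, segid) for an XLOG segment name, ignoring its timeline."""
    if not WAL_NAME.match(name):
        raise ValueError(f"not an XLOG segment name: {name!r}")
    return int(name[8:16], 16), int(name[16:24], 16)


def files_to_recopy(local_names, start_segment):
    """List local segments at or after the one holding the recovery start point.

    These are the files that may be inconsistent on the old primary and
    would be removed and fetched again from the new primary.
    """
    start = segment_position(start_segment)
    return sorted(
        name
        for name in local_names
        if WAL_NAME.match(name) and segment_position(name) >= start
    )
```

Everything the function returns is treated as suspect, which is the "overkill" the message mentions; a tool that compares XLOG content between the two servers would shrink this list.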
On Fri, Nov 21, 2008 at 12:15 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > > > On Thu, Nov 20, 2008 at 8:36 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> >> That seems like a dangerous assumption. What if the standby had fallen >> behind before the failover? It's not safe to failover back to the original >> primary in that case. We'd need some kind of safeguards against that. >> > > For synchronous replication, what if we ensure that the standby has received > the WAL (atleast in its buffers) before writing it to disk on the primary ? > If we do that, I think the old standby can never fall behind the primary and > it would be easy for the old primary to join back the replication without a > fresh backup. In the current patch, since the WAL are written and sent concurrently for the performance gain, we cannot guarantee whether the old standby fall behind or not. I think that the setup procedure which can resolve both cases is required. > Of course, this doesn't work for async replication. Yeah, in asynch replication, some committed transaction may disappear regardless of whether the fresh backup is used or not. But, since the current patch guarantee "Replicate Ahead Log" rule even if asynch case, we can recover the old primary by using the WAL on the old standby consistently. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote: > In the current Synch Rep patch, the standby cannot catch up with the > primary which has a bigger timeline. So, whenever making the standby > catch up, a fresh base backup is required. This is obviously undesirable, > and I'd like to get rid of this restriction. > > Postgres itself can recover up to a bigger timeline without a base > backup. The remaining problem is that pg_standby cannot get over the > gap of timeline. It continues waiting for the XLOG file with out-of-date > timeline, and redo doesn't progress. We've discussed this before. My answer is the same: you are assuming it is safe to re-enter recovery, which is not correct (currently). You are also assuming that taking a base backup is an expensive operation - it need not be so if you simply move only the files/data that have changed, e.g. rsync. So if you want this to work, hacking pg_standby is not the way to do it. But I'm not convinced there is a problem worth solving. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Hi, Simon. Thanks for the comment!! On Sat, Nov 22, 2008 at 2:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote: > >> In the current Synch Rep patch, the standby cannot catch up with the >> primary which has a bigger timeline. So, whenever making the standby >> catch up, a fresh base backup is required. This is obviously undesirable, >> and I'd like to get rid of this restriction. >> >> Postgres itself can recover up to a bigger timeline without a base >> backup. The remaining problem is that pg_standby cannot get over the >> gap of timeline. It continues waiting for the XLOG file with out-of-date >> timeline, and redo doesn't progress. > > We've discussed this before. My answer is the same: you are assuming it > is safe to re-enter recovery, which is not correct (currently). I'm afraid you might be right. But I cannot understand yet why it's not safe to re-enter recovery. Is it safe to re-enter recovery from the restart point after PITR stopped halfway? If it's safe, ISTM that PITR without a base backup also is safe. Please let me know what might violate a re-entry of recovery. What is your worry? > You are > also assuming that taking a base backup is an expensive operation - it > need not be so if you simply move only the files/data that have changed, > e.g. rsync. It depends on DB size and type. I think that it's important that the user *can* choose the better method according to his situation. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sat, 2008-11-22 at 03:39 +0900, Fujii Masao wrote: > Hi, Simon. Thanks for the comment!! > > On Sat, Nov 22, 2008 at 2:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote: > > > >> In the current Synch Rep patch, the standby cannot catch up with the > >> primary which has a bigger timeline. So, whenever making the standby > >> catch up, a fresh base backup is required. This is obviously undesirable, > >> and I'd like to get rid of this restriction. > >> > >> Postgres itself can recover up to a bigger timeline without a base > >> backup. The remaining problem is that pg_standby cannot get over the > >> gap of timeline. It continues waiting for the XLOG file with out-of-date > >> timeline, and redo doesn't progress. > > > > We've discussed this before. My answer is the same: you are assuming it > > is safe to re-enter recovery, which is not correct (currently). > > I'm afraid you might be right. But I cannot understand yet why it's not > safe to re-enter recovery. Is it safe to re-enter recovery from the > restart point after PITR stopped halfway? If it's safe, ISTM that PITR > without a base backup also is safe. Please let me know what might > violate a re-entry of recovery. What is your worry? My worry is that there has not been an exhaustive analysis. "Almost correct" and "probably correct" is not the same thing as "correct". We need to look through all of the changes that occur at the end of recovery to be certain we can do this. Luckily normal data blocks don't know anything about such state changes, so that is a good start. We must look at: Timelines; control file; startupclog, startup multixact etc; autovacuum starting; relcache init file; flat files; archive status; pg_xlog; two phase commit; ... every single file type in Postgres... -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Sat, Nov 22, 2008 at 6:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > My worry is that there has not been an exhaustive analysis. "Almost > correct" and "probably correct" is not the same thing as "correct". We > need to look through all of the changes that occur at the end of > recovery to be certain we can do this. Luckily normal data blocks don't > know anything about such state changes, so that is a good start. We must > look at It's a reasonable worry. Thanks a lot, Simon. I will examine it next time (probably 8.5). And, I'd like to clear up which recovery methods are safe now. I think it is as follows; is that right? Safe (proved to be safe): - PITR with a base backup. That is, we don't always need a fresh backup when setting up, and can make the standby catch up by using an old or fresh backup. If we can use an old backup, I think it might be worth changing pg_standby to get over the gap of timeline. What is your opinion? - PITR with a database cluster including a recovery restart point. That is, we can make the standby catch up without a base backup after it fails. Not safe (further examination is needed): - PITR with a database cluster not including a recovery restart point. That is, we cannot make the standby (old primary) catch up without a base backup. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center