Thread: How should pg_standby get over the gap of timeline?

How should pg_standby get over the gap of timeline?

From
"Fujii Masao"
Date:
Hi,

In the current Synch Rep patch, the standby cannot catch up with the
primary which has a bigger timeline. So, whenever making the standby
catch up, a fresh base backup is required. This is obviously undesirable,
and I'd like to get rid of this restriction.

Postgres itself can recover up to a bigger timeline without a base
backup. The remaining problem is that pg_standby cannot get over the
gap of timeline. It continues waiting for the XLOG file with out-of-date
timeline, and redo doesn't progress.

My idea is to introduce a new option into pg_standby that makes the
restore fail if an XLOG file with the same logid and segid exists,
even if the target file itself doesn't exist. Once the restore fails, the
startup process can switch the timeline and try to restore the XLOG file
with the new timeline.
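
A minimal sketch of the check described above, written in Python rather than in pg_standby's own C. It relies on the WAL segment file name being 24 hex digits (8 for the timeline, 8 for the logid, 8 for the segid). The function names `parse_wal_name` and `should_fail_restore` are hypothetical, and treating any file on a different timeline as a reason to fail is an assumption about how the proposed option would behave.

```python
import os
import re

# WAL segment file names: 8 hex digits each for timeline, logid and segid.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Return (timeline, logid, segid) or None if name is not a WAL segment."""
    m = WAL_NAME.match(name)
    if m is None:
        return None
    tli, logid, segid = (int(group, 16) for group in m.groups())
    return tli, logid, segid


def should_fail_restore(archive_dir, wanted):
    """Hypothetical check: True means "fail now instead of waiting".

    Assumption: the restore should fail when the wanted file is absent but
    a file with the same logid and segid exists on another timeline.
    """
    parsed = parse_wal_name(wanted)
    if parsed is None:
        raise ValueError("not a WAL segment file name: %r" % wanted)
    wanted_tli, logid, segid = parsed

    # The target file is present, so it can be restored normally.
    if os.path.exists(os.path.join(archive_dir, wanted)):
        return False

    for entry in os.listdir(archive_dir):
        other = parse_wal_name(entry)
        if other is None:
            continue  # skip .history files, backup labels, etc.
        tli, other_logid, other_segid = other
        if (other_logid, other_segid) == (logid, segid) and tli != wanted_tli:
            return True
    return False
```

With such a check, a request for a timeline-1 segment would fail as soon as the archive holds the same segment on timeline 2, instead of waiting indefinitely.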

Is this idea reasonable? Any comments welcome!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should pg_standby get over the gap of timeline?

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> In the current Synch Rep patch, the standby cannot catch up with the
> primary which has a bigger timeline.

That would only happen if you've performed an archive recovery in the 
primary. If you've done PITR in the primary, I don't think there's any 
guarantee that it's even possible to catch up the standby. The standby 
might already have replayed a WAL file from an earlier timeline, that 
isn't part of the history of the bigger timeline.
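
The concern above can be illustrated with a small sketch. It assumes a simplified model in which WAL positions are plain integers and the history of the target timeline is a list of (timeline, switch point) pairs. The function `can_follow` and this data layout are hypothetical and do not mirror PostgreSQL's timeline history file format.

```python
def can_follow(target_tli, history, standby_tli, standby_pos):
    """Return True if the standby's replayed WAL lies on the target's history.

    history: list of (tli, switch_pos) pairs for the ancestors of target_tli,
    where switch_pos is the position at which that ancestor timeline was left.
    Positions are plain integers here (an assumption for illustration only).
    """
    if standby_tli == target_tli:
        return True  # already on the target timeline
    for tli, switch_pos in history:
        if tli == standby_tli:
            # Anything replayed past the switch point belongs to a branch
            # that the target timeline never took.
            return standby_pos <= switch_pos
    return False  # the standby's timeline is not an ancestor at all


# Timeline 3 descends from timeline 1 (left at 100) and timeline 2 (left at 250).
history = [(1, 100), (2, 250)]
print(can_follow(3, history, 1, 80))   # standby stopped before the fork
print(can_follow(3, history, 1, 120))  # standby replayed past the fork
```

In this model, a standby that replayed timeline 1 past position 100 cannot catch up to timeline 3 without being rebuilt, which is the case described above.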

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: How should pg_standby get over the gap of timeline?

From
"Fujii Masao"
Date:
Hi, Heikki. Thanks for the comment!

On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> In the current Synch Rep patch, the standby cannot catch up with the
>> primary which has a bigger timeline.
>
> That would only happen if you've performed an archive recovery in the
> primary. If you've done PITR in the primary, I don't think there's any
> guarantee that it's even possible to catch up the standby. The standby might
> already have replayed a WAL file from an earlier timeline, that isn't part
> of the history of the bigger timeline.

I assume the situation of making the standby (the original primary) catch up
with the primary (the original standby) after failover. Since a timeline is
incremented when a failover finishes archive recovery on a standby, the
timelines differ between two servers.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should pg_standby get over the gap of timeline?

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> Hi, Heikki. Thanks for the comment!
> 
> On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> In the current Synch Rep patch, the standby cannot catch up with the
>>> primary which has a bigger timeline.
>> That would only happen if you've performed an archive recovery in the
>> primary. If you've done PITR in the primary, I don't think there's any
>> guarantee that it's even possible to catch up the standby. The standby might
>> already have replayed a WAL file from an earlier timeline, that isn't part
>> of the history of the bigger timeline.
> 
> I assume the situation of making the standby (the original primary) catch up
> with the primary (the original standby) after failover. Since a timeline is
> incremented when a failover finishes archive recovery on a standby, the
> timelines differ between two servers.

That seems like a dangerous assumption. What if the standby had fallen 
behind before the failover? It's not safe to failover back to the 
original primary in that case. We'd need some kind of safeguards against 
that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: How should pg_standby get over the gap of timeline?

From
"Pavan Deolasee"
Date:
On Thu, Nov 20, 2008 at 8:36 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> That seems like a dangerous assumption. What if the standby had fallen
> behind before the failover? It's not safe to failover back to the original
> primary in that case. We'd need some kind of safeguards against that.

For synchronous replication, what if we ensure that the standby has received
the WAL (at least in its buffers) before writing it to disk on the primary?
If we do that, I think the old standby can never fall behind the primary and
it would be easy for the old primary to join back the replication without a
fresh backup.

Of course, this doesn't work for async replication.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: How should pg_standby get over the gap of timeline?

From
"Fujii Masao"
Date:
On Fri, Nov 21, 2008 at 12:06 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> Hi, Heikki. Thanks for the comment!
>>
>> On Thu, Nov 20, 2008 at 11:24 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>
>>> Fujii Masao wrote:
>>>>
>>>> In the current Synch Rep patch, the standby cannot catch up with the
>>>> primary which has a bigger timeline.
>>>
>>> That would only happen if you've performed an archive recovery in the
>>> primary. If you've done PITR in the primary, I don't think there's any
>>> guarantee that it's even possible to catch up the standby. The standby
>>> might
>>> already have replayed a WAL file from an earlier timeline, that isn't
>>> part
>>> of the history of the bigger timeline.
>>
>> I assume the situation of making the standby (the original primary) catch
>> up
>> with the primary (the original standby) after failover. Since a timeline
>> is
>> incremented when a failover finishes archive recovery on a standby, the
>> timelines differ between two servers.
>
> That seems like a dangerous assumption. What if the standby had fallen
> behind before the failover? It's not safe to failover back to the original
> primary in that case. We'd need some kind of safeguards against that.

Yeah, it's a legitimate concern. As a safeguard, I'm going to delete the
XLOG files that may be inconsistent from the standby before making it
catch up. The XLOG file containing the recovery starting point and all
subsequent ones may be inconsistent, so they need to be copied from
the primary. I'm writing up a draft of this procedure on the wiki:
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Procedure

But it's overkill to overwrite all the XLOG files that may be inconsistent.
In the future, I'm going to provide a tool that compares the content of XLOG
between the two servers and tells the user which files should be overwritten.
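
As a rough illustration of what such a comparison tool might do, the sketch below hashes same-named files in two directories and reports the ones whose content differs. The function names and the assumption that both XLOG directories are reachable as local paths are hypothetical. Comparing whole files is also coarser than comparing WAL records, and it may over-report if a segment holds leftover bytes past its last valid record.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def segments_to_overwrite(primary_dir, standby_dir):
    """List file names present in both directories whose contents differ.

    Hypothetical helper: files that exist only on one side are ignored,
    since there is nothing to compare them against.
    """
    differing = []
    for name in sorted(os.listdir(standby_dir)):
        standby_path = os.path.join(standby_dir, name)
        primary_path = os.path.join(primary_dir, name)
        if not (os.path.isfile(standby_path) and os.path.isfile(primary_path)):
            continue
        if file_digest(standby_path) != file_digest(primary_path):
            differing.append(name)
    return differing
```

The returned names would be the candidates to copy from the primary, rather than overwriting every segment from the recovery starting point onward.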

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should pg_standby get over the gap of timeline?

From
"Fujii Masao"
Date:
On Fri, Nov 21, 2008 at 12:15 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Thu, Nov 20, 2008 at 8:36 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>> That seems like a dangerous assumption. What if the standby had fallen
>> behind before the failover? It's not safe to failover back to the original
>> primary in that case. We'd need some kind of safeguards against that.
>>
>
> For synchronous replication, what if we ensure that the standby has received
> the WAL (atleast in its buffers) before writing it to disk on the primary ?
> If we do that, I think the old standby can never fall behind the primary and
> it would be easy for the old primary to join back the replication without a
> fresh backup.

In the current patch, since WAL is written and sent concurrently for
performance, we cannot tell whether the old standby has fallen
behind or not. I think we need a setup procedure that can handle both
cases.

> Of course, this doesn't work for async replication.

Yeah, in asynch replication, some committed transactions may disappear
regardless of whether a fresh backup is used or not. But since the
current patch guarantees the "Replicate Ahead Log" rule even in the asynch
case, we can recover the old primary consistently by using the WAL on the
old standby.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should pg_standby get over the gap of timeline?

From
Simon Riggs
Date:
On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote:

> In the current Synch Rep patch, the standby cannot catch up with the
> primary which has a bigger timeline. So, whenever making the standby
> catch up, a fresh base backup is required. This is obviously undesirable,
> and I'd like to get rid of this restriction.
> 
> Postgres itself can recover up to a bigger timeline without a base
> backup. The remaining problem is that pg_standby cannot get over the
> gap of timeline. It continues waiting for the XLOG file with out-of-date
> timeline, and redo doesn't progress.

We've discussed this before. My answer is the same: you are assuming it
is safe to re-enter recovery, which is not correct (currently). You are
also assuming that taking a base backup is an expensive operation - it
need not be so if you simply move only the files/data that have changed,
e.g. rsync.

So if you want this to work, hacking pg_standby is not the way to do it.
But I'm not convinced there is a problem worth solving.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: How should pg_standby get over the gap of timeline?

From
"Fujii Masao"
Date:
Hi, Simon. Thanks for the comment!!

On Sat, Nov 22, 2008 at 2:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote:
>
>> In the current Synch Rep patch, the standby cannot catch up with the
>> primary which has a bigger timeline. So, whenever making the standby
>> catch up, a fresh base backup is required. This is obviously undesirable,
>> and I'd like to get rid of this restriction.
>>
>> Postgres itself can recover up to a bigger timeline without a base
>> backup. The remaining problem is that pg_standby cannot get over the
>> gap of timeline. It continues waiting for the XLOG file with out-of-date
>> timeline, and redo doesn't progress.
>
> We've discussed this before. My answer is the same: you are assuming it
> is safe to re-enter recovery, which is not correct (currently).

I'm afraid you might be right. But I cannot understand yet why it's not
safe to re-enter recovery. Is it safe to re-enter recovery from the
restart point after PITR stopped halfway? If it's safe, ISTM that PITR
without a base backup also is safe. Please let me know what might
violate a re-entry of recovery. What is your worry?

> You are
> also assuming that taking a base backup is an expensive operation - it
> need not be so if you simply move only the files/data that have changed,
> e.g. rsync.

It depends on DB size and type. I think that it's important that the user
*can* choose the better method according to his situation.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should pg_standby get over the gap of timeline?

From
Simon Riggs
Date:
On Sat, 2008-11-22 at 03:39 +0900, Fujii Masao wrote:
> Hi, Simon. Thanks for the comment!!
> 
> On Sat, Nov 22, 2008 at 2:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > On Thu, 2008-11-20 at 22:41 +0900, Fujii Masao wrote:
> >
> >> In the current Synch Rep patch, the standby cannot catch up with the
> >> primary which has a bigger timeline. So, whenever making the standby
> >> catch up, a fresh base backup is required. This is obviously undesirable,
> >> and I'd like to get rid of this restriction.
> >>
> >> Postgres itself can recover up to a bigger timeline without a base
> >> backup. The remaining problem is that pg_standby cannot get over the
> >> gap of timeline. It continues waiting for the XLOG file with out-of-date
> >> timeline, and redo doesn't progress.
> >
> > We've discussed this before. My answer is the same: you are assuming it
> > is safe to re-enter recovery, which is not correct (currently).
> 
> I'm afraid you might be right. But I cannot understand yet why it's not
> safe to re-enter recovery. Is it safe to re-enter recovery from the
> restart point after PITR stopped halfway? If it's safe, ISTM that PITR
> without a base backup also is safe. Please let me know what might
> violate a re-entry of recovery. What is your worry?

My worry is that there has not been an exhaustive analysis. "Almost
correct" and "probably correct" is not the same thing as "correct". We
need to look through all of the changes that occur at the end of
recovery to be certain we can do this. Luckily normal data blocks don't
know anything about such state changes, so that is a good start. We must
look at

Timelines
control file
startupclog, startup multixact etc
autovacuum starting
relcache init file
flat files
archive status
pg_xlog
two phase commit
...
every single file type in Postgres...

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: How should pg_standby get over the gap of timeline?

From
"Fujii Masao"
Date:
On Sat, Nov 22, 2008 at 6:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> My worry is that there has not been an exhaustive analysis. "Almost
> correct" and "probably correct" is not the same thing as "correct". We
> need to look through all of the changes that occur at the end of
> recovery to be certain we can do this. Luckily normal data blocks don't
> know anything about such state changes, so that is a good start. We must
> look at

It's a reasonable worry. Thanks a lot, Simon. I will examine it next time
(probably 8.5).

And I'd like to clarify which recovery methods are safe now. Here is my
understanding; is it right?

Safe (proved to be safe):
- PITR with a base backup. That is, we don't always need a fresh backup
  when setting up, and can make the standby catch up by using an old or
  fresh backup. If we can use an old backup, I think it might be worth
  changing pg_standby to get over the gap of timeline. What is your opinion?

- PITR with a database cluster including a recovery restart point. That is,
  we can make the standby catch up without a base backup after it fails.

Not safe (further examination is needed):
- PITR with a database cluster not including a recovery restart point. That
  is, we cannot make the standby (old primary) catch up without a base backup.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center