Re: recovery_target_time and standby_mode - Mailing list pgsql-hackers

From Robert Haas
Subject Re: recovery_target_time and standby_mode
Date
Msg-id CA+TgmoYwXGX5_6Li7oQJbkQEYTbMeOD7aR3PTNLSZq5aYe8aUg@mail.gmail.com
Whole thread Raw
In response to Re: recovery_target_time and standby_mode  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On Fri, Nov 7, 2014 at 4:00 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 11/07/2014 12:02 PM, Robert Haas wrote:
>> Several people then suggested that you could accomplish your
>> originally stated goal - namely "restore a master *and replica* to a
>> point in time before Bad Stuff happened, and then have a working
>> master-replica pair" - by just connecting the new standby to the
>> master directly, without using recovery_target_time.  As long as
>> primary_conninfo and restore_command are both set, the standby should
>> be able to fetch older segments from the archive and then seamlessly
>> switch to fetching new segments from the new master.  If you tried
>> that and it didn't work, I don't see a description of the outcome
>> anywhere on this thread.
>
> Aha!
>
> I went ahead and tested that, which I did not ever expect to work since
> our documentation says it won't. The problem is that the replica
> continues to catch up from the archive *past* the point where the master
> is caught up.  It doesn't switch to the streaming replication connection
> until it runs out of archives (which is as designed, for a variety of
> good reasons).
>
> And since the replica is now beyond the master's timeline, replication
> fails with "end of wal reached on timeline 1 320/478ff780; new timeline
> 2 forked timeline 1 before current recovery point 320/47ffffe0".
>
> In order for this to work, the archive would need to stop before
> recovery_target_time.

Yeah, good point.  I didn't think of the case where you've rewound the
master but not the archive.  That will indeed require some special
handling, but it also seems like a somewhat unusual setup, because if
the master is trying to archive back to that same archive, archiving
will fail, with all the usual problems that entails.  Or maybe the
master is archiving there but on a different timeline, but in that
case why can the standby follow the timeline switch when connecting
directly to the master, but not via the archive?  My brain hurts.

>>> Is the current interaction of recovery_target_time and standby_mode
>>> (that is, that recovery_target_time causes standby_mode to be ignorned)
>>> the correct behavior?
>>
>> I think this summary of the behavior is probably not correct in
>> detail.  For example, if the recovery target isn't reached by the time
>> the standby reaches the end of archived WAL, I think standby_mode will
>> affect what happens next.
>
> Oh, yeah, good point.  So just a doc patch then.

Sounds like a plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: B-Tree index builds, CLUSTER, and sortsupport
Next
From: Robert Haas
Date:
Subject: Re: split builtins.h to quote.h