Re: Switching timeline over streaming replication - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Switching timeline over streaming replication
Date
Msg-id 50615879.2090608@vmware.com
In response to Re: Switching timeline over streaming replication  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Switching timeline over streaming replication  (Amit Kapila <amit.kapila@huawei.com>)
Re: Switching timeline over streaming replication  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 24.09.2012 16:33, Amit Kapila wrote:
> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>> I've been working on the often-requested feature to handle timeline
>> changes over streaming replication. At the moment, if you kill the
>> master and promote a standby server, and you have another standby
>> server that you'd like to keep following the new master server, you
>> need a WAL archive in addition to streaming replication to make it
>> cross the timeline change. Streaming replication will just error out.
>> Having a WAL archive is usually a good idea in complex replication
>> scenarios anyway, but it would be good to not require it.
>
> Please confirm my understanding of this feature:
>
> This feature is for the case where standby-1, which is going to be
> promoted to master, has archive_mode=on.

No. This is for the case where there is no WAL archive, i.e.
archive_mode='off' on all servers.

Or to be precise, you can also have a WAL archive, but this patch 
doesn't affect that in any way. This is strictly about streaming 
replication.

> Only in that case will its timeline change.

The timeline changes whenever you promote a standby. It's not related to 
whether you have a WAL archive or not.

> If the above is right, then there are other similar scenarios where it
> can be used:
>
> Scenario-1 (1 Master, 1 Stand-by)
> 1. Master (archive_mode=on) goes down.
> 2. Master again comes up
> 3. Stand-by tries to follow it
>
> In the above scenario, too, the timeline mismatch causes an error, but
> your patch should fix it.

If the master simply crashes or is shut down, and then restarted, the 
timeline doesn't change. The standby will reconnect / poll the archive, 
and sync up just fine, even without this patch.

> However, I am not sure about splitting only RestoreArchivedFile() and
> ExecuteRecoveryCommand() into a separate file.
> How about splitting out all the archive-related functions:
> static void XLogArchiveNotify(const char *xlog);
> static void XLogArchiveNotifySeg(XLogSegNo segno);
> static bool XLogArchiveCheckDone(const char *xlog);
> static bool XLogArchiveIsBusy(const char *xlog);
> static void XLogArchiveCleanup(const char *xlog);

Hmm, sounds reasonable.

> In any case, it would be better if you could split it into multiple
> patches:
> 1. The new functionality of "Switching timeline over streaming
> replication"
> 2. The refactoring-related changes.
>
> That would make testing and reviewing the new feature patch a little
> easier for me.

Yep, I'll go ahead and split the patch. Thanks!

- Heikki


