Re: Switching timeline over streaming replication - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Switching timeline over streaming replication
Msg-id 506195B2.4030600@vmware.com
In response to Re: Switching timeline over streaming replication  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On 25.09.2012 14:10, Amit Kapila wrote:
> On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:
>> On 24.09.2012 16:33, Amit Kapila wrote:
>>> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>>>> I've been working on the often-requested feature to handle timeline
>>>> changes over streaming replication. At the moment, if you kill the
>>>> master and promote a standby server, and you have another standby
>>>> server that you'd like to keep following the new master server, you
>>>> need a WAL archive in addition to streaming replication to make it
>>>> cross the timeline change. Streaming replication will just error out.
>>>> Having a WAL archive is usually a good idea in complex replication
>>>> scenarios anyway, but it would be good to not require it.
>>>
>>> Please confirm my understanding of this feature:
>>>
>>> This feature is for the case when standby-1, which is going to be
>>> promoted to master, has archive mode 'on'.
>>
>> No. This is for the case where there is no WAL archive.
>> archive_mode='off' on all servers.
>>
>> Or to be precise, you can also have a WAL archive, but this patch
>> doesn't affect that in any way. This is strictly about streaming
>> replication.
>>
>>> Because only in that case does its timeline change.
>>
>> The timeline changes whenever you promote a standby. It's not related
>> to whether you have a WAL archive or not.
>
> Yes, that is correct. I thought a timeline change happens only when
> somebody does PITR.
> Can you please tell me why we change the timeline after promotion? The
> original timeline concept was for PITR, and I am not able to trace from
> the code why it is required on promotion.

Bumping the timeline helps to avoid confusion if, for example, the 
master crashes, and the standby isn't fully in sync with it. In that 
situation, there are some WAL records in the master that are not in the 
standby, so promoting the standby is effectively the same as doing PITR. 
If you promote the standby, and later try to turn the old master into a 
standby server that connects to the new master, things will go wrong. 
Assigning the new master a new timeline ID helps the system and the 
administrator to notice that.
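To make the reasoning concrete, here is a toy Python model of what the timeline bump records and why it lets the system notice a diverged old master. It is not PostgreSQL's code: `HistoryEntry`, `promote` and `can_rejoin_as_standby` are hypothetical names, LSNs are plain integers, and the new timeline ID is assumed to be one more than the highest ID already known. The point is that promotion remembers where the new timeline forked off the old one, so an old master whose WAL extends past that fork point can be recognized as out of sync.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryEntry:
    """Hypothetical record of where a timeline branched off its parent."""
    parent_tli: int   # timeline the promoted standby was following
    switchpoint: int  # LSN (plain int here) where the new timeline begins


def promote(history: dict[int, HistoryEntry], current_tli: int, replay_end: int) -> int:
    """Toy promotion: pick an unused timeline ID and record the fork point.

    Assumption: the new ID is one more than any timeline ID seen so far.
    """
    new_tli = max([current_tli, *history]) + 1
    history[new_tli] = HistoryEntry(parent_tli=current_tli, switchpoint=replay_end)
    return new_tli


def can_rejoin_as_standby(history: dict[int, HistoryEntry], new_tli: int,
                          old_tli: int, old_flush_lsn: int) -> bool:
    """An old master can stream from the new master only if it did not write
    WAL on the old timeline beyond the point where the new timeline forked."""
    entry = history[new_tli]
    return entry.parent_tli == old_tli and old_flush_lsn <= entry.switchpoint
```

For example, if the standby had replayed up to `0x3000000` when it was promoted but the crashed master had already flushed WAL up to `0x3000200`, `can_rejoin_as_standby(history, 2, 1, 0x3000200)` returns `False`: those extra records exist only on the old master, which is exactly the situation the new timeline ID helps to flag.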

It's not bulletproof; for example, you can easily avoid the timeline 
change by just removing recovery.conf and restarting the server. But the 
timelines help to manage such situations.

>>> If the above is right, then there are other similar scenarios where
>>> it can be used:
>>>
>>> Scenario-1 (1 Master, 1 Stand-by)
>>> 1. Master (archive_mode=on) goes down.
>>> 2. Master comes up again.
>>> 3. Stand-by tries to follow it
>>>
>>> In the above scenario, too, it gives an error due to a timeline
>>> mismatch, but your patch should fix it.
>>
>> If the master simply crashes or is shut down, and then restarted, the
>> timeline doesn't change. The standby will reconnect / poll the archive,
>> and sync up just fine, even without this patch.
>
> What about when the master does PITR when it comes back up?

Then the timeline will be bumped and this patch will be helpful. 
Assuming the standby is behind the point in time that the master was 
recovered to, it will be able to follow the master to the new timeline.
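The condition in the paragraph above can be sketched as a walk over a timeline history. This is a hypothetical toy model, not the patch itself: `Fork` and `can_follow` are illustrative names, every timeline except the first is assumed to record its parent and the LSN at which it branched, and LSNs are plain integers. A standby can keep streaming onto the target timeline only if its own timeline is an ancestor of (or equal to) the target and it has not replayed past the point where the target's history left that timeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fork:
    """Hypothetical history entry: where a child timeline left its parent."""
    parent_tli: int
    switchpoint: int  # last LSN of the parent that the child timeline shares


def can_follow(forks: dict[int, Fork], standby_tli: int, standby_lsn: int,
               target_tli: int) -> bool:
    """True if a standby at (standby_tli, standby_lsn) lies on target_tli's history."""
    tli, limit = target_tli, None  # None: no upper bound while on the target itself
    # Walk from the target timeline back towards its ancestors.
    while tli != standby_tli:
        fork = forks.get(tli)
        if fork is None:
            return False  # standby's timeline is not an ancestor of the target
        # Only the parent's WAL up to the switchpoint belongs to the target's history.
        tli, limit = fork.parent_tli, fork.switchpoint
    return limit is None or standby_lsn <= limit
```

With timeline 2 forking from timeline 1 at LSN 300 and timeline 3 forking from timeline 2 at LSN 500, a standby on timeline 1 at LSN 250 can follow timeline 3, but one at LSN 350 cannot, because WAL on timeline 1 beyond 300 is not part of timeline 3's history. That second case corresponds to a standby that is ahead of the point the master was recovered to.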

- Heikki


