Re: Switching timeline over streaming replication - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Switching timeline over streaming replication |
Date | |
Msg-id | 506195B2.4030600@vmware.com |
In response to | Re: Switching timeline over streaming replication (Amit Kapila <amit.kapila@huawei.com>) |
List | pgsql-hackers |
On 25.09.2012 14:10, Amit Kapila wrote:
> On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:
>> On 24.09.2012 16:33, Amit Kapila wrote:
>>> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>>>> I've been working on the often-requested feature to handle timeline
>>>> changes over streaming replication. At the moment, if you kill the
>>>> master and promote a standby server, and you have another standby
>>>> server that you'd like to keep following the new master server, you
>>>> need a WAL archive in addition to streaming replication to make it
>>>> cross the timeline change. Streaming replication will just error out.
>>>> Having a WAL archive is usually a good idea in complex replication
>>>> scenarios anyway, but it would be good to not require it.
>>>
>>> Confirm my understanding of this feature:
>>>
>>> This feature is for case when standby-1 who is going to be promoted to
>>> master has archive mode 'on'.
>>
>> No. This is for the case where there is no WAL archive.
>> archive_mode='off' on all servers.
>>
>> Or to be precise, you can also have a WAL archive, but this patch
>> doesn't affect that in any way. This is strictly about streaming
>> replication.
>>
>>> As in that case only its timeline will change.
>>
>> The timeline changes whenever you promote a standby. It's not related to
>> whether you have a WAL archive or not.
>
> Yes that is correct. I thought timeline change happens only when somebody
> does PITR.
> Can you please tell me why we change timeline after promotion, because the original
> Timeline concept was for PITR and I am not able to trace from code the reason
> why on promotion it is required?

Bumping the timeline helps to avoid confusion if, for example, the master crashes, and the standby isn't fully in sync with it. In that situation, there are some WAL records in the master that are not in the standby, so promoting the standby is effectively the same as doing PITR.
If you promote the standby, and later try to turn the old master into a standby server that connects to the new master, things will go wrong. Assigning the new master a new timeline ID helps the system and the administrator to notice that. It's not bulletproof, for example you can easily avoid the timeline change if you just remove recovery.conf and restart the server, but the timelines help to manage such situations.

>>> If above is right, then there can be other similar scenario's where it can
>>> be used:
>>>
>>> Scenario-1 (1 Master, 1 Stand-by)
>>> 1. Master (archive_mode=on) goes down.
>>> 2. Master again comes up
>>> 3. Stand-by tries to follow it
>>>
>>> Now in above scenario also due to timeline mismatch it gives error, but your
>>> patch should fix it.
>>
>> If the master simply crashes or is shut down, and then restarted, the
>> timeline doesn't change. The standby will reconnect / poll the archive,
>> and sync up just fine, even without this patch.
>
> How about when Master does PITR when it comes again?

Then the timeline will be bumped and this patch will be helpful. Assuming the standby is behind the point in time that the master was recovered to, it will be able to follow the master to the new timeline.

- Heikki
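
Editorial illustration, not part of the original message: the condition in the last paragraph (a standby can follow the master onto a new timeline only if it is behind the point where that timeline branched off) can be sketched roughly as below. This is a toy model under stated assumptions, not PostgreSQL's actual code. The names `Timeline`, `can_follow`, `parent`, and `fork_pos` are hypothetical, and WAL positions are simplified to plain integers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Timeline:
    """Hypothetical record of one timeline and where it branched off."""
    tli: int
    parent: Optional[int]    # timeline this one branched from (None for the first)
    fork_pos: Optional[int]  # WAL position on the parent where the branch happened


def can_follow(history: Dict[int, Timeline],
               standby_tli: int, standby_pos: int, target_tli: int) -> bool:
    """Return True if a standby at (standby_tli, standby_pos) could keep
    replaying WAL and end up on target_tli, in this simplified model."""
    tli = target_tli
    # Walk from the target timeline back through its ancestors.
    while tli != standby_tli:
        entry = history[tli]
        if entry.parent is None:
            # Reached the first timeline without meeting the standby's
            # timeline, so the standby is not on an ancestor of the target.
            return False
        if entry.parent == standby_tli:
            # The standby must not have replayed past the branch point.
            return standby_pos <= entry.fork_pos
        tli = entry.parent
    return True  # already on the target timeline


# Timeline 2 branches from timeline 1 at position 1000 (promotion or PITR).
history = {
    1: Timeline(tli=1, parent=None, fork_pos=None),
    2: Timeline(tli=2, parent=1, fork_pos=1000),
}

behind = can_follow(history, standby_tli=1, standby_pos=900, target_tli=2)
ahead = can_follow(history, standby_tli=1, standby_pos=1100, target_tli=2)
```

In this model `behind` corresponds to a standby that has not yet reached the branch point, and `ahead` to the old master from the first answer, which holds WAL records that never made it onto the new timeline.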