Thread: BUG #14109: pg_rewind fails to update target control file in one scenario
BUG #14109: pg_rewind fails to update target control file in one scenario
From
johnlumby@hotmail.com
Date:
The following bug has been logged on the website:

Bug reference:      14109
Logged by:          John Lumby
Email address:      johnlumby@hotmail.com
PostgreSQL version: 9.5.1
Operating system:   linux 64-bit
Description:

scenario: two systems currently in an operating streaming replication relationship:
    Primary  systemA
    Standby  SystemB
with no WAL queued and no inserts/updates/deletes now being performed on systemA

then in chronological sequence:
 . shut down SystemA
 . pg_ctl promote SystemB and verify systemB is running correctly stand-alone
 . pg_rewind SystemA
   output is something like
       connected to server
       fetched file "global/pg_control", length 8192
       fetched file "pg_xlog/0000000D.history", length 388
       servers diverged at WAL position 9/A90002A8 on timeline 12
       no rewind required
 . set up correct recovery.conf on SystemA
 . start SystemA postgres server

At this point, both systemB and systemA appear to be running correctly,
but any insert/update/delete now performed on systemB is not replicated to systemA.
Also pg_stat_replication view on systemB shows state 'startup', not 'streaming'.

I believe there is a bug in pg_rewind for this scenario, where it finds that
the following conditions are true:
 1 - source and target cluster are not on the same timeline
 2 - the histories diverged exactly at the end of the shutdown checkpoint
     record on the target, so there are no WAL records in the target that
     don't belong in the source's history

The code then concludes that no rewind is needed. Which is true --
However, what I believe *is* needed is to update the target control file
with the new timeline and other information from the source.

This patch seems to fix the problem on my system:

--- src/bin/pg_rewind/pg_rewind.c.orig  2016-02-08 16:12:28.000000000 -0500
+++ src/bin/pg_rewind/pg_rewind.c       2016-04-24 14:50:52.646737233 -0400
@@ -247,7 +247,14 @@ main(int argc, char **argv)
          * needed.
          */
         if (chkptendrec == divergerec)
+        {
             rewind_needed = false;
+            /* however we must still copy the control file from source to target
+             * because of the timeline change.
+             */
+            printf(_("no rewind required but will update global control file from source for increase in timeline.\n"));
+            goto updateControlFile;
+        }
         else
             rewind_needed = true;
     }
@@ -318,6 +325,7 @@ main(int argc, char **argv)
     pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
     createBackupLabel(chkptredo, chkpttli, chkptrec);
 
+    updateControlFile:
     /*
      * Update control file of target. Make it ready to perform archive
      * recovery when restarting.
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Mon, Apr 25, 2016 at 4:25 AM, <johnlumby@hotmail.com> wrote:
> However, what I believe *is* needed is to update the target control file
> with the new timeline and other information from the source.

No, this is incorrect. There is no need to update the control file of
a node that has not been rewound, and pg_rewind should not mess with
it if there is no divergence point between the target and the source
nodes; otherwise it would update the minimum recovery point of a node
without any real need to do so. The node should be able to rejoin the
cluster depending on its initial shutdown state (when you shut down
systemA). What are the logs of your system A telling you regarding its
startup state?
--
Michael
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
John Lumby
Date:
Thanks Michael,

After the pg_rewind in the scenario I described,

1) on System B (new Primary) I see

Sat Apr 23 14:19:18 EDT 2016

control file indicates
last check point WAL id : 0000000C00000009000000A3

 client_addr |         backend_start         |  state  | sent_location | write_location | flush_location | replay_location
-------------+-------------------------------+---------+---------------+----------------+----------------+-----------------
 10.19.0.1   | 2016-04-23 18:19:50.812509+00 | startup | 9/A30000D0    | 9/A30000D0     | 9/A30000D0     | 9/A30000D0


2) whereas on System A after pg_rewind I see

Sat Apr 23 14:19:54 EDT 2016

control file indicates

last check point WAL id : 0000000B00000009000000A3

pg_last_xlog_receive_location() , pg_last_xlog_replay_location() indicates

 pg_last_xlog_receive_location | pg_last_xlog_replay_location
-------------------------------+------------------------------
 9/A3000000                    | 9/A30000D0
(1 row)

Note the difference in timeline

and then, as I described, no WAL is replicated from B to A.

Did you try this scenario yourself? I hope you agree it is a bug?
I will defer to you on what part of the code is the true cause,
but to me it looks very much as though pg_rewind ought to update the control file in this scenario.
That certainly does fix it.
If not that, then what?

Cheers, John

----------------------------------------
> Date: Mon, 25 Apr 2016 16:23:58 +0900
> Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
> From: michael.paquier@gmail.com
> To: johnlumby@hotmail.com
> CC: pgsql-bugs@postgresql.org
>
> On Mon, Apr 25, 2016 at 4:25 AM, <johnlumby@hotmail.com> wrote:
>> However, what I believe *is* needed is to update the target control file
>> with the new timeline and other information from the source.
>
> No, this is incorrect. There is no need to update the control file of
> a node that has not been rewound, and pg_rewind should not mess up
> with that if there is no divergence point between the target and the
> source nodes or it would update the minimum recovery point of a node
> without real need to do so. It should be able to join back the cluster
> depending on its initial shutdown state (when you shut down systemA).
> What are the logs of your system A telling you regarding its startup
> state?
> --
> Michael
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:
> Thanks Michael,
>
> After the pg_rewind in the scenario I described,
>
> 1) on System B (new Primary) I see
>
> Sat Apr 23 14:19:18 EDT 2016
>
> control file indicates
> last check point WAL id : 0000000C00000009000000A3
>
>  client_addr |         backend_start         |  state  | sent_location | write_location | flush_location | replay_location
> -------------+-------------------------------+---------+---------------+----------------+----------------+-----------------
>  10.19.0.1   | 2016-04-23 18:19:50.812509+00 | startup | 9/A30000D0    | 9/A30000D0     | 9/A30000D0     | 9/A30000D0
>
>
> 2) whereas on System A after pg_rewind I see

(pg_rewind is a no-op here). It has done nothing to the source node.
When you ran it, it was clearly mentioned that "no rewind is needed".

> Sat Apr 23 14:19:54 EDT 2016
>
> control file indicates
> last check point WAL id : 0000000B00000009000000A3
>
> pg_last_xlog_receive_location() , pg_last_xlog_replay_location() indicates
>
>  pg_last_xlog_receive_location | pg_last_xlog_replay_location
> -------------------------------+------------------------------
>  9/A3000000                    | 9/A30000D0
> (1 row)
>
> Note the difference in timeline

Yes, and? System A is still on its previous timeline 11, and will jump
to timeline 12 once it has connected back. That's possible since 9.3.

> and then, as I described, no WAL is replicated from B to A.
> Did you try this scenario yourself?

Yes.

> I hope you agree it is a bug?

No. In this case pg_rewind is a no-op: system A was shut down *before*
B was promoted, so it knows about the shutdown checkpoint record of
system A. No rewind would be needed here. One potential issue with
repetitive rewinds in such configurations is that after promotion of
system B its control file is not yet up to date with the new timeline,
so when pg_rewind runs it fetches the control file of the source node,
which still has the outdated timeline information. You may want to
issue a checkpoint on the source node after its promotion to ensure
that its control file is in correct shape and pointing to the latest
timeline.

Again there is no bug here.
--
Michael
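The timeline difference discussed above can be read directly from the two "last check point WAL id" values: a WAL segment file name is 24 hex digits, 8 for the timeline ID, 8 for the log number, and 8 for the segment number. The following is only a minimal Python sketch of that decoding; `WalSegment` and `parse_wal_segment_name` are illustrative names, not part of PostgreSQL.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int  # timeline ID (first 8 hex digits)
    log: int       # log number, the high part of the WAL position (middle 8 hex digits)
    seg: int       # segment number within that log (last 8 hex digits)


def parse_wal_segment_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {len(name)}: {name!r}")
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        seg=int(name[16:24], 16),
    )


if __name__ == "__main__":
    # The two values quoted in this thread.
    for label, name in [
        ("System B (new primary)", "0000000C00000009000000A3"),
        ("System A (old primary)", "0000000B00000009000000A3"),
    ]:
        s = parse_wal_segment_name(name)
        print(f"{label}: timeline {s.timeline}, log {s.log:X}, segment {s.seg:X}")
```

Both names point at the same log and segment (9 and A3); only the timeline field differs, 0xC (12) on System B versus 0xB (11) on System A.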
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
John Lumby
Date:
----------------------------------------
> Date: Mon, 25 Apr 2016 23:03:23 +0900
> Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
> From: michael.paquier@gmail.com
> To: johnlumby@hotmail.com
> CC: pgsql-bugs@postgresql.org
>
> On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:
>>
>> Note the difference in timeline
>
> Yes, and? System A is still on its previous timeline 11, and will jump
> to timeline 12 once it has connected back. That's possible since 9.3.

Answering your "and?" : In my testing,
    "and no WAL replicated from B, new Primary, to A, new Standby".
That is why I concluded there is a bug.
Also on System A (new Standby) timeline did not jump up to 12, it remained at 11.
Are you indicating that the admin (me) needs to do something to make that happen?
If so what?

>
>> and then, as I described, no WAL is replicated from B to A.
>> Did you try this scenario yourself?
>
> Yes.

And did you perform updates on System B (new Primary)
and then observe them replicated to System A?
I consistently see that *not* happening.

>
>> I hope you agree it is a bug?
>
> No. In this case pg_rewind is a no-op: system A was shut down *before*
> B was promoted, so it knows about the shutdown checkpoint record of
> system A. No rewind would be needed here. One potential issue with

Yes, no rewind is needed, I agree. But, as you stated earlier,
the timeline on System A needs to be incremented.
So the question is, what is supposed to make that happen?

>
> Again there is no bug here.
> --
> Michael
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Julien Rouhaud
Date:
On 25/04/2016 16:36, John Lumby wrote:
>> From: michael.paquier@gmail.com
>> To: johnlumby@hotmail.com
>> CC: pgsql-bugs@postgresql.org
>>
>> On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:
>>>
>>> Note the difference in timeline
>>
>> Yes, and? System A is still on its previous timeline 11, and will jump
>> to timeline 12 once it has connected back. That's possible since 9.3.
>
> Answering your "and?" : In my testing,
> "and no WAL replicated from B, new Primary , to A , new Standby".
> That is why I concluded there is a bug.
> Also on System A (new Standby) timeline did not jump up to 12, it remained at 11.
> Are you indicating that the admin (me) needs to do something to make that happen?
> If so what?
>

Did you set the recovery_target_timeline parameter to "latest" in the
recovery.conf file? (see
http://www.postgresql.org/docs/current/static/recovery-target-settings.html).

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
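To illustrate the setting suggested here, the following is a hedged Python sketch that renders a 9.5-style recovery.conf for the old primary rejoining as a standby. The helper `render_recovery_conf`, the connection string, and the slot name are hypothetical; only the parameter names (standby_mode, primary_conninfo, recovery_target_timeline, primary_slot_name) are PostgreSQL's own.

```python
def render_recovery_conf(primary_conninfo: str, slot_name: str = "") -> str:
    """Return recovery.conf text for a streaming standby (9.5-era layout)."""

    def quote(value: str) -> str:
        # recovery.conf values are single-quoted; an embedded quote is doubled.
        return "'" + value.replace("'", "''") + "'"

    lines = [
        "standby_mode = " + quote("on"),
        "primary_conninfo = " + quote(primary_conninfo),
        # Follow the promoted server onto its new timeline instead of
        # staying on the timeline the standby was on when it started.
        "recovery_target_timeline = " + quote("latest"),
    ]
    if slot_name:
        # Optional: stream through a replication slot on the new primary.
        lines.append("primary_slot_name = " + quote(slot_name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host and role names, for illustration only.
    print(render_recovery_conf("host=systemB port=5432 user=replicator"), end="")
```

The generated text would be saved as recovery.conf in System A's data directory before starting it; without the recovery_target_timeline line, the standby stays on the timeline it was on when recovery began.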
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
John Lumby
Date:
----------------------------------------
> Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
> To: johnlumby@hotmail.com; michael.paquier@gmail.com
> CC: pgsql-bugs@postgresql.org
> From: julien.rouhaud@dalibo.com
> Date: Mon, 25 Apr 2016 16:53:19 +0200
>
> On 25/04/2016 16:36, John Lumby wrote:
>>> From: michael.paquier@gmail.com
>>> To: johnlumby@hotmail.com
>>> CC: pgsql-bugs@postgresql.org
>>>
>>> On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:
>>>>
>>>> Note the difference in timeline
> Did you set the recovery_target_timeline parameter to "latest" in the
> recovery.conf file? (see
> http://www.postgresql.org/docs/current/static/recovery-target-settings.html).

Thanks Julien, no I did not, I will re-test with that later, on 9.5.2
Meanwhile one question on that. The documentation states
    "Setting this to latest recovers
    to the latest timeline found in the archive, which ..."
However I am running my systems with
    archive_mode = off
I am wondering whether
    recovery_target_timeline parameter = "latest"
is expected to work reliably to fix my particular problem when archiving is not in effect.

>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Guillaume Lelarge
Date:
2016-04-25 17:45 GMT+02:00 John Lumby <johnlumby@hotmail.com>:
> ----------------------------------------
> > Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
> > To: johnlumby@hotmail.com; michael.paquier@gmail.com
> > CC: pgsql-bugs@postgresql.org
> > From: julien.rouhaud@dalibo.com
> > Date: Mon, 25 Apr 2016 16:53:19 +0200
> >
> > On 25/04/2016 16:36, John Lumby wrote:
> >>> From: michael.paquier@gmail.com
> >>> To: johnlumby@hotmail.com
> >>> CC: pgsql-bugs@postgresql.org
> >>>
> >>> On Mon, Apr 25, 2016 at 10:48 PM, John Lumby <johnlumby@hotmail.com> wrote:
> >>>>
> >>>> Note the difference in timeline
> > Did you set the recovery_target_timeline parameter to "latest" in the
> > recovery.conf file? (see
> > http://www.postgresql.org/docs/current/static/recovery-target-settings.html).
>
> Thanks Julien, no I did not, I will re-test with that later, on 9.5.2
> Meanwhile one question on that. The documentation states
> "Setting this to latest recovers
> to the latest timeline found in the archive, which ..."
> However I am running my systems with
> archive_mode = off
> I am wondering whether
> recovery_target_timeline parameter = "latest"
> is expected to work reliably to fix my particular problem when archiving
> is not in effect.
>

It also works with the streaming replication since at least 9.3. So, yes, it should work.

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Julien Rouhaud
Date:
On 25/04/2016 17:45, John Lumby wrote:
>> From: julien.rouhaud@dalibo.com
>>
>> Did you set the recovery_target_timeline parameter to "latest" in the
>> recovery.conf file? (see
>> http://www.postgresql.org/docs/current/static/recovery-target-settings.html).
>
> Thanks Julien, no I did not, I will re-test with that later, on 9.5.2
> Meanwhile one question on that. The documentation states
> "Setting this to latest recovers
> to the latest timeline found in the archive, which ..."
> However I am running my systems with
> archive_mode = off
> I am wondering whether
> recovery_target_timeline parameter = "latest"
> is expected to work reliably to fix my particular problem when archiving is not in effect.
>

I'm not sure where it's documented, but the timeline change works with
log shipping, and since 9.3 with streaming replication.

That's why Michael previously said: "System A is still on its previous
timeline 11, and will jump to timeline 12 once it has connected back.
That's possible since 9.3."

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
John Lumby
Date:
I've tested the same scenario but with the setting

recovery_target_timeline = 'latest'

and on 9.5.2,

and now the new Standby receives new WAL correctly after the pg_rewind and restart

So, assuming this is reliable (will work without requiring archiving)
then my problem is solved.

Thanks for everyone's help, please close the bug as user error

Cheers, John

----------------------------------------
> Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
> To: johnlumby@hotmail.com; michael.paquier@gmail.com
> CC: pgsql-bugs@postgresql.org
> From: julien.rouhaud@dalibo.com
> Date: Mon, 25 Apr 2016 19:06:57 +0200
>
> On 25/04/2016 17:45, John Lumby wrote:
>>> From: julien.rouhaud@dalibo.com
>>>
>>> Did you set the recovery_target_timeline parameter to "latest" in the
>>> recovery.conf file? (see
>>> http://www.postgresql.org/docs/current/static/recovery-target-settings.html).
>>
>> Thanks Julien, no I did not, I will re-test with that later, on 9.5.2
>> Meanwhile one question on that. The documentation states
>> "Setting this to latest recovers
>> to the latest timeline found in the archive, which ..."
>> However I am running my systems with
>> archive_mode = off
>> I am wondering whether
>> recovery_target_timeline parameter = "latest"
>> is expected to work reliably to fix my particular problem when archiving is not in effect.
>>
>
> I'm not sure where it's documented, but the timeline change works with
> log shipping, and since 9.3 with streaming replication.
>
> That's why Michael previously said: "System A is still on its previous
> timeline 11, and will jump
> to timeline 12 once it has connected back. That's possible since 9.3."
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Tue, Apr 26, 2016 at 7:15 AM, John Lumby <johnlumby@hotmail.com> wrote:
> So, assuming this is reliable (will work without requiring archiving)
> then my problem is solved.

Depending on the checkpoint frequency and the activity on your
systems, you may face problems with missing WAL segments at some point
because past WAL segments need to be recycled or removed by the server
to move on with its life. One way to take care of this class of
problems is to use wal_keep_segments. An even better one is a
replication slot. This depends solely on how your system is working,
so perhaps you will not need any extra configuration.
--
Michael
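As a rough illustration of the wal_keep_segments trade-off mentioned above, this Python sketch converts two LSNs of the form seen in this thread into byte positions and checks whether the gap fits inside the WAL that a given wal_keep_segments value keeps at minimum. It assumes the default 16 MB segment size and ignores segment-boundary effects, so treat it as an estimate only; the function names are illustrative, and the two LSNs are sample values borrowed from different runs quoted in this thread.

```python
# Assumption: default WAL segment size of 16 MB (server not built with another size).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '9/A30000D0' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def covered_by_keep_segments(primary_lsn: str, standby_lsn: str,
                             wal_keep_segments: int) -> bool:
    """Rough check: does the standby's lag fit within the retained WAL?"""
    lag_bytes = parse_lsn(primary_lsn) - parse_lsn(standby_lsn)
    return lag_bytes <= wal_keep_segments * WAL_SEGMENT_BYTES


if __name__ == "__main__":
    # Sample positions taken from this thread, used purely as an example.
    lag = parse_lsn("9/A90002A8") - parse_lsn("9/A30000D0")
    print(f"lag: {lag} bytes, about {lag / WAL_SEGMENT_BYTES:.1f} segments")
    for keep in (4, 8):
        ok = covered_by_keep_segments("9/A90002A8", "9/A30000D0", keep)
        print(f"wal_keep_segments = {keep}: covered = {ok}")
```

A replication slot avoids having to guess such a number, because the server retains WAL until the slot's consumer has received it.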
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
John Lumby
Date:
Thanks Michael,

----------------------------------------
> Date: Tue, 26 Apr 2016 08:04:58 +0900
> Subject: Re: [BUGS] BUG #14109: pg_rewind fails to update target control file in one scenario
> From: michael.paquier@gmail.com
> To: johnlumby@hotmail.com
> CC: julien.rouhaud@dalibo.com; pgsql-bugs@postgresql.org
>
> On Tue, Apr 26, 2016 at 7:15 AM, John Lumby <johnlumby@hotmail.com> wrote:
>> So, assuming this is reliable (will work without requiring archiving)
>> then my problem is solved.
>
> Depending on the checkpoint frequency and the activity on your
> systems, you may face problems with missing WAL segments at some point
> because past WAL segments need to be recycled or removed by the server
> to move on with its life.

Yes, I fear I could be caught out by that --
in fact that is why I now always "stabilize" the replication by halting ins/upd/del activity
and then shut the current Primary down first before promoting current Standby.
I *think* that then should guarantee there cannot be any missing WAL segments
when I then rewind the old Primary to become new Standby.

> One way to take care of this class of
> problems is to use wal_keep_segments. An even better one is called
> replication slot.

Regarding replication slots -- Actually I do use them (I think it is unsafe to run
streaming replication without either archiving or a replication slot)
but even that would still not guarantee success
if I did not take the precaution of shutting down current primary first before flip.

And .. we discussed this very point in pgsql-general just a month ago --

http://www.postgresql.org/message-id/COL131-W804D45E77B0D0FB1EF08B1A3890@phx.gbl

I did not get any answer to my suggestion in that post but I think it might be useful.

> This solely depends on how your system is working,
> so perhaps you will not need some extra configuration.
> --
> Michael

I think there need to be some clear instructions on exactly what configuration is needed
to be able to run streaming replication and always be able to flip
Standby -> Primary, <some actions>, Primary -> Standby

and in those posts in pgsql-general I wrote a suggested addition to the wiki page
but was unable to edit it myself.

Cheers, John
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Tue, Apr 26, 2016 at 10:37 PM, John Lumby <johnlumby@hotmail.com> wrote:
> I wrote:
>> One way to take care of this class of
>> problems is to use wal_keep_segments. An even better one is called
>> replication slot.
>
> Regarding replication slots -- Actually I do use them (I think it is unsafe to run
> streaming replication without either archiving or a replication slot)
> but even that would still not guarantee success
> if I did not take the precaution of shutting down current primary first before flip.
>
> And .. we discussed this very point in pgsql-general just a month ago --
>
> http://www.postgresql.org/message-id/COL131-W804D45E77B0D0FB1EF08B1A3890@phx.gbl

My memory is so short-lived lately... I did not recall that :)

> I did not get any answer to my suggestion in that post but I think it might be useful.

Replication slots are perfectly able to retain WAL segments from a
prior timeline, so I am not sure that this would be much of a gain.
And as they can be used on standbys as well, you could create/drop
slots on the standby at regular intervals. Or more simply use a WAL
archive.
--
Michael
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Wed, Apr 27, 2016 at 9:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Apr 26, 2016 at 10:37 PM, John Lumby <johnlumby@hotmail.com> wrote:
>> I did not get any answer to my suggestion in that post but I think it might be useful.
>
> Replication slots are perfectly able to retain WAL segments from a
> prior timeline, so I am not sure that this would be much of a gain.

Or to put it in other words, a slot that activates itself
automatically at promotion to retain WAL from the previous timeline
would be interesting for pg_rewind, but the only use case we have here
is pg_rewind, so it seems a bit narrow to bother modifying the core
code to add this support in the replication slot code. And this can be
solved lightly with an archive. Thoughts from others are welcome.
--
Michael
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Wed, Apr 27, 2016 at 9:47 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:

> > Replication slots are perfectly able to retain WAL segments from a
> > prior timeline, so I am not sure that this would be much of a gain.
>
> Or to put it in other words, a slot that activates itself
> automatically at promotion to retain WAL from the previous timeline
> would be interesting for pg_rewind, but the only use case we have here
> is pg_rewind, so it seems a bit narrow to bother modifying the core
> code to add this support in the replication slot code. And this can be
> solved lightly with an archive. Thoughts from others are welcome.

Sounds to me like it should be enough to document (in pg_rewind's page)
that a slot should be activated prior to the promotion. Maybe we could
have "pg_ctl promote" have an option to create a slot, to make this
simpler?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Thu, Apr 28, 2016 at 8:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> On Wed, Apr 27, 2016 at 9:47 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>
>> > Replication slots are perfectly able to retain WAL segments from a
>> > prior timeline, so I am not sure that this would be much of a gain.
>>
>> Or to put it in other words, a slot that activates itself
>> automatically at promotion to retain WAL from the previous timeline
>> would be interesting for pg_rewind, but the only use case we have here
>> is pg_rewind, so it seems a bit narrow to bother modifying the core
>> code to add this support in the replication slot code. And this can be
>> solved lightly with an archive. Thoughts from others are welcome.
>
> Sounds to me like it should be enough to document (in pg_rewind's page)
> that activating a slot prior to the promotion. Maybe we could have
> "pg_ctl promote" have an option to create a slot, to make this simpler?

That would actually mean creating a slot before the last common
checkpoint before promotion, which is a bit more tricky to evaluate.
With luck, perhaps both of them would be on the same segment, but
there is no way to be sure about that... It seems to me that what we
are looking for here is a way to tell
pg_create_physical_replication_slot to reserve WAL from a position
that the caller has chosen.
--
Michael
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Thu, Apr 28, 2016 at 8:28 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > Sounds to me like it should be enough to document (in pg_rewind's page)
> > that activating a slot prior to the promotion. Maybe we could have
> > "pg_ctl promote" have an option to create a slot, to make this simpler?
>
> That would be actually creating a slot before the last common
> checkpoint before promotion, which is a bit more tricky to evaluate.
> With luck, perhaps both of them would be on the same segment, but
> there is no way to be sure about that... It seems to me that what we
> are looking for here is a way to tell through
> pg_create_physical_replication_slot to reserve WAL from a position
> that caller has decided.

Well, that sounds like just a SMOP, doesn't it?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14109: pg_rewind fails to update target control file in one scenario
From
Michael Paquier
Date:
On Fri, Apr 29, 2016 at 8:42 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Well, that sounds like just a SMOP, doesn't it?

Oh, actually, looking at ReplicationSlotReserveWal(), what is used as
the restart LSN is the last redo position. I hadn't noticed that until
now. So indeed, if you create a slot just before the promotion, that
would actually be fine.
--
Michael