Thread: Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WALfiles
On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
Great. Now code of this patch looks good for me.However, we forgot about documentation.<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>At least, this paragraph need to be adjusted, because it states whose files are copied. And probably latter paragraphs whose state about WAL files.
Your are rigth.
I wrote a draft as following, but i'm afraid whether the english statement is accurate.
----------------------
diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index d5430d4..bcd094b 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -50,9 +50,12 @@ PostgreSQL documentation
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
- all other files are copied in full, including configuration files. The
- advantage of <application>pg_rewind</> over taking a new base backup, or
- tools like <application>rsync</>, is that <application>pg_rewind</> does
+ all other files except WAL are copied in full, including configuration
+ files. Only the WAL files between the point of divergence and the current
+ WAL insert location of the source server are copied, for other WAL files
+ are useless for the target server. The advantage of
+ <application>pg_rewind</> over taking a new base backup, or tools
+ like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
@@ -231,7 +234,7 @@ PostgreSQL documentation
<para>
Copy all other files such as <filename>pg_xact</filename> and
configuration files from the source cluster to the target cluster
- (everything except the relation files).
+ (everything except the relation files and some WAL files).
</para>
</step>
<step>
------
Best Regars,
Chen Huajun
On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjischj@163.com> wrote:
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru>wrote: Great. Now code of this patch looks good for me.However, we forgot about documentation.<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>At least, this paragraph need to be adjusted, because it states whose files are copied. And probably latter paragraphs whose state about WAL files.Your are rigth.I wrote a draft as following, but i'm afraid whether the english statement is accurate.
I'm not native english speaker too :(
Only the WAL files between the point of divergence and the current WAL insert location of the source server are copied, *for* other WAL files are useless for the target server.
I'm not sure about this usage of word *for*. For me, it probably should be just removed. Rest of changes looks good for me. Please, integrate them into the patch.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-10-01 04:09:19,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjischj@163.com> wrote:On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru>wrote: Great. Now code of this patch looks good for me.However, we forgot about documentation.<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>At least, this paragraph need to be adjusted, because it states whose files are copied. And probably latter paragraphs whose state about WAL files.Your are rigth.I wrote a draft as following, but i'm afraid whether the english statement is accurate.I'm not native english speaker too :(Only the WAL files between the point of divergence and the current WAL insert location of the source server are copied, *for* other WAL files are useless for the target server.I'm not sure about this usage of word *for*. For me, it probably should be just removed. Rest of changes looks good for me. Please, integrate them into the patch.
I had removed the *for* , Pleae check the new patch again.
---
Best Regards,
Chen Huajun
Attachment
On Sun, Oct 1, 2017 at 8:27 PM, chenhj <chjischj@163.com> wrote:
On 2017-10-01 04:09:19,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjischj@163.com> wrote: On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote: Great. Now code of this patch looks good for me.However, we forgot about documentation.<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>At least, this paragraph need to be adjusted, because it states whose files are copied. And probably latter paragraphs whose state about WAL files.Your are rigth.I wrote a draft as following, but i'm afraid whether the english statement is accurate.I'm not native english speaker too :(Only the WAL files between the point of divergence and the current WAL insert location of the source server are copied, *for* other WAL files are useless for the target server.I'm not sure about this usage of word *for*. For me, it probably should be just removed. Rest of changes looks good for me. Please, integrate them into the patch.I had removed the *for* , Pleae check the new patch again.
Now, this patch looks good for me. It applies cleanly, builds cleanly, passes regression tests, new functionality is covered by regression tests. Code is OK for me and docs too.
I'm marking this patch as "Ready for committer". BTW, authors field in the commitfest app is empty (https://commitfest.postgresql.org/15/1302/). Please, put your name there.
I hope this patch will be committed during 2017-11 commitfest. Be ready to rebase this patch if needed. Thank you for your work.
------
Alexander Korotkov
Postgres Professional: http://www. postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company
On 2017-10-02 23:24:30,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Sun, Oct 1, 2017 at 8:27 PM, chenhj <chjischj@163.com> wrote:Now, this patch looks good for me. It applies cleanly, builds cleanly, passes regression tests, new functionality is covered by regression tests. Code is OK for me and docs too.I'm marking this patch as "Ready for committer". BTW, authors field in the commitfest app is empty (https://commitfest.postgresql.org/15/1302/). Please, put your name there.I hope this patch will be committed during 2017-11 commitfest. Be ready to rebase this patch if needed. Thank you for your work.
I had filled the authors field of this patch in commitfest, and will rebase this patch if needed. Thank you for your help!
------
Best Regards,
Chen Huajun
At 2017-12-01 12:27:09, "Michael Paquier" <michael.paquier@gmail.com> wrote: >On Tue, Oct 3, 2017 at 1:20 AM, chenhj <chjischj@163.com> wrote: >> I had filled the authors field of this patch in commitfest, and will rebase >> this patch if needed. Thank you for your help! > >The documentation of the patch needs a rebase, so I am moving it to >next CF with "waiting on author" as status. > >$ git diff master --check >src/bin/pg_rewind/pg_rewind.c:292: trailing whitespace. >+ * >There are whitespace complains. >-->MichaelRebased and removed the whitespace.----regardsChen Huajun
Attachment
Greetings! * chenhj (chjischj@163.com) wrote: > Rebased and removed the whitespace. Thanks for working on this, I agree that it seems like a pretty cool optimization for pg_rewind. I've only read through the thread to try and understand what's going on and the first thing that comes to mind is that you're changing pg_rewind to not remove the WAL from before the divergence (split) point, but I'm not sure why. As noted, that WAL isn't needed for anything (it's from before the split, after all), so why keep it? Is there something in this optimization that depends on the old WAL being there and, if so, what and why? That's also different from how pg_basebackup works, which I don't think is good (seems like pg_rewind should operate in a pretty similar manner to pg_basebackup). Setting this back to Waiting for Author but hopefully you can reply soon and clarify that, possibly adjusting the patch accordingly. Thanks! Stephen
Attachment
At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote: > >I've only read through the thread to try and understand what's going on >and the first thing that comes to mind is that you're changing >pg_rewind to not remove the WAL from before the divergence (split) >point, but I'm not sure why. As noted, that WAL isn't needed for >anything (it's from before the split, after all), so why keep it? Is >there something in this optimization that depends on the old WAL being >there and, if so, what and why?After run pg_rewind, the first startup of postgres will do crash recovery. And crash recovery will begin from the previous redo point preceding the divergence. So, the WAL after the redo point and before the divergence is needed.Of course, the WAL before the redo point is not needed, but in my point of view, recycling unwanted WAL does not have to be done by pg_rewind. reference code: pg_rewind.c: main(int argc, char **argv) { ... findLastCheckpoint(datadir_target, divergerec, lastcommontliIndex, &chkptrec, &chkpttli, &chkptredo); ... createBackupLabel(chkptredo, chkpttli, chkptrec); ... } ... createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli, XLogRecPtr checkpointloc) { ...len = snprintf(buf, sizeof(buf), "START WAL LOCATION: %X/%X (file %s)\n" "CHECKPOINT LOCATION: %X/%X\n" "BACKUP METHOD: pg_rewind\n" "BACKUP FROM: standby\n" "START TIME: %s\n",/* omit LABEL: line */ (uint32) (startpoint >> 32), (uint32) startpoint, xlogfilename, (uint32) (checkpointloc >> 32), (uint32) checkpointloc, strfbuf); ... }>That's also different from how pg_basebackup works, which I don't think >is good (seems like pg_rewind should operate in a pretty similar manner >to pg_basebackup). >Thanks for your comments! I also considered copy WAL just like how pg_basebackup does,but a implement similar to pg_basebackup's manner may be not so simple.Twice transport of files from source to target may be needed,first for data files, and another for WAL. And the WAL which contains the previous redo point preceding the divergence may be only exists in target server and had been recycled in source. That's different between pg_rewind and pg_basebackup.Regards,Chen Huajun
Greetings, * chenhj (chjischj@163.com) wrote: > > At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote: > >I've only read through the thread to try and understand what's going on > >and the first thing that comes to mind is that you're changing > >pg_rewind to not remove the WAL from before the divergence (split) > >point, but I'm not sure why. As noted, that WAL isn't needed for > >anything (it's from before the split, after all), so why keep it? Is > >there something in this optimization that depends on the old WAL being > >there and, if so, what and why? > > After run pg_rewind, the first startup of postgres will do crash recovery. > And crash recovery will begin from the previous redo point preceding the divergence. > So, the WAL after the redo point and before the divergence is needed. Right. > Of course, the WAL before the redo point is not needed, but in my point of view, > recycling unwanted WAL does not have to be done by pg_rewind. That's what pg_rewind has been doing though, isn't it? And it's not like that WAL is useful for anything, is it? That's also how pg_basebackup works. > >That's also different from how pg_basebackup works, which I don't think > >is good (seems like pg_rewind should operate in a pretty similar manner > >to pg_basebackup). > > Thanks for your comments! > I also considered copy WAL just like how pg_basebackup does,but a implement similar to pg_basebackup's manner may be notso simple. Using the replication protocol to fetch WAL would be a good thing to do (actually, making pg_rewind entirely work through a connection to the current primary would be great) but that's independent of what I'm asking for here. Here I'm just suggesting that we not change what pg_rewind is doing today when it comes to the existing WAL on the old-primary. > And the WAL which contains the previous redo point preceding the divergence may be only exists in target server and hadbeen recycled in source. That's different between pg_rewind and pg_basebackup. Hm, pg_rewind was removing that and expecting it to be on the new primary? If that's the case then I could see an argument for keeping WAL that's from the divergence point onward, but I still don't think we should have pg_rewind just leave all of the prior WAL in place. Thanks! Stephen
Attachment
On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote: > * chenhj (chjischj@163.com) wrote: >> At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote: >>> I've only read through the thread to try and understand what's going on >>> and the first thing that comes to mind is that you're changing >>> pg_rewind to not remove the WAL from before the divergence (split) >>> point, but I'm not sure why. As noted, that WAL isn't needed for >>> anything (it's from before the split, after all), so why keep it? Is >>> there something in this optimization that depends on the old WAL being >>> there and, if so, what and why? >> >> After run pg_rewind, the first startup of postgres will do crash recovery. >> And crash recovery will begin from the previous redo point preceding the divergence. >> So, the WAL after the redo point and before the divergence is needed. > > Right. Most of the time, and particularly since v11 has removed the need to retain more past segments than one completed checkpoint, those segments have less chances to be on the source server, limiting more the impact of the patch discussed on this thread. >> Of course, the WAL before the redo point is not needed, but in my point of view, >> recycling unwanted WAL does not have to be done by pg_rewind. > > That's what pg_rewind has been doing though, isn't it? And it's not > like that WAL is useful for anything, is it? That's also how > pg_basebackup works. As of HEAD, pg_rewind handles data in pg_wal similarly to other paths which are not relation files: the files from the source are just blindly copied to the target. After the rewind and once recovery begins, we just let the startup process do the cleanup instead of pg_rewind. Regarding that pg_basebackup is different, you get the choice to do what you want using --wal-method, and you actually get the segments that you only need to get a self-contained base backup. >>> That's also different from how pg_basebackup works, which I don't think >>> is good (seems like pg_rewind should operate in a pretty similar manner >>> to pg_basebackup). >> >> Thanks for your comments! >> I also considered copy WAL just like how pg_basebackup does,but a >> implement similar to pg_basebackup's manner may be not so simple. > > Using the replication protocol to fetch WAL would be a good thing to do > (actually, making pg_rewind entirely work through a connection to the > current primary would be great) but that's independent of what I'm > asking for here. Here I'm just suggesting that we not change what > pg_rewind is doing today when it comes to the existing WAL on the > old-primary. Yes, superuser is necessary now, if we could get to a point where only a replication permission is needed that would be nice. Now we could do things differently. We could have a system role dedicated to pg_rewind which works only on the functions from genfile.c that pg_rewind needs, in order to leverage the need of a superuser. >> And the WAL which contains the previous redo point preceding the >> divergence may be only exists in target server and had been recycled >> in source. That's different between pg_rewind and pg_basebackup. > > Hm, pg_rewind was removing that and expecting it to be on the new > primary? If that's the case then I could see an argument for keeping > WAL that's from the divergence point onward, but I still don't think > we should have pg_rewind just leave all of the prior WAL in place. Another thing that we could as well do is simply not fetching any WAL files at all during a rewind, then let the startup process of the rewound server decide by itself what it needs. This would leverage the data transfered in all cases. It is easy to define the start point of WAL segments needed for a rewound server because the last checkpoint record before WAL forked is calculated before transferring any data. The finish point cannot be exact though because you don't know up to which point you should transfer it. In some ways, this is close to a base backup. We could as well define an end point to minimize the amount of WAL as the last completed segment before data transfer begins, but then you need to worry about WAL segment holes and such. At the end of the day, just not transferring any data from pg_wal looks more solid to me as a first step if we need to worry about data that is transferred but finishes by being useless. -- Michael
Attachment
Michael, all, * Michael Paquier (michael.paquier@gmail.com) wrote: > On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote: > > * chenhj (chjischj@163.com) wrote: > >> At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote: > >>> I've only read through the thread to try and understand what's going on > >>> and the first thing that comes to mind is that you're changing > >>> pg_rewind to not remove the WAL from before the divergence (split) > >>> point, but I'm not sure why. As noted, that WAL isn't needed for > >>> anything (it's from before the split, after all), so why keep it? Is > >>> there something in this optimization that depends on the old WAL being > >>> there and, if so, what and why? > >> > >> After run pg_rewind, the first startup of postgres will do crash recovery. > >> And crash recovery will begin from the previous redo point preceding the divergence. > >> So, the WAL after the redo point and before the divergence is needed. > > > > Right. > > Most of the time, and particularly since v11 has removed the need to > retain more past segments than one completed checkpoint, those segments > have less chances to be on the source server, limiting more the impact > of the patch discussed on this thread. Good point. > >>> That's also different from how pg_basebackup works, which I don't think > >>> is good (seems like pg_rewind should operate in a pretty similar manner > >>> to pg_basebackup). > >> > >> Thanks for your comments! > >> I also considered copy WAL just like how pg_basebackup does,but a > >> implement similar to pg_basebackup's manner may be not so simple. > > > > Using the replication protocol to fetch WAL would be a good thing to do > > (actually, making pg_rewind entirely work through a connection to the > > current primary would be great) but that's independent of what I'm > > asking for here. Here I'm just suggesting that we not change what > > pg_rewind is doing today when it comes to the existing WAL on the > > old-primary. > > Yes, superuser is necessary now, if we could get to a point where only a > replication permission is needed that would be nice. Now we could do > things differently. We could have a system role dedicated to pg_rewind > which works only on the functions from genfile.c that pg_rewind needs, > in order to leverage the need of a superuser. Actually, the other work I'm doing nearby wrt removing the explicit superuser() checks in those functions would allow a non-superuser role to be created which could be used by pg_rewind. We could even add an explicit 'pg_rewind' default role if we wanted to, but is that the route we want to go with this or should we be thinking about changing pg_rewind to use the replication protocol and a replication user instead..? The only reason that it doesn't today is that there isn't an easy way for it to do so with the existing replication protocol, as I understand it. Then again, there's this whole question of if we should even keep the replication protocol or if we should be getting rid of it in favor of have regular connections that can support replication. There's been discussion of that recently, as I recall, though I can't remember where. > >> And the WAL which contains the previous redo point preceding the > >> divergence may be only exists in target server and had been recycled > >> in source. That's different between pg_rewind and pg_basebackup. > > > > Hm, pg_rewind was removing that and expecting it to be on the new > > primary? If that's the case then I could see an argument for keeping > > WAL that's from the divergence point onward, but I still don't think > > we should have pg_rewind just leave all of the prior WAL in place. > > Another thing that we could as well do is simply not fetching any WAL > files at all during a rewind, then let the startup process of the > rewound server decide by itself what it needs. This would leverage the > data transfered in all cases. It is easy to define the start point of > WAL segments needed for a rewound server because the last checkpoint > record before WAL forked is calculated before transferring any data. The > finish point cannot be exact though because you don't know up to which > point you should transfer it. In some ways, this is close to a base > backup. We could as well define an end point to minimize the amount of > WAL as the last completed segment before data transfer begins, but then > you need to worry about WAL segment holes and such. At the end of the > day, just not transferring any data from pg_wal looks more solid to me > as a first step if we need to worry about data that is transferred but > finishes by being useless. Having the rewound server decide by itself what it needs actually sounds like it would be better and would allow whatever the restore command is to fetch any missing WAL necessary (which it might be able to do more efficiently, from an archive server and possibly even in parallel as compared to pg_rewind doing it serially from the primary...). We don't have any particularly easy way to prevent needed WAL from disappearing off of the primary anyway, so it seems like it would be necessary to have a working restore command for pg_rewind to be reliable. While we could create a physical replication slot for pg_rewind when it starts, that may already be too late. Thanks! Stephen
Attachment
On Thu, Jan 25, 2018 at 07:38:37AM -0500, Stephen Frost wrote: > Michael, all, > > * Michael Paquier (michael.paquier@gmail.com) wrote: >> On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote: >>> * chenhj (chjischj@163.com) wrote: >>>> At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote: >>>>> I've only read through the thread to try and understand what's going on >>>>> and the first thing that comes to mind is that you're changing >>>>> pg_rewind to not remove the WAL from before the divergence (split) >>>>> point, but I'm not sure why. As noted, that WAL isn't needed for >>>>> anything (it's from before the split, after all), so why keep it? Is >>>>> there something in this optimization that depends on the old WAL being >>>>> there and, if so, what and why? >>>> >>>> After run pg_rewind, the first startup of postgres will do crash recovery. >>>> And crash recovery will begin from the previous redo point preceding the divergence. >>>> So, the WAL after the redo point and before the divergence is needed. >>> >>> Right. >> >> Most of the time, and particularly since v11 has removed the need to >> retain more past segments than one completed checkpoint, those segments >> have less chances to be on the source server, limiting more the impact >> of the patch discussed on this thread. > > Good point. Actually, that is worse that than, because after doing a promotion one has to issue a checkpoint on the promoted server so as to update its control file (pg_rewind fetches the new TLI number from the on-disk file), so we can never ensure that needed WAL segments will be on the primary. Hence the proposed patch loses most of its value, because there will always be a WAL segment hole anyway. It can also help in reducing the load in creating new segments if min_wal_size is large enough on the rewound server but that is a narrow benefit. In short, it seems really to me that we should reject the approach as proposed, and replace it with something that prevents the fetching of any WAL segments from the source server. I think that we could consider as well removing all WAL segments on the primary from the point WAL forked, as those created between the last checkpoint before WAL forked up to the point where WAL forked are useful anyway. But those are bonus points to keep a minimalistic amount of space for the rewound node as they finish by being recycled anyway. For those reasons I think that the patch should be marked as returned with feedback. >> Yes, superuser is necessary now, if we could get to a point where only a >> replication permission is needed that would be nice. Now we could do >> things differently. We could have a system role dedicated to pg_rewind >> which works only on the functions from genfile.c that pg_rewind needs, >> in order to leverage the need of a superuser. > > Actually, the other work I'm doing nearby wrt removing the explicit > superuser() checks in those functions would allow a non-superuser role > to be created which could be used by pg_rewind. We could even add an > explicit 'pg_rewind' default role if we wanted to, but is that the route > we want to go with this or should we be thinking about changing > pg_rewind to use the replication protocol and a replication user > instead..? The only reason that it doesn't today is that there isn't an > easy way for it to do so with the existing replication protocol, as I > understand it. Yes, actually with the piece of work we are doing, we don't need to work more on the replication protocol, and pg_rewind does not require heavy refactoring either, so from a maintenance point of view we would gain much more in the long term by using a dedicated role. If your patch to have the file-read catalog user gets in, we should add in the docs of pg_rewind a small set of ACL commands to allow pg_rewind to work with a non-superuser role as long is it gains access to the minimal set of functions necessary. > Then again, there's this whole question of if we should even keep the > replication protocol or if we should be getting rid of it in favor of > have regular connections that can support replication. There's been > discussion of that recently, as I recall, though I can't remember > where. Hm. I don't recall this one.. > Having the rewound server decide by itself what it needs actually sounds > like it would be better and would allow whatever the restore command is > to fetch any missing WAL necessary (which it might be able to do more > efficiently, from an archive server and possibly even in parallel as > compared to pg_rewind doing it serially from the primary...). We don't > have any particularly easy way to prevent needed WAL from disappearing > off of the primary anyway, so it seems like it would be necessary to > have a working restore command for pg_rewind to be reliable. While we > could create a physical replication slot for pg_rewind when it starts, > that may already be too late. The only approach I can think of that makes unnecessary an external archive is to have a slot that guarantees that WAL segments down to the previous checkpoint before promotion are kept around to allow a rewound node to use them. For planned failovers, things can be controlled even today as you know when to create a slot. Unplanned failovers are where the problems come. One way I think can address any problems is to have what I would call a failover slot. The concept is simple: when a node is in recovery have it create a new physical replication slot at each restart point which has a user-defined name, controlled by a GUC, and do the creation automatically. If the parameter is empty, no slots are created. At the beginning of a restart point, the slot is removed, so once the node is promoted the slot will not get removed, and it would keep around WAL segments down to the last redo point before promotion automatically. When you don't want to set up a WAL archive, but still want to ensure that a rewound node is able to catch up with its promoted primary, just enable this failover slot and then set up primary_slot_name in recovery.conf to use it. -- Michael
Attachment
On Fri, Jan 26, 2018 at 11:36:09AM +0900, Michael Paquier wrote: > In short, it seems really to me that we should reject the approach as > proposed, and replace it with something that prevents the fetching of > any WAL segments from the source server. I think that we could consider > as well removing all WAL segments on the primary from the point WAL > forked, as those created between the last checkpoint before WAL forked > up to the point where WAL forked are useful anyway. But those are bonus > points to keep a minimalistic amount of space for the rewound node as > they finish by being recycled anyway. For those reasons I think that the > patch should be marked as returned with feedback. Hearing nothing and as the commit fest is coming to a close, I am marking this patch as returned with feedback. Feel free to correct me if you think this is not adapted. -- Michael