Thread: .ready files appearing on slaves
Hi hackers,

For a few months now, we have occasionally seen .ready files appearing on some slave instances in various contexts. The two I have in mind are under 9.2.x.

I tried to investigate a bit. These .ready files are created when a WAL file from pg_xlog has no corresponding file in pg_xlog/archive_status. I could easily reproduce this by deleting such a file: it is created again at the next restartpoint or checkpoint received from the master.

Looking at the WAL files in the pg_xlog folder corresponding to these .ready files, they are all much older than the current WAL "cycle", in both mtime and name sequence. For instance, on one of these boxes we currently have 6 of those "ghost" WALs:

0000000200001E53000000FF
0000000200001F18000000FF
0000000200002047000000FF
00000002000020BF000000FF
0000000200002140000000FF
0000000200002370000000FF
000000020000255D000000A8
000000020000255D000000A9
[...normal WAL sequence...]
000000020000255E0000009D

And on another box:

000000010000040E000000FF
0000000100000414000000DA
000000010000046E000000FF
0000000100000470000000FF
00000001000004850000000F
000000010000048500000010
[...normal WAL sequence...]
000000010000048500000052

So it seems that, for some reason, these old WALs were "forgotten" by the restartpoint mechanism when they should have been recycled or deleted.

For one of these servers, I could correlate this with some brutal disconnections of the streaming replication appearing in its logs. But there was no known SR disconnection on the second one.

Any idea about this weird behaviour? What can we do to help you investigate further?

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
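A quick way to spot such "ghost" segments is to compare pg_xlog with its archive_status subdirectory. The following is an illustrative sketch only: the helper name is hypothetical, and it assumes the usual layout (24-hex-character segment names, status files under pg_xlog/archive_status).

```shell
# list_unaccounted_wals DIR: print WAL segment names under DIR (a
# pg_xlog directory) that have neither a .ready nor a .done entry in
# DIR/archive_status.  Hypothetical inspection helper, not part of
# PostgreSQL.
list_unaccounted_wals() {
    xlogdir=$1
    for seg in "$xlogdir"/[0-9A-F]*; do
        [ -f "$seg" ] || continue
        base=$(basename "$seg")
        # WAL segment names are exactly 24 hex characters; this skips
        # timeline history files and backup labels.
        case "$base" in
            ????????????????????????) ;;
            *) continue ;;
        esac
        if [ ! -e "$xlogdir/archive_status/$base.ready" ] &&
           [ ! -e "$xlogdir/archive_status/$base.done" ]; then
            echo "$base"
        fi
    done
}

# usage: list_unaccounted_wals /path/to/pg_xlog
```

Segments printed by this helper are exactly the ones a later restartpoint or checkpoint will flag as .ready, per the behaviour described above.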
Hi hackers, An issue that seems related to this has been posted on pgsql-admin. See: http://www.postgresql.org/message-id/CAAS3tyLnXaYDZ0+zhXLPdVtOvHQOvR+jSPhp30o8kvWqQs0Tqw@mail.gmail.com How can we help on this issue? Cheers, On Thu, 4 Sep 2014 17:50:36 +0200 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: > Hi hackers, > > Since few months, we occasionally see .ready files appearing on some slave > instances from various context. The two I have in mind are under 9.2.x. > > I tried to investigate a bit. These .ready files are created when a WAL file > from pg_xlog has no corresponding file in pg_xlog/archive_status. I could > easily experience this by deleting such a file: it is created again at the > next restartpoint or checkpoint received from the master. > > Looking at the WAL in pg_xlog folder corresponding to these .ready files, they > are all much older than the current WAL "cycle" in both mtime and name logic > sequence. As instance on one of these box we have currently 6 of those "ghost" > WALs: > > 0000000200001E53000000FF > 0000000200001F18000000FF > 0000000200002047000000FF > 00000002000020BF000000FF > 0000000200002140000000FF > 0000000200002370000000FF > 000000020000255D000000A8 > 000000020000255D000000A9 > [...normal WAL sequence...] > 000000020000255E0000009D > > And on another box: > > 000000010000040E000000FF > 0000000100000414000000DA > 000000010000046E000000FF > 0000000100000470000000FF > 00000001000004850000000F > 000000010000048500000010 > [...normal WAL sequence...] > 000000010000048500000052 > > So it seems for some reasons, these old WALs were "forgotten" by the > restartpoint mechanism when they should have been recylced/deleted. > > For one of these servers, I could correlate this with some brutal > disconnection of the streaming replication appearing in its logs. But there > was no known SR disconnection on the second one. > > Any idea about this weird behaviour? What can we do to help you investigate > further? 
> > Regards, -- Jehan-Guillaume de Rorthais Dalibo http://www.dalibo.com
Subject: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
From: Jehan-Guillaume de Rorthais
Hi hackers,

We spent some time with Guillaume Lelarge studying this issue.

CreateRestartPoint() calls RemoveOldXlogFiles() to drop/recycle old WALs, which in turn calls XLogArchiveCheckDone() to check whether a given WAL file can be dropped. As our slave has "archive_mode" and "archive_command" set, XLogArchivingActive() returns true, forcing XLogArchiveCheckDone() to look for a ".done" file. If the corresponding ".done" file doesn't exist, XLogArchiveCheckDone() calls XLogArchiveNotify() to actually create the ".ready" file.

Now, on a standby, we are supposed to create the ".done" files by calling XLogArchiveForceDone() either after a successful restore_command, or when streaming replication finishes and flushes a WAL file. This looks like the root cause of this trouble: perhaps a race condition where XLogArchiveForceDone() is not called for a WAL file. As the ".done" file is never created, we end up calling XLogArchiveNotify() to create a ".ready", and this WAL file will never be archived by the slave, sitting forever in the pg_xlog dir.

Worse, we experienced this under two 9.2.8 clusters and realized this version is not supposed to have 256 segments per WAL file ID: a "cycle" should finish at the *FE segment. I'm quoting back the history of my previous mail, as we clearly have some *FF files on the slave side!

> > 000000010000040E000000FF
> > 0000000100000414000000DA
> > 000000010000046E000000FF
> > 0000000100000470000000FF
> > 00000001000004850000000F
> > 000000010000048500000010
> > [...normal WAL sequence...]
> > 000000010000048500000052

Note that it seems to happen only with streaming replication, not with pure log-shipping replication. We added a debug message on the master side to make sure it never archives a *FF file.

We kept the WAL files and log files for further analysis. How can we help regarding this issue?

Regards,

On Mon, 15 Sep 2014 17:37:24 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

> Hi hackers,
>
> An issue that seems related to this has been posted on pgsql-admin. See:
> http://www.postgresql.org/message-id/CAAS3tyLnXaYDZ0+zhXLPdVtOvHQOvR+jSPhp30o8kvWqQs0Tqw@mail.gmail.com
>
> How can we help on this issue?
>
> Cheers,
>
> On Thu, 4 Sep 2014 17:50:36 +0200
> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>
> > [...original report quoted upthread...]
>
> --
> Jehan-Guillaume de Rorthais
> Dalibo
> http://www.dalibo.com
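The XLogArchiveCheckDone() flow described above can be restated as a small shell model. This is purely illustrative: the real logic is C code in the backend, and the helper below is a simplified, hypothetical rendition of the decision sequence (return 0 means "the segment may be removed/recycled", 1 means "keep it").

```shell
# Hypothetical, simplified model of the XLogArchiveCheckDone()
# decision described above; not the real C implementation.
archive_check_done() {
    xlogdir=$1; seg=$2; archiving_active=$3
    status="$xlogdir/archive_status/$seg"
    if [ -e "$status.done" ]; then
        return 0            # already archived: safe to remove
    fi
    if [ "$archiving_active" != yes ]; then
        return 0            # archiving disabled: no bookkeeping needed
    fi
    if [ -e "$status.ready" ]; then
        return 1            # archiver has not picked it up yet: keep
    fi
    # Neither marker exists: flag the segment for the archiver, as
    # XLogArchiveNotify() does, and keep it.
    mkdir -p "$xlogdir/archive_status"
    touch "$status.ready"
    return 1
}
```

The last branch is exactly the one that bites on a standby: if XLogArchiveForceDone() was never called for a fully streamed segment, the .done file is missing, a .ready appears, and the segment is never removed.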
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> We kept the WAL files and log files for further analysis. How can we help
> regarding this issue?
Commit c2f79ba added the assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check of whether the node is in recovery. Now, based on the information given here, it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side but... Something like the patch attached (which is clearly a band-aid) may help though, as it would allow files to be removed even if they are not marked as .done for a node in recovery. And this is consistent with the pre-1bd42cd behavior.
Comments welcome.
--
Attachment
On 10/08/2014 10:44 AM, Michael Paquier wrote:
> On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>> We kept the WAL files and log files for further analysis. How can we help
>> regarding this issue?
>
> Commit c2f79ba added the assumption that the WAL receiver should always
> enforce the creation of .done files when WAL files are done being streamed
> (XLogWalRcvWrite and WalReceiverMain) or archived
> (KeepFileRestoredFromArchive). [...] Something like the patch attached
> (which is clearly a band-aid) may help though, as it would allow files to
> be removed even if they are not marked as .done for a node in recovery.
> And this is consistent with the pre-1bd42cd behavior.

There are two mysteries here:

1. Where do the FF files come from? In 9.2, FF-segments are not supposed to be created, ever.

Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either.

I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

2. Why are the .done files sometimes not being created?

I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, while it still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" on the master, followed by "pg_ctl stop -m i" and a restart.

The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it.

I looked again at the thread where this was discussed:
http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=A+zo+AvFbVHLGoypO9G_QDKbabeXgXBVGd05g@mail.gmail.com
I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.

In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived. It's a good and simple rule that the server that generates the WAL archives the WAL.

Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files.

- Heikki
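The promotion-time scan proposed above could look roughly like this. A shell sketch of the idea only: the real change would be C code run at promotion, and every name here is illustrative.

```shell
# Hypothetical sketch of the promotion-time pass proposed above:
# treat every complete segment already present in pg_xlog as the old
# master's archiving responsibility and mark it .done, so the standby
# never tries to archive it after promotion.
mark_all_done() {
    xlogdir=$1
    mkdir -p "$xlogdir/archive_status"
    for seg in "$xlogdir"/[0-9A-F]*; do
        [ -f "$seg" ] || continue
        base=$(basename "$seg")
        case "$base" in
            ????????????????????????) ;;   # 24-hex-char segment names only
            *) continue ;;
        esac
        rm -f "$xlogdir/archive_status/$base.ready"
        touch "$xlogdir/archive_status/$base.done"
    done
}
```

Note that this sketch deliberately overwrites any stale .ready marker, which is the crux of Fujii Masao's objection further down: segments that still genuinely need archiving would be swept up too.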
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
The additional process at promotion sounds like a good idea; I'll try to get a patch done tomorrow. This would also result in removing the XLogArchiveForceDone stuff. Either way, note that I have been able to reproduce the problem manually, so things can be clearly solved.
1. Where do the FF files come from? In 9.2, FF-segments are not supposed to be created, ever.
Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either.
Good to hear that. By looking at the wal receiver and sender code paths, I found nothing really weird.
I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.
Then we're good for a check in ProcessWalSndrMessage for walEnd I guess. Seems like a straight-forward patch.
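The walreceiver-side sanity check being discussed can be illustrated with the segment arithmetic alone. A hypothetical helper, not proposed code: it assumes 9.2's 16MB segments, where segment index 0xFF within a WAL file ID is never used.

```shell
# valid_92_start_pos HI LO: model of the proposed validity check for
# pre-9.3 WAL pointers.  HI/LO are the two 32-bit halves of an
# XLogRecPtr.  In 9.2's numbering the 0xFF segment of a WAL file ID
# is skipped, so a start position inside it is invalid.
valid_92_start_pos() {
    hi=$1; lo=$2
    segsz=$((16 * 1024 * 1024))
    seg_in_id=$(( lo / segsz ))     # segment index within the file ID
    [ "$seg_in_id" -ne 255 ]
}
```

The real check would of course be C code in walreceiver (or in ProcessWalSndrMessage for walEnd, as suggested above), erroring out instead of returning a status.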
2. Why are the .done files sometimes not being created?
I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.
That's exactly the test I had been doing a couple of times to trigger this behavior before sending my previous email, but without success: on the standby, all the WAL files were marked as .done. Now, I have just retried it, with far more tries on REL9_3_STABLE and on HEAD, and I have been able to actually trigger the problem a couple of times. Simply run a long transaction generating a lot of WAL like that:
=# create table aa as select generate_series(1,1000000000);
And then run that:
$ psql -c 'select pg_switch_xlog()'; pg_ctl stop -m immediate; pg_ctl start
And with enough "luck", .ready files may appear. It may take a dozen tries before seeing at least one. I also noticed that generally multiple .ready files appear at the same time.
The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it.
Yep. Agreed.
I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=A+zo+AvFbVHLGoypO9G_QDKbabeXgXBVGd05g@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.
In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived. It's a good and simple rule that the server that generates the WAL, archives the WAL.
Question time: why was the check based on the recovery state of the node removed in 1bd42cd? Just assuming, but did you have in mind that relying on XLogArchiveForceDone and XLogArchiveCheckDone was enough and more robust at this point?
Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files.
Regards,
--
Michael
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> [...]
> Instead of creating any .done files during recovery, we could scan pg_xlog
> at promotion, and create a .done file for every WAL segment that's present
> at that point. That would be more robust. And then apply your patch, to
> recycle old segments during archive recovery, ignoring .done files.

What happens if a user shuts down the standby, removes recovery.conf and starts the server as the master? In this case, no WAL files have .done status files, so the server will create .ready files and archive all of them. Probably this is problematic.

So even if we adopt your idea, ISTM that it's better to create a .done file whenever a WAL file is fully streamed or restored.

Regards,

--
Fujii Masao
On 10/08/2014 04:59 PM, Fujii Masao wrote:
> On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> Instead of creating any .done files during recovery, we could scan pg_xlog
>> at promotion, and create a .done file for every WAL segment that's present
>> at that point. That would be more robust. And then apply your patch, to
>> recycle old segments during archive recovery, ignoring .done files.
>
> What happens if a user shutdowns the standby, removes recovery.conf and
> starts the server as the master?

Um, that's not a safe thing to do anyway, is it?

- Heikki
On Wed, Oct 8, 2014 at 11:38 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 10/08/2014 04:59 PM, Fujii Masao wrote:
>> On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>>> Instead of creating any .done files during recovery, we could scan pg_xlog
>>> at promotion, and create a .done file for every WAL segment that's present
>>> at that point. That would be more robust. And then apply your patch, to
>>> recycle old segments during archive recovery, ignoring .done files.
>>
>> What happens if a user shutdowns the standby, removes recovery.conf and
>> starts the server as the master?
>
> Um, that's not a safe thing to do anyway, is it?

That's not safe as it bypasses all the consistency checks of promotion. Now, it is also something that repmgr, for example, does as far as I recall to perform a node "promotion". What if we simply document the problem properly then? The apparition of those phantom WAL files is scarier than a user or a utility that does a promotion with a server restart. Not to mention as well that users are free to add files themselves to pg_xlog.
--
Michael
On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> The additional process at promotion sounds like a good idea, I'll try to
> get a patch done tomorrow. This would result as well in removing the
> XLogArchiveForceDone stuff. Either way, note that I have been able to
> reproduce the problem manually, so things can be clearly solved.

Please find attached two patches aimed at fixing this issue and improving the situation:
- 0001 prevents the apparition of those phantom WAL segment files by ensuring that when a node is in recovery it removes them whatever their status in archive_status. This patch is the real fix, and should be applied down to 9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment files present in pg_xlog to have their status set to .done, marking them for removal. When looking at the code, I finally concluded that Fujii-san's point, about marking in all cases as .done the segment files that have been fully streamed, actually makes more sense. This patch would not be mandatory for back-patching, but it makes the process more robust IMO.
I imagine that it would be nice to get those things fixed before the next minor release.
Regards,
--
Michael
Attachment
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> 1. Where do the FF files come from? In 9.2, FF-segments are not supposed to
> be created, ever.
>
> I think we should add a check in walreceiver, to throw an error if the
> master sends an invalid WAL pointer, pointing to an FF segment.

Attached is a patch for that. This would be needed for PG < 9.3 if applied.

Regards,
--
Michael
Attachment
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Please find attached two patches aimed at fixing this issue and improving
> the situation:
> - 0001 prevents the apparition of those phantom WAL segment files by
> ensuring that when a node is in recovery it removes them whatever their
> status in archive_status. This patch is the real fix, and should be
> applied down to 9.2.
> - 0002 is a patch implementing Heikki's idea of enforcing all the segment
> files present in pg_xlog to have their status set to .done, marking them
> for removal. [...]

Thanks for the patches!

I found one problem in the 0002 patch. The patch changes recovery so that it creates .done files for every WAL file which exists in the pg_xlog directory at the end of recovery. But even WAL files which will have to be archived later can exist in pg_xlog at that moment: for example, the latest, recycled, and fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files. The patch wrongly prevents them from being archived at all.

ISTM that the 0001 patch has a similar problem. Please imagine the following scenario:

1. There are many unarchived WAL files in pg_xlog because of a continuous failure of archive_command, for example, and then the server unfortunately crashes because of a corruption of the database itself.

2. The DBA restores the backup onto the server and copies all the WAL files from the old pg_xlog to the new one. Then he or she prepares for archive recovery.

3. The DBA starts the server and the archive recovery starts.

4. After all the archived WAL files are replayed, the server tries to replay the WAL files in pg_xlog. Since there are many WAL files in pg_xlog, more than one restartpoint happens while they are being replayed.

In this case, the patch seems to make the restartpoint recycle even WAL files which have .ready files and will have to be archived later. Thought?

Regards,

--
Fujii Masao
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> [...]
> Thanks for the patches!

While reviewing the patch, I found another bug related to WAL files in recovery mode. The problem is that exitArchiveRecovery() always creates a .ready file for the last WAL file of the old timeline, even when it was restored from the archive and has a .done file. So this causes the already-archived WAL file to be archived again.

Attached patch fixes this bug.

Regards,

--
Fujii Masao
Attachment
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> In this case, the patch seems to make the restartpoint recycle even WAL files
> which have .ready files and will have to be archived later. Thought?

The real problem currently is that it is possible to have a segment file not marked as .done during recovery when the stream connection is abruptly cut while this segment is switched, marking it as .ready in archive_status and simply leaving this segment in pg_xlog, because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve the code such that when a segment file is switched and the connection to the master is cut, we guarantee that this file is completed and marked as .done.
--
Michael
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > > On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> In this case, the patch seems to make the restartpoint recycle even WAL files >> which have .ready files and will have to be archived later. Thought? > > The real problem currently is that it is possible to have a segment file not marked as .done during recovery when streamconnection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply lettingthis segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at thiscode these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve codesuch as when a segment file is switched and connection to the is cut, we guarantee that this file is completed and markedas .done. I have spent more time on that, with a bit more of underground work... First, the problem can be reproduced most of the time by running this simple command: psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate This will enforce a segment file switch and restart the master in crash recovery. This has as effect to immediately cut the WAL stream on slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0. For example, let's imagine that stream fails when switching from 000000010000000000000003 to the next segment, then the the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for example 0/3100000, and walrcv->latestWalEnd is 0/4000000. When stream restarts it will begin once again from 0/4000000, ignoring that 000000010000000000000003 should be marked as .done, ultimately marking it in .ready state when old segment files are recycled or removed. There is nothing that can really be done to enforce the creation of a .done file before the FATAL of libpqrcv_receive because we cannot predict the stream failure.. 
Now, we can do better than what we have by looking at the WAL start position used when starting streaming in the WAL receiver, and enforce .done if the start position is the last one of the previous segment. Hence, in the case of start position 0/4000000 found previously, the file that will be enforced to .done is 000000010000000000000003. I have written the attached patch, which implements this idea and fixes the problem. Now let's see if you guys see any flaws in this simple logic, which uses a sniper rifle instead of a bazooka as in the previous patches sent. Regards, -- Michael
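The "enforce .done from the stream start position" logic described in the message above can be sketched outside the server. This is a rough illustration only (Python, with hypothetical helper names; it assumes the default 16MB segment size and the continuous 9.3-style segment numbering, ignoring the pre-9.3 handling of the xxFF boundary):

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

def parse_lsn(lsn):
    """Turn an LSN string like '0/4000000' into a 64-bit integer."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def segment_name(tli, segno):
    """Build a segment file name: 8 hex digits each for timeline, log, seg."""
    return "%08X%08X%08X" % (tli, segno // 0x100, segno % 0x100)

def segment_to_mark_done(tli, start_lsn):
    """If streaming restarts exactly at a segment boundary, the previous
    segment was fully received and can be forced to .done."""
    pos = parse_lsn(start_lsn)
    if pos == 0 or pos % XLOG_SEG_SIZE != 0:
        return None
    return segment_name(tli, pos // XLOG_SEG_SIZE - 1)
```

With the example from the message, a restart at 0/4000000 on timeline 1 yields 000000010000000000000003 as the segment to force to .done, while a mid-segment position such as 0/3100000 yields nothing.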
On Fri, Oct 17, 2014 at 10:12 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>
>>
>> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier <michael.paquier@gmail.com>
>> wrote:
>>>
>>> The additional process at promotion sounds like a good idea, I'll try to
>>> get a patch done tomorrow. This would result as well in removing the
>>> XLogArchiveForceDone stuff. Either way, not that I have been able to
>>> reproduce the problem manually, things can be clearly solved.
>>
>> Please find attached two patches aimed to fix this issue and to improve the
>> situation:
>> - 0001 prevents the apparition of those phantom WAL segment file by ensuring
>> that when a node is in recovery it will remove it whatever its status in
>> archive_status. This patch is the real fix, and should be applied down to
>> 9.2.
>> - 0002 is a patch implementing Heikki's idea of enforcing all the segment
>> files present in pg_xlog to have their status to .done, marking them for
>> removal. When looking at the code, I finally concluded that Fujii-san's
>> point, about marking in all cases as .done segment files that have been
>> fully streamed, actually makes more sense to not be backward. This patch
>> would actually not be mandatory for back-patching, but it makes the process
>> more robust IMO.
>
> Thanks for the patches!
> While reviewing the patch, I found another bug related to WAL file in recovery
> mode. The problem is that exitArchiveRecovery() always creates .ready file for
> the last WAL file of the old timeline even when it's restored from the archive
> and has .done file. So this causes the already-archived WAL file to be archived
> again.... Attached patch fixes this bug.
That's a good catch! Patch looks good. I think that you should change xlogpath to use MAXFNAMELEN instead of MAXPGPATH in exitArchiveRecovery. This is only for correctness, so that's a master-only remark, because this variable is used only to calculate a segment file name, and not a path. Renaming the variable from xlogpath to xlogname would make sense as well.
Regards,
Michael
On 10/20/2014 09:26 AM, Michael Paquier wrote: > On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> >> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> >>> In this case, the patch seems to make the restartpoint recycle even WAL files >>> which have .ready files and will have to be archived later. Thought? >> >> The real problem currently is that it is possible to have a segment file not marked as .done during recovery when the stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply leaving this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve the code such that when a segment file is switched and the connection to the master is cut, we guarantee that this file is completed and marked as .done. > > I have spent more time on that, with a bit more of underground work... > First, the problem can be reproduced most of the time by running this > simple command: > psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate > > This will enforce a segment file switch and restart the master in > crash recovery. This has the effect of immediately cutting the WAL stream > on the slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0. > For example, let's imagine that the stream fails when switching from > 000000010000000000000003 to the next segment; then > the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for > example 0/3100000, and walrcv->latestWalEnd is 0/4000000. When the stream > restarts it will begin once again from 0/4000000, ignoring that > 000000010000000000000003 should be marked as .done, ultimately marking > it in .ready state when old segment files are recycled or removed.
> There is nothing that can really be done to enforce the creation of a > .done file before the FATAL of libpqrcv_receive because we cannot > predict the stream failure.. > > Now, we can do better than what we have now by looking at WAL start > position used when starting streaming in WAL receiver and enforce > .done if the start position is the last one of previous segment. > Hence, in the case of start position 0/4000000 that was found > previously, the file that will be enforced to .done is > 000000010000000000000003. I have written the patch attached that > implements this idea and fixes the problem. Now let's see if you guys > see any flaws in this simple logic which uses a sniper gun instead of > a bazooka as in the previous patches sent. Hmm. This will still miss the .done file if you don't re-establish the streaming replication connection after the restart. For example, if you shut down the master, and promote the standby server. I think we should take a more wholesale approach to this. We should enforce the rule that the server only ever archives WAL files belonging to the same timeline that the server generates. IOW, the server only archives the WAL that it has generated. - Heikki
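Heikki's rule above, that a server archives only WAL it generated itself, reduces to a timeline comparison. A minimal sketch (Python; function names are hypothetical illustrations, not the server's API; the first 8 hex digits of a segment file name encode the timeline ID):

```python
def wal_file_timeline(fname):
    """A WAL segment name is TTTTTTTTXXXXXXXXYYYYYYYY; the first 8 hex
    digits encode the timeline ID."""
    return int(fname[:8], 16)

def should_archive(fname, current_tli):
    """Archive only WAL generated on the server's own timeline; segments
    inherited from another timeline are never marked .ready."""
    return wal_file_timeline(fname) == current_tli
```

So a server generating timeline 2 would archive 000000020000255E0000009D but would leave a leftover 000000010000000000000003 from the old timeline alone.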
On Fri, Oct 17, 2014 at 2:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I found one problem in the 0002 patch. The patch changes the recovery so that
> it creates .done files for every WAL files which exist in pg_xlog directory at
> the end of recovery. But even WAL files which will have to be archived later
> can exist in pg_xlog at that moment. For example, the latest, recycled and
> fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files.
> The patch wrongly prevents them from being archived at all.
Re-looking at patch 2, yes, you are right. Even though it was mentioned that this should be done only for a node on which promotion has been triggered, it was not implemented that way: a check on CheckForStandbyTrigger() is actually missing. Attached is an updated patch.
> ISTM that the 0001 patch has the similar problem. Please imagine the following
> scenario.
> In this case, the patch seems to make the restartpoint recycle even WAL files
> which have .ready files and will have to be archived later. Thought?
Right, that's really backward. This was the approach taken before c9cc7e0, and this commit actually prevents removal of unarchived WAL files during recovery.
Michael
On 10/22/2014 04:24 PM, Michael Paquier wrote: > On Wed, Oct 22, 2014 at 1:53 PM, Heikki Linnakangas <hlinnakangas@vmware.com >> wrote: > >> I think we should take a more wholesale approach to this. We should >> enforce the rule that the server only ever archives WAL files belonging to >> the same timeline that the server generates. IOW, the server only archives >> the WAL that it has generated. > > Hm?! Would that be really back-patchable? There may be in the wild tools or > users that rely on the fact a node archives segment files from all > timelines. Hmm, so it would be a tool or user that manually copies a file to the pg_xlog directory of a standby server, and expects the standby to archive the file after promotion. That seems a bit far-fetched, although I've seen people do strange things. I think it would be acceptable as long as we document the change in behavior in the release notes. I don't have much hope that we'll ever be able to nail down the correct behavior with the current approach. - Heikki
On Mon, Oct 20, 2014 at 3:26 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> >> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> >>> In this case, the patch seems to make the restartpoint recycle even WAL files >>> which have .ready files and will have to be archived later. Thought? >> >> The real problem currently is that it is possible to have a segment file not marked as .done during recovery when the stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply leaving this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve the code such that when a segment file is switched and the connection to the master is cut, we guarantee that this file is completed and marked as .done. > > I have spent more time on that, with a bit more of underground work... > First, the problem can be reproduced most of the time by running this > simple command: > psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate What about fixing this problem directly? That is, we can make walreceiver check whether the end of the last received WAL data is the end of the current WAL file or not, and then close the WAL file and create a .done file if the test is true. This is not a perfect solution. If the standby crashes during the very short interval (i.e., after closing the WAL file and before creating the .done file), the problem would happen again. But that can only very rarely happen, so I don't think that it's worth fixing the corner case, at least in back-branches. Of course, we can find out the "perfect" solution for the master, though. Regards, -- Fujii Masao
On Mon, Oct 20, 2014 at 5:15 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > > > On Fri, Oct 17, 2014 at 10:12 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> >> wrote: >> > On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier >> > <michael.paquier@gmail.com> wrote: >> >> >> >> >> >> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier >> >> <michael.paquier@gmail.com> >> >> wrote: >> >>> >> >>> The additional process at promotion sounds like a good idea, I'll try >> >>> to >> >>> get a patch done tomorrow. This would result as well in removing the >> >>> XLogArchiveForceDone stuff. Either way, not that I have been able to >> >>> reproduce the problem manually, things can be clearly solved. >> >> >> >> Please find attached two patches aimed to fix this issue and to improve >> >> the >> >> situation: >> >> - 0001 prevents the apparition of those phantom WAL segment file by >> >> ensuring >> >> that when a node is in recovery it will remove it whatever its status >> >> in >> >> archive_status. This patch is the real fix, and should be applied down >> >> to >> >> 9.2. >> >> - 0002 is a patch implementing Heikki's idea of enforcing all the >> >> segment >> >> files present in pg_xlog to have their status to .done, marking them >> >> for >> >> removal. When looking at the code, I finally concluded that Fujii-san's >> >> point, about marking in all cases as .done segment files that have been >> >> fully streamed, actually makes more sense to not be backward. This >> >> patch >> >> would actually not be mandatory for back-patching, but it makes the >> >> process >> >> more robust IMO. >> > >> > Thanks for the patches! >> >> While reviewing the patch, I found another bug related to WAL file in >> recovery >> mode. The problem is that exitArchiveRecovery() always creates .ready file >> for >> the last WAL file of the old timeline even when it's restored from the >> archive >> and has .done file. 
So this causes the already-archived WAL file to be >> archived >> again.... Attached patch fixes this bug. > > That's a good catch! Patch looks good. I think that you should change > xlogpath to use MAXFNAMELEN instead of MAXPGPATH in exitArchiveRecovery. > This is only for correctness, so that's a master-only remark, because this > variable is used only to calculate a segment file name, and not a path. > Renaming the variable from xlogpath to xlogname would make sense as well. Thanks for the review! Applied. Regards, -- Fujii Masao
On 10/23/2014 08:59 AM, Fujii Masao wrote: > On Mon, Oct 20, 2014 at 3:26 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> >>> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> >>>> In this case, the patch seems to make the restartpoint recycle even WAL files >>>> which have .ready files and will have to be archived later. Thought? >>> >>> The real problem currently is that it is possible to have a segment file not marked as .done during recovery when the stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply leaving this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve the code such that when a segment file is switched and the connection to the master is cut, we guarantee that this file is completed and marked as .done. >> >> I have spent more time on that, with a bit more of underground work... >> First, the problem can be reproduced most of the time by running this >> simple command: >> psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate > > What about fixing this problem directly? That is, we can make walreceiver > check whether the end of the last received WAL data is the end of the current WAL file > or not, and then close the WAL file and create a .done file if the test is true. > > This is not a perfect solution. If the standby crashes during the very > short interval > (i.e., after closing the WAL file and before creating the .done file), the problem > would happen again. But that can only very rarely happen, so I don't think that it's > worth fixing the corner case, at least in back-branches. Of course, we can > find out the "perfect" solution for the master, though. Sounds reasonable, for back-branches. 
Although I'm still worried we might miss some corner-case unless we go with a more wholesale solution. At least for master, we should consider changing the way the archiving works so that we only archive WAL that was generated in the same server. I.e. we should never try to archive WAL files belonging to another timeline. I just remembered that we discussed a different problem related to this some time ago, at http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp. The conclusion of that was that at promotion, we should not archive the last, partial, segment from the old timeline. Also, we could make the archive recovery more precise in what WAL files it tries to restore. For example, if we're doing recovery with target timeline 4, but the timeline history says that segment AB came from timeline 3 (i.e. the switch to timeline 4 happened within a later segment), we should not try to restore segment AB from timeline 4. It shouldn't exist, so trying to restore it is a waste of cycles, but more importantly, if there's some sort of confusion with the timelines, and the file exists anyway, we should not restore it. Also, if the timeline history says e.g. that we switched from TLI 3 to 4 at WAL position 0/AD123456, and we restore segment AD from timeline 3, we should stop recovering that segment at position 0/AD123456 and try to find the AD segment from timeline 4. We currently only do that when streaming, but we should also do that when restoring from archive. In summary, let's do something small for back-branches, like what you suggested. But for master, let's do bigger changes to the timeline handling. Would you like to write a patch for the back-branches, if I start working on master? - Heikki
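Heikki's last two points amount to consulting the timeline history when deciding which timeline each segment should be restored from. A rough sketch of that lookup (Python; the function name and the `(tli, switch_lsn)` history representation are hypothetical simplifications, default 16MB segments), using his example of a switch from timeline 3 to timeline 4 at 0/AD123456:

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

def tli_for_segment(history, target_tli, segno):
    """history: [(tli, switch_lsn), ...] in ascending order, one entry per
    ancestor timeline of target_tli. A segment should be restored from the
    timeline that was current at the segment's start position."""
    seg_start = segno * XLOG_SEG_SIZE
    for tli, switch_lsn in history:
        if seg_start < switch_lsn:
            return tli
    return target_tli

# Switch from timeline 3 to timeline 4 happened at 0/AD123456.
history = [(3, 0xAD123456)]
```

With this history, segment AB belongs to timeline 3, segment AD also starts on timeline 3 (replay of it would then stop at 0/AD123456 and continue from the timeline-4 copy), and segment AE belongs to timeline 4, matching the examples in the message.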
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 10/23/2014 08:59 AM, Fujii Masao wrote:
> Sounds reasonable, for back-branches. Although I'm still worried we might
> miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple patch attached? There is still a small window between XLogWalRcvFlush and XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly between them.
Michael
On 10/23/2014 01:25 PM, Michael Paquier wrote: > On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas < > hlinnakangas@vmware.com> wrote: > >> On 10/23/2014 08:59 AM, Fujii Masao wrote: >> Sounds reasonable, for back-branches. Although I'm still worried we might >> miss some corner-case unless we go with a more wholesale solution. >> > > Don't really want to be the intruder here, but isn't that the simple patch > attached? That's not right. Should check *after* the write if the segment was completed, and close it if so. Like the attached. > There is still a small window between XLogWalRcvFlush and > XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly > between them. Yeah. I think we can live with that. - Heikki
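The control flow of the "check after the write" fix can be shown with a toy model (Python standing in for the C of the walreceiver; the class and its bookkeeping are hypothetical, 16MB segments assumed, and a single write is assumed not to cross a segment boundary):

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

class WalReceiver:
    """Toy model of the walreceiver write path, to show the shape of the fix."""
    def __init__(self):
        self.open_seg = None   # segment currently open for writing
        self.done = []         # segments flushed, closed and marked .done

    def write(self, start_pos, nbytes):
        end = start_pos + nbytes
        self.open_seg = start_pos // XLOG_SEG_SIZE
        # The fix: test *after* the write whether we reached the segment
        # boundary; if so, flush, close the file and force .done immediately,
        # instead of waiting for the next segment to be opened.
        if end % XLOG_SEG_SIZE == 0:
            self.done.append(self.open_seg)  # flush + close + force .done
            self.open_seg = None
        return end
```

A write that ends exactly at 0/4000000 closes segment 3 and marks it .done right away, so an immediate crash at that point no longer leaves the segment behind as .ready.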
On Thu, Oct 23, 2014 at 5:09 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 10/23/2014 08:59 AM, Fujii Masao wrote: >> >> On Mon, Oct 20, 2014 at 3:26 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> >>> On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> >>>> >>>> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> >>>> wrote: >>>>> >>>>> >>>>> In this case, the patch seems to make the restartpoint recycle even WAL >>>>> files >>>>> which have .ready files and will have to be archived later. Thought? >>>> >>>> >>>> The real problem currently is that it is possible to have a segment file >>>> not marked as .done during recovery when stream connection is abruptly cut >>>> when this segment is switched, marking it as .ready in archive_status and >>>> simply letting this segment in pg_xlog because it will neither be recycled >>>> nor removed. I have not been able to look much at this code these days, so I >>>> am not sure how invasive it would be in back-branches, but perhaps we should >>>> try to improve code such as when a segment file is switched and connection >>>> to the is cut, we guarantee that this file is completed and marked as .done. >>> >>> >>> I have spent more time on that, with a bit more of underground work... >>> First, the problem can be reproduced most of the time by running this >>> simple command: >>> psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate >> >> >> What about fixing this problem directly? That is, we can make walreceiver >> check whether the end of last received WAL data is the end of current WAL >> file >> or not, and then close the WAL file and create .done file if the test is >> true. >> >> This is not a perfect solution. If the standby crashes during very >> short interval >> (i.e., after closing the WAL file and before creating .done file), the >> problem >> would happen again. 
But it can really rarely happen, so I don't think that >> it's >> worth fixing the corner case at least in back-branches. Of course, we can >> find out the "perfect" solution for the master, though. > > > Sounds reasonable, for back-branches. Although I'm still worried we might > miss some corner-case unless we go with a more wholesale solution. +1 > At least for master, we should consider changing the way the archiving works > so that we only archive WAL that was generated in the same server. I.e. we > should never try to archive WAL files belonging to another timeline. > > I just remembered that we discussed a different problem related to this some > time ago, at > http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp. > The conclusion of that was that at promotion, we should not archive the > last, partial, segment from the old timeline. So, the last, partial, segment of the old timeline is never archived? If yes, I'm afraid that the PITR to the old timeline cannot replay the last segment. No? Or you're thinking to change the code so that the segment of new timeline is replayed in that case? > In summary, let's do something small for back-branches, like what you > suggested. But for master, let's do bigger changes to the timeline handling. Yep. Regards, -- Fujii Masao
On Thu, Oct 23, 2014 at 8:45 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 10/23/2014 01:25 PM, Michael Paquier wrote: >> >> On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas < >> hlinnakangas@vmware.com> wrote: >> >>> On 10/23/2014 08:59 AM, Fujii Masao wrote: >>> Sounds reasonable, for back-branches. Although I'm still worried we might >>> miss some corner-case unless we go with a more wholesale solution. >>> >> >> Don't really want to be the intruder here, but isn't that the simple patch >> attached? > > > That's not right. Should check *after* the write if the segment was > completed, and close it if so. Like the attached. Looks good to me. WalReceiverMain has almost the same code as what XLogWalRcvFileClose does. So we can refactor that. >> There is still a small window between XLogWalRcvFlush and >> XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly >> between them. > > > Yeah. I think we can live with that. Yes. Regards, -- Fujii Masao
On Thu, Oct 23, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Oct 23, 2014 at 8:45 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 10/23/2014 01:25 PM, Michael Paquier wrote: >>> >>> On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas < >>> hlinnakangas@vmware.com> wrote: >>> >>>> On 10/23/2014 08:59 AM, Fujii Masao wrote: >>>> Sounds reasonable, for back-branches. Although I'm still worried we might >>>> miss some corner-case unless we go with a more wholesale solution. >>>> >>> >>> Don't really want to be the intruder here, but isn't that the simple patch >>> attached? >> >> >> That's not right. Should check *after* the write if the segment was >> completed, and close it if so. Like the attached. > > Looks good to me. WalReceiverMain has almost the same code as > what XLogWalRcvFileClose does. So we can refactor that. While looking at the code of WAL archiving and recovery, I found another small issue. The standby always creates a .ready file for the timeline history file even when WAL archiving is not enabled. Since WAL archiving is off, that .ready file will remain forever. Probably this is almost harmless but confusing, so I'd like to fix that. Patch attached. Thought? Regards, -- Fujii Masao
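The fix discussed here boils down to a guard on .ready creation when archiving is off. A sketch of that guard (Python; the function and its arguments are hypothetical stand-ins for what a check inside XLogArchiveNotify could look like in the C code):

```python
import os

def xlog_archive_notify(archive_status_dir, xlog_name, archiving_active):
    """Create <xlog_name>.ready so the archiver picks the file up later.
    Guard: when WAL archiving is off there is no archiver to consume the
    file, so creating it would leave it lying around forever."""
    if not archiving_active:
        return None
    path = os.path.join(archive_status_dir, xlog_name + ".ready")
    open(path, "w").close()
    return path
```

With the guard in place, a standby running without archiving never creates a lingering .ready file for a timeline history file.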
On Thu, Oct 23, 2014 at 2:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 23, 2014 at 9:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Oct 23, 2014 at 8:45 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 10/23/2014 01:25 PM, Michael Paquier wrote:
>>>
>>> On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
>>> hlinnakangas@vmware.com> wrote:
>>>
>>>> On 10/23/2014 08:59 AM, Fujii Masao wrote:
>>>> Sounds reasonable, for back-branches. Although I'm still worried we might
>>>> miss some corner-case unless we go with a more wholesale solution.
>>>>
>>>
>>> Don't really want to be the intruder here, but isn't that the simple patch
>>> attached?
>>
>>
>> That's not right. Should check *after* the write if the segment was
>> completed, and close it if so. Like the attached.
>
> Looks good to me. WalReceiverMain has almost the same code as
> what XLogWalRcvFileClose does. So we can refactor that.
> While looking at the code of WAL archiving and recovery, I found
> another small issue. The standby always creates a .ready file for
> the timeline history file even when WAL archiving is not enabled.
> Since WAL archiving is off, that .ready file will remain forever.
> Probably this is almost harmless but confusing, so I'd like to fix that.
> Patch attached. Thought?
Good catch once again. We could as well put the check of XLogArchivingActive directly in XLogArchiveNotify...
Michael
On Thu, Oct 23, 2014 at 1:45 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 10/23/2014 01:25 PM, Michael Paquier wrote:
>> Don't really want to be the intruder here, but isn't that the simple patch
>> attached?
>
> That's not right. Should check *after* the write if the segment was
> completed, and close it if so. Like the attached.
>
>> There is still a small window between XLogWalRcvFlush and
>> XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly
>> between them.
>
> Yeah. I think we can live with that.

Just tested this patch with the combo pg_switch_xlog() + stop/immediate and the appearance of the .ready files is not fixed.
Regards,
--
Michael
On Thu, Oct 23, 2014 at 10:16 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Thu, Oct 23, 2014 at 1:45 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> That's not right. Should check *after* the write if the segment was
>> completed, and close it if so. Like the attached.
> Just tested this patch with the combo pg_switch_xlog() + stop/immediate and
> the appearance of the .ready files is not fixed.

Btw, I think that we should pass walEnd to XLogWalRcvWrite and add an additional check based on that after the write() loop calls to enforce the segment to be switched to .done.
--
Michael
On 10/23/2014 11:09 AM, Heikki Linnakangas wrote: > At least for master, we should consider changing the way the archiving > works so that we only archive WAL that was generated in the same server. > I.e. we should never try to archive WAL files belonging to another timeline. > > I just remembered that we discussed a different problem related to this > some time ago, at > http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp. > The conclusion of that was that at promotion, we should not archive the > last, partial, segment from the old timeline. So, this is what I came up with for master. Does anyone see a problem with it? - Heikki
On Fri, Oct 24, 2014 at 3:05 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 10/23/2014 11:09 AM, Heikki Linnakangas wrote: >> >> At least for master, we should consider changing the way the archiving >> works so that we only archive WAL that was generated in the same server. >> I.e. we should never try to archive WAL files belonging to another >> timeline. >> >> I just remembered that we discussed a different problem related to this >> some time ago, at >> >> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp. >> The conclusion of that was that at promotion, we should not archive the >> last, partial, segment from the old timeline. > > > So, this is what I came up with for master. Does anyone see a problem with > it? Thinking long-term, this is a solid approach, so +1 for it. I just tested the patch and the extra segment files do not show up anymore. Patch looks good as well. -- Michael
On Fri, Oct 24, 2014 at 9:05 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > So, this is what I came up with for master. Does anyone see a problem with > it? In the proposed commit message, you mis-spelled "significant" as "signicant". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 10/23/2014 11:09 AM, Heikki Linnakangas wrote: >> >> At least for master, we should consider changing the way the archiving >> works so that we only archive WAL that was generated in the same server. >> I.e. we should never try to archive WAL files belonging to another >> timeline. >> >> I just remembered that we discussed a different problem related to this >> some time ago, at >> >> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp. >> The conclusion of that was that at promotion, we should not archive the >> last, partial, segment from the old timeline. > > > So, this is what I came up with for master. Does anyone see a problem with > it? What about the problem that I raised upthread? That is, the patch prevents the last, partial, WAL file of the old timeline from being archived. So we can never PITR the database to any point contained in that last, partial WAL file. Isn't this a problem? For example, please imagine the following scenario. 1. The important data was deleted but no one noticed that. This deletion was logged in the last, partial WAL file. 2. The server crashed and the DBA started an archive recovery from an old backup. 3. After recovery, all WAL files of the old timeline were recycled. 4. Finally the DBA noticed the loss of important data and tried to do PITR to the point where the data was deleted. HOWEVER, the WAL file containing that deletion operation no longer exists. So the DBA will never be able to recover that important data .... Regards, -- Fujii Masao
On 10/27/2014 02:12 PM, Fujii Masao wrote:
> On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 10/23/2014 11:09 AM, Heikki Linnakangas wrote:
>>> At least for master, we should consider changing the way the archiving
>>> works so that we only archive WAL that was generated in the same server.
>>> I.e. we should never try to archive WAL files belonging to another
>>> timeline.
>>>
>>> I just remembered that we discussed a different problem related to this
>>> some time ago, at
>>> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp.
>>> The conclusion of that was that at promotion, we should not archive the
>>> last, partial, segment from the old timeline.
>>
>> So, this is what I came up with for master. Does anyone see a problem with
>> it?
>
> What about the problem that I raised upthread? That is, the patch
> prevents the last, partial, WAL file of the old timeline from being archived.
> So we can never PITR the database to the point covered by the last, partial
> WAL file.

A partial WAL file is never archived in the master server to begin with, so if it's ever used in archive recovery, the administrator must have performed some manual action to copy the partial WAL file from the original server. When he does that, he can also copy it manually to the archive, or do whatever else he wants with it.

Note that the same applies to any complete, but not-yet-archived WAL files. But we've never had any mechanism in place to archive those in the new instance, after PITR.

> Isn't this a problem? For example, please imagine the
> following scenario.
>
> 1. The important data was deleted but no one noticed that. This deletion was
>    logged in the last, partial WAL file.
> 2. The server crashed and the DBA started an archive recovery from an old backup.
> 3. After recovery, all WAL files of the old timeline were recycled.
> 4. Finally the DBA noticed the loss of the important data and tried to do PITR
>    to the point where the data was deleted.
>
> HOWEVER, the WAL file containing that deletion operation no longer exists.
> So the DBA will never be able to recover that important data...

I think you're missing a step above:

1.5. The administrator copies the last, partial WAL file (and any complete but not-yet-archived files) to the new server's pg_xlog directory.

Without that, it won't be available for PITR anyway, and the new server won't see it or try to archive it, no matter what.

- Heikki
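For what it's worth, the "step 1.5" above can be sketched in a few lines. This is only an illustration of the idea under the usual 9.x layout conventions (a pg_xlog directory with an archive_status subdirectory); it is not an official procedure, and the helper names are made up:

```python
import os
import re
import shutil

# WAL segment names: 8 hex digits each of timeline, log, and segment number.
WAL_NAME = re.compile(r'^[0-9A-F]{24}$')

def unarchived_segments(pg_xlog):
    """Segments in pg_xlog with no .done marker in archive_status,
    i.e. not confirmed archived by the old server (includes the last,
    partial segment, which never gets archived on the master)."""
    status = os.path.join(pg_xlog, 'archive_status')
    done = set()
    if os.path.isdir(status):
        done = {f[:-5] for f in os.listdir(status) if f.endswith('.done')}
    return sorted(f for f in os.listdir(pg_xlog)
                  if WAL_NAME.match(f) and f not in done)

def copy_for_pitr(old_pg_xlog, new_pg_xlog):
    """Step 1.5: make the old server's unarchived WAL visible to the
    new server's recovery by copying it into the new pg_xlog."""
    copied = []
    for seg in unarchived_segments(old_pg_xlog):
        shutil.copy2(os.path.join(old_pg_xlog, seg),
                     os.path.join(new_pg_xlog, seg))
        copied.append(seg)
    return copied
```

Without a step like this, as noted above, the partial segment is simply invisible to the new server.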
On Tue, Oct 28, 2014 at 1:12 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 10/27/2014 02:12 PM, Fujii Masao wrote: >> >> On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas >> <hlinnakangas@vmware.com> wrote: >>> >>> On 10/23/2014 11:09 AM, Heikki Linnakangas wrote: >>>> >>>> >>>> At least for master, we should consider changing the way the archiving >>>> works so that we only archive WAL that was generated in the same server. >>>> I.e. we should never try to archive WAL files belonging to another >>>> timeline. >>>> >>>> I just remembered that we discussed a different problem related to this >>>> some time ago, at >>>> >>>> >>>> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp. >>>> The conclusion of that was that at promotion, we should not archive the >>>> last, partial, segment from the old timeline. >>> >>> >>> >>> So, this is what I came up with for master. Does anyone see a problem >>> with >>> it? >> >> >> What about the problem that I raised upthread? This is, the patch >> prevents the last, partial, WAL file of the old timeline from being >> archived. >> So we can never PITR the database to the point that the last, partial WAL >> file has. > > > A partial WAL file is never archived in the master server to begin with, so > if it's ever used in archive recovery, the administrator must have performed > some manual action to copy the partial WAL file from the original server. Yes. > When he does that, he can also copy it manually to the archive, or whatever > he wants to do with it. Yes, but currently he doesn't need to do that manually. At the end of archive recovery, .ready file is created and the partial, last, WAL file of the old timeline would be archived later. It's never recycled until it's archived. So when subsequent archive recovery is necessary, we can expect that it exists in either pg_xlog or archival storage. 
OTOH, if the patch is applied, it's never archived and may already have been recycled, and then a subsequent archive recovery would fail. So the problem the patch can cause is that the partial WAL file may be recycled even though it has not been archived yet. The partial WAL file may not be archivable for some reason, but that's not directly a problem as long as it still exists in pg_xlog.

Regards,

--
Fujii Masao
On 10/27/2014 06:12 PM, Heikki Linnakangas wrote:
> On 10/27/2014 02:12 PM, Fujii Masao wrote:
>> What about the problem that I raised upthread? That is, the patch
>> prevents the last, partial, WAL file of the old timeline from being archived.
>> So we can never PITR the database to the point covered by the last, partial
>> WAL file.
>
> A partial WAL file is never archived in the master server to begin with,
> so if it's ever used in archive recovery, the administrator must have
> performed some manual action to copy the partial WAL file from the
> original server. When he does that, he can also copy it manually to the
> archive, or whatever he wants to do with it.
>
> Note that the same applies to any complete, but not-yet archived WAL
> files. But we've never had any mechanism in place to archive those in
> the new instance, after PITR.

Actually, I'll take back what I said above. I had misunderstood the current behavior. Currently, a server *does* archive any files that you copy manually to pg_xlog after PITR has finished. Eventually: we don't create a .ready file for them until they're old enough to be recycled. We do create a .ready file for the last, partial segment, but it's pretty weird to do it just for that, and not for any other, complete, segments that might've been copied to pg_xlog. So what happens is that the last partial segment gets archived immediately after promotion, but any older segments will linger unarchived until much later.

The special treatment of the last partial segment still makes no sense. If we want the segments from the old timeline to be archived after PITR, we should archive them all immediately after the end of recovery, not just the partial one. The exception for just the last partial segment is silly.

Now, the bigger question is whether we want the server, after PITR, to be responsible for archiving the segments from the old timeline at all. If we do, then we should remove the special treatment of the last, partial segment, and create the .ready files for all the complete segments too. And actually, I think we should *not* archive the partial segment. We don't normally archive partial segments, and all the WAL required to restore the server to the new timeline is copied to the file with the new TLI. If the old timeline is still live, i.e. there's a server somewhere still writing new WAL on the old timeline, the partial segment will clash with a complete segment that the other server will archive later.

Yet another consideration is that we currently don't archive files streamed from the master. If we think that the standby server is responsible for archiving old segments after recovery, why is it not responsible for archiving the streamed segments? It's because in most cases the master will archive the file, and we don't want two servers to archive the same file, but there is actually no guarantee of that. It might well be that the archiver runs a little bit behind on the master, and after a crash the archive will miss some of the required segments. That's not good either.

I'm not sure what to do here. The current behavior is inconsistent, and there are some nasty gotchas that would be nice to fix. I think someone needs to sit down and write a high-level design of how this all should work.

- Heikki
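To make the .ready/.done bookkeeping discussed in this thread concrete, here is a toy model of the archive_status protocol as described above. It is a sketch, not the server's actual C implementation (XLogArchiveNotify and friends), and the function names are invented:

```python
import os

def mark_ready(pg_xlog, seg):
    """Create seg.ready in archive_status: tells the archiver process
    that this segment still needs to be archived."""
    open(os.path.join(pg_xlog, 'archive_status', seg + '.ready'), 'w').close()

def mark_done(pg_xlog, seg):
    """Once archive_command succeeds, the archiver renames .ready to
    .done; requires that mark_ready() was called first."""
    status = os.path.join(pg_xlog, 'archive_status')
    os.rename(os.path.join(status, seg + '.ready'),
              os.path.join(status, seg + '.done'))

def recyclable(pg_xlog, seg):
    """With archive_mode on, a segment may only be recycled or removed
    after it is marked .done -- otherwise it lingers in pg_xlog, which
    is exactly the symptom reported in this thread."""
    return os.path.exists(os.path.join(pg_xlog, 'archive_status', seg + '.done'))
```

In this model, the bug under discussion corresponds to old segments stuck in the "ready but never done" state on a standby.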
Hi,

Am 04.09.2014 um 17:50 schrieb Jehan-Guillaume de Rorthais <jgdr@dalibo.com>:
> Since few months, we occasionally see .ready files appearing on some slave
> instances from various context. The two I have in mind are under 9.2.x. […]
> So it seems for some reasons, these old WALs were "forgotten" by the
> restartpoint mechanism when they should have been recylced/deleted.

Am 08.10.2014 um 11:54 schrieb Heikki Linnakangas <hlinnakangas@vmware.com>:
> 1. Where do the FF files come from? In 9.2, FF-segments are not supposed to be created, ever. […]
> 2. Why are the .done files sometimes not being created?

We’ve encountered behaviour which seems to match what has been described here: on Streaming Replication slaves, there is an odd piling up of old WALs and .ready files in pg_xlog, going back several months.

The fine people on IRC have pointed me to this thread, and have encouraged me to revive it with our observations, so here we go:

Environment:

Master, 9.2.9
|- Slave S1, 9.2.9, on the same network as the master
'- Slave S2, 9.2.9, some 100 km away (occasional network hiccups; *not* a cascading replication)

wal_keep_segments    M=100 S1=100 S2=30
checkpoint_segments  M=100 S1=30  S2=30
wal_level            hot_standby (all)
archive_mode         on (all)
archive_command      on both slaves: /bin/true
archive_timeout      600s (all)

- On both slaves, we have „ghost“ WALs and corresponding .ready files (currently >600 of each on S2, slowly becoming a disk space problem)

- There’s always gaps in the ghost WAL names, often roughly 0x20, but not always

- The slave with the „bad“ network link has significantly more of these files, which suggests that disturbances of the Streaming Replication increase the chances of triggering this bug; OTOH, the presence of a name-gap pattern suggests the opposite

- We observe files named *FF as well

As you can see in the directory listings below, this setup is *very* low traffic, which may explain the pattern in WAL name gaps (?).

I’ve listed the entries by time, expecting to easily match WALs to their .ready files. There sometimes is an interesting delay between the WAL’s mtime and the .ready file — especially for *FF, where there’s several days between the WAL and the .ready file.

- Master: http://pgsql.privatepaste.com/52ad612dfb
- Slave S1: http://pgsql.privatepaste.com/58b4f3bb10
- Slave S2: http://pgsql.privatepaste.com/a693a8d7f4

I’ve only skimmed through the thread; my understanding is that there were several patches floating around, but nothing was committed. If there’s any way I can help, please let me know.

- D.
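As a side note, the "ghost" segments Dennis describes can be spotted with a small script. The heuristic below (a segment with a pending .ready marker whose name sorts below the newest segment in pg_xlog) is only a rough diagnostic derived from the observations in this thread, not an official tool:

```python
import os
import re

# 24 uppercase hex digits: timeline, log, and segment number.
WAL_NAME = re.compile(r'^[0-9A-F]{24}$')

def ghost_segments(pg_xlog):
    """Return segments that still carry a .ready marker but sort below
    the newest segment name -- old WAL that the restartpoint mechanism
    apparently forgot to recycle."""
    status = os.path.join(pg_xlog, 'archive_status')
    segs = sorted(f for f in os.listdir(pg_xlog) if WAL_NAME.match(f))
    if not segs:
        return []
    newest = segs[-1]
    ready = {f[:-6] for f in os.listdir(status) if f.endswith('.ready')}
    return [s for s in segs if s in ready and s < newest]

def is_ff_segment(seg):
    """In 9.2, segment numbering skips the FF position, so an *FF file
    on disk is itself suspicious (see Heikki's question upthread)."""
    return seg.endswith('FF')
```

Lexicographic comparison works here because the hex segment names sort in WAL order.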
On 12/10/2014 04:32 PM, Dennis Kögel wrote:
> We’ve encountered behaviour which seems to match what has been described here: on Streaming Replication slaves, there is an odd piling up of old WALs and .ready files in pg_xlog, going back several months.
> [...]
> - On both slaves, we have „ghost“ WALs and corresponding .ready files (currently >600 of each on S2, slowly becoming a disk space problem)
> - There’s always gaps in the ghost WAL names, often roughly 0x20, but not always
> - The slave with the „bad“ network link has significantly more of these files, which suggests that disturbances of the Streaming Replication increase the chances of triggering this bug; OTOH, the presence of a name-gap pattern suggests the opposite
> - We observe files named *FF as well
> [...]
> I’ve only skimmed through the thread; my understanding is that there were several patches floating around, but nothing was committed.
> If there’s any way I can help, please let me know.

Yeah. It wasn't totally clear how all this should work, so I got distracted with other stuff and dropped the ball; sorry.

I'm thinking that we should change the behaviour on master so that the standby never archives any files from older timelines, only the new one that it generates itself. That will solve the immediate problem of old WAL files accumulating, and bogus .ready files appearing, in the standby. However, it will not solve the bigger problem of how you ensure that all WAL files are archived when you promote a standby server. There is no guarantee of that today anyway, but this will make it even less reliable, because it will increase the chances that you miss a file on the old timeline in the archive after promoting. I'd argue that that's a good thing; it makes the issue more obvious, so you are more likely to encounter it in testing, and you won't be surprised in an emergency. But I've started a new thread on that bigger issue, and hopefully we'll come up with a solution (http://www.postgresql.org/message-id/548AF1CB.80702@vmware.com).

Now, what do we do with the back-branches? I'm not sure. Changing the behaviour in back-branches could cause nasty surprises. Perhaps it's best to just leave it as it is, even though it's buggy.

- Heikki
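The rule Heikki proposes here, that a standby only ever marks for archival the WAL it generated on its own timeline, boils down to a simple name check, since the first eight hex digits of a segment name encode the timeline ID. The following is a sketch of that idea, not the actual patch:

```python
def belongs_to_timeline(seg_name, current_tli):
    """True if this 24-character WAL segment name carries the given
    timeline ID. Under the proposed rule, a standby would create .ready
    files only for segments where this returns True, so old-timeline
    segments would never accumulate as unarchivable 'ghosts'."""
    return len(seg_name) == 24 and seg_name[:8] == '%08X' % current_tli
```

For example, a standby recovered onto timeline 2 would skip any leftover `00000001*` segments instead of queueing them for archival.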
2014-12-12 14:58 GMT+01:00 Heikki Linnakangas <hlinnakangas@vmware.com>:
> On 12/10/2014 04:32 PM, Dennis Kögel wrote:
>> [...]
>> I’ve only skimmed through the thread; my understanding is that there were several patches floating around, but nothing was committed.
>> If there’s any way I can help, please let me know.
>
> Yeah. It wasn't totally clear how all this should work, so I got distracted with other stuff and dropped the ball; sorry.
>
> I'm thinking that we should change the behaviour on master so that the standby never archives any files from older timelines, only the new one that it generates itself. That will solve the immediate problem of old WAL files accumulating, and bogus .ready files appearing, in the standby. However, it will not solve the bigger problem of how you ensure that all WAL files are archived when you promote a standby server. There is no guarantee of that today anyway, but this will make it even less reliable, because it will increase the chances that you miss a file on the old timeline in the archive after promoting. I'd argue that that's a good thing; it makes the issue more obvious, so you are more likely to encounter it in testing, and you won't be surprised in an emergency. But I've started a new thread on that bigger issue, and hopefully we'll come up with a solution (http://www.postgresql.org/message-id/548AF1CB.80702@vmware.com).
>
> Now, what do we do with the back-branches? I'm not sure. Changing the behaviour in back-branches could cause nasty surprises. Perhaps it's best to just leave it as it is, even though it's buggy.

As long as master is fixed, I don't actually care. But I agree with Dennis that it's hard to see what's been committed with all the different issues found, and if any commits were done, in which branch. I'd like to be able to tell my customers: update to this minor release to see if it's fixed, but I can't even do that.

--
On Wed, Dec 31, 2014 at 4:44 PM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
> 2014-12-12 14:58 GMT+01:00 Heikki Linnakangas <hlinnakangas@vmware.com>:
>> Now, what do we do with the back-branches? I'm not sure. Changing the
>> behaviour in back-branches could cause nasty surprises. Perhaps it's best to
>> just leave it as it is, even though it's buggy.
>
> As long as master is fixed, I don't actually care. But I agree with Dennis
> that it's hard to see what's been committed with all the different issues
> found, and if any commits were done, in which branch. I'd like to be able to
> tell my customers: update to this minor release to see if it's fixed, but I
> can't even do that.

This bug does not endanger data consistency at all, as only old WAL files remain on the standby, so I'm fine as well with just a clean fix on master, and nothing done on the back-branches.

--
Michael
Am 31.12.2014 um 11:40 schrieb Michael Paquier <michael.paquier@gmail.com>:
>> As long as master is fixed, I don't actually care. But I agree with Dennis
>> that it's hard to see what's been committed with all the different issues
>> found, and if any commits were done, in which branch. I'd like to be able to
>> tell my customers: update to this minor release to see if it's fixed, but I
>> can't even do that.
> This bug does not endanger data consistency at all, as only old WAL
> files remain on the standby, so I'm fine as well with just a clean fix
> on master, and nothing done on the back-branches.

I’d like to point out that this issue is causing severe disk space bloat over time, as described in my previous posting. I’d assume that, as time goes by, this issue will pop up more and more, as long as the back-branches remain unfixed. And as the general recommendation is to never ever fiddle with pg_xlog/ contents, I’m not even sure what to do about it.

- D.