Thread: Re: [PATCHES] odd output in restore mode
Below are my comments on the CommitFest patch:
    pg_standby minor changes for Windows

Simon, I'm sorry you got me, a Postgres newbie, signed up for reviewing your patch ;)

To start with, I'm not quite sure of the status of this patch since Bruce's last comment on the -patches alias:

Bruce Momjian wrote:
> OK, based on these observations I think we need to learn more about the
> issues before making any changes to our code.

From easy to difficult:

1. Issues with applying the patch to CVS HEAD:

The second file in the patch
    Index: doc/src/sgml/standby.sgml
appears to be misnamed -- the existing file in HEAD is
    Index: doc/src/sgml/pgstandby.sgml

However, I still had issues after fixing the file name:

    md@Garu:~/pg/pgsql$ patch -c -p0 < ../pg_standby.patch
    patching file contrib/pg_standby/pg_standby.c
    patching file doc/src/sgml/pgstandby.sgml
    Hunk #1 FAILED at 136.
    Hunk #2 FAILED at 168.
    Hunk #3 FAILED at 245.
    Hunk #4 FAILED at 255.
    4 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/pgstandby.sgml.rej

2. Missing description for new command-line options in pgstandby.sgml

Simon Riggs wrote:
> Patch implements
> * recommendation to use GnuWin32 cp on Windows

Saw that in the changes to pgstandby.sgml, and it looks ok to me, but:
- no description of the proposed new command-line options -h and -p?

3. No coding style issues seen

Just one comment: the logic that selects the actual restore command to be used has moved from CustomizableInitialize() to main() -- a matter of personal taste, perhaps. But in my view:
+ the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addressed by the patch: the user-selected Win32 "mklink" command mode is never applied due to a missing 'break' in CustomizableInitialize():

    switch (restoreCommandType)
    {
        case RESTORE_COMMAND_WIN32_MKLINK:
            SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
        case RESTORE_COMMAND_WIN32_COPY:
            SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
            break;

There is similar behaviour on non-Win32 platforms, where the user-selected "ln" may be silently changed to "cp" in main():

    #if HAVE_WORKING_LINK
        restoreCommandType = RESTORE_COMMAND_LN;
    #else
        restoreCommandType = RESTORE_COMMAND_CP;
    #endif

If both the Win32 and non-Win32 cases reflect the intended behaviour:
- I'd prefer a code comment in the above case fall-through,
- I'd suggest a message to the user about the ignored "ln" / "mklink",
- note that the logic to override the '-l' option is now in two places: CustomizableInitialize() and main().

5. Minor wording issue in usage message on new '-p' option

I was wondering if the "always" in the usage text

    fprintf(stderr, " -p always uses GNU compatible 'cp' command on all platforms\n");

is too strong, since multiple restore command options overwrite each other, e.g. "-p -c" applies Windows's "copy" instead of GNU's "cp".

6. Minor code comment suggestion

Unrelated to this patch, I wonder if the code comments on all four time-related vars would better read "seconds" instead of "amount of time":

    int sleeptime = 5;    /* amount of time to sleep between file checks */
    int holdtime = 0;     /* amount of time to wait once file appears full */
    int waittime = -1;    /* how long we have been waiting, -1 no wait
                           * yet */
    int maxwaittime = 0;  /* how long are we prepared to wait for? */

7. Question: benefits of separate holdtime option from sleeptime?

Simon Riggs wrote:
> * provide "holdtime" delay, default 0 (on all platforms)

I went back over the hackers+patches emails and the code comments, and I'm sorry if I missed it, but I'm not sure I've understood the exact tuning benefits that the new holdtime option provides over using the existing sleeptime, as has been the case so far (on Win32 only).

8. Unresolved question of implementing now/later a "cp" replacement

Simon Riggs wrote:
> On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote:
>> This seems pretty kludgey to me. I wouldn't want to install GnuWin32
>> utilities on a production system just for the "cp" command, and I don't
>> know how I would tune holdtime properly for using "copy". And it seems
>> risky to have defaults that are known to not work reliably.
>>
>> How about implementing a replacement function for "cp" ourselves? It
>> seems pretty trivial to do. We could use that on Unixes as well, which
>> would keep the differences between Win32 and other platforms smaller,
>> and thus ensure the codepath gets more testing.
>
> If you've heard complaints about any of this from users, I haven't.
> AFAIK we're doing this because it *might* cause a problem. Bear in mind
> that link is the preferred performance option, not copy. So AFAICS we're
> tuning a secondary option on one specific port, without it being a
> raised issue and in an area of code that will be superseded in the next
> release.
>
> So further embellishments would be a long way down my own priority list,
> putting it politely. Yet I have no objections to the suggestion overall;
> we have done that already for alter tablespace.

I don't have much to add to the whether/now/later question of providing a "cp" replacement, but I guess the existing command-line options and documentation wouldn't have to change with our own "cp" replacement, while the newly proposed '-h' and '-p' would become moot then, right?

Regards,
Martin
On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote: > 1. Issues with applying the patch to CVS HEAD: Sounds awful. Thanks for the review, will fix. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

> 1. Issues with applying the patch to CVS HEAD:

For me, the patch applies cleanly to CVS HEAD.

I do notice that there are two files "standby.sgml" and "pgstandby.sgml". I can't see where "standby.sgml" comes from, but I haven't created it; perhaps it is a relic of the SGML build process. I've also recreated my source tree since I wrote the patch. Weird.

I'll redo the patch so it points at pgstandby.sgml, which is the one that's listed as being in the main source tree.

> 2. Missing description for new command-line options in pgstandby.sgml
>
> - no description of the proposed new command-line options -h and -p?

These are done. The patch application problems mean those hunks were missed.

> 3. No coding style issues seen
>
> Just one comment: the logic that selects the actual restore command to
> be used has moved from CustomizableInitialize() to main() -- a matter
> of personal taste, perhaps. But in my view:
> + the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

Thanks

> 4. Issue: missing break in switch, silent override of '-l' argument?
>
> This behaviour has been in there before

Well spotted. I don't claim to test this for Windows.

> 5. Minor wording issue in usage message on new '-p' option
>
> I was wondering if the "always" in the usage text
>   fprintf(stderr, " -p always uses GNU compatible 'cp' command on all platforms\n");
> is too strong, since multiple restore command options overwrite each
> other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".

I was assuming you don't turn the switch off again immediately afterwards.

> 6. Minor code comment suggestion
>
> Unrelated to this patch, I wonder if the code comments on all four
> time-related vars better read "seconds" instead of "amount of time":
>   int sleeptime = 5;    /* amount of time to sleep between file checks */
>   int holdtime = 0;     /* amount of time to wait once file appears full */
>   int waittime = -1;    /* how long we have been waiting, -1 no wait
>                          * yet */
>   int maxwaittime = 0;  /* how long are we prepared to wait for? */

As you say, unrelated to the patch.

> 7. Question: benefits of separate holdtime option from sleeptime?
>
> Simon Riggs wrote:
> > * provide "holdtime" delay, default 0 (on all platforms)
>
> Going back on the hackers+patches emails and parsing the code
> comments, I'm sorry if I missed that, but I'm not sure I've understood
> the exact tuning benefits that introducing the new holdtime option
> provides over using the existing sleeptime, as it's been the case
> (just on Win32 only).

This is central to the patch, since the complaint was about the delay introduced by doing that previously.

> 8. Unresolved question of implementing now/later a "cp" replacement

The patch implements what's been agreed.

I'm not rewriting "cp", for reasons already discussed.

Not a comment to you Martin, but it's fairly clear that I'm not maintaining this correctly for Windows. I've never claimed to have tested this on Windows, and only included Windows related items as requested by others. I need to make it clear that I'm not going to maintain it at all, for Windows. If others wish to report Windows issues then they can suggest appropriate fixes and test them also.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote: >> 8. Unresolved question of implementing now/later a "cp" replacement > > The patch implements what's been agreed. > > I'm not rewriting "cp", for reasons already discussed. > > Not a comment to you Martin, but it's fairly clear that I'm not > maintaining this correctly for Windows. I've never claimed to have > tested this on Windows, and only included Windows related items as > requested by others. I need to make it clear that I'm not going to > maintain it at all, for Windows. If others wish to report Windows issues > then they can suggest appropriate fixes and test them also. Hmm. I just realized that replacing the "cp" command within pg_standby won't help at all. The problem is with the command that copies the files *to* the archivelocation that pg_standby polls, not with the copy pg_standby does from archivelocation to pg_xlog. And we don't have much control over that. We really need a more reliable way of detecting that a file has been fully copied. One simple improvement would be to check the xlp_magic field of the last page, though it still wouldn't be bullet-proof. Do the commands that preallocate the space keep the file exclusively locked during the copy? If they do, shouldn't we get an error in trying to run the restore copy command, and retry after the 1s sleep in RestoreWALFileForRecovery? Though if the archive location is a samba mount or something, I guess we can't rely on Windows-style exclusive locking. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
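For readers following along, the last-page check Heikki suggests might look roughly like the sketch below. It is not pg_standby code. The function name last_page_has_magic is hypothetical, the page size and expected magic value are passed in by the caller (in the server they would come from the WAL headers and depend on the build and version), and it assumes the magic is a 16-bit field at the very start of each page, stored in the reader's native byte order. As the message says, a passing check still would not be bullet-proof.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether the last page of a file starts with
 * the expected 16-bit magic value.
 *
 * Returns 1 if the file is a whole number of pages long and its last page
 * begins with expected_magic, 0 if not, and -1 on an I/O or argument error.
 * Assumption: the magic is the first field of the page and is stored in
 * native byte order.
 */
int
last_page_has_magic(const char *path, long page_size, uint16_t expected_magic)
{
    FILE       *f;
    long        size;
    uint16_t    magic;
    int         result = 0;

    if (page_size <= 0)
        return -1;

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Find the file size by seeking to the end. */
    if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 0)
    {
        fclose(f);
        return -1;
    }

    /* Only a file made of complete pages can have a valid last page. */
    if (size >= page_size && size % page_size == 0)
    {
        if (fseek(f, size - page_size, SEEK_SET) != 0 ||
            fread(&magic, sizeof magic, 1, f) != 1)
            result = -1;
        else
            result = (magic == expected_magic);
    }

    fclose(f);
    return result;
}
```

A caller would poll with this until it returns 1 before running the restore copy. A preallocated file that still holds stale data in its last page could pass the check, which is one reason it is only an improvement and not a guarantee.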
>>> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > We really need a more reliable way of detecting that a file has been > fully copied. In our scripts we handle this by copying to a temp directory on the same mount point as the archive directory and doing a mv to the archive location when the copy is successfully completed. I think that this even works on Windows. Could that just be documented as a strong recommendation for the archive script? -Kevin
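The copy-then-mv approach Kevin describes can be sketched in C as below. This is only an illustration of the idea, not anything from pg_standby or from Kevin's scripts: the function name copy_then_rename and its three path arguments are made up here. It relies on the temporary path being on the same filesystem as the final path, so that the file appears under its final name only after all of its bytes have been written.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy src to a temporary path, then rename the
 * temporary file to dst. A process polling for dst never sees a partially
 * written file, because dst only comes into existence at the rename.
 *
 * Assumption: tmp and dst are on the same filesystem (mount point).
 * Returns 0 on success, -1 on failure.
 */
int
copy_then_rename(const char *src, const char *tmp, const char *dst)
{
    FILE       *in;
    FILE       *out;
    char        buf[8192];
    size_t      n;
    int         ok = 1;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    out = fopen(tmp, "wb");
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    /* Copy in fixed-size chunks; stop at the first short write. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;

    fclose(in);
    /* fclose flushes buffered data; a failure here means an incomplete copy. */
    if (fclose(out) != 0)
        ok = 0;

    /* Publish the file under its final name only if the copy succeeded. */
    if (!ok || rename(tmp, dst) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

Two caveats. On POSIX systems rename() replaces an existing dst atomically, but the C library rename() on Windows fails if dst already exists, so whether this "even works on Windows" needs the testing asked for in the thread. The sketch also does not force the data to disk before the rename, which a production archive script would probably want to do.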
Kevin Grittner wrote: >>>> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: >> We really need a more reliable way of detecting that a file has been >> fully copied. > > In our scripts we handle this by copying to a temp directory on the > same mount point as the archive directory and doing a mv to the > archive location when the copy is successfully completed. I think > that this even works on Windows. Could that just be documented as a > strong recommendation for the archive script? Needs testing at least. If it does in fact work then we can just adjust the docs and be done - or maybe provide a .bat file or perl script that would work as an archive_command on Windows. cheers andrew
On Wed, 2008-07-23 at 21:38 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote: > >> 8. Unresolved question of implementing now/later a "cp" replacement > > > > The patch implements what's been agreed. > > > > I'm not rewriting "cp", for reasons already discussed. > > > > Not a comment to you Martin, but it's fairly clear that I'm not > > maintaining this correctly for Windows. I've never claimed to have > > tested this on Windows, and only included Windows related items as > > requested by others. I need to make it clear that I'm not going to > > maintain it at all, for Windows. If others wish to report Windows issues > > then they can suggest appropriate fixes and test them also. > > Hmm. I just realized that replacing the "cp" command within pg_standby > won't help at all. The problem is with the command that copies the files > *to* the archivelocation that pg_standby polls, not with the copy > pg_standby does from archivelocation to pg_xlog. And we don't have much > control over that. > > We really need a more reliable way of detecting that a file has been > fully copied. One simple improvement would be to check the xlp_magic > field of the last page, though it still wouldn't be bullet-proof. > > Do the commands that preallocate the space keep the file exclusively > locked during the copy? If they do, shouldn't we get an error in trying > to run the restore copy command, and retry after the 1s sleep in > RestoreWALFileForRecovery? Though if the archive location is a samba > mount or something, I guess we can't rely on Windows-style exclusive > locking. With respect, I need to refer you back to the my last paragraph above. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote: > reviewing your patch Current status is this: * My understanding is that Dave and Andrew (and therefore Simon) think the approach proposed here is an acceptable one. Heikki disagrees and wants a different approach. Perhaps I misunderstand. * Patch needs work to complete the proposed approach * I'm willing to change the patch, but not able to test it on Windows. Is there someone able to test the patch, if I make the changes? If not, we should just kick this out of the CommitFest queue now and be done. If nobody cares enough about this issue to test a fix, we shouldn't bother. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> writes: > On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote: >> reviewing your patch > Current status is this: > * My understanding is that Dave and Andrew (and therefore Simon) think > the approach proposed here is an acceptable one. Heikki disagrees and > wants different approach. Perhaps I misunderstand. > * Patch needs work to complete the proposed approach > * I'm willing to change the patch, but not able to test it on Windows. I thought the latest conclusion was that changing the behavior of pg_standby itself wouldn't address the problem anyway, and that what we need is just a docs patch recommending that people use safe copying methods in their scripts that copy to the archive area? regards, tom lane
On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote: > >> reviewing your patch > > > Current status is this: > > * My understanding is that Dave and Andrew (and therefore Simon) think > > the approach proposed here is an acceptable one. Heikki disagrees and > > wants different approach. Perhaps I misunderstand. > > * Patch needs work to complete the proposed approach > > * I'm willing to change the patch, but not able to test it on Windows. > > I thought the latest conclusion was that changing the behavior of > pg_standby itself wouldn't address the problem anyway, and that what we > need is just a docs patch recommending that people use safe copying > methods in their scripts that copy to the archive area? Plus the rest of this patch, which is really very simple. pg_standby currently waits (on Windows) for the sleep time. We agreed that this sleep would be on by default, but optional. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> writes: > On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote: >> I thought the latest conclusion was that changing the behavior of >> pg_standby itself wouldn't address the problem anyway, and that what we >> need is just a docs patch recommending that people use safe copying >> methods in their scripts that copy to the archive area? > Plus the rest of this patch, which is really very simple. Why? AFAICT the patch is just a kluge that adds user-visible complexity without providing a solution that's actually sure to work. regards, tom lane
On Fri, 2008-07-25 at 16:58 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote: > >> I thought the latest conclusion was that changing the behavior of > >> pg_standby itself wouldn't address the problem anyway, and that what we > >> need is just a docs patch recommending that people use safe copying > >> methods in their scripts that copy to the archive area? > > > Plus the rest of this patch, which is really very simple. > > Why? AFAICT the patch is just a kluge that adds user-visible complexity > without providing a solution that's actually sure to work. First, I'm not the one objecting to the current behaviour. Currently, there is a wait in there that can be removed if you use a copy utility that sets size after it does a copy. So we agreed to make it optional (at PGCon). -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Andrew Dunstan wrote: > Kevin Grittner wrote: >>>>> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: >>> We really need a more reliable way of detecting that a file has been >>> fully copied. >> >> In our scripts we handle this by copying to a temp directory on the >> same mount point as the archive directory and doing a mv to the >> archive location when the copy is successfully completed. I think >> that this even works on Windows. Could that just be documented as a >> strong recommendation for the archive script? > > Needs testing at least. If it does in fact work then we can just adjust > the docs and be done Yeah. > - or maybe provide a .bat file or perl script that > would work as na archive_command on Windows. We're not talking about archive_command. We're talking about the thing that copies files to the directory that pg_standby polls. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Andrew Dunstan wrote: > > >> - or maybe provide a .bat file or perl script that would work as na >> archive_command on Windows. > > We're not talking about archive_command. We're talking about the thing > that copies files to the directory that pg_standby polls. Er, that's what the archive_command is. Look at the pg_standby docs and you'll see that that's where we're currently recommending use of windows copy. Perhaps you're confusing this with the restore_command? cheers andrew
Andrew Dunstan wrote: > > > Heikki Linnakangas wrote: >> Andrew Dunstan wrote: >> >> >>> - or maybe provide a .bat file or perl script that would work as na >>> archive_command on Windows. >> >> We're not talking about archive_command. We're talking about the thing >> that copies files to the directory that pg_standby polls. > > Er, that's what the archive_command is. Look at the pg_standby docs and > you'll see that that's where we're currently recommending use of windows > copy. Perhaps you're confusing this with the restore_command? Oh, right. I was thinking that archive_command copies the files to an archive location, and there's yet another process copying files from there to the directory pg_standby polls. But indeed in the simple configuration, archive_command is the command that we're interested in. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 23 Jul 2008, Kevin Grittner wrote: > In our scripts we handle this by copying to a temp directory on the > same mount point as the archive directory and doing a mv to the > archive location when the copy is successfully completed. I think > that this even works on Windows. Could that just be documented as a > strong recommendation for the archive script? This is exactly what I always do. I think the way cp is shown in the examples promotes what's really a bad practice for lots of reasons, this particular problem being just one of them. I've been working on an improved archive_command shell script that I expect to submit for comments and potential inclusion in the documentation as a better base for other people to build on. This is one of the options for how it can operate. It would be painful but not impossible to convert a subset of that script to run under Windows as well, at least enough to cover this particular issue. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith wrote: > On Wed, 23 Jul 2008, Kevin Grittner wrote: > >> In our scripts we handle this by copying to a temp directory on the >> same mount point as the archive directory and doing a mv to the >> archive location when the copy is successfully completed. I think >> that this even works on Windows. Could that just be documented as a >> strong recommendation for the archive script? > > This is exactly what I always do. I think the way cp is shown in the > examples promotes what's really a bad practice for lots of reasons, > this particular problem being just one of them. > > I've been working on an improved archive_command shell script that I > expect to submit for comments and potential inclusion in the > documentation as a better base for other people to build on. This is > one of the options for how it can operate. It would be painful but not > impossible to convert a subset of that script to run under Windows as > well, at least enough to cover this particular issue. A Perl script using the (standard) File::Copy module along with the builtin function rename() should be moderately portable. It would be nice not to have to maintain two scripts. cheers andrew
Andrew Dunstan wrote: > > > Greg Smith wrote: >> On Wed, 23 Jul 2008, Kevin Grittner wrote: >> >>> In our scripts we handle this by copying to a temp directory on the >>> same mount point as the archive directory and doing a mv to the >>> archive location when the copy is successfully completed. I think >>> that this even works on Windows. Could that just be documented as a >>> strong recommendation for the archive script? >> >> This is exactly what I always do. I think the way cp is shown in the >> examples promotes what's really a bad practice for lots of reasons, >> this particular problem being just one of them. >> >> I've been working on an improved archive_command shell script that I >> expect to submit for comments and potential inclusion in the >> documentation as a better base for other people to build on. This is >> one of the options for how it can operate. It would be painful but not >> impossible to convert a subset of that script to run under Windows as >> well, at least enough to cover this particular issue. > > A Perl script using the (standard) File::Copy module along with the > builtin function rename() should be moderately portable. It would to be > nice not to have to maintain two scripts. It's also not very nice to require a Perl installation on Windows, just for a replacement of Copy. Would a simple .bat script work? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Andrew Dunstan wrote: >> Greg Smith wrote: >>> On Wed, 23 Jul 2008, Kevin Grittner wrote: >>> >>> I've been working on an improved archive_command shell script that I >>> expect to submit for comments and potential inclusion in the >>> documentation as a better base for other people to build on. This is >>> one of the options for how it can operate. It would be painful but >>> not impossible to convert a subset of that script to run under >>> Windows as well, at least enough to cover this particular issue. >> >> A Perl script using the (standard) File::Copy module along with the >> builtin function rename() should be moderately portable. It would to >> be nice not to have to maintain two scripts. > > It's also not very nice to require a Perl installation on Windows, just > for a replacement of Copy. Would a simple .bat script work? With these avenues to be explored, can the pg_standby patch on the CommitFest wiki be moved to the "Returned with Feedback" section? Regards, Martin
Martin Zaun wrote: > Heikki Linnakangas wrote: >> Andrew Dunstan wrote: >>> Greg Smith wrote: >>>> On Wed, 23 Jul 2008, Kevin Grittner wrote: >>>> >>>> I've been working on an improved archive_command shell script that I >>>> expect to submit for comments and potential inclusion in the >>>> documentation as a better base for other people to build on. This is >>>> one of the options for how it can operate. It would be painful but >>>> not impossible to convert a subset of that script to run under >>>> Windows as well, at least enough to cover this particular issue. >>> >>> A Perl script using the (standard) File::Copy module along with the >>> builtin function rename() should be moderately portable. It would to >>> be nice not to have to maintain two scripts. >> >> It's also not very nice to require a Perl installation on Windows, >> just for a replacement of Copy. Would a simple .bat script work? > > With these avenues to be explored, can the pg_standby patch on the > CommitFest wiki be moved to the "Returned with Feedback" section? Yes, I think we can conclude that we don't want this patch as it is. Instead, we want a documentation patch that describes the problem, mentioning that GNU cp is safe, or you can use the copy+rename trick. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Martin Zaun wrote: >> With these avenues to be explored, can the pg_standby patch on the >> CommitFest wiki be moved to the "Returned with Feedback" section? > Yes, I think we can conclude that we don't want this patch as it is. > Instead, we want a documentation patch that describes the problem, > mentioning that GNU cp is safe, or you can use the copy+rename trick. Right, after which we remove the presently hacked-in delay. I've updated the commitfest page accordingly. regards, tom lane
On Thu, 2008-07-31 at 12:32 -0400, Tom Lane wrote: > "Heikki Linnakangas" <heikki@enterprisedb.com> writes: > > Martin Zaun wrote: > >> With these avenues to be explored, can the pg_standby patch on the > >> CommitFest wiki be moved to the "Returned with Feedback" section? > > > Yes, I think we can conclude that we don't want this patch as it is. > > Instead, we want a documentation patch that describes the problem, > > mentioning that GNU cp is safe, or you can use the copy+rename trick. > > Right, after which we remove the presently hacked-in delay. > > I've updated the commitfest page accordingly. Well, this is a strange conclusion, leaving me slightly bemused. The discussion between Andrew and me at PGCon concluded that we would * document which other tools to use * remove the delay Now we have rejected the patch which does that, but then re-requested the exact same thing again. The patch interprets "remove the delay" as "remove the delay in a way which will not screw up existing users of pg_standby when they upgrade". Doing that requires us to have a configurable delay, which defaults to the current behaviour, but that can be set to zero (the recommended way). Which is what the patch implements. Andrew, Heikki: ISTM it's time to just make the changes yourselves. This is just going round and round to no benefit. This doesn't warrant such a long discussion and review process. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > Well, this is a strange conclusion, leaving me slightly bemused. > > The discussion between Andrew and I at PGcon concluded that we would > * document which other tools to use > * remove the delay > > Now we have rejected the patch which does that, but then re-requested > the exact same thing again. > > The patch interprets "remove the delay" as "remove the delay in a way > which will not screw up existing users of pg_standby when they upgrade". > Doing that requires us to have a configurable delay, which defaults to > the current behaviour, but that can be set to zero (the recommended > way). Which is what the patch implements. > > Andrew, Heikki: ISTM its time to just make the changes yourselves. This > is just going round and round to no benefit. This doesn't warrant such a > long discussion and review process. > You ought to know by now that the length and ferocity of the discussion bears no relation at all to the importance of the subject ;-) Personally, I think it's reasonable to provide the delay as long as it's switchable, although I would have preferred zero to be the default. If we remove it altogether then we force bigger changes on people who are currently using Windows copy. But I can live with that since changing their archive_command is the better path by far anyway, either to use Gnu cp or the copy / rename trick. cheers andrew
Have we made any progress on this, namely better documentation and removing the Win32 delay code? --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > Simon Riggs wrote: > > Well, this is a strange conclusion, leaving me slightly bemused. > > > > The discussion between Andrew and I at PGcon concluded that we would > > * document which other tools to use > > * remove the delay > > > > Now we have rejected the patch which does that, but then re-requested > > the exact same thing again. > > > > The patch interprets "remove the delay" as "remove the delay in a way > > which will not screw up existing users of pg_standby when they upgrade". > > Doing that requires us to have a configurable delay, which defaults to > > the current behaviour, but that can be set to zero (the recommended > > way). Which is what the patch implements. > > > > Andrew, Heikki: ISTM its time to just make the changes yourselves. This > > is just going round and round to no benefit. This doesn't warrant such a > > long discussion and review process. > > > > You ought to know by now that the length and ferocity of the discussion > bears no relation at all to the importance of the subject ;-) > > Personally, I think it's reasonable to provide the delay as long as it's > switchable, although I would have preferred zero to be the default. If > we remove it altogether then we force bigger changes on people who are > currently using Windows copy. But I can live with that since changing > their archive_command is the better path by far anyway, either to use > Gnu cp or the copy / rename trick. > > cheers > > andrew > -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I have a fairly large TODO list, and Simon has thrown in the towel (and I imagine he also has a large TODO list). anyone else want to step in? cheers andrew Bruce Momjian wrote: > Have we made any progress on this, namely better documentation and > removing the Win32 delay code? > > --------------------------------------------------------------------------- > > Andrew Dunstan wrote: > >> Simon Riggs wrote: >> >>> Well, this is a strange conclusion, leaving me slightly bemused. >>> >>> The discussion between Andrew and I at PGcon concluded that we would >>> * document which other tools to use >>> * remove the delay >>> >>> Now we have rejected the patch which does that, but then re-requested >>> the exact same thing again. >>> >>> The patch interprets "remove the delay" as "remove the delay in a way >>> which will not screw up existing users of pg_standby when they upgrade". >>> Doing that requires us to have a configurable delay, which defaults to >>> the current behaviour, but that can be set to zero (the recommended >>> way). Which is what the patch implements. >>> >>> Andrew, Heikki: ISTM its time to just make the changes yourselves. This >>> is just going round and round to no benefit. This doesn't warrant such a >>> long discussion and review process. >>> >>> >> You ought to know by now that the length and ferocity of the discussion >> bears no relation at all to the importance of the subject ;-) >> >> Personally, I think it's reasonable to provide the delay as long as it's >> switchable, although I would have preferred zero to be the default. If >> we remove it altogether then we force bigger changes on people who are >> currently using Windows copy. But I can live with that since changing >> their archive_command is the better path by far anyway, either to use >> Gnu cp or the copy / rename trick. >> >> cheers >> >> andrew >> >> > >
Martin Zaun wrote: > 4. Issue: missing break in switch, silent override of '-l' argument? > > This behaviour has been in there before and is not addresses by the > patch: The user-selected Win32 "mklink" command mode is never applied > due to a missing 'break' in CustomizableInitialize(): > > switch (restoreCommandType) > { > case RESTORE_COMMAND_WIN32_MKLINK: > SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath); > case RESTORE_COMMAND_WIN32_COPY: > SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath); > break; I have added the missing 'break' to CVS HEAD; thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
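For readers following along, the effect of the missing 'break' can be reproduced in isolation. The sketch below is not the pg_standby source: the enum only borrows the two constant names quoted above, while choose_command() and its fall_through flag are hypothetical helpers invented for illustration. It shows that, without the break, a request for "mklink" ends up as "copy".

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for the two restore command types named in the thread. */
typedef enum
{
    RESTORE_COMMAND_WIN32_MKLINK,
    RESTORE_COMMAND_WIN32_COPY
} RestoreCommandType;

/*
 * Hypothetical helper: writes the command name selected for 'type' into buf.
 * When fall_through is nonzero, the MKLINK case skips its break, which
 * emulates the code as it was before the fix; the COPY case then
 * overwrites the name that was just chosen.
 */
void
choose_command(RestoreCommandType type, int fall_through,
               char *buf, size_t buflen)
{
    buf[0] = '\0';
    switch (type)
    {
        case RESTORE_COMMAND_WIN32_MKLINK:
            snprintf(buf, buflen, "%s", "mklink");
            if (!fall_through)
                break;          /* the break that was added to CVS HEAD */
            /* FALLTHROUGH: emulates the missing break */
        case RESTORE_COMMAND_WIN32_COPY:
            snprintf(buf, buflen, "%s", "copy");
            break;
    }
}
```

Calling choose_command(RESTORE_COMMAND_WIN32_MKLINK, 1, ...) leaves "copy" in the buffer, and the same call with fall_through set to 0 leaves "mklink", which is the behaviour the user asked for.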
Bruce Momjian wrote: > Martin Zaun wrote: >> 4. Issue: missing break in switch, silent override of '-l' argument? >> >> This behaviour has been in there before and is not addresses by the >> patch: The user-selected Win32 "mklink" command mode is never applied >> due to a missing 'break' in CustomizableInitialize(): >> >> switch (restoreCommandType) >> { >> case RESTORE_COMMAND_WIN32_MKLINK: >> SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath); >> case RESTORE_COMMAND_WIN32_COPY: >> SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath); >> break; > > I have added the missing 'break' to CVS HEAD; thanks. Why no backpatch to 8.3? Seems like a clear bugfix to me. //Magnus
Since this patch was rejected, I have added the attached documentation to pg_standby to mention the sleep() we do. --------------------------------------------------------------------------- Martin Zaun wrote: > > Below my comments on the CommitFest patch: > pg_standby minor changes for Windows > > Simon, I'm sorry you got me, a Postgres newbie, signed up for > reviewing your patch ;) > > To start with, I'm not quite sure of the status of this patch > since Bruce's last comment on the -patches alias: > > Bruce Momjian wrote: > > OK, based on these observations I think we need to learn more about the > > issues before making any changes to our code. > > From easy to difficult: > > 1. Issues with applying the patch to CVS HEAD: > > The second file in the patch > Index: doc/src/sgml/standby.sgml > appears to be misnamed -- the existing file in HEAD is > Index: doc/src/sgml/pgstandby.sgml > > However, still had issues after fixing the file name: > > md@Garu:~/pg/pgsql$ patch -c -p0 < ../pg_standby.patch > patching file contrib/pg_standby/pg_standby.c > patching file doc/src/sgml/pgstandby.sgml > Hunk #1 FAILED at 136. > Hunk #2 FAILED at 168. > Hunk #3 FAILED at 245. > Hunk #4 FAILED at 255. > 4 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/pgstandby.sgml.rej > > > 2. Missing description for new command-line options in pgstandby.sgml > > Simon Riggs wrote: > > Patch implements > > * recommendation to use GnuWin32 cp on Windows > > Saw that in the changes to pgstandby.sgml, and looks ok to me, but: > - no description of the proposed new command-line options -h and -p? > > > 3. No coding style issues seen > > Just one comment: the logic that selects the actual restore command to > be used has moved from CustomizableInitialize() to main() -- a matter > of personal taste, perhaps. But in my view the: > + the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read > > > 4. Issue: missing break in switch, silent override of '-l' argument? 
> > This behaviour has been in there before and is not addresses by the > patch: The user-selected Win32 "mklink" command mode is never applied > due to a missing 'break' in CustomizableInitialize(): > > switch (restoreCommandType) > { > case RESTORE_COMMAND_WIN32_MKLINK: > SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath); > case RESTORE_COMMAND_WIN32_COPY: > SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath); > break; > > A similar behaviour on Non-Win32 platforms where the user-selected > "ln" may be silently changed to "cp" in main(): > > #if HAVE_WORKING_LINK > restoreCommandType = RESTORE_COMMAND_LN; > #else > restoreCommandType = RESTORE_COMMAND_CP; > #endif > > If both Win32/Non-Win32 cases reflect the intended behaviour: > - I'd prefer a code comment in the above case-fall-through, > - suggest a message to the user about the ignored "ln" / "mklink", > - observe that the logic to override of the '-l' option is now in two > places: CustomizableInitialize() and main(). > > > 5. Minor wording issue in usage message on new '-p' option > > I was wondering if the "always" in the usage text > fprintf(stderr, " -p always uses GNU compatible 'cp' command on all platforms\n"); > is too strong, since multiple restore command options overwrite each > other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp". > > > 6. Minor code comment suggestion > > Unrelated to this patch, I wonder if the code comments on all four > time-related vars better read "seconds" instead of "amount of time": > int sleeptime = 5; /* amount of time to sleep between file checks */ > int holdtime = 0; /* amount of time to wait once file appears full */ > int waittime = -1; /* how long we have been waiting, -1 no wait > * yet */ > int maxwaittime = 0; /* how long are we prepared to wait for? */ > > > 7. Question: benefits of separate holdtime option from sleeptime? 
> > Simon Riggs wrote: > > * provide "holdtime" delay, default 0 (on all platforms) > > Going back on the hackers+patches emails and parsing the code > comments, I'm sorry if I missed that, but I'm not sure I've understood > the exact tuning benefits that introducing the new holdtime option > provides over using the existing sleeptime, as it's been the case > (just on Win32 only). > > > 8. Unresolved question of implementing now/later a "cp" replacement > > Simon Riggs wrote: > > On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote: > >> This seems pretty kludgey to me. I wouldn't want to install GnuWin32 > >> utilities on a production system just for the "cp" command, and I don't > >> know how I would tune holdtime properly for using "copy". And it seems > >> risky to have defaults that are known to not work reliably. > >> > >> How about implementing a replacement function for "cp" ourselves? It > >> seems pretty trivial to do. We could use that on Unixes as well, which > >> would keep the differences between Win32 and other platforms smaller, > >> and thus ensure the codepath gets more testing. > > > > If you've heard complaints about any of this from users, I haven't. > > AFAIK we're doing this because it *might* cause a problem. Bear in mind > > that link is the preferred performance option, not copy. So AFAICS we're > > tuning a secondary option on one specific port, without it being a > > raised issue and in an area of code that will be superceded in the next > > release. > > > > So further embellishments would be a long way down my own priority list, > > putting it politely. Yet I have no objections to the suggestion overall; > > we have done that already for alter tablespace. 
> > Don't have much to add to the whether/now/later question of providing > a "cp" replacement, but I guess the existing command-line options and > documentation wouldn't have to change with our own "cp" replacement > while the newly proposed '-h' and '-p' would become moot then, right? > > Regards, > Martin -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: pgstandby.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/pgstandby.sgml,v retrieving revision 2.5 diff -c -r2.5 pgstandby.sgml *** pgstandby.sgml 7 May 2008 18:48:40 -0000 2.5 --- pgstandby.sgml 15 Dec 2008 22:04:09 -0000 *************** *** 295,301 **** </itemizedlist> <para> ! Since the Windows example uses <literal>copy</> at both ends, either or both servers might be accessing the archive directory across the network. </para> --- 295,310 ---- </itemizedlist> <para> ! The <literal>copy</> command on Windows sets the final file size ! before the file is completely copied, which would ordinarly confuse ! <application>pg_standby</application>. Therefore ! <application>pg_standby</application> waits <literal>sleeptime</> ! seconds once it sees the proper file size. GNUWin32's <literal>cp</> ! sets the file size only after the file copy is complete. ! </para> ! ! <para> ! Using the Since the Windows example uses <literal>copy</> at both ends, either or both servers might be accessing the archive directory across the network. </para>
Magnus Hagander wrote: > Bruce Momjian wrote: > > Martin Zaun wrote: > >> 4. Issue: missing break in switch, silent override of '-l' argument? > >> > >> This behaviour has been in there before and is not addresses by the > >> patch: The user-selected Win32 "mklink" command mode is never applied > >> due to a missing 'break' in CustomizableInitialize(): > >> > >> switch (restoreCommandType) > >> { > >> case RESTORE_COMMAND_WIN32_MKLINK: > >> SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath); > >> case RESTORE_COMMAND_WIN32_COPY: > >> SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath); > >> break; > > > > I have added the missing 'break' to CVS HEAD; thanks. > > Why no backpatch to 8.3? Seems like a clear bugfix to me. I knew that was going to be asked. At this point I am pulling comments from rejected patches into CVS commits; these are not even submitted patches. I am not comfortable backpatching anything when using that system because obviously no one else even cared enough to submit a patch for it, let alone test it. If someone wants to backpatch this or submit a patch to be backpatched, that is fine. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > "Heikki Linnakangas" <heikki@enterprisedb.com> writes: > > Martin Zaun wrote: > >> With these avenues to be explored, can the pg_standby patch on the > >> CommitFest wiki be moved to the "Returned with Feedback" section? > > > Yes, I think we can conclude that we don't want this patch as it is. > > Instead, we want a documentation patch that describes the problem, > > mentioning that GNU cp is safe, or you can use the copy+rename trick. > > Right, after which we remove the presently hacked-in delay. > > I've updated the commitfest page accordingly. I have documented the sleep() call and that GNU cp is safe, but did not remove the delay, nor mention copy+rename. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, 2008-12-15 at 17:10 -0500, Bruce Momjian wrote: > > > > Why no backpatch to 8.3? Seems like a clear bugfix to me. > > I knew that was going to be asked. 8.3 is really where this is needed. 8.4 has almost no need of this. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support