Thread: Re: [PATCHES] odd output in restore mode

Re: [PATCHES] odd output in restore mode

From
Martin Zaun
Date:
Below my comments on the CommitFest patch:
   pg_standby minor changes for Windows

Simon, I'm sorry you got me, a Postgres newbie, signed up for
reviewing your patch ;)

To start with, I'm not quite sure of the status of this patch
since Bruce's last comment on the -patches alias:

Bruce Momjian wrote:
 > OK, based on these observations I think we need to learn more about the
 > issues before making any changes to our code.

 From easy to difficult:

1. Issues with applying the patch to CVS HEAD:

The second file in the patch
   Index: doc/src/sgml/standby.sgml
appears to be misnamed -- the existing file in HEAD is
   Index: doc/src/sgml/pgstandby.sgml

However, still had issues after fixing the file name:

md@Garu:~/pg/pgsql$ patch -c -p0 < ../pg_standby.patch
patching file contrib/pg_standby/pg_standby.c
patching file doc/src/sgml/pgstandby.sgml
Hunk #1 FAILED at 136.
Hunk #2 FAILED at 168.
Hunk #3 FAILED at 245.
Hunk #4 FAILED at 255.
4 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/pgstandby.sgml.rej


2. Missing description for new command-line options in pgstandby.sgml

Simon Riggs wrote:
 > Patch implements
 > * recommendation to use GnuWin32 cp on Windows

Saw that in the changes to pgstandby.sgml, and looks ok to me, but:
- no description of the proposed new command-line options -h and -p?


3. No coding style issues seen

Just one comment: the logic that selects the actual restore command to
be used has moved from CustomizableInitialize() to main() -- a matter
of personal taste, perhaps.  But in my view the:
+ the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read


4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addresses by the
patch: The user-selected Win32 "mklink" command mode is never applied
due to a missing 'break' in CustomizableInitialize():

     switch (restoreCommandType)
     {
         case RESTORE_COMMAND_WIN32_MKLINK:
             SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
         case RESTORE_COMMAND_WIN32_COPY:
             SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
             break;

A similar behaviour on Non-Win32 platforms where the user-selected
"ln" may be silently changed to "cp" in main():

#if HAVE_WORKING_LINK
                 restoreCommandType = RESTORE_COMMAND_LN;
#else
                 restoreCommandType = RESTORE_COMMAND_CP;
#endif

If both Win32/Non-Win32 cases reflect the intended behaviour:
- I'd prefer a code comment in the above case-fall-through,
- suggest a message to the user about the ignored "ln" / "mklink",
- observe that the logic to override of the '-l' option is now in two
   places: CustomizableInitialize() and main().


5. Minor wording issue in usage message on new '-p' option

I was wondering if the "always" in the usage text
     fprintf(stderr, "  -p           always uses GNU compatible 'cp' command on all platforms\n");
is too strong, since multiple restore command options overwrite each
other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".


6. Minor code comment suggestion

Unrelated to this patch, I wonder if the code comments on all four
time-related vars better read "seconds" instead of "amount of time":
int         sleeptime = 5;      /* amount of time to sleep between file checks */
int         holdtime = 0;       /* amount of time to wait once file appears full */
int         waittime = -1;      /* how long we have been waiting, -1 no wait
                                  * yet */
int         maxwaittime = 0;    /* how long are we prepared to wait for? */


7. Question: benefits of separate holdtime option from sleeptime?

Simon Riggs wrote:
 > * provide "holdtime" delay, default 0 (on all platforms)

Going back on the hackers+patches emails and parsing the code
comments, I'm sorry if I missed that, but I'm not sure I've understood
the exact tuning benefits that introducing the new holdtime option
provides over using the existing sleeptime, as it's been the case
(just on Win32 only).


8. Unresolved question of implementing now/later a "cp" replacement

Simon Riggs wrote:
 > On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote:
 >> This seems pretty kludgey to me. I wouldn't want to install GnuWin32
 >> utilities on a production system just for the "cp" command, and I don't
 >> know how I would tune holdtime properly for using "copy". And it seems
 >> risky to have defaults that are known to not work reliably.
 >>
 >> How about implementing a replacement function for "cp" ourselves? It
 >> seems pretty trivial to do. We could use that on Unixes as well, which
 >> would keep the differences between Win32 and other platforms smaller,
 >> and thus ensure the codepath gets more testing.
 >
 > If you've heard complaints about any of this from users, I haven't.
 > AFAIK we're doing this because it *might* cause a problem. Bear in mind
 > that link is the preferred performance option, not copy. So AFAICS we're
 > tuning a secondary option on one specific port, without it being a
 > raised issue and in an area of code that will be superceded in the next
 > release.
 >
 > So further embellishments would be a long way down my own priority list,
 > putting it politely. Yet I have no objections to the suggestion overall;
 > we have done that already for alter tablespace.

Don't have much to add to the whether/now/later question of providing
a "cp" replacement, but I guess the existing command-line options and
documentation wouldn't have to change with our own "cp" replacement
while the newly proposed '-h' and '-p' would become moot then, right?

Regards,
Martin

Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
> 1. Issues with applying the patch to CVS HEAD:

Sounds awful. Thanks for the review, will fix.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

> 1. Issues with applying the patch to CVS HEAD:

For me, the patch applies cleanly to CVS HEAD.

I do notice that there are two files "standby.sgml" and
"pgstandby.sgml". I can't see where "standby.sgml" comes from, but I
haven't created it; perhaps it is a relic of the SGML build process.
I've recreated my source tree since I wrote the patch also. Weird.

I'll redo the patch so it points at pgstandby.sgml, which is the one
thats listed as being in the main source tree.

> 2. Missing description for new command-line options in pgstandby.sgml
>
> - no description of the proposed new command-line options -h and -p?

These are done. The patch issues have missed those hunks.

> 3. No coding style issues seen
>
> Just one comment: the logic that selects the actual restore command to
> be used has moved from CustomizableInitialize() to main() -- a matter
> of personal taste, perhaps.  But in my view the:
> + the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

Thanks

> 4. Issue: missing break in switch, silent override of '-l' argument?
>
> This behaviour has been in there before

Well spotted. I don't claim to test this for Windows.

> 5. Minor wording issue in usage message on new '-p' option
>
> I was wondering if the "always" in the usage text
>      fprintf(stderr, "  -p           always uses GNU compatible 'cp' command on all platforms\n");
> is too strong, since multiple restore command options overwrite each
> other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".

I was assuming you don't turn the switch off again immediately
afterwards.

> 6. Minor code comment suggestion
>
> Unrelated to this patch, I wonder if the code comments on all four
> time-related vars better read "seconds" instead of "amount of time":
> int         sleeptime = 5;      /* amount of time to sleep between file checks */
> int         holdtime = 0;       /* amount of time to wait once file appears full */
> int         waittime = -1;      /* how long we have been waiting, -1 no wait
>                                   * yet */
> int         maxwaittime = 0;    /* how long are we prepared to wait for? */

As you say, unrelated to the patch.

> 7. Question: benefits of separate holdtime option from sleeptime?
>
> Simon Riggs wrote:
>  > * provide "holdtime" delay, default 0 (on all platforms)
>
> Going back on the hackers+patches emails and parsing the code
> comments, I'm sorry if I missed that, but I'm not sure I've understood
> the exact tuning benefits that introducing the new holdtime option
> provides over using the existing sleeptime, as it's been the case
> (just on Win32 only).

This is central to the patch, since the complaint was about the delay
introduced by doing that previously.

> 8. Unresolved question of implementing now/later a "cp" replacement

The patch implements what's been agreed.

I'm not rewriting "cp", for reasons already discussed.



Not a comment to you Martin, but it's fairly clear that I'm not
maintaining this correctly for Windows. I've never claimed to have
tested this on Windows, and only included Windows related items as
requested by others. I need to make it clear that I'm not going to
maintain it at all, for Windows. If others wish to report Windows issues
then they can suggest appropriate fixes and test them also.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
"Heikki Linnakangas"
Date:
Simon Riggs wrote:
> On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
>> 8. Unresolved question of implementing now/later a "cp" replacement
>
> The patch implements what's been agreed.
>
> I'm not rewriting "cp", for reasons already discussed.
>
> Not a comment to you Martin, but it's fairly clear that I'm not
> maintaining this correctly for Windows. I've never claimed to have
> tested this on Windows, and only included Windows related items as
> requested by others. I need to make it clear that I'm not going to
> maintain it at all, for Windows. If others wish to report Windows issues
> then they can suggest appropriate fixes and test them also.

Hmm. I just realized that replacing the "cp" command within pg_standby
won't help at all. The problem is with the command that copies the files
*to* the archivelocation that pg_standby polls, not with the copy
pg_standby does from archivelocation to pg_xlog. And we don't have much
control over that.

We really need a more reliable way of detecting that a file has been
fully copied. One simple improvement would be to check the xlp_magic
field of the last page, though it still wouldn't be bullet-proof.

Do the commands that preallocate the space keep the file exclusively
locked during the copy? If they do, shouldn't we get an error in trying
to run the restore copy command, and retry after the 1s sleep in
RestoreWALFileForRecovery? Though if the archive location is a samba
mount or something, I guess we can't rely on Windows-style exclusive
locking.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [PATCHES] odd output in restore mode

From
"Kevin Grittner"
Date:
>>> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

> We really need a more reliable way of detecting that a file has been

> fully copied.

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed.  I think
that this even works on Windows.  Could that just be documented as a
strong recommendation for the archive script?

-Kevin

Re: [PATCHES] odd output in restore mode

From
Andrew Dunstan
Date:

Kevin Grittner wrote:
>>>> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>>>>
>
>
>> We really need a more reliable way of detecting that a file has been
>>
>
>
>> fully copied.
>>
>
> In our scripts we handle this by copying to a temp directory on the
> same mount point as the archive directory and doing a mv to the
> archive location when the copy is successfully completed.  I think
> that this even works on Windows.  Could that just be documented as a
> strong recommendation for the archive script?
>
>
>
>

Needs testing at least. If it does in fact work then we can just adjust
the docs and be done - or maybe provide a .bat file or perl script that
would work as na archive_command on Windows.

cheers

andrew

Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Wed, 2008-07-23 at 21:38 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
> >> 8. Unresolved question of implementing now/later a "cp" replacement
> >
> > The patch implements what's been agreed.
> >
> > I'm not rewriting "cp", for reasons already discussed.
> >
> > Not a comment to you Martin, but it's fairly clear that I'm not
> > maintaining this correctly for Windows. I've never claimed to have
> > tested this on Windows, and only included Windows related items as
> > requested by others. I need to make it clear that I'm not going to
> > maintain it at all, for Windows. If others wish to report Windows issues
> > then they can suggest appropriate fixes and test them also.
>
> Hmm. I just realized that replacing the "cp" command within pg_standby
> won't help at all. The problem is with the command that copies the files
> *to* the archivelocation that pg_standby polls, not with the copy
> pg_standby does from archivelocation to pg_xlog. And we don't have much
> control over that.
>
> We really need a more reliable way of detecting that a file has been
> fully copied. One simple improvement would be to check the xlp_magic
> field of the last page, though it still wouldn't be bullet-proof.
>
> Do the commands that preallocate the space keep the file exclusively
> locked during the copy? If they do, shouldn't we get an error in trying
> to run the restore copy command, and retry after the 1s sleep in
> RestoreWALFileForRecovery? Though if the archive location is a samba
> mount or something, I guess we can't rely on Windows-style exclusive
> locking.

With respect, I need to refer you back to the my last paragraph above.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

> reviewing your patch

Current status is this:

* My understanding is that Dave and Andrew (and therefore Simon) think
the approach proposed here is an acceptable one. Heikki disagrees and
wants different approach. Perhaps I misunderstand.

* Patch needs work to complete the proposed approach

* I'm willing to change the patch, but not able to test it on Windows.

Is there someone able to test the patch, if I make the changes? If not,
we should just kick this out of the CommitFest queue now and be done. If
nobody cares enough about this issue to test a fix, we shouldn't bother.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
>> reviewing your patch

> Current status is this:
> * My understanding is that Dave and Andrew (and therefore Simon) think
> the approach proposed here is an acceptable one. Heikki disagrees and
> wants different approach. Perhaps I misunderstand.
> * Patch needs work to complete the proposed approach
> * I'm willing to change the patch, but not able to test it on Windows.

I thought the latest conclusion was that changing the behavior of
pg_standby itself wouldn't address the problem anyway, and that what we
need is just a docs patch recommending that people use safe copying
methods in their scripts that copy to the archive area?

            regards, tom lane

Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
> >> reviewing your patch
>
> > Current status is this:
> > * My understanding is that Dave and Andrew (and therefore Simon) think
> > the approach proposed here is an acceptable one. Heikki disagrees and
> > wants different approach. Perhaps I misunderstand.
> > * Patch needs work to complete the proposed approach
> > * I'm willing to change the patch, but not able to test it on Windows.
>
> I thought the latest conclusion was that changing the behavior of
> pg_standby itself wouldn't address the problem anyway, and that what we
> need is just a docs patch recommending that people use safe copying
> methods in their scripts that copy to the archive area?

Plus the rest of this patch, which is really very simple.

pg_standby currently waits (on Windows) for the sleep time. We agreed
that this sleep would be on by default, but optional.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:
>> I thought the latest conclusion was that changing the behavior of
>> pg_standby itself wouldn't address the problem anyway, and that what we
>> need is just a docs patch recommending that people use safe copying
>> methods in their scripts that copy to the archive area?

> Plus the rest of this patch, which is really very simple.

Why?  AFAICT the patch is just a kluge that adds user-visible complexity
without providing a solution that's actually sure to work.

            regards, tom lane

Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Fri, 2008-07-25 at 16:58 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:
> >> I thought the latest conclusion was that changing the behavior of
> >> pg_standby itself wouldn't address the problem anyway, and that what we
> >> need is just a docs patch recommending that people use safe copying
> >> methods in their scripts that copy to the archive area?
>
> > Plus the rest of this patch, which is really very simple.
>
> Why?  AFAICT the patch is just a kluge that adds user-visible complexity
> without providing a solution that's actually sure to work.

First, I'm not the one objecting to the current behaviour.

Currently, there is a wait in there that can be removed if you use a
copy utility that sets size after it does a copy. So we agreed to make
it optional (at PGCon).

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
"Heikki Linnakangas"
Date:
Andrew Dunstan wrote:
> Kevin Grittner wrote:
>>>>> "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>>> We really need a more reliable way of detecting that a file has been
>>> fully copied.
>>
>> In our scripts we handle this by copying to a temp directory on the
>> same mount point as the archive directory and doing a mv to the
>> archive location when the copy is successfully completed.  I think
>> that this even works on Windows.  Could that just be documented as a
>> strong recommendation for the archive script?
>
> Needs testing at least. If it does in fact work then we can just adjust
> the docs and be done

Yeah.

> - or maybe provide a .bat file or perl script that
> would work as na archive_command on Windows.

We're not talking about archive_command. We're talking about the thing
that copies files to the directory that pg_standby polls.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [PATCHES] odd output in restore mode

From
Andrew Dunstan
Date:

Heikki Linnakangas wrote:
> Andrew Dunstan wrote:
>
>
>> - or maybe provide a .bat file or perl script that would work as na
>> archive_command on Windows.
>
> We're not talking about archive_command. We're talking about the thing
> that copies files to the directory that pg_standby polls.

Er, that's what the archive_command is. Look at the pg_standby docs and
you'll see that that's where we're currently recommending use of windows
copy. Perhaps you're confusing this with the restore_command?

cheers

andrew

Re: [PATCHES] odd output in restore mode

From
"Heikki Linnakangas"
Date:
Andrew Dunstan wrote:
>
>
> Heikki Linnakangas wrote:
>> Andrew Dunstan wrote:
>>
>>
>>> - or maybe provide a .bat file or perl script that would work as na
>>> archive_command on Windows.
>>
>> We're not talking about archive_command. We're talking about the thing
>> that copies files to the directory that pg_standby polls.
>
> Er, that's what the archive_command is. Look at the pg_standby docs and
> you'll see that that's where we're currently recommending use of windows
> copy. Perhaps you're confusing this with the restore_command?

Oh, right. I was thinking that archive_command copies the files to an
archive location, and there's yet another process copying files from
there to the directory pg_standby polls. But indeed in the simple
configuration, archive_command is the command that we're interested in.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [PATCHES] odd output in restore mode

From
Greg Smith
Date:
On Wed, 23 Jul 2008, Kevin Grittner wrote:

> In our scripts we handle this by copying to a temp directory on the
> same mount point as the archive directory and doing a mv to the
> archive location when the copy is successfully completed.  I think
> that this even works on Windows.  Could that just be documented as a
> strong recommendation for the archive script?

This is exactly what I always do.  I think the way cp is shown in the
examples promotes what's really a bad practice for lots of reasons, this
particular problem being just one of them.

I've been working on an improved archive_command shell script that I
expect to submit for comments and potential inclusion in the documentation
as a better base for other people to build on.  This is one of the options
for how it can operate.  It would be painful but not impossible to convert
a subset of that script to run under Windows as well, at least enough to
cover this particular issue.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: [PATCHES] odd output in restore mode

From
Andrew Dunstan
Date:

Greg Smith wrote:
> On Wed, 23 Jul 2008, Kevin Grittner wrote:
>
>> In our scripts we handle this by copying to a temp directory on the
>> same mount point as the archive directory and doing a mv to the
>> archive location when the copy is successfully completed. I think
>> that this even works on Windows. Could that just be documented as a
>> strong recommendation for the archive script?
>
> This is exactly what I always do. I think the way cp is shown in the
> examples promotes what's really a bad practice for lots of reasons,
> this particular problem being just one of them.
>
> I've been working on an improved archive_command shell script that I
> expect to submit for comments and potential inclusion in the
> documentation as a better base for other people to build on. This is
> one of the options for how it can operate. It would be painful but not
> impossible to convert a subset of that script to run under Windows as
> well, at least enough to cover this particular issue.
>
>

A Perl script using the (standard) File::Copy module along with the
builtin function rename() should be moderately portable. It would to be
nice not to have to maintain two scripts.

cheers

andrew

Re: [PATCHES] odd output in restore mode

From
"Heikki Linnakangas"
Date:
Andrew Dunstan wrote:
>
>
> Greg Smith wrote:
>> On Wed, 23 Jul 2008, Kevin Grittner wrote:
>>
>>> In our scripts we handle this by copying to a temp directory on the
>>> same mount point as the archive directory and doing a mv to the
>>> archive location when the copy is successfully completed. I think
>>> that this even works on Windows. Could that just be documented as a
>>> strong recommendation for the archive script?
>>
>> This is exactly what I always do. I think the way cp is shown in the
>> examples promotes what's really a bad practice for lots of reasons,
>> this particular problem being just one of them.
>>
>> I've been working on an improved archive_command shell script that I
>> expect to submit for comments and potential inclusion in the
>> documentation as a better base for other people to build on. This is
>> one of the options for how it can operate. It would be painful but not
>> impossible to convert a subset of that script to run under Windows as
>> well, at least enough to cover this particular issue.
>
> A Perl script using the (standard) File::Copy module along with the
> builtin function rename() should be moderately portable. It would to be
> nice not to have to maintain two scripts.

It's also not very nice to require a Perl installation on Windows, just
for a replacement of Copy. Would a simple .bat script work?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [PATCHES] odd output in restore mode

From
Martin Zaun
Date:
Heikki Linnakangas wrote:
> Andrew Dunstan wrote:
>> Greg Smith wrote:
>>> On Wed, 23 Jul 2008, Kevin Grittner wrote:
>>>
>>> I've been working on an improved archive_command shell script that I
>>> expect to submit for comments and potential inclusion in the
>>> documentation as a better base for other people to build on. This is
>>> one of the options for how it can operate. It would be painful but
>>> not impossible to convert a subset of that script to run under
>>> Windows as well, at least enough to cover this particular issue.
>>
>> A Perl script using the (standard) File::Copy module along with the
>> builtin function rename() should be moderately portable. It would to
>> be nice not to have to maintain two scripts.
>
> It's also not very nice to require a Perl installation on Windows, just
> for a replacement of Copy. Would a simple .bat script work?

With these avenues to be explored, can the pg_standby patch on the
CommitFest wiki be moved to the "Returned with Feedback" section?

Regards,
Martin

Re: [PATCHES] odd output in restore mode

From
"Heikki Linnakangas"
Date:
Martin Zaun wrote:
> Heikki Linnakangas wrote:
>> Andrew Dunstan wrote:
>>> Greg Smith wrote:
>>>> On Wed, 23 Jul 2008, Kevin Grittner wrote:
>>>>
>>>> I've been working on an improved archive_command shell script that I
>>>> expect to submit for comments and potential inclusion in the
>>>> documentation as a better base for other people to build on. This is
>>>> one of the options for how it can operate. It would be painful but
>>>> not impossible to convert a subset of that script to run under
>>>> Windows as well, at least enough to cover this particular issue.
>>>
>>> A Perl script using the (standard) File::Copy module along with the
>>> builtin function rename() should be moderately portable. It would to
>>> be nice not to have to maintain two scripts.
>>
>> It's also not very nice to require a Perl installation on Windows,
>> just for a replacement of Copy. Would a simple .bat script work?
>
> With these avenues to be explored, can the pg_standby patch on the
> CommitFest wiki be moved to the "Returned with Feedback" section?

Yes, I think we can conclude that we don't want this patch as it is.
Instead, we want a documentation patch that describes the problem,
mentioning that GNU cp is safe, or you can use the copy+rename trick.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [PATCHES] odd output in restore mode

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Martin Zaun wrote:
>> With these avenues to be explored, can the pg_standby patch on the
>> CommitFest wiki be moved to the "Returned with Feedback" section?

> Yes, I think we can conclude that we don't want this patch as it is.
> Instead, we want a documentation patch that describes the problem,
> mentioning that GNU cp is safe, or you can use the copy+rename trick.

Right, after which we remove the presently hacked-in delay.

I've updated the commitfest page accordingly.

            regards, tom lane

Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Thu, 2008-07-31 at 12:32 -0400, Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> > Martin Zaun wrote:
> >> With these avenues to be explored, can the pg_standby patch on the
> >> CommitFest wiki be moved to the "Returned with Feedback" section?
>
> > Yes, I think we can conclude that we don't want this patch as it is.
> > Instead, we want a documentation patch that describes the problem,
> > mentioning that GNU cp is safe, or you can use the copy+rename trick.
>
> Right, after which we remove the presently hacked-in delay.
>
> I've updated the commitfest page accordingly.

Well, this is a strange conclusion, leaving me slightly bemused.

The discussion between Andrew and I at PGcon concluded that we would
* document which other tools to use
* remove the delay

Now we have rejected the patch which does that, but then re-requested
the exact same thing again.

The patch interprets "remove the delay" as "remove the delay in a way
which will not screw up existing users of pg_standby when they upgrade".
Doing that requires us to have a configurable delay, which defaults to
the current behaviour, but that can be set to zero (the recommended
way). Which is what the patch implements.

Andrew, Heikki: ISTM its time to just make the changes yourselves. This
is just going round and round to no benefit. This doesn't warrant such a
long discussion and review process.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] odd output in restore mode

From
Andrew Dunstan
Date:

Simon Riggs wrote:
> Well, this is a strange conclusion, leaving me slightly bemused.
>
> The discussion between Andrew and I at PGcon concluded that we would
> * document which other tools to use
> * remove the delay
>
> Now we have rejected the patch which does that, but then re-requested
> the exact same thing again.
>
> The patch interprets "remove the delay" as "remove the delay in a way
> which will not screw up existing users of pg_standby when they upgrade".
> Doing that requires us to have a configurable delay, which defaults to
> the current behaviour, but that can be set to zero (the recommended
> way). Which is what the patch implements.
>
> Andrew, Heikki: ISTM its time to just make the changes yourselves. This
> is just going round and round to no benefit. This doesn't warrant such a
> long discussion and review process.
>

You ought to know by now that the length and ferocity of the discussion
bears no relation at all to the importance of the subject ;-)

Personally, I think it's reasonable to provide the delay as long as it's
switchable, although I would have preferred zero to be the default. If
we remove it altogether then we force bigger changes on people who are
currently using Windows copy. But I can live with that since changing
their archive_command is the better path by far anyway, either to use
Gnu cp or the copy / rename trick.

cheers

andrew



Re: [PATCHES] odd output in restore mode

From
Bruce Momjian
Date:
Have we made any progress on this, namely better documentation and
removing the Win32 delay code?

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> 
> Simon Riggs wrote:
> > Well, this is a strange conclusion, leaving me slightly bemused.
> >
> > The discussion between Andrew and I at PGcon concluded that we would
> > * document which other tools to use
> > * remove the delay
> >
> > Now we have rejected the patch which does that, but then re-requested
> > the exact same thing again.
> >
> > The patch interprets "remove the delay" as "remove the delay in a way
> > which will not screw up existing users of pg_standby when they upgrade".
> > Doing that requires us to have a configurable delay, which defaults to
> > the current behaviour, but that can be set to zero (the recommended
> > way). Which is what the patch implements.
> >
> > Andrew, Heikki: ISTM its time to just make the changes yourselves. This
> > is just going round and round to no benefit. This doesn't warrant such a
> > long discussion and review process.
> >   
> 
> You ought to know by now that the length and ferocity of the discussion 
> bears no relation at all to the importance of the subject ;-)
> 
> Personally, I think it's reasonable to provide the delay as long as it's 
> switchable, although I would have preferred zero to be the default. If 
> we remove it altogether then we force bigger changes on people who are 
> currently using Windows copy. But I can live with that since changing 
> their archive_command is the better path by far anyway, either to use 
> Gnu cp or the copy / rename trick.
> 
> cheers
> 
> andrew
> 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] odd output in restore mode

From
Andrew Dunstan
Date:
I have a fairly large TODO list, and Simon has thrown in the towel (and 
I imagine he also has a large TODO list).

anyone else want to step in?

cheers

andrew


Bruce Momjian wrote:
> Have we made any progress on this, namely better documentation and
> removing the Win32 delay code?
>
> ---------------------------------------------------------------------------
>
> Andrew Dunstan wrote:
>   
>> Simon Riggs wrote:
>>     
>>> Well, this is a strange conclusion, leaving me slightly bemused.
>>>
>>> The discussion between Andrew and I at PGcon concluded that we would
>>> * document which other tools to use
>>> * remove the delay
>>>
>>> Now we have rejected the patch which does that, but then re-requested
>>> the exact same thing again.
>>>
>>> The patch interprets "remove the delay" as "remove the delay in a way
>>> which will not screw up existing users of pg_standby when they upgrade".
>>> Doing that requires us to have a configurable delay, which defaults to
>>> the current behaviour, but that can be set to zero (the recommended
>>> way). Which is what the patch implements.
>>>
>>> Andrew, Heikki: ISTM its time to just make the changes yourselves. This
>>> is just going round and round to no benefit. This doesn't warrant such a
>>> long discussion and review process.
>>>   
>>>       
>> You ought to know by now that the length and ferocity of the discussion 
>> bears no relation at all to the importance of the subject ;-)
>>
>> Personally, I think it's reasonable to provide the delay as long as it's 
>> switchable, although I would have preferred zero to be the default. If 
>> we remove it altogether then we force bigger changes on people who are 
>> currently using Windows copy. But I can live with that since changing 
>> their archive_command is the better path by far anyway, either to use 
>> Gnu cp or the copy / rename trick.
>>
>> cheers
>>
>> andrew
>>
>>     
>
>   


Re: [PATCHES] odd output in restore mode

From
Bruce Momjian
Date:
Martin Zaun wrote:
> 4. Issue: missing break in switch, silent override of '-l' argument?
> 
> This behaviour has been in there before and is not addresses by the
> patch: The user-selected Win32 "mklink" command mode is never applied
> due to a missing 'break' in CustomizableInitialize():
> 
>      switch (restoreCommandType)
>      {
>          case RESTORE_COMMAND_WIN32_MKLINK:
>              SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
>          case RESTORE_COMMAND_WIN32_COPY:
>              SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
>              break;

I have added the missing 'break' to CVS HEAD;  thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] odd output in restore mode

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> Martin Zaun wrote:
>> 4. Issue: missing break in switch, silent override of '-l' argument?
>>
>> This behaviour has been in there before and is not addresses by the
>> patch: The user-selected Win32 "mklink" command mode is never applied
>> due to a missing 'break' in CustomizableInitialize():
>>
>>      switch (restoreCommandType)
>>      {
>>          case RESTORE_COMMAND_WIN32_MKLINK:
>>              SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
>>          case RESTORE_COMMAND_WIN32_COPY:
>>              SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
>>              break;
> 
> I have added the missing 'break' to CVS HEAD;  thanks.

Why no backpatch to 8.3? Seems like a clear bugfix to me.

//Magnus



Re: [PATCHES] odd output in restore mode

From
Bruce Momjian
Date:
Since this patch was rejected, I have added the attached documentation
to pg_standby to mention the sleep() we do.

---------------------------------------------------------------------------

Martin Zaun wrote:
>
> Below my comments on the CommitFest patch:
>    pg_standby minor changes for Windows
>
> Simon, I'm sorry you got me, a Postgres newbie, signed up for
> reviewing your patch ;)
>
> To start with, I'm not quite sure of the status of this patch
> since Bruce's last comment on the -patches alias:
>
> Bruce Momjian wrote:
>  > OK, based on these observations I think we need to learn more about the
>  > issues before making any changes to our code.
>
>  From easy to difficult:
>
> 1. Issues with applying the patch to CVS HEAD:
>
> The second file in the patch
>    Index: doc/src/sgml/standby.sgml
> appears to be misnamed -- the existing file in HEAD is
>    Index: doc/src/sgml/pgstandby.sgml
>
> However, still had issues after fixing the file name:
>
> md@Garu:~/pg/pgsql$ patch -c -p0 < ../pg_standby.patch
> patching file contrib/pg_standby/pg_standby.c
> patching file doc/src/sgml/pgstandby.sgml
> Hunk #1 FAILED at 136.
> Hunk #2 FAILED at 168.
> Hunk #3 FAILED at 245.
> Hunk #4 FAILED at 255.
> 4 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/pgstandby.sgml.rej
>
>
> 2. Missing description for new command-line options in pgstandby.sgml
>
> Simon Riggs wrote:
>  > Patch implements
>  > * recommendation to use GnuWin32 cp on Windows
>
> Saw that in the changes to pgstandby.sgml, and looks ok to me, but:
> - no description of the proposed new command-line options -h and -p?
>
>
> 3. No coding style issues seen
>
> Just one comment: the logic that selects the actual restore command to
> be used has moved from CustomizableInitialize() to main() -- a matter
> of personal taste, perhaps.  But in my view the:
> + the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read
>
>
> 4. Issue: missing break in switch, silent override of '-l' argument?
>
> This behaviour has been in there before and is not addresses by the
> patch: The user-selected Win32 "mklink" command mode is never applied
> due to a missing 'break' in CustomizableInitialize():
>
>      switch (restoreCommandType)
>      {
>          case RESTORE_COMMAND_WIN32_MKLINK:
>              SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
>          case RESTORE_COMMAND_WIN32_COPY:
>              SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
>              break;
>
> A similar behaviour on Non-Win32 platforms where the user-selected
> "ln" may be silently changed to "cp" in main():
>
> #if HAVE_WORKING_LINK
>                  restoreCommandType = RESTORE_COMMAND_LN;
> #else
>                  restoreCommandType = RESTORE_COMMAND_CP;
> #endif
>
> If both Win32/Non-Win32 cases reflect the intended behaviour:
> - I'd prefer a code comment in the above case-fall-through,
> - suggest a message to the user about the ignored "ln" / "mklink",
> - observe that the logic to override of the '-l' option is now in two
>    places: CustomizableInitialize() and main().
>
>
> 5. Minor wording issue in usage message on new '-p' option
>
> I was wondering if the "always" in the usage text
>      fprintf(stderr, "  -p           always uses GNU compatible 'cp' command on all platforms\n");
> is too strong, since multiple restore command options overwrite each
> other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".
>
>
> 6. Minor code comment suggestion
>
> Unrelated to this patch, I wonder if the code comments on all four
> time-related vars better read "seconds" instead of "amount of time":
> int         sleeptime = 5;      /* amount of time to sleep between file checks */
> int         holdtime = 0;       /* amount of time to wait once file appears full */
> int         waittime = -1;      /* how long we have been waiting, -1 no wait
>                                   * yet */
> int         maxwaittime = 0;    /* how long are we prepared to wait for? */
>
>
> 7. Question: benefits of separate holdtime option from sleeptime?
>
> Simon Riggs wrote:
>  > * provide "holdtime" delay, default 0 (on all platforms)
>
> Going back on the hackers+patches emails and parsing the code
> comments, I'm sorry if I missed that, but I'm not sure I've understood
> the exact tuning benefits that introducing the new holdtime option
> provides over using the existing sleeptime, as it's been the case
> (just on Win32 only).
>
>
> 8. Unresolved question of implementing now/later a "cp" replacement
>
> Simon Riggs wrote:
>  > On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote:
>  >> This seems pretty kludgey to me. I wouldn't want to install GnuWin32
>  >> utilities on a production system just for the "cp" command, and I don't
>  >> know how I would tune holdtime properly for using "copy". And it seems
>  >> risky to have defaults that are known to not work reliably.
>  >>
>  >> How about implementing a replacement function for "cp" ourselves? It
>  >> seems pretty trivial to do. We could use that on Unixes as well, which
>  >> would keep the differences between Win32 and other platforms smaller,
>  >> and thus ensure the codepath gets more testing.
>  >
>  > If you've heard complaints about any of this from users, I haven't.
>  > AFAIK we're doing this because it *might* cause a problem. Bear in mind
>  > that link is the preferred performance option, not copy. So AFAICS we're
>  > tuning a secondary option on one specific port, without it being a
>  > raised issue and in an area of code that will be superceded in the next
>  > release.
>  >
>  > So further embellishments would be a long way down my own priority list,
>  > putting it politely. Yet I have no objections to the suggestion overall;
>  > we have done that already for alter tablespace.
>
> Don't have much to add to the whether/now/later question of providing
> a "cp" replacement, but I guess the existing command-line options and
> documentation wouldn't have to change with our own "cp" replacement
> while the newly proposed '-h' and '-p' would become moot then, right?
>
> Regards,
> Martin

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: pgstandby.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/pgstandby.sgml,v
retrieving revision 2.5
diff -c -r2.5 pgstandby.sgml
*** pgstandby.sgml    7 May 2008 18:48:40 -0000    2.5
--- pgstandby.sgml    15 Dec 2008 22:04:09 -0000
***************
*** 295,301 ****
    </itemizedlist>

    <para>
!    Since the Windows example uses <literal>copy</> at both ends, either
     or both servers might be accessing the archive directory across the
     network.
    </para>
--- 295,310 ----
    </itemizedlist>

    <para>
!    The <literal>copy</> command on Windows sets the final file size
!    before the file is completely copied, which would ordinarly confuse
!    <application>pg_standby</application>.  Therefore
!    <application>pg_standby</application> waits <literal>sleeptime</>
!    seconds once it sees the proper file size.  GNUWin32's <literal>cp</>
!    sets the file size only after the file copy is complete.
!   </para>
!
!   <para>
!    Using the Since the Windows example uses <literal>copy</> at both ends, either
     or both servers might be accessing the archive directory across the
     network.
    </para>

Re: [PATCHES] odd output in restore mode

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> Bruce Momjian wrote:
> > Martin Zaun wrote:
> >> 4. Issue: missing break in switch, silent override of '-l' argument?
> >>
> >> This behaviour has been in there before and is not addresses by the
> >> patch: The user-selected Win32 "mklink" command mode is never applied
> >> due to a missing 'break' in CustomizableInitialize():
> >>
> >>      switch (restoreCommandType)
> >>      {
> >>          case RESTORE_COMMAND_WIN32_MKLINK:
> >>              SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
> >>          case RESTORE_COMMAND_WIN32_COPY:
> >>              SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
> >>              break;
> > 
> > I have added the missing 'break' to CVS HEAD;  thanks.
> 
> Why no backpatch to 8.3? Seems like a clear bugfix to me.

I knew that was going to be asked.  At this point I am pulling comments
from rejected patches into CVS commits;  these are not even submitted
patches.  I am not comfortable backpatching anything when using that
system because obviously no one else even cared enough to submit a patch
for it, let alone test it.  If someone wants to batckpatch this or
submit a patch to be backpatched, that is fine.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] odd output in restore mode

From
Bruce Momjian
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> > Martin Zaun wrote:
> >> With these avenues to be explored, can the pg_standby patch on the
> >> CommitFest wiki be moved to the "Returned with Feedback" section?
> 
> > Yes, I think we can conclude that we don't want this patch as it is. 
> > Instead, we want a documentation patch that describes the problem, 
> > mentioning that GNU cp is safe, or you can use the copy+rename trick.
> 
> Right, after which we remove the presently hacked-in delay.
> 
> I've updated the commitfest page accordingly.

I have documented the sleep() call and that GNU cp is safe, but did not
remove the delay, nor mention copy+rename.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] odd output in restore mode

From
Simon Riggs
Date:
On Mon, 2008-12-15 at 17:10 -0500, Bruce Momjian wrote:
> > 
> > Why no backpatch to 8.3? Seems like a clear bugfix to me.
> 
> I knew that was going to be asked. 

8.3 is really where this is needed. 8.4 has almost no need of this.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support