Re: Incomplete docs for restore_command for hot standby - Mailing list pgsql-bugs

From Markus Bertheau
Subject Re: Incomplete docs for restore_command for hot standby
Msg-id 684362e10802250356w1fe820f8i2d5207801c18daf0@mail.gmail.com
In response to Re: Incomplete docs for restore_command for hot standby  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-bugs
2008/2/22, Simon Riggs <simon@2ndquadrant.com>:
> On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
>  >
>  > Section 24.3.3.1 states about restore_command:
>  >
>  > "The command will be asked for file names that are not present in the
>  > archive; it must return nonzero when so asked."
>  >
>  > Section 24.4.1 further states:
>  >
>  > "The magic that makes the two loosely coupled servers work together is
>  > simply a restore_command used on the standby that waits for the next
>  > WAL file to become available from the primary."
>  >
>  > It is not clear from the first paragraph, whether the non-existing
>  > file that restore_command is being asked for is a not-yet-generated
>  > WAL file or something different. If it was a not-yet-generated WAL
>  > file, restore_command for replication would have to wait for it to
>  > appear. If it was something different, restore_command for replication
>  > would have to return an error right away. (Because else it would hang
>  > indefinitely, waiting for a file that is not going to appear). Yet I
>  > couldn't find hints in the documentation as to how these two cases can
>  > be detected by restore_command, i.e. how restore_command should tell a
>  > request for a WAL file from a request for a non-WAL file.
>
>
> The two sentences aren't mutually exclusive, especially when you
>  consider they are discussing two different use cases. Why not read up on
>  pg_standby anyway?

I have read about pg_standby, but my point is not to solve a particular
problem; it is that information is missing from the docs.

>  > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
>  > shows that this is a problem, and people use unproved heuristics
>  > ('history' substring in the requested file name).
>
>
> Old email written during beta. Read at your own peril.

The email may be old, but the problem at hand is still relevant.

>  > Additionally, 24.3.3 contains slightly misleading information:
>  >
>  > "It is important that the command return nonzero exit status on
>  > failure. The command will be asked for log files that are not present
>  > in the archive; it must return nonzero when so asked. This is not an
>  > error condition."
>  >
>  > This suggests that all non-existing files that restore_command will be
>  > asked for are log files. One could therefore reasonably assume that
>  > restore_command for replication should wait on all non-existing files.
>  > 24.3.3.1 later corrects this by stating that not only log files may be
>  > requested, but nevertheless.
>
>
> If you have some suggested changes, I'd be happy to hear them.
>
>  Probably additions are better than just changes though.

What about this:

*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'

     <para>
      It is important that the command return nonzero exit status on failure.
!     The command <emphasis>will</> be asked for log files that are not present
!     in the archive; it must return nonzero when so asked.  This is not an
!     error condition.  Be aware also that the base name of the <literal>%p</>
!     path will be different from <literal>%f</>; do not expect them to be
!     interchangeable.
     </para>

     <para>
--- 1001,1011 ----

     <para>
      It is important that the command return nonzero exit status on failure.
!     The command <emphasis>will</> be asked for log and other files that are
!     not present in the archive; it must return nonzero when so asked.  This is
!     not an error condition.  Be aware also that the base name of the
!     <literal>%p</> path will be different from <literal>%f</>; do not expect
!     them to be interchangeable.
     </para>

     <para>
***************
*** 1576,1594 **** archive_command = 'local_backup_script.sh'

     <para>
      The magic that makes the two loosely coupled servers work together is
!     simply a <varname>restore_command</> used on the standby that waits
!     for the next WAL file to become available from the primary. The
!     <varname>restore_command</> is specified in the
      <filename>recovery.conf</> file on the standby server. Normal recovery
      processing would request a file from the WAL archive, reporting failure
      if the file was unavailable.  For standby processing it is normal for
!     the next file to be unavailable, so we must be patient and wait for
!     it to appear. A waiting <varname>restore_command</> can be written as
!     a custom script that loops after polling for the existence of the next
!     WAL file. There must also be some way to trigger failover, which should
!     interrupt the <varname>restore_command</>, break the loop and return
!     a file-not-found error to the standby server. This ends recovery and
!     the standby will then come up as a normal server.
     </para>

     <para>
--- 1576,1596 ----

     <para>
      The magic that makes the two loosely coupled servers work together is
!     simply a <varname>restore_command</> used on the standby that, when asked
!     for a WAL file, waits for it to become available from the primary.
!     The <varname>restore_command</> is specified in the
      <filename>recovery.conf</> file on the standby server. Normal recovery
      processing would request a file from the WAL archive, reporting failure
      if the file was unavailable.  For standby processing it is normal for
!     the next WAL file to be unavailable, so we must be patient and wait for
!     it to appear. For non-WAL files, though, the script must still report
!     failure. WAL files can be distinguished from non-WAL files by FIXME. A
!     waiting <varname>restore_command</> can be written as a custom script that
!     loops after polling for the existence of the next WAL file. There must
!     also be some way to trigger failover, which should interrupt the
!     <varname>restore_command</>, break the loop and return a file-not-found
!     error to the standby server. This ends recovery and the standby will then
!     come up as a normal server.
     </para>

     <para>

The FIXME, of course, needs to be filled in by someone who knows the answer.
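
To make the question concrete, here is a rough sketch of the kind of waiting
script the paragraph describes. It assumes, without confirmation from the
docs, that WAL segment files are exactly those whose names consist of 24
uppercase hex digits, and that anything else (for example *.history files)
must fail immediately when missing. The function names, the trigger-file
convention and the script name wait_restore.py are made up for illustration;
this is not how pg_standby or any official script is known to work.

```python
import os
import re
import shutil
import sys
import time

# ASSUMPTION (unconfirmed by the docs): a WAL segment file name is exactly
# 24 uppercase hex digits; every other name is treated as a non-WAL file.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def is_wal_segment(file_name):
    """Return True if file_name looks like a plain WAL segment name."""
    return WAL_SEGMENT_RE.fullmatch(file_name) is not None


def restore(archive_dir, file_name, dest_path, trigger_path,
            poll_seconds=1.0, sleep=time.sleep):
    """Copy file_name from archive_dir to dest_path and return an exit status.

    Returns 0 once the file has been copied. Returns 1 (file not found) when
    a non-WAL file is missing, or when the hypothetical trigger file exists
    while we are waiting for a WAL segment.
    """
    src = os.path.join(archive_dir, file_name)
    while True:
        if os.path.exists(src):
            shutil.copy(src, dest_path)
            return 0
        if not is_wal_segment(file_name):
            # Missing non-WAL file: report failure right away, never wait.
            return 1
        if os.path.exists(trigger_path):
            # Failover requested: break the loop with file-not-found.
            return 1
        sleep(poll_seconds)


# Hypothetical wiring in recovery.conf:
#   restore_command = 'python wait_restore.py /mnt/server/archivedir %f %p /tmp/failover.trigger'
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(restore(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
```

The sketch deliberately ignores the problem of a segment that is still being
copied into the archive when it is first seen; a real script would need some
guard against picking up a partially written file.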

Markus Bertheau
Blog: http://www.bluetwanger.de/blog/
