Re: Incomplete docs for restore_command for hot standby - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: Incomplete docs for restore_command for hot standby
Msg-id 200803040332.m243Wvu01187@momjian.us
In response to Re: Incomplete docs for restore_command for hot standby  ("Markus Bertheau" <mbertheau.pg@googlemail.com>)
List pgsql-bugs
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Markus Bertheau wrote:
> 2008/2/22, Simon Riggs <simon@2ndquadrant.com>:
> > On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
> >  >
> >  > Section 24.3.3.1 states about restore_command:
> >  >
> >  > "The command will be asked for file names that are not present in the
> >  > archive; it must return nonzero when so asked."
> >  >
> >  > Section 24.4.1 further states:
> >  >
> >  > "The magic that makes the two loosely coupled servers work together is
> >  > simply a restore_command used on the standby that waits for the next
> >  > WAL file to become available from the primary."
> >  >
> >  > It is not clear from the first paragraph whether the non-existing
> >  > file that restore_command is being asked for is a not-yet-generated
> >  > WAL file or something different. If it was a not-yet-generated WAL
> >  > file, restore_command for replication would have to wait for it to
> >  > appear. If it was something different, restore_command for replication
> >  > would have to return an error right away. (Otherwise it would hang
> >  > indefinitely, waiting for a file that is not going to appear). Yet I
> >  > couldn't find hints in the documentation as to how these two cases can
> >  > be detected by restore_command, i.e. how restore_command should tell a
> >  > request for a WAL file from a request for a non-WAL file.
> >
> >
> > The two sentences aren't mutually exclusive, especially when you
> >  consider they are discussing two different use cases. Why not read up on
> >  pg_standby anyway?
>
> I read about pg_standby, but this is about missing information in the docs,
> not about solving a particular problem.
>
> >  > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
> >  > shows that this is a problem, and people use unproven heuristics
> >  > ('history' substring in the requested file name).
> >
> >
> > Old email written during beta. Read at your own peril.
>
> The email may be old, but the problem at hand is still relevant.
>
> >  > Additionally, 24.3.3 contains slightly misleading information:
> >  >
> >  > "It is important that the command return nonzero exit status on
> >  > failure. The command will be asked for log files that are not present
> >  > in the archive; it must return nonzero when so asked. This is not an
> >  > error condition."
> >  >
> >  > This suggests that all non-existing files that restore_command will be
> >  > asked for are log files. One could therefore reasonably assume that
> >  > restore_command for replication should wait on all non-existing files.
> >  > 24.3.3.1 later corrects this by stating that not only log files may be
> >  > requested, but the earlier wording is misleading nevertheless.
> >
> >
> > If you have some suggested changes, I'd be happy to hear them.
> >
> >  Probably additions are better than just changes though.
>
> What about this:
>
> *** a/doc/src/sgml/backup.sgml
> --- b/doc/src/sgml/backup.sgml
> ***************
> *** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'
>
>      <para>
>       It is important that the command return nonzero exit status on failure.
> !     The command <emphasis>will</> be asked for log files that are not present
> !     in the archive; it must return nonzero when so asked.  This is not an
> !     error condition.  Be aware also that the base name of the <literal>%p</>
> !     path will be different from <literal>%f</>; do not expect them to be
> !     interchangeable.
>      </para>
>
>      <para>
> --- 1001,1011 ----
>
>      <para>
>       It is important that the command return nonzero exit status on failure.
> !     The command <emphasis>will</> be asked for log and other files that are
> !     not present in the archive; it must return nonzero when so asked.  This is
> !     not an error condition.  Be aware also that the base name of the
> !     <literal>%p</> path will be different from <literal>%f</>; do not expect
> !     them to be interchangeable.
>      </para>
>
>      <para>
> ***************
> *** 1576,1594 **** archive_command = 'local_backup_script.sh'
>
>      <para>
>       The magic that makes the two loosely coupled servers work together is
> !     simply a <varname>restore_command</> used on the standby that waits
> !     for the next WAL file to become available from the primary. The
> !     <varname>restore_command</> is specified in the
>       <filename>recovery.conf</> file on the standby server. Normal recovery
>       processing would request a file from the WAL archive, reporting failure
>       if the file was unavailable.  For standby processing it is normal for
> !     the next file to be unavailable, so we must be patient and wait for
> !     it to appear. A waiting <varname>restore_command</> can be written as
> !     a custom script that loops after polling for the existence of the next
> !     WAL file. There must also be some way to trigger failover, which should
> !     interrupt the <varname>restore_command</>, break the loop and return
> !     a file-not-found error to the standby server. This ends recovery and
> !     the standby will then come up as a normal server.
>      </para>
>
>      <para>
> --- 1576,1596 ----
>
>      <para>
>       The magic that makes the two loosely coupled servers work together is
> !     simply a <varname>restore_command</> used on the standby that, when asked
> !     for a WAL file, waits for it to become available from the primary.
> !     The <varname>restore_command</> is specified in the
>       <filename>recovery.conf</> file on the standby server. Normal recovery
>       processing would request a file from the WAL archive, reporting failure
>       if the file was unavailable.  For standby processing it is normal for
> !     the next WAL file to be unavailable, so we must be patient and wait for
> !     it to appear. For non-WAL files, though, the script must still report
> !     failure. WAL files can be distinguished from non-WAL files by FIXME. A
> !     waiting <varname>restore_command</> can be written as a custom script that
> !     loops after polling for the existence of the next WAL file. There must
> !     also be some way to trigger failover, which should interrupt the
> !     <varname>restore_command</>, break the loop and return a file-not-found
> !     error to the standby server. This ends recovery and the standby will then
> !     come up as a normal server.
>      </para>
>
>      <para>
>
> The FIXME of course needs replacement by someone in the know.
>
> Markus Bertheau
> Blog: http://www.bluetwanger.de/blog/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
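
For illustration only, here is a minimal sketch of the kind of waiting
restore_command the quoted patch describes. It is not a replacement for
the FIXME above. It assumes that a WAL segment file name is exactly 24
upper-case hexadecimal digits, and that anything else (for example a
name ending in .history or .backup) is a non-WAL file that must fail
immediately when missing. That naming rule, the trigger file path, the
poll interval and all function names are assumptions made for this
sketch, not something the docs under discussion state.

```python
"""Sketch of a waiting restore_command (all names here are hypothetical)."""
import os
import re
import shutil
import sys
import time

# Assumption: a WAL segment name is exactly 24 upper-case hex digits.
WAL_SEGMENT = re.compile(r"[0-9A-F]{24}")

TRIGGER_FILE = "/tmp/pgsql.trigger"  # hypothetical failover trigger path
POLL_SECONDS = 5                     # hypothetical poll interval


def is_wal_segment(file_name):
    """Return True if file_name looks like a WAL segment under the assumption above."""
    return WAL_SEGMENT.fullmatch(file_name) is not None


def restore(archive_dir, file_name, dest_path,
            trigger=TRIGGER_FILE, poll=POLL_SECONDS):
    """Copy file_name from the archive to dest_path.

    Returns 0 on success and 1 when the file is not available.
    """
    source = os.path.join(archive_dir, file_name)
    while True:
        if os.path.exists(source):
            shutil.copy(source, dest_path)
            return 0
        if not is_wal_segment(file_name):
            # Non-WAL file that is not in the archive: report failure at once.
            return 1
        if os.path.exists(trigger):
            # Failover requested: stop waiting and report file-not-found.
            return 1
        time.sleep(poll)


def main(argv):
    """Expects argv = [program, archive_dir, %f, %p]; returns an exit status."""
    if len(argv) != 4:
        return 2
    return restore(argv[1], argv[2], argv[3])
```

A script built around this would end with sys.exit(main(sys.argv)) and
could be configured along the lines of
restore_command = 'python3 restore_wait.py /mnt/server/archivedir %f %p',
where the script name is made up. The sketch does not guard against
copying a file that the archive side is still writing.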

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
