Re: [BUGS] Incomplete docs for restore_command for hot standby - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [BUGS] Incomplete docs for restore_command for hot standby |
Date | |
Msg-id | 200803032114.m23LE2706849@momjian.us |
In response to | Re: [BUGS] Incomplete docs for restore_command for hot standby ("Markus Bertheau" <mbertheau.pg@googlemail.com>) |
List | pgsql-patches |

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Markus Bertheau wrote:
> 2008/2/22, Simon Riggs <simon@2ndquadrant.com>:
> > On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
> >
> > > Section 24.3.3.1 states about restore_command:
> > >
> > > "The command will be asked for file names that are not present in the
> > > archive; it must return nonzero when so asked."
> > >
> > > Section 24.4.1 further states:
> > >
> > > "The magic that makes the two loosely coupled servers work together is
> > > simply a restore_command used on the standby that waits for the next
> > > WAL file to become available from the primary."
> > >
> > > It is not clear from the first paragraph, whether the non-existing
> > > file that restore_command is being asked for is a not-yet-generated
> > > WAL file or something different. If it was a not-yet-generated WAL
> > > file, restore_command for replication would have to wait for it to
> > > appear. If it was something different, restore_command for replication
> > > would have to return an error right away. (Because else it would hang
> > > indefinitely, waiting for a file that is not going to appear). Yet I
> > > couldn't find hints in the documentation as to how these two cases can
> > > be detected by restore_command, i.e. how restore_command should tell a
> > > request for a WAL file from a request for a non-WAL file.
> >
> >
> > The two sentences aren't mutually exclusive, especially when you
> > consider they are discussing two different use cases. Why not read up on
> > pg_standby anyway?
>
> I read about pg_standby, but this is not about solving a particular problem but
> about missing information in the docs.
>
> > > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
> > > shows that this is a problem, and people use unproved heuristics
> > > ('history' substring in the requested file name).
> >
> >
> > Old email written during beta. Read at your own peril.
>
> The email may be old, but the problem at hand is still relevant.
>
> > > Additionally, 24.3.3 contains slightly misleading information:
> > >
> > > "It is important that the command return nonzero exit status on
> > > failure. The command will be asked for log files that are not present
> > > in the archive; it must return nonzero when so asked. This is not an
> > > error condition."
> > >
> > > This suggests that all non-existing files that restore_command will be
> > > asked for are log files. One could therefore reasonably assume that
> > > restore_command for replication should wait on all non-existing files.
> > > 24.3.3.1 later corrects this by stating that not only log files may be
> > > requested, but nevertheless.
> >
> >
> > If you have some suggested changes, I'd be happy to hear them.
> >
> > Probably additions are better than just changes though.
>
> What about this:
>
> *** a/doc/src/sgml/backup.sgml
> --- b/doc/src/sgml/backup.sgml
> ***************
> *** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'
>
>      <para>
>       It is important that the command return nonzero exit status on failure.
> !     The command <emphasis>will</> be asked for log files that are not present
> !     in the archive; it must return nonzero when so asked. This is not an
> !     error condition. Be aware also that the base name of the <literal>%p</>
> !     path will be different from <literal>%f</>; do not expect them to be
> !     interchangeable.
>      </para>
>
>      <para>
> --- 1001,1011 ----
>
>      <para>
>       It is important that the command return nonzero exit status on failure.
> !     The command <emphasis>will</> be asked for log and other files that are
> !     not present in the archive; it must return nonzero when so asked. This is
> !     not an error condition. Be aware also that the base name of the
> !     <literal>%p</> path will be different from <literal>%f</>; do not expect
> !     them to be interchangeable.
>      </para>
>
>      <para>
> ***************
> *** 1576,1594 **** archive_command = 'local_backup_script.sh'
>
>      <para>
>       The magic that makes the two loosely coupled servers work together is
> !     simply a <varname>restore_command</> used on the standby that waits
> !     for the next WAL file to become available from the primary. The
> !     <varname>restore_command</> is specified in the
>       <filename>recovery.conf</> file on the standby server. Normal recovery
>       processing would request a file from the WAL archive, reporting failure
>       if the file was unavailable. For standby processing it is normal for
> !     the next file to be unavailable, so we must be patient and wait for
> !     it to appear. A waiting <varname>restore_command</> can be written as
> !     a custom script that loops after polling for the existence of the next
> !     WAL file. There must also be some way to trigger failover, which should
> !     interrupt the <varname>restore_command</>, break the loop and return
> !     a file-not-found error to the standby server. This ends recovery and
> !     the standby will then come up as a normal server.
>      </para>
>
>      <para>
> --- 1576,1596 ----
>
>      <para>
>       The magic that makes the two loosely coupled servers work together is
> !     simply a <varname>restore_command</> used on the standby that, when asked
> !     for a WAL file, waits for it to become available from the primary.
> !     The <varname>restore_command</> is specified in the
>       <filename>recovery.conf</> file on the standby server. Normal recovery
>       processing would request a file from the WAL archive, reporting failure
>       if the file was unavailable. For standby processing it is normal for
> !     the next WAL file to be unavailable, so we must be patient and wait for
> !     it to appear. For non-WAL files though the script must still report
> !     failure. WAL files can be distinguished from non-WAL files by FIXME. A
> !     waiting <varname>restore_command</> can be written as a custom script that
> !     loops after polling for the existence of the next WAL file. There must
> !     also be some way to trigger failover, which should interrupt the
> !     <varname>restore_command</>, break the loop and return a file-not-found
> !     error to the standby server. This ends recovery and the standby will then
> !     come up as a normal server.
>      </para>
>
>      <para>
>
> The FIXME of course needs replacement by someone in the know.
>
> Markus Bertheau
> Blog: http://www.bluetwanger.de/blog/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
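
For readers following the thread, the waiting restore_command that the patch describes could be sketched roughly as below. This is only an illustration, not pg_standby and not anything proposed in the thread. The names `is_wal_segment`, `restore`, and the trigger-file convention are hypothetical. The rule used to tell WAL files from other files (a name of exactly 24 hexadecimal digits) is an assumption standing in for the FIXME above, which the thread leaves open.

```python
#!/usr/bin/env python3
"""Illustrative waiting restore_command for a warm standby (sketch only)."""
import os
import re
import shutil
import sys
import time

# ASSUMPTION: a WAL segment file name is exactly 24 hexadecimal digits.
# Any other name (for example one ending in ".history" or ".backup") is
# treated as a non-WAL file. The thread itself leaves this rule as a FIXME.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def is_wal_segment(file_name):
    """Return True if file_name looks like a WAL segment under the assumption above."""
    return WAL_SEGMENT_RE.fullmatch(file_name) is not None


def restore(archive_dir, file_name, dest_path, trigger_path, poll_seconds=1.0):
    """Copy file_name from archive_dir to dest_path.

    Returns 0 on success and 1 for "file not found". A missing WAL segment
    is polled for until it appears or the (hypothetical) trigger file
    exists; a missing non-WAL file fails immediately.
    """
    source = os.path.join(archive_dir, file_name)
    while True:
        if os.path.exists(source):
            shutil.copy(source, dest_path)
            return 0
        if not is_wal_segment(file_name):
            return 1  # non-WAL file is absent: report failure right away
        if os.path.exists(trigger_path):
            return 1  # failover requested: stop waiting, end recovery
        time.sleep(poll_seconds)


# Only act when invoked with exactly four arguments:
#   archive_dir  %f  %p  trigger_path
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(restore(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
```

With the script saved under a hypothetical path, the standby's recovery.conf might contain something like `restore_command = 'python3 /path/to/wait_restore.py /mnt/server/archivedir %f %p /path/to/trigger'`. Creating the trigger file then makes the next wait return nonzero, which matches the failover behaviour described in the patched paragraph. The sketch does not guard against copying a segment that is still being written into the archive.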