Re: Incomplete docs for restore_command for hot standby - Mailing list pgsql-bugs
From | Markus Bertheau |
---|---|
Subject | Re: Incomplete docs for restore_command for hot standby |
Date | |
Msg-id | 684362e10802250356w1fe820f8i2d5207801c18daf0@mail.gmail.com |
In response to | Re: Incomplete docs for restore_command for hot standby (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: [PATCHES] Incomplete docs for restore_command for hot standby; Re: Incomplete docs for restore_command for hot standby; Re: [PATCHES] Incomplete docs for restore_command for hot standby |
List | pgsql-bugs |
2008/2/22, Simon Riggs <simon@2ndquadrant.com>:
> On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
> >
> > Section 24.3.3.1 states about restore_command:
> >
> > "The command will be asked for file names that are not present in the
> > archive; it must return nonzero when so asked."
> >
> > Section 24.4.1 further states:
> >
> > "The magic that makes the two loosely coupled servers work together is
> > simply a restore_command used on the standby that waits for the next
> > WAL file to become available from the primary."
> >
> > It is not clear from the first paragraph, whether the non-existing
> > file that restore_command is being asked for is a not-yet-generated
> > WAL file or something different. If it was a not-yet-generated WAL
> > file, restore_command for replication would have to wait for it to
> > appear. If it was something different, restore_command for replication
> > would have to return an error right away. (Because else it would hang
> > indefinitely, waiting for a file that is not going to appear). Yet I
> > couldn't find hints in the documentation as to how these two cases can
> > be detected by restore_command, i.e. how restore_command should tell a
> > request for a WAL file from a request for a non-WAL file.
> >
> The two sentences aren't mutually exclusive, especially when you
> consider they are discussing two different use cases. Why not read up on
> pg_standby anyway?

I read about pg_standby, but this is not about solving a particular
problem but about missing information in the docs.

> > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
> > shows that this is a problem, and people use unproved heuristics
> > ('history' substring in the requested file name).
> >
> Old email written during beta. Read at your own peril.

The email may be old, but the problem at hand is still relevant.
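The open question above is how a waiting restore_command can tell a request for a WAL segment from any other request. A minimal sketch in Python, assuming PostgreSQL's file naming convention rather than any documented contract: a WAL segment name (the value of %f for a segment) is exactly 24 uppercase hexadecimal digits, while a timeline history file looks like `00000002.history`. The function name `is_wal_segment` is hypothetical, and this is only a stricter variant of the 'history' substring heuristic mentioned above, not the answer the documentation is missing.

```python
import re

# Assumption (naming convention, not a documented contract): a WAL segment
# file name is exactly 24 uppercase hex digits: 8 for the timeline ID,
# 8 for the log ID and 8 for the segment number.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def is_wal_segment(fname: str) -> bool:
    """Return True if the %f argument names a WAL segment.

    Anything else, such as a timeline history file ("00000002.history") or
    a backup history file ("...00000020.backup"), is treated as a non-WAL
    file that a standby should not wait for.
    """
    return WAL_SEGMENT_RE.fullmatch(fname) is not None


if __name__ == "__main__":
    for name in ("000000010000000000000003",
                 "00000002.history",
                 "000000010000000000000002.00000020.backup"):
        action = "wait for it" if is_wal_segment(name) else "return nonzero now"
        print(f"{name}: {action}")
```

Under this assumption, a standby's restore_command would wait only when `is_wal_segment(%f)` is true and would report any other missing file immediately with a nonzero exit status.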
> > Additionally, 24.3.3 contains slightly misleading information:
> >
> > "It is important that the command return nonzero exit status on
> > failure. The command will be asked for log files that are not present
> > in the archive; it must return nonzero when so asked. This is not an
> > error condition."
> >
> > This suggests that all non-existing files that restore_command will be
> > asked for are log files. One could therefore reasonably assume that
> > restore_command for replication should wait on all non-existing files.
> > 24.3.3.1 later corrects this by stating that not only log files may be
> > requested, but nevertheless.
> >
> If you have some suggested changes, I'd be happy to hear them.
>
> Probably additions are better than just changes though.

What about this:

*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'

  <para>
  It is important that the command return nonzero exit status on failure.
! The command <emphasis>will</> be asked for log files that are not present
! in the archive; it must return nonzero when so asked. This is not an
! error condition. Be aware also that the base name of the <literal>%p</>
! path will be different from <literal>%f</>; do not expect them to be
! interchangeable.
  </para>

  <para>
--- 1001,1011 ----

  <para>
  It is important that the command return nonzero exit status on failure.
! The command <emphasis>will</> be asked for log and other files that are
! not present in the archive; it must return nonzero when so asked. This is
! not an error condition. Be aware also that the base name of the
! <literal>%p</> path will be different from <literal>%f</>; do not expect
! them to be interchangeable.
  </para>

  <para>
***************
*** 1576,1594 **** archive_command = 'local_backup_script.sh'

  <para>
  The magic that makes the two loosely coupled servers work together is
! simply a <varname>restore_command</> used on the standby that waits
! for the next WAL file to become available from the primary. The
! <varname>restore_command</> is specified in the <filename>recovery.conf</>
  file on the standby server. Normal recovery processing would request a
  file from the WAL archive, reporting failure if the file was unavailable.
  For standby processing it is normal for
! the next file to be unavailable, so we must be patient and wait for
! it to appear. A waiting <varname>restore_command</> can be written as
! a custom script that loops after polling for the existence of the next
! WAL file. There must also be some way to trigger failover, which should
! interrupt the <varname>restore_command</>, break the loop and return
! a file-not-found error to the standby server. This ends recovery and
! the standby will then come up as a normal server.
  </para>

  <para>
--- 1576,1596 ----

  <para>
  The magic that makes the two loosely coupled servers work together is
! simply a <varname>restore_command</> used on the standby that, when asked
! for a WAL file, waits for it to become available from the primary.
! The <varname>restore_command</> is specified in the <filename>recovery.conf</>
  file on the standby server. Normal recovery processing would request a
  file from the WAL archive, reporting failure if the file was unavailable.
  For standby processing it is normal for
! the next WAL file to be unavailable, so we must be patient and wait for
! it to appear. For non-WAL files, though, the script must still report
! failure. WAL files can be distinguished from non-WAL files by FIXME. A
! waiting <varname>restore_command</> can be written as a custom script that
! loops after polling for the existence of the next WAL file. There must
! also be some way to trigger failover, which should interrupt the
! <varname>restore_command</>, break the loop and return a file-not-found
! error to the standby server. This ends recovery and the standby will then
! come up as a normal server.
  </para>

  <para>

The FIXME of course needs replacement by someone in the know.

Markus Bertheau
Blog: http://www.bluetwanger.de/blog/
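The waiting restore_command described in the proposed documentation text above (poll for the next WAL file, report missing non-WAL files at once, and break the loop when failover is triggered) could be sketched in Python roughly as follows. Everything here is an assumption for illustration: the script name `wait_restore.py`, its argument order, the trigger-file approach to failover, the one-second polling interval, and the 24-hex-digit rule for recognizing WAL segment names. It is neither pg_standby nor an official recipe.

```python
#!/usr/bin/env python3
"""Hypothetical waiting restore_command for a warm standby (sketch only).

Assumed recovery.conf entry (script path and directories are illustrative):
  restore_command = 'python3 /usr/local/bin/wait_restore.py /mnt/server/archivedir %f %p /tmp/pgsql.trigger'
"""
import os
import re
import shutil
import sys
import time

# Assumption: WAL segment names are 24 uppercase hex digits; anything else
# (e.g. "00000002.history") is a non-WAL file and is never waited for.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def restore(archive_dir, fname, dest_path, trigger_file, poll=1.0):
    """Copy archive_dir/fname to dest_path; return 0 on success, 1 if absent."""
    src = os.path.join(archive_dir, fname)
    wait = WAL_SEGMENT_RE.fullmatch(fname) is not None
    while True:
        if os.path.exists(src):
            shutil.copy(src, dest_path)
            return 0
        if not wait:
            # Missing non-WAL file: report it at once (not an error condition).
            return 1
        if os.path.exists(trigger_file):
            # Failover requested: a file-not-found result ends recovery.
            return 1
        time.sleep(poll)  # hypothetical polling interval in seconds


def main(argv):
    if len(argv) != 5:
        print("usage: wait_restore.py ARCHIVE_DIR %f %p TRIGGER_FILE",
              file=sys.stderr)
        return 2
    return restore(argv[1], argv[2], argv[3], argv[4])


# Only act when given arguments, as restore_command always does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

One caveat with this design: a plain existence check can see a segment that the primary's archive_command is still writing. A more careful script would also confirm that the file has reached the full segment size (16 MB by default), or have archive_command copy to a temporary name and rename it into place, which is atomic on the same file system.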