Thread: PITR restore hot standby
hello, I am trying to setup a "hot standby" on a second machine. I have created a "recovery.conf" file and started a restore with logs from the primary machine. everything was OK. now a have new transaction logs generated by the primary machine and I want to "play" them on the secondary one. I have stopped postgres, recreated "recovery.conf", started postgres and I get the following error: ---------------------------------------------------------------- LOG: database system was shut down at 2005-05-23 05:19:34 PDT LOG: starting archive recovery LOG: restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p" LOG: restored log file "0000000100000008000000C4" from archive LOG: invalid resource manager ID 53 at 8/C4FFFEF8 LOG: invalid primary checkpoint record LOG: restored log file "0000000100000008000000C4" from archive LOG: invalid resource manager ID 52 at 8/C4FFFEBC LOG: invalid secondary checkpoint record PANIC: could not locate a valid checkpoint record LOG: startup process (PID 18297) was terminated by signal 6 LOG: aborting startup due to startup process failure LOG: logger shutting down ---------------------------------------------------------------- what is the procedure for creating a "hot standby" (continuously feeding a series of WAL files created by the primary machine into the secondary one) ? thanks, Razvan Radu
Postgres General <pgsql-general@list.coretech.ro> writes: > what is the procedure for creating a "hot standby" (continuously > feeding a series of WAL files created by the primary machine into the > secondary one) ? There currently isn't one. I think someone may be working on it. -Doug
On Mon, 2005-05-23 at 10:14, Douglas McNaught wrote: > Postgres General <pgsql-general@list.coretech.ro> writes: > > > what is the procedure for creating a "hot standby" (continuously > > feeding a series of WAL files created by the primary machine into the > > secondary one) ? > > There currently isn't one. I think someone may be working on it. Slony 1.1 should do pretty much that.
On Mon, 2005-05-23 at 16:17 +0300, Postgres General wrote: > I am trying to setup a "hot standby" on a second machine. > I have created a "recovery.conf" file and started a restore with logs > from the primary machine. everything was OK. > > now a have new transaction logs generated by the primary machine and I > want to "play" them on the secondary one. I have stopped postgres, > recreated "recovery.conf", started postgres and I get the following error: > > ---------------------------------------------------------------- > LOG: database system was shut down at 2005-05-23 05:19:34 PDT > LOG: starting archive recovery > LOG: restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p" > LOG: restored log file "0000000100000008000000C4" from archive > LOG: invalid resource manager ID 53 at 8/C4FFFEF8 > LOG: invalid primary checkpoint record > LOG: restored log file "0000000100000008000000C4" from archive > LOG: invalid resource manager ID 52 at 8/C4FFFEBC > LOG: invalid secondary checkpoint record > PANIC: could not locate a valid checkpoint record > LOG: startup process (PID 18297) was terminated by signal 6 > LOG: aborting startup due to startup process failure > LOG: logger shutting down > ---------------------------------------------------------------- > > what is the procedure for creating a "hot standby" (continuously feeding > a series of WAL files created by the primary machine into the secondary > one) ? Sounds like you've tried to do two recoveries on the same server. The control file doesn't match the log files you've provided to it in your script/program, so the recovery has not been setup correctly. This doesn't look like a bug... The backup system doesn't know about the primary, so just let the log files stream in and it will work. Once the recovery terminates normally, you cannot restart it. You need to perform the base backup again and then begin streaming the files through once more. If you have more information, perhaps it would be possible to say more. Professional support is available. Some enhancements should be available in 8.1, as well as further documentation. Best Regards, Simon Riggs http://www.2ndquadrant.com
thanks for your reply, I did not presume that it is a bug, I am interested to know how can I setup a "hot standby" (if is some more documentation available). ===QUOTE FROM DOCS=== If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a "hot standby" system: at any point we can bring up the second machine and it will have a nearly-current copy of the database. ===QUOTE FROM DOCS=== I have on average about 30-50GB of WAL files (gzip compressed) and a complete restore takes about 1-3 hours. From your answer I understand that once a restore is complete it can not be "resumed", the only way to add a new WAL file is to restore the original backup and to "play" all WAL files (including the new one). In this case do you know what is the timeout on the "restore_command" command (or where I can find it in documentation). What will happen if I make the restore command wait for a new WAL file to become available ? And, most important, how do I distinguish between WAL files that should be in the archive from those that should not be in the archive, because I see in the documentation that "The command /will/ be asked for log files that are not present in the archive; it must return nonzero when so asked" ? Are timelines helping me in this case ? thanks again, Razvan Radu CORE Technologies Simon Riggs wrote: >On Mon, 2005-05-23 at 16:17 +0300, Postgres General wrote: > > >>I am trying to setup a "hot standby" on a second machine. >>I have created a "recovery.conf" file and started a restore with logs >>from the primary machine. everything was OK. >> >>now a have new transaction logs generated by the primary machine and I >>want to "play" them on the secondary one. I have stopped postgres, >>recreated "recovery.conf", started postgres and I get the following error: >> >>---------------------------------------------------------------- >>LOG: database system was shut down at 2005-05-23 05:19:34 PDT >>LOG: starting archive recovery >>LOG: restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p" >>LOG: restored log file "0000000100000008000000C4" from archive >>LOG: invalid resource manager ID 53 at 8/C4FFFEF8 >>LOG: invalid primary checkpoint record >>LOG: restored log file "0000000100000008000000C4" from archive >>LOG: invalid resource manager ID 52 at 8/C4FFFEBC >>LOG: invalid secondary checkpoint record >>PANIC: could not locate a valid checkpoint record >>LOG: startup process (PID 18297) was terminated by signal 6 >>LOG: aborting startup due to startup process failure >>LOG: logger shutting down >>---------------------------------------------------------------- >> >>what is the procedure for creating a "hot standby" (continuously feeding >>a series of WAL files created by the primary machine into the secondary >>one) ? >> >> > >Sounds like you've tried to do two recoveries on the same server. The >control file doesn't match the log files you've provided to it in your >script/program, so the recovery has not been setup correctly. This >doesn't look like a bug... > >The backup system doesn't know about the primary, so just let the log >files stream in and it will work. Once the recovery terminates normally, >you cannot restart it. You need to perform the base backup again and >then begin streaming the files through once more. > >If you have more information, perhaps it would be possible to say more. > >Professional support is available. Some enhancements should be available >in 8.1, as well as further documentation. > >Best Regards, Simon Riggs >http://www.2ndquadrant.com > > > >
On Thu, 2005-06-02 at 01:13 +0300, pgsql-general@list.coretech.ro wrote: > In this case do you know what is the timeout on the "restore_command" > command (or where I can find it in documentation). There isn't one. You decide when restore_command returns. > What will happen if I make the restore command wait for a new WAL file > to become available ? You will succeed. PostgreSQL doesn't know anything about what your program does. The design allows for various delays that could be caused by: - hot standby waits - extended waits for tape mounts by human operators etc > And, most important, how do I distinguish between WAL files that should > be in the archive from those that should not be in the archive, because > I see in the documentation that "The command /will/ be asked for log > files that are not present in the archive; it must return nonzero when > so asked" ? You program restore_command to wait, but if an event occurs that triggers failover (you decide...) you return nonzero. > Are timelines helping me in this case ? Not really. They are for when you do a recovery to a specific target, rather than a recovery to end of logs. Best Regards, Simon Riggs