Thread: PITR restore hot standby

PITR restore hot standby

From
Postgres General
Date:
hello,

I am trying to setup a "hot standby" on a second machine.
I have created a "recovery.conf" file and started a restore with logs
from the primary machine. everything was OK.

now a have new transaction logs generated by the primary machine and I
want to "play" them on the secondary one. I have stopped postgres,
recreated "recovery.conf", started postgres and I get the following error:

----------------------------------------------------------------
LOG:  database system was shut down at 2005-05-23 05:19:34 PDT
LOG:  starting archive recovery
LOG:  restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p"
LOG:  restored log file "0000000100000008000000C4" from archive
LOG:  invalid resource manager ID 53 at 8/C4FFFEF8
LOG:  invalid primary checkpoint record
LOG:  restored log file "0000000100000008000000C4" from archive
LOG:  invalid resource manager ID 52 at 8/C4FFFEBC
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 18297) was terminated by signal 6
LOG:  aborting startup due to startup process failure
LOG:  logger shutting down
----------------------------------------------------------------

what is the procedure for creating a "hot standby" (continuously feeding
a series of WAL files created by the primary machine into the secondary
one) ?


thanks,
Razvan Radu




Re: PITR restore hot standby

From
Douglas McNaught
Date:
Postgres General <pgsql-general@list.coretech.ro> writes:

> what is the procedure for creating a "hot standby" (continuously
> feeding a series of WAL files created by the primary machine into the
> secondary one) ?

There currently isn't one.  I think someone may be working on it.

-Doug

Re: PITR restore hot standby

From
Scott Marlowe
Date:
On Mon, 2005-05-23 at 10:14, Douglas McNaught wrote:
> Postgres General <pgsql-general@list.coretech.ro> writes:
>
> > what is the procedure for creating a "hot standby" (continuously
> > feeding a series of WAL files created by the primary machine into the
> > secondary one) ?
>
> There currently isn't one.  I think someone may be working on it.


Slony 1.1 should do pretty much that.

Re: PITR restore hot standby

From
Simon Riggs
Date:
On Mon, 2005-05-23 at 16:17 +0300, Postgres General wrote:
> I am trying to setup a "hot standby" on a second machine.
> I have created a "recovery.conf" file and started a restore with logs
> from the primary machine. everything was OK.
>
> now a have new transaction logs generated by the primary machine and I
> want to "play" them on the secondary one. I have stopped postgres,
> recreated "recovery.conf", started postgres and I get the following error:
>
> ----------------------------------------------------------------
> LOG:  database system was shut down at 2005-05-23 05:19:34 PDT
> LOG:  starting archive recovery
> LOG:  restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p"
> LOG:  restored log file "0000000100000008000000C4" from archive
> LOG:  invalid resource manager ID 53 at 8/C4FFFEF8
> LOG:  invalid primary checkpoint record
> LOG:  restored log file "0000000100000008000000C4" from archive
> LOG:  invalid resource manager ID 52 at 8/C4FFFEBC
> LOG:  invalid secondary checkpoint record
> PANIC:  could not locate a valid checkpoint record
> LOG:  startup process (PID 18297) was terminated by signal 6
> LOG:  aborting startup due to startup process failure
> LOG:  logger shutting down
> ----------------------------------------------------------------
>
> what is the procedure for creating a "hot standby" (continuously feeding
> a series of WAL files created by the primary machine into the secondary
> one) ?

Sounds like you've tried to do two recoveries on the same server. The
control file doesn't match the log files you've provided to it in your
script/program, so the recovery has not been setup correctly. This
doesn't look like a bug...

The backup system doesn't know about the primary, so just let the log
files stream in and it will work. Once the recovery terminates normally,
you cannot restart it. You need to perform the base backup again and
then begin streaming the files through once more.

If you have more information, perhaps it would be possible to say more.

Professional support is available. Some enhancements should be available
in 8.1, as well as further documentation.

Best Regards, Simon Riggs
http://www.2ndquadrant.com



Re: PITR restore hot standby

From
"pgsql-general@list.coretech.ro"
Date:
thanks for your reply,

I did not presume that it is a bug, I am interested to know how can I
setup a "hot standby" (if is some more documentation available).

===QUOTE FROM DOCS===
If we continuously feed the series of WAL files to another machine that
has been loaded with the same base backup file, we have a "hot standby"
system: at any point we can bring up the second machine and it will have
a nearly-current copy of the database.
===QUOTE FROM DOCS===

I have on average about 30-50GB of WAL files (gzip compressed) and a
complete restore takes about 1-3 hours. From your answer I understand
that once a restore is complete it can not be "resumed", the only way to
add a new WAL file is to restore the original backup and to "play" all
WAL files (including the new one).

In this case do you know what is the timeout on the "restore_command"
command (or where I can find it in documentation).
What will happen if I make the restore command wait for a new WAL file
to become available ?
And, most important, how do I distinguish between WAL files that should
be in the archive from those that should not be in the archive, because
I see in the documentation that "The command /will/ be asked for log
files that are not present in the archive; it must return nonzero when
so asked" ?

Are timelines helping me in this case ?

thanks again,
Razvan Radu
CORE Technologies

Simon Riggs wrote:

>On Mon, 2005-05-23 at 16:17 +0300, Postgres General wrote:
>
>
>>I am trying to setup a "hot standby" on a second machine.
>>I have created a "recovery.conf" file and started a restore with logs
>>from the primary machine. everything was OK.
>>
>>now a have new transaction logs generated by the primary machine and I
>>want to "play" them on the secondary one. I have stopped postgres,
>>recreated "recovery.conf", started postgres and I get the following error:
>>
>>----------------------------------------------------------------
>>LOG:  database system was shut down at 2005-05-23 05:19:34 PDT
>>LOG:  starting archive recovery
>>LOG:  restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p"
>>LOG:  restored log file "0000000100000008000000C4" from archive
>>LOG:  invalid resource manager ID 53 at 8/C4FFFEF8
>>LOG:  invalid primary checkpoint record
>>LOG:  restored log file "0000000100000008000000C4" from archive
>>LOG:  invalid resource manager ID 52 at 8/C4FFFEBC
>>LOG:  invalid secondary checkpoint record
>>PANIC:  could not locate a valid checkpoint record
>>LOG:  startup process (PID 18297) was terminated by signal 6
>>LOG:  aborting startup due to startup process failure
>>LOG:  logger shutting down
>>----------------------------------------------------------------
>>
>>what is the procedure for creating a "hot standby" (continuously feeding
>>a series of WAL files created by the primary machine into the secondary
>>one) ?
>>
>>
>
>Sounds like you've tried to do two recoveries on the same server. The
>control file doesn't match the log files you've provided to it in your
>script/program, so the recovery has not been setup correctly. This
>doesn't look like a bug...
>
>The backup system doesn't know about the primary, so just let the log
>files stream in and it will work. Once the recovery terminates normally,
>you cannot restart it. You need to perform the base backup again and
>then begin streaming the files through once more.
>
>If you have more information, perhaps it would be possible to say more.
>
>Professional support is available. Some enhancements should be available
>in 8.1, as well as further documentation.
>
>Best Regards, Simon Riggs
>http://www.2ndquadrant.com
>
>
>
>



Re: PITR restore hot standby

From
Simon Riggs
Date:
On Thu, 2005-06-02 at 01:13 +0300, pgsql-general@list.coretech.ro wrote:
> In this case do you know what is the timeout on the "restore_command"
> command (or where I can find it in documentation).

There isn't one. You decide when restore_command returns.

> What will happen if I make the restore command wait for a new WAL file
> to become available ?

You will succeed. PostgreSQL doesn't know anything about what your
program does. The design allows for various delays that could be caused
by:

- hot standby waits
- extended waits for tape mounts by human operators

etc

> And, most important, how do I distinguish between WAL files that should
> be in the archive from those that should not be in the archive, because
> I see in the documentation that "The command /will/ be asked for log
> files that are not present in the archive; it must return nonzero when
> so asked" ?

You program restore_command to wait, but if an event occurs that
triggers failover (you decide...) you return nonzero.

> Are timelines helping me in this case ?

Not really. They are for when you do a recovery to a specific target,
rather than a recovery to end of logs.

Best Regards, Simon Riggs