Thread: recovery after interrupt in the middle of a previous recovery

From
Or Kroyzer
Date:
Hello,
I am using postgres 8.3.1, and have implemented warm standby very much like the one described in the high availability documentation on this site.
It seems to work well except for this problem: I've had a case where the PostgreSQL server was interrupted while in recovery (I think it was a user interrupt). The log says:

LOG:  received fast shutdown request
LOG:  archive recovery complete
FATAL:  terminating connection due to administrator command
LOG:  startup process (PID 6033) exited with exit code 1
LOG:  aborting startup due to startup process failure

After that, PostgreSQL doesn't run the recovery script configured in recovery.conf and fails to start. The log says:

LOG:  database system was interrupted while in recovery at log time 2010-05-26 02:00:03 IDT
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG:  could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10, segment 109): No such file or directory
LOG:  invalid primary checkpoint record
LOG:  could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10, segment 109): No such file or directory
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 8081) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
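
The segment named in this log can be decoded by hand, which helps when checking whether the archive actually holds the file the server is asking for. A minimal sketch, assuming the 8.3-era naming scheme of 24 hex digits (8 each for timeline, log file, and segment); the function name `decode_wal_name` is made up for illustration:

```python
def decode_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)    # first 8 digits: timeline ID
    log_file = int(name[8:16], 16)   # middle 8 digits: log file number
    segment = int(name[16:24], 16)   # last 8 digits: segment within the log file
    return timeline, log_file, segment


print(decode_wal_name("000000CA0000000A0000006D"))  # (202, 10, 109)
```

The last two fields match the "log file 10, segment 109" reported in the message above.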

Normally it just calls my recovery script, which fetches the WAL archive files and puts them in the pg_xlog directory.
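
For readers unfamiliar with this kind of setup, the core of such a recovery script might look roughly like the sketch below. It is not the script used here: the function name `restore_segment`, its parameters, and the polling interval are all assumptions for illustration. In recovery.conf, the server hands the requested file name (%f) and the destination path (%p) to restore_command, and those would map to `wal_name` and `dest_path`.

```python
import os
import shutil
import time


def restore_segment(archive_dir, wal_name, dest_path, poll_seconds=5.0, max_wait=None):
    """Wait for wal_name to appear in archive_dir, then copy it to dest_path.

    Returns True once the file is copied, or False if max_wait seconds pass
    first (max_wait=None waits forever, as a warm standby normally does).
    """
    source = os.path.join(archive_dir, wal_name)
    waited = 0.0
    while not os.path.exists(source):
        if max_wait is not None and waited >= max_wait:
            return False
        time.sleep(poll_seconds)
        waited += poll_seconds
    shutil.copy(source, dest_path)
    return True
```

This sketch does not guard against copying a segment that is still being written into the archive; a real script would need some way to handle that.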

Is there somewhere I need to configure which script to use when PostgreSQL is recovering from a failed recovery, or is this a bug?

Thanks!

Re: recovery after interrupt in the middle of a previous recovery

From
Tom Lane
Date:
Or Kroyzer <orkroyzer@gmail.com> writes:
> I am using postgres 8.3.1,

... you really ought to be using 8.3.something-recent ...

> and have implemented warm standby very much like
> the one described in the high availability documentation on this site.
> It seems to work well except for this problem: I've had a case where the
> postgresql server was interrupted while in recovery (I think it was a user
> interrupt). The log says:

> LOG:  received fast shutdown request
> LOG:  archive recovery complete
> FATAL:  terminating connection due to administrator command
> LOG:  startup process (PID 6033) exited with exit code 1
> LOG:  aborting startup due to startup process failure

> And after that, pg doesn't go through the recovery script provided in
> recovery.conf, and doesn't manage to come up. The log says:

> LOG:  database system was interrupted while in recovery at log time
> 2010-05-26 02:00:03 IDT
> HINT:  If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOG:  could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10,
> segment 109): No such file or directory
> LOG:  invalid primary checkpoint record
> LOG:  could not open file "pg_xlog/000000CA0000000A0000006D" (log file 10,
> segment 109): No such file or directory
> LOG:  invalid secondary checkpoint record
> PANIC:  could not locate a valid checkpoint record
> LOG:  startup process (PID 8081) was terminated by signal 6: Aborted
> LOG:  aborting startup due to startup process failure

Hmm.  Try putting back your recovery.conf file --- it will have been
renamed at the point where "archive recovery complete" was printed.
This example suggests that we might be doing that too early.
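
Putting the file back can be sketched as follows. This assumes the server renamed recovery.conf to recovery.done in the data directory, which is the name 8.3 uses when archive recovery ends. The function name `put_back_recovery_conf` is illustrative, and the server should be stopped before running anything like this.

```python
import os


def put_back_recovery_conf(data_dir):
    """Rename recovery.done back to recovery.conf if only the former exists.

    Returns True if a rename happened, False if there was nothing to do.
    """
    done = os.path.join(data_dir, "recovery.done")
    conf = os.path.join(data_dir, "recovery.conf")
    # Never overwrite an existing recovery.conf; only act when it is missing.
    if os.path.exists(conf) or not os.path.exists(done):
        return False
    os.rename(done, conf)
    return True
```

On the next start the server should then find recovery.conf again and resume calling the configured recovery script.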

            regards, tom lane

Re: recovery after interrupt in the middle of a previous recovery

From
Or Kroyzer
Date:
Thanks.

2010/5/26 Tom Lane <tgl@sss.pgh.pa.us>
> Hmm.  Try putting back your recovery.conf file --- it will have been
> renamed at the point where "archive recovery complete" was printed.
> This example suggests that we might be doing that too early.