Thread: Postgres Hot Standby. How or when does the recovery db move recovery.conf to recovery.done?

Resending.

I have a "hot" standby. If the primary fails, how do I tell the
secondary to come out of recovery mode, move recovery.conf to
recovery.done, and start the db? I mean, what exit code should my
script return?

If I return a non-zero exit code (1), I get the following result (from
the server log):

====
00000001000000000000001B pg_xlog/RECOVERYXLOG
LOG:  restored log file "00000001000000000000001B" from archive
00000001000000000000001C pg_xlog/RECOVERYXLOG
[Main: Triggering Recovery!!!] <---- My script detected that it needs
to trigger recovery...
LOG:  could not open file "pg_xlog/00000001000000000000001C" (log file
0, segment 28): No such file or directory
LOG:  redo done at 0/1B000070
00000001000000000000001B pg_xlog/RECOVERYXLOG
Main: Triggering Recovery!!! <--- My script is called again and the
script says trigger recovery
PANIC:  could not open file "pg_xlog/00000001000000000000001B" (log
file 0, segment 27): No such file or directory
LOG:  startup process (PID 32167) was terminated by signal 6
LOG:  aborting startup due to startup process failure
====

This is what my script is doing:

if ( triggerRecovery() ) {
   print "Main: Triggering Recovery!!! \n";
   return 1;
}

So the question is: once my script detects that the primary is down and
that recovery should be triggered, what exit code should it return? Or
do I have to move recovery.conf to recovery.done myself and restart the db?

Regards
Dhaval

On 3/21/07, Dhaval Shah <dhaval.shah.m@gmail.com> wrote:
> Resending.
>
> I have a "hot" standby. If the primary fails, how do I tell the
> secondary to come out of recovery mode, move recovery.conf to
> recovery.done, and start the db? I mean, what exit code should my
> script return?

Did you look at the pg_standby utility? It has a kill file (trigger
file) mechanism that automates this for you.

merlin

I looked at the pg_standby utility and would have liked to use it;
however, there are some external, customer-driven constraints that keep
me from using it.

What I am looking at is this:

1. I can detect that the primary has gone down and return a non-zero
code so that the standby recovers.

2. Since I can detect that I am out of standby mode, I can shut down
postgres, move the recovery.conf file to recovery.done manually, and
then restart the db.

Even if I do step 2, I still get the following in the server log:

=====
Main: Triggering Recovery!!!  <- my script is returning a non-zero code here ...

PANIC:  could not open file "pg_xlog/00000001000000000000001B" (log
file 0, segment 27): No such file or directory
LOG:  startup process (PID 32167) was terminated by signal 6
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted while in recovery at log time
2007-03-20 13:04:28 PDT
HINT:  If this has occurred more than once some data may be corrupted
and you may need to choose an earlier recovery target.
LOG:  could not open file "pg_xlog/000000010000000000000006" (log file
0, segment 6): No such file or directory
LOG:  invalid primary checkpoint record
LOG:  could not open file "pg_xlog/000000010000000000000005" (log file
0, segment 5): No such file or directory
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 4676) was terminated by signal 6
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted while in recovery at log time
2007-03-20 13:04:28 PDT
====

The question I have is: how do I get out of the above state and ensure
that the db is up and ready? What do I need to clear? A previous cache
or something? Am I missing something here? I went to the docs and they
say the following:

"Start the postmaster. The postmaster will go into recovery mode and
proceed to read through the archived WAL files it needs. Upon
completion of the recovery process, the postmaster will rename
recovery.conf to recovery.done (to prevent accidentally re-entering
recovery mode in case of a crash later) and then commence normal
database operations."

And I do not see the recovery.conf go to recovery.done automatically.

Dhaval

From one of Tom's replies to a different poster, I found that one can run
pg_resetxlog (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html)
to make the db recover and start up.

It does not appear to be for the faint-hearted!

Dhaval

On 3/21/07, Dhaval Shah <dhaval.shah.m@gmail.com> wrote:
> > Even if I do step 2, I still get the following in the server log:
> >
> > =====
> > Main: Triggering Recovery!!!  <- my script is returning a non-zero code here ...
> >
> > PANIC:  could not open file "pg_xlog/00000001000000000000001B" (log
> > file 0, segment 27): No such file or directory

If you are getting these errors, there is something wrong with your log
shipping method.  You are missing WAL files that are needed to bring
the server back into recovery.  pg_resetxlog will not help you
re-recover the server, although it may allow you to bring the server up
with some (possibly a lot of) data loss.
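To see which segments the messages refer to, the file names can be decoded by hand. A minimal sketch in Python, assuming the usual 24-hex-digit WAL file name layout of timeline, log file, and segment (8 digits each); parse_wal_name is a made-up helper name, not part of PostgreSQL:

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)    # first 8 digits: timeline ID
    log_id = int(name[8:16], 16)     # next 8 digits: "log file" in the messages
    segment = int(name[16:24], 16)   # last 8 digits: "segment" in the messages
    return timeline, log_id, segment


# The file named in the PANIC line above
print(parse_wal_name("00000001000000000000001B"))  # (1, 0, 27)
```

That matches the "log file 0, segment 27" text in the PANIC message, so the file the server cannot open is the same one it had restored from the archive earlier.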

This comes from the fact that for a 'hot standby', you need to
take extra precautions to preserve old WAL files.  AIUI, the server
needs to go far enough back in 'WAL time' to see the last checkpoint,
and the files for that are not available.  Even if you can't use it,
get a copy of the pg_standby utility and get a really good
understanding of how it works.  It has a clever 'symlink' mode which
neatly bypasses the complexity of maintaining a standby system.  It is
one C file and is well documented.
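
The log in the first message also hints at why a blanket non-zero return fails: after "redo done" the server asks for 00000001000000000000001B again, and the script refuses it even though that file was presumably still in the archive. A rough sketch of the alternative, written in Python rather than the original script's language. It is not how pg_standby itself is implemented; restore_wal, archive_dir and trigger_path are hypothetical names, and it assumes archived files appear atomically (no half-copied segments):

```python
import os
import shutil
import time


def restore_wal(archive_dir, wal_name, dest_path, trigger_path,
                poll_seconds=1.0, sleep=time.sleep):
    """Return the exit code a restore script would hand back to the server.

    0 -> the requested file was copied to dest_path
    1 -> the file is not in the archive and failover was requested
    """
    src = os.path.join(archive_dir, wal_name)
    while True:
        # Always serve a file that is present, even after the trigger:
        # the server asks again for the last segment once redo is done.
        if os.path.isfile(src):
            shutil.copyfile(src, dest_path)
            return 0
        # Only a missing file combined with the trigger ends recovery.
        if os.path.exists(trigger_path):
            return 1
        # Otherwise keep waiting for the primary to ship the file.
        sleep(poll_seconds)
```

A wrapper would take the file name and destination from the %f and %p placeholders of restore_command and exit with the returned code. The point of the design is that the trigger only changes what happens to requests for files that do not exist.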

merlin