Thread: Shutting down a warm standby database in 8.2beta3
I'm using 8.2beta3, but I'm asking here before posting to the devel lists, as suggested by that list's guidelines. First the question, because it might be simple and I'm being stupid; I'll then go into detail in case I'm not so silly.

In a database which is in recovery mode, waiting on an external script to provide a log file, how do we do a clean shutdown so that we can then continue recovery at a later point?

Detail: I've set up a simple log shipping test between two databases. This works well; I can perform transactions on the live server, see the archived log get shipped to the second machine, and see the recovery thread pick it up, apply the logs, and wait for the next. If I signal the recovery program to stop, then the standby database finally comes live and I can see my work. This is all good stuff and I like it. I used to do this with Oracle many years ago and am now pleased that we can force log switches (even better, the DBMS itself can do it!), which was a big thing missing in earlier versions.

However, there will be times when the standby database needs shutting down (e.g. hardware maintenance, OS patches, whatever) and then bringing back up in recovery mode. I tried the following idea:

  pg_ctl stop -W -m smart
  signal_recovery_program to exit(1)
  wait-for-pid-file
  remove any pg_xlog files
  recreate recovery.conf

This has a problem: the database will temporarily be "live".

  LOG: restored log file "000000010000000100000038" from archive
  LOG: received smart shutdown request
  LOG: could not open file "pg_xlog/000000010000000100000039" (log file 1, segment 57): No such file or directory
  LOG: redo done at 1/38000070
  LOG: restored log file "000000010000000100000038" from archive
  LOG: archive recovery complete
  LOG: database system is ready
  LOG: shutting down
  LOG: database system is shut down

This means it may have started its own transaction history, and so the archives from the primary database no longer match.
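For concreteness, the five steps above might look something like the sh sketch below. Everything in it is an assumption for illustration: the paths, the abort-file convention the recovery script is presumed to poll, and the fetch_wal restore command are all invented, and PG_CTL defaults to a dry-run echo so the sketch can be exercised without a live cluster.

```shell
#!/bin/sh
# Sketch of the attempted standby shutdown sequence.  All paths and the
# abort-file convention are hypothetical; PG_CTL defaults to "echo pg_ctl"
# so this can be dry-run without a real cluster.
PGDATA=${PGDATA:-/tmp/standby-demo}
PG_CTL=${PG_CTL:-echo pg_ctl}
ABORT_FILE=${ABORT_FILE:-$PGDATA/recovery.abort}

mkdir -p "$PGDATA/pg_xlog"

# 1. Ask for a smart shutdown without waiting for it to finish (-W).
$PG_CTL stop -W -m smart -D "$PGDATA"

# 2. Tell the recovery script to exit(1) by creating its abort file.
touch "$ABORT_FILE"

# 3. Wait for the postmaster's pid file to disappear.
while [ -f "$PGDATA/postmaster.pid" ]; do sleep 1; done

# 4. Remove any leftover WAL segments.
rm -f "$PGDATA"/pg_xlog/*

# 5. Recreate recovery.conf so the next start resumes archive recovery
#    (fetch_wal is a stand-in for whatever restore script is in use).
echo "restore_command = '/usr/local/bin/fetch_wal %f %p'" \
    > "$PGDATA/recovery.conf"
```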
Mostly this has minimal impact and the system recovers, e.g.

  LOG: restored log file "00000001000000010000003B" from archive
  LOG: invalid xl_info in primary checkpoint record
  LOG: using previous checkpoint record at 1/3B000020
  LOG: redo record is at 1/3B000020; undo record is at 0/0; shutdown FALSE
  LOG: next transaction ID: 0/2004; next OID: 26532
  LOG: next MultiXactId: 1; next MultiXactOffset: 0
  LOG: automatic recovery in progress
  LOG: redo starts at 1/3B000070
  LOG: restored log file "00000001000000010000003C" from archive
  LOG: restored log file "00000001000000010000003D" from archive
  LOG: restored log file "00000001000000010000003E" from archive

The second line is a little worrying. However, occasionally the system can't recover:

  LOG: restored log file "000000010000000100000039" from archive
  LOG: invalid record length at 1/39000070
  LOG: invalid primary checkpoint record
  LOG: restored log file "000000010000000100000039" from archive
  LOG: invalid xl_info in secondary checkpoint record
  PANIC: could not locate a valid checkpoint record
  LOG: startup process (PID 6893) was terminated by signal 6
  LOG: aborting startup due to startup process failure
  LOG: logger shutting down

I know log 1/39.... is good, because if I bring up an older backup and replay the logs then it goes through cleanly.

Doing a shutdown "immediate" isn't too clever either, because it actually leaves the recovery threads running:

  LOG: restored log file "00000001000000010000003E" from archive
  LOG: received immediate shutdown request
  LOG: restored log file "00000001000000010000003F" from archive

Oops! So the question is... how do I cleanly shut down a recovery instance?

--
rgds
Stephen
Stephen Harris <lists@spuddy.org> writes:
> Doing a shutdown "immediate" isn't too clever because it actually leaves
> the recovery threads running
>
>   LOG: restored log file "00000001000000010000003E" from archive
>   LOG: received immediate shutdown request
>   LOG: restored log file "00000001000000010000003F" from archive

Hm, that should work --- AFAICS the startup process should abort on
SIGQUIT the same as any regular backend.

[ thinks... ] Ah-hah, "man system(3)" tells the tale:

    system() ignores the SIGINT and SIGQUIT signals, and blocks the
    SIGCHLD signal, while waiting for the command to terminate.  If this
    might cause the application to miss a signal that would have killed
    it, the application should examine the return value from system() and
    take whatever action is appropriate to the application if the command
    terminated due to receipt of a signal.

So the SIGQUIT went to the recovery script command and was missed by the
startup process.  It looks to me like your script actually ignored the
signal, which you'll need to fix, but it also looks like we are not
checking for these cases in RestoreArchivedFile(), which we'd better fix.
As the code stands, if the recovery script is killed by a signal, we'd
take that as normal termination of the recovery and proceed to come up,
which is definitely the Wrong Thing.

			regards, tom lane
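The man page excerpt is the crux: the caller has to inspect system()'s return value to notice that the command died from a signal. The shell analogue, sketched below (a generic illustration, not code from the thread), is that a child killed by signal N reports exit status 128+N, which a wrapper can check before treating the step as a success.

```shell
#!/bin/sh
# Generic illustration: detect that a child command died from a signal
# by checking for an exit status above 128 (status = 128 + signal
# number in POSIX shells).

status=0
sh -c 'kill -TERM $$' || status=$?   # the child terminates itself

if [ "$status" -gt 128 ]; then
    echo "command died from signal $((status - 128))"
else
    echo "command exited normally with status $status"
fi
```

SIGTERM is signal 15, so the parent observes status 143 and reports "command died from signal 15" rather than carrying on as if the command had finished normally.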
On Fri, Nov 17, 2006 at 05:03:44PM -0500, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > Doing a shutdown "immediate" isn't too clever because it actually leaves
> > the recovery threads running
> >
> >   LOG: restored log file "00000001000000010000003E" from archive
> >   LOG: received immediate shutdown request
> >   LOG: restored log file "00000001000000010000003F" from archive
>
> Hm, that should work --- AFAICS the startup process should abort on
> SIGQUIT the same as any regular backend.
>
> [ thinks... ] Ah-hah, "man system(3)" tells the tale:
>
>     system() ignores the SIGINT and SIGQUIT signals, and blocks the
>     SIGCHLD signal, while waiting for the command to terminate.  If this
>     might cause the application to miss a signal that would have killed
>     it, the application should examine the return value from system() and
>     take whatever action is appropriate to the application if the command
>     terminated due to receipt of a signal.
>
> So the SIGQUIT went to the recovery script command and was missed by the
> startup process.  It looks to me like your script actually ignored the
> signal, which you'll need to fix, but it also looks like we are not

My script was just a ksh script and didn't do anything special with
signals. Essentially it does:

  #!/bin/ksh -p

  [...variable setup...]

  while [ ! -f $wanted_file ]
  do
    if [ -f $abort_file ]
    then
      exit 1
    fi
    sleep 5
  done
  cat $wanted_file

I know signals can be deferred in scripts (a signal sent to the script
during the sleep will be deferred if a trap handler had been written for
the signal), but they _do_ get delivered. However, it seems the signal
wasn't sent at all. Once the wanted file appeared, the recovery thread
from postmaster started a _new_ script for the next log.

I'll rewrite the script in perl (probably Monday when I'm back in the
office) and stick lots of signal() traps in to see if anything does get
sent to the script.
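That instrumentation plan — adding traps to see whether a signal ever arrives — can be sketched in plain sh along these lines. This is not the script from the thread: the file paths, the exit status 3, and the function wrapper are all invented for the illustration.

```shell
#!/bin/sh
# Sketch of the wait-for-WAL loop with explicit signal traps added, so
# we can observe whether the postmaster's signal ever reaches the
# script.  File names and the "exit 3" status are invented for the demo.

wait_for_wal() {
    wanted_file=$1
    abort_file=$2

    # Log and exit with a distinctive status if a signal arrives.
    trap 'echo "got a signal" >&2; exit 3' INT QUIT TERM

    while [ ! -f "$wanted_file" ]; do
        [ -f "$abort_file" ] && exit 1   # operator asked us to stop
        sleep 5
    done
    cat "$wanted_file"                   # hand the segment to the server
}

# Demo: the segment is already present, so it is emitted immediately.
echo "WAL-DATA" > /tmp/demo_segment
wait_for_wal /tmp/demo_segment /tmp/demo_abort   # prints WAL-DATA
```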
> As the code stands, if the recovery script is killed by a signal, we'd
> take that as normal termination of the recovery and proceed to come up,
> which is definitely the Wrong Thing.

Oh good; that means I'm not mad :-)

--
rgds
Stephen
"Stephen Harris" <lists@spuddy.org> writes:
> My script was just a ksh script and didn't do anything special with
> signals. Essentially it does
>
>   #!/bin/ksh -p
>
>   [...variable setup...]
>   while [ ! -f $wanted_file ]
>   do
>     if [ -f $abort_file ]
>     then
>       exit 1
>     fi
>     sleep 5
>   done
>   cat $wanted_file
>
> I know signals can be deferred in scripts (a signal sent to the script
> during the sleep will be deferred if a trap handler had been written for
> the signal) but they _do_ get delivered.

Sure, but it might be getting delivered to, say, your "sleep" command. You
haven't checked the return value of sleep to handle any errors that may
occur. As it stands you have to check for errors from every single command
executed by your script. That doesn't seem terribly practical to expect of
users.

As long as Postgres is using SIGQUIT for its own communication, it seems
it really ought to arrange to block the signal while the script is running
so that it will receive the signals it expects once the script ends.
Alternatively, perhaps Postgres really ought to be using USR1/USR2 or
other signals that library routines won't think they have any business
rearranging.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
On Fri, Nov 17, 2006 at 09:39:39PM -0500, Gregory Stark wrote:
> "Stephen Harris" <lists@spuddy.org> writes:
> >   [...variable setup...]
> >   while [ ! -f $wanted_file ]
> >   do
> >     if [ -f $abort_file ]
> >     then
> >       exit 1
> >     fi
> >     sleep 5
> >   done
> >   cat $wanted_file
> >
> > I know signals can be deferred in scripts (a signal sent to the script
> > during
>
> Sure, but it might be getting delivered to, say, your "sleep" command. You

No. The sleep command keeps on running; I could see that using "ps". To
the best of my knowledge, a random child process of the script wouldn't
even get a signal. All the postmaster recovery thread knows about is the
system() - i.e. "sh -c". All sh knows about is the ksh process. Neither
postmaster nor sh knows about "sleep", and so "sleep" wouldn't receive the
signal (unless it was sent to all processes in the process group).

Here's an example from Solaris 10 demonstrating the lack of signal
propagation:

  $ uname -sr
  SunOS 5.10
  $ echo $0
  /bin/sh
  $ cat x
  #!/bin/ksh -p
  sleep 10000
  $ ./x &
  4622
  $ kill 4622
  $
  4622 Terminated
  $ ps -ef | grep sleep
    sweh  4624  4602  0 22:13:13 pts/1  0:00 grep sleep
    sweh  4623     1  0 22:13:04 pts/1  0:00 sleep 10000

This is, in fact, what proper "job control" shells do. Doing the same test
with ksh as the command shell will kill the sleep :-)

  $ echo $0
  -ksh
  $ ./x &
  [1]     4632
  $ kill %1
  [1] + Terminated               ./x &
  $ ps -ef | grep sleep
    sweh  4635  4582  0 22:15:17 pts/1  0:00 grep sleep

[ Aside: the only way I've been able to guarantee that all processes and
child processes and everything get killed is to run a subprocess with
setsid() to create a new process group and kill the whole process group.
It's a pain. ]

If postmaster was sending a signal to the system() process, then "sh -c"
might not signal the ksh script anyway. The ksh script might terminate, or
it might defer until sleep had finished. Only if postmaster had signalled
a complete process group would sleep ever see the signal.

--
rgds
Stephen
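The setsid() aside can be demonstrated from the shell with the util-linux setsid(1) utility (assumed available here; this is an illustration, not code from the thread): put the child in its own process group, then signal the negative pid to hit every member of the group at once.

```shell
#!/bin/sh
# Kill a command AND all of its children by giving them their own
# process group (via setsid) and signalling the whole group.
# Assumes the util-linux setsid(1) utility is available.

setsid sh -c 'sleep 10000' &   # new session => new process group
pgid=$!                        # leader's pid doubles as the pgid
                               # (when setsid did not need to fork)
sleep 1                        # let it get started

# "--" ends option parsing; the negative pid means "the whole group".
kill -TERM -- -"$pgid" 2>/dev/null || true
wait "$pgid" 2>/dev/null || true

kill -0 "$pgid" 2>/dev/null || echo "process group is gone"
```

Had a plain `kill "$pgid"` been used instead, the sleep would have been orphaned and kept running, exactly as in the Solaris transcript above.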
Gregory Stark <stark@enterprisedb.com> writes:
> Sure, but it might be getting delivered to, say, your "sleep" command.
> You haven't checked the return value of sleep to handle any errors that
> may occur. As it stands you have to check for errors from every single
> command executed by your script.

The expectation is that something like SIGINT or SIGQUIT would be
delivered to both the sleep command and the shell process running the
script. So the shell should fail anyway. (Of course, a nontrivial archive
or recovery script had better be checking for failures at each step, but
this is not very relevant to the immediate problem.)

> Alternatively perhaps Postgres really ought to be using USR1/USR2 or
> other signals that library routines won't think they have any business
> rearranging.

The existing signal assignments were all picked for what seem to me to be
good reasons; I'm disinclined to change them. In any case, the important
point here is that we'd really like an archive or recovery script, or for
that matter any command executed via system() from a backend, to abort
when the parent backend is SIGINT'd or SIGQUIT'd. Stephen's idea of
executing setsid() at each backend start seems interesting, but is there a
way that will work on Windows?

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
> > Sure, but it might be getting delivered to, say, your "sleep" command.
> > You haven't checked the return value of sleep to handle any errors
> > that may occur. As it stands you have to check for errors from every
> > single command executed by your script.
>
> The expectation is that something like SIGINT or SIGQUIT would be
> delivered to both the sleep command and the shell process running the
> script. So the shell should fail anyway. (Of course, a nontrivial
> archive or recovery script had better be checking for failures at each
> step, but this is not very relevant to the immediate problem.)

Hm, I tried to test that before I sent that. But I guess my test was
faulty, since I was really testing which process the terminal handling
delivered the signal to:

  $ cat /tmp/test.sh
  #!/bin/sh
  echo before
  sleep 5 || echo sleep failed
  echo after
  $ sh /tmp/test.sh ; echo $?
  before
  ^\
  /tmp/test.sh: line 4: 23407 Quit                    sleep 5
  sleep failed
  after
  0

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com