Thread: Shutting down a warm standby database in 8.2beta3
I'm using 8.2beta3, but I'm asking here before posting to the devel lists, as suggested by that list's guidelines. First the question, because it might be simple and I'm being stupid; I'll then go into detail in case I'm not so silly.

In a database which is in recovery mode, waiting on an external script to provide a log file, how do we do a clean shutdown so that we can then continue recovery at a later point?

Detail: I've set up a simple log shipping test between two databases. This works well; I can perform transactions on the live server, see the archived log get shipped to the second machine, and see the recovery thread pick it up, apply the logs, and wait for the next. If I signal the recovery program to stop, then the standby database finally comes live and I can see my work. This is all good stuff and I like it. I used to do this with Oracle many years ago and am now pleased that we can force log switches (even better, the DBMS itself can do it!), which was a big thing missing in earlier versions.

However, there will be times when the standby database needs shutting down (e.g. hardware maintenance, OS patches, whatever) and then bringing back up in recovery mode. I tried the following idea:

  pg_ctl stop -W -m smart
  signal_recovery_program to exit(1)
  wait-for-pid-file
  remove any pg_xlog files
  recreate recovery.conf

This has a problem: the database will temporarily be "live".

  LOG: restored log file "000000010000000100000038" from archive
  LOG: received smart shutdown request
  LOG: could not open file "pg_xlog/000000010000000100000039" (log file 1, segment 57): No such file or directory
  LOG: redo done at 1/38000070
  LOG: restored log file "000000010000000100000038" from archive
  LOG: archive recovery complete
  LOG: database system is ready
  LOG: shutting down
  LOG: database system is shut down

This means it may have started its own transaction history, and so the archives from the primary database no longer match.
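For concreteness, the five steps above might look something like the sh sketch below. Everything in it is an assumption for illustration: the paths, the abort-file convention the recovery script is presumed to poll, and the fetch_wal restore command are all invented, and PG_CTL defaults to a dry-run echo so the sketch can be exercised without a live cluster.

```shell
#!/bin/sh
# Sketch of the attempted standby shutdown sequence.  All paths and the
# abort-file convention are hypothetical; PG_CTL defaults to "echo pg_ctl"
# so this can be dry-run without a real cluster.
PGDATA=${PGDATA:-/tmp/standby-demo}
PG_CTL=${PG_CTL:-echo pg_ctl}
ABORT_FILE=${ABORT_FILE:-$PGDATA/recovery.abort}

mkdir -p "$PGDATA/pg_xlog"

# 1. Ask for a smart shutdown without waiting for it to finish (-W).
$PG_CTL stop -W -m smart -D "$PGDATA"

# 2. Tell the recovery script to exit(1) by creating its abort file.
touch "$ABORT_FILE"

# 3. Wait for the postmaster's pid file to disappear.
while [ -f "$PGDATA/postmaster.pid" ]; do sleep 1; done

# 4. Remove any leftover WAL segments.
rm -f "$PGDATA"/pg_xlog/*

# 5. Recreate recovery.conf so the next start resumes archive recovery
#    (fetch_wal is a stand-in for whatever restore script is in use).
echo "restore_command = '/usr/local/bin/fetch_wal %f %p'" \
    > "$PGDATA/recovery.conf"
```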
Mostly this has minimal impact and the system recovers, e.g.

  LOG: restored log file "00000001000000010000003B" from archive
  LOG: invalid xl_info in primary checkpoint record
  LOG: using previous checkpoint record at 1/3B000020
  LOG: redo record is at 1/3B000020; undo record is at 0/0; shutdown FALSE
  LOG: next transaction ID: 0/2004; next OID: 26532
  LOG: next MultiXactId: 1; next MultiXactOffset: 0
  LOG: automatic recovery in progress
  LOG: redo starts at 1/3B000070
  LOG: restored log file "00000001000000010000003C" from archive
  LOG: restored log file "00000001000000010000003D" from archive
  LOG: restored log file "00000001000000010000003E" from archive

The second line is a little worrying. However, occasionally the system can't recover:

  LOG: restored log file "000000010000000100000039" from archive
  LOG: invalid record length at 1/39000070
  LOG: invalid primary checkpoint record
  LOG: restored log file "000000010000000100000039" from archive
  LOG: invalid xl_info in secondary checkpoint record
  PANIC: could not locate a valid checkpoint record
  LOG: startup process (PID 6893) was terminated by signal 6
  LOG: aborting startup due to startup process failure
  LOG: logger shutting down

I know log 1/39.... is good, because if I bring up an older backup and replay the logs then it goes through cleanly.

Doing a shutdown "immediate" isn't too clever either, because it actually leaves the recovery threads running:

  LOG: restored log file "00000001000000010000003E" from archive
  LOG: received immediate shutdown request
  LOG: restored log file "00000001000000010000003F" from archive

Oops! So the question is... how do I cleanly shut down a recovery instance?

--
rgds
Stephen
Stephen Harris <lists@spuddy.org> writes:
> Doing a shutdown "immediate" isn't too clever because it actually leaves
> the recovery threads running
>
>   LOG: restored log file "00000001000000010000003E" from archive
>   LOG: received immediate shutdown request
>   LOG: restored log file "00000001000000010000003F" from archive

Hm, that should work --- AFAICS the startup process should abort on
SIGQUIT the same as any regular backend.

[ thinks... ] Ah-hah, "man system(3)" tells the tale:

    system() ignores the SIGINT and SIGQUIT signals, and blocks the
    SIGCHLD signal, while waiting for the command to terminate.  If this
    might cause the application to miss a signal that would have killed
    it, the application should examine the return value from system() and
    take whatever action is appropriate to the application if the command
    terminated due to receipt of a signal.

So the SIGQUIT went to the recovery script command and was missed by the
startup process.  It looks to me like your script actually ignored the
signal, which you'll need to fix, but it also looks like we are not
checking for these cases in RestoreArchivedFile(), which we'd better fix.
As the code stands, if the recovery script is killed by a signal, we'd
take that as normal termination of the recovery and proceed to come up,
which is definitely the Wrong Thing.

			regards, tom lane
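The man page excerpt is the crux: the caller has to inspect system()'s return value to notice that the command died from a signal. The shell analogue, sketched below (a generic illustration, not code from the thread), is that a child killed by signal N reports exit status 128+N, which a wrapper can check before treating the step as a success.

```shell
#!/bin/sh
# Generic illustration: detect that a child command died from a signal
# by checking for an exit status above 128 (status = 128 + signal
# number in POSIX shells).

status=0
sh -c 'kill -TERM $$' || status=$?   # the child terminates itself

if [ "$status" -gt 128 ]; then
    echo "command died from signal $((status - 128))"
else
    echo "command exited normally with status $status"
fi
```

SIGTERM is signal 15, so the parent observes status 143 and reports "command died from signal 15" rather than carrying on as if the command had finished normally.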
On Fri, Nov 17, 2006 at 05:03:44PM -0500, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > Doing a shutdown "immediate" isn't too clever because it actually leaves
> > the recovery threads running
> >
> >   LOG: restored log file "00000001000000010000003E" from archive
> >   LOG: received immediate shutdown request
> >   LOG: restored log file "00000001000000010000003F" from archive
>
> Hm, that should work --- AFAICS the startup process should abort on
> SIGQUIT the same as any regular backend.
>
> [ thinks... ] Ah-hah, "man system(3)" tells the tale:
>
>     system() ignores the SIGINT and SIGQUIT signals, and blocks the
>     SIGCHLD signal, while waiting for the command to terminate.  If this
>     might cause the application to miss a signal that would have killed
>     it, the application should examine the return value from system() and
>     take whatever action is appropriate to the application if the command
>     terminated due to receipt of a signal.
>
> So the SIGQUIT went to the recovery script command and was missed by the
> startup process.  It looks to me like your script actually ignored the
> signal, which you'll need to fix, but it also looks like we are not

My script was just a ksh script and didn't do anything special with
signals. Essentially it does:

  #!/bin/ksh -p

  [...variable setup...]

  while [ ! -f $wanted_file ]
  do
    if [ -f $abort_file ]
    then
      exit 1
    fi
    sleep 5
  done
  cat $wanted_file

I know signals can be deferred in scripts (a signal sent to the script
during the sleep will be deferred if a trap handler had been written for
the signal), but they _do_ get delivered. However, it seems the signal
wasn't sent at all. Once the wanted file appeared, the recovery thread
from postmaster started a _new_ script for the next log.

I'll rewrite the script in perl (probably Monday when I'm back in the
office) and stick lots of signal() traps in to see if anything does get
sent to the script.
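That instrumentation plan — adding traps to see whether a signal ever arrives — can be sketched in plain sh along these lines. This is not the script from the thread: the file paths, the exit status 3, and the function wrapper are all invented for the illustration.

```shell
#!/bin/sh
# Sketch of the wait-for-WAL loop with explicit signal traps added, so
# we can observe whether the postmaster's signal ever reaches the
# script.  File names and the "exit 3" status are invented for the demo.

wait_for_wal() {
    wanted_file=$1
    abort_file=$2

    # Log and exit with a distinctive status if a signal arrives.
    trap 'echo "got a signal" >&2; exit 3' INT QUIT TERM

    while [ ! -f "$wanted_file" ]; do
        [ -f "$abort_file" ] && exit 1   # operator asked us to stop
        sleep 5
    done
    cat "$wanted_file"                   # hand the segment to the server
}

# Demo: the segment is already present, so it is emitted immediately.
echo "WAL-DATA" > /tmp/demo_segment
wait_for_wal /tmp/demo_segment /tmp/demo_abort   # prints WAL-DATA
```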
> As the code stands, if the recovery script is killed by a signal, we'd
> take that as normal termination of the recovery and proceed to come up,
> which is definitely the Wrong Thing.

Oh good; that means I'm not mad :-)

--
rgds
Stephen
"Stephen Harris" <lists@spuddy.org> writes:
> My script was just a ksh script and didn't do anything special with
> signals. Essentially it does
>
>   #!/bin/ksh -p
>
>   [...variable setup...]
>   while [ ! -f $wanted_file ]
>   do
>     if [ -f $abort_file ]
>     then
>       exit 1
>     fi
>     sleep 5
>   done
>   cat $wanted_file
>
> I know signals can be deferred in scripts (a signal sent to the script
> during the sleep will be deferred if a trap handler had been written for
> the signal) but they _do_ get delivered.

Sure, but it might be getting delivered to, say, your "sleep" command. You
haven't checked the return value of sleep to handle any errors that may
occur. As it stands you have to check for errors from every single command
executed by your script. That doesn't seem terribly practical to expect of
users.

As long as Postgres is using SIGQUIT for its own communication, it seems
it really ought to arrange to block the signal while the script is running
so that it will receive the signals it expects once the script ends.
Alternatively, perhaps Postgres really ought to be using USR1/USR2 or
other signals that library routines won't think they have any business
rearranging.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
On Fri, Nov 17, 2006 at 09:39:39PM -0500, Gregory Stark wrote:
> "Stephen Harris" <lists@spuddy.org> writes:
> >   [...variable setup...]
> >   while [ ! -f $wanted_file ]
> >   do
> >     if [ -f $abort_file ]
> >     then
> >       exit 1
> >     fi
> >     sleep 5
> >   done
> >   cat $wanted_file
> >
> > I know signals can be deferred in scripts (a signal sent to the script
> > during
>
> Sure, but it might be getting delivered to, say, your "sleep" command. You

No. The sleep command keeps on running; I could see that using "ps". To
the best of my knowledge, a random child process of the script wouldn't
even get a signal. All the postmaster recovery thread knows about is the
system() - i.e. "sh -c". All sh knows about is the ksh process. Neither
postmaster nor sh knows about "sleep", and so "sleep" wouldn't receive the
signal (unless it was sent to all processes in the process group).

Here's an example from Solaris 10 demonstrating the lack of signal
propagation:

  $ uname -sr
  SunOS 5.10
  $ echo $0
  /bin/sh
  $ cat x
  #!/bin/ksh -p
  sleep 10000
  $ ./x &
  4622
  $ kill 4622
  $
  4622 Terminated
  $ ps -ef | grep sleep
    sweh  4624  4602  0 22:13:13 pts/1  0:00 grep sleep
    sweh  4623     1  0 22:13:04 pts/1  0:00 sleep 10000

This is, in fact, what proper "job control" shells do. Doing the same test
with ksh as the command shell will kill the sleep :-)

  $ echo $0
  -ksh
  $ ./x &
  [1]     4632
  $ kill %1
  [1] + Terminated               ./x &
  $ ps -ef | grep sleep
    sweh  4635  4582  0 22:15:17 pts/1  0:00 grep sleep

[ Aside: the only way I've been able to guarantee that all processes and
child processes and everything get killed is to run a subprocess with
setsid() to create a new process group and kill the whole process group.
It's a pain. ]

If postmaster was sending a signal to the system() process, then "sh -c"
might not signal the ksh script anyway. The ksh script might terminate, or
it might defer until sleep had finished. Only if postmaster had signalled
a complete process group would sleep ever see the signal.

--
rgds
Stephen
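The setsid() aside can be demonstrated from the shell with the util-linux setsid(1) utility (assumed available here; this is an illustration, not code from the thread): put the child in its own process group, then signal the negative pid to hit every member of the group at once.

```shell
#!/bin/sh
# Kill a command AND all of its children by giving them their own
# process group (via setsid) and signalling the whole group.
# Assumes the util-linux setsid(1) utility is available.

setsid sh -c 'sleep 10000' &   # new session => new process group
pgid=$!                        # leader's pid doubles as the pgid
                               # (when setsid did not need to fork)
sleep 1                        # let it get started

# "--" ends option parsing; the negative pid means "the whole group".
kill -TERM -- -"$pgid" 2>/dev/null || true
wait "$pgid" 2>/dev/null || true

kill -0 "$pgid" 2>/dev/null || echo "process group is gone"
```

Had a plain `kill "$pgid"` been used instead, the sleep would have been orphaned and kept running, exactly as in the Solaris transcript above.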
Gregory Stark <stark@enterprisedb.com> writes:
> Sure, but it might be getting delivered to, say, your "sleep" command.
> You haven't checked the return value of sleep to handle any errors that
> may occur. As it stands you have to check for errors from every single
> command executed by your script.

The expectation is that something like SIGINT or SIGQUIT would be
delivered to both the sleep command and the shell process running the
script. So the shell should fail anyway. (Of course, a nontrivial archive
or recovery script had better be checking for failures at each step, but
this is not very relevant to the immediate problem.)

> Alternatively perhaps Postgres really ought to be using USR1/USR2 or
> other signals that library routines won't think they have any business
> rearranging.

The existing signal assignments were all picked for what seem to me to be
good reasons; I'm disinclined to change them. In any case, the important
point here is that we'd really like an archive or recovery script, or for
that matter any command executed via system() from a backend, to abort
when the parent backend is SIGINT'd or SIGQUIT'd. Stephen's idea of
executing setsid() at each backend start seems interesting, but is there a
way that will work on Windows?

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
> > Sure, but it might be getting delivered to, say, your "sleep" command.
> > You haven't checked the return value of sleep to handle any errors
> > that may occur. As it stands you have to check for errors from every
> > single command executed by your script.
>
> The expectation is that something like SIGINT or SIGQUIT would be
> delivered to both the sleep command and the shell process running the
> script. So the shell should fail anyway. (Of course, a nontrivial
> archive or recovery script had better be checking for failures at each
> step, but this is not very relevant to the immediate problem.)

Hm, I tried to test that before I sent that. But I guess my test was
faulty, since I was really testing which process the terminal handling
delivered the signal to:

  $ cat /tmp/test.sh
  #!/bin/sh
  echo before
  sleep 5 || echo sleep failed
  echo after
  $ sh /tmp/test.sh ; echo $?
  before
  ^\
  /tmp/test.sh: line 4: 23407 Quit                    sleep 5
  sleep failed
  after
  0

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com