Thread: restarting after power outage
Hello,

The following has happened to me maybe 3 or 4 times over the past few years (and again today), so I thought I might send in an email to the list to see if others experience this.

After a power outage (and bad UPS combo, or plug pull, or bad RAM, etc) sometimes (I would guess <10% of the time) postgresql fails to restart automatically after booting the computer. Invariably, it is because the "postmaster.pid" file exists, but maybe this is just a symptom of something else. The solution I have been performing is to simply delete this file, and then restart postgres (service postgresql start).

Is this the correct procedure? Should I be doing something else? Do others see this, or am I the only one?

Finally, I would make the suggestion that the init script should check to see if the PID file exists BEFORE starting the server. If so, issue some sort of message on how to proceed.

Thanks,
Jon

PS: vital stats:

[root@bilbo init.d]# head -1 /etc/issue
Fedora Core release 3 (Heidelberg)
[root@bilbo init.d]# uname -a
Linux bilbo 2.6.11-1.14_FC3 #1 Thu Apr 7 19:23:49 EDT 2005 i686 athlon i386 GNU/Linux
[root@bilbo init.d]# rpm -q postgresql
postgresql-7.4.7-3.FC3.1

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil
Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------
Jon Lapham <lapham@jandr.org> writes:
> After a power outage (and bad UPS combo, or plug pull, or bad RAM, etc)
> sometimes (I would guess <10% of the time) postgresql fails to restart
> automatically after booting the computer. Invariably, it is because the
> "postmaster.pid" file exists, but maybe this is just a symptom of
> something else. The solution I have been performing is to simply delete
> this file, and then restart postgres (service postgresql start).

> Is this the correct procedure?

It is. We have been fooling with the postmaster startup logic to try to eliminate this gotcha, but it's only very recently (8.0.2) that I think we got it right.

regards, tom lane
Tom Lane wrote:
> Jon Lapham <lapham@jandr.org> writes:
>
>>After a power outage (and bad UPS combo, or plug pull, or bad RAM, etc)
>>sometimes (I would guess <10% of the time) postgresql fails to restart
>>automatically after booting the computer. Invariably, it is because the
>>"postmaster.pid" file exists, but maybe this is just a symptom of
>>something else. The solution I have been performing is to simply delete
>>this file, and then restart postgres (service postgresql start).
>
>>Is this the correct procedure?
>
> It is. We have been fooling with the postmaster startup logic to try to
> eliminate this gotcha, but it's only very recently (8.0.2) that I think
> we got it right.

So, then it would be correct to change my init scripts to do the following: (if so, this patch can be applied to the 7.4 branch)

--- postgresql 2005-02-21 16:33:37.000000000 -0300
+++ postgresql_pidkiller 2005-04-27 15:38:03.000000000 -0300
@@ -178,6 +178,13 @@
     fi
     echo -n "$PSQL_START"
+
+    # If there is a stray postmaster.pid file laying around, remove it
+    if [ -f "${PGDATA}/postmaster.pid" ]
+    then
+        rm ${PGDATA}/postmaster.pid
+    fi
+
     $SU -l postgres -c "$PGENGINE/postmaster -p ${PGPORT} -D '${PGDATA}' ${PGOPTS} &" >> $PGLOG 2>&1 < /dev/null
     sleep 2
     pid=`pidof -s $PGENGINE/postmaster`

--
Jon Lapham <lapham@jandr.org>
Jon Lapham <lapham@jandr.org> writes:
> Tom Lane wrote:
>> It is. We have been fooling with the postmaster startup logic to try to
>> eliminate this gotcha, but it's only very recently (8.0.2) that I think
>> we got it right.

> So, then it would be correct to change my init scripts to do the
> following: (if so, this patch can be applied to the 7.4 branch)

I would recommend strongly AGAINST that, because what you just did was remove the defense against starting two postmasters concurrently in the same data directory (which would be a disaster of the first magnitude). This is not a problem for bootup of course, but if you ever use this script to start the postmaster by hand, then you are playing with fire. We would have put something like that in the standard init scripts years ago if it were safe.

If you want a solution in the 7.4 branch, I have back-patched the 8.0.2 fix into the latest Fedora Core 3 RPMs (7.4.7-5.FC3.1).

regards, tom lane
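To illustrate the hazard described above: an unconditional rm cannot tell a leftover file from one that belongs to a running postmaster. A more cautious script would at least look at the PID recorded in the file first. The following is only a minimal sketch, not the 8.0.2 fix and not part of any shipped init script. The function name pidfile_is_stale is made up for illustration, and the sketch assumes that the first line of postmaster.pid holds the postmaster's PID and that it runs as root or as the postgres user.

```shell
#!/bin/sh
# Sketch only: decide whether a postmaster.pid file looks left over.
# Assumptions (not from the thread): the first line of the file is the PID,
# and the caller is root or postgres, so that "kill -0" is permitted to
# probe the process.

pidfile_is_stale() {
    pidfile="$1"

    # No file at all: nothing to clean up, so report "not stale".
    [ -f "$pidfile" ] || return 1

    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')

    # Empty or non-numeric first line: we cannot judge, so leave the file alone.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # "kill -0" delivers no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 1    # some process has that PID: do not touch the file
    fi

    return 0        # no such process: the file is left over
}

# Hypothetical use inside a start script:
#   if pidfile_is_stale "${PGDATA}/postmaster.pid"; then
#       rm -f "${PGDATA}/postmaster.pid"
#   fi
```

A live PID does not prove that the process is a postmaster, since PIDs get reused after a reboot, so this check errs toward keeping the file and asking a human. There is also still a window between the check and the rm, which is one reason to prefer the fixed packages over any script-level workaround.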
Tom Lane wrote:
> Jon Lapham <lapham@jandr.org> writes:
>>So, then it would be correct to change my init scripts to do the
>>following: (if so, this patch can be applied to the 7.4 branch)
>
> I would recommend strongly AGAINST that, because what you just did was
> remove the defense against starting two postmasters concurrently in the
> same data directory (which would be a disaster of the first magnitude).
> This is not a problem for bootup of course, but if you ever use this
> script to start the postmaster by hand, then you are playing with fire.

I figured there must be more to it...

> If you want a solution in the 7.4 branch, I have back-patched the
> 8.0.2 fix into the latest Fedora Core 3 RPMs (7.4.7-5.FC3.1).

Nice, thanks.

--
Jon Lapham <lapham@jandr.org>
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Jon Lapham <lapham@jandr.org> writes:
>> Tom Lane wrote:
>>> It is. We have been fooling with the postmaster startup logic to try to
>>> eliminate this gotcha, but it's only very recently (8.0.2) that I think
>>> we got it right.
>
>> So, then it would be correct to change my init scripts to do the
>> following: (if so, this patch can be applied to the 7.4 branch)
>
> I would recommend strongly AGAINST that, because what you just did was
> remove the defense against starting two postmasters concurrently in the
> same data directory (which would be a disaster of the first magnitude).
> This is not a problem for bootup of course, but if you ever use this
> script to start the postmaster by hand, then you are playing with fire.

What I have done is to create a separate init.d script that removes the PID file, and arrange for it to run before the PG startup script. That way you can use the regular script to stop and start without danger, but on a bootup after an unclean shutdown the PID file will get removed before PG gets started. If you're dumb enough to run the removal script by hand while PG is running, you deserve what you get. :)

-Doug
On Wed, Apr 27, 2005 at 03:03:50PM -0400, Doug McNaught wrote:
>
> What I have done is to create a separate init.d script that
> removes the PID file, and arrange for it to run before the PG
> startup script.

An even better place (if you really want to do all this) would be something that happens only at boot time. On a Debian system a script linked into /etc/rcS.d/ would be in order; on SUSE I would remove the file in /etc/init.d/boot.local.

But I advise against it. Do things like that manually, after you have checked that your pgsql partition is mounted and contains the correct data. Some of the distributed init.d scripts for starting postgres also run initdb on the data location if they think they should. That can lead to the total loss of your cluster.

--
Peter
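For anyone who automates the removal anyway, the two cautions raised so far (run it only at boot, and first make sure the data directory is really there) could be combined roughly as follows. This is a hedged sketch, not a tested init script: the function name remove_boot_pidfile and the example path are assumptions for illustration. The one thing it relies on is that an initialized data directory contains a PG_VERSION file.

```shell
#!/bin/sh
# Sketch of a boot-only cleanup helper. The function name and the example
# path below are hypothetical; adapt them to the local installation.

remove_boot_pidfile() {
    datadir="$1"

    # An initialized cluster has a PG_VERSION file. If it is missing, the
    # partition is probably not mounted (or this is not a data directory),
    # so refuse to do anything.
    if [ ! -f "$datadir/PG_VERSION" ]; then
        echo "skipping: $datadir does not look like a PostgreSQL data directory" >&2
        return 1
    fi

    # Remove the leftover PID file only if one is actually present.
    if [ -f "$datadir/postmaster.pid" ]; then
        echo "removing leftover $datadir/postmaster.pid" >&2
        rm -f "$datadir/postmaster.pid"
    fi

    return 0
}

# Call this only from a hook that runs once at boot, never from the regular
# start/stop script (example path is an assumption):
#   remove_boot_pidfile /var/lib/pgsql/data
```

Keeping the removal out of the regular start/stop script is the point of the design: the normal script keeps its protection against a second postmaster, and the cleanup can only run at a moment when no postmaster can be alive yet.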
>>>> It is. We have been fooling with the postmaster startup logic to try to
>>>> eliminate this gotcha, but it's only very recently (8.0.2) that I think
>>>> we got it right.
>>
>>> So, then it would be correct to change my init scripts to do the
>>> following: (if so, this patch can be applied to the 7.4 branch)
>>
>> I would recommend strongly AGAINST that, because what you just did was
>> remove the defense against starting two postmasters concurrently in the
>> same data directory (which would be a disaster of the first magnitude).
>> This is not a problem for bootup of course, but if you ever use this
>> script to start the postmaster by hand, then you are playing with fire.
>
> What I have done is to create a separate init.d script that removes
> the PID file, and arrange for it to run before the PG startup script.
> That way you can use the regular script to stop and start without
> danger, but on a bootup after an unclean shutdown the PID file will
> get removed before PG gets started. If you're dumb enough to run the
> removal script by hand while PG is running, you deserve what you get. :)

Or, if your cron supports it, add the following to root's crontab:

    @reboot /bin/rm -f /path/to/postgres/pid

Although I like having a separate startup script that runs first to go around removing this and other things as well...
Philip Hallstrom <postgresql@philip.pjkh.com> writes:
> Although I like having a separate startup script that runs first to go
> around removing this and other things as well...

I think most Unix variants have a specific bootup script that's charged with doing exactly that; if you can find it, that's a good place to add a line for postmaster.pid.

regards, tom lane
Is this just me or did anyone actually think about adding a UPS to the machine and monitor it with NUT ? That way the machine would shut down properly, making the whole stale pid-file issue irrelevant.

UC

On Wednesday 27 April 2005 13:41, Tom Lane wrote:
> Philip Hallstrom <postgresql@philip.pjkh.com> writes:
> > Although I like having a separate startup script that runs first to go
> > around removing this and other things as well...
>
> I think most Unix variants have a specific bootup script that's charged
> with doing exactly that; if you can find it, that's a good place to add
> a line for postmaster.pid.
>
> regards, tom lane

--
Open Source Solutions 4U, LLC
2570 Fleetwood Drive            Phone: +1 650 872 2425
San Bruno, CA 94066             Cell: +1 650 302 2405
United States                   Fax: +1 650 872 2417
"Uwe C. Schroeder" <uwe@oss4u.com> writes: > Is this just me or did anyone actually think about adding a UPS to > the machine and monitor it with NUT ? That way the machine would > shut down properly, making the whole stale pid-file issue > irrelevant. UPSs fail. People kick out power cords. It's good to be able to deal with it. -Doug
On Wednesday 27 April 2005 15:17, Doug McNaught wrote:
> "Uwe C. Schroeder" <uwe@oss4u.com> writes:
> > Is this just me or did anyone actually think about adding a UPS to
> > the machine and monitor it with NUT ? That way the machine would
> > shut down properly, making the whole stale pid-file issue
> > irrelevant.
>
> UPSs fail. People kick out power cords. It's good to be able to deal
> with it.
>
> -Doug

You're right about that. The question is how often this happens, and whether that is often enough to justify an automated procedure. After a hard shutdown there are a whole bunch of things that could potentially go wrong on startup (like fsck failing etc.), so checking up on the machine might be a good idea anyway.

I for my part locked the server room. That works every time the cleaning crew comes into the office looking for an outlet to plug the vacuum into. All they take out now is the fax machine :-)

UC