Thread: restarting after power outage


From: Jon Lapham

Hello,

The following has happened to me maybe 3 or 4 times over the past few
years (and again today), so I thought I might send in an email to the
list to see if others experience this.

After a power outage (and bad UPS combo, or plug pull, or bad RAM, etc)
sometimes (I would guess <10% of the time) postgresql fails to restart
automatically after booting the computer.  Invariably, it is because the
"postmaster.pid" file exists, but maybe this is just a symptom of
something else.  The solution I have been performing is to simply delete
this file, and then restart postgres (service postgresql start).

Is this the correct procedure?  Should I be doing something else?  Do
others see this, or am I the only one?

Finally, I would make the suggestion that the init script should check
to see if the PID file exists BEFORE starting the server.  If so, issue
some sort of message on how to proceed.

Thanks, Jon

PS: vital stats:

[root@bilbo init.d]# head -1 /etc/issue
Fedora Core release 3 (Heidelberg)
[root@bilbo init.d]# uname -a
Linux bilbo 2.6.11-1.14_FC3 #1 Thu Apr 7 19:23:49 EDT 2005 i686 athlon i386 GNU/Linux
[root@bilbo init.d]# rpm -q postgresql
postgresql-7.4.7-3.FC3.1


--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------


Re: restarting after power outage

From: Tom Lane

Jon Lapham <lapham@jandr.org> writes:
> After a power outage (and bad UPS combo, or plug pull, or bad RAM, etc)
> sometimes (I would guess <10% of the time) postgresql fails to restart
> automatically after booting the computer.  Invariably, it is because the
> "postmaster.pid" file exists, but maybe this is just a symptom of
> something else.  The solution I have been performing is to simply delete
> this file, and then restart postgres (service postgresql start).

> Is this the correct procedure?

It is.  We have been fooling with the postmaster startup logic to try to
eliminate this gotcha, but it's only very recently (8.0.2) that I think
we got it right.

            regards, tom lane

Re: restarting after power outage

From: Jon Lapham

Tom Lane wrote:
> Jon Lapham <lapham@jandr.org> writes:
>
>>After a power outage (and bad UPS combo, or plug pull, or bad RAM, etc)
>>sometimes (I would guess <10% of the time) postgresql fails to restart
>>automatically after booting the computer.  Invariably, it is because the
>>"postmaster.pid" file exists, but maybe this is just a symptom of
>>something else.  The solution I have been performing is to simply delete
>>this file, and then restart postgres (service postgresql start).
>
>>Is this the correct procedure?
>
> It is.  We have been fooling with the postmaster startup logic to try to
> eliminate this gotcha, but it's only very recently (8.0.2) that I think
> we got it right.

So, then it would be correct to change my init scripts to do the
following:  (if so, this patch can be applied to the 7.4 branch)

--- postgresql  2005-02-21 16:33:37.000000000 -0300
+++ postgresql_pidkiller        2005-04-27 15:38:03.000000000 -0300
@@ -178,6 +178,13 @@
         fi

         echo -n "$PSQL_START"
+
+       # If there is a stray postmaster.pid file laying around, remove it
+       if [ -f "${PGDATA}/postmaster.pid" ]
+       then
+               rm ${PGDATA}/postmaster.pid
+       fi
+
         $SU -l postgres -c "$PGENGINE/postmaster -p ${PGPORT} -D '${PGDATA}' ${PGOPTS} &" >> $PGLOG 2>&1 < /dev/null
         sleep 2
         pid=`pidof -s $PGENGINE/postmaster`




Re: restarting after power outage

From: Tom Lane

Jon Lapham <lapham@jandr.org> writes:
> Tom Lane wrote:
>> It is.  We have been fooling with the postmaster startup logic to try to
>> eliminate this gotcha, but it's only very recently (8.0.2) that I think
>> we got it right.

> So, then it would be correct to change my init scripts to do the
> following:  (if so, this patch can be applied to the 7.4 branch)

I would recommend strongly AGAINST that, because what you just did was
remove the defense against starting two postmasters concurrently in the
same data directory (which would be a disaster of the first magnitude).
This is not a problem for bootup of course, but if you ever use this
script to start the postmaster by hand, then you are playing with fire.

We would have put something like that in the standard init scripts
years ago if it were safe.
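
To illustrate the defense described above, here is an editorial sketch (not
code from this thread, and not what PostgreSQL itself does internally). Before
touching postmaster.pid, a script could at least check whether the PID
recorded on the file's first line belongs to a running process. The function
name pidfile_looks_stale is hypothetical. After a reboot an unrelated process
may have been given the old PID, so this check can still report a false
conflict, and it is no substitute for the server's own lock-file logic.

```shell
#!/bin/sh
# Hypothetical helper: does the PID recorded in a postmaster.pid file
# refer to a process that is currently running?
# Exit status: 0 = no such process (the file looks stale)
#              1 = a process with that PID exists
#              2 = file missing, or its first line is not a PID
pidfile_looks_stale() {
    pidfile=$1
    [ -f "$pidfile" ] || return 2
    # The first line of postmaster.pid holds the postmaster's PID.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) return 2 ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID can be
    # signalled. ps -p is a fallback for processes owned by other users.
    if kill -0 "$pid" 2>/dev/null || ps -p "$pid" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

A caller would remove the file only when the function returns 0, and would
still be racing against any postmaster started between the check and the
removal, which is part of why this is unsafe in a general-purpose start script.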

If you want a solution in the 7.4 branch, I have back-patched the
8.0.2 fix into the latest Fedora Core 3 RPMs (7.4.7-5.FC3.1).

            regards, tom lane

Re: restarting after power outage

From: Jon Lapham

Tom Lane wrote:
> Jon Lapham <lapham@jandr.org> writes:
>>So, then it would be correct to change my init scripts to do the
>>following:  (if so, this patch can be applied to the 7.4 branch)
>
> I would recommend strongly AGAINST that, because what you just did was
> remove the defense against starting two postmasters concurrently in the
> same data directory (which would be a disaster of the first magnitude).
> This is not a problem for bootup of course, but if you ever use this
> script to start the postmaster by hand, then you are playing with fire.

I figured there must be more to it...

> If you want a solution in the 7.4 branch, I have back-patched the
> 8.0.2 fix into the latest Fedora Core 3 RPMs (7.4.7-5.FC3.1).

Nice, thanks.



Re: restarting after power outage

From: Doug McNaught

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Jon Lapham <lapham@jandr.org> writes:
>> Tom Lane wrote:
>>> It is.  We have been fooling with the postmaster startup logic to try to
>>> eliminate this gotcha, but it's only very recently (8.0.2) that I think
>>> we got it right.
>
>> So, then it would be correct to change my init scripts to do the
>> following:  (if so, this patch can be applied to the 7.4 branch)
>
> I would recommend strongly AGAINST that, because what you just did was
> remove the defense against starting two postmasters concurrently in the
> same data directory (which would be a disaster of the first magnitude).
> This is not a problem for bootup of course, but if you ever use this
> script to start the postmaster by hand, then you are playing with fire.

What I have done is to create a separate init.d script that removes
the PID file, and arrange for it to run before the PG startup script.
That way you can use the regular script to stop and start without
danger, but on a bootup after an unclean shutdown the PID file will
get removed before PG gets started.  If you're dumb enough to run the
removal script by hand while PG is running, you deserve what you get.  :)
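
A minimal sketch of what such a separate boot-only script might look like
(an editorial illustration, not the actual script described above). The
function name remove_boot_pidfile and the default data directory
/var/lib/pgsql/data are assumptions; the script is assumed to be linked so
that it runs once at boot, before the regular PostgreSQL init script.

```shell
#!/bin/sh
# Hypothetical boot-only cleanup script.
# Assumption: it runs once at boot, before the regular PostgreSQL init
# script, and is never run by hand while the server is up.

# Assumed default data directory; override by exporting PGDATA.
PGDATA=${PGDATA:-/var/lib/pgsql/data}

# Remove a leftover postmaster.pid from the given data directory.
# Prints a message only when a file was actually removed.
remove_boot_pidfile() {
    datadir=$1
    if [ -f "$datadir/postmaster.pid" ]; then
        rm -f "$datadir/postmaster.pid" &&
            echo "removed stale $datadir/postmaster.pid"
    fi
}

# Act only on "start"; any other argument (or none) is a no-op.
case "${1:-}" in
    start) remove_boot_pidfile "$PGDATA" ;;
esac
```

Keeping the removal out of the regular start script is the point of the
design: the regular script keeps its protection against a second postmaster.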

-Doug

Re: restarting after power outage

From: Peter Wiersig

On Wed, Apr 27, 2005 at 03:03:50PM -0400, Doug McNaught wrote:
>
> What I have done is to create a separate init.d script that
> removes the PID file, and arrange for it to run before the PG
> startup script.

An even better place (if you really want to do all this) would be
something that happens only at boot time.

On a Debian system a script linked into /etc/rcS.d/ would be in
order; on SUSE I would rm it in /etc/init.d/boot.local.


But I advise against it. Do things like that manually after you've
checked that your pgsql-partition is mounted and filled with
correct data.

Some of the distributed init.d scripts for starting postgres also
run initdb on the data location if they think they should. That can
lead to the total loss of your cluster.
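
The manual check suggested above could be partly scripted. The sketch below
is an editorial illustration, not something posted in the thread: it tests
whether a directory looks like an initialized cluster before anything removes
a PID file or starts the server. The function name data_dir_looks_initialized
is hypothetical, and the check is deliberately shallow: it confirms that the
expected files exist, not that their contents are correct.

```shell
#!/bin/sh
# Hypothetical pre-flight check: does the directory look like an
# initialized PostgreSQL cluster? Returns 0 if so, 1 otherwise.
data_dir_looks_initialized() {
    datadir=$1
    # PG_VERSION is written by initdb; it should exist and be non-empty.
    [ -s "$datadir/PG_VERSION" ] || return 1
    # base/ and global/ hold the database files.
    [ -d "$datadir/base" ] || return 1
    [ -d "$datadir/global" ] || return 1
    return 0
}
```

A boot script could refuse to do anything when this returns non-zero, which
would presumably also catch the case of an unmounted partition leaving an
empty mount point behind. Where the mountpoint utility is available, a test
such as `mountpoint -q` on the partition could be added in front of it.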

--
Peter

Re: restarting after power outage

From: Philip Hallstrom

>>>> It is.  We have been fooling with the postmaster startup logic to try to
>>>> eliminate this gotcha, but it's only very recently (8.0.2) that I think
>>>> we got it right.
>>
>>> So, then it would be correct to change my init scripts to do the
>>> following:  (if so, this patch can be applied to the 7.4 branch)
>>
>> I would recommend strongly AGAINST that, because what you just did was
>> remove the defense against starting two postmasters concurrently in the
>> same data directory (which would be a disaster of the first magnitude).
>> This is not a problem for bootup of course, but if you ever use this
>> script to start the postmaster by hand, then you are playing with fire.
>
> What I have done is to create a separate init.d script that removes
> the PID file, and arrange for it to run before the PG startup script.
> That way you can use the regular script to stop and start without
> danger, but on a bootup after an unclean shutdown the PID file will
> get removed before PG gets started.  If you're dumb enough to run the
> removal script by hand while PG is running, you deserve what you get.  :)

Or, if your cron supports it, add the following to root's crontab:

@reboot /bin/rm -f /path/to/postgres/pid

Although I like having a separate startup script that runs first to go
around removing this and other things as well...

Re: restarting after power outage

From: Tom Lane

Philip Hallstrom <postgresql@philip.pjkh.com> writes:
> Although I like having a separate startup script that runs first to go
> around removing this and other things as well...

I think most Unix variants have a specific bootup script that's charged
with doing exactly that; if you can find it, that's a good place to add
a line for postmaster.pid.

            regards, tom lane

Re: restarting after power outage

From: Uwe C. Schroeder

Is it just me, or did anyone actually think about adding a UPS to the machine
and monitoring it with NUT?
That way the machine would shut down properly, making the whole stale PID file
issue irrelevant.

UC


On Wednesday 27 April 2005 13:41, Tom Lane wrote:
> Philip Hallstrom <postgresql@philip.pjkh.com> writes:
> > Although I like having a separate startup script that runs first to go
> > around removing this and other things as well...
>
> I think most Unix variants have a specific bootup script that's charged
> with doing exactly that; if you can find it, that's a good place to add
> a line for postmaster.pid.
>
>             regards, tom lane

--
Open Source Solutions 4U, LLC    2570 Fleetwood Drive
Phone:  +1 650 872 2425        San Bruno, CA 94066
Cell:   +1 650 302 2405        United States
Fax:    +1 650 872 2417

Re: restarting after power outage

From: Doug McNaught

"Uwe C. Schroeder" <uwe@oss4u.com> writes:

> Is this just me or did anyone actually think about adding a UPS to
> the machine and monitor it with NUT ?  That way the machine would
> shut down properly, making the whole stale pid-file issue
> irrelevant.

UPSs fail.  People kick out power cords.  It's good to be able to deal
with it.

-Doug

Re: restarting after power outage

From: Uwe C. Schroeder

On Wednesday 27 April 2005 15:17, Doug McNaught wrote:
> "Uwe C. Schroeder" <uwe@oss4u.com> writes:
> > Is this just me or did anyone actually think about adding a UPS to
> > the machine and monitor it with NUT ?  That way the machine would
> > shut down properly, making the whole stale pid-file issue
> > irrelevant.
>
> UPSs fail.  People kick out power cords.  It's good to be able to deal
> with it.
>
> -Doug

You're right about that. The question is how often this happens, and whether
that is often enough to justify an automated procedure. After a hard shutdown
a whole bunch of things could go wrong on startup (fsck failing, for example),
so checking up on the machine by hand might be a good idea anyway.
For my part, I locked the server room. It works every time the cleaning crew
comes into the office looking for an outlet to plug the vacuum into. All they
take out now is the fax machine :-)

    UC
