Thread: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters

Hi,
Thank you Tom.
I have been looking at the postgresql service startup scripts in Redhat
written by Lamar Owen et al.

I would like to understand the role of the following files, which are
created during startup of Postgresql on my Redhat Linux box.

1.  /var/lib/pgsql/data/postmaster.pid

2.  /var/run/postmaster.pid (for redhat 7.3/Postgresql 7.2)
    /var/run/postmaster5432.pid (for redhat 9.0/Postgresql 7.3.2)

3.  /var/lock/subsys/postgresql

and

4.  /tmp/.s.PGSQL.5432.lock (and the associated socket file,
/tmp/.s.PGSQL.5432, in that directory).

I notice that:

1. /var/lib/pgsql/data/postmaster.pid contains the pid of the
/usr/bin/postmaster process. Interestingly, Lamar does not rm this file on
stop().

2. /var/run/postmaster.pid contains the pid of a postgres stats process.

3. /tmp/.s.PGSQL.5432.lock contains the pid of the /usr/bin/postmaster
process.

Why do I care?

My goal is to use D. J. Bernstein's daemontools to make sure that my Postgresql
process comes back up unattended if it goes down for some reason. So I will be
substituting daemontools for the postgresql service script.
Thus I want to know which lock files to remove to make sure all is OK. I also
want to follow Tom Lane's advice and not shoot myself in the foot by
creating two different postmaster processes working on the same database!

Thank you all for your help!!!

Mitchell Laks


On Thu, May 08, 2003 at 12:50:49 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
>
> My goal is to use   DJ Bernsteins daemonstools to make sure that my Postgresql
> process goes back up unattended if it goes down for some reason. So I will be
> substituting daemontools for the postgresql service script.
> Thus I want to know what lock files to remove to make sure all is ok. I also
> want to follow Tom Lanes's advice and not shoot myself in the foot by
> creating two different postmaster processes working the same database!!!!

This is what I put in my run file:
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

I use multilog for logging as you normally would.
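
For readers setting this up, here is a minimal sketch of the layout described
above: a service directory holding the run file, plus a log/run file that
hands the service's output to multilog. The helper name make_service_dir, the
./main log directory and the scratch path are assumptions made for
illustration, not Bruno's actual setup. The postmaster paths are the ones
from the run file above.

```shell
#!/bin/sh
# Sketch only: build a daemontools-style service directory.
# make_service_dir is a hypothetical helper; nothing is started here.
make_service_dir() (
    dir=$1
    mkdir -p "$dir/log" || exit 1

    # Main run file: the same three lines quoted in this message.
    cat > "$dir/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
EOF

    # Logger: multilog reads the service's output on stdin, prepends a
    # timestamp (t) and writes rotated files under ./main (assumed name;
    # the directory must be writable by the account given to setuidgid).
    cat > "$dir/log/run" <<'EOF'
#!/bin/sh
exec setuidgid postgres multilog t ./main
EOF

    chmod +x "$dir/run" "$dir/log/run"
)

# Demo in a scratch directory so no real service is touched.
demo=$(mktemp -d)
make_service_dir "$demo/postgresql"
ls "$demo/postgresql" "$demo/postgresql/log"
```

With daemontools the finished directory is normally symlinked under /service
so that svscan picks it up. Many setups run the logger under a separate log
account rather than postgres.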


Thank you for your response Bruno. I agree about the importance of using the
lines

#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

in the run file. However, what else must we put in as well?

My question is to understand the lock files for postgresql so I can deal with
the following:

1.

I notice that Lamar's postgresql service script removes "stale lock files"
before starting postgresql by using the line

rm -f /tmp/.s.PGSQL.* > /dev/null

and my own experience suggests we should perhaps also add the line

rm -f /var/lib/pgsql/data/postmaster.pid

because sometimes, when my machine crashes and gets rebooted, I must manually
remove that file.

2.

Moreover, I  see that after successfully starting postgresql Lamar touches a
file

touch /var/lock/subsys/postgresql

and does this

echo $pid > /var/run/postmaster.pid

so how can we do that?

3.

I can imagine we can accomplish 1. with

#!/bin/sh
rm -f /tmp/.s.PGSQL.* > /dev/null
rm -f /var/lib/pgsql/data/postmaster.pid
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

but how do we do 2. (the touching and echoing after the process starts) when
the exec has replaced the "run" process with the postmaster process so that
the daemontools "svc" can control it?



Mitchell


On Thu, May 08, 2003 at 14:10:52 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
> Thank you for your response Bruno. I agree about the importance of using the
> lines
>
> #!/bin/sh
> exec 2>&1
> exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D
> /usr/local/pgsql/data
>
> in the run file. However, what else must we put in as well?
>
> My question is to understand the lock files for postgresql so I can deal with
> the following:

Some of the lock files have to do with the init system. Those can be
ignored. Postgres also keeps a lock file and that is used to prevent
two postmasters from running at the same time. You probably don't want
to have a script remove that lock file, because if there really is
another postmaster running, starting a second one can be a disaster.


Bruno,
Thanks for your help. I was wondering:

Should we in fact be exec'ing the postmaster as you describe, or pg_ctl as
Lamar's script does, or perhaps a "new" script that wraps pg_ctl or postmaster
and adds a signal-catching mechanism?

The reason I ask is that daemontools stops a service, if you want it to, via
the command

svc opts postgresql

where opts is one of:

-d: Down. If the service is running, send it a TERM signal and then a CONT
signal. After it stops, do not restart it.
-o: Once. If the service is not running, start it. Do not restart it if it
stops.
-p: Pause. Send the service a STOP signal.
-c: Continue. Send the service a CONT signal.
-h: Hangup. Send the service a HUP signal.
-a: Alarm. Send the service an ALRM signal.
-i: Interrupt. Send the service an INT signal.
-t: Terminate. Send the service a TERM signal.
-k: Kill. Send the service a KILL signal.

Now, we would not want to kill the postmaster, of course. But should we even be
TERM'ing the postmaster? I don't know. What do the Postgresql gurus say?

Moreover, if we agree that we need to embed pg_ctl or postmaster in a script
to handle the above things, it should be doable to handle all of the assorted
other files if they are necessary to handle.

Do you agree?

Also, what would be the problem with checking for the existence of a postmaster
and, if none exists, removing the lock files?

My main problem is that I have machines that get creamed by power surges and
then won't restart postgresql on reboot of the system because of the damn lock
files. I really want to deal with them up front!

Moreover, can you tell me more about what init uses the locks for?

What is the role of the files

/var/run/postmaster.pid
/var/lock/subsys/postgresql

that Lamar carefully adds and subtracts?


rm -f /var/run/postmaster.pid
rm -f /var/lock/subsys/postgresql

Thanks
Mitchell


On Thu, May 08, 2003 at 14:57:15 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
>
> 1. check for a running postmaster
> 2  if not delete the /var/lib/pgsql/data/postmaster.pid files
>
> where would we go wrong with duplicate postmistresses?

postmaster already does that, but there may be cases where it thinks there
is a running postmaster and there really isn't. In that case you would
need to verify this, remove the lock file and start by hand.

Having two postmasters running at the same time for the same data
directory will corrupt your databases.
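
As an illustration of the "verify this" step, here is a minimal sketch that
only reports on a pid file and deliberately never removes it, in line with
the warning above. The helper name pidfile_status and the scratch files are
assumptions for the example. It relies on the first line of postmaster.pid
holding the pid, as observed earlier in the thread.

```shell
#!/bin/sh
# Sketch only: classify a pid file without deleting anything.
# pidfile_status is a hypothetical helper name.
pidfile_status() (
    file=$1
    [ -f "$file" ] || { echo missing; exit 0; }

    # The first line of postmaster.pid holds the postmaster's pid.
    pid=$(head -n 1 "$file" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo unreadable; exit 0 ;;
    esac

    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    # Caveat: it also fails for a live process owned by another user, so run
    # this as root or as the postgres account.
    if kill -0 "$pid" 2>/dev/null; then
        echo "live $pid"
    else
        echo "stale $pid"
    fi
)

# Demo against scratch files rather than a real data directory.
tmp=$(mktemp -d)
echo "$$" > "$tmp/postmaster.pid"       # this shell's pid: certainly alive
pidfile_status "$tmp/postmaster.pid"
pidfile_status "$tmp/no-such-file"
```

A "live" answer does not prove the process is a postmaster, since pids get
reused after a reboot. A "stale" answer is the case where removing the file
by hand is reasonable.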


On Thu, May 08, 2003 at 14:39:08 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
>
> now we would not want to kill the postmaster, of course. But should we even be
> TERM'ing the postmaster? I dont know. What do the Postgresql Gurus say?

I regularly use svc -d to shut down postmaster and svc -u to restart it.
This works just fine.
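
Some background on why TERM is a reasonable signal here, offered as a hedged
note rather than as part of the original reply: the PostgreSQL documentation
describes TERM as a "smart" shutdown, INT as a "fast" shutdown and HUP as a
configuration reload, while KILL gives the postmaster no chance to clean up.
The sketch below simply tabulates that against the svc options quoted in the
question. The function name svc_effect is made up, and it sends no signals.

```shell
#!/bin/sh
# Sketch only: what each svc option's signal is expected to mean to a
# postmaster. svc_effect is a hypothetical helper that just prints text.
# Note that -d also tells supervise not to restart the service, whereas
# -t sends the same TERM but lets supervise start it again.
svc_effect() {
    case $1 in
        -d|-t) echo "TERM: smart shutdown, waits for clients to disconnect" ;;
        -i)    echo "INT: fast shutdown, disconnects clients and exits cleanly" ;;
        -h)    echo "HUP: reload configuration files" ;;
        -k)    echo "KILL: no cleanup, avoid" ;;
        *)     echo "no shutdown meaning assumed for $1" ;;
    esac
}

for opt in -d -i -h -k; do
    printf '%s  %s\n' "svc $opt" "$(svc_effect "$opt")"
done
```

One consequence worth knowing: a smart shutdown can wait a long time if
clients stay connected, so svc -d may not bring the service down immediately.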

> Moreover, if we agree that we need to imbed pg_ctl or postmaster in a script
> to handle the above things, it should be doable to handle all of the assorted
> other files if they are neccesary to handle .

You don't have to do that.

> Also what would be the problem in checking for the existence of a postmaster
> and if none exists then killing the lock files.

I would be very leery of putting this in a script. postmaster already does
this, and trying to be smarter than it might cause you a lot of grief.

> My main problem is that I have machines that get creamed by power surges, and
> then wont restart postgresql on reboot of the system because of the damn lock
> files. I really want to deal with them up front!

Most of the time when I have unscheduled shutdowns postgres comes up without
problem. I don't remember if I have had any since I switched to using
supervise though. I have had more issues with someone needing to answer
a question from fsck from the console than postgresql not coming up.

> MOreover can you tell me more about what init uses the locks for?

To tell if the service is already running or not.

>
> what is the role of the files
>
> /var/run/postmaster.pid
> /var/lock/subsys/postgresql
>
> that Lamar carefully adds and subtracts?

I don't know exactly, but I would expect that the pid file is a lock for
the service and that the subsys file is a lock to keep two init scripts
from running at the same time for the same service.


Bruno, Thanks for your help.

I checked - grep in /etc/rc.d/init.d agrees with what you said - those
/var/lock and /var/run files are commonly created by all of the services!

Here's my problem:

I had 4 out of 5 machines that got creamed this weekend, and all I needed to
go in for was to erase that file /var/lib/pgsql/data/postmaster.pid.
The same thing (with only one machine) happened about a month ago.

I notice that in his script Lamar does this

pid=`pidof -s postmaster`
if [ $pid ]
then
        echo $"Postmaster already running."
else
        #all systems go -- remove any stale lock files
        rm -f /tmp/.s.PGSQL.* > /dev/null

then he starts up pg_ctl.

What I would be doing is simply adding the line

rm -f /var/lib/pgsql/data/postmaster.pid

It looks like he isn't worried about getting rid of the /tmp/.s.PGSQL.* files
as long as he ran pidof first.
(Is /tmp/.s.PGSQL.* also a kind of lock file? I don't know. Do you know
what sets it up?)

Also, what do you do about those files

/tmp/.s.PGSQL.* ?

And what do you do about the possibility of supervise starting more than one
of the postmasters?

I like the idea of supervise starting me up again even without a reboot, and
I just want to catch this problem and solve it.

Thanks, mitchell


On Thu, May 08, 2003 at 16:30:11 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
> Bruno, Thanks for your help.
>
> i checked - grep in the /etc/rc.d/init.d agrees with what you said - those
> /var/lock and /var/run files are commonly placed in all of the services!
>
> Here's my problem:
>
> I had 4 out of 5 machines that got creamed this weekend, and all i needed to
> go in for was to erase that file /var/lib/pgsql/data/postmaster.pid.
> the same thing!!! (with only one machine) happened about a month ago.
>
> I notice that in his script Lamar does this
>
> pid=`pidof -s postmaster`
>         if [ $pid ]
>         then
>                 echo $"Postmaster already running."
>         else
>                 #all systems go -- remove any stale lock files
>                 rm -f /tmp/.s.PGSQL.* > /dev/null
> then he starts up.
>
> What I would be doing is simply adding in
>
> rm  -f  /var/lib/pgsql/data/postmaster.pid  line.
>
> It looks like he isnt worried about getting rid of that tmp/.s.PGSQL.* file as
> long as he ran pidof first -
>  (is /tmp/.s.PGSQL.  also a kind of lock file? i dont know  -  do you know
> what system sets it  up?)

Well, if there is no process with the pid in postmaster.pid, then you are safe.
If there is one, then you have to know that it isn't a postmaster.
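
A minimal sketch of that second check, under stated assumptions: the helper
name pid_owner is hypothetical, it reads /proc/PID/comm where a Linux kernel
provides it and otherwise falls back to ps, and it treats both "postmaster"
and "postgres" as PostgreSQL processes. It only prints a verdict and never
touches the lock file.

```shell
#!/bin/sh
# Sketch only: decide what a pid recorded in postmaster.pid refers to.
# pid_owner is a hypothetical helper; it never removes any file.
pid_owner() (
    pid=$1
    if [ -r "/proc/$pid/comm" ]; then
        name=$(cat "/proc/$pid/comm")
    else
        name=$(ps -p "$pid" -o comm= 2>/dev/null)
    fi
    name=${name##*/}   # strip any leading path

    case $name in
        '')                  echo "gone" ;;          # no such process: lock is stale
        postmaster|postgres) echo "postmaster" ;;    # leave the lock alone
        *)                   echo "other: $name" ;;  # pid reused by another program
    esac
)

pid_owner "$$"          # this shell's own pid
pid_owner 99999999      # far above any real pid, so this prints "gone"
```

Only the "gone" and "other" answers correspond to the safe cases described
above. Even then, removing the file by hand after looking is more cautious
than doing it from a script.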

> Also - what do you do about those files
>
> /tmp/.s.PGSQL.* ?

These are placeholders for the Unix domain sockets used for local connections.

>
> and what do you do about the possibility of supervise starting more than one
> of the postmasters?

I do this, running several postmasters under supervise. It is simpler to set
up than making a bunch of different init scripts. Just make sure each
postmaster uses a different port and data area.
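
A minimal sketch of what "a different port and data area" could look like,
with one run file per instance. The helper name print_run_file, the second
data directory and port 5433 are assumptions for illustration. -D and -p are
the postmaster's data-directory and port options.

```shell
#!/bin/sh
# Sketch only: print a run file for one postmaster instance.
# print_run_file and the example paths/ports are hypothetical.
print_run_file() {
    datadir=$1
    port=$2
    cat <<EOF
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D $datadir -p $port
EOF
}

# Two instances, each with its own data area and port.
print_run_file /usr/local/pgsql/data  5432
print_run_file /usr/local/pgsql/data2 5433
```

Each printed script would go into the run file of its own service directory.
Since postmaster.pid lives inside the data directory, each instance keeps its
own lock.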

> I like the idea of supervise starting me up again even without a reboot! and i
> just want to catch this problem and solve it.
>
> Thanks, mitchell