Thread: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
mlaks
Date:
Hi,

Thank you Tom.

I have been looking at the postgresql service startup scripts in Redhat written by Lamar Owen et al. I would like to understand the role of the following files that are created during startup of Postgresql on my Redhat Linux box:

1. /var/lib/pgsql/data/postmaster.pid
2. /var/run/postmaster.pid (for redhat 7.3/Postgresql 7.2)
   /var/run/postmaster5432.pid (for redhat 9.0/Postgresql 7.3.2)
3. /var/lock/subsys/postgresql
4. /tmp/.s.PGSQL.5432.lock (and associated link to the directory in that directory).

I notice that:

1. /var/lib/pgsql/data/postmaster.pid contains the pid of the /usr/bin/postmaster process. Interestingly, Lamar does not rm this file on stop().
2. /var/run/postmaster.pid contains the pid of a postgres stats process.
3. /tmp/.s.PGSQL.5432.lock has the pid of the /usr/bin/postmaster process.

Why do I care?

My goal is to use D. J. Bernstein's daemontools to make sure that my Postgresql process goes back up unattended if it goes down for some reason. So I will be substituting daemontools for the postgresql service script. Thus I want to know which lock files to remove to make sure all is ok. I also want to follow Tom Lane's advice and not shoot myself in the foot by creating two different postmaster processes working on the same database!!!!

Thank you all for your help!!!

Mitchell Laks
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
Bruno Wolff III
Date:
On Thu, May 08, 2003 at 12:50:49 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
>
> My goal is to use D. J. Bernstein's daemontools to make sure that my Postgresql
> process goes back up unattended if it goes down for some reason. So I will be
> substituting daemontools for the postgresql service script.
> Thus I want to know what lock files to remove to make sure all is ok. I also
> want to follow Tom Lane's advice and not shoot myself in the foot by
> creating two different postmaster processes working the same database!!!!

This is what I put in my run file:

#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

I use multilog for logging as you normally would.
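For readers setting this up from scratch, here is a minimal sketch of the whole service directory: the run file above plus a companion log/run that hands the output to multilog. The helper name make_service_dir, the choice of the postgres account for the logger, and the ./main log directory are assumptions made for illustration; they are not taken from the message.

```shell
#!/bin/sh
# Sketch only: write a daemontools service directory for postmaster.
# make_service_dir is a hypothetical helper, not part of daemontools.
# Usage: make_service_dir SERVICE_DIR POSTMASTER_PATH DATA_DIR
make_service_dir() {
    svc_dir=$1
    pm_bin=$2
    pg_data=$3
    mkdir -p "$svc_dir/log/main" || return 1

    # Main run script: the same three lines as in the message above.
    cat > "$svc_dir/run" <<EOF
#!/bin/sh
exec 2>&1
exec setuidgid postgres $pm_bin -D $pg_data
EOF

    # Log run script: multilog reads the postmaster output on stdin,
    # prefixes each line with a timestamp (t) and writes under ./main.
    # Assumption: log/main must be writable by the account named here.
    cat > "$svc_dir/log/run" <<'EOF'
#!/bin/sh
exec setuidgid postgres multilog t ./main
EOF

    chmod 755 "$svc_dir/run" "$svc_dir/log/run"
}
```

Calling make_service_dir ./postgresql /usr/local/pgsql/bin/postmaster /usr/local/pgsql/data builds the directory in the current location; it would then still need to be linked into whatever directory svscan watches.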
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
mlaks
Date:
Thank you for your response Bruno. I agree about the importance of using the lines

#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

in the run file. However, what else must we put in as well?

My question is to understand the lock files for postgresql so I can deal with the following:

1. I notice that Lamar's postgresql service script removes "stale lock files" before starting postgresql by using the line

rm -f /tmp/.s.PGSQL.* > /dev/null

and my own experience suggests we should perhaps also add the line

rm -f /var/lib/pgsql/data/postmaster.pid

because sometimes when my machine crashes and gets rebooted I must manually remove that file.

2. Moreover, I see that after successfully starting postgresql Lamar touches a file

touch /var/lock/subsys/postgresql

and does this

echo $pid > /var/run/postmaster.pid

so how can we do that?

3. I can imagine we can accomplish 1. with

#!/bin/sh
rm -f /tmp/.s.PGSQL.* > /dev/null
rm -f /var/lib/pgsql/data/postmaster.pid
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

but how do we do 2. (the touching and echoing after the process starts) if the exec has replaced the "run" process with the postmaster process so that the daemontools "svc" can control it?

Mitchell
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
Bruno Wolff III
Date:
On Thu, May 08, 2003 at 14:10:52 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
> Thank you for your response Bruno. I agree about the importance of using the
> lines
>
> #!/bin/sh
> exec 2>&1
> exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D
> /usr/local/pgsql/data
>
> in the run file. However, what else must we put in as well?
>
> My question is to understand the lock files for postgresql so I can deal with
> the following:

Some of the lock files have to do with the init system. Those can be ignored. Postgres also keeps a lock file and that is used to prevent two postmasters from running at the same time. You probably don't want to have a script remove that lock file, because if there really is another postmaster running, starting a second one can be a disaster.
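A read-only check is one way to look at the lock file without taking the risk described above. The sketch below only reports what it finds and never deletes anything. The helper name pidfile_state is made up for illustration, and the sketch assumes the PID is on the first line of postmaster.pid. Note that "live" only means some process has that PID; after a reboot it may belong to an unrelated program.

```shell
#!/bin/sh
# Sketch only: classify a postmaster.pid style lock file without touching it.
# pidfile_state is a hypothetical helper name.
# Prints one of: absent, unreadable, stale, live.
# Caveat: kill -0 also fails for a live process owned by another user,
# so this is assumed to run as the postgres user or as root.
pidfile_state() {
    pid_file=$1
    if [ ! -f "$pid_file" ]; then
        echo absent
        return 0
    fi

    # Assumption: the first line of the file holds the PID.
    lock_pid=$(head -n 1 "$pid_file")
    case $lock_pid in
        ''|*[!0-9]*)
            echo unreadable
            return 0
            ;;
    esac

    # Signal 0 delivers nothing; it only tests whether the PID exists.
    if kill -0 "$lock_pid" 2>/dev/null; then
        echo live
    else
        echo stale
    fi
}
```

For example, pidfile_state /var/lib/pgsql/data/postmaster.pid could be run by hand after a crash before deciding whether to remove the file.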
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
mlaks
Date:
Bruno,

Thanks for your help.

I was wondering: should we in fact be exec'ing the postmaster as you describe, or perhaps pg_ctl as Lamar's script does, or perhaps a "new" script that incorporates pg_ctl or postmaster and a signal-catching mechanism?

The reason I ask is that the way daemontools stops a service, if you want it to, is via the command

svc opts postgresql

where opts is one of:

-d: Down. If the service is running, send it a TERM signal and then a CONT signal. After it stops, do not restart it.
-o: Once. If the service is not running, start it. Do not restart it if it stops.
-p: Pause. Send the service a STOP signal.
-c: Continue. Send the service a CONT signal.
-h: Hangup. Send the service a HUP signal.
-a: Alarm. Send the service an ALRM signal.
-i: Interrupt. Send the service an INT signal.
-t: Terminate. Send the service a TERM signal.
-k: Kill. Send the service a KILL signal.

Now we would not want to kill the postmaster, of course. But should we even be TERM'ing the postmaster? I don't know. What do the Postgresql gurus say?

Moreover, if we agree that we need to embed pg_ctl or postmaster in a script to handle the above things, it should be doable to handle all of the assorted other files if they are necessary to handle. Do you agree?

Also, what would be the problem in checking for the existence of a postmaster and, if none exists, killing the lock files?

My main problem is that I have machines that get creamed by power surges, and then won't restart postgresql on reboot of the system because of the damn lock files. I really want to deal with them up front!

Moreover, can you tell me more about what init uses the locks for? What is the role of the files

/var/run/postmaster.pid
/var/lock/subsys/postgresql

that Lamar carefully adds and subtracts?

rm -f /var/run/postmaster.pid
rm -f /var/lock/subsys/postgresql

Thanks
Mitchell
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
Bruno Wolff III
Date:
On Thu, May 08, 2003 at 14:57:15 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
>
> 1. check for a running postmaster
> 2 if not delete the /var/lib/pgsql/data/postmaster.pid files
>
> where would we go wrong with duplicate postmistresses?

postmaster already does that, but there may be cases where it thinks there is a running postmaster and there really isn't. In that case you would need to verify this, remove the lock file and start by hand.

Having two postmasters running at the same time for the same data directory will corrupt your databases.
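The manual verification step can be made a little less error-prone by looking up which command actually owns the PID recorded in the lock file. What follows is a rough sketch under several assumptions: the helper names pid_command and is_postmaster_pid are invented for illustration, /proc/PID/comm is Linux-specific (with ps as a fallback), and the process is assumed to show up under the name postmaster, as the pidof call in the Red Hat script suggests. It is meant as an aid for checking by hand, not as a replacement for the postmaster's own test.

```shell
#!/bin/sh
# Sketch only: find out which command runs under a given PID.
# pid_command and is_postmaster_pid are hypothetical helper names.
pid_command() {
    if [ -r "/proc/$1/comm" ]; then
        # Linux-specific: short command name of the process.
        cat "/proc/$1/comm"
    else
        # Fallback; some systems print a full path here.
        ps -o comm= -p "$1" 2>/dev/null
    fi
}

# Succeeds only if the PID exists and its command name matches.
# Usage: is_postmaster_pid PID [EXPECTED_NAME]
# Assumption: the expected name defaults to "postmaster".
is_postmaster_pid() {
    want_name=${2:-postmaster}
    have_name=$(pid_command "$1")
    have_name=${have_name##*/}   # drop any leading directory part
    [ -n "$have_name" ] && [ "$have_name" = "$want_name" ]
}
```

If is_postmaster_pid fails for the PID found in postmaster.pid, that suggests the lock is left over, but it is still worth confirming by eye before removing the file.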
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
Bruno Wolff III
Date:
On Thu, May 08, 2003 at 14:39:08 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
>
> now we would not want to kill the postmaster, of course. But should we even be
> TERM'ing the postmaster? I dont know. What do the Postgresql Gurus say?

I regularly use svc -d to shut down postmaster and svc -u to restart it. This works just fine.

> Moreover, if we agree that we need to imbed pg_ctl or postmaster in a script
> to handle the above things, it should be doable to handle all of the assorted
> other files if they are neccesary to handle .

You don't have to do that.

> Also what would be the problem in checking for the existence of a postmaster
> and if none exists then killing the lock files.

I would be very leery of putting this in a script. postmaster already does this and trying to be smarter than it might cause you a lot of grief.

> My main problem is that I have machines that get creamed by power surges, and
> then wont restart postgresql on reboot of the system because of the damn lock
> files. I really want to deal with them up front!

Most of the time when I have unscheduled shutdowns postgres comes up without problem. I don't remember if I have had any since I switched to using supervise though. I have had more issues with someone needing to answer a question from fsck from the console than postgresql not coming up.

> MOreover can you tell me more about what init uses the locks for?

To tell if the service is already running or not.

>
> what is the role of the files
>
> /var/run/postmaster.pid
> /var/lock/subsys/postgresql
>
> that Lamar carefully adds and subtracts?

I don't know exactly, but I would expect that the pid file is a lock for the service and that the subsys file is a lock to keep two init scripts from running at the same time for the same service.
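On the question of whether TERM is safe: the postmaster treats its signals as shutdown requests of different urgency, and TERM is the gentlest of them, which is consistent with svc -d working fine. The sketch below just records that mapping. The helper name shutdown_signal is invented for illustration, and the mode names follow the PostgreSQL documentation rather than anything stated in this thread.

```shell
#!/bin/sh
# Sketch only: which signal the postmaster treats as which shutdown mode.
# shutdown_signal is a hypothetical helper name.
shutdown_signal() {
    case $1 in
        smart)     echo TERM ;;  # wait for clients to disconnect, then stop
        fast)      echo INT  ;;  # roll back open transactions, then stop
        immediate) echo QUIT ;;  # stop at once; recovery runs at next start
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}
```

With the svc options listed earlier in the thread, -d and -t deliver TERM (smart) and -i delivers INT (fast). Only -d also tells supervise not to restart the service, so -i or -t on its own would presumably be followed by a restart.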
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
mlaks
Date:
Bruno,

Thanks for your help.

I checked: grep in /etc/rc.d/init.d agrees with what you said. Those /var/lock and /var/run files are commonly placed by all of the services!

Here's my problem:

I had 4 out of 5 machines that got creamed this weekend, and all I needed to go in for was to erase that file /var/lib/pgsql/data/postmaster.pid. The same thing!!! (with only one machine) happened about a month ago.

I notice that in his script Lamar does this

pid=`pidof -s postmaster`
if [ $pid ]
then
    echo $"Postmaster already running."
else
    #all systems go -- remove any stale lock files
    rm -f /tmp/.s.PGSQL.* > /dev/null

then he starts up pg_ctl.

What I would be doing is simply adding in a

rm -f /var/lib/pgsql/data/postmaster.pid

line.

It looks like he isn't worried about getting rid of that /tmp/.s.PGSQL.* file as long as he ran pidof first. (Is /tmp/.s.PGSQL.* also a kind of lock file? I don't know. Do you know what system sets it up?)

Also, what do you do about those files

/tmp/.s.PGSQL.* ?

And what do you do about the possibility of supervise starting more than one of the postmasters?

I like the idea of supervise starting me up again even without a reboot, and I just want to catch this problem and solve it.

Thanks,
mitchell
Re: Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
From
Bruno Wolff III
Date:
On Thu, May 08, 2003 at 16:30:11 -0400,
  mlaks <mlaks@bellatlantic.net> wrote:
> Bruno, Thanks for your help.
>
> i checked - grep in the /etc/rc.d/init.d agrees with what you said - those
> /var/lock and /var/run files are commonly placed in all of the services!
>
> Here's my problem:
>
> I had 4 out of 5 machines that got creamed this weekend, and all i needed to
> go in for was to erase that file /var/lib/pgsql/data/postmaster.pid.
> the same thing!!! (with only one machine) happened about a month ago.
>
> I notice that in his script Lamar does this
>
> pid=`pidof -s postmaster`
> if [ $pid ]
> then
> echo $"Postmaster already running."
> else
> #all systems go -- remove any stale lock files
> rm -f /tmp/.s.PGSQL.* > /dev/null
> then he starts up.
>
> What I would be doing is simply adding in
>
> rm -f /var/lib/pgsql/data/postmaster.pid line.
>
> It looks like he isnt worried about getting rid of that tmp/.s.PGSQL.* file as
> long as he ran pidof first -
> (is /tmp/.s.PGSQL. also a kind of lock file? i dont know - do you know
> what system sets it up?)

Well if there is no process with the pid in postmaster.pid then you are safe. If there is one then you have to know it isn't a postmaster.

> Also - what do you do about those files
>
> /tmp/.s.PGSQL.* ?

These are placeholders for the domain sockets used for local connections.

>
> and what do you do about the possibility of supervise starting more than one
> of the postmasters?

I do this. It is simpler to set up than making a bunch of different init scripts. Just make sure each postmaster uses a different port and data area.

> I like the idea of supervise starting me up again even without a reboot! and i
> just want to catch this problem and solve it.
>
> Thanks, mitchell
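As a rough illustration of the "different port and data area" advice, each postmaster gets its own service directory whose run file names a distinct -p port and -D data directory. The helper name write_instance_run, the service directory names and the second port and data path in the usage note are all made up for this sketch; the postmaster path is the one from the run file earlier in the thread.

```shell
#!/bin/sh
# Sketch only: one run script per postmaster, each with its own
# port and data area. write_instance_run is a hypothetical helper.
# Usage: write_instance_run SERVICE_DIR PORT DATA_DIR
write_instance_run() {
    inst_dir=$1
    inst_port=$2
    inst_data=$3
    mkdir -p "$inst_dir" || return 1

    # -p selects the TCP port (and the socket file name under /tmp),
    # -D selects the data directory holding that instance's postmaster.pid.
    cat > "$inst_dir/run" <<EOF
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -p $inst_port -D $inst_data
EOF

    chmod 755 "$inst_dir/run"
}
```

For instance, write_instance_run ./pg-main 5432 /usr/local/pgsql/data and write_instance_run ./pg-test 5433 /usr/local/pgsql/data-test would give two independent services; since the ports differ, the /tmp/.s.PGSQL.* files do not collide either.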