Thread: pid gets overwritten in OSX
Hi, I'm running Postgres on Mac OSX (10.1.4). Every once in a while, I get the following problem: for some reason the postmaster seems to stop running postgres. When I look at the pid attributed to postgres (in postmaster.pid) and check it against ps -aux, I see that either the process doesn't exist anymore or that it has been overwritten by some other program (e.g. MySQL). It's not a big problem since it is enough to restart for the pids to get sorted (just once the problem happened twice in a row), but does anyone have an idea how I could avoid this? Thanks. -------- François Home page: http://www.monpetitcoin.com/ "A fox is a wolf who sends flowers"
Francois Suter sez: } I'm running Postgres on Mac OSX (10.1.4). Every once in a while, I } get the following problem: for some reason the postmaster seems to } stop running postgres. When I look at the pid attributed to postgres } (in postmaster.pid) and check it against ps -aux, I see that either } the process doesn't exist anymore or that it has been overwritten by } some other program (e.g. MySQL). It's not a big problem since it is } enough to restart for the pids to get sorted (just once the problem } happened twice in a row), but does anyone have an idea how I could } avoid this? You'll have to provide more information. I am running OSX 10.1.4 and both PostgreSQL 7.1.2 and MySQL and I have never seen any such behavior. The only way I could even imagine them interacting is if you are trying to use the same directory for both, and even then it shouldn't happen since MySQL and PostgreSQL use different naming schemes for their pid files. Is it possible that PostgreSQL isn't coming up after a reboot and the pid file just happens to have an old pid from the last boot? } Thanks. } François --Greg
Francois Suter wrote: > Hi, > > I'm running Postgres on Mac OSX (10.1.4). Every once in a while, I > get the following problem: for some reason the postmaster seems to > stop running postgres. When I look at the pid attributed to postgres > (in postmaster.pid) and check it against ps -aux, I see that either > the process doesn't exist anymore or that it has been overwritten by > some other program (e.g. MySQL). It's not a big problem since it is > enough to restart for the pids to get sorted (just once the problem > happened twice in a row), but does anyone have an idea how I could > avoid this? That is strange. The odds that a pid would get reused by another long-running program, and that it would be another database, are very small. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>You'll have to provide more information. I am running OSX 10.1.4 and both >PostgreSQL 7.1.2 and MySQL and I have never seen any such behavior. The >only way I could even imagine them interacting is if you are trying to use >the same directory for both, and even then it shouldn't happen since MySQL >and PostgreSQL use different naming schemes for their pid files. No, I'm definitely not using the same directory for both. As for more info, I'm using Postgres 7.2. >Is it possible that PostgreSQL isn't coming up after a reboot and the pid >file just happens to have an old pid from the last boot? It could be. I have been thinking along this line. I could imagine the following scenario: Postgres starts after quite a few other processes, tries to start with the pid stored in the postmaster.pid file and actually doesn't start because the pid is already in use. Is there an error log somewhere where such an error might appear? Thanks. -------- François Home page: http://www.monpetitcoin.com/ "A fox is a wolf who sends flowers"
Francois Suter <dba@paragraf.ch> writes: > the following scenario: Postgres starts after quite a few other > processes, tries to start with the pid stored in the postmaster.pid > file and actually doesn't start because the pid is already in use. Postgres does not "try to start with the stored pid"; that's entirely impossible under any flavor of Unix. You get the PID the kernel assigns you, and that's that. This could well be a problem of failure to start up, but you're barking up the wrong tree as to why. What is needed at this point is more observation. You need to determine whether the postmaster is in fact starting (and later dying) or failing to start at all --- ie, is the postmaster.pid file left over from a previous system boot cycle? Checking the mod date of the pid file might be enough to tell. > Is there an error log somewhere where such an error might appear? What are you doing with the postmaster's stderr? If your start script for the postmaster is routing it to /dev/null, send it someplace more helpful. regards, tom lane
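The observation step described above (is the pid file left over from an earlier boot, and does a process with that PID exist right now?) could be scripted roughly as follows. This is only a sketch for a Unix-like system: it assumes the first line of postmaster.pid holds the PID, and the helper names pid_is_alive and inspect_pid_file are hypothetical, not part of Postgres.

```python
import os
import time


def pid_is_alive(pid):
    """Return True if some process with this PID currently exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def inspect_pid_file(path):
    """Report the recorded PID, the file's modification time, and liveness."""
    with open(path) as f:
        # Assumption: the PID is the first line of the file.
        pid = int(f.readline().strip())
    return {
        "pid": pid,
        "modified": time.ctime(os.path.getmtime(path)),
        "alive": pid_is_alive(pid),
    }
```

Calling inspect_pid_file("/usr/local/pgsql/data/postmaster.pid") and comparing "modified" with the last boot time would show whether the file predates the current boot. Note that "alive" only says that some process holds the PID, not that it is a postmaster.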
Thanks for the leads. I will investigate for a while and keep you posted if I find anything that might be of interest to everybody. >What is needed at this point is more observation. You need to determine >whether the postmaster is in fact starting (and later dying) or >failing to start at all --- ie, is the postmaster.pid file left over >from a previous system boot cycle? Checking the mod date of the pid >file might be enough to tell. > >What are you doing with the postmaster's stderr? If your start script >for the postmaster is routing it to /dev/null, send it someplace more >helpful. -------- François Home page: http://www.monpetitcoin.com/ "A fox is a wolf who sends flowers"
>>What is needed at this point is more observation. You need to determine >>whether the postmaster is in fact starting (and later dying) or >>failing to start at all --- ie, is the postmaster.pid file left over >>from a previous system boot cycle? Checking the mod date of the pid >>file might be enough to tell. The error happened again during the week-end and I was able to collect the following from Postgres' logfile: Lock file "/usr/local/pgsql/data/postmaster.pid" already exists. Is another postmaster (pid 217) running in "/usr/local/pgsql/data"? So it seems that the problem is that the postmaster.pid file can't be overwritten. I checked the last mod date and it is indeed left over from last startup. Any idea what could be causing this problem? -------- François Home page: http://www.monpetitcoin.com/ "A fox is a wolf who sends flowers"
Francois Suter <dba@paragraf.ch> writes: > The error happened again during the week-end and I was able to > collect the following from Postgres' logfile: > Lock file "/usr/local/pgsql/data/postmaster.pid" already exists. > Is another postmaster (pid 217) running in "/usr/local/pgsql/data"? > So it seems that the problem is that the postmaster.pid file can't be > overwritten. I checked the last mod date and it is indeed left over > from last startup. Any idea what could be causing this problem? Well, it *could* be overwritten, but Postgres won't do it if it sees that there is a process of that PID in the system. What I think is happening is that there's some small variation in the number or ordering of processes launched during system boot. Maybe one time Postgres is PID 217, the next time it is PID 218 and some other daemon happens to get 217. But if 217 is what is in the lockfile, and we see *any* other existent process with PID 217, we cravenly refuse to overwrite the lockfile. I have seen this sort of thing before with other daemons --- on my system, sendmail occasionally refuses to start after a power failure & reboot because it has the same sort of lockfile checking behavior. We could perhaps avoid this scenario by being a little tighter about what we will believe is a conflicting process --- for example, if PID 217 exists but isn't our same userID, don't assume it's the old postmaster still running. But I could easily see that cure being worse than the disease. If it ever let us start two conflicting postmasters in the same data directory, data corruption would be the certain result. That's exactly what the lockfile is there to prevent. The real problem is that the old postmaster was evidently not allowed to shut down cleanly (else it'd have removed its lockfile). How are you powering down the system, anyway? regards, tom lane
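The two lockfile policies described above can be modelled with a small sketch. It is not Postgres's actual code: the function names are hypothetical, and the process table is simplified to a dict mapping each running PID to the user ID that owns it.

```python
def can_start_conservative(lock_pid, running, my_pid):
    """Current behaviour as described: refuse if ANY other process has the PID.

    lock_pid: PID recorded in the lockfile, or None if there is no lockfile.
    running:  dict mapping each running PID to its owner's user ID.
    my_pid:   PID of the postmaster that is trying to start.
    """
    if lock_pid is None or lock_pid == my_pid:
        return True              # no lockfile, or the recorded PID is our own
    return lock_pid not in running


def can_start_tighter(lock_pid, running, my_pid, my_uid):
    """Proposed variant: only a process owned by our user ID counts as a conflict."""
    if lock_pid is None or lock_pid == my_pid:
        return True
    return running.get(lock_pid) != my_uid


# Boot scenario from the message: the stale lockfile says 217, but this time
# another daemon (owned by a different user) got 217 and we got 218.
running = {1: 0, 217: 74, 218: 501}
print(can_start_conservative(217, running, my_pid=218))         # False
print(can_start_tighter(217, running, my_pid=218, my_uid=501))  # True
```

The tighter rule lets the start proceed in this scenario, but it is also the rule that carries the risk mentioned above: any mistake in judging the conflicting process could allow two postmasters in one data directory.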
>The real problem is that the old postmaster was evidently not allowed >to shut down cleanly (else it'd have removed its lockfile). How are >you powering down the system, anyway? I'm shutting down normally (ok, I mean most of the time I press the power-up button and choose "Shut down" rather than going via the Apple menu). I haven't had a system crash in ages! The only difference I can see (and I would have to test if it makes any difference) is that sometimes I'm working stand-alone at home and sometimes on the network in my office (I'm using a PowerBook G4), but I'm pretty sure I don't have this problem popping up everytime I go back to the office after having used my machine at home. Maybe there's some operation missing at shutdown. I installed PostgreSQL using Mark Liyanage's package. Could there be something missing? Is Postgres taking care of the removal of the postmaster.pid file or do you have to do it yourself in some shutdown script? Best regards. -------- François Home page: http://www.monpetitcoin.com/ "A fox is a wolf who sends flowers"
Francois Suter <dba@paragraf.ch> writes: > Maybe there's some operation missing at shutdown. I installed > PostgreSQL using Mark Liyanage's package. Could there be something > missing? Is Postgres taking care of the removal of the postmaster.pid > file or do you have to do it yourself in some shutdown script? No, you shouldn't need to do it yourself. The approved way to shut down Pg is to send the postmaster a SIGTERM signal --- which I believe all Unixen will do automatically during the shutdown sequence. What may be happening is that the system is not giving the postmaster a long enough grace period between SIGTERM and hard kill. We need a minimum of about three seconds I believe (there's a 2-second sleep() in the checkpoint sync code, which maybe should not be there, but it's there at the moment). Traditionally systems have allowed 10 seconds or more to respond to SIGTERM, but perhaps Apple thought they could shave some time there? regards, tom lane
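To illustrate why the grace period matters, here is a minimal sketch of a daemon that removes its own lock file when it receives SIGTERM. It is not how the postmaster is implemented; the function names and the state dict are hypothetical. If the system sends a hard kill (SIGKILL) before the handler has finished, the removal never happens and the lock file is left behind, which matches the symptom in this thread.

```python
import os
import signal


def make_sigterm_handler(lock_path, state):
    """Build a handler that cleans up the lock file when SIGTERM arrives."""
    def handler(signum, frame):
        # A real server would finish its shutdown work (e.g. a checkpoint)
        # here; that work is what needs the grace period before a hard kill.
        try:
            os.remove(lock_path)
        except FileNotFoundError:
            pass                 # already gone; nothing to clean up
        state["terminated"] = True
    return handler


def install_sigterm_cleanup(lock_path):
    """Register the handler; must be called from the main thread."""
    state = {"terminated": False}
    signal.signal(signal.SIGTERM, make_sigterm_handler(lock_path, state))
    return state
```

A daemon would call install_sigterm_cleanup(path) once at startup and then poll state["terminated"] in its main loop to decide when to exit.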
On Mon, 2002-04-29 at 17:05, Francois Suter wrote: > >The real problem is that the old postmaster was evidently not allowed > >to shut down cleanly (else it'd have removed its lockfile). How are > >you powering down the system, anyway? > > I'm shutting down normally (ok, I mean most of the time I press the > power-up button and choose "Shut down" rather than going via the > Apple menu). I haven't had a system crash in ages! The only > difference I can see (and I would have to test if it makes any > difference) is that sometimes I'm working stand-alone at home and > sometimes on the network in my office (I'm using a PowerBook G4), but > I'm pretty sure I don't have this problem popping up everytime I go > back to the office after having used my machine at home. > > Maybe there's some operation missing at shutdown. I installed > PostgreSQL using Mark Liyanage's package. Could there be something > missing? Is Postgres taking care of the removal of the postmaster.pid > file or do you have to do it yourself in some shutdown script? François, I would definitely quit postgres before shutting down. And Mac OS X does not, in my experience, like working in "offline" mode. I had all sorts of problems getting networking set up right in that mode. All my problems disappeared when the machine was plugged in to the adsl router... I would say that DNS could be an issue here. Cheers Tony Grant -- RedHat Linux on Sony Vaio C1XD/S http://www.animaproductions.com/linux2.html Macromedia UltraDev with PostgreSQL http://www.animaproductions.com/ultra.html