Thread: pid gets overwritten in OSX

pid gets overwritten in OSX

From
Francois Suter
Date:
Hi,

I'm running Postgres on Mac OSX (10.1.4). Every once in a while, I
get the following problem: for some reason the postmaster seems to
stop running postgres. When I look at the pid attributed to postgres
(in postmaster.pid) and check it against ps -aux, I see that either
the process doesn't exist anymore or that it has been overwritten by
some other program (e.g. MySQL). It's not a big problem since it is
enough to restart for the pids to get sorted (just once the problem
happened twice in a row), but does anyone have an idea how I could
avoid this?

Thanks.

--------
François

Home page: http://www.monpetitcoin.com/
"A fox is a wolf who sends flowers"

Re: pid gets overwritten in OSX

From
Gregory Seidman
Date:
Francois Suter sez:
} I'm running Postgres on Mac OSX (10.1.4). Every once in a while, I
} get the following problem: for some reason the postmaster seems to
} stop running postgres. When I look at the pid attributed to postgres
} (in postmaster.pid) and check it against ps -aux, I see that either
} the process doesn't exist anymore or that it has been overwritten by
} some other program (e.g. MySQL). It's not a big problem since it is
} enough to restart for the pids to get sorted (just once the problem
} happened twice in a row), but does anyone have an idea how I could
} avoid this?

You'll have to provide more information. I am running OSX 10.1.4 and both
PostgreSQL 7.1.2 and MySQL and I have never seen any such behavior. The
only way I could even imagine them interacting is if you are trying to use
the same directory for both, and even then it shouldn't happen since MySQL
and PostgreSQL use different naming schemes for their pid files.

Is it possible that PostgreSQL isn't coming up after a reboot and the pid
file just happens to have an old pid from the last boot?

} Thanks.
} François
--Greg


Re: pid gets overwritten in OSX

From
Bruce Momjian
Date:
Francois Suter wrote:
> Hi,
>
> I'm running Postgres on Mac OSX (10.1.4). Every once in a while, I
> get the following problem: for some reason the postmaster seems to
> stop running postgres. When I look at the pid attributed to postgres
> (in postmaster.pid) and check it against ps -aux, I see that either
> the process doesn't exist anymore or that it has been overwritten by
> some other program (e.g. MySQL). It's not a big problem since it is
> enough to restart for the pids to get sorted (just once the problem
> happened twice in a row), but does anyone have an idea how I could
> avoid this?

That is strange.  The odds that a pid would get reused by another
long-running program, and that it would be another database, are very
small.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: pid gets overwritten in OSX

From
Francois Suter
Date:
>You'll have to provide more information. I am running OSX 10.1.4 and both
>PostgreSQL 7.1.2 and MySQL and I have never seen any such behavior. The
>only way I could even imagine them interacting is if you are trying to use
>the same directory for both, and even then it shouldn't happen since MySQL
>and PostgreSQL use different naming schemes for their pid files.

No, I'm definitely not using the same directory for both. As for more
info, I'm using Postgres 7.2.

>Is it possible that PostgreSQL isn't coming up after a reboot and the pid
>file just happens to have an old pid from the last boot?

It could be. I have been thinking along this line. I could imagine
the following scenario: Postgres starts after quite a few other
processes, tries to start with the pid stored in the postmaster.pid
file and actually doesn't start because the pid is already in use. Is
there an error log somewhere where such an error might appear?

Thanks.


--------
François

Home page: http://www.monpetitcoin.com/
"A fox is a wolf who sends flowers"

Re: pid gets overwritten in OSX

From
Tom Lane
Date:
Francois Suter <dba@paragraf.ch> writes:
> the following scenario: Postgres starts after quite a few other
> processes, tries to start with the pid stored in the postmaster.pid
> file and actually doesn't start because the pid is already in use.

Postgres does not "try to start with the stored pid"; that's entirely
impossible under any flavor of Unix.  You get the PID the kernel assigns
you, and that's that.  This could well be a problem of failure to start
up, but you're barking up the wrong tree as to why.

What is needed at this point is more observation.  You need to determine
whether the postmaster is in fact starting (and later dying) or
failing to start at all --- ie, is the postmaster.pid file left over
from a previous system boot cycle?  Checking the mod date of the pid
file might be enough to tell.
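
A minimal Python sketch of that mod-date check. It assumes the caller already knows the boot time as epoch seconds (how to obtain it is platform-specific and not shown here); the function name is hypothetical.

```python
import os


def lockfile_predates_boot(lock_path, boot_time):
    """Return True if lock_path was last modified before boot_time.

    boot_time is epoch seconds supplied by the caller (an assumption of
    this sketch). True suggests the file is left over from an earlier
    boot cycle rather than written by a postmaster started since.
    """
    return os.path.getmtime(lock_path) < boot_time
```

If this returns True for the postmaster.pid file in the data directory, the postmaster most likely never started after the last reboot; False suggests it started and died later.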

> Is there an error log somewhere where such an error might appear?

What are you doing with the postmaster's stderr?  If your start script
for the postmaster is routing it to /dev/null, send it someplace more
helpful.
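
As an illustration only, a start script written in Python could send stderr to a log file roughly like this. The helper name is hypothetical, and the command to run is whatever your install uses to launch the postmaster; nothing here is taken from an actual PostgreSQL start script.

```python
import subprocess


def start_with_stderr_log(command, log_path):
    """Start command with its stderr appended to log_path.

    Returns the subprocess.Popen object for the started process.
    """
    log_file = open(log_path, "ab")
    try:
        return subprocess.Popen(command, stderr=log_file)
    finally:
        # The child keeps its own copy of the descriptor; close ours.
        log_file.close()
```

A call might look like start_with_stderr_log(["postmaster", "-D", "/usr/local/pgsql/data"], "/usr/local/pgsql/logfile"), where both the data directory and the log path are assumptions about the local setup.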

            regards, tom lane

Re: pid gets overwritten in OSX

From
Francois Suter
Date:
Thanks for the leads. I will investigate for a while and keep you
posted if I find anything that might be of interest to everybody.

>What is needed at this point is more observation.  You need to determine
>whether the postmaster is in fact starting (and later dying) or
>failing to start at all --- ie, is the postmaster.pid file left over
>from a previous system boot cycle?  Checking the mod date of the pid
>file might be enough to tell.
>
>What are you doing with the postmaster's stderr?  If your start script
>for the postmaster is routing it to /dev/null, send it someplace more
>helpful.


--------
François

Home page: http://www.monpetitcoin.com/
"A fox is a wolf who sends flowers"

Re: pid gets overwritten in OSX

From
Francois Suter
Date:
>>What is needed at this point is more observation.  You need to determine
>>whether the postmaster is in fact starting (and later dying) or
>>failing to start at all --- ie, is the postmaster.pid file left over
>>from a previous system boot cycle?  Checking the mod date of the pid
>>file might be enough to tell.

The error happened again during the week-end and I was able to
collect the following from Postgres' logfile:

Lock file "/usr/local/pgsql/data/postmaster.pid" already exists.
Is another postmaster (pid 217) running in "/usr/local/pgsql/data"?

So it seems that the problem is that the postmaster.pid file can't be
overwritten. I checked the last mod date and it is indeed left over
from last startup. Any idea what could be causing this problem?


--------
François

Home page: http://www.monpetitcoin.com/
"A fox is a wolf who sends flowers"

Re: pid gets overwritten in OSX

From
Tom Lane
Date:
Francois Suter <dba@paragraf.ch> writes:
> The error happened again during the week-end and I was able to
> collect the following from Postgres' logfile:

> Lock file "/usr/local/pgsql/data/postmaster.pid" already exists.
> Is another postmaster (pid 217) running in "/usr/local/pgsql/data"?

> So it seems that the problem is that the postmaster.pid file can't be
> overwritten. I checked the last mod date and it is indeed left over
> from last startup. Any idea what could be causing this problem?

Well, it *could* be overwritten, but Postgres won't do it if it sees
that there is a process of that PID in the system.

What I think is happening is that there's some small variation in the
number or ordering of processes launched during system boot.  Maybe one
time Postgres is PID 217, the next time it is PID 218 and some other
daemon happens to get 217.  But if 217 is what is in the lockfile, and
we see *any* other existent process with PID 217, we cravenly refuse
to overwrite the lockfile.
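
The check described above can be sketched in Python as follows. This is not the postmaster's actual code, only an illustration of the idea; the function names are hypothetical, and the sketch assumes the first line of the lock file holds the PID.

```python
import os


def pid_is_alive(pid):
    """Return True if any process with this PID currently exists."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        # The PID exists but belongs to another user, e.g. some other
        # daemon that happened to get the same number after a reboot.
        return True
    return True


def lock_is_stale(lock_path):
    """Return True only if no process has the PID stored in the lock file.

    Assumes the PID is on the first line of the file.
    """
    with open(lock_path) as f:
        stored_pid = int(f.readline().strip())
    return not pid_is_alive(stored_pid)
```

Note that pid_is_alive cannot tell whether the process is a postmaster; any process holding the stored PID makes the lock look live, which is the conservative behavior described above.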

I have seen this sort of thing before with other daemons --- on my
system, sendmail occasionally refuses to start after a power failure &
reboot because it has the same sort of lockfile checking behavior.

We could perhaps avoid this scenario by being a little tighter about
what we will believe is a conflicting process --- for example, if PID
217 exists but isn't our same userID, don't assume it's the old
postmaster still running.  But I could easily see that cure being worse
than the disease.  If it ever let us start two conflicting postmasters
in the same data directory, data corruption would be the certain result.
That's exactly what the lockfile is there to prevent.

The real problem is that the old postmaster was evidently not allowed
to shut down cleanly (else it'd have removed its lockfile).  How are
you powering down the system, anyway?

            regards, tom lane

Re: pid gets overwritten in OSX

From
Francois Suter
Date:
>The real problem is that the old postmaster was evidently not allowed
>to shut down cleanly (else it'd have removed its lockfile).  How are
>you powering down the system, anyway?

I'm shutting down normally (ok, I mean most of the time I press the
power-up button and choose "Shut down" rather than going via the
Apple menu). I haven't had a system crash in ages! The only
difference I can see (and I would have to test if it makes any
difference) is that sometimes I'm working stand-alone at home and
sometimes on the network in my office (I'm using a PowerBook G4), but
I'm pretty sure I don't have this problem popping up every time I go
back to the office after having used my machine at home.

Maybe there's some operation missing at shutdown. I installed
PostgreSQL using Mark Liyanage's package. Could there be something
missing? Is Postgres taking care of the removal of the postmaster.pid
file or do you have to do it yourself in some shutdown script?

Best regards.


--------
François

Home page: http://www.monpetitcoin.com/
"A fox is a wolf who sends flowers"

Re: pid gets overwritten in OSX

From
Tom Lane
Date:
Francois Suter <dba@paragraf.ch> writes:
> Maybe there's some operation missing at shutdown. I installed
> PostgreSQL using Mark Liyanage's package. Could there be something
> missing? Is Postgres taking care of the removal of the postmaster.pid
> file or do you have to do it yourself in some shutdown script?

No, you shouldn't need to do it yourself.  The approved way to shut down
Pg is to send the postmaster a SIGTERM signal --- which I believe all
Unixen will do automatically during the shutdown sequence.  What may be
happening is that the system is not giving the postmaster a long enough
grace period between SIGTERM and hard kill.  We need a minimum of about
three seconds I believe (there's a 2-second sleep() in the checkpoint
sync code, which maybe should not be there, but it's there at the
moment).  Traditionally systems have allowed 10 seconds or more to
respond to SIGTERM, but perhaps Apple thought they could shave some
time there?
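
For anyone who wants to stop the postmaster by hand before powering down, here is a rough Python sketch of the SIGTERM-then-wait sequence. The function names and the default 10-second grace period are assumptions for illustration; the sketch relies only on the behavior stated in this thread, that a cleanly stopped postmaster removes its lock file.

```python
import os
import signal
import time


def request_shutdown(pid):
    """Ask the process to shut down by sending it SIGTERM."""
    os.kill(pid, signal.SIGTERM)


def wait_for_lock_removal(lock_path, grace_seconds=10.0, poll_interval=0.1):
    """Wait until lock_path disappears or the grace period runs out.

    Returns True if the file is gone (clean shutdown), False if it still
    exists when grace_seconds have elapsed.
    """
    deadline = time.monotonic() + grace_seconds
    while os.path.exists(lock_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Calling request_shutdown with the postmaster's PID and then wait_for_lock_removal on postmaster.pid before powering down would show whether the grace period is long enough on a given machine.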

            regards, tom lane

Re: pid gets overwritten in OSX

From
tony
Date:
On Mon, 2002-04-29 at 17:05, Francois Suter wrote:
> >The real problem is that the old postmaster was evidently not allowed
> >to shut down cleanly (else it'd have removed its lockfile).  How are
> >you powering down the system, anyway?
>
> I'm shutting down normally (ok, I mean most of the time I press the
> power-up button and choose "Shut down" rather than going via the
> Apple menu). I haven't had a system crash in ages! The only
> difference I can see (and I would have to test if it makes any
> difference) is that sometimes I'm working stand-alone at home and
> sometimes on the network in my office (I'm using a PowerBook G4), but
> I'm pretty sure I don't have this problem popping up every time I go
> back to the office after having used my machine at home.
>
> Maybe there's some operation missing at shutdown. I installed
> PostgreSQL using Mark Liyanage's package. Could there be something
> missing? Is Postgres taking care of the removal of the postmaster.pid
> file or do you have to do it yourself in some shutdown script?

François

I would definitely quit postgres before shutting down. And Mac OS X does
not, in my experience, like working in "offline" mode. I had all sorts of
problems getting networking set up right in that mode. All my problems
disappeared when the machine was plugged in to the ADSL router...

I would say that DNS could be an issue here.

Cheers

Tony Grant

--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html
Macromedia UltraDev with PostgreSQL
http://www.animaproductions.com/ultra.html