Re: pid gets overwritten in OSX - Mailing list pgsql-general

From Tom Lane
Subject Re: pid gets overwritten in OSX
Date
Msg-id 23554.1020090521@sss.pgh.pa.us
In response to Re: pid gets overwritten in OSX  (Francois Suter <dba@paragraf.ch>)
Responses Re: pid gets overwritten in OSX
List pgsql-general
Francois Suter <dba@paragraf.ch> writes:
> The error happened again during the week-end and I was able to
> collect the following from Postgres' logfile:

> Lock file "/usr/local/pgsql/data/postmaster.pid" already exists.
> Is another postmaster (pid 217) running in "/usr/local/pgsql/data"?

> So it seems that the problem is that the postmaster.pid file can't be
> overwritten. I checked the last mod date and it is indeed left over
> from last startup. Any idea what could be causing this problem?

Well, it *could* be overwritten, but Postgres won't do it if it sees
that there is a process of that PID in the system.

What I think is happening is that there's some small variation in the
number or ordering of processes launched during system boot.  Maybe one
time Postgres is PID 217, the next time it is PID 218 and some other
daemon happens to get 217.  But if 217 is what is in the lockfile, and
we see *any* other existent process with PID 217, we cravenly refuse
to overwrite the lockfile.
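
To make that check concrete, here is a rough sketch in Python. It is an
illustration only, not the postmaster's actual code: the function names
pid_exists and lockfile_blocks_startup are invented for this example, and
it assumes a POSIX system and a lock file whose first line holds the PID.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if any process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only probes the PID
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but we may not signal it
    return True


def lockfile_blocks_startup(path: str) -> bool:
    """Return True if an existing lock file should stop us from starting."""
    try:
        with open(path) as f:
            pid = int(f.readline().strip())  # assumption: PID is on line 1
    except FileNotFoundError:
        return False  # no lock file, nothing to conflict with
    except ValueError:
        return True  # unreadable contents: be conservative and refuse
    if pid <= 0:
        return True  # not a valid PID: be conservative and refuse
    # Any live process with that PID counts, whatever program it is.
    return pid_exists(pid)
```

With a stale lock file naming PID 217, this returns True as soon as some
unrelated daemon happens to hold PID 217, which matches the refusal in the
quoted log.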

I have seen this sort of thing before with other daemons --- on my
system, sendmail occasionally refuses to start after a power failure &
reboot because it has the same sort of lockfile checking behavior.

We could perhaps avoid this scenario by being a little tighter about
what we will believe is a conflicting process --- for example, if PID
217 exists but isn't our same userID, don't assume it's the old
postmaster still running.  But I could easily see that cure being worse
than the disease.  If it ever let us start two conflicting postmasters
in the same data directory, data corruption would be the certain result.
That's exactly what the lockfile is there to prevent.
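
For illustration, the tighter rule might look something like the sketch
below (Python, POSIX only). It is not existing Postgres code, and the
function name conflicting_process_exists is made up. It leans on the fact
that a non-root caller normally gets EPERM from kill() when the target
process belongs to a different user; that is an assumption about the
platform, since other security policies can also produce EPERM.

```python
import os


def conflicting_process_exists(pid: int) -> bool:
    """Return True only if a process with this PID exists AND we can signal it.

    Assumption: the caller is not root, so being allowed to signal the
    process is taken to mean that it runs under our own user ID.
    """
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only probes the PID
    except ProcessLookupError:
        return False  # ESRCH: no such process, the lock file is stale
    except PermissionError:
        # EPERM: the process exists but typically belongs to another user,
        # so under the tighter rule it is not treated as the old postmaster.
        return False
    return True
```

The risk described above still applies: if this test is ever wrong about
who owns the PID, two postmasters could end up in the same data directory.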

The real problem is that the old postmaster was evidently not allowed
to shut down cleanly (else it'd have removed its lockfile).  How are
you powering down the system, anyway?

            regards, tom lane
