Re: postmaster.pid file auto-clean up? - Mailing list pgsql-general

From Michael Clark
Subject Re: postmaster.pid file auto-clean up?
Date
Msg-id CACAT_AcbbfeG47a7apc7goR8bqcwpsJQwfzXYsGLxrPhTAtraQ@mail.gmail.com
In response to Re: postmaster.pid file auto-clean up?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: postmaster.pid file auto-clean up?
Re: postmaster.pid file auto-clean up?
List pgsql-general


On Mon, Aug 20, 2012 at 11:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> > Is this mechanism documented anywhere (besides source code)?
>
> No, not really.
>
> > It looks like PG will only clean it up if there's no other process running at all on the pid listed in the postmaster.pid file, even if any process running on that pid isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).
>
> Not sure what you're looking at, but the above is wrong in at least one
> critical detail, namely that there's a process-ownership check via
> kill().  There are also checks to ensure no children of the previous
> postmaster are still alive.  These are not things you want to lightly
> bypass: two sets of postmaster children running against the same data
> directory *will* result in unrecoverable data corruption.
>
> If you're trying to claim you've seen a false-positive situation, it
> would be interesting to hear actual details.
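
For reference, here is a rough sketch of what a kill()-based check can and cannot tell you. It is not PostgreSQL's code, only a Python approximation for POSIX systems; the function name probe_pid and the three result strings are made up for illustration. Signal 0 delivers nothing and only reports whether the caller could signal the target.

```python
import os


def probe_pid(pid):
    """Classify a PID using kill(pid, 0), which sends no signal at all."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: nothing is running under this PID.
        return "no such process"
    except PermissionError:
        # EPERM: a process exists but belongs to a different user.
        return "exists, different owner"
    # No error: a process exists and we are allowed to signal it.
    return "exists, signalable"


if __name__ == "__main__":
    print(probe_pid(os.getpid()))
```

Note that the last case says nothing about which program holds the PID: any process owned by the same user answers the same way, whether or not it is a postmaster.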


Hello, I work with Seb, and have been investigating this deeper.

It does in fact appear that we are getting false-positives.
When trying to start PG using pg_ctl, I am getting this response:
pg_ctl: another server might be running; trying to start server anyway
2012-08-26 04:46:02.211 GMT [] - FATAL:  lock file "postmaster.pid" already exists
2012-08-26 04:46:02.211 GMT [] - HINT:  Is another postmaster (PID 8574) running in data directory "/Users/mclark/Library/Application Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?

pg_ctl: this data directory appears to be running a pre-existing postmaster
pg_ctl: could not start server
Examine the log output.


PID 8574 is actually iTunes, not PG. PG was shut down cleanly on its last run, and there are no child processes running.
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop PG, copy the saved pid file back into the data directory, and edit it, replacing the old PID with that of another running process.
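
The copy-back-and-edit step can be scripted roughly as follows. This is only a sketch for a throwaway test data directory: forge_pid_file and its arguments are hypothetical names, and it relies on the PID being on the first line of postmaster.pid. It assumes the server has already been stopped and that a copy of the pid file was saved while it was running.

```python
from pathlib import Path


def forge_pid_file(saved_copy, data_dir, other_pid):
    """Recreate postmaster.pid from a saved copy, swapping in another PID.

    Only the first line (the PID) is changed; the remaining lines are
    written back exactly as they were saved.
    """
    lines = Path(saved_copy).read_text().splitlines(keepends=True)
    if not lines:
        raise ValueError("saved pid file is empty")
    lines[0] = f"{other_pid}\n"
    target = Path(data_dir) / "postmaster.pid"
    target.write_text("".join(lines))
    return target
```

Calling it with the PID of any unrelated process owned by the same user should reproduce the state described above.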

At first we thought our software was to blame. It reads the PID from PG's pid file and checks whether a process with that PID is running; if none is found we call pg_ctl, otherwise we continue launching our software and try to connect to PG.
I have just added a further check on whether the process name for that PID is postgres. If it is not, we try to start PG with pg_ctl, on the assumption that pg_ctl would work this out and remove the pid file, as it does when no process is running with that PID.
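
In outline, the check on our side looks something like the sketch below. This is a Python approximation and not our actual code; all function names are made up for illustration, the process name is looked up with `ps -p PID -o comm=`, and it assumes a POSIX system where the server binary is named postgres.

```python
import os
import subprocess
from pathlib import Path


def read_pid(pid_file):
    """Return the PID from the first line of the pid file, or None."""
    try:
        pid = int(Path(pid_file).read_text().splitlines()[0].strip())
    except (OSError, IndexError, ValueError):
        return None
    # Non-positive values are not treated as a PID here, because
    # kill() gives them a special meaning (process groups).
    return pid if pid > 0 else None


def is_alive(pid):
    """True if some process currently holds this PID."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission test only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def process_name(pid):
    """Base name of the command running as pid, or "" if ps reports none."""
    result = subprocess.run(["ps", "-p", str(pid), "-o", "comm="],
                            capture_output=True, text=True)
    return os.path.basename(result.stdout.strip())


def should_call_pg_ctl(pid_file, alive=is_alive, name_of=process_name):
    """Decide whether to try starting the server with pg_ctl."""
    pid = read_pid(pid_file)
    if pid is None or not alive(pid):
        return True  # original check: nothing is running under that PID
    # Additional check: something is running, but is it postgres?
    return name_of(pid) != "postgres"
```

The two probes are passed in as parameters so the decision can be exercised without a real server. With the pid file pointing at iTunes, this returns True and we go on to call pg_ctl, which is where the failure above appears.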

Is this considered a bug?  Should PG do a similar check on the process name, or is the way we contrived this doing something unexpected?

Thanks,
Michael.
