Thread: postmaster.pid file auto-clean up?

postmaster.pid file auto-clean up?

From
Sebastien Boisvert
Date:
I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.

Was this something that did, in fact, exist at one point, and was pulled?

Re: postmaster.pid file auto-clean up?

From
Tom Lane
Date:
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.

It's always done that.

We occasionally see startup scripts that "helpfully" remove the .pid
file.  They are, without exception, wrong and dangerous.  The postmaster
is much more likely to get this right by itself.

            regards, tom lane


Re: postmaster.pid file auto-clean up?

From
Sebastien Boisvert
Date:
Is this mechanism documented anywhere (besides source code)?

It looks like PG will only clean it up if there's no other process running at all on the pid listed in the postmaster.pid file, even if any process running on that pid isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).


On Aug 20 2012, at 1:31 PM, Tom Lane wrote:

> Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
>> I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.
>
> It's always done that.
>
> We occasionally see startup scripts that "helpfully" remove the .pid
> file.  They are, without exception, wrong and dangerous.  The postmaster
> is much more likely to get this right by itself.
>
>             regards, tom lane



Re: postmaster.pid file auto-clean up?

From
Tom Lane
Date:
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> Is this mechanism documented anywhere (besides source code)?

No, not really.

> It looks like PG will only clean it up if there's no other process running at all on the pid listed in the postmaster.pid file, even if any process running on that pid isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).

Not sure what you're looking at, but the above is wrong in at least one
critical detail, namely that there's a process-ownership check via
kill().  There are also checks to ensure no children of the previous
postmaster are still alive.  These are not things you want to lightly
bypass: two sets of postmaster children running against the same data
directory *will* result in unrecoverable data corruption.

If you're trying to claim you've seen a false-positive situation, it
would be interesting to hear actual details.

            regards, tom lane
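
The kill()-based check mentioned above can be illustrated with signal 0, which asks the kernel whether a process exists and whether the caller may signal it, without delivering anything. What follows is a minimal Python sketch for POSIX systems, not PostgreSQL's actual code; the function name `probe_pid` and its return strings are invented for illustration.

```python
import os


def probe_pid(pid: int) -> str:
    """Classify a PID by sending signal 0 (POSIX only).

    Signal 0 performs the existence and permission checks of kill()
    but does not deliver a signal to the target process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # errno ESRCH: no process with this PID exists.
        return "no-such-process"
    except PermissionError:
        # errno EPERM: a process exists, but the caller is not
        # allowed to signal it (typically a different user's process).
        return "other-user"
    # The call succeeded: a process exists and the caller may signal it.
    return "signalable"


if __name__ == "__main__":
    # The current process always exists and can signal itself.
    print(probe_pid(os.getpid()))
```

For an unprivileged caller, a process owned by a different user shows up as "other-user", which is why the account the server runs under affects how selective a check of this kind can be.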


Re: postmaster.pid file auto-clean up?

From
Michael Clark
Date:


On Mon, Aug 20, 2012 at 11:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
>> Is this mechanism documented anywhere (besides source code)?
>
> No, not really.
>
>> It looks like PG will only clean it up if there's no other process running at all on the pid listed in the postmaster.pid file, even if any process running on that pid isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).
>
> Not sure what you're looking at, but the above is wrong in at least one
> critical detail, namely that there's a process-ownership check via
> kill().  There are also checks to ensure no children of the previous
> postmaster are still alive.  These are not things you want to lightly
> bypass: two sets of postmaster children running against the same data
> directory *will* result in unrecoverable data corruption.
>
> If you're trying to claim you've seen a false-positive situation, it
> would be interesting to hear actual details.


Hello, I work with Seb, and have been investigating this deeper.

It does in fact appear that we are getting false-positives.
When trying to start PG using pg_ctl, I am getting this response:
pg_ctl: another server might be running; trying to start server anyway
2012-08-26 04:46:02.211 GMT [] - FATAL:  lock file "postmaster.pid" already exists
2012-08-26 04:46:02.211 GMT [] - HINT:  Is another postmaster (PID 8574) running in data directory "/Users/mclark/Library/Application Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?

pg_ctl: this data directory appears to be running a pre-existing postmaster
pg_ctl: could not start server
Examine the log output.


PID 8574 is actually iTunes, not PG, and PG was cleanly brought down on its last run; there are no child processes running.
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop pg, copy the copied pid file back to the data dir and edit it, replacing the old PID with that of another running process.

At first we thought our software was to blame, because it checks the PID from PG's pid file to see if a process is running with that PID, and if none are found then we call pg_ctl, otherwise we just continue launching our software and trying to connect to PG.
I just added an additional check to see if the process name for the PID is postgres, and if not then try to start PG with pg_ctl, thinking it would figure it out and remove the pid file as it would if there was no process running with that pid.

Is this considered a bug?  Should PG do a similar check on the process name, or is the way we contrived this doing something unexpected?

Thanks,
Michael.
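
The application-side check described in this message (read the PID from the lock file, then look at what is running under it) might look roughly like the sketch below. It assumes only that the first line of postmaster.pid holds the postmaster's PID and that a POSIX `ps` is available; the helper names `read_lockfile_pid` and `process_name` are hypothetical. A matching name does not show that the process is the postmaster for this particular data directory, so the result is diagnostic at best.

```python
import os
import subprocess
from pathlib import Path


def read_lockfile_pid(data_dir):
    """Return the PID on the first line of postmaster.pid, or None if the
    file is missing, empty, or does not start with an integer."""
    try:
        lines = (Path(data_dir) / "postmaster.pid").read_text().splitlines()
        return int(lines[0].strip())
    except (FileNotFoundError, IndexError, ValueError):
        return None


def process_name(pid):
    """Return the command name that `ps` reports for pid, or None if no
    such process is listed. Uses the POSIX form `ps -p PID -o comm=`."""
    result = subprocess.run(
        ["ps", "-p", str(pid), "-o", "comm="],
        capture_output=True,
        text=True,
    )
    name = result.stdout.strip()
    # Some platforms print a full path; keep only the final component.
    return os.path.basename(name) if name else None
```

Calling `read_lockfile_pid(data_dir)` and then `process_name(pid)` reproduces the extra check; the decision whether the server can be started is still best left to pg_ctl and the postmaster.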

Re: postmaster.pid file auto-clean up?

From
John R Pierce
Date:
On 08/25/12 9:56 PM, Michael Clark wrote:
> PID 8574 is actually iTunes, not PG, and PG was cleanly brought down
> on its last run; there are no child processes running.

when postgres is cleanly brought down, the postmaster.pid file is
supposed to be removed. that file contains the PID that pg_ctl uses.

could you be running a pg_ctl from a different version, in the wrong
directory?



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast



Re: postmaster.pid file auto-clean up?

From
Tom Lane
Date:
Michael Clark <codingninja@gmail.com> writes:
> It does in fact appear that we are getting false-positives.
> When trying to start PG using pg_ctl, I am getting this response:
> pg_ctl: another server might be running; trying to start server anyway
> 2012-08-26 04:46:02.211 GMT [] - FATAL:  lock file "postmaster.pid" already
> exists
> 2012-08-26 04:46:02.211 GMT [] - HINT:  Is another postmaster (PID 8574)
> running in data directory "/Users/mclark/Library/Application
> Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?

> PID 8574 is actually iTunes, not PG,

iTunes?  What is that doing running under PG's userid?

If you mean that you're launching PG under some random user's UID, you
might want to think about giving it a dedicated UID instead, so as to
improve the selectivity of the same-UID check.  This would also give
a good deal more protection to the database files.

> and PG was cleanly brought down on
> its last run; there are no child processes running.

As John pointed out, if PG was in fact stopped cleanly, the pid file
would not be there.

The symptoms you've described so far seem consistent with the idea that
PG was not stopped "cleanly", but rather by kill -9 on the postmaster
(with the child processes exiting either on their own, or as soon as
they noticed they were orphans).  This is not recommended practice.

> Seb figured out how to contrive this situation.
> Run PG, copy the pid file, stop pg, copy the copied pid file back to the
> data dir and edit it, replacing the old PID with that of another running
> process.

You're kidding, right?  If you intentionally set out to break the
postmaster interlock, you will doubtless be able to do that, and would
still be able to break any other algorithm we might devise.  Let's
confine this discussion to scenarios that could arise without
intentional interference.

            regards, tom lane


Re: postmaster.pid file auto-clean up?

From
Michael Clark
Date:


On Sun, Aug 26, 2012 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Clark <codingninja@gmail.com> writes:
>> PID 8574 is actually iTunes, not PG,
>
> iTunes?  What is that doing running under PG's userid?



We back our client application with PG, each OSX user gets their own instance of PG.
It runs as that OSX user.

 
>> Seb figured out how to contrive this situation.
>> Run PG, copy the pid file, stop pg, copy the copied pid file back to the
>> data dir and edit it, replacing the old PID with that of another running
>> process.
>
> You're kidding, right?  If you intentionally set out to break the
> postmaster interlock, you will doubtless be able to do that, and would
> still be able to break any other algorithm we might devise.  Let's
> confine this discussion to scenarios that could arise without
> intentional interference.


We were presented with a problem we didn't understand.
We set out to try and figure out how we could replicate the problem, for debugging purposes.
We managed to do so to see how our application behaves, and to see how PG behaves.

In the wild this scenario has arisen without intentional interference.  In debugging, yes, we contrived the situation to replicate the behaviour.  Mind you, we may be using PG in an environment that isn't advisable.


We just started this discussion to learn and understand, and to see if this is a situation that would be expected to be handled.

Thanks,
Michael.
 

Re: postmaster.pid file auto-clean up?

From
Alban Hertroys
Date:
On 26 Aug 2012, at 17:21, Michael Clark wrote:

> On Sun, Aug 26, 2012 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Michael Clark <codingninja@gmail.com> writes:
>> > PID 8574 is actually iTunes, not PG,
>>
>> iTunes?  What is that doing running under PG's userid?
>>
>
>
> We back our client application with PG,

> each OSX user gets their own instance of PG.

Are you certain that's necessary? It's generally a better idea to run a single PG server with a database for each user. Having multiple copies running has its use-cases, but the necessity is quite uncommon.

You could compare what you're doing to giving every user their own copy of OS X. There are situations in which you'd want that, but generally it's considered a bad idea.

You'd never have even thought to do that if you were, for example, using Oracle for the database. That's a hugely expensive database license for every user on the system, while you really only need one.

> It runs as that OSX user.

>> > Seb figured out how to contrive this situation.
>> > Run PG, copy the pid file, stop pg, copy the copied pid file back to the
>> > data dir and edit it, replacing the old PID with that of another running
>> > process.
>>
>> You're kidding, right?  If you intentionally set out to break the
>> postmaster interlock, you will doubtless be able to do that, and would
>> still be able to break any other algorithm we might devise.  Let's
>> confine this discussion to scenarios that could arise without
>> intentional interference.
>
> We were presented with a problem we didn't understand.
> We set out to try and figure out how we could replicate the problem, for debugging purposes.
> We managed to do so to see how our application behaves, and to see how PG behaves.
>
> In the wild this scenario has arisen without intentional interference.  In debugging, yes, we contrived the situation to replicate the behaviour.  Mind you, we may be using PG in an environment that isn't advisable.

What you replicated is not what happens when your problem occurs. Processes don't do things like that with each other's PID files.

What's probably happening in your case is that there's a conflict with another copy of Postgres running; perhaps it's running under the same user-id twice (or more) or on the same port?

My suggestion would be to get rid of those extra copies of PG and just run one instance.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: postmaster.pid file auto-clean up?

From
Michael Clark
Date:


On Sun, Aug 26, 2012 at 1:25 PM, Alban Hertroys <haramrae@gmail.com> wrote:
>> We back our client application with PG,
>> each OSX user gets their own instance of PG.
>
> Are you certain that's necessary?


It was a decision made, weighing various trade-offs, 4 years ago now.


>> In the wild this scenario has arisen without intentional interference.  In debugging, yes, we contrived the situation to replicate the behaviour.  Mind you, we may be using PG in an environment that isn't advisable.
>
> What you replicated is not what happens when your problem occurs. Processes don't do things like that with each other's PID files.


That is true.
But the system does recycle pids, especially after a reboot.

I appreciate all the feedback and input from everyone who responded.
Thank you!!  You have answered our questions, and it gives us food for thought.


Michael.