Re: FATAL: lock file "postmaster.pid" already exists - Mailing list pgsql-general

From deepak
Subject Re: FATAL: lock file "postmaster.pid" already exists
Msg-id CAALYute4UraTJ62Lz_NuYq8njVP8Ut8_zpj_MVnAyTp7+eV6Bg@mail.gmail.com
In response to Re: FATAL: lock file "postmaster.pid" already exists  (deepak <deepak.pn@gmail.com>)
Responses Re: FATAL: lock file "postmaster.pid" already exists
List pgsql-general
Hi!

We could reproduce the start-up problem on Windows 2003. After a reboot, the postmaster cleans up old temporary files as part of its start-up sequence, and this step took several minutes (a little over 4 minutes), delaying the writing of line 6 onwards of the PID file. This delay caused pg_ctl to time out, leaving behind an orphaned postgres.exe process (which eventually spawns many other postgres.exe processes). But since pg_ctl itself is no longer running after the timeout, Windows thinks the service isn't running. A subsequent attempt to start the service with pg_ctl then complains that the existing lock file is still in use by one of the postgres.exe processes spawned earlier.
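
To watch for the delay described above, one could poll the PID file and check whether it has grown to at least six lines yet. The following is only a minimal sketch: the function name and the default threshold of 6 are assumptions taken from the description in this message, not from the PostgreSQL sources.

```python
from pathlib import Path


def pid_file_has_line(pid_file: Path, line_number: int = 6) -> bool:
    """Return True if pid_file exists and holds at least line_number lines.

    Hypothetical helper: the threshold of 6 mirrors the "line 6 onwards"
    observation in this message and is an assumption, not a documented API.
    """
    try:
        text = pid_file.read_text(errors="replace")
    except OSError:
        # Missing, locked, or unreadable file: treat as "not written yet".
        return False
    return len(text.splitlines()) >= line_number
```

A monitoring script could call this in a loop with a sleep between attempts to measure how long the start-up sequence takes before the later lines appear.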

We have observed conclusively that the file system cache comes into play. We tested a scenario in which, after a reboot, we first walked the file system under the data directory with the Cygwin "find" command. pg_ctl then did not time out and the server started up fine, suggesting that the cleanup is much faster when the file system is cached.
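
For anyone who wants to try the same cache-warming experiment without Cygwin, here is a rough Python equivalent of walking the data directory with "find". It is a sketch only: the function name is made up for illustration, and whether it helps as much as "find" did in our test is an assumption.

```python
import os


def warm_directory_cache(root: str) -> int:
    """Walk root and stat every entry, returning the number of entries visited.

    Hypothetical helper mimicking a "find" traversal; it reads only directory
    entries and file metadata, never file contents.
    """
    visited = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.stat(os.path.join(dirpath, name))
            except OSError:
                # Entry vanished or is inaccessible; skip it.
                continue
            visited += 1
    return visited
```

It would be run against the data directory before the service is started, for example from a start-up script.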

Any ideas on fixing this start-up delay in postmaster? 

Could the cleanup task be moved elsewhere, specifically to some point after the PID file has been completely written, so that pg_ctl doesn't time out?

Any other suggestions for working around this problem?


Thanks,

Deepak



On Tue, May 8, 2012 at 12:13 PM, deepak <deepak.pn@gmail.com> wrote:


On Tue, May 8, 2012 at 3:09 AM, Alban Hertroys <haramrae@gmail.com> wrote:
On 8 May 2012, at 24:34, deepak wrote:

> Hi,
>
> On Windows 2008, sometimes the server fails to start due to an existing "postmaster.pid" file.
>
> I tried rebooting a few times and even force shutting down the server, and it started up fine.
> It seems to be a race-condition of sorts in the code that detects whether the process with PID
> in the file is running or not.

No, it means that postgres wasn't shut down properly when Windows shut down. Removing the pid-file is one of the last things the shut-down procedure does. The file is used to prevent 2 instances of the same server running on the same data-directory.

If it's a race-condition, it's probably one in Microsoft's shutdown code. I've seen similar problems with Outlook mailboxes on a network directory; Windows unmounts the remote file-systems before Outlook finished updating its files under that mount point, so Outlook throws an error message and Windows doesn't shut down because of that.

I don't suppose that pid-file is on a remote file-system?

No, it's local.
 
> Does anyone have this same problem?  Any way to fix it besides removing the PID file
> manually each time the server complains about this?


You could probably script removal of the pid file if its creation date is before the time the system started booting up.
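
A minimal sketch of such a script, under some loud assumptions: the function names are hypothetical, the boot time is passed in by the caller (obtaining it is platform-specific and not shown), and the file's modification time is used as a portable stand-in for its creation date.

```python
import os


def is_stale(pid_file_time: float, boot_time: float) -> bool:
    """A pid file last written before the system booted cannot belong to a live server."""
    return pid_file_time < boot_time


def remove_if_stale(pid_file: str, boot_time: float) -> bool:
    """Delete pid_file if it predates boot_time; return True if it was removed.

    Hypothetical helper. Both times are seconds since the epoch. The
    modification time is an assumed substitute for the creation date.
    """
    try:
        file_time = os.path.getmtime(pid_file)
    except OSError:
        # No pid file (or it is inaccessible): nothing to remove.
        return False
    if not is_stale(file_time, boot_time):
        return False
    os.remove(pid_file)
    return True
```

Such a script should only run before the server is started; removing the pid file of a running server would defeat the protection against two instances sharing one data directory.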


Thanks. It looks like the code already overwrites an old pid file if no other process is using it (if I understand the code correctly, it just echoes a byte onto a pipe to detect this).

Still, I can't see under what conditions this occurs. I have seen it happen a couple of times, but I don't know how to reproduce the problem predictably.


--
Deepak

