On Sat, Jan 19, 2013 at 12:47:03AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
> >> In the meantime, I was wondering a bit why pg_upgrade looks at the
> >> postmaster.pid file at all.
>
> > The reason we check for postmaster.pid is so we can give the user a clue
> > about which postmaster is running.
>
> [ scratches head... ] I failed to detect any such clue in the error
> message it prints. Had you printed the PID from the file, or even
> better looked to see if that process was actually still alive, this
> argument would be reasonable. But pg_upgrade does neither of those,
> whereas if it had started a postmaster the postmaster would have done
> both of those things.
>
> > Also, we don't want to start on a non-clean shutdown, so the missing pid
> > file tells us it was clean.
>
> I agree that super paranoia is not unreasonable in pg_upgrade. But it
> would be useful to print something similar to what the backend prints,
> about checking whether PID N is still there and manually removing the
> lock file if not. Or (ahem) you could let the existing backend-side
> logic do that for you, rather than reimplementing that logic badly.

The current output is:

	There seems to be a postmaster servicing the old cluster.
	Please shutdown that postmaster and try again.

You are right that it is inaccurate.  I should reword that to say the
server is running or was not properly shut down:

	There seems to be a postmaster servicing the old cluster, or
	it was not properly shut down.  Please cleanly shutdown that
	postmaster and try again.

Why is a clean shutdown important?  If the server crashed, there could
be committed transactions that exist only in the WAL files.  Those are
not transferred to the new server, so they would be lost.

I am hesitant to even start such an old server because pg_upgrade never
modifies the old server.  Even starting it in that case would be
modifying it.

The other problem is that if the server start fails, how do we know
whether the failure was due to a running postmaster?  I could check the
postmaster.pid file afterward, but the start might have failed before
reaching the section where we remove that file.

A still-running server is a common cause of failure, so I wanted an
error that was very clear, rather than a generic
can't-start-the-server message.

I am open to ideas.
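
For illustration only, here is a rough sketch of the check suggested in
the quoted text above: print the PID from postmaster.pid and look to see
whether that process is still alive.  It is written in Python rather
than pg_upgrade's C just to keep it short, and it is not what pg_upgrade
does today.  The function names and message wording are hypothetical.
It assumes a POSIX system and relies on the first line of postmaster.pid
being the postmaster's PID.

```python
import os
from pathlib import Path


def read_postmaster_pid(datadir):
    """Return the PID stored in postmaster.pid, or None if the file is missing."""
    try:
        text = (Path(datadir) / "postmaster.pid").read_text()
    except FileNotFoundError:
        return None
    # The first line of postmaster.pid holds the postmaster's PID.
    return int(text.split()[0])  # raises if the file is empty or garbled


def pid_is_alive(pid):
    """Probe a process with signal 0 (POSIX); no signal is actually delivered."""
    if pid <= 0:
        return False          # avoid signalling a process group by accident
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but belongs to another user
    return True


def describe_old_cluster(datadir):
    """Hypothetical wording for the three cases discussed in this thread."""
    pid = read_postmaster_pid(datadir)
    if pid is None:
        return "no postmaster.pid found; presumably a clean shutdown"
    if pid_is_alive(pid):
        return (f"a postmaster (PID {pid}) appears to be running; "
                "shut it down cleanly and try again")
    return (f"stale postmaster.pid (PID {pid} is gone); "
            "the old cluster was not shut down cleanly")
```

One caveat: a live PID does not prove the process is a postmaster, since
the PID could have been reused by an unrelated process after a crash, so
a message built this way is still only a clue for the user.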
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +