Re: Yet another failure mode in pg_upgrade - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Yet another failure mode in pg_upgrade
Msg-id 20120901154558.GA2969@momjian.us
In response to Re: Yet another failure mode in pg_upgrade  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Yet another failure mode in pg_upgrade  (Bruce Momjian <bruce@momjian.us>)
Re: Yet another failure mode in pg_upgrade  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Yet another failure mode in pg_upgrade  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Aug 13, 2012 at 12:46:43PM +0200, Magnus Hagander wrote:
> On Mon, Aug 13, 2012 at 4:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I've been experimenting with moving the Unix socket directory to
> > /var/run/postgresql for the Fedora distribution (don't ask :-().
> > It's mostly working, but I found out yet another way that pg_upgrade
> > can crash and burn: it doesn't consider the possibility that the
> > old or new postmaster is compiled with a different default
> > unix_socket_directory than what is compiled into the libpq it's using
> > or that pg_dump is using.
> >
> > This is another hazard that we could forget about if we had some way for
> > pg_upgrade to run standalone backends instead of starting a postmaster.
>
> Yeah, that would be nice.
>
>
> > But in the meantime, I suggest it'd be a good idea for pg_upgrade to
> > explicitly set unix_socket_directory (or unix_socket_directories in
> > HEAD) when starting the postmasters, and also explicitly set PGHOST
> > to ensure that the client-side code plays along.
>
> That sounds like a good idea for other reasons as well - manual
> connections attempting to get in during an upgrade will just fail with
> a "no connection" error, which makes sense...
>
> So, +1.

OK, I looked this over, and I have a patch, attached.

Because we are already playing with socket directories, this patch creates
the socket files in the current directory for upgrades and non-live
checks, but not live checks.  This eliminates the "someone accidentally
connects" problem, at least on Unix, plus we are using port 50432
already.  This also turns off TCP connections on systems that support
Unix-domain sockets.
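
As a rough illustration of the paragraph above (not the code in the attached patch), the following Python sketch builds the server options and client environment that would put the socket in the current directory and disable TCP. The function names are hypothetical. The settings listen_addresses, unix_socket_directory (unix_socket_directories in HEAD), PGHOST and PGPORT are real; how the patch actually passes them is an assumption here.

```python
import os


def postmaster_options(port, socket_dir, plural_guc=False):
    """Hypothetical helper: option string for a private, socket-only postmaster.

    plural_guc selects unix_socket_directories (HEAD) instead of
    unix_socket_directory (older releases).
    """
    guc = "unix_socket_directories" if plural_guc else "unix_socket_directory"
    # An empty listen_addresses means the server accepts no TCP connections.
    return "-p {} -c listen_addresses='' -c {}='{}'".format(port, guc, socket_dir)


def client_environment(port, socket_dir, base_env=None):
    """Hypothetical helper: environment that points libpq clients at socket_dir.

    libpq treats a PGHOST value beginning with a slash as a socket directory.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PGHOST"] = os.path.abspath(socket_dir)
    env["PGPORT"] = str(port)
    return env


if __name__ == "__main__":
    cwd = os.getcwd()
    print(postmaster_options(50432, cwd))
    print(client_environment(50432, cwd, base_env={})["PGHOST"])
```

Setting PGHOST to the same directory is what keeps pg_dump and the other client tools in step with the server, whatever default was compiled into libpq.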

For "live check" operation, you are checking a running server, so
assuming the socket is in the current directory is not going to work.
Instead, the code reads the fifth line of the running server's
postmaster.pid file, which holds the socket directory in PG >= 9.1.  For
pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
If the defaults are different between the two servers, the new binaries,
e.g. pg_dump, will not work.  The fix is for the user to set pg_upgrade
-O to match the old socket directory, and set PGHOST before running
pg_upgrade.  I could not find a good way to generate a proper error
message because we are blind to the socket directory in pre-9.1.
Frankly, this is a problem if the old pre-9.1 server is running in a
user-configured socket directory too, so a documentation addition seems
right here.
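
The live-check lookup described above can be sketched as follows. This is a minimal Python illustration, not the patch's C code; the function name and the None return for a missing or empty line are assumptions made for the example. It relies only on the fact stated above, that the fifth line of postmaster.pid holds the socket directory in PG >= 9.1.

```python
from pathlib import Path

# 1-based line of postmaster.pid that holds the socket directory (PG >= 9.1).
SOCKET_DIR_LINE = 5


def read_socket_dir(data_dir):
    """Hypothetical helper: return the running server's socket directory.

    Returns None if postmaster.pid has fewer than five lines (as in a
    pre-9.1 server) or if the fifth line is empty.
    """
    pid_file = Path(data_dir) / "postmaster.pid"
    lines = pid_file.read_text().splitlines()
    if len(lines) < SOCKET_DIR_LINE:
        return None
    sockdir = lines[SOCKET_DIR_LINE - 1].strip()
    return sockdir or None
```

A None result corresponds to the pre-9.1 case, where the caller has to fall back to the compiled-in default or to a directory supplied by the user.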

So, in summary, this patch moves the socket directory to the current
directory for all but live check operation, and handles differing socket
directories for old clusters >= 9.1.  I have added a documentation
mention of how to make this work for pre-9.1 old servers.

Thus completes another "surgery on a moving train" that is pg_upgrade
development.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

