Re: Yet another failure mode in pg_upgrade - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Yet another failure mode in pg_upgrade
Msg-id CA+TgmoY1XFhJ9WoENiY=-NNvSy4PjDPYmqt4-2NCy8gWUcmyZA@mail.gmail.com
In response to Re: Yet another failure mode in pg_upgrade  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Yet another failure mode in pg_upgrade
List pgsql-hackers
On Sat, Sep 1, 2012 at 11:45 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug 13, 2012 at 12:46:43PM +0200, Magnus Hagander wrote:
>> On Mon, Aug 13, 2012 at 4:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > I've been experimenting with moving the Unix socket directory to
>> > /var/run/postgresql for the Fedora distribution (don't ask :-().
>> > It's mostly working, but I found out yet another way that pg_upgrade
>> > can crash and burn: it doesn't consider the possibility that the
>> > old or new postmaster is compiled with a different default
>> > unix_socket_directory than what is compiled into the libpq it's using
>> > or that pg_dump is using.
>> >
>> > This is another hazard that we could forget about if we had some way for
>> > pg_upgrade to run standalone backends instead of starting a postmaster.
>>
>> Yeah, that would be nice.
>>
>>
>> > But in the meantime, I suggest it'd be a good idea for pg_upgrade to
>> > explicitly set unix_socket_directory (or unix_socket_directories in
>> > HEAD) when starting the postmasters, and also explicitly set PGHOST
>> > to ensure that the client-side code plays along.
>>
>> That sounds like a good idea for other reasons as well - manual
>> connections attempting to get in during an upgrade will just fail with
>> a "no connection" error, which makes sense...
>>
>> So, +1.
>
> OK, I looked this over, and I have a patch, attached.
>
> Because we are already playing with socket directories, this patch creates
> the socket files in the current directory for upgrades and non-live
> checks, but not live checks.  This eliminates the "someone accidentally
> connects" problem, at least on Unix, plus we are using port 50432
> already.  This also turns off TCP connections for unix domain socket
> systems.
>
> For "live check" operation, you are checking a running server, so
> assuming the socket is in the current directory is not going to work.
> What the code does is to read the 5th line from the running server's
> postmaster.pid file, which has the socket directory in PG >= 9.1.  For
> pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
> If the defaults are different between the two servers, the new binaries,
> e.g. pg_dump, will not work.  The fix is for the user to set pg_upgrade
> -O to match the old socket directory, and set PGHOST before running
> pg_upgrade.  I could not find a good way to generate a proper error
> message because we are blind to the socket directory in pre-9.1.
> Frankly, this is a problem if the old pre-9.1 server is running in a
> user-configured socket directory too, so a documentation addition seems
> right here.
>
> So, in summary, this patch moves the socket directory to the current
> directory for all but live check operation, and handles different socket
> directories for old cluster >= 9.1.  I have added a documentation
> mention of how to make this work for pre-9.1 old servers.

I don't think this is reducing the number of failure modes; it's just
changing it from one set of obscure cases to a slightly different set
of obscure cases.
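
For reference, the live-check lookup described in the quoted message
(read the 5th line of the running server's postmaster.pid, which holds
the socket directory in PG >= 9.1, then point the client side at it via
PGHOST) could be sketched roughly as below.  This is an illustrative
Python sketch, not pg_upgrade's actual C code; the function names, the
fallback parameter, and the base_env parameter are all hypothetical.

```python
import os
from typing import Dict, Optional

# Per the quoted message: in PG >= 9.1 the 5th line of postmaster.pid
# holds the socket directory.
SOCKET_DIR_LINE = 5


def socket_dir_from_pidfile(data_dir: str,
                            fallback: Optional[str] = None) -> Optional[str]:
    """Return the socket directory recorded in postmaster.pid.

    `fallback` (hypothetical parameter) stands in for the compiled-in
    default that has to be used for pre-9.1 servers, where the line is
    absent.
    """
    pid_path = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(pid_path, "r", encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        # No pid file: the server is not running, so nothing to read.
        return fallback
    if len(lines) < SOCKET_DIR_LINE:
        # Short file: treat it as a pre-9.1 layout.
        return fallback
    value = lines[SOCKET_DIR_LINE - 1].strip()
    return value or fallback


def client_env(data_dir: str,
               fallback: Optional[str] = None,
               base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build an environment for client tools such as pg_dump.

    libpq treats a PGHOST value that starts with "/" as a socket
    directory, so setting it makes the client use the same directory
    as the server.
    """
    env = dict(os.environ if base_env is None else base_env)
    sock = socket_dir_from_pidfile(data_dir, fallback)
    if sock:
        env["PGHOST"] = sock
    return env
```

When the old server is pre-9.1 and the caller supplies no fallback, the
sketch leaves PGHOST untouched, which is exactly the blind spot the
quoted message mentions.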

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


