[HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified - Mailing list pgsql-hackers

From Robert Haas
Subject [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified
Date
Msg-id CA+Tgmob1CHff46feC5LeOAVHDON=QzBoq-apmCm7KwG6urDGMg@mail.gmail.com
In response to [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified  (Noah Misch <noah@leadboat.com>)
Responses Re: [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, May 14, 2017 at 11:45 PM, Noah Misch <noah@leadboat.com> wrote:
>> I'll add this item in the PostgreSQL 10 Open Items.
>
> [Action required within three days.  This is a generic notification.]

I think there is a good argument that the existing behavior is as per
the documentation, but I think we may want to change it anyway.  What
the documentation is saying - or at least what I believe I intended
for it to say - is that connect_timeout is restarted for each new
host, so you could end up waiting longer than connect_timeout - but
not forever - if you specify multiple hosts.  And I believe that
statement to be correct. Takayuki Tsunakawa is saying something
different. He's saying that when connect_timeout expires, we should
try the next host instead of giving up. That may or may not be a good
idea, but it doesn't contradict the passage from the documentation
which he quoted.  That passage from the documentation doesn't say
anything at all about what happens when connect_timeout expires.  It
only talks about how much time might pass before that happens.
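
To make the "longer than connect_timeout - but not forever" point
concrete, here is a minimal sketch of the timing arithmetic.  It is a
toy model and not libpq code; the function name and the fail_after
input are hypothetical, and it assumes that an attempt which fails
outright before the timer expires moves on to the next host, while an
expired timer ends the whole attempt.

```python
def elapsed_with_per_host_timer(fail_after, connect_timeout):
    """Seconds spent before the overall connection attempt ends.

    fail_after: for each host, in order, the seconds until the attempt
    fails outright (e.g. connection refused), or None if the host never
    answers.  Hypothetical input, used only for illustration.
    """
    elapsed = 0.0
    for t in fail_after:
        if t is None or t >= connect_timeout:
            # The timer, restarted for this host, expired: give up here.
            return elapsed + connect_timeout
        # Failed outright before the timer expired: try the next host.
        elapsed += t
    return elapsed


# Three hosts, connect_timeout = 10: two fail after 9 s each, the third hangs.
print(elapsed_with_per_host_timer([9, 9, None], 10))  # 28.0
```

In this model the total can exceed connect_timeout, but it never
exceeds the number of hosts multiplied by connect_timeout.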

Takayuki Tsunakawa raised a very similar issue in another thread
related to another open item, namely
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05
in which he argued that libpq ought to try the next host after a
connection failure regardless of the reason for the connection
failure.  Tom, Michael Paquier, and I all disagreed; none of us
believe that this feature was intended to retry the connection to a
different host after an arbitrary error reported by the remote server.
This thread is essentially the same issue, except here the question
isn't what should happen after we connect to a server and it returns
an error, but rather what happens when we time out waiting to connect
to a server.  When that happens, should we give up, or try the next
server?

Despite the chorus of support for the opposite conclusion on the other
thread, I'm inclined to think that it would be best to change the
behavior here as per the proposed patch.  The point of being able to
specify multiple hosts is to be able to have multiple database servers
(or perhaps, multiple ways to access the same database server) and use
whichever one of those servers is currently up.  I think that when the
server fails with a complaint like "I've never heard of the database
to which you want to connect" that's not a case of the server being
down, but some other kind of trouble that the administrator really
ought to fix; thus it's best to stop and report the error.  But if
connect_timeout expires, that sounds a whole lot like the server being
down.  It sounds morally equivalent to socket() or connect() failing
outright, which *would* trigger advancing to the next host.
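
The distinction can be sketched as a small decision loop.  This is an
illustration only, not the libpq implementation: the Outcome values,
the try_hosts function, and the advance_on_timeout flag are all
hypothetical names, and each host's outcome is supplied as input
rather than produced by a real connection attempt.

```python
from enum import Enum


class Outcome(Enum):
    CONNECTED = "connected"
    CONNECT_FAILED = "socket() or connect() failed outright"
    TIMED_OUT = "connect_timeout expired"
    SERVER_ERROR = "server reported an error"


def try_hosts(attempts, advance_on_timeout):
    """Walk the host list and return the (host, outcome) where we stop.

    attempts: list of (host, Outcome) pairs in connection-string order.
    advance_on_timeout: False models giving up when the timer expires;
    True models the change under discussion (try the next host instead).
    """
    last = (None, None)
    for host, outcome in attempts:
        last = (host, outcome)
        if outcome is Outcome.CONNECTED:
            return last  # success: use this server
        if outcome is Outcome.SERVER_ERROR:
            return last  # server is up but unhappy: stop and report
        if outcome is Outcome.TIMED_OUT and not advance_on_timeout:
            return last  # timer expired: give up
        # Otherwise the host looks down: advance to the next one.
    return last


attempts = [("host1", Outcome.TIMED_OUT), ("host2", Outcome.CONNECTED)]
print(try_hosts(attempts, advance_on_timeout=False))
print(try_hosts(attempts, advance_on_timeout=True))
```

With the flag off, the loop stops at host1 with a timeout; with it on,
the loop reaches host2.  In both cases a server-reported error still
stops the loop.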

So I'm inclined to accept the patch, but as a definitional change
rather than a bug fix.  However, I'd like to hear some other opinions.
I'll wait until Friday for such opinions to arrive, and then update on
next steps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


