Re: BUG #16508: using multi-host connection string when the first host is starting fails - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16508: using multi-host connection string when the first host is starting fails
Msg-id 530083.1597356494@sss.pgh.pa.us
In response to Re: BUG #16508: using multi-host connection string when the first host is starting fails  (Noah Misch <noah@leadboat.com>)
Responses Re: BUG #16508: using multi-host connection string when the first host is starting fails
List pgsql-bugs
Noah Misch <noah@leadboat.com> writes:
> On Wed, Jun 24, 2020 at 08:17:44AM +0000, PG Bug reporting form wrote:
>> I'm connecting to pg10 using psql (tried with clients psql 10.11 & psql
>> 12.2) using a connection string such as:
>> psql 'dbname=xxxxx1,xxxxx2,xxxxx3,xxxxx4 target_session_attrs=read-write'
>> 
>> the connection to the first database (xxxxx1) fails with the error:
>> psql.bin: FATAL:  the database system is starting up
>> 
>> which is correct according to postgres state on that machine,
>> but then I would expect that psql tries the next server (xxxxx2), which is
>> the one accepting the connection params (target_session_attrs=read-write),
>> instead of reporting the error.

> I agree.

I think the OP needs to be less opaque about what he actually did,
because the given example could not possibly have worked in any variant.
There is no provision in libpq for interpreting dbname as a
comma-separated list; therefore, what you actually get with the above
is a single attempt to connect to a database named
"xxxxx1,xxxxx2,xxxxx3,xxxxx4" (or "xxxxx2,xxxxx1,xxxxx3,xxxxx4" in
the allegedly-working case).
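
To make the distinction concrete, here is a minimal sketch in plain C of how the two parameters differ in the number of connection candidates they imply. Both helper functions are hypothetical illustrations, not libpq functions, and the sketch does not reproduce libpq's actual parsing code. It only models the behavior described above: a comma in dbname is part of the name, whereas a comma in host separates candidates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libpq): count how many connection
 * candidates a comma-separated "host" value implies.  An empty string
 * still counts as one candidate (the default host).
 */
size_t
count_host_candidates(const char *host_list)
{
    size_t      n = 1;

    assert(host_list != NULL);
    for (const char *p = host_list; *p != '\0'; p++)
    {
        if (*p == ',')
            n++;
    }
    return n;
}

/*
 * Hypothetical helper (not part of libpq): dbname is taken verbatim,
 * commas included, so it always names exactly one database.
 */
size_t
count_dbname_candidates(const char *dbname)
{
    assert(dbname != NULL);
    return 1;
}
```

Under this model, passing "xxxxx1,xxxxx2,xxxxx3,xxxxx4" as host yields four candidates, while passing the same string as dbname yields a single database name containing commas.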

I assume that the actual test case involved a comma-separated *host*
(or hostaddr) list, which is what drives multiple connection attempts.
It is true that if we manage to make a connection to a host, but it
then rejects us for some reason, we just give up rather than trying
the next host.  The problem with trying to improve that is that it's
very unclear which cases it's actually appropriate to do that for.
As an example, if you fat-finger the password to host 1, it's unlikely
that silently switching our attention to host 2 would be advisable.
At best, what you'd get is several confusing duplicate messages.
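
One way to picture the difficulty is as a policy function that decides, per server error, whether moving on to the next host makes sense. The sketch below is purely hypothetical: the function name and the choice of codes are assumptions made for illustration, and this is not what the attached patch does (the patch retries after any server error). The two SQLSTATE values are the standard PostgreSQL codes for cannot_connect_now, which is what a server that is still starting up reports, and invalid_password.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* SQLSTATE codes as listed in the PostgreSQL error-code appendix. */
#define SQLSTATE_CANNOT_CONNECT_NOW "57P03" /* e.g. server is starting up */
#define SQLSTATE_INVALID_PASSWORD   "28P01" /* e.g. mistyped password */

/*
 * Hypothetical policy (an assumption for illustration, not libpq code):
 * move on to the next host only when the server says it cannot accept
 * connections right now.  Anything else, such as a bad password, is
 * reported immediately rather than retried elsewhere.
 */
bool
should_try_next_host(const char *sqlstate)
{
    assert(sqlstate != NULL);
    return strcmp(sqlstate, SQLSTATE_CANNOT_CONNECT_NOW) == 0;
}
```

With such a policy, should_try_next_host("57P03") is true and should_try_next_host("28P01") is false. Where exactly to draw that line for the many other rejection reasons is the unclear part.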

I experimented with letting PQconnectPoll retry after getting a server
error message, as per attached, but I thought the results were more
confusing than helpful.

            regards, tom lane


diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 7bee9dd201..77d6cf1e7e 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3394,7 +3394,14 @@ keep_going:                        /* We will come back to here until there is
                     }
 #endif

-                    goto error_return;
+                    /*
+                     * This host rejected our connection attempt.  If we have
+                     * more hosts, try the next one.  (But don't consider
+                     * additional addresses for this host; we'd probably just
+                     * end up with confusing duplicate error messages.)
+                     */
+                    conn->try_next_host = true;
+                    goto keep_going;
                 }

                 /* It is an authentication request. */
@@ -3540,7 +3547,15 @@ keep_going:                        /* We will come back to here until there is
                         conn->errorMessage.data[conn->errorMessage.len - 1] != '\n')
                         appendPQExpBufferChar(&conn->errorMessage, '\n');
                     PQclear(res);
-                    goto error_return;
+
+                    /*
+                     * This host rejected our connection attempt.  If we have
+                     * more hosts, try the next one.  (But don't consider
+                     * additional addresses for this host; we'd probably just
+                     * end up with confusing duplicate error messages.)
+                     */
+                    conn->try_next_host = true;
+                    goto keep_going;
                 }

                 /* Fire up post-connection housekeeping if needed */
