Was the primary running and accepting connections when you encountered this error? That is, if you specified host="host1 host2", host1 was the non-hot standby and host2 was a running primary? Or only the non-hot standby was running?
If a primary was running, I'd say it's a bug... Perhaps the following part in libpq gives up connection attempts when the above FATAL error is returned from the server. Maybe libpq should differentiate errors using SQLSTATE and continue connection attempts on other hosts.
Yes, the primary was running, but the non-hot standby is listed before the primary in the connection string.
Hao Wu and I wrote a patch to fix this problem. On the client side, libpq should try the other hosts in the connection string when it is rejected by a non-hot standby, or when the first host encounters network problems during the libpq handshake.
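To illustrate the intended behavior, here is a minimal, self-contained sketch. It is not the patch itself and does not use libpq. The AttemptResult type and pick_host() function are hypothetical names made up for this example, and treating some failures (such as a wrong password) as final is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one connection attempt; not a libpq type. */
typedef enum
{
    ATTEMPT_OK,             /* host accepted the connection */
    ATTEMPT_NETWORK_ERROR,  /* failure while connecting or during the handshake */
    ATTEMPT_NOT_ACCEPTING,  /* e.g. non-hot standby: "the database system is starting up" */
    ATTEMPT_FATAL           /* assumption: e.g. bad password, another host will not help */
} AttemptResult;

/*
 * Walk the hosts in connection-string order and return the index of the
 * first host that accepts the connection, or -1 if none can be used.
 * results[i] stands in for the outcome of actually connecting to host i.
 */
int
pick_host(const AttemptResult *results, size_t nhosts)
{
    for (size_t i = 0; i < nhosts; i++)
    {
        switch (results[i])
        {
            case ATTEMPT_OK:
                return (int) i;
            case ATTEMPT_NETWORK_ERROR:
            case ATTEMPT_NOT_ACCEPTING:
                /* Recoverable for failover purposes: try the next host. */
                continue;
            case ATTEMPT_FATAL:
                /* Give up without trying the remaining hosts. */
                return -1;
        }
    }
    return -1;
}
```

With host1 as the non-hot standby and host2 as the running primary, the outcomes would be {ATTEMPT_NOT_ACCEPTING, ATTEMPT_OK}, and pick_host() selects index 1 (host2) instead of failing on host1.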
Please send emails in text format. Your email was in HTML, and I changed this reply to text format.
Thanks. Is this email in text format now? I just use Outlook in Chrome. Let me know if it is still in HTML format.
Hubert & Hao Wu
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>
Sent: Tuesday, October 27, 2020 5:30 PM
To: Hubert Zhang <zhubert@vmware.com>
Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: RE: Multiple hosts in connection string failed to failover in non-hot standby mode
From: Hubert Zhang <zhubert@vmware.com>
> libpq supports specifying multiple hosts in the connection string and fails over automatically when the previous PostgreSQL instance cannot be accessed.
> But when I tried to use this feature with a non-hot standby, it did not fail over, and I got the following message.
>
> psql: error: could not connect to server: FATAL: the database system is starting up
Was the primary running and accepting connections when you encountered this error? That is, if you specified host="host1 host2", host1 was the non-hot standby and host2 was a running primary? Or only the non-hot standby was running?
If a primary was running, I'd say it's a bug... Perhaps the following part in libpq gives up connection attempts when the above FATAL error is returned from the server. Maybe libpq should differentiate errors using SQLSTATE and continue connection attempts on other hosts.
[fe-connect.c]
    /* Handle errors. */
    if (beresp == 'E')
    {
        if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
        ...
#endif
        goto error_return;
    }

    /* It is an authentication request. */
    conn->auth_req_received = true;

    /* Get the type of request. */
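
As a rough sketch of the SQLSTATE idea (not actual libpq code): the server reports "the database system is starting up" with SQLSTATE 57P03 (cannot_connect_now), so the client could check for that code before giving up. The function name error_allows_next_host() is hypothetical, and which SQLSTATEs should allow trying the next host is an assumption that would need discussion.

```c
#include <stdbool.h>
#include <string.h>

/*
 * SQLSTATE 57P03 is cannot_connect_now. The server uses it for
 * "the database system is starting up", among other messages.
 */
#define SQLSTATE_CANNOT_CONNECT_NOW "57P03"

/*
 * Hypothetical helper, not part of libpq: decide whether an error
 * received during connection startup should let the client move on
 * to the next host instead of failing the whole connection attempt.
 */
bool
error_allows_next_host(const char *sqlstate)
{
    /* Assumption: with no SQLSTATE available, keep the current behavior. */
    if (sqlstate == NULL)
        return false;

    return strcmp(sqlstate, SQLSTATE_CANNOT_CONNECT_NOW) == 0;
}
```

With such a check, the error branch above could continue with the next host when the function returns true, and still go to error_return for everything else, such as authentication failures.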
Regards
Takayuki Tsunakawa