Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
Date
Msg-id 0A3221C70F24FB45833433255569204D1F6F9B26@G01JPEXMBYT05
In response to Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur
List pgsql-hackers
Hello Robert, Tom,

Thank you for being kind enough to explain.  I think I understand your concern now.

From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Who is right is a judgement call, but I don't think it's self-evident that
> users want to ignore anything and everything that might have gone wrong
> with the connection to the first server, rather than only those things which
> resemble a down server.  It seems quite possible to me that if we had defined
> it as you are proposing, somebody would now be arguing for a behavior change
> in the other direction.

Judgment call... so, I understand that it's a matter of choosing between helping to detect configuration errors early
and service continuity.  Hmm, I'd like to know how other databases treat this, but I couldn't find useful information
after some Google searching.  I wonder whether I should ask the PgJDBC people if they know something, because they chose
service continuity.


From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> The bigger picture here is that we only want to fail past transient errors,
> not configuration errors.  I'm willing to err in favor of regarding doubtful
> cases as transient, but most server login rejections aren't for transient
> causes.

I took "doubtful cases" to mean ones such as specifying a non-existent host or an unused port number.  In those cases,
the configuration error can't be distinguished from a server failure.

What do you think of the following cases?  Don't you want to connect to other servers?

* The DBA shuts down the database.  The server takes a long time to perform the shutdown checkpoint.  During that
checkpoint, libpq tries to connect to the server and receives the error "the database system is shutting down."

* The former primary failed and is now trying to start as a standby, catching up by applying WAL.  During the recovery,
libpq tries to connect to the server and receives the error "the database system is performing recovery."

* The database server crashed due to a bug.  Unfortunately, the server takes an unexpectedly long time to shut down
because it takes many seconds to write the stats file (as you remember, Tom-san experienced 57 seconds to write the
stats file during regression tests.)  During the stats file write, libpq tries to connect to the server and receives
the error "the database system is shutting down."

These are equivalent to server failure.  I believe we should prioritize recovering from errors during operation over
detecting configuration errors.
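
To make the proposal concrete, here is a minimal sketch of the policy argued for above.  It is an illustration only, not
libpq code.  It assumes the client can see the SQLSTATE of a rejected connection, and it relies on PostgreSQL reporting
57P03 (cannot_connect_now) while starting up, shutting down, or in recovery.  The names ConnectError,
should_try_next_host, connect_first_available and demo_connect are hypothetical.

```python
# SQLSTATE 57P03 (cannot_connect_now): reported by PostgreSQL while it is
# starting up, shutting down, or in recovery.
CANNOT_CONNECT_NOW = "57P03"


class ConnectError(Exception):
    """Hypothetical error type for one failed connection attempt."""

    def __init__(self, host, sqlstate, message):
        super().__init__(f"{host}: {message}")
        self.host = host
        # None means no server answered at all (refused, timeout, bad name).
        self.sqlstate = sqlstate
        self.message = message


def should_try_next_host(err):
    """Treat 'no answer' and 'cannot connect now' as equivalent to a down server."""
    if err.sqlstate is None:
        return True
    return err.sqlstate == CANNOT_CONNECT_NOW


def connect_first_available(hosts, connect_one):
    """Try hosts in order; return (connection, list of skipped errors).

    Any other rejection (for example 28P01, invalid_password) is re-raised
    immediately so that a configuration error is not silently hidden.
    """
    skipped = []
    for host in hosts:
        try:
            return connect_one(host), skipped
        except ConnectError as err:
            if not should_try_next_host(err):
                raise
            skipped.append(err)
    raise ConnectError(",".join(hosts), None, "no host accepted the connection")


def demo_connect(host):
    """Stand-in for a real connection attempt: foo is shutting down, bar is up."""
    if host == "foo":
        raise ConnectError(host, CANNOT_CONNECT_NOW,
                           "the database system is shutting down")
    return f"connected to {host}"
```

With "host=foo,bar", connect_first_available(["foo", "bar"], demo_connect) skips foo and connects to bar, while a
password failure on foo would still be reported rather than hidden.  Returning the skipped errors is one way to leave
the user a trace of which hosts were passed over.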
 


> Of course, the user would have to try connections to both foo and bar to
> be sure that they're both configured correctly.  But he might try
> "host=foo,bar" and "host=bar,foo" and figure he was OK, not noticing that
> both connections had silently been made to bar.

In that case, I think he would specify "host=foo" and "host=bar" in turn, because he would be worried about which
server he's connected to if he specified multiple hosts.

Regards
Takayuki Tsunakawa



