Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch - Mailing list pgsql-hackers
| From | Alastair Turner |
|---|---|
| Subject | Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch |
| Date | |
| Msg-id | CAC0GmyyeKCSSg8Dr2wOON56ar=eZ89QgfzpJfj43pXxUhEz28A@mail.gmail.com |
| In response to | Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch (Jacob Champion <jacob.champion@enterprisedb.com>) |
| List | pgsql-hackers |
On Wed, 11 Mar 2026 at 20:57, Jacob Champion <jacob.champion@enterprisedb.com> wrote:
Hi Evgeny,
(Evgeny asked me to weigh in on the patch. Careful what you wish for...)
I would like to, as kindly as possible, say that I don't like *either*
of these approaches, on this thread or the other. General concerns up
front:
<snip>
- I'm no DNS expert, but I can't shake the feeling that you're
(mis)using round-robin A records to reimplement, say, SRV records [1]
(or SVCB, which dovetails with recently-standardized ECH).
Neither an A record with multiple IP addresses nor SRV (or SVCB, which builds on SRV) is a perfect fit here, but an A record with multiple addresses feels to me like the better fit. SRV and SVCB are intended to be used at domain level, which works well for services like LDAP that cover full domains. So _postgres._tcp.appone.prod.example.com implies a subdomain for appone.prod.example.com, and may actually require the creation of that subdomain hierarchy in some DNS tooling. An A record is not necessarily a hostname, but that's generally how they're used, so having read-only and read-write services behind one record doesn't feel quite right, as you say. Viewed a bit more broadly, as an Address (or Addresses) for a resource, we end up with much the same outcome as the SRV solution: a list of addresses. Administering A records with multiple IP addresses is also a simpler, flat process.
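To make the comparison concrete, here is a rough Python sketch. The record data, host names (pg1, pg2, pg3) and helper functions are all made up for illustration, the addresses come from the 192.0.2.0/24 documentation range, and nothing here does a real DNS lookup. It only shows that both layouts flatten to the same list of endpoints, with SRV needing one extra level of names.

```python
# Hypothetical DNS data for illustration only (192.0.2.0/24 is a documentation range).
A_RECORDS = {
    "appone.prod.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"],
    "pg1.appone.prod.example.com": ["192.0.2.10"],
    "pg2.appone.prod.example.com": ["192.0.2.11"],
    "pg3.appone.prod.example.com": ["192.0.2.12"],
}

# SRV entries as (priority, weight, port, target) tuples.
SRV_RECORDS = {
    "_postgres._tcp.appone.prod.example.com": [
        (10, 0, 5432, "pg1.appone.prod.example.com"),
        (10, 0, 5432, "pg2.appone.prod.example.com"),
        (20, 0, 5432, "pg3.appone.prod.example.com"),
    ],
}


def endpoints_from_a(name, port=5432):
    """One flat record: every address is a candidate endpoint."""
    return [(addr, port) for addr in A_RECORDS.get(name, [])]


def endpoints_from_srv(service_name):
    """SRV adds a level of indirection: service -> targets -> addresses."""
    endpoints = []
    # Plain tuple sort, so the lowest priority value comes first; the weighted
    # selection described in RFC 2782 is left out of this sketch.
    for _priority, _weight, port, target in sorted(SRV_RECORDS.get(service_name, [])):
        endpoints.extend((addr, port) for addr in A_RECORDS.get(target, []))
    return endpoints


flat = endpoints_from_a("appone.prod.example.com")
via_srv = endpoints_from_srv("_postgres._tcp.appone.prod.example.com")
print(flat == via_srv)
```

The SRV variant needs the per-host names under appone.prod.example.com to exist before it can produce anything, which is the extra hierarchy I mean above.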
<snip>
I think you've tangled a Postgres-level concern (find me a host with
these characteristics) with a socket-level concern (find me the
addresses for a host), and the main reason you were able to do that
was because PQconnectPoll() currently puts all those concerns into one
impossibly complex function. If someone later wanted to replace
getaddrinfo/connect with a Happy Eyeballs library, to cut down on
connection times, this proposal would prevent them from doing that.
(Both your patch, and the other thread's.) Personally I think we
should reserve the ability to use any API that says "connect me to
this hostname as fast as possible; I do not care how."
I'd say that the boundary has moved - from "find me an endpoint from this list of hosts with these characteristics" to "find me an endpoint from this list of IPs with these characteristics" - rather than that they've become tangled. "Connect me to this list of addresses as fast as possible" still sounds like a good place to be.
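As a rough illustration of where I see that boundary, a minimal Python sketch. This is not libpq code: `first_matching_endpoint` and `probe` are hypothetical names, and `probe` stands in for "connect to this address and report whether the session is read-write or read-only", raising OSError when the connection fails.

```python
def first_matching_endpoint(addresses, probe, wanted="read-write"):
    """Return the first address whose probed session state matches `wanted`.

    Addresses that fail to connect or report a different state are skipped.
    Returns None if no address qualifies.
    """
    for addr in addresses:
        try:
            state = probe(addr)
        except OSError:
            continue  # unreachable address: move on to the next one
        if state == wanted:
            return addr
    return None


def fake_probe(addr):
    """Stand-in for a real connection attempt; the states are made up."""
    states = {"192.0.2.10": "read-only", "192.0.2.12": "read-write"}
    if addr not in states:
        raise ConnectionRefusedError(addr)
    return states[addr]


addrs = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
chosen = first_matching_endpoint(addrs, fake_probe)
print(chosen)
```

The loop here is strictly sequential to keep the sketch short. Nothing in the interface stops an implementation from racing the attempts, Happy Eyeballs style, as long as what it takes in is a list of addresses and what it hands back is one qualifying endpoint.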
<snip>
I understand why it's appealing, I think, but the discussions so far
on both threads don't convince me that this is an overall reduction of
complexity. It exposes more implementation details, which makes it
harder to improve our network connection behavior in the future. It
potentially collides with attempts to encode network topology within
the Postgres protocol. I don't think we're likely to be happy with it
in a few years.
I can see a situation where, as the features to discover topology roll out, the client's internal view of the topology could be populated either by polling (which would work for any server version) or from what is encoded in the protocol (for server versions which can provide it).
But I do want you to be able to point libpq at a cluster and have it
Just Work. It's a good conversation to have, even if this doesn't make
it in.
Regards
Alastair