Re: Support load balancing in libpq - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Support load balancing in libpq
Date
Msg-id 631d04f0.7b0a0220.1140d.9a91@mx.google.com
Whole thread Raw
In response to Re: Support load balancing in libpq  (Jelte Fennema <Jelte.Fennema@microsoft.com>)
Responses Re: [EXTERNAL] Re: Support load balancing in libpq
List pgsql-hackers
Hi,

the patch no longer applies cleanly, please rebase (it's trivial).

I don't like the provided commit message very much, I think the
discussion about pgJDBC having had load balancing for a while belongs
elsewhere.

On Wed, Jun 22, 2022 at 07:54:19AM +0000, Jelte Fennema wrote:
> I tried to stay in line with the naming of this same option in JDBC and
> Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
> respectively. So, actually to be more in line it should be the option for 
> libpq should be called "load_balance_hosts" (not "loadbalance" like 
> in the previous patch). I attached a new patch with the name of the 
> option changed to this.

Maybe my imagination is not so great, but what else than hosts could we
possibly load-balance? I don't mind calling it load_balance, but I also
don't feel very strongly one way or the other and this is clearly
bikeshed territory.
 
> I also don't think the name is misleading. Randomization of hosts will 
> automatically result in balancing the load across multiple hosts. This is 
> assuming more than a single connection is made using the connection 
> string, either on the same client node or on different client nodes. I think
> I think is a fair assumption to make. Also note that this patch does not load 
> balance queries, it load balances connections. This is because libpq works
> at the connection level, not query level, due to session level state. 

I agree.

Also, I think the scope is ok for a first implementation (but see
below). You or others could possibly further enhance the algorithm in
the future, but it seems to be useful as-is.

> I agree it is indeed fairly simplistic load balancing.

If I understand correctly, you've added DNS-based load balancing on top
of just shuffling the provided hostnames.  This makes sense if a
hostname is backed by more than one IP address in the context of load
balancing, but it also complicates the patch. So I'm wondering how much
shorter the patch would be if you leave that out for now?

On the other hand, I believe pgJDBC keeps track of which hosts are up or
down and only load balances among the ones which are up (maybe
rechecking after a timeout? I don't remember), is this something you're
doing, or did you consider it?

Some quick remarks on the patch:

                /* OK, scan this addrlist for a working server address */
-               conn->addr_cur = conn->addrlist;
                reset_connection_state_machine = true;
                conn->try_next_host = false;

The comment might need to be updated.

+       int                     naddr;                  /* # of addrs returned by getaddrinfo */

This is spelt "number of" in several other places in the file, and we
still have enough space to spell it out here as well without a
line-wrap.


Michael



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Index ordering after IS NULL
Next
From: Peter Geoghegan
Date:
Subject: Re: Index ordering after IS NULL