Thread: Support load balancing in libpq

Support load balancing in libpq

From
Jelte Fennema
Date:
Load balancing connections across multiple read replicas is a pretty
common way of scaling out read queries. There are two main ways of doing
so, both with their own advantages and disadvantages:
1. Load balancing at the client level
2. Load balancing by connecting to an intermediary load balancer

Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
merged support about a year ago. This patch adds the same functionality
to libpq. The way it's implemented is the same as the implementation of
JDBC, and contains two levels of load balancing:
1. The given hosts are randomly shuffled, before resolving them
    one-by-one.
2. Once a host its addresses get resolved, those addresses are shuffled,
    before trying to connect to them one-by-one.
Attachment

Re: Support load balancing in libpq

From
Aleksander Alekseev
Date:
Hi Jelte,

> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
>
> Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
> merged support about a year ago. This patch adds the same functionality
> to libpq. The way it's implemented is the same as the implementation of
> JDBC, and contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
>     one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
>     before trying to connect to them one-by-one.

Thanks for the patch.

I don't mind the feature but I believe the name is misleading. Unless
I missed something, the patch merely allows choosing a random host
from the provided list. By load balancing people generally expect
something more elaborate - e.g. sending read-only queries to replicas
and read/write queries to the primary, or perhaps using weights
proportional to the server throughput/performance.

Randomization would be a better term for what the patch does.

-- 
Best regards,
Aleksander Alekseev



Re: Support load balancing in libpq

From
Jelte Fennema
Date:
I tried to stay in line with the naming of this same option in JDBC and
Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
respectively. So, actually to be more in line it should be the option for
libpq should be called "load_balance_hosts" (not "loadbalance" like
in the previous patch). I attached a new patch with the name of the
option changed to this.

I also don't think the name is misleading. Randomization of hosts will
automatically result in balancing the load across multiple hosts. This is
assuming more than a single connection is made using the connection
string, either on the same client node or on different client nodes. I think
I think is a fair assumption to make. Also note that this patch does not load
balance queries, it load balances connections. This is because libpq works
at the connection level, not query level, due to session level state.

I agree it is indeed fairly simplistic load balancing. But many dedicated load
balancers often use simplistic load balancing too. Round-robin, random and
hash+modulo based load balancing are all very commonly used load balancer
strategies. Using this patch you should even be able to implement the
weighted load balancing that you suggest, by supplying the same host + port
pair multiple times in the list of hosts.

My preference would be to use load_balance_hosts for the option name.
However, if the name of the option becomes the main point of contention
I would be fine with changing the option to "randomize_hosts". I think
in the end it comes down to what we want the name of the option to reflect:
1. load_balance_hosts reflects what you (want to) achieve by enabling it
2. randomize_hosts reflects how it is achieved


Jelte
Attachment

Re: Support load balancing in libpq

From
Bharath Rupireddy
Date:
On Fri, Jun 10, 2022 at 10:01 PM Jelte Fennema
<Jelte.Fennema@microsoft.com> wrote:
>
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
>
> Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
> merged support about a year ago. This patch adds the same functionality
> to libpq. The way it's implemented is the same as the implementation of
> JDBC, and contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
>     one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
>     before trying to connect to them one-by-one.

Thanks for the patch. +1 for the general idea of redirecting connections.

I'm quoting a previous attempt by Satyanarayana Narlapuram on this
topic [1], it also has a patch set.

IMO, rebalancing of the load must be based on parameters (as also
suggested by Aleksander Alekseev in this thread) such as the
read-only/write queries, CPU/IO/Memory utilization of the
primary/standby, network distance etc. We may not have to go the extra
mile to determine all of these parameters dynamically during query
authentication time, but we can let users provide a list of standby
hosts based on "some" priority (Satya's thread [1] attempts to do
this, in a way, with users specifying the hosts via pg_hba.conf file).
If required, randomization in choosing the hosts can be optional.

Also, IMO, the solution must have a fallback mechanism if the
standby/chosen host isn't reachable.

Few thoughts on the patch:
1) How are we determining if the submitted query is read-only or write?
2) What happens for explicit transactions? The queries related to the
same txn get executed on the same host right? How are we guaranteeing
this?
3) Isn't it good to provide a way to test the patch?

[1]
https://www.postgresql.org/message-id/flat/CY1PR21MB00246DE1F9E9C58455A78A37915C0%40CY1PR21MB0024.namprd21.prod.outlook.com

Regards,
Bharath Rupireddy.



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
> I'm quoting a previous attempt by Satyanarayana Narlapuram on this
> topic [1], it also has a patch set.

Thanks for sharing that. It's indeed a different approach to solve the
same problem. I think my approach is much simpler, since it only
requires minimal changes to the libpq client and none to the postgres
server or the postgres protocol.

However, that linked patch is more flexible due to allowing redirection
based on users and databases. With my patch something similar could
still be achieved by using different hostname lists for different databases
or users at the client side.

To be completely clear on the core difference between the patch IMO:
In this patch a DNS server or (a hardcoded hostname/IP list at the client
side) is used to determine what host to connect to. In the linked
patch instead the Postgres server starts being the source of truth of what
to connect to, thus essentially becoming something similar to a DNS server.

> We may not have to go the extra
> mile to determine all of these parameters dynamically during query
> authentication time, but we can let users provide a list of standby
> hosts based on "some" priority (Satya's thread [1] attempts to do
> this, in a way, with users specifying the hosts via pg_hba.conf file).
> If required, randomization in choosing the hosts can be optional.

I'm not sure if you read my response to Aleksander. I feel like I
addressed part of this at least. But maybe I was not clear enough,
or added too much fluff. So, I'll re-iterate the important part:
By specifying the same host multiple times in the DNS response or
the hostname/IP list you can achieve weighted load balancing.

Few thoughts on the patch:
> 1) How are we determining if the submitted query is read-only or write?

This is not part of this patch. libpq and thus this patch works at the connection
level, not at the query level, so determining a read-only query or write only query
is not possible without large changes.

However, libpq already has a target_session_attrs[1] connection option. This can be
used to open connections specifically to read-only or writable servers. However,
once a read-only connection is opened it is then the responsibility of the client
not to send write queries over this read-only connection, otherwise they will fail.

> 2) What happens for explicit transactions? The queries related to the
> same txn get executed on the same host right? How are we guaranteeing
> this?

We're load balancing connections, not queries. Once a connection is made
all queries on that connection will be executed on the same host.

> 3) Isn't it good to provide a way to test the patch?

The way I tested it myself was by setting up a few databases on my local machine
listening on 127.0.0.1, 127.0.0.2, 127.0.0.3 and then putting all those in the connection
string. Then looking at the connection attempts on the servers their logs showed that
the client was indeed connecting to a random one (by using log_connections=true
in postgresql.conf).

I would definitely like to have some automated tests for this, but I couldn't find tests
for libpq that were connecting to multiple postgres servers. If they exist, any pointers
are appreciated. If they don't exist, pointers to similar tests are also appreciated.

[1]: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-TARGET-SESSION-ATTRS


RE: Support load balancing in libpq

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Jelte,

I like your idea. But do we have to sort randomly even if target_session_attr is set to 'primary' or 'read-write'?

I think this parameter can be used when all listed servers have same data,
and we can assume that one of members is a primary and others are secondary.

In this case user maybe add a primary host to top of the list,
so sorting may increase time durations for establishing connection.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Support load balancing in libpq

From
Jelte Fennema
Date:
we can assume that one of members is a primary and others are secondary.

With plain Postgres this assumption is probably correct. But the main reason
I'm interested in this patch was because I would like to be able to load
balance across the workers in a Citus cluster. These workers are all primaries.
Similar usage would likely be possible with BDR (bidirectional replication).

> In this case user maybe add a primary host to top of the list,
> so sorting may increase time durations for establishing connection.

If the user takes such care when building their host list, they could simply 
not add the load_balance_hosts=true option to the connection string.
If you know there's only one primary in the list and you're looking for
the primary, then there's no reason to use load_balance_hosts=true. 

RE: Support load balancing in libpq

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Jelte,

> With plain Postgres this assumption is probably correct. But the main reason
> I'm interested in this patch was because I would like to be able to load
> balance across the workers in a Citus cluster. These workers are all primaries.
> Similar usage would likely be possible with BDR (bidirectional replication).

I agree this feature is useful for BDR-like solutions.

> If the user takes such care when building their host list, they could simply
> not add the load_balance_hosts=true option to the connection string.
> If you know there's only one primary in the list and you're looking for
> the primary, then there's no reason to use load_balance_hosts=true.

You meant that it was the user responsibility to set correctly, right?
It seemed reasonable because libpq was just a library for connecting to server
and should not be smart.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: Support load balancing in libpq

From
Michael Banck
Date:
Hi,

the patch no longer applies cleanly, please rebase (it's trivial).

I don't like the provided commit message very much, I think the
discussion about pgJDBC having had load balancing for a while belongs
elsewhere.

On Wed, Jun 22, 2022 at 07:54:19AM +0000, Jelte Fennema wrote:
> I tried to stay in line with the naming of this same option in JDBC and
> Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
> respectively. So, actually to be more in line it should be the option for 
> libpq should be called "load_balance_hosts" (not "loadbalance" like 
> in the previous patch). I attached a new patch with the name of the 
> option changed to this.

Maybe my imagination is not so great, but what else than hosts could we
possibly load-balance? I don't mind calling it load_balance, but I also
don't feel very strongly one way or the other and this is clearly
bikeshed territory.
 
> I also don't think the name is misleading. Randomization of hosts will 
> automatically result in balancing the load across multiple hosts. This is 
> assuming more than a single connection is made using the connection 
> string, either on the same client node or on different client nodes. I think
> I think is a fair assumption to make. Also note that this patch does not load 
> balance queries, it load balances connections. This is because libpq works
> at the connection level, not query level, due to session level state. 

I agree.

Also, I think the scope is ok for a first implementation (but see
below). You or others could possibly further enhance the algorithm in
the future, but it seems to be useful as-is.

> I agree it is indeed fairly simplistic load balancing.

If I understand correctly, you've added DNS-based load balancing on top
of just shuffling the provided hostnames.  This makes sense if a
hostname is backed by more than one IP address in the context of load
balancing, but it also complicates the patch. So I'm wondering how much
shorter the patch would be if you leave that out for now?

On the other hand, I believe pgJDBC keeps track of which hosts are up or
down and only load balances among the ones which are up (maybe
rechecking after a timeout? I don't remember), is this something you're
doing, or did you consider it?

Some quick remarks on the patch:

                /* OK, scan this addrlist for a working server address */
-               conn->addr_cur = conn->addrlist;
                reset_connection_state_machine = true;
                conn->try_next_host = false;

The comment might need to be updated.

+       int                     naddr;                  /* # of addrs returned by getaddrinfo */

This is spelt "number of" in several other places in the file, and we
still have enough space to spell it out here as well without a
line-wrap.


Michael



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
Attached is an updated patch with the following changes:
1. rebased (including solved merge conflict)
2. fixed failing tests in CI
3. changed the commit message a little bit
4. addressed the two remarks from Micheal
5. changed the prng_state from a global to a connection level value for thread-safety
6. use pg_prng_uint64_range

> Maybe my imagination is not so great, but what else than hosts could we
> possibly load-balance? I don't mind calling it load_balance, but I also
> don't feel very strongly one way or the other and this is clearly
> bikeshed territory.

I agree, which is why I called it load_balance in my original patch. But I also
think it's useful to match the naming for the already existing implementations
in the PG ecosystem around this. But like you I don't really feel strongly either
way. It's a tradeoff between short name and consistency in the ecosystem.

> If I understand correctly, you've added DNS-based load balancing on top
> of just shuffling the provided hostnames.  This makes sense if a
> hostname is backed by more than one IP address in the context of load
> balancing, but it also complicates the patch. So I'm wondering how much
> shorter the patch would be if you leave that out for now?

Yes, that's correct and indeed the patch would be simpler without, i.e. all the
addrinfo changes would become unnecessary. But IMHO the behaviour of
the added option would be very unexpected if it didn't load balance across
multiple IPs in a DNS record. libpq currently makes no real distinction in
handling of provided hosts and handling of their resolved IPs. If load balancing
would only apply to the host list that would start making a distinction
between the two.

Apart from that the load balancing across IPs is one of the main reasons
for my interest in this patch. The reason is that it allows expanding or reducing
the number of nodes that are being load balanced across transparently to the
application. Which means that there's no need to re-deploy applications with
new connection strings when changing the number hosts.

> On the other hand, I believe pgJDBC keeps track of which hosts are up or
> down and only load balances among the ones which are up (maybe
> rechecking after a timeout? I don't remember), is this something you're
> doing, or did you consider it?

I don't think it's possible to do this in libpq without huge changes to its
architecture, since normally a connection will only a PGconn will only
create a single connection. The reason pgJDBC can do this is because
it's actually a connection pooler, so it will open more than one connection
and can thus keep some global state about the different hosts.

Jelte
Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Maxim Orlov
Date:
+1 for overall idea of load balancing via random host selection.

For the patch itself, I think it is better to use a more precise time function in libpq_prng_init or call it only once.
Thought it is a special corner case, imagine all the connection attempts at first second will be seeded with the save
value, i.e. will attempt to connect to the same host. I think, this is not we want to achieve.

And the "hostroder" option should be free'd in freePGconn.

> Also, IMO, the solution must have a fallback mechanism if the
> standby/chosen host isn't reachable.

Yeah, I think it should. I'm not insisting on a particular name of options here, but in my view, the overall idea may be next:
- we have two libpq's options: "load_balance_hosts" and "failover_timeout";
- the "load_balance_hosts" should be "sequential" or "random";
- the "failover_timeout" is a time period, within which, if connection to the server is not established, we switch to the next address or host.

While writing this text, I start thinking that load balancing is a combination of two parameters above.

> 3) Isn't it good to provide a way to test the patch?

Good idea too. I think, we should add tap test here.


--
Best regards,
Maxim Orlov.

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Michael Banck
Date:
Hi,

On Mon, Sep 12, 2022 at 02:16:56PM +0000, Jelte Fennema wrote:
> Attached is an updated patch with the following changes:
> 1. rebased (including solved merge conflict)
> 2. fixed failing tests in CI
> 3. changed the commit message a little bit
> 4. addressed the two remarks from Micheal
> 5. changed the prng_state from a global to a connection level value for thread-safety
> 6. use pg_prng_uint64_range

Thanks!
 
I tested this some more, and found it somewhat surprising that at least
when looking at it on a microscopic level, some hosts are chosen more
often than the others for a while.

I basically ran 

while true; do psql -At "host=pg1,pg2,pg3 load_balance_hosts=1" -c
"SELECT inet_server_addr()"; sleep 1; done

and the initial output was:

10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.60

I.e. the second host (pg2/10.0.3.60) was only hit after 19 iterations.

Once significantly more than a hundred iterations are run, the hosts
somewhat even out, but it is maybe suprising to users:

               50   100   250   500   1000   10000
10.0.3.60       9    24    77   165    328    3317
10.0.3.109     25    42    88   178    353    3372
10.0.3.240     16    34    85   157    319    3311

Or maybe my test setup is skewed? When I choose a two seconds timeout
between psql calls, I get a more even distribution initially, but it
then diverges after 100 iterations:

               50   100   250    500   1000
10.0.3.60      19    36    98    199    374 
10.0.3.109     13    33    80    150    285 
10.0.3.240     18    31    72    151    341 

Could just be bad luck...

I also switch one host to have two IP addresses in /etc/hosts:

10.0.3.109 pg1
10.0.3.60  pg1
10.0.3.240 pg3

And this resulted in this (one second timeout again):

First run:

               50   100   250    500   1000
10.0.3.60      10    18    56    120    255    
10.0.3.109     14    30    67    139    278    
10.0.3.240     26    52   127    241    467    

Second run:

               50   100   250    500   1000
10.0.3.60      20    31    77    138    265     
10.0.3.109      9    20    52    116    245     
10.0.3.240     21    49   121    246    490     

So it looks like it load-balances between pg1 and pg3, and not between
the three IPs -  is this expected?

If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP adress is
hit roughly equally.

So I guess this is how it should work, but in that case I think the
documentation should be more explicit about what is to be expected if a
host has multiple IP addresses or hosts are specified multiple times in
the connection string.

> > Maybe my imagination is not so great, but what else than hosts could we
> > possibly load-balance? I don't mind calling it load_balance, but I also
> > don't feel very strongly one way or the other and this is clearly
> > bikeshed territory.
> 
> I agree, which is why I called it load_balance in my original patch.
> But I also think it's useful to match the naming for the already
> existing implementations in the PG ecosystem around this. 
> But like you I don't really feel strongly either way. It's a tradeoff
> between short name and consistency in the ecosystem.

I don't think consistency is an extremely valid concern. As a
counterpoint, pgJDBC had targetServerType some time before Postgres, and
the libpq parameter was then named somewhat differently when it was
introduced, namely target_session_attrs.

> > If I understand correctly, you've added DNS-based load balancing on top
> > of just shuffling the provided hostnames.  This makes sense if a
> > hostname is backed by more than one IP address in the context of load
> > balancing, but it also complicates the patch. So I'm wondering how much
> > shorter the patch would be if you leave that out for now?
> 
> Yes, that's correct and indeed the patch would be simpler without, i.e. all the
> addrinfo changes would become unnecessary. But IMHO the behaviour of 
> the added option would be very unexpected if it didn't load balance across
> multiple IPs in a DNS record. libpq currently makes no real distinction in 
> handling of provided hosts and handling of their resolved IPs. If load balancing
> would only apply to the host list that would start making a distinction
> between the two.

Fair enough, I agree.
 
> Apart from that the load balancing across IPs is one of the main reasons
> for my interest in this patch. The reason is that it allows expanding or reducing
> the number of nodes that are being load balanced across transparently to the
> application. Which means that there's no need to re-deploy applications with 
> new connection strings when changing the number hosts.

That's a good point as well.
 
> > On the other hand, I believe pgJDBC keeps track of which hosts are up or
> > down and only load balances among the ones which are up (maybe
> > rechecking after a timeout? I don't remember), is this something you're
> > doing, or did you consider it?
> 
> I don't think it's possible to do this in libpq without huge changes to its
> architecture, since normally a connection will only a PGconn will only
> create a single connection. The reason pgJDBC can do this is because
> it's actually a connection pooler, so it will open more than one connection 
> and can thus keep some global state about the different hosts.

Ok.


Michael



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Michael Banck
Date:
Hi,

On Wed, Sep 14, 2022 at 05:53:48PM +0300, Maxim Orlov wrote:
> > Also, IMO, the solution must have a fallback mechanism if the
> > standby/chosen host isn't reachable.
> 
> Yeah, I think it should. I'm not insisting on a particular name of options
> here, but in my view, the overall idea may be next:
> - we have two libpq's options: "load_balance_hosts" and "failover_timeout";
> - the "load_balance_hosts" should be "sequential" or "random";
> - the "failover_timeout" is a time period, within which, if connection to
> the server is not established, we switch to the next address or host.

Isn't this exactly what connect_timeout is providing? In my tests, it
worked exactly as I would expect it, i.e. after connect_timeout seconds,
libpq was re-shuffling and going for another host.

If you specify only one host (or all are down), you get an error.

In any case, I am not sure what failover has to do with it if we are
talking about initiating connections - usually failover is for already
established connections that suddendly go away for one reason or
another.

Or maybe I'm just not understanding where you're getting at?

> While writing this text, I start thinking that load balancing is a
> combination of two parameters above.

So I guess what you are saying is that if load_balance_hosts is set,
not setting connect_timeout would be a hazard, cause it would stall the
connection attempt even though other hosts would be available.

That's right, but I guess it's already a hazard if you put multiple
hosts in there, and the connection is not immediately failed (because
the host doesn't exist or it rejects the connection) but stalled by a
firewall for one host, while other hosts later on in the list would be
happy to accept connections.

So maybe this is something to think about, but just changing the
defaul of connect_timeout to something else when load balancing is on
would be very surprising. In any case, I don't think this absolutely
needs to be addressed by the initial feature, it could be expanded upon
later on if needed, the feature is useful on its own, along with
connect_timeout.


Michael



Re: Support load balancing in libpq

From
Jelte Fennema
Date:
I attached a new patch which does the following:
1. adds tap tests
2. adds random_seed parameter to libpq (required for tap tests)
3. frees conn->loadbalance in freePGConn
4. add more expansive docs on the feature its behaviour

Apart from bike shedding on the name of the option I think it's pretty good now.

> Isn't this exactly what connect_timeout is providing? In my tests, it
> worked exactly as I would expect it, i.e. after connect_timeout seconds,
> libpq was re-shuffling and going for another host.

Yes, this was the main purpose of multiple hosts previously. This patch
doesn't change that, and it indeed continues to work when enabling
load balancing too. I included this in the tap tests.

> I tested this some more, and found it somewhat surprising that at least
> when looking at it on a microscopic level, some hosts are chosen more
> often than the others for a while.

That does seem surprising, but it looks like it might simply be bad luck.
Did you compile with OpenSSL support? Otherwise, the strong random
source might not be used.

> So it looks like it load-balances between pg1 and pg3, and not between
> the three IPs -  is this expected?
>
> If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP adress is
> hit roughly equally.
>
> So I guess this is how it should work, but in that case I think the
> documentation should be more explicit about what is to be expected if a
> host has multiple IP addresses or hosts are specified multiple times in
> the connection string.

Yes, this behaviour is expected I tried to make that clearer in the newest
version of the docs.

> For the patch itself, I think it is better to use a more precise time
> function in libpq_prng_init or call it only once.
> Thought it is a special corner case, imagine all the connection attempts at
> first second will be seeded with the save

I agree that using microseconds would probably be preferable. But that seems
like a separate patch, since I took this initialization code from the InitProcessGlobals
function. Also, it shouldn't be a big issue in practice, since usually the strong random
source will be used.
Attachment

Re: Support load balancing in libpq

From
Jelte Fennema
Date:
Attached is a new version with the tests cleaned up a bit (more comments mostly).

@Michael, did you have a chance to look at the last version? Because I feel that the
patch is pretty much ready for a committer to look at, at this point.
Attachment

Re: Support load balancing in libpq

From
Michael Banck
Date:
Hi,

On Tue, Nov 29, 2022 at 02:57:08PM +0000, Jelte Fennema wrote:
> Attached is a new version with the tests cleaned up a bit (more
> comments mostly).
> 
> @Michael, did you have a chance to look at the last version? Because I
> feel that the patch is pretty much ready for a committer to look at,
> at this point.

I had another look; it still applies and tests pass. I still find the
distribution over three hosts a bit skewed (even when OpenSSL is
enabled, which wasn't the case when I first tested it), but couldn't
find a general fault and it worked well enough in my testing.

I wonder whether making the parameter a boolean will paint us into a
corner, and whether maybe additional modes could be envisioned in the
future, but I can't think of some right now (you can pretty neatly
restrict load-balancing to standbys by setting
target_session_attrs=standby in addition). Maybe a way to change the
behaviour if a dns hostname is backed by multiple entries?

Some further (mostly nitpicking) comments on the patch:

> From 6e20bb223012b666161521b5e7249c066467a5f3 Mon Sep 17 00:00:00 2001
> From: Jelte Fennema <github-tech@jeltef.nl>
> Date: Mon, 12 Sep 2022 09:44:06 +0200
> Subject: [PATCH v5] Support load balancing in libpq
> 
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
> 
> Both JBDC (Java) and Npgsql (C#) already support client level load
> balancing (option #1). This patch implements client level load balancing
> for libpq as well. To stay consistent with the JDBC and Npgsql part of
> the  ecosystem, a similar implementation and name for the option are
> used. 

I still think all of the above has no business in the commit message,
though maybe the first paragraph can stay as introduction.

> It contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
>     one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
>     before trying to connect to them one-by-one.

That's fine.

What should be in the commit message is at least a mention of what the
new connection parameter is called and possibly what is done to
accomplish it.

But the committer will pick this up if needed I guess.

> diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> index f9558dec3b..6ce7a0c9cc 100644
> --- a/doc/src/sgml/libpq.sgml
> +++ b/doc/src/sgml/libpq.sgml
> @@ -1316,6 +1316,54 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="libpq-load-balance-hosts" xreflabel="load_balance_hosts">
> +      <term><literal>load_balance_hosts</literal></term>
> +      <listitem>
> +       <para>
> +        Controls whether the client load balances connections across hosts and
> +        adresses. The default value is 0, meaning off, this means that hosts are
> +        tried in order they are provided and addresses are tried in the order
> +        they are received from DNS or a hosts file. If this value is set to 1,
> +        meaning on, the hosts and addresses that they resolve to are tried in
> +        random order. Subsequent queries once connected will still be sent to
> +        the same server. Setting this to 1, is mostly useful when opening
> +        multiple connections at the same time, possibly from different machines.
> +        This way connections can be load balanced across multiple Postgres
> +        servers.
> +       </para>
> +       <para>
> +        When providing multiple hosts, these hosts are resolved in random order.
> +        Then if that host resolves to multiple addresses, these addresses are
> +        connected to in random order. Only once all addresses for a single host
> +        have been tried, the addresses for the next random host will be
> +        resolved. This behaviour can lead to non-uniform address selection in
> +        certain cases. Such as when not all hosts resolve to the same number of
> +        addresses, or when multiple hosts resolve to the same address. So if you
> +        want uniform load balancing, this is something to keep in mind. However,
> +        non-uniform load balancing also has usecases, e.g. providing the
> +        hostname of a larger server multiple times in the host string so it gets
> +        more requests.
> +       </para>
> +       <para>
> +        When using this setting it's recommended to also configure a reasonable
> +        value for <xref linkend="libpq-connect-connect-timeout"/>. Because then,
> +        if one of the nodes that are used for load balancing is not responding,
> +        a new node will be tried.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I think this whole section is generally fine, but needs some
copyediting.

> +     <varlistentry id="libpq-random-seed" xreflabel="random_seed">
> +      <term><literal>random_seed</literal></term>
> +      <listitem>
> +       <para>
> +        Sets the random seed that is used by <xref linkend="libpq-load-balance-hosts"/>
> +        to randomize the host order. This option is mostly useful when running
> +        tests that require a stable random order.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I wonder whether this needs to be documented if it is mostly a
development/testing parameter?

> diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
> index fcf68df39b..39e93b1392 100644
> --- a/src/include/libpq/pqcomm.h
> +++ b/src/include/libpq/pqcomm.h
> @@ -27,6 +27,12 @@ typedef struct
>      socklen_t    salen;
>  } SockAddr;
>  
> +typedef struct
> +{
> +    int            family;
> +    SockAddr    addr;
> +}            AddrInfo;

That last line looks weirdly indented compared to SockAddr; in the
struct above.

>  /* Configure the UNIX socket location for the well known port. */
>  
>  #define UNIXSOCK_PATH(path, port, sockdir) \
> diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
> index f88d672c6c..b4d3613713 100644
> --- a/src/interfaces/libpq/fe-connect.c
> +++ b/src/interfaces/libpq/fe-connect.c
> @@ -241,6 +241,14 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
>          "Fallback-Application-Name", "", 64,
>      offsetof(struct pg_conn, fbappname)},
>  
> +    {"load_balance_hosts", NULL, NULL, NULL,
> +        "Load-Balance", "", 1,    /* should be just '0' or '1' */
> +    offsetof(struct pg_conn, loadbalance)},
> +
> +    {"random_seed", NULL, NULL, NULL,
> +        "Random-Seed", "", 10,    /* strlen(INT32_MAX) == 10 */
> +    offsetof(struct pg_conn, randomseed)},
> +
>      {"keepalives", NULL, NULL, NULL,
>          "TCP-Keepalives", "", 1,    /* should be just '0' or '1' */
>      offsetof(struct pg_conn, keepalives)},
> @@ -379,6 +387,7 @@ static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
>  static void freePGconn(PGconn *conn);
>  static void closePGconn(PGconn *conn);
>  static void release_conn_addrinfo(PGconn *conn);
> +static bool store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist);
>  static void sendTerminateConn(PGconn *conn);
>  static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
>  static PQconninfoOption *parse_connection_string(const char *connstr,
> @@ -424,6 +433,9 @@ static void pgpassfileWarning(PGconn *conn);
>  static void default_threadlock(int acquire);
>  static bool sslVerifyProtocolVersion(const char *version);
>  static bool sslVerifyProtocolRange(const char *min, const char *max);
> +static int    loadBalance(PGconn *conn);
> +static bool parse_int_param(const char *value, int *result, PGconn *conn,
> +                            const char *context);
>  
>  
>  /* global variable because fe-auth.c needs to access it */
> @@ -1007,6 +1019,46 @@ parse_comma_separated_list(char **startptr, bool *more)
>      return p;
>  }
>  
> +/*
> + * Initializes the prng_state field of the connection. We want something
> + * unpredictable, so if possible, use high-quality random bits for the
> + * seed. Otherwise, fall back to a seed based on timestamp and PID.
> + */
> +static bool
> +libpq_prng_init(PGconn *conn)
> +{
> +    if (unlikely(conn->randomseed))
> +    {
> +        int            rseed;
> +
> +        if (!parse_int_param(conn->randomseed, &rseed, conn, "random_seed"))
> +        {
> +            return false;
> +        };

I think it's project policy to drop the braces for single statements in
if blocks.

> +        pg_prng_seed(&conn->prng_state, rseed);
> +    }
> +    else if (unlikely(!pg_prng_strong_seed(&conn->prng_state)))
> +    {
> +        uint64        rseed;
> +        time_t        now = time(NULL);
> +
> +        /*
> +         * Since PIDs and timestamps tend to change more frequently in their
> +         * least significant bits, shift the timestamp left to allow a larger
> +         * total number of seeds in a given time period.  Since that would
> +         * leave only 20 bits of the timestamp that cycle every ~1 second,
> +         * also mix in some higher bits.
> +         */
> +        rseed = ((uint64) getpid()) ^
> +            ((uint64) now << 12) ^
> +            ((uint64) now >> 20);
> +
> +        pg_prng_seed(&conn->prng_state, rseed);
> +    }
> +    return true;
> +}
> +
> +
>  /*

Additional newline.

> @@ -1164,6 +1217,36 @@ connectOptions2(PGconn *conn)
>          }
>      }
>  
> +    if (loadbalancehosts < 0)
> +    {
> +        appendPQExpBufferStr(&conn->errorMessage,
> +                             libpq_gettext("loadbalance parameter must be an integer\n"));
> +        return false;
> +    }
> +
> +    if (loadbalancehosts)
> +    {
> +        if (!libpq_prng_init(conn))
> +        {
> +            return false;
> +        }
> +
> +        /*
> +         * Shuffle connhost with a Durstenfeld/Knuth version of the
> +         * Fisher-Yates shuffle. Source:
> +         * https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
> +         */
> +        for (i = conn->nconnhost - 1; i > 0; i--)
> +        {
> +            int            j = pg_prng_uint64_range(&conn->prng_state, 0, i);
> +            pg_conn_host temp = conn->connhost[j];
> +
> +            conn->connhost[j] = conn->connhost[i];
> +            conn->connhost[i] = temp;
> +        }
> +    }
> +
> +
>      /*

Additional newline.

> @@ -1726,6 +1809,27 @@ connectFailureMessage(PGconn *conn, int errorno)
>          libpq_append_conn_error(conn, "\tIs the server running on that host and accepting TCP/IP connections?");
>  }
>  
> +/*
> + * Should we load balance across hosts? Returns 1 if yes, 0 if no, and -1 if
> + * conn->loadbalance is set to a value which is not parseable as an integer.
> + */
> +static int
> +loadBalance(PGconn *conn)
> +{
> +    char       *ep;
> +    int            val;
> +
> +    if (conn->loadbalance == NULL)
> +    {
> +        return 0;
> +    }

Another case of additional braces.

> +    val = strtol(conn->loadbalance, &ep, 10);
> +    if (*ep)
> +        return -1;
> +    return val != 0 ? 1 : 0;
> +}
> +
> +
>  /*

Additional newline.

> @@ -4041,6 +4154,63 @@ freePGconn(PGconn *conn)
>      free(conn);
>  }
>  
> +
> +/*

Additional newline.

> + * Copies over the AddrInfos from addrlist to the PGconn.
> + */
> +static bool
> +store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist)
> +{
> +    struct addrinfo *ai = addrlist;
> +
> +    conn->whichaddr = 0;
> +
> +    conn->naddr = 0;
> +    while (ai)
> +    {
> +        ai = ai->ai_next;
> +        conn->naddr++;
> +    }
> +
> +    conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
> +    if (conn->addr == NULL)
> +    {
> +        return false;
> +    }

Additional braces.

> @@ -4048,11 +4218,10 @@ freePGconn(PGconn *conn)
>  static void
>  release_conn_addrinfo(PGconn *conn)
>  {
> -    if (conn->addrlist)
> +    if (conn->addr)
>      {
> -        pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
> -        conn->addrlist = NULL;
> -        conn->addr_cur = NULL;    /* for safety */
> +        free(conn->addr);
> +        conn->addr = NULL;
>      }
>  }
>  
> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
> index 512762f999..76ee988038 100644
> --- a/src/interfaces/libpq/libpq-int.h
> +++ b/src/interfaces/libpq/libpq-int.h
> @@ -82,6 +82,8 @@ typedef struct
>  #endif
>  #endif                            /* USE_OPENSSL */
>  
> +#include "common/pg_prng.h"
> +
>  /*
>   * POSTGRES backend dependent Constants.
>   */
> @@ -373,6 +375,8 @@ struct pg_conn
>      char       *pgpassfile;        /* path to a file containing password(s) */
>      char       *channel_binding;    /* channel binding mode
>                                       * (require,prefer,disable) */
> +    char       *loadbalance;    /* load balance over hosts */
> +    char       *randomseed;        /* seed for randomization of load balancing */
>      char       *keepalives;        /* use TCP keepalives? */

A bit unclear why you put the variables at this point in the list, but
the list doesn't seem to be ordered strictly anyway; still, maybe they
would fit best at the bottom below target_session_attrs?

>      char       *keepalives_idle;    /* time between TCP keepalives */
>      char       *keepalives_interval;    /* time between TCP keepalive
> @@ -461,8 +465,10 @@ struct pg_conn
>      PGTargetServerType target_server_type;    /* desired session properties */
>      bool        try_next_addr;    /* time to advance to next address/host? */
>      bool        try_next_host;    /* time to advance to next connhost[]? */
> -    struct addrinfo *addrlist;    /* list of addresses for current connhost */
> -    struct addrinfo *addr_cur;    /* the one currently being tried */
> +    int            naddr;            /* number of addrs returned by getaddrinfo */
> +    int            whichaddr;        /* the addr currently being tried */

Address(es) is always spelt out in the comments, those two addr(s)
should also I think.

> diff --git a/src/interfaces/libpq/t/003_loadbalance.pl b/src/interfaces/libpq/t/003_loadbalance.pl
> new file mode 100644
> index 0000000000..07eddbe9cc
> --- /dev/null
> +++ b/src/interfaces/libpq/t/003_loadbalance.pl
> @@ -0,0 +1,167 @@
> +# Copyright (c) 2022, PostgreSQL Global Development Group

Copyright bump needed.


Cheers,

Michael



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
Attached an updated patch which should address your feedback and
I updated the commit message.

> I wonder whether making the parameter a boolean will paint us into a
> corner

I made it a string option, just like target_session_attrs. I'm pretty sure a
round-robin load balancing policy could be implemented in the future
given certain constraints, like connections being made within the same
process. I adjusted the docs accordingly.

> > +typedef struct
> > +{
> > +    int            family;
> > +    SockAddr    addr;
> > +}            AddrInfo;
>
> That last line looks weirdly indented compared to SockAddr; in the
> struct above.

Yes I agree, but for some reason pgindent really badly wants it formatted
that way. I now undid the changes made by pgindent manually.

> I wonder whether this needs to be documented if it is mostly a
> development/testing parameter?

I also wasn't sure whether it should be documented or not. I'm fine with
either, I'll leave it in for now and let a committer decide if it's wanted or not.

> A bit unclear why you put the variables at this point in the list, but
> the list doesn't seem to be ordered strictly anyway; still, maybe they
> would fit best at the bottom below target_session_attrs?

Good point, I added them after target_session_attrs now and also moved
docs/parsing accordingly. This makes conceptually to me as well, since
target_session_attrs and load_balance_hosts have some interesting
sense contextually too.

P.S. I also attached the same pgindent run patch that I added in

https://www.postgresql.org/message-id/flat/AM5PR83MB0178D3B31CA1B6EC4A8ECC42F7529@AM5PR83MB0178.EURPRD83.prod.outlook.com
Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jacob Champion
Date:
On Wed, Sep 14, 2022 at 7:54 AM Maxim Orlov <orlovmg@gmail.com> wrote:
> For the patch itself, I think it is better to use a more precise time function in libpq_prng_init or call it only
once.
> Thought it is a special corner case, imagine all the connection attempts at first second will be seeded with the
save
> value, i.e. will attempt to connect to the same host. I think, this is not we want to achieve.

Just a quick single-issue review, but I agree with Maxim that having
one PRNG, seeded once, would be simpler -- with the tangible benefit
that it would eliminate weird behavior on simultaneous connections
when the client isn't using OpenSSL. (I'm guessing a simple lock on a
global PRNG would be less overhead than the per-connection
strong_random machinery, too, but I have no data to back that up.) The
test seed could then be handled globally as well (envvar?) so that you
don't have to introduce a debug-only option into the connection
string.

Overall, I like the concept.

--Jacob



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
> Just a quick single-issue review, but I agree with Maxim that having
> one PRNG, seeded once, would be simpler

I don't agree that it's simpler. Because now there's a mutex you have
to manage, and honestly cross-platform threading in C is not simple.
However, I attached two additional patches that implement this
approach on top of the previous patchset. Just to make sure that
this patch is not blocked on this.

> with the tangible benefit that it would eliminate weird behavior on
> simultaneous connections when the client isn't using OpenSSL.

This is true, but still I think in practice very few people have a libpq
that's compiled without OpenSSL support.

> I'm guessing a simple lock on a
> global PRNG would be less overhead than the per-connection
> strong_random machinery, too, but I have no data to back that up.

It might very well have less overhead, but neither of them should take
up any significant amount of time during connection establishment.

> The test seed could then be handled globally as well (envvar?) so that you
> don't have to introduce a debug-only option into the connection string.

Why is a debug-only envvar any better than a debug-only connection option?
For now I kept the connection option approach, since to me they seem pretty
much equivalent.

Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jacob Champion
Date:
On Fri, Jan 13, 2023 at 9:10 AM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> > Just a quick single-issue review, but I agree with Maxim that having
> > one PRNG, seeded once, would be simpler
>
> I don't agree that it's simpler. Because now there's a mutex you have
> to manage, and honestly cross-platform threading in C is not simple.
> However, I attached two additional patches that implement this
> approach on top of the previous patchset. Just to make sure that
> this patch is not blocked on this.

It hadn't been my intention to block the patch on it, sorry. Just
registering a preference.

I also didn't intend to make you refactor the locking code -- my
assumption was that you could use the existing pglock_thread() to
handle it, since it didn't seem like the additional contention would
hurt too much. Maybe that's not actually performant enough, in which
case my suggestion loses an advantage.

> > The test seed could then be handled globally as well (envvar?) so that you
> > don't have to introduce a debug-only option into the connection string.
>
> Why is a debug-only envvar any better than a debug-only connection option?
> For now I kept the connection option approach, since to me they seem pretty
> much equivalent.

I guess I worry less about envvar namespace pollution than pollution
of the connection options. And my thought was that the one-time
initialization could be moved to a place that doesn't need to know the
connection options at all, to make it easier to reason about the
architecture. Say, next to the WSAStartup machinery.

But as it is now, I agree that the implementation hasn't really lost
any complexity compared to the original, and I don't feel particularly
strongly about it. If it doesn't help to make the change, then it
doesn't help.

Thanks,
--Jacob



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jacob Champion
Date:
On Fri, Jan 13, 2023 at 10:44 AM Jacob Champion <jchampion@timescale.com> wrote:
> And my thought was that the one-time
> initialization could be moved to a place that doesn't need to know the
> connection options at all, to make it easier to reason about the
> architecture. Say, next to the WSAStartup machinery.

(And after marinating on this over the weekend, it occurred to me that
keeping the per-connection option while making the PRNG global
introduces an additional hazard, because two concurrent connections
can now fight over the seed value.)

--Jacob



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
As far as I can tell this is ready for committer feedback now btw. I'd
really like to get this into PG16.

> It hadn't been my intention to block the patch on it, sorry. Just
> registering a preference.

No problem. I hadn't looked into the shared PRNG solution closely
enough to determine if I thought it was better or not. Now that I
implemented an initial version I personally don't think it brings
enough advantages to warrant the added complexity. I definitely
don't think it's required for this patch, if needed this change can
always be done later without negative user impact afaict. And the
connection local PRNG works well enough to bring advantages.

> my
> assumption was that you could use the existing pglock_thread() to
> handle it, since it didn't seem like the additional contention would
> hurt too much.

That definitely would have been the easier approach and I considered
it. But the purpose of pglock_thread seemed so different from this lock
that it felt weird to combine the two. Another reason I refactored the lock
code is that it would be probably be necessary for a future round-robin
load balancing, which would require sharing state between different
connections.

> > And my thought was that the one-time
> > initialization could be moved to a place that doesn't need to know the
> > connection options at all, to make it easier to reason about the
> > architecture. Say, next to the WSAStartup machinery.

That's an interesting thought, but I don't think it would really simplify
the initialization code. Mostly it would change its location.

> (And after marinating on this over the weekend, it occurred to me that
> keeping the per-connection option while making the PRNG global
> introduces an additional hazard, because two concurrent connections
> can now fight over the seed value.)

I think since setting the initial seed value is really only meant for testing
it's not a big deal if it doesn't work with concurrent connections.



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
After discussing this patch privately with Andres here's a new version of this
patch. The major differences are:
1. Use the pointer value of the connection as a randomness source
2. Use more precise time as randomness source
3. Move addrinfo changes into a separate commit. This is both to make
the actual change cleaner, and because another patch of mine (non
blocking cancels) benefits from the same change.
4. Use the same type of Fisher-Yates shuffle as is done in two other
places in the PG source code.
5. Move tests depending on hosts file to a separate file. This makes
it clear in the output that tests are skipped, because skip_all shows
a nice message.
6. Only enable hosts file load balancing when loadbalance is included
in PG_TEST_EXTRA, since this test listens on TCP socket and is thus
dangerous on a multi-user system.

On Wed, 18 Jan 2023 at 11:24, Jelte Fennema <postgres@jeltef.nl> wrote:
>
> As far as I can tell this is ready for committer feedback now btw. I'd
> really like to get this into PG16.
>
> > It hadn't been my intention to block the patch on it, sorry. Just
> > registering a preference.
>
> No problem. I hadn't looked into the shared PRNG solution closely
> enough to determine if I thought it was better or not. Now that I
> implemented an initial version I personally don't think it brings
> enough advantages to warrant the added complexity. I definitely
> don't think it's required for this patch, if needed this change can
> always be done later without negative user impact afaict. And the
> connection local PRNG works well enough to bring advantages.
>
> > my
> > assumption was that you could use the existing pglock_thread() to
> > handle it, since it didn't seem like the additional contention would
> > hurt too much.
>
> That definitely would have been the easier approach and I considered
> it. But the purpose of pglock_thread seemed so different from this lock
> that it felt weird to combine the two. Another reason I refactored the lock
> code is that it would be probably be necessary for a future round-robin
> load balancing, which would require sharing state between different
> connections.
>
> > > And my thought was that the one-time
> > > initialization could be moved to a place that doesn't need to know the
> > > connection options at all, to make it easier to reason about the
> > > architecture. Say, next to the WSAStartup machinery.
>
> That's an interesting thought, but I don't think it would really simplify
> the initialization code. Mostly it would change its location.
>
> > (And after marinating on this over the weekend, it occurred to me that
> > keeping the per-connection option while making the PRNG global
> > introduces an additional hazard, because two concurrent connections
> > can now fight over the seed value.)
>
> I think since setting the initial seed value is really only meant for testing
> it's not a big deal if it doesn't work with concurrent connections.

Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Greg S
Date:
This patch seems to need a rebase.

I'll update the status to Waiting on Author for now. After rebasing
please update it to either Needs Review or Ready for Committer
depending on how simple the rebase was and whether there are open
questions to finish it.



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
done and updated cf entry

On Wed, 1 Mar 2023 at 20:13, Greg S <stark.cfm@gmail.com> wrote:
>
> This patch seems to need a rebase.
>
> I'll update the status to Waiting on Author for now. After rebasing
> please update it to either Needs Review or Ready for Committer
> depending on how simple the rebase was and whether there are open
> questions to finish it.

Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Andrey Borodin
Date:
On Wed, Mar 1, 2023 at 12:03 PM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> done and updated cf entry
>

Hi Jelte!

I've looked into the patch. Although so many improvements can be
suggested, It definitely makes sense as-is too.
These improvements might be, for example, sorting hosts according to
ping latency or something like that. Or, perhaps, some other balancing
policies. Anyway, randomizing is a good start too.

I want to note that the Fisher-Yates algorithm is implemented in a
difficult to understand manner.
+if (j < i) /* avoid fetching undefined data if j=i */
This stuff does not make sense in case of shuffling arrays inplace. It
is important only for making a new copy of an array and only in
languages that cannot access uninitialized values. I'd suggest just
removing this line (in both cases).


Best regards, Andrey Borodin.



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
> I want to note that the Fisher-Yates algorithm is implemented in a
> difficult to understand manner.
> +if (j < i) /* avoid fetching undefined data if j=i */
> This stuff does not make sense in case of shuffling arrays inplace. It
> is important only for making a new copy of an array and only in
> languages that cannot access uninitialized values. I'd suggest just
> removing this line (in both cases).

Done. Also added another patch to remove the same check from another
place in the codebase where it is unnecessary.

Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
Small update. Improved some wording in the docs.

On Fri, 3 Mar 2023 at 15:37, Jelte Fennema <postgres@jeltef.nl> wrote:
>
> > I want to note that the Fisher-Yates algorithm is implemented in a
> > difficult to understand manner.
> > +if (j < i) /* avoid fetching undefined data if j=i */
> > This stuff does not make sense in case of shuffling arrays inplace. It
> > is important only for making a new copy of an array and only in
> > languages that cannot access uninitialized values. I'd suggest just
> > removing this line (in both cases).
>
> Done. Also added another patch to remove the same check from another
> place in the codebase where it is unnecessary.

Attachment

Re: [EXTERNAL] Re: Support load balancing in libpq

From
"Gregory Stark (as CFM)"
Date:
The pgindent run in b6dfee28f is causing this patch to need a rebase
for the cfbot to apply it.



Re: [EXTERNAL] Re: Support load balancing in libpq

From
Jelte Fennema
Date:
Rebased

On Tue, 14 Mar 2023 at 19:05, Gregory Stark (as CFM)
<stark.cfm@gmail.com> wrote:
>
> The pgindent run in b6dfee28f is causing this patch to need a rebase
> for the cfbot to apply it.

Attachment

Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
In general I think this feature makes sense (which has been echoed many times
in the thread), and the implementation strikes a good balance of robustness and
simplicity.  Reading this I think it's very close to being committable, but I
have a few comments on the patch series:

+        sent to the same server. There are currently two modes:
The documentation lists the modes disabled and random, but I wonder if it's
worth expanding the docs to mention that "disabled" is pretty much a round
robin load balancing scheme?  It reads a bit odd to present load balancing
without a mention of round robin balancing given how common it is.

-       conn->addrlist_family = hint.ai_family = AF_UNSPEC;
+       hint.ai_family = AF_UNSPEC;
This removes all uses of conn->addrlist_family and that struct member can be
removed.

+        to, for example, load balance over stanby servers only. Once successfully
s/stanby/standby/

+            Postgres servers.
s/Postgres/<productname>PostgreSQL</productname>/

+            more addresses than others. So if you want uniform load balancing,
+            this is something to keep in mind. However, non-uniform load
+            balancing can also be used to your advantage, e.g. by providing the
The documentation typically use a less personal form, I would suggest something
along the lines of:

    "If uniform load balancing is required then an external load balancing tool
    must be used.  Non-uniform load balancing can also be used to skew the
    results, e.g.  by providing the.."

+               if (!libpq_prng_init(conn))
+                       return false;
This needs to set a returned error message with libpq_append_conn_error (feel
free to come up with a better wording of course):

  libpq_append_conn_error(conn, "unable to initiate random number generator");

-#ifndef WIN32
+/* MinGW has sys/time.h, but MSVC doesn't */
+#ifndef _MSC_VER
 #include <sys/time.h>
This seems unrelated to the patch in question, and should be a separate commit IMO.

+       LOAD_BALANCE_RANDOM,            /* Read-write server */
I assume this comment is a copy/paste and should have been reflecting random order?

+++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
Nitpick, but we should probably rename this load_balance to match the parameter
being tested.

+++ b/src/interfaces/libpq/t/004_loadbalance_dns.pl
On the subject of tests, I'm a bit torn.  I appreciate that the patch includes
a thorough test, but I'm not convinced we should add that to the tree.  A test
which require root permission level manual system changes stand a very low
chance of ever being executed, and as such will equate to dead code that may
easily be broken or subtly broken.

I am also not a fan of the random_seed parameter.  A parameter only useful for
testing, and which enables testing by circumventing the mechanism to test
(making randomness predictable), seems like expensive bagage to carry around.
From experience we also know this risks ending up in production configs for all
the wrong reasons.

Given the implementation of this feature, the actual connection phase isn't any
different from just passing multiple hostnames and operating in the round-robin
fashion.  What we want to ensure is that the randomization isn't destroying the
connection array.  Let's see what we can do while still having tests that can
be executed in the buildfarm.

A few ideas:

  * Add basic tests for the load_balance_host connection param only accepting
    sane values

  * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not require
    random_seed but instead using randomization. Thinking out loud, how about
    something along these lines?
    - Passing a list of unreachable hostnames with a single good hostname
      should result in a connection.
    - Passing a list of one good hostname should result in a connection
    - Passing a list on n good hostname (where all are equal) should result in
      a connection
    - Passing a list of n unreachable hostnames should result in log entries
      for n broken resolves in some order, and running that test n' times
      shouldn't - statistically - result in the same order for a large enough n'.

  * Remove random_seed and 004_loadbalance_dns.pl

Thoughts?

--
Daniel Gustafsson




Re: [EXTERNAL] Support load balancing in libpq

From
Jelte Fennema
Date:
> The documentation lists the modes disabled and random, but I wonder if it's
> worth expanding the docs to mention that "disabled" is pretty much a round
> robin load balancing scheme?  It reads a bit odd to present load balancing
> without a mention of round robin balancing given how common it is.

I think you misunderstood what I meant in that section, so I rewrote
it to hopefully be clearer. Because disabled really isn't the same as
round-robin.

> This removes all uses of conn->addrlist_family and that struct member can be
> removed.

done

> s/stanby/standby/
> s/Postgres/<productname>PostgreSQL</productname>/

done

> The documentation typically use a less personal form, I would suggest something
> along the lines of:
>
>     "If uniform load balancing is required then an external load balancing tool
>     must be used.  Non-uniform load balancing can also be used to skew the
>     results, e.g.  by providing the.."

rewrote this to stop using "you" and expanded a bit on the topic

>   libpq_append_conn_error(conn, "unable to initiate random number generator");

done

> -#ifndef WIN32
> +/* MinGW has sys/time.h, but MSVC doesn't */
> +#ifndef _MSC_VER
>  #include <sys/time.h>
> This seems unrelated to the patch in question, and should be a separate commit IMO.

It's not really unrelated. This only started to be needed because
libpq_prng_init calls gettimeofday . That did not work on MinGW
systems. Before this patch libpq was never calling gettimeofday. So I
think it makes sense to leave it in the commit.

> +       LOAD_BALANCE_RANDOM,            /* Read-write server */
> I assume this comment is a copy/paste and should have been reflecting random order?

Yes, done

> +++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
> Nitpick, but we should probably rename this load_balance to match the parameter
> being tested.

Done

> A test
> which require root permission level manual system changes stand a very low
> chance of ever being executed, and as such will equate to dead code that may
> easily be broken or subtly broken.

While I definitely agree that it makes it hard to execute, I don't
think that means it will be executed nearly as few times as you
suggest. Maybe you missed it, but I modified the .cirrus.yml file to
configure the hosts file for both Linux and Windows runs. So, while I
agree it is unlikely to be executed manually by many people, it would
still be run on every commit fest entry (which should capture most
issues that I can imagine could occur).

> I am also not a fan of the random_seed parameter.

Fair enough. Removed

> A few ideas:
>
>   * Add basic tests for the load_balance_host connection param only accepting
>     sane values
>
>   * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not require
>     random_seed but instead using randomization. Thinking out loud, how about
>     something along these lines?
>     - Passing a list of unreachable hostnames with a single good hostname
>       should result in a connection.
>     - Passing a list of one good hostname should result in a connection
>     - Passing a list on n good hostname (where all are equal) should result in
>       a connection

Implemented all these.

>     - Passing a list of n unreachable hostnames should result in log entries
>       for n broken resolves in some order, and running that test n' times
>       shouldn't - statistically - result in the same order for a large enough n'.

I didn't implement this one. Instead I went for another statistics
based approach with working hosts (see test for details).

>   * Remove random_seed and 004_loadbalance_dns.pl

I moved 004_load_balance_dns.pl to a separate commit (after making
similar random_seed removal related changes to it). As explained above
I think it's worth it to have it because it gets executed in CI. But
feel free to commit only the main patch, if you disagree.

Attachment

Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
> On 17 Mar 2023, at 09:50, Jelte Fennema <postgres@jeltef.nl> wrote:
>
>> The documentation lists the modes disabled and random, but I wonder if it's
>> worth expanding the docs to mention that "disabled" is pretty much a round
>> robin load balancing scheme?  It reads a bit odd to present load balancing
>> without a mention of round robin balancing given how common it is.
>
> I think you misunderstood what I meant in that section, so I rewrote
> it to hopefully be clearer. Because disabled really isn't the same as
> round-robin.

Thinking more about it I removed that section since it adds more confusion than
it resolves I think.  It would be interesting to make it a true round-robin
with some form of locally stored pointer to last connection but thats for
future hacking.

>> -#ifndef WIN32
>> +/* MinGW has sys/time.h, but MSVC doesn't */
>> +#ifndef _MSC_VER
>> #include <sys/time.h>
>> This seems unrelated to the patch in question, and should be a separate commit IMO.
>
> It's not really unrelated. This only started to be needed because
> libpq_prng_init calls gettimeofday . That did not work on MinGW
> systems. Before this patch libpq was never calling gettimeofday. So I
> think it makes sense to leave it in the commit.

Gotcha.

>> A test
>> which require root permission level manual system changes stand a very low
>> chance of ever being executed, and as such will equate to dead code that may
>> easily be broken or subtly broken.
>
> While I definitely agree that it makes it hard to execute, I don't
> think that means it will be executed nearly as few times as you
> suggest. Maybe you missed it, but I modified the .cirrus.yml file to
> configure the hosts file for both Linux and Windows runs. So, while I
> agree it is unlikely to be executed manually by many people, it would
> still be run on every commit fest entry (which should capture most
> issues that I can imagine could occur).

I did see it was used in the CI since the jobs there are containerized, what
I'm less happy about is that we wont be able to test this in the BF.  That
being said, not having the test at all would mean even less testing so in the
end I agree that including it is the least bad option.  Longer term I would
like to rework into something less do-this-manually test, but I have no good
ideas right now.

I've played around some more with this and came up with the attached v15 which
I think is close to the final state.  The changes I've made are:

  * Added the DNS test back into the main commit
  * A few incorrect (referred to how the test worked previously) comments in
    the tests fixed.
  * The check against PG_TEST_EXTRA performed before any processing done
  * Reworked the check for hosts content attempting to make it a bit more
    robust
  * Changed store_conn_addrinfo to return int like how all the functions
    dealing with addrinfo does.  Also moved the error reporting to inside there
    where the error happened.
  * Made the prng init function void as it always returned true anyways.
  * Minor comment and docs tweaking.
  * I removed the change to geqo, while I don't think it's incorrect it also
    hardly seems worth the churn.
  * Commit messages are reworded.

I would like to see this wrapped up in the current CF, what do you think about
the attached?

--
Daniel Gustafsson


Attachment

Re: [EXTERNAL] Support load balancing in libpq

From
Jelte Fennema
Date:
Looks good overall. I attached a new version with a few small changes:

>   * Changed store_conn_addrinfo to return int like how all the functions
>     dealing with addrinfo does.  Also moved the error reporting to inside there
>     where the error happened.

I don't feel strong about the int vs bool return type. The existing
static libpq functions are a bit of a mixed bag around this, so either
way seems fine to me. And moving the log inside the function seems
fine too. But it seems you accidentally removed the "goto
error_return" part as well, so now we're completely ignoring the
allocation failure. The attached patch fixes that.

>+ok($node1_occurences > 1, "expected at least one execution on node1, found none");
>+ok($node2_occurences > 1, "expected at least one execution on node2, found none");
>+ok($node3_occurences > 1, "expected at least one execution on node3, found none");

I changed the message to be a description of the expected case,
instead of the failure case. This is in line with the way these
messages are used in other tests, and indeed seems like the correct
way because you get output from "meson test -v postgresql:libpq /
libpq/003_load_balance_host_list" like this:
▶ 6/6 - received at least one connection on node1       OK
▶ 6/6 - received at least one connection on node2       OK
▶ 6/6 - received at least one connection on node3       OK
▶ 6/6 - received 50 connections across all nodes        OK

Finally, I changed a few small typos in your updated commit message
(some of which originated from my earlier commit messages)

Attachment

Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
> On 27 Mar 2023, at 13:50, Jelte Fennema <postgres@jeltef.nl> wrote:
>
> Looks good overall. I attached a new version with a few small changes:
>
>>  * Changed store_conn_addrinfo to return int like how all the functions
>>    dealing with addrinfo does.  Also moved the error reporting to inside there
>>    where the error happened.
>
> I don't feel strong about the int vs bool return type. The existing
> static libpq functions are a bit of a mixed bag around this, so either
> way seems fine to me. And moving the log inside the function seems
> fine too. But it seems you accidentally removed the "goto
> error_return" part as well, so now we're completely ignoring the
> allocation failure. The attached patch fixes that.

Ugh, thanks.  I had a conflict here when rebasing with the load balancing
commit in place and clearly fat-fingered that one.

>> +ok($node1_occurences > 1, "expected at least one execution on node1, found none");
>> +ok($node2_occurences > 1, "expected at least one execution on node2, found none");
>> +ok($node3_occurences > 1, "expected at least one execution on node3, found none");
>
> I changed the message to be a description of the expected case,
> instead of the failure case. This is in line with the way these
> messages are used in other tests, and indeed seems like the correct
> way because you get output from "meson test -v postgresql:libpq /
> libpq/003_load_balance_host_list" like this:
> ▶ 6/6 - received at least one connection on node1       OK
> ▶ 6/6 - received at least one connection on node2       OK
> ▶ 6/6 - received at least one connection on node3       OK
> ▶ 6/6 - received 50 connections across all nodes        OK

Good point.

> Finally, I changed a few small typos in your updated commit message
> (some of which originated from my earlier commit messages)

+1

--
Daniel Gustafsson




Re: [EXTERNAL] Support load balancing in libpq

From
Aleksander Alekseev
Date:
Hi,

> > ▶ 6/6 - received at least one connection on node1       OK
> > ▶ 6/6 - received at least one connection on node2       OK
> > ▶ 6/6 - received at least one connection on node3       OK
> > ▶ 6/6 - received 50 connections across all nodes        OK
>
> Good point.
>
> > Finally, I changed a few small typos in your updated commit message
> > (some of which originated from my earlier commit messages)
>
> +1

Hi,

> I would like to see this wrapped up in the current CF, what do you think about
> the attached?

In v15-0001:

```
+    conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
+    if (conn->addr == NULL)
+    {
+        libpq_append_conn_error(conn, "out of memory");
+        return 1;
+    }
```

According to the man pages, in a corner case when naddr is 0 calloc
can return NULL which will not indicate an error.

So I think it should be:

```
if (conn->addr == NULL && conn->naddr != 0)
```

Other than that v15 looked very good. It was checked on Linux and
MacOS including running it under sanitizer.

I will take a look at v16 now.

--
Best regards,
Aleksander Alekseev



Re: [EXTERNAL] Support load balancing in libpq

From
Aleksander Alekseev
Date:
Hi,

> So I think it should be:
>
> ```
> if (conn->addr == NULL && conn->naddr != 0)
> ```
>
> [...]
>
> I will take a look at v16 now.

The code coverage could be slightly better.

In v16-0001:

```
+        ret = store_conn_addrinfo(conn, addrlist);
+        pg_freeaddrinfo_all(hint.ai_family, addrlist);
+        if (ret)
+            goto error_return;    /* message already logged */
```

The goto path is not test-covered.

In v16-0002:

```
+    }
+    else
+        conn->load_balance_type = LOAD_BALANCE_DISABLE;
```

The else branch is never executed.

```
         if (ret)
             goto error_return;    /* message already logged */

+        /*
+         * If random load balancing is enabled we shuffle the addresses.
+         */
+        if (conn->load_balance_type == LOAD_BALANCE_RANDOM)
+        {
+            /*
+             * This is the "inside-out" variant of the Fisher-Yates shuffle
[...]
+             */
+            for (int i = 1; i < conn->naddr; i++)
+            {
+                int            j =
pg_prng_uint64_range(&conn->prng_state, 0, i);
+                AddrInfo    temp = conn->addr[j];
+
+                conn->addr[j] = conn->addr[i];
+                conn->addr[i] = temp;
+            }
+        }
```

Strangely enough the body of the for loop is never executed either.
Apparently only one address is used and there is nothing to shuffle?

Here is the exact command I used to build the code coverage report:

```
git clean -dfx && meson setup --buildtype debug -Db_coverage=true
-Dcassert=true -DPG_TEST_EXTRA="kerberos ldap ssl load_balance"
-Dldap=disabled -Dssl=openssl -Dtap_tests=enabled
-Dprefix=/home/eax/projects/pginstall build && ninja -C build &&
PG_TEST_EXTRA=1 meson test -C build && ninja -C build coverage-html
```

I'm sharing this for the sake of completeness. I don't have a strong
opinion on whether we should bother with covering every new line of
code with tests.

Except for the named nitpicks v16 looks good to me.

-- 
Best regards,
Aleksander Alekseev



Re: [EXTERNAL] Support load balancing in libpq

From
Aleksander Alekseev
Date:
Hi,

> ```
> +        ret = store_conn_addrinfo(conn, addrlist);
> +        pg_freeaddrinfo_all(hint.ai_family, addrlist);
> +        if (ret)
> +            goto error_return;    /* message already logged */
> ```
> The goto path is not test-covered.

D'oh, this one is fine since store_conn_addrinfo() is going to fail
only when we are out of memory.

-- 
Best regards,
Aleksander Alekseev



Re: [EXTERNAL] Support load balancing in libpq

From
Jelte Fennema
Date:
> > ```
> > if (conn->addr == NULL && conn->naddr != 0)
> > ```

Afaict this is not necessary, since getaddrinfo already returns an
error if the host could not be resolved to any addresses. A quick test
gives me this error:
error: could not translate host name "doesnotexist" to address: Name
or service not known

>
> ```
> +    }
> +    else
> +        conn->load_balance_type = LOAD_BALANCE_DISABLE;
> ```
>
> The else branch is never executed.

I don't think that line is coverable then. There's definitely places
in the test suite where load_balance_hosts is not explicitly set. But
even in those cases I guess the argument parsing logic will use
DefaultLoadBalanceHosts instead of NULL as a value for
conn->load_balance_type.

> Strangely enough the body of the for loop is never executed either.
> Apparently only one address is used and there is nothing to shuffle?
>
> Here is the exact command I used to build the code coverage report:

I guess you didn't set up the hostnames in /etc/hosts as described in
004_load_balance_dns.pl. Then it's expected that the loop body isn't
covered. As discussed upthread, running this test manually is much
more cumbersome than is desirable, but it's still better than not
having the test at all, because it is run in CI.



Re: [EXTERNAL] Support load balancing in libpq

From
Aleksander Alekseev
Date:
Hi,

> I guess you didn't set up the hostnames in /etc/hosts as described in
> 004_load_balance_dns.pl. Then it's expected that the loop body isn't
> covered. As discussed upthread, running this test manually is much
> more cumbersome than is desirable, but it's still better than not
> having the test at all, because it is run in CI.

Got it, thanks.

I guess I'm completely out of nitpicks then!

-- 
Best regards,
Aleksander Alekseev



Re: [EXTERNAL] Support load balancing in libpq

From
Tatsuo Ishii
Date:
Hi,

"unlikely" macro is used in libpq_prng_init() in the patch. I wonder
if the place is really 'hot' to use "unlikely" macro.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
> On 28 Mar 2023, at 09:16, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> "unlikely" macro is used in libpq_prng_init() in the patch. I wonder
> if the place is really 'hot' to use "unlikely" macro.

I don't think it is, I was thinking to rewrite as the below sketch:

{
    if (pg_prng_strong_seed(&conn->prng_state)))
        return;

    /* fallback seeding */
}

--
Daniel Gustafsson




Re: [EXTERNAL] Support load balancing in libpq

From
Jelte Fennema
Date:
I think it's fine to remove it. It originated from postmaster.c, where
I copied the original implementation of libpq_prng_init from.

On Tue, 28 Mar 2023 at 09:22, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 28 Mar 2023, at 09:16, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>
> > "unlikely" macro is used in libpq_prng_init() in the patch. I wonder
> > if the place is really 'hot' to use "unlikely" macro.
>
> I don't think it is, I was thinking to rewrite as the below sketch:
>
> {
>     if (pg_prng_strong_seed(&conn->prng_state)))
>         return;
>
>     /* fallback seeding */
> }
>
> --
> Daniel Gustafsson
>



Re: [EXTERNAL] Support load balancing in libpq

From
Tatsuo Ishii
Date:
>> "unlikely" macro is used in libpq_prng_init() in the patch. I wonder
>> if the place is really 'hot' to use "unlikely" macro.
> 
> I don't think it is, I was thinking to rewrite as the below sketch:
> 
> {
>     if (pg_prng_strong_seed(&conn->prng_state)))
>         return;
> 
>     /* fallback seeding */
> }

+1.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: [EXTERNAL] Support load balancing in libpq

From
Tatsuo Ishii
Date:
> I think it's fine to remove it. It originated from postmaster.c, where
> I copied the original implementation of libpq_prng_init from.

I agree to remove unlikely macro here.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
I took another couple of looks at this and pushed it after a few small tweaks
to the docs.

--
Daniel Gustafsson




RE: [EXTERNAL] Support load balancing in libpq

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Daniel, Jelte

Thank you for creating a good feature!
While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]:

```
fe-connect.c: In function 'libpq_prng_init':
fe-connect.c:1048:11: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
 1048 |  rseed = ((uint64) conn) ^
      |           ^
cc1: all warnings being treated as errors
```

This failure seemed to occurr when the pointer is casted to different size.
And while checking more, I found that this machine seemed that size of pointer is 4 byte [2],
whereas sizeof(uint64) is 8.

```
checking size of void *... (cached) 4
```

I could not test because I do not have NetBSD, but I have come up with
Following solution to avoid the failure. sizeof(uintptr_t) will be addressed
based on the environment. How do you think?

```
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index a13ec16b32..bb7347cb0c 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1045,7 +1045,7 @@ libpq_prng_init(PGconn *conn)

        gettimeofday(&tval, NULL);

-       rseed = ((uint64) conn) ^
+       rseed = ((uintptr_t) conn) ^
                        ((uint64) getpid()) ^
                        ((uint64) tval.tv_usec) ^
                        ((uint64) tval.tv_sec);
```

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-03-29%2023%3A24%3A44
[2]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=mamba&dt=2023-03-29%2023%3A24%3A44&stg=configure

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:

> While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]:

Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2)
has the same error.  I'll look into it today to get a fix committed.

--
Daniel Gustafsson




Re: [EXTERNAL] Support load balancing in libpq

From
Julien Rouhaud
Date:
On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> > While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]:
>
> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2)
> has the same error.  I'll look into it today to get a fix committed.

This is an i686 machine, so it probably has the same void *
difference.  Building with -m32 might be enough to reproduce the
problem.



Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
> On 30 Mar 2023, at 10:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>>
>>> While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]:
>>
>> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2)
>> has the same error.  I'll look into it today to get a fix committed.
>
> This is an i686 machine, so it probably has the same void *
> difference.  Building with -m32 might be enough to reproduce the
> problem.

Makes sense.  I think the best option is to simply remove conn from being part
of the seed and rely on the other values.  Will apply that after a testrun.

--
Daniel Gustafsson




Re: [EXTERNAL] Support load balancing in libpq

From
Daniel Gustafsson
Date:
> On 30 Mar 2023, at 10:21, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 30 Mar 2023, at 10:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>>
>>>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>>>
>>>> While checking the buildfarm, I found a failure on NetBSD caused by the added code[1]:
>>>
>>> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2)
>>> has the same error.  I'll look into it today to get a fix committed.
>>
>> This is an i686 machine, so it probably has the same void *
>> difference.  Building with -m32 might be enough to reproduce the
>> problem.
>
> Makes sense.  I think the best option is to simply remove conn from being part
> of the seed and rely on the other values.  Will apply that after a testrun.

After some offlist discussion I ended up pushing the proposed uintptr_t fix
instead, now waiting for these animals to build.

--
Daniel Gustafsson