Re: [EXTERNAL] Re: Support load balancing in libpq - Mailing list pgsql-hackers
From | Michael Banck |
---|---|
Subject | Re: [EXTERNAL] Re: Support load balancing in libpq |
Date | |
Msg-id | 6325fc84.050a0220.ab071.038d@mx.google.com |
In response to | Re: [EXTERNAL] Re: Support load balancing in libpq (Jelte Fennema <Jelte.Fennema@microsoft.com>) |
List | pgsql-hackers |
Hi,

On Mon, Sep 12, 2022 at 02:16:56PM +0000, Jelte Fennema wrote:
> Attached is an updated patch with the following changes:
> 1. rebased (including solved merge conflict)
> 2. fixed failing tests in CI
> 3. changed the commit message a little bit
> 4. addressed the two remarks from Michael
> 5. changed the prng_state from a global to a connection level value for thread-safety
> 6. use pg_prng_uint64_range

Thanks!

I tested this some more, and found it somewhat surprising that, at least
when looking at it on a microscopic level, some hosts are chosen more
often than the others for a while.

I basically ran

while true; do psql -At "host=pg1,pg2,pg3 load_balance_hosts=1" -c "SELECT inet_server_addr()"; sleep 1; done

and the initial output was:

10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.60

I.e. the second host (pg2/10.0.3.60) was only hit after 19 iterations.

Once significantly more than a hundred iterations are run, the hosts
somewhat even out, but it is maybe surprising to users:

             50  100  250  500  1000  10000
10.0.3.60     9   24   77  165   328   3317
10.0.3.109   25   42   88  178   353   3372
10.0.3.240   16   34   85  157   319   3311

Or maybe my test setup is skewed? When I choose a two-second timeout
between psql calls, I get a more even distribution initially, but it
then diverges after 100 iterations:

             50  100  250  500  1000
10.0.3.60    19   36   98  199   374
10.0.3.109   13   33   80  150   285
10.0.3.240   18   31   72  151   341

Could just be bad luck...
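One rough way to judge whether counts like these are "just bad luck" is a
chi-square goodness-of-fit test against a uniform choice among the three
hosts. The sketch below is not part of the patch or of the test run; the
function name is made up for illustration, and it assumes that each psql
invocation picks a host independently of the previous ones. With three hosts
there are two degrees of freedom, so the p-value has the closed form
exp(-x/2) and no statistics library is needed.

```python
import math


def chi_square_uniform(counts):
    """Return (statistic, p_value) for H0: all hosts are equally likely.

    Only valid for exactly three categories: with 2 degrees of freedom the
    chi-square survival function is exactly exp(-x / 2).
    """
    if len(counts) != 3:
        raise ValueError("closed-form p-value below assumes three hosts")
    total = sum(counts)
    expected = total / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, math.exp(-stat / 2)


# Counts after 1000 iterations, copied from the two tables above
# (order: 10.0.3.60, 10.0.3.109, 10.0.3.240).
one_second_run = [328, 353, 319]
two_second_run = [374, 285, 341]

for label, counts in [("1s", one_second_run), ("2s", two_second_run)]:
    stat, p = chi_square_uniform(counts)
    print(f"{label}: chi2={stat:.2f} p={p:.4f}")
```

Under these assumptions the one-second run gives a p-value of roughly 0.39,
which is unremarkable, while the two-second run comes out at roughly 0.002.
That is a single run judged after the fact, so it is a hint that repeating
the measurement may be worthwhile rather than proof of a bias.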

I also switched one host to have two IP addresses in /etc/hosts:

10.0.3.109 pg1
10.0.3.60 pg1
10.0.3.240 pg3

And this resulted in the following (one-second timeout again):

First run:

             50  100  250  500  1000
10.0.3.60    10   18   56  120   255
10.0.3.109   14   30   67  139   278
10.0.3.240   26   52  127  241   467

Second run:

             50  100  250  500  1000
10.0.3.60    20   31   77  138   265
10.0.3.109    9   20   52  116   245
10.0.3.240   21   49  121  246   490

So it looks like it load-balances between pg1 and pg3, and not between
the three IPs - is this expected?

If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP address is
hit roughly equally.

So I guess this is how it should work, but in that case I think the
documentation should be more explicit about what is to be expected if a
host has multiple IP addresses or hosts are specified multiple times in
the connection string.

> > Maybe my imagination is not so great, but what else than hosts could we
> > possibly load-balance? I don't mind calling it load_balance, but I also
> > don't feel very strongly one way or the other and this is clearly
> > bikeshed territory.
>
> I agree, which is why I called it load_balance in my original patch.
> But I also think it's useful to match the naming for the already
> existing implementations in the PG ecosystem around this.
> But like you I don't really feel strongly either way. It's a tradeoff
> between short name and consistency in the ecosystem.

I don't think consistency is an extremely valid concern. As a
counterpoint, pgJDBC had targetServerType some time before Postgres, and
the libpq parameter was then named somewhat differently when it was
introduced, namely target_session_attrs.

> > If I understand correctly, you've added DNS-based load balancing on top
> > of just shuffling the provided hostnames. This makes sense if a
> > hostname is backed by more than one IP address in the context of load
> > balancing, but it also complicates the patch. So I'm wondering how much
> > shorter the patch would be if you leave that out for now?
>
> Yes, that's correct and indeed the patch would be simpler without, i.e. all the
> addrinfo changes would become unnecessary. But IMHO the behaviour of
> the added option would be very unexpected if it didn't load balance across
> multiple IPs in a DNS record. libpq currently makes no real distinction in
> handling of provided hosts and handling of their resolved IPs. If load balancing
> would only apply to the host list that would start making a distinction
> between the two.

Fair enough, I agree.

> Apart from that the load balancing across IPs is one of the main reasons
> for my interest in this patch. The reason is that it allows expanding or reducing
> the number of nodes that are being load balanced across transparently to the
> application. Which means that there's no need to re-deploy applications with
> new connection strings when changing the number of hosts.

That's a good point as well.

> > On the other hand, I believe pgJDBC keeps track of which hosts are up or
> > down and only load balances among the ones which are up (maybe
> > rechecking after a timeout? I don't remember), is this something you're
> > doing, or did you consider it?
>
> I don't think it's possible to do this in libpq without huge changes to its
> architecture, since normally a PGconn will only
> create a single connection. The reason pgJDBC can do this is because
> it's actually a connection pooler, so it will open more than one connection
> and can thus keep some global state about the different hosts.

Ok.


Michael
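
The pg1/pg3 numbers earlier in this message (roughly 25% / 25% / 50% across
the three IPs) are what one would expect from a two-level choice: first a
uniformly random entry of the host list, then a uniformly random address of
that host. The sketch below only illustrates that model; it is an assumption
inferred from the observed counts, not a description of the patch's code, and
the resolver table and function name are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical resolver table mirroring the /etc/hosts entries in the mail.
RESOLVED = {
    "pg1": ["10.0.3.109", "10.0.3.60"],
    "pg3": ["10.0.3.240"],
}


def first_address_probabilities(hosts, resolved=RESOLVED):
    """Probability that each IP is tried first under the assumed model.

    Assumed model: pick one entry of `hosts` uniformly at random, then pick
    one of that host's addresses uniformly at random.
    """
    probs = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for host in hosts:
        addrs = resolved[host]
        for addr in addrs:
            probs[addr] += Fraction(1, len(hosts)) * Fraction(1, len(addrs))
    return dict(probs)


for hosts in (["pg1", "pg3"], ["pg1", "pg1", "pg3"]):
    print(",".join(hosts), first_address_probabilities(hosts))
```

With "host=pg1,pg3" the model gives 1/4, 1/4 and 1/2, and with
"host=pg1,pg1,pg3" it gives 1/3 for each address, which matches the behaviour
described above for both connection strings.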