Re: [EXTERNAL] Support load balancing in libpq - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: [EXTERNAL] Support load balancing in libpq
Date
Msg-id BCEC6618-4C6E-41CE-9BB7-EC2A6126EE37@yesql.se
Whole thread Raw
In response to Re: [EXTERNAL] Re: Support load balancing in libpq  (Jelte Fennema <postgres@jeltef.nl>)
Responses Re: [EXTERNAL] Support load balancing in libpq
List pgsql-hackers
In general I think this feature makes sense (which has been echoed many times
in the thread), and the implementation strikes a good balance of robustness and
simplicity.  Reading this I think it's very close to being committable, but I
have a few comments on the patch series:

+        sent to the same server. There are currently two modes:
The documentation lists the modes disabled and random, but I wonder if it's
worth expanding the docs to mention that "disabled" is pretty much a round
robin load balancing scheme?  It reads a bit odd to present load balancing
without a mention of round robin balancing given how common it is.

-       conn->addrlist_family = hint.ai_family = AF_UNSPEC;
+       hint.ai_family = AF_UNSPEC;
This removes all uses of conn->addrlist_family and that struct member can be
removed.

+        to, for example, load balance over stanby servers only. Once successfully
s/stanby/standby/

+            Postgres servers.
s/Postgres/<productname>PostgreSQL</productname>/

+            more addresses than others. So if you want uniform load balancing,
+            this is something to keep in mind. However, non-uniform load
+            balancing can also be used to your advantage, e.g. by providing the
The documentation typically use a less personal form, I would suggest something
along the lines of:

    "If uniform load balancing is required then an external load balancing tool
    must be used.  Non-uniform load balancing can also be used to skew the
    results, e.g.  by providing the.."

+               if (!libpq_prng_init(conn))
+                       return false;
This needs to set a returned error message with libpq_append_conn_error (feel
free to come up with a better wording of course):

  libpq_append_conn_error(conn, "unable to initiate random number generator");

-#ifndef WIN32
+/* MinGW has sys/time.h, but MSVC doesn't */
+#ifndef _MSC_VER
 #include <sys/time.h>
This seems unrelated to the patch in question, and should be a separate commit IMO.

+       LOAD_BALANCE_RANDOM,            /* Read-write server */
I assume this comment is a copy/paste and should have been reflecting random order?

+++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
Nitpick, but we should probably rename this load_balance to match the parameter
being tested.

+++ b/src/interfaces/libpq/t/004_loadbalance_dns.pl
On the subject of tests, I'm a bit torn.  I appreciate that the patch includes
a thorough test, but I'm not convinced we should add that to the tree.  A test
which require root permission level manual system changes stand a very low
chance of ever being executed, and as such will equate to dead code that may
easily be broken or subtly broken.

I am also not a fan of the random_seed parameter.  A parameter only useful for
testing, and which enables testing by circumventing the mechanism to test
(making randomness predictable), seems like expensive bagage to carry around.
From experience we also know this risks ending up in production configs for all
the wrong reasons.

Given the implementation of this feature, the actual connection phase isn't any
different from just passing multiple hostnames and operating in the round-robin
fashion.  What we want to ensure is that the randomization isn't destroying the
connection array.  Let's see what we can do while still having tests that can
be executed in the buildfarm.

A few ideas:

  * Add basic tests for the load_balance_host connection param only accepting
    sane values

  * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not require
    random_seed but instead using randomization. Thinking out loud, how about
    something along these lines?
    - Passing a list of unreachable hostnames with a single good hostname
      should result in a connection.
    - Passing a list of one good hostname should result in a connection
    - Passing a list on n good hostname (where all are equal) should result in
      a connection
    - Passing a list of n unreachable hostnames should result in log entries
      for n broken resolves in some order, and running that test n' times
      shouldn't - statistically - result in the same order for a large enough n'.

  * Remove random_seed and 004_loadbalance_dns.pl

Thoughts?

--
Daniel Gustafsson




pgsql-hackers by date:

Previous
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL
Next
From: Amit Kapila
Date:
Subject: Re: logical decoding and replication of sequences, take 2