Re: Support load balancing in libpq - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Support load balancing in libpq
Date
Msg-id 63b85894.170a0220.304c7.19a6@mx.google.com
Whole thread Raw
In response to Re: Support load balancing in libpq  (Jelte Fennema <Jelte.Fennema@microsoft.com>)
Responses Re: [EXTERNAL] Re: Support load balancing in libpq
List pgsql-hackers
Hi,

On Tue, Nov 29, 2022 at 02:57:08PM +0000, Jelte Fennema wrote:
> Attached is a new version with the tests cleaned up a bit (more
> comments mostly).
> 
> @Michael, did you have a chance to look at the last version? Because I
> feel that the patch is pretty much ready for a committer to look at,
> at this point.

I had another look; it still applies and tests pass. I still find the
distribution over three hosts a bit skewed (even when OpenSSL is
enabled, which wasn't the case when I first tested it), but couldn't
find a general fault and it worked well enough in my testing.

I wonder whether making the parameter a boolean will paint us into a
corner, and whether maybe additional modes could be envisioned in the
future, but I can't think of some right now (you can pretty neatly
restrict load-balancing to standbys by setting
target_session_attrs=standby in addition). Maybe a way to change the
behaviour if a dns hostname is backed by multiple entries?

Some further (mostly nitpicking) comments on the patch:

> From 6e20bb223012b666161521b5e7249c066467a5f3 Mon Sep 17 00:00:00 2001
> From: Jelte Fennema <github-tech@jeltef.nl>
> Date: Mon, 12 Sep 2022 09:44:06 +0200
> Subject: [PATCH v5] Support load balancing in libpq
> 
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
> 
> Both JBDC (Java) and Npgsql (C#) already support client level load
> balancing (option #1). This patch implements client level load balancing
> for libpq as well. To stay consistent with the JDBC and Npgsql part of
> the  ecosystem, a similar implementation and name for the option are
> used. 

I still think all of the above has no business in the commit message,
though maybe the first paragraph can stay as introduction.

> It contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
>     one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
>     before trying to connect to them one-by-one.

That's fine.

What should be in the commit message is at least a mention of what the
new connection parameter is called and possibly what is done to
accomplish it.

But the committer will pick this up if needed I guess.

> diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> index f9558dec3b..6ce7a0c9cc 100644
> --- a/doc/src/sgml/libpq.sgml
> +++ b/doc/src/sgml/libpq.sgml
> @@ -1316,6 +1316,54 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="libpq-load-balance-hosts" xreflabel="load_balance_hosts">
> +      <term><literal>load_balance_hosts</literal></term>
> +      <listitem>
> +       <para>
> +        Controls whether the client load balances connections across hosts and
> +        adresses. The default value is 0, meaning off, this means that hosts are
> +        tried in order they are provided and addresses are tried in the order
> +        they are received from DNS or a hosts file. If this value is set to 1,
> +        meaning on, the hosts and addresses that they resolve to are tried in
> +        random order. Subsequent queries once connected will still be sent to
> +        the same server. Setting this to 1, is mostly useful when opening
> +        multiple connections at the same time, possibly from different machines.
> +        This way connections can be load balanced across multiple Postgres
> +        servers.
> +       </para>
> +       <para>
> +        When providing multiple hosts, these hosts are resolved in random order.
> +        Then if that host resolves to multiple addresses, these addresses are
> +        connected to in random order. Only once all addresses for a single host
> +        have been tried, the addresses for the next random host will be
> +        resolved. This behaviour can lead to non-uniform address selection in
> +        certain cases. Such as when not all hosts resolve to the same number of
> +        addresses, or when multiple hosts resolve to the same address. So if you
> +        want uniform load balancing, this is something to keep in mind. However,
> +        non-uniform load balancing also has usecases, e.g. providing the
> +        hostname of a larger server multiple times in the host string so it gets
> +        more requests.
> +       </para>
> +       <para>
> +        When using this setting it's recommended to also configure a reasonable
> +        value for <xref linkend="libpq-connect-connect-timeout"/>. Because then,
> +        if one of the nodes that are used for load balancing is not responding,
> +        a new node will be tried.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I think this whole section is generally fine, but needs some
copyediting.

> +     <varlistentry id="libpq-random-seed" xreflabel="random_seed">
> +      <term><literal>random_seed</literal></term>
> +      <listitem>
> +       <para>
> +        Sets the random seed that is used by <xref linkend="libpq-load-balance-hosts"/>
> +        to randomize the host order. This option is mostly useful when running
> +        tests that require a stable random order.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I wonder whether this needs to be documented if it is mostly a
development/testing parameter?

> diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
> index fcf68df39b..39e93b1392 100644
> --- a/src/include/libpq/pqcomm.h
> +++ b/src/include/libpq/pqcomm.h
> @@ -27,6 +27,12 @@ typedef struct
>      socklen_t    salen;
>  } SockAddr;
>  
> +typedef struct
> +{
> +    int            family;
> +    SockAddr    addr;
> +}            AddrInfo;

That last line looks weirdly indented compared to SockAddr; in the
struct above.

>  /* Configure the UNIX socket location for the well known port. */
>  
>  #define UNIXSOCK_PATH(path, port, sockdir) \
> diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
> index f88d672c6c..b4d3613713 100644
> --- a/src/interfaces/libpq/fe-connect.c
> +++ b/src/interfaces/libpq/fe-connect.c
> @@ -241,6 +241,14 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
>          "Fallback-Application-Name", "", 64,
>      offsetof(struct pg_conn, fbappname)},
>  
> +    {"load_balance_hosts", NULL, NULL, NULL,
> +        "Load-Balance", "", 1,    /* should be just '0' or '1' */
> +    offsetof(struct pg_conn, loadbalance)},
> +
> +    {"random_seed", NULL, NULL, NULL,
> +        "Random-Seed", "", 10,    /* strlen(INT32_MAX) == 10 */
> +    offsetof(struct pg_conn, randomseed)},
> +
>      {"keepalives", NULL, NULL, NULL,
>          "TCP-Keepalives", "", 1,    /* should be just '0' or '1' */
>      offsetof(struct pg_conn, keepalives)},
> @@ -379,6 +387,7 @@ static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
>  static void freePGconn(PGconn *conn);
>  static void closePGconn(PGconn *conn);
>  static void release_conn_addrinfo(PGconn *conn);
> +static bool store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist);
>  static void sendTerminateConn(PGconn *conn);
>  static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
>  static PQconninfoOption *parse_connection_string(const char *connstr,
> @@ -424,6 +433,9 @@ static void pgpassfileWarning(PGconn *conn);
>  static void default_threadlock(int acquire);
>  static bool sslVerifyProtocolVersion(const char *version);
>  static bool sslVerifyProtocolRange(const char *min, const char *max);
> +static int    loadBalance(PGconn *conn);
> +static bool parse_int_param(const char *value, int *result, PGconn *conn,
> +                            const char *context);
>  
>  
>  /* global variable because fe-auth.c needs to access it */
> @@ -1007,6 +1019,46 @@ parse_comma_separated_list(char **startptr, bool *more)
>      return p;
>  }
>  
> +/*
> + * Initializes the prng_state field of the connection. We want something
> + * unpredictable, so if possible, use high-quality random bits for the
> + * seed. Otherwise, fall back to a seed based on timestamp and PID.
> + */
> +static bool
> +libpq_prng_init(PGconn *conn)
> +{
> +    if (unlikely(conn->randomseed))
> +    {
> +        int            rseed;
> +
> +        if (!parse_int_param(conn->randomseed, &rseed, conn, "random_seed"))
> +        {
> +            return false;
> +        };

I think it's project policy to drop the braces for single statements in
if blocks.

> +        pg_prng_seed(&conn->prng_state, rseed);
> +    }
> +    else if (unlikely(!pg_prng_strong_seed(&conn->prng_state)))
> +    {
> +        uint64        rseed;
> +        time_t        now = time(NULL);
> +
> +        /*
> +         * Since PIDs and timestamps tend to change more frequently in their
> +         * least significant bits, shift the timestamp left to allow a larger
> +         * total number of seeds in a given time period.  Since that would
> +         * leave only 20 bits of the timestamp that cycle every ~1 second,
> +         * also mix in some higher bits.
> +         */
> +        rseed = ((uint64) getpid()) ^
> +            ((uint64) now << 12) ^
> +            ((uint64) now >> 20);
> +
> +        pg_prng_seed(&conn->prng_state, rseed);
> +    }
> +    return true;
> +}
> +
> +
>  /*

Additional newline.

> @@ -1164,6 +1217,36 @@ connectOptions2(PGconn *conn)
>          }
>      }
>  
> +    if (loadbalancehosts < 0)
> +    {
> +        appendPQExpBufferStr(&conn->errorMessage,
> +                             libpq_gettext("loadbalance parameter must be an integer\n"));
> +        return false;
> +    }
> +
> +    if (loadbalancehosts)
> +    {
> +        if (!libpq_prng_init(conn))
> +        {
> +            return false;
> +        }
> +
> +        /*
> +         * Shuffle connhost with a Durstenfeld/Knuth version of the
> +         * Fisher-Yates shuffle. Source:
> +         * https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
> +         */
> +        for (i = conn->nconnhost - 1; i > 0; i--)
> +        {
> +            int            j = pg_prng_uint64_range(&conn->prng_state, 0, i);
> +            pg_conn_host temp = conn->connhost[j];
> +
> +            conn->connhost[j] = conn->connhost[i];
> +            conn->connhost[i] = temp;
> +        }
> +    }
> +
> +
>      /*

Additional newline.

> @@ -1726,6 +1809,27 @@ connectFailureMessage(PGconn *conn, int errorno)
>          libpq_append_conn_error(conn, "\tIs the server running on that host and accepting TCP/IP connections?");
>  }
>  
> +/*
> + * Should we load balance across hosts? Returns 1 if yes, 0 if no, and -1 if
> + * conn->loadbalance is set to a value which is not parseable as an integer.
> + */
> +static int
> +loadBalance(PGconn *conn)
> +{
> +    char       *ep;
> +    int            val;
> +
> +    if (conn->loadbalance == NULL)
> +    {
> +        return 0;
> +    }

Another case of additional braces.

> +    val = strtol(conn->loadbalance, &ep, 10);
> +    if (*ep)
> +        return -1;
> +    return val != 0 ? 1 : 0;
> +}
> +
> +
>  /*

Additional newline.

> @@ -4041,6 +4154,63 @@ freePGconn(PGconn *conn)
>      free(conn);
>  }
>  
> +
> +/*

Additional newline.

> + * Copies over the AddrInfos from addrlist to the PGconn.
> + */
> +static bool
> +store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist)
> +{
> +    struct addrinfo *ai = addrlist;
> +
> +    conn->whichaddr = 0;
> +
> +    conn->naddr = 0;
> +    while (ai)
> +    {
> +        ai = ai->ai_next;
> +        conn->naddr++;
> +    }
> +
> +    conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
> +    if (conn->addr == NULL)
> +    {
> +        return false;
> +    }

Additional braces.

> @@ -4048,11 +4218,10 @@ freePGconn(PGconn *conn)
>  static void
>  release_conn_addrinfo(PGconn *conn)
>  {
> -    if (conn->addrlist)
> +    if (conn->addr)
>      {
> -        pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
> -        conn->addrlist = NULL;
> -        conn->addr_cur = NULL;    /* for safety */
> +        free(conn->addr);
> +        conn->addr = NULL;
>      }
>  }
>  
> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
> index 512762f999..76ee988038 100644
> --- a/src/interfaces/libpq/libpq-int.h
> +++ b/src/interfaces/libpq/libpq-int.h
> @@ -82,6 +82,8 @@ typedef struct
>  #endif
>  #endif                            /* USE_OPENSSL */
>  
> +#include "common/pg_prng.h"
> +
>  /*
>   * POSTGRES backend dependent Constants.
>   */
> @@ -373,6 +375,8 @@ struct pg_conn
>      char       *pgpassfile;        /* path to a file containing password(s) */
>      char       *channel_binding;    /* channel binding mode
>                                       * (require,prefer,disable) */
> +    char       *loadbalance;    /* load balance over hosts */
> +    char       *randomseed;        /* seed for randomization of load balancing */
>      char       *keepalives;        /* use TCP keepalives? */

A bit unclear why you put the variables at this point in the list, but
the list doesn't seem to be ordered strictly anyway; still, maybe they
would fit best at the bottom below target_session_attrs?

>      char       *keepalives_idle;    /* time between TCP keepalives */
>      char       *keepalives_interval;    /* time between TCP keepalive
> @@ -461,8 +465,10 @@ struct pg_conn
>      PGTargetServerType target_server_type;    /* desired session properties */
>      bool        try_next_addr;    /* time to advance to next address/host? */
>      bool        try_next_host;    /* time to advance to next connhost[]? */
> -    struct addrinfo *addrlist;    /* list of addresses for current connhost */
> -    struct addrinfo *addr_cur;    /* the one currently being tried */
> +    int            naddr;            /* number of addrs returned by getaddrinfo */
> +    int            whichaddr;        /* the addr currently being tried */

Address(es) is always spelt out in the comments, those two addr(s)
should also I think.

> diff --git a/src/interfaces/libpq/t/003_loadbalance.pl b/src/interfaces/libpq/t/003_loadbalance.pl
> new file mode 100644
> index 0000000000..07eddbe9cc
> --- /dev/null
> +++ b/src/interfaces/libpq/t/003_loadbalance.pl
> @@ -0,0 +1,167 @@
> +# Copyright (c) 2022, PostgreSQL Global Development Group

Copyright bump needed.


Cheers,

Michael



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Next
From: Tom Lane
Date:
Subject: Re: ATTACH PARTITION seems to ignore column generation status