Thread: hostorder and failover_timeout for libpq

hostorder and failover_timeout for libpq

From
Ildar Musin
Date:
Hello hackers,

A couple of years ago Victor Wagner presented a patch [1] that
introduced multi-host capability along with the hostorder and
failover_timeout parameters for libpq. Subsequently, the multi-host
feature was reimplemented by Robert Haas and committed. Later the
target_session_attrs parameter was also added. In this thread I want to
revisit the hostorder and failover_timeout proposal.

'hostorder' defines the order in which the postgres instances listed in
the connection string will be tried. Possible values are:
* sequential (default)
* random

Random order can be used, for instance, for load balancing (which is
particularly useful in a multi-master cluster, but can also be used to
load-balance read-only connections to standbys).
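
As a rough illustration of what a random host order implies, the sketch
below shuffles a host list in place with the Fisher-Yates algorithm. It
is not taken from the patch: shuffle_hosts is a hypothetical name, and
rand() merely stands in for whatever random source a real
implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Shuffle an array of host names in place (Fisher-Yates).
 * Hypothetical helper, not part of libpq.  The caller is expected to
 * seed the generator with srand() beforehand.
 */
void
shuffle_hosts(const char **hosts, size_t nhosts)
{
    size_t      i;

    assert(hosts != NULL || nhosts == 0);

    for (i = nhosts; i > 1; i--)
    {
        /* pick j in [0, i - 1]; modulo bias is ignored in this sketch */
        size_t      j = (size_t) rand() % i;
        const char *tmp = hosts[i - 1];

        hosts[i - 1] = hosts[j];
        hosts[j] = tmp;
    }
}
```

After the shuffle, the hosts would be tried front to back exactly as in
the sequential case, so each host is attempted at most once per pass.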

'failover_timeout' specifies the time span (in seconds) during which
libpq will keep trying to connect to the hosts listed in the connection
string. If failover_timeout is specified, libpq will loop over the hosts
again and again until either it successfully connects to one of them or
it runs out of time.
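
The retry behaviour described above could be sketched roughly as
follows. This is only an assumption about the control flow, not the
patch's code: connect_with_failover, try_connect_fn and fake_connect
are hypothetical names, and the connection attempt itself is abstracted
behind a callback so that the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true if connecting to host succeeded. */
typedef bool (*try_connect_fn) (const char *host, void *arg);

/*
 * Cycle over the host list until one host accepts the connection or
 * failover_timeout seconds have elapsed.  Returns the index of the host
 * that accepted, or -1 if time ran out.  At least one full pass over
 * the list is always made.
 */
int
connect_with_failover(const char *const *hosts, size_t nhosts,
                      int failover_timeout,
                      try_connect_fn try_connect, void *arg)
{
    time_t      deadline = time(NULL) + failover_timeout;

    assert(try_connect != NULL);

    if (nhosts == 0)
        return -1;

    for (;;)
    {
        size_t      i;

        for (i = 0; i < nhosts; i++)
        {
            if (try_connect(hosts[i], arg))
                return (int) i;
        }

        if (time(NULL) >= deadline)
            return -1;
        /* a real client would probably pause here between passes */
    }
}

/* Demo stand-in for a real attempt: fails until the counter hits zero. */
bool
fake_connect(const char *host, void *arg)
{
    int        *failures_left = (int *) arg;

    (void) host;
    if (*failures_left > 0)
    {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

With three hosts and a counter of four failures, the first pass fails
on every host and the second pass succeeds on the second host.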

I reimplemented the 'hostorder' and 'failover_timeout' parameters in the
attached patch. I also took some documentation pieces from Victor
Wagner's original patch. I'll be glad to see any comments and
suggestions. Thanks!

[1] https://www.postgresql.org/message-id/flat/20150818041850.GA5092%40wagner.pp.ru

-- 
Ildar Musin
i.musin@postgrespro.ru


RE: hostorder and failover_timeout for libpq

From
"Iwata, Aya"
Date:
Hello Ildar,

I have a question about the failover_timeout parameter.
Which would be better: implementing a parameter that makes libpq keep
retrying for a given period, or controlling connection retries on the
application side?

Also, I am not sure whether the random host selection done by the
hostorder parameter will actually help load balancing.
Please let me know if there are examples.

I am sorry if these points were already examined in the previous thread.
I haven't read it yet.

Regards,
Aya Iwata

Re: hostorder and failover_timeout for libpq

From
Surafel Temesgen
Date:
Hey,

Here are a few comments.

+     <varlistentry id="libpq-connect-falover-timeout"
xreflabel="failover_timeout">

Here's a typo: "falover" in id="libpq-connect-falover-timeout".

+    {"failover_timeout", NULL, NULL, NULL,
+        "Failover Timeout", "", 10,

In the surrounding code, the words in an internalPQconninfoOption label
are separated by a hyphen.

+        If the value is <literal>random</literal>, the host to connect to
+        will be randomly picked from the list. It allows load balacing between
+        several cluster nodes.

I can’t think of a use case where picking a node randomly, rather than
in the user-specified order, would balance load across the cluster
better. Can you explain the purpose of this feature in more detail? Also,
I can’t see a mechanism in the code that prevents the same host from
being picked multiple times.

By the way, the patch doesn’t apply cleanly; I think it needs a rebase:
http://cfbot.cputube.org/patch_19_1631.log

Regards
Surafel


Re: hostorder and failover_timeout for libpq

From
Ildar Musin
Date:
Hello Surafel,

On Fri, Sep 14, 2018 at 2:03 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
> Hey,
> Here are a few comments.
> +     <varlistentry id="libpq-connect-falover-timeout"
> xreflabel="failover_timeout">
> Here's a typo: "falover" in id="libpq-connect-falover-timeout".
> +       {"failover_timeout", NULL, NULL, NULL,
> +               "Failover Timeout", "", 10,
> In the surrounding code, the words in an internalPQconninfoOption label
> are separated by a hyphen.
> +        If the value is <literal>random</literal>, the host to connect to
> +        will be randomly picked from the list. It allows load balacing between
> +        several cluster nodes.
> I can’t think of a use case where picking a node randomly, rather than
> in the user-specified order, would balance load across the cluster
> better. Can you explain the purpose of this feature in more detail?

Load balancing is probably the wrong term for this. Think of it as a connection routing mechanism. Let's say you have 10 servers and 100 clients that want to establish read-only connections. Without this feature all clients will go to the first specified host (unless they hit the max_connections limit). With random `hostorder` they would be split between the hosts more or less evenly.
 
> Also, I can’t see a mechanism in the code that prevents the same host
> from being picked multiple times.

The original idea was to collect all IP addresses obtained by resolving the specified host names, put those addresses into a global array, apply a random permutation to it, and then try each address in round-robin fashion until a connection succeeds. Now I'm not sure that this approach was the best. There are two concerns:

1. A host name can resolve to several IP addresses (which in turn can point either to the same physical server with multiple network interfaces or to different servers). In the scheme described above, each IP address would be added to the global array. This may lead to a situation where one host has a higher chance of being picked because it has more addresses in the global array than the other hosts.
2. A host may support both IPv4 and IPv6 connections, which again adds extra items to the global array and therefore also increases its chance of being picked.

Another approach would be to leave `pg_conn->connhost` as it is now (i.e. not to create global addresses array) and just apply random permutations to it if `hostorder=random` is specified. And probably apply permutations to addresses list within each individual host. 
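
To make the bias concrete, here is a small sketch that compares the two schemes. It assumes a uniform shuffle in both cases; the function names are hypothetical and nothing here comes from the patch. With address counts {1, 1, 2}, the pooled scheme tries the third host first half of the time, while a host-level shuffle gives every host the same one-in-three chance.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Chance that host idx is tried first when all resolved addresses are
 * pooled into one array and that array is shuffled uniformly.
 * naddrs[i] is the number of addresses host i resolved to.
 */
double
first_pick_chance_pooled(const int *naddrs, size_t nhosts, size_t idx)
{
    int         total = 0;
    size_t      i;

    assert(idx < nhosts);

    for (i = 0; i < nhosts; i++)
        total += naddrs[i];

    return total > 0 ? (double) naddrs[idx] / total : 0.0;
}

/*
 * Chance that any given host is tried first when only the host list
 * itself is shuffled, regardless of how many addresses each host has.
 */
double
first_pick_chance_per_host(size_t nhosts)
{
    return nhosts > 0 ? 1.0 / (double) nhosts : 0.0;
}
```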

At this point I'd like to ask community what in your opinion would be the best course of action and whether this feature should be implemented within libpq at all? Because from my POV there are factors that really depend on network architecture and there is probably no single right solution.

Kind regards,
Ildar

Re: hostorder and failover_timeout for libpq

From
Michael Paquier
Date:
On Wed, Sep 19, 2018 at 02:26:53PM +0200, Ildar Musin wrote:
> Another approach would be to leave `pg_conn->connhost` as it is now (i.e.
> not to create global addresses array) and just apply random permutations to
> it if `hostorder=random` is specified. And probably apply permutations to
> addresses list within each individual host.
>
> At this point I'd like to ask community what in your opinion would be the
> best course of action and whether this feature should be implemented within
> libpq at all? Because from my POV there are factors that really depend on
> network architecture and there is probably no single right solution.

As things stand now, when multiple hosts are defined in a connection
string, they are tried in the order specified in the string until a
connection succeeds.  When working on Postgres-XC, we implemented a
similar capability at application level.  However, now that libpq also
supports multi-host capabilities, I could see a point in having
something within libpq.  What could we get, though, except a random mode
for read-only or read-write load balancing?  That single use case looks
a bit too limited to me to justify reworking, once again, the code paths
that discard connection failures, and there is also the argument that
the application can generate its own connection string based on libpq
properties.  So my take would be to just do that at application level
and not bother.
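
A minimal sketch of that application-level approach, under the
assumption that the application has already put its hosts into the
order it wants (for example by shuffling them): it builds a
connection string using libpq's existing comma-separated multi-host
syntax.  build_conninfo and append_str are hypothetical helper names,
not libpq functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to buf at offset *used; returns -1 if it would not fit. */
static int
append_str(char *buf, size_t buflen, size_t *used, const char *s)
{
    size_t      len = strlen(s);

    if (len >= buflen - *used)  /* keep room for the terminating NUL */
        return -1;
    memcpy(buf + *used, s, len + 1);
    *used += len;
    return 0;
}

/*
 * Write "host=h1,h2,... port=p1,p2,..." into buf, keeping the hosts in
 * the order given by the caller.  Returns 0 on success, -1 if buf is
 * too small.
 */
int
build_conninfo(char *buf, size_t buflen,
               const char *const *hosts, const char *const *ports,
               size_t n)
{
    size_t      used = 0;
    size_t      i;

    assert(buf != NULL);

    if (append_str(buf, buflen, &used, "host=") < 0)
        return -1;
    for (i = 0; i < n; i++)
    {
        if (i > 0 && append_str(buf, buflen, &used, ",") < 0)
            return -1;
        if (append_str(buf, buflen, &used, hosts[i]) < 0)
            return -1;
    }

    if (append_str(buf, buflen, &used, " port=") < 0)
        return -1;
    for (i = 0; i < n; i++)
    {
        if (i > 0 && append_str(buf, buflen, &used, ",") < 0)
            return -1;
        if (append_str(buf, buflen, &used, ports[i]) < 0)
            return -1;
    }
    return 0;
}
```

The resulting string could then be handed to PQconnectdb(), which
tries the hosts in the listed order.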

By the way, I can see that the latest patch available does not apply and
tries to juggle multiple concepts.  I can see at least two of them:
failover_timeout and hostorder.  You should split things.  I have moved
the patch to next CF, waiting on author.
--
Michael


Re: hostorder and failover_timeout for libpq

From
Dmitry Dolgov
Date:
> On Mon, Oct 1, 2018 at 9:10 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> By the way, I can see that the latest patch available does not apply and
> tries to juggle multiple concepts.  I can see at least two of them:
> failover_timeout and hostorder.  You should split things.  I have moved
> the patch to next CF, waiting on author.

Unfortunately, the patch still needs to be rebased, and probably split into
two, as Michael suggested. Any plans for that?


Re: hostorder and failover_timeout for libpq

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Sep 19, 2018 at 02:26:53PM +0200, Ildar Musin wrote:
>> At this point I'd like to ask community what in your opinion would be the
>> best course of action and whether this feature should be implemented within
>> libpq at all? Because from my POV there are factors that really depend on
>> network architecture and there is probably no single right solution.

> By the way, I can see that the latest patch available does not apply and
> tries to juggle multiple concepts.  I can see at least two of them:
> failover_timeout and hostorder.  You should split things.  I have moved
> the patch to next CF, waiting on author.

Per the discussion about the nearby prefer-standby patch,

https://www.postgresql.org/message-id/flat/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com

it seems pretty unfortunate that this patch proposes functionality
that's nearly identical to something in pgJDBC, but isn't using the
same terminology pgJDBC uses.

It's even more unfortunate that we have three separate patch proposal
threads that are touching more or less the same territory, but don't
seem to be talking to each other.  This one is also relevant:

https://www.postgresql.org/message-id/flat/1700970.cRWpxnom9y@hammer.magicstack.net

            regards, tom lane


Re: hostorder and failover_timeout for libpq

From
Andres Freund
Date:
Hi,

On 2018-11-29 17:23:11 +0100, Dmitry Dolgov wrote:
> > On Mon, Oct 1, 2018 at 9:10 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > By the way, I can see that the latest patch available does not apply and
> > tries to juggle multiple concepts.  I can see at least two of them:
> > failover_timeout and hostorder.  You should split things.  I have moved
> > the patch to next CF, waiting on author.
> 
> Unfortunately, the patch still needs to be rebased, and probably split into
> two, as Michael suggested. Any plans for that?

As this hasn't been done, and Tom's questions haven't been addressed,
I'm marking this as returned with feedback.

Greetings,

Andres Freund