Thread: Proposal: Implement failover on libpq connect level.

Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
Rationale
=========

Since introduction of the WAL-based replication into the PostgreSQL, it is
possible to create high-availability and load-balancing clusters.

However, there is no support for failover in the client libraries. So, only
way to provide transparent for client application failover is IP address
migration. This approach has some limitation, i.e. it requires that
master and backup servers reside in the same subnet or may not be
feasible for other reasons.

Commercial RDBMS, such as Oracle, employ more flexible approach. They
allow to specify multiple servers in the connect string, so if primary
server is not available, client library tries to connect to other ones.

This approach allows to use geographically distributed failover clusters
and also is a cheap way to implement load-balancing (which is not
possible with IP address migration).

Proposed change
===============

Allow to specify multiple hosts in the libpq connect string. Make libpq
attempt to connect to all host simultaneously or in random order 
and use of the server which successfully establishes connection first.


Syntax
------


Libpq connect string can be either set of the keyword=value pairs
or an URL. 

In the first form it can be just allowed to specify keyword host
multiple times.
   "host=main-server host=standby1 host=standby2 port=5432 dbname=database"

In the second form host can be specified either in the first part of URL
or in the query parameters.
   postgresql://user@host/database
   postgresql:///database?host=hostname&user=username

If host is specified as a parameter, it is also possible to allow
multiple host parameters without breaking existing syntax.
   postgresql:///database?host=main-server&host=standby1&host=standby2

In order to implement load-balancing clusters, additional parameters
should be added readonly=boolean and loadbalancing=boolean

Support for this syntax extensions is added to the PQconnectdb, 
PQconnectdbParams, PQConnectStart and PQConnectStartParams,
but not PQsetdb/PQsetdblogin functions. 


Behavoir
--------

If PQconnectdb encounters connect string with multiple hosts specified,
it attempts to establish connection with all these hosts simultaneously,
and begins to work with server which responds first, unless
loadbalancing parameter is true.

If the loadbalancing parameter is true, it tries servers sequentially in 
the random order.

If the parameter readonly is false, after authenticating with server it
executes show transaction_read_only, to find out whether current
connection is to the master or to the hot standby, and connection is
considered successful only if server allows read write transactions.

This allows to have clients which write to the database and clients
which perform read-only access. Read-only clients would be load-balanced
between the master and slave servers, and read-write clients connect only to
the master (whichever server has this role at the moment of connection).

Information of the alternate servers should be stored in the PGconn structure.

Function PQreset should be able to take advantage of new syntax and
possibly open connection to the new master, if failover occurred
during lifetime of the connection.

Possible drawbacks
==================

Compatibility
-------------

Proposed patch requires no modifications to the server or protocol, and 
modification of synchronous function (PQconnectdb, PQconnectdbParams) 
doesn't introduce incompatible changes to the client library. 

Even if connect string with multiple host would be erroneously used
with version of libpq, which do not support this feature, it is not an
error.  It just use last host specified in the connect string.

There could be some compatibility problems with asynchronous connections
created with PQConnectStart functions. Problem is that we are trying
to establish several connections at once, and there are several sockets
which should be integrated into application event loop.

Even if we would try servers in some particular order (such as randomized
order during load balancing), file descriptor of socket can change during
execution PQConnectPoll, and existing applications are not prepared to it.

Performance impact
------------------

Performance impact seems to be negligible.

1. If connect string contain only one host, the only complication is the
maintenance of the data structure, which possible can hold more than
one host name. Connection process itself would not be affected.

2. If there is pure high-availability cluster, i.e. standby servers do
not accept client connections on the specified port, there is no extra
load on standby servers, and almost no (only several unsuccessful
connect calls) on client.

3. If there is load balancing cluster, there is no performance impacts
for read-only client, but each read-write client causes standby servers
to process extra connection to the point where server can report
read-only state of transaction (i.e. including SSL handshake and
postgresql authentication). Typically, in the situation where read-only
clients should be load-balanced using this feature, there are much more 
read-only clients, than read-write ones. So some extra load related with 
read-write connection seems to be justified by simplification of client
configuration.

--                  Victor Wagner <vitus@wagner.pp.ru>



Re: Proposal: Implement failover on libpq connect level.

From
Albe Laurenz
Date:
Victor Wagner wrote:
> Rationale
> =========
> 
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
> 
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
> 
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
> 
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).

I wonder how useful this is at the present time.

If the primary goes down and the client gets connected to the standby,
it would have read-only access there.  Most applications wouldn't cope
well with that.

Once we have multi-master replication that can be used for fail-over,
the picture will change.  Then a feature like that would be very useful indeed.

>     "host=main-server host=standby1 host=standby2 port=5432 dbname=database"

It seems a bit arbitrary to require that all servers use the same port.

Maybe parameters like host2, port2, host3, port3 etc. might be better.

Yours,
Laurenz Albe

Re: Proposal: Implement failover on libpq connect level.

From
PostgreSQL - Hans-Jürgen Schönig
Date:
> On 18 Aug 2015, at 10:32, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>
> Victor Wagner wrote:
>> Rationale
>> =========
>>
>> Since introduction of the WAL-based replication into the PostgreSQL, it is
>> possible to create high-availability and load-balancing clusters.
>>
>> However, there is no support for failover in the client libraries. So, only
>> way to provide transparent for client application failover is IP address
>> migration. This approach has some limitation, i.e. it requires that
>> master and backup servers reside in the same subnet or may not be
>> feasible for other reasons.
>>
>> Commercial RDBMS, such as Oracle, employ more flexible approach. They
>> allow to specify multiple servers in the connect string, so if primary
>> server is not available, client library tries to connect to other ones.
>>
>> This approach allows to use geographically distributed failover clusters
>> and also is a cheap way to implement load-balancing (which is not
>> possible with IP address migration).
>
> I wonder how useful this is at the present time.
>
> If the primary goes down and the client gets connected to the standby,
> it would have read-only access there.  Most applications wouldn't cope
> well with that.
>
> Once we have multi-master replication that can be used for fail-over,
> the picture will change.  Then a feature like that would be very useful indeed.
>
>>    "host=main-server host=standby1 host=standby2 port=5432 dbname=database"
>
> It seems a bit arbitrary to require that all servers use the same port.
>
> Maybe parameters like host2, port2, host3, port3 etc. might be better.
>
> Yours,
> Laurenz Albe


i totally agree with laurenz.
in addition to that you have the “problem” of transactions. if you failover in the middle
of a transaction, strange things might happen from the application point of view.

the good thing, however, is that stupid middleware is sometimes not able to handle
failed connections. however, overall i think it is more of a danger than a benefit.
regards,
    hans



>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers




Re: Proposal: Implement failover on libpq connect level.

From
Albe Laurenz
Date:
Hans-Jürgen Schönig wrote:
> in addition to that you have the “problem” of transactions. if you failover in the middle
> of a transaction, strange things might happen from the application point of view.
> 
> the good thing, however, is that stupid middleware is sometimes not able to handle
> failed connections. however, overall i think it is more of a danger than a benefit.

Maybe I misunderstood the original proposal, but my impression was that the alternative
servers would be tried only at the time the connection is established, and there would be no
such problems as you describe.

Those could only happen if libpq automatically tried to reconnect upon failure without
the client noticing.

So the stupid middleware would get an error message, but the reconnect would actually work.

Yours,
Laurenz Albe

Re: Proposal: Implement failover on libpq connect level.

From
PostgreSQL - Hans-Jürgen Schönig
Date:
> On 18 Aug 2015, at 11:19, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>
> Hans-Jürgen Schönig wrote:
>> in addition to that you have the “problem” of transactions. if you failover in the middle
>> of a transaction, strange things might happen from the application point of view.
>>
>> the good thing, however, is that stupid middleware is sometimes not able to handle
>> failed connections. however, overall i think it is more of a danger than a benefit.
>
> Maybe I misunderstood the original proposal, but my impression was that the alternative
> servers would be tried only at the time the connection is established, and there would be no
> such problems as you describe.



it would still leave the problem of having a read only on the other side unless you are using BDR or so.
regards,
    hans





Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Aug 18, 2015 at 6:07 AM, PostgreSQL - Hans-Jürgen Schönig
<postgres@cybertec.at> wrote:
>> On 18 Aug 2015, at 11:19, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>>
>> Hans-Jürgen Schönig wrote:
>>> in addition to that you have the “problem” of transactions. if you failover in the middle
>>> of a transaction, strange things might happen from the application point of view.
>>>
>>> the good thing, however, is that stupid middleware is sometimes not able to handle
>>> failed connections. however, overall i think it is more of a danger than a benefit.
>>
>> Maybe I misunderstood the original proposal, but my impression was that the alternative
>> servers would be tried only at the time the connection is established, and there would be no
>> such problems as you describe.
>
> it would still leave the problem of having a read only on the other side unless you are using BDR or so.

That doesn't make this a bad idea.  Some people are using replication
solutions that can cope with this already (EDB has a proprietary
product, and I'm sure there are people using BDR, too) and, as the
solutions get better and more widely deployed, more people will want
to do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Jaime Casanova
Date:
On 17 August 2015 at 23:18, Victor Wagner <vitus@wagner.pp.ru> wrote:
>
> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
>

This is not completely true, you can always use something like
pgbouncer or other middleware to change the server to which clients
connect. you still need to solve the fact that you will have a
read-only server at the other side.

something like repmgr + pgbouncer will work fine.

i agree that once/if we ever have multimaster included then this could
be a good idea

--
Jaime Casanova                      www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Aug 18, 2015 at 12:53 PM, Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
> This is not completely true, you can always use something like
> pgbouncer or other middleware to change the server to which clients
> connect. you still need to solve the fact that you will have a
> read-only server at the other side.
>
> something like repmgr + pgbouncer will work fine.

Sure, but pgbouncer is an extra hop, and has its own foibles.  There's
real appeal to doing this in the client.

> i agree that once/if we ever have multimaster included then this could
> be a good idea

I think it has a lot of appeal *now*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Amit Kapila
Date:
On Tue, Aug 18, 2015 at 9:48 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:

Behavoir
--------

If PQconnectdb encounters connect string with multiple hosts specified,
it attempts to establish connection with all these hosts simultaneously,
and begins to work with server which responds first, unless
loadbalancing parameter is true.


I think here you are mixing the behaviour for load balancing solution and
failover solution.  It seems to me that for client-side failover solution
(which is also known as Transparent Application Failover), the connection
attempt to second server should be done after the first connection is
broken as that provide more flexibility.
 
If the loadbalancing parameter is true, it tries servers sequentially in
the random order.

If the parameter readonly is false, after authenticating with server it
executes show transaction_read_only, to find out whether current
connection is to the master or to the hot standby, and connection is
considered successful only if server allows read write transactions.


Although both ideas (load balancing and failover) seems worth discussing,
they are separate features and can be worked on separately.  It will be
easier to sort out the details as well that way.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Implement failover on libpq connect level.

From
Tatsuo Ishii
Date:
I wonder how extended protocol is handled by this proposal. Suppose
load balacing mode is enabled. PQprepare is executed on standby1. Then
PQexecPrepared gets called. This may be executed on standby2, which
will fail because there's no prepared statement created by the former
PQprepare call.

Even simple procotol is used, same thing can be said to SQL
PREPARE/EXECUTE/DEALLOCATE.

SQL BEGIN/COMMIT/ROLLBACK would be more interesting example in load
balancing mode.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Proposal: Implement failover on libpq connect level.

From
'Victor Wagner *EXTERN*'
Date:
On 2015.08.18 at 08:32:28 +0000, Albe Laurenz wrote:

> I wonder how useful this is at the present time.
> 
> If the primary goes down and the client gets connected to the standby,
> it would have read-only access there.  Most applications wouldn't cope
> well with that.

It is supposed that somebody (either system administrator or some
cluster management software) have noticed failure of master and promoted
one of the standbys to master.

So, clients have only to find out which cluster node serves as master
just now.

Idea is that we don't need any extra administration actions such as IP
migration to do it. Clients have list of alternate servers and discover
which one to work with by trial and error.

I consider in my proposal following situations:

1. Warm standby - doesn't accept client connection at all unless
promoted to master.

2. Hot standby - we have two types of clients - one for which readonly
access is sufficient, and other that need to connect only to master.
In this case intention to write is explicitely stated in the connect
string (readonly=false) and connect procedure would check if node it
tries to connect allowed write.

It seems that most people discussing in this thread think in millisecond
time intervals (failure and immediate reconnect).

I was thinking about much longer time intervals - it would probaly take
seconds to cluster management software to notice server failure and
promote backup server to master, it might be possible for application to
spend minute or so trying to reconnect, but it would take
hours to change connect string on clients - it would require visit of
support enginer to each client terminal, if we are thinking of
distributed OLTP system such as point-of-sale network with thick
clients.


> 
> >     "host=main-server host=standby1 host=standby2 port=5432 dbname=database"
> 
> It seems a bit arbitrary to require that all servers use the same port.

> Maybe parameters like host2, port2, host3, port3 etc. might be better.

I've thought about this approach. But PostgreSQL administration guide
insists that all servers in the cluster should have as identical
configuration as possible to simplify administration. 

Moreover I've seldom have seen configurations where postgresql is
accepting connection on non-default port.

Using host1, host2 etc would have unintended connotations, such is "this
is first, main server". I think that client should treat all given
servers as equal and let cluster administration to choose which one
would accept connection.

--                Victor Wagner <vitus@wagner.pp.ru>




Re: Proposal: Implement failover on libpq connect level.

From
Amit Kapila
Date:
On Wed, Aug 19, 2015 at 12:21 PM, Victor Wagner *EXTERN* <vitus@wagner.pp.ru> wrote:
>
> On 2015.08.18 at 08:32:28 +0000, Albe Laurenz wrote:
>
> > I wonder how useful this is at the present time.
> >
> > If the primary goes down and the client gets connected to the standby,
> > it would have read-only access there.  Most applications wouldn't cope
> > well with that.
>
> It is supposed that somebody (either system administrator or some
> cluster management software) have noticed failure of master and promoted
> one of the standbys to master.
>
> So, clients have only to find out which cluster node serves as master
> just now.
>
> Idea is that we don't need any extra administration actions such as IP
> migration to do it. Clients have list of alternate servers and discover
> which one to work with by trial and error.
>
> I consider in my proposal following situations:
>
> 1. Warm standby - doesn't accept client connection at all unless
> promoted to master.
>
> 2. Hot standby - we have two types of clients - one for which readonly
> access is sufficient, and other that need to connect only to master.
> In this case intention to write is explicitely stated in the connect
> string (readonly=false) and connect procedure would check if node it
> tries to connect allowed write.
>
> It seems that most people discussing in this thread think in millisecond
> time intervals (failure and immediate reconnect).

Why not have this as a separate parameter (*_timeout or something like
that)?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.08.19 at 08:28:32 +0530, Amit Kapila wrote:

> On Tue, Aug 18, 2015 at 9:48 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> >
> >
> > Behavoir
> > --------
> >
> > If PQconnectdb encounters connect string with multiple hosts specified,
> > it attempts to establish connection with all these hosts simultaneously,
> > and begins to work with server which responds first, unless
> > loadbalancing parameter is true.
> >
> >
> I think here you are mixing the behaviour for load balancing solution and
> failover solution.  It seems to me that for client-side failover solution
> (which is also known as Transparent Application Failover), the connection
> attempt to second server should be done after the first connection is
> broken as that provide more flexibility.

I think that failover procedure should begin before first connection is
ever established.

When client application starts, it has no way of knowing current state
of the server cluster - which of servers is working as master now.

Application uses connect string, placed into its configuration file
long time ago, and changing this configuration might require special
permissions, user of application doesn't have. But user typically know
how to restart application or reboot his terminal. So, for the
spatially distributed networks with  thick clients we can handle only 
initial connections, not connection resets. At least application author
always can implement restoration of connection as closing old
connection and establishing new.

So, when application first establishes connection it have to be prepared
to connect any of alternate hosts.

I don't think that making connections in sequential order provide big
flexibility. But it can greatly increase startup time, because connect
to host which is physically down fails after significant timeout. While
application waits for first connect to fail, it might complete session
initialization with working server several times.

Of course, connecting to servers in sequential order is simpler to
implement, and allows even more mixing of load balancing with failover,
because code would be same.


> Although both ideas (load balancing and failover) seems worth discussing,
> they are separate features and can be worked on separately.  It will be
> easier to sort out the details as well that way.

Really load balancing comes almost for free if we implement connect to
alternate server for failover purposes. I'm not sure that in case of hot
standby, where only readonly transactions can be loadbalanced,
loadbalancing is very useful. And included it in the proposal only
because it is very cheap to implement in this form,

> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.08.19 at 12:42:45 +0900, Tatsuo Ishii wrote:

> I wonder how extended protocol is handled by this proposal. Suppose
> load balacing mode is enabled. PQprepare is executed on standby1. Then
> PQexecPrepared gets called. This may be executed on standby2, which
> will fail because there's no prepared statement created by the former
> PQprepare call.

Here we are discussing load-balancing on the client level, not on the
statement level.

Suppose that we have 100 readonly clients and 3 standby servers + master.
If all clients specify all four servers in the their connect strings,
and connect randomly to them, each server would have approximately 25
clients.

But once connection is established, each client works with one
server (at least until communication failure occurs and it would call
PQreset. In this case it has to reprepare statements anyway).




Re: Proposal: Implement failover on libpq connect level.

From
Albe Laurenz
Date:
Victor Wagner wrote:
>> I wonder how useful this is at the present time.
>>
>> If the primary goes down and the client gets connected to the standby,
>> it would have read-only access there.  Most applications wouldn't cope
>> well with that.

> It is supposed that somebody (either system administrator or some
> cluster management software) have noticed failure of master and promoted
> one of the standbys to master.
> 
> So, clients have only to find out which cluster node serves as master
> just now.
> 
> Idea is that we don't need any extra administration actions such as IP
> migration to do it. Clients have list of alternate servers and discover
> which one to work with by trial and error.

Yes, but that will only work reliably if the (read-only) standby does not
allow connections before it is promoted.

> I consider in my proposal following situations:
> 
> 1. Warm standby - doesn't accept client connection at all unless
> promoted to master.
> 
> 2. Hot standby - we have two types of clients - one for which readonly
> access is sufficient, and other that need to connect only to master.
> In this case intention to write is explicitely stated in the connect
> string (readonly=false) and connect procedure would check if node it
> tries to connect allowed write.

I think that these are both valid use cases.

And as Robert said, there are people out using BDR or other proprietary
multi-master solutions, so there might well be an audience for this feature.

>>>     "host=main-server host=standby1 host=standby2 port=5432 dbname=database"

>> It seems a bit arbitrary to require that all servers use the same port.
>> Maybe parameters like host2, port2, host3, port3 etc. might be better.

> I've thought about this approach. But PostgreSQL administration guide
> insists that all servers in the cluster should have as identical
> configuration as possible to simplify administration.
>
> Moreover I've seldom have seen configurations where postgresql is
> accepting connection on non-default port.

We do it all the time.

> Using host1, host2 etc would have unintended connotations, such is "this
> is first, main server". I think that client should treat all given
> servers as equal and let cluster administration to choose which one
> would accept connection.

I don't think that my idea of "host", "host3" is very appealing myself,
but I still don't like your original proposal of having multiple "host"
parameters.

One problem with that is that this syntax is already allowed, but
your proposal would silently change the semantics.
Today, if you have multiple "host" parameters, the last one wins.
So with your modification in place, some connect strings that work today
would start behaving in unexpected ways.

Maybe a better idea would be: host=db1.myorg.com,db2.myorg.com port=5432,2345

Yours,
Laurenz Albe

Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner *EXTERN*
Date:
On 2015.08.19 at 12:29:51 +0530, Amit Kapila wrote:

> > It seems that most people discussing in this thread think in millisecond
> > time intervals (failure and immediate reconnect).
> 
> Why not have this as a separate parameter (*_timeout or something like
> that)?

Because it is not in the software configuration. It is in the people
heads. Or may be in the organizational configuration of the environments
we are talking about.

Each of us imagining some use-case for discussed feature. And these
cases are completely different, and have different typical time
interval.

I haven't explicitely stated my use cases in the proposal. So people
thinking in terms of their use cases, and this is very significant
feedback for me.





Re: Proposal: Implement failover on libpq connect level.

From
Amit Kapila
Date:
On Wed, Aug 19, 2015 at 12:35 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
On 2015.08.19 at 08:28:32 +0530, Amit Kapila wrote:

> On Tue, Aug 18, 2015 at 9:48 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> >
> >
> > Behavoir
> > --------
> >
> > If PQconnectdb encounters connect string with multiple hosts specified,
> > it attempts to establish connection with all these hosts simultaneously,
> > and begins to work with server which responds first, unless
> > loadbalancing parameter is true.
> >
> >
> I think here you are mixing the behaviour for load balancing solution and
> failover solution.  It seems to me that for client-side failover solution
> (which is also known as Transparent Application Failover), the connection
> attempt to second server should be done after the first connection is
> broken as that provide more flexibility.

I think that failover procedure should begin before first connection is
ever established.

As far as I understand, failover gets initiated once the master server goes
down or is not accessible due to some reason, so for such cases if you
have the connection to both the servers then it might not work.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Implement failover on libpq connect level.

From
''Victor Wagner *EXTERN*' *EXTERN*'
Date:
On 2015.08.19 at 07:15:30 +0000, Albe Laurenz wrote:

> > Idea is that we don't need any extra administration actions such as IP
> > migration to do it. Clients have list of alternate servers and discover
> > which one to work with by trial and error.
> 
> Yes, but that will only work reliably if the (read-only) standby does not
> allow connections before it is promoted.

It would just take a bit more time for client and a bit more load for
server - to make sure that this connection is read-write by
issuing
   show transaction_read_only 

statement before considering connection useful.

> 
> And as Robert said, there are people out using BDR or other proprietary
> multi-master solutions, so there might well be an audience for this feature.
Unfortunately I have no experience with such solutions, so I'd greatly
appreciate feedback from those people.

I've modelled my proposal after another proprietary solution  - Oracle
RAC.


> One problem with that is that this syntax is already allowed, but
> your proposal would silently change the semantics.
> Today, if you have multiple "host" parameters, the last one wins.
> So with your modification in place, some connect strings that work today
> would start behaving in unexpected ways.

This is serious argument. But what the use case of these connect strings
now? 

It seems to me that in most cases last host in the connect string would
be only host which accepts connections, so it wins anyway

> Maybe a better idea would be:
>   host=db1.myorg.com,db2.myorg.com port=5432,2345

I've tried not to introduce new delimiters. But this syntax definitely
have some advantages. At least it allows to specify host-port pairs as
two parallel lists.

Other idea - allow to specify host-port pair as argument of host
parameter. 
 host=db1.myorg.com:5432

It is consistent with URL syntax and system administrators are used to
it. And with long list of hosts there is less chances to made an error
as host and corresponding port come together.

But your variant allows to handle hostaddr parameter same way as host
and port.
--     Victor Wagner <vitus@wagner.pp.ru>



Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.08.19 at 12:55:15 +0530, Amit Kapila wrote:

> > I think that failover procedure should begin before first connection is
> > ever established.
> >
> 
> As far as I understand, failover gets initiated once the master server goes
> down or is not accessible due to some reason, so for such cases if you
> have the connection to both the servers then it might not work.

Master server might go down when client is not started yet. 
And when client starts up, it has to find out which server to connect
now.

Consider point-of-sale terminals, bank offices or anything else, which
do not work round the clock. Clerk comes to his workplace in the
morning, switches on terminal and inserts her smartcard to authorize
with server. She doesn't need to know what server name is and where it
is located. Either application finds the server automatically, or
support engineer has to be called to fix things.

Moreover, in some situations restart of application (or even client
terminal) is acceptable price for failover, as long as there is no need
to manually fix the configuration.




Re: Proposal: Implement failover on libpq connect level.

From
Albe Laurenz
Date:
Victor Wagner wrote:
> > > Idea is that we don't need any extra administration actions such as IP
> > > migration to do it. Clients have list of alternate servers and discover
> > > which one to work with by trial and error.
> >
> > Yes, but that will only work reliably if the (read-only) standby does not
> > allow connections before it is promoted.
> 
> It would just take a bit more time for client and a bit more load for
> server - to make sure that this connection is read-write by
> issuing
> 
>     show transaction_read_only
> 
> statement before considering connection useful.

That's not very comfortable, and a lot of middleware software won't easily
learn the trick.

But even without that use case I think that the feature is probably
worth the effort.

[about having multiple "host" parameters in the connection string]

> > One problem with that is that this syntax is already allowed, but
> > your proposal would silently change the semantics.
> > Today, if you have multiple "host" parameters, the last one wins.
> > So with your modification in place, some connect strings that work today
> > would start behaving in unexpected ways.
> 
> This is serious argument. But what the use case of these connect strings
> now?
>
> It seems to me that in most cases last host in the connect string would
> be only host which accepts connections, so it wins anyway

I'm not saying that it is particularly wide-spread and useful; it could
happen through careless editing of connection strings or by using a
connection service file entry
(http://www.postgresql.org/docs/current/static/libpq-pgservice.html)
and overriding the host parameter on the command line.

> > Maybe a better idea would be:
> >   host=db1.myorg.com,db2.myorg.com port=5432,2345
> 
> I've tried not to introduce new delimiters. But this syntax definitely
> have some advantages. At least it allows to specify host-port pairs as
> two parallel lists.
> 
> Other idea - allow to specify host-port pair as argument of host
> parameter.
> 
>   host=db1.myorg.com:5432
> 
> It is consistent with URL syntax and system administrators are used to
> it. And with long list of hosts there is less chances to made an error
> as host and corresponding port come together.

I don't think that is very attactive as it confuses the distinction between
"host" and "port".  What would you do with
  host=db1.myorg.com:2345 port=1234

Yours,
Laurenz Albe

Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.08.19 at 09:21:50 +0000, Albe Laurenz wrote:

> > > Yes, but that will only work reliably if the (read-only) standby does not
> > > allow connections before it is promoted.
> > 
> > It would just take a bit more time for client and a bit more load for
> > server - to make sure that this connection is read-write by
> > issuing
> > 
> >     show transaction_read_only
> > 
> > statement before considering connection useful.
> 
> That's not very comfortable, and a lot of middleware software won't easily
> learn the trick.

It shouldn't be left to middleware. It should be hidden into
PQConnectPoll. This function already handle very complicated state
transition diagram, including authentication, SSL negotiation and so on.
If we just add couple of new states such as CONNECTION_ASK_RW_STATUS
and CONNECTION_RW_STATUS_OK it should be fine.

Application would just call PQConnectPoll repeatedly (either via
PQconnectdb or explicitely when readable/writable condition detected on
the socket integrated into app even loop) until success or
unrecoverable error would be achieved. How many interaction with server
it would take, it is not middleware problem.


> > It seems to me that in most cases last host in the connect string would
> > be only host which accepts connections, so it wins anyway
> 
> I'm not saying that it is particularly wide-spread and useful; it could
> happen through careless editing of connection strings or by using a
> connection service file entry
> (http://www.postgresql.org/docs/current/static/libpq-pgservice.html)
> and overriding the host parameter on the command line.


I don't think that host definition from all possible sources 
(service file, environment, command line) should be collected together.
Although it is essential to be clear when host list is appended and when
- replaced. 

If we implement sequential trial of all hosts, then we can start with
last one, to provide compatibility with existing behavior. In this case
if last host is online, no change occur.

Another idea - is to enable multiple host connection only if special
option (loadbalance or failover) present in the connect string.



> > Other idea - allow to specify host-port pair as argument of host
> > parameter.
> > 
> >   host=db1.myorg.com:5432
> > 
> > It is consistent with URL syntax and system administrators are used to
> > it. And with long list of hosts there is less chances to made an error
> > as host and corresponding port come together.
> 
> I don't think that is very attactive as it confuses the distinction between
> "host" and "port".  What would you do with
> 
>    host=db1.myorg.com:2345 port=1234
I don't think that it does add any more confusion than simultaneous
existence of host and hostaddr, or ability to specify host and port both
in host part of URL and query parameters
   postgresql://user@host:5432/dbname?host=otherhost&port=2345

Bot really, you've convinced me that syntax with two or three (host, port
and hostaddr) parallel lists is the best. Although we'll need to allow
empty entries in the lists such as

host=master.db.com,standby1.db.com.standby2.db.com port=,2345, hostaddr=,192.168.0.4,192.160.1.8

with evident semantic:
get port, host and hostaddr elements with same number, and if some of
them are empty, behave as it was only host and corresponding parameter
not specified at all.

--     Victor Wagner <vitus@wagner.pp.ru>



Re: Proposal: Implement failover on libpq connect level.

From
Tom Lane
Date:
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Victor Wagner wrote:
>> It would just take a bit more time for client and a bit more load for
>> server - to make sure that this connection is read-write by
>> issuing
>> show transaction_read_only
>> statement before considering connection useful.

> That's not very comfortable, and a lot of middleware software won't easily
> learn the trick.

That sort-of ties into what seems to me the main objection to this
proposal, namely that there is already a way to do this sort of thing:
DNS-based load balancing.  All the clients think they connect to
db.mycompany.com, but which server they actually get is determined by
what IP address the DNS server tells them to use.

This is a technology that is very well established, known to every
large-site admin, and usable for every Internet-based service.  Even if
libpq had its own nonstandard way of doing something similar, the site
admins would probably still need to use DNS load balancing for other
services.  In fact, they'd still need to use DNS balancing for Postgres,
because not everything connects with libpq (think JDBC for instance).

So I think we ought to reject this proposal, full stop.  I see no
reason to re-invent this wheel, and there are good reasons not to.
        regards, tom lane



Re: Proposal: Implement failover on libpq connect level.

From
Andres Freund
Date:
On 2015-08-19 09:41:32 -0400, Tom Lane wrote:
> In fact, they'd still need to use DNS balancing for Postgres,
> because not everything connects with libpq (think JDBC for instance).

It already does support this though.

https://jdbc.postgresql.org/documentation/head/connect.html :

> Connection Fail-over
> 
> To support simple connection fail-over it is possible to define multiple
> endpoints (host and port pairs) in the connection url separated by
> commas. The driver will try to once connect to each of them in order
> until the connection succeeds. If none succeed, a normal connection
> exception is thrown.
> 
> The syntax for the connection url is:
> 
> jdbc:postgresql://host1:port1,host2:port2/database


> So I think we ought to reject this proposal, full stop.  I see no
> reason to re-invent this wheel, and there are good reasons not to.

I don't really buy this argument. Allowing to connect to several
endpoints isn't exactly "new tech" either. A lot of database connectors
do support something very close to the above pgjdbc feature.

Greetings,

Andres Freund



Re: Proposal: Implement failover on libpq connect level.

From
Oleg Bartunov
Date:


On Wed, Aug 19, 2015 at 4:46 PM, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-19 09:41:32 -0400, Tom Lane wrote:
> In fact, they'd still need to use DNS balancing for Postgres,
> because not everything connects with libpq (think JDBC for instance).

It already does support this though.

https://jdbc.postgresql.org/documentation/head/connect.html :

> Connection Fail-over
>
> To support simple connection fail-over it is possible to define multiple
> endpoints (host and port pairs) in the connection url separated by
> commas. The driver will try to once connect to each of them in order
> until the connection succeeds. If none succeed, a normal connection
> exception is thrown.
>
> The syntax for the connection url is:
>
> jdbc:postgresql://host1:port1,host2:port2/database

yes, I also wanted to show this, but you was quicker.
 


> So I think we ought to reject this proposal, full stop.  I see no
> reason to re-invent this wheel, and there are good reasons not to.

I don't really buy this argument. Allowing to connect to several
endpoints isn't exactly "new tech" either. A lot of database connectors
do support something very close to the above pgjdbc feature.


mysql, for example.
 

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Proposal: Implement failover on libpq connect level.

From
Simon Riggs
Date:
On 19 August 2015 at 14:46, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-19 09:41:32 -0400, Tom Lane wrote:
> In fact, they'd still need to use DNS balancing for Postgres,
> because not everything connects with libpq (think JDBC for instance).

It already does support this though.

https://jdbc.postgresql.org/documentation/head/connect.html :

> Connection Fail-over
>
> To support simple connection fail-over it is possible to define multiple
> endpoints (host and port pairs) in the connection url separated by
> commas. The driver will try to once connect to each of them in order
> until the connection succeeds. If none succeed, a normal connection
> exception is thrown.
>
> The syntax for the connection url is:
>
> jdbc:postgresql://host1:port1,host2:port2/database

When we discussed this feature at the Dev Meeting in 2014, I thought we agreed that allowing multiple hosts in the connection string would be OK.

+1 for bringing the jdbc driver URI syntax into libpq, so that all interfaces can be optionally specified this way. This doesn't preclude the use of ipfailover, in fact it might be work well together. If you don't like it, don't use it.

I think we do need some way of saying that a readonly connection is OK. So the default would be to connect to each in turn until we find the master. It should keep retrying for a period of time since for a short period it is possible there is no master. If you specify readonly, then a connection to a standby is acceptable and it will stop there.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Proposal: Implement failover on libpq connect level.

From
''Victor Wagner *EXTERN*' *EXTERN*' *EXTERN*
Date:
On 2015.08.19 at 15:35:17 +0100, Simon Riggs wrote:

> 
> I think we do need some way of saying that a readonly connection is OK. So

I had such thing in my propsal (boolean parameter readonly). 
But haven't yet checked if it is compatible with jdbc syntax.

> the default would be to connect to each in turn until we find the master.
> It should keep retrying for a period of time since for a short period it is
> possible there is no master. If you specify readonly, then a connection to

It is very important addition  - to specify that if no host is able to
establish read-write session, we should retry and give a chance for
sever administration to promote one of standbys to master. Probably
there should be additional timeout parameter (we have
connection_timeout, and this would be failover_timeout) with some
reasonaable default.




Re: Proposal: Implement failover on libpq connect level.

From
Tatsuo Ishii
Date:
> Here we are discussing load-balancing on the client level, not on the
> statement level.

I see.

> Suppose that we have 100 readonly clients and 3 standby servers + master.
> If all clients specify all four servers in the their connect strings,
> and connect randomly to them, each server would have approximately 25
> clients.
> 
> But once connection is established, each client works with one
> server (at least until communication failure occurs and it would call
> PQreset. In this case it has to reprepare statements anyway).

One downside of this is, if one of the standby servers is not
responding, every time clients will be blocked by the server before
giving up the connection trial. This could last for hours (for
example, the network cable is plugged out). I think round robin DNS is
better because the DNS server will drop the entry corresponding broken
server (or any solution which has similar capability). After all, this
type of client side solutions are not very stable in a real world
environment IMO (I heard the same opinion regarding HAProxy).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Proposal: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Wed, Aug 19, 2015 at 4:45 PM, ''Victor Wagner *EXTERN*' *EXTERN*' *EXTERN* <vitus@wagner.pp.ru> wrote:
On 2015.08.19 at 15:35:17 +0100, Simon Riggs wrote:

>
> I think we do need some way of saying that a readonly connection is OK. So

I had such thing in my propsal (boolean parameter readonly).
But haven't yet checked if it is compatible with jdbc syntax.

> the default would be to connect to each in turn until we find the master.
> It should keep retrying for a period of time since for a short period it is
> possible there is no master. If you specify readonly, then a connection to

It is very important addition  - to specify that if no host is able to
establish read-write session, we should retry and give a chance for
sever administration to promote one of standbys to master. Probably
there should be additional timeout parameter (we have
connection_timeout, and this would be failover_timeout) with some
reasonaable default.

Are we going to put support for every existing and new jdbc feature into libpq?  One day they might want to add another parameter, e.g. the number of retries before failing ultimately (hm, and probably, delay between retries).  Should we already prepare for that?

I believe a good library should provide all the building blocks instead of trying to envision every possible use case and incorporate them as convenience functions.  All the described above can be implemented in terms of existing libpq features rather easily.  Not to mention that the proposed approach doesn't scale really well, IMO: once you have incorporated all your database hosts in client's connection string, you need additional steps to maintain this list on the app configuration side.

And the fact that a lot of other db connector libraries do this in one or the other way, isn't actually an argument in favor of the feature, at least not for me.

--
Alex

Re: Proposal: Implement failover on libpq connect level.

From
David Fetter
Date:
On Wed, Aug 19, 2015 at 07:15:30AM +0000, Laurenz Albe wrote:
> Victor Wagner wrote:
> >> I wonder how useful this is at the present time.
> >>
> Maybe a better idea would be:
>   host=db1.myorg.com,db2.myorg.com port=5432,2345

I think if we're going to provide multiple sets of connection info, we
should just do that rather than trying to piece them together from
constituent parts, where the former looks like:

host="service=foo sslmode=require",postgresql://bar.baz/mydb?sslmode=require,"host=quux.corge user=grault port=6433"

As far as I can tell, the only way a comma could sneak into these
strings is if it were in a database name or similarly bizarre spot, in
which case the usual quoting needed to handle it in general should
handle it here.

It's not clear to me that libpq is the correct place to add this
feature, as we have fairly large user bases--the JDBC world, for
example--that don't use it.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.08.20 at 00:17:35 +0900, Tatsuo Ishii wrote:

> > But once connection is established, each client works with one
> > server (at least until communication failure occurs and it would call
> > PQreset. In this case it has to reprepare statements anyway).
> 
> One downside of this is, if one of the standby servers is not
> responding, every time clients will be blocked by the server before
> giving up the connection trial. This could last for hours (for

This shouldn't happen. My proposal was to connect all servers
simultaneously, and then use that connection which would be established
first closing other ones

Even if we wouldn't do so (to properly randomize server load or to be
compatible with jdbc), there is connection_timeout parameter, so 
client wouldn't seat and just wait for hours while system TCP/IP stack
trying to connect nonexistent server.


> example, the network cable is plugged out). I think round robin DNS is
> better because the DNS server will drop the entry corresponding broken

DNS server wouldn't drop anything unless explicitely told so (by
administrator or by some watchdog software which is able to talk
nsupdate protocol). 

And not everyone database owner has control on his own domain.

Moreover, DNS is distributed system with agressive caching. If our
system is not local, DNS records for non-existing server would be cached
by DNS servers of client's internet provider. 




Re: Proposal: Implement failover on libpq connect level.

From
Tom Lane
Date:
Victor Wagner <vitus@wagner.pp.ru> writes:
> On 2015.08.20 at 00:17:35 +0900, Tatsuo Ishii wrote:
>> One downside of this is, if one of the standby servers is not
>> responding, every time clients will be blocked by the server before
>> giving up the connection trial. This could last for hours (for

> This shouldn't happen. My proposal was to connect all servers
> simultaneously, and then use that connection which would be established
> first closing other ones

That seems like seriously unfriendly behavior.  It will trigger dropped
connection bleats in the server logs, not to mentioned wasted process
forks.
        regards, tom lane



Re: Proposal: Implement failover on libpq connect level.

From
Amit Kapila
Date:
On Wed, Aug 19, 2015 at 1:23 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
>
> On 2015.08.19 at 12:55:15 +0530, Amit Kapila wrote:
>
> > > I think that failover procedure should begin before first connection is
> > > ever established.
> > >
> >
> > As far as I understand, failover gets initiated once the master server goes
> > down or is not accessible due to some reason, so for such cases if you
> > have the connection to both the servers then it might not work.
>
> Master server might go down when client is not started yet.
> And when client starts up, it has to find out which server to connect
> now.
>

Always try with the first server specified in connection string and if that
is not available try with second and so on.  I think for the case of failover,
the design shouldn't be much complicated and it is a standard thing provided
by most of the client-side drivers in other databases.  Considering what
currently PostgreSQL offers in terms of high-availability functionality, for
load-balancing, we need to be careful of many more things like redirecting
read-queries to standby's, write statements should be executed via connection
to master.  


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Implement failover on libpq connect level.

From
Teodor Sigaev
Date:
> +1 for bringing the jdbc driver URI syntax into libpq, so that all interfaces
> can be optionally specified this way. This doesn't preclude the use of
> ipfailover, in fact it might be work well together. If you don't like it, don't
> use it.

+1

Another thought:  multiple hosts in URI could be used in simple configuration 
for read-only clients. I faced with customers which manages two connections in 
process - to master and to one of several slaves.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Proposal: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Fri, Aug 28, 2015 at 6:10 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
+1 for bringing the jdbc driver URI syntax into libpq, so that all interfaces
can be optionally specified this way. This doesn't preclude the use of
ipfailover, in fact it might be work well together. If you don't like it, don't
use it.

+1

Another thought:  multiple hosts in URI could be used in simple configuration for read-only clients. I faced with customers which manages two connections in process - to master and to one of several slaves.

Hm, but do they suffer any trouble while doing that *outside* of libpq?  What is the benefit in adding this to libpq itself while it already provides very rich and finely grained connection control functions?

--
Alex

Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That sort-of ties into what seems to me the main objection to this
> proposal, namely that there is already a way to do this sort of thing:
> DNS-based load balancing.  All the clients think they connect to
> db.mycompany.com, but which server they actually get is determined by
> what IP address the DNS server tells them to use.

But that kinda sucks.  I mean, suppose I have three servers, A, B, and
C.  I point db.mycompany.com to A, which is the master; then A dies.
Under your proposal, whatever script I use to control failover now has
to change the DNS records to repoint db.mycompany.com to B, my new,
and newly-promoted, new master.   It's quite possible that some
machines on the network, or some processes, will have the old IP
address cached, and it may be several minutes before those caches time
out.  In the meantime, I'm down: even if I bounce the application
servers, they may just try to reconnect to A.

Victor's proposal is far more convenient.  When A goes offline, the
servers automatically begin trying to connect to B and C.  Let's
suppose I use iptables or something like that to prevent connections
to B and C as long as A is online.  Or pg_hba.conf or whatever.  But
once I'm sure A is dead, I can promote B and reconfigure it to allow
connections *and I'm done*.  At most, I need to restart my application
servers.  I don't need access to the DNS server - which the guys in IT
are unlikely to provide to a lowly DBA anyhow.  I don't have to worry
about stale caches.  Everything just works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That sort-of ties into what seems to me the main objection to this
> > proposal, namely that there is already a way to do this sort of thing:
> > DNS-based load balancing.  All the clients think they connect to
> > db.mycompany.com, but which server they actually get is determined by
> > what IP address the DNS server tells them to use.
> 
> But that kinda sucks.  I mean, suppose I have three servers, A, B, and
> C.  I point db.mycompany.com to A, which is the master; then A dies.
> Under your proposal, whatever script I use to control failover now has
> to change the DNS records to repoint db.mycompany.com to B, my new,
> and newly-promoted, new master.   It's quite possible that some
> machines on the network, or some processes, will have the old IP
> address cached, and it may be several minutes before those caches time
> out.  In the meantime, I'm down: even if I bounce the application
> servers, they may just try to reconnect to A.

The solution to this part seems to be to lower the TTL, which seems
easy enough.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Aug 19, 2015 at 11:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Always try with the first server specified in connection string and if that
> is not available try with second and so on.  I think for the case of
> failover,
> the design shouldn't be much complicated and it is a standard thing provided
> by most of the client-side drivers in other databases.  Considering what
> currently PostgreSQL offers in terms of high-availability functionality, for
> load-balancing, we need to be careful of many more things like redirecting
> read-queries to standby's, write statements should be executed via
> connection
> to master.

This can be really slow, though, if the master server is completely
offline rather than actively rejecting connections.  Maybe a good idea
would be to have a new connection string parameter, failover_time.  If
not specified or set to a negative value, we use only the
last-specified host=X parameter, just as now.  If set to zero, we try
to connect to everything at once, as Victor proposes.  If set to a
value >0, we try to connect to the servers one after another,
initiating each connection attempt after the number of milliseconds
specified by the parameter.  So then you can do host=A host=B host=C
failover_time=100 and this will launch a connection attempt against A
immediately; if that hasn't completed within 100ms, then we'll also
start one targeting server B, and then after another 100ms we'll try
C.  ISTM that this would cater to several different use cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Magnus Hagander
Date:
On Tue, Sep 1, 2015 at 7:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That sort-of ties into what seems to me the main objection to this
> > proposal, namely that there is already a way to do this sort of thing:
> > DNS-based load balancing.  All the clients think they connect to
> > db.mycompany.com, but which server they actually get is determined by
> > what IP address the DNS server tells them to use.
>
> But that kinda sucks.  I mean, suppose I have three servers, A, B, and
> C.  I point db.mycompany.com to A, which is the master; then A dies.
> Under your proposal, whatever script I use to control failover now has
> to change the DNS records to repoint db.mycompany.com to B, my new,
> and newly-promoted, new master.   It's quite possible that some
> machines on the network, or some processes, will have the old IP
> address cached, and it may be several minutes before those caches time
> out.  In the meantime, I'm down: even if I bounce the application
> servers, they may just try to reconnect to A.

The solution to this part seems to be to lower the TTL, which seems
easy enough.

It seems easy enough. Until you actually try to do it, and discover platforms and appservers that completely ignore the ttl by default (like, uh, java, which you may have run into - at least it used to do that), or enforces a lower minimum of longer than you want in order to decrease load, and things like that.
 
--

Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Sep 1, 2015 at 1:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > That sort-of ties into what seems to me the main objection to this
>> > proposal, namely that there is already a way to do this sort of thing:
>> > DNS-based load balancing.  All the clients think they connect to
>> > db.mycompany.com, but which server they actually get is determined by
>> > what IP address the DNS server tells them to use.
>>
>> But that kinda sucks.  I mean, suppose I have three servers, A, B, and
>> C.  I point db.mycompany.com to A, which is the master; then A dies.
>> Under your proposal, whatever script I use to control failover now has
>> to change the DNS records to repoint db.mycompany.com to B, my new,
>> and newly-promoted, new master.   It's quite possible that some
>> machines on the network, or some processes, will have the old IP
>> address cached, and it may be several minutes before those caches time
>> out.  In the meantime, I'm down: even if I bounce the application
>> servers, they may just try to reconnect to A.
>
> The solution to this part seems to be to lower the TTL, which seems
> easy enough.

In theory, yeah.  In practice, not all systems obey the TTL, and in my
experience, that's actually a fairly common problem.   Sometimes the
TTL gets enforced separately at multiple levels, so that all of the
old records don't go away for 2 or 3 times the TTL, or occasionally
completely random intervals of time thoroughly unrelated to the TTL
you configured.  And that assumes that the guy who controls the DNS
server is willing to configure a different TTL for you, which is not
always the case.

It also assumes that guy is OK granting access to modify DNS records
to an automated system running on the database server machines.  That
may be OK if the database server is THE ONE THING that needs treatment
of this type, but if the company supports 50 or 100 services that all
need failover handling, suddenly giving all of those things the
ability to reconfigure the DNS server sounds like a pretty poor plan.
Plus, there may be multiple copies of the DNS server in different
geographies, all cloned from a master at the central office.  When the
central office dies, you lose not only the main database server but
also the main DNS server.  That's OK, because the backup DNS servers
still have copies of all the data from the master ... but you can't
make changes until the master is back up.

All of these problems can be solved if you're willing to put enough
time and energy into it.  For example, Akamai has (or had, at the time
I worked there) a service that did very robust geographical
load-balancing and failover.  So you could, like, go buy that, and
maybe it would solve your problem.  By now, there are probably other
companies offering similar services.  I have no doubt that similar
solutions can be crafted from purely open-source software, and there
may very well be great tools available for this that weren't around
the last time I worked as a network administrator.  But I think it's
quite wrong to assume that the infrastructure for this is available
and usable everywhere, because in my experience, that's far from the
case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Andres Freund
Date:
On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
> But I think it's quite wrong to assume that the infrastructure for
> this is available and usable everywhere, because in my experience,
> that's far from the case.

Especially when the alternative is a rather short patch implementing an
otherwise widely available feature.



Re: Proposal: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Tue, Sep 1, 2015 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote:
On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
> But I think it's quite wrong to assume that the infrastructure for
> this is available and usable everywhere, because in my experience,
> that's far from the case.

Especially when the alternative is a rather short patch implementing an
otherwise widely available feature.

But that won't actually help in the case described by Robert: if the master server A failed, the client has no idea if B or C would become the new master.

Unless it actually tries to connect them in turn and check for the result of pg_is_in_recovery().  I think that brings enough complexity for keeping this outside of libpq.  Also think about all the extra flexibility people will likely want to have: number of retries, delay between retries, delay backoff, etc., to the point we'll have to support some sort of client code retry_policy_callback.

For read-only clients you might want to include a number of slave hostnames, and let the connector choose one, but then again you can't achieve load-balancing on the client side, you're better off using round-robin DNS.  To add or remove a slave you only need to update DNS, and not configuration on all the clients.

For the master failover I think a common technique is to just move the floating IP address from the old master to the new one.  This doesn't require touching the DNS record.

--
Alex

Re: Proposal: Implement failover on libpq connect level.

From
Christopher Browne
Date:
On 2 September 2015 at 04:52, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
>
> On Tue, Sep 1, 2015 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
>> > But I think it's quite wrong to assume that the infrastructure for
>> > this is available and usable everywhere, because in my experience,
>> > that's far from the case.
>>
>> Especially when the alternative is a rather short patch implementing an
>> otherwise widely available feature.
>
>
> But that won't actually help in the case described by Robert: if the master server A failed, the client has no idea if B or C would become the new master.
>
> Unless it actually tries to connect them in turn and check for the result of pg_is_in_recovery().  I think that brings enough complexity for keeping this outside of libpq.  Also think about all the extra flexibility people will likely want to have: number of retries, delay between retries, delay backoff, etc., to the point we'll have to support some sort of client code retry_policy_callback.
>
> For read-only clients you might want to include a number of slave hostnames, and let the connector choose one, but then again you can't achieve load-balancing on the client side, you're better off using round-robin DNS.  To add or remove a slave you only need to update DNS, and not configuration on all the clients.
>
> For the master failover I think a common technique is to just move the floating IP address from the old master to the new one.  This doesn't require touching the DNS record.

It seems to me that any time the answer involves touching the DNS record, then having this be libpq configuration is terribly inappropriate.

I'll note that OpenLDAP tends to accept multiple database URIs, and is nonspecific as to the relative semantics.  That points me at two things that may be worth considering for libpq:

1.  It seems to me that it would be ridiculous to try to have semantics trying to interpret multiple values passed to PGPORT/PGHOST/PGDATABASE/-p/-h/-d/, but rather, that if there are to be multiple values, they should be presented as URIs.

I'm finding, by the way, that it is very useful in simplifying application configuration to use the URIs that have been usable since about 9.2.

I <3
postgresql://postgres@localhost:7093/myfavedatabase

2.  I'd be inclined to go with pretty simple functionality in anything that would be built-in to libpq.  No great amount of logic, rather

   a) Try the first URI.  If that works (where "connects" = "works"), we're done.
   b) Try the next URI.  If that works, we're done.
   c) Is there another one?  Try it...  Works... Done.  Repeat as needful.

There's all kinds of functionality that you could add, between...
  - Random selection of URI.
  - Running some query against each URI tried to see if it's a "master/slave", changing client functionality based on that.
  - Picking URI based on connecting (somewhere) and deciding what URI to prefer.

I'd suggest implementing the simplest thing (e.g. - try URI 1, then 2, then ...), as that's most likely the most useful.
And more sophisticated functionality probably ought to be in the application anyways.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Sep 2, 2015 at 4:52 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Tue, Sep 1, 2015 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
>> > But I think it's quite wrong to assume that the infrastructure for
>> > this is available and usable everywhere, because in my experience,
>> > that's far from the case.
>>
>> Especially when the alternative is a rather short patch implementing an
>> otherwise widely available feature.
>
> But that won't actually help in the case described by Robert: if the master
> server A failed, the client has no idea if B or C would become the new
> master.

Sure it does.  You just need to ensure that whichever of those is the
new master accepts connections, and the other one doesn't.  There are
lots of ways to do this; e.g. give the machine a second IP that
accepts connections only when the machine is the designated master,
and have read-write clients connect to that IP, and read-only clients
connect to the machine's main IP.

Andres's point is the same as mine: we ought to accept this feature,
in some form, because it's really quite useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Wed, Sep 2, 2015 at 9:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 2, 2015 at 4:52 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Tue, Sep 1, 2015 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
>> > But I think it's quite wrong to assume that the infrastructure for
>> > this is available and usable everywhere, because in my experience,
>> > that's far from the case.
>>
>> Especially when the alternative is a rather short patch implementing an
>> otherwise widely available feature.
>
> But that won't actually help in the case described by Robert: if the master
> server A failed, the client has no idea if B or C would become the new
> master.

Sure it does.  You just need to ensure that whichever of those is the
new master accepts connections, and the other one doesn't.  There are
lots of ways to do this; e.g. give the machine a second IP that
accepts connections only when the machine is the designated master,
and have read-write clients connect to that IP, and read-only clients
connect to the machine's main IP.

Well, I see how that can help, but still sounds like a lot of hassle.

What if you have 5 servers: A..F, listed in client's connection settings in that order, and after failing over from A, now F is the new master (for whatever reason: I don't think it would be realistic to assume that you can and always will fail over to the next host in the list).  So suddenly, the read-write clients need to make 5 connection attempts before arriving at the master (add name resolution to the picture for even more latency).  Connection pooling can probably mitigate this to some degree, of course.

I believe that having a floating IP for the master is much more practical approach and it doesn't require any patch to libpq or modification of the client connection settings.

Andres's point is the same as mine: we ought to accept this feature,
in some form, because it's really quite useful.

Even if someone is keen on implementing the multiple connection strings approach, nothing stops them from doing that on top libpq, and I really think it will be ever more flexible than anything we can build into libpq itself.

--
Alex

Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Sep 3, 2015 at 4:00 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I believe that having a floating IP for the master is much more practical
> approach and it doesn't require any patch to libpq or modification of the
> client connection settings.

I think that's a great approach if all the machines are on the same
subnet.  If they are in different datacenters, it doesn't work.

The amount of opposition to this feature is remarkable considering
that it's available in Oracle, SQL Server, MongoDB, Cassandra, and
MySQL.  See for example:

http://docs.mongodb.org/manual/reference/connection-string/
https://datastax.github.io/python-driver/getting_started.html

This is a small patch with minimal to no downside implementing a
feature that is present in most or all of the major competing
products.  We're really doing ourselves a disservice if we reject it.
I think it would be far better to progress to talking about what
design we'd be comfortable with, rather than kidding ourselves that a
feature that everyone else has and which somebody has taken the time
to implement (thus, obviously it has value for them) and which has
been discussed to general approval at PGCon developer meetings and
which has been endorsed on this thread by three committers is somehow
something that nobody really needs.  Seriously?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Dave Page
Date:
On Thu, Sep 3, 2015 at 3:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 3, 2015 at 4:00 AM, Shulgin, Oleksandr
> <oleksandr.shulgin@zalando.de> wrote:
>> I believe that having a floating IP for the master is much more practical
>> approach and it doesn't require any patch to libpq or modification of the
>> client connection settings.
>
> I think that's a great approach if all the machines are on the same
> subnet.  If they are in different datacenters, it doesn't work.
>
> The amount of opposition to this feature is remarkable considering
> that it's available in Oracle, SQL Server, MongoDB, Cassandra, and
> MySQL.  See for example:
>
> http://docs.mongodb.org/manual/reference/connection-string/
> https://datastax.github.io/python-driver/getting_started.html
>
> This is a small patch with minimal to no downside implementing a
> feature that is present in most or all of the major competing
> products.  We're really doing ourselves a disservice if we reject it.
> I think it would be far better to progress to talking about what
> design we'd be comfortable with, rather than kidding ourselves that a
> feature that everyone else has and which somebody has taken the time
> to implement (thus, obviously it has value for them) and which has
> been discussed to general approval at PGCon developer meetings and
> which has been endorsed on this thread by three committers is somehow
> something that nobody really needs.  Seriously?

+100

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Sep 3, 2015 at 4:00 AM, Shulgin, Oleksandr
> <oleksandr.shulgin@zalando.de> wrote:
> > I believe that having a floating IP for the master is much more practical
> > approach and it doesn't require any patch to libpq or modification of the
> > client connection settings.
>
> I think that's a great approach if all the machines are on the same
> subnet.  If they are in different datacenters, it doesn't work.

Anycast could technically be used to address that issue, but there's a
whole host of reasons why that would be quite painful for a PG
connection.

> I think it would be far better to progress to talking about what
> design we'd be comfortable with, rather than kidding ourselves that a
> feature that everyone else has and which somebody has taken the time
> to implement (thus, obviously it has value for them) and which has
> been discussed to general approval at PGCon developer meetings and
> which has been endorsed on this thread by three committers is somehow
> something that nobody really needs.  Seriously?

Agreed.  For my part, I like the JDBC configuration approach and
definitely would ask that we support 'host:port' options since not all
servers will be on the same port.  I don't agree with Tom's concern
regarding the simultaneous connection to all servers at once (yes, it's
a bit unfriendly, but I don't see that as a reason to not provide that
choice and there's a lot of reasons why you'd want it).

What would be nice is a better way to configure these more complicated
options than the single string or even the current very simple
pg_service.conf file.  For example, a service name which could define
*other* service names to try along with a plan for how to connect to
them (round robin, simultaneously, read/write only, etc) and perhaps
also support specifying multiple service names to the 'service'
parameter.  I'd prefer that we support all different configuration
options through the 'single string' method also, but I'm not convinced
that's a hard requirement.

Thanks!

Stephen

Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Sep 3, 2015 at 11:42 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> > I believe that having a floating IP for the master is much more practical
>> > approach and it doesn't require any patch to libpq or modification of the
>> > client connection settings.
>>
>> I think that's a great approach if all the machines are on the same
>> subnet.  If they are in different datacenters, it doesn't work.
>
> Anycast could technically be used to address that issue, but there's a
> whole host of reasons why that would be quite painful for a PG
> connection.

/me rolls eyes.

>> I think it would be far better to progress to talking about what
>> design we'd be comfortable with, rather than kidding ourselves that a
>> feature that everyone else has and which somebody has taken the time
>> to implement (thus, obviously it has value for them) and which has
>> been discussed to general approval at PGCon developer meetings and
>> which has been endorsed on this thread by three committers is somehow
>> something that nobody really needs.  Seriously?
>
> Agreed.  For my part, I like the JDBC configuration approach and
> definitely would ask that we support 'host:port' options since not all
> servers will be on the same port.  I don't agree with Tom's concern
> regarding the simultaneous connection to all servers at once (yes, it's
> a bit unfriendly, but I don't see that as a reason to not provide that
> choice and there's a lot of reasons why you'd want it).

Yep.  And it can even be configurable behavior, as I suggested upthread.

> What would be nice is a better way to configure these more complicated
> options than the single string or even the current very simple
> pg_service.conf file.  For example, a service name which could define
> *other* service names to try along with a plan for how to connect to
> them (round robin, simultaneously, read/write only, etc) and perhaps
> also support specifying multiple service names to the 'service'
> parameter.  I'd prefer that we support all different configuration
> options through the 'single string' method also, but I'm not convinced
> that's a hard requirement.

Maybe someday we should have all that, but I think for right now
that's complicating things unnecessarily.  I think the best proposal
so far is to allow the host=X option to be repeated multiple times.
If you repeat the host=X option N times, you can also repeat the
port=X option exactly N times, or else you can specify it just once.
Done.

Alternatively, leave the host=X option alone and add a new option
hostlist=X, allowing a comma-separated list of names or IPs, with each
hostname or IP allowed an optional :port suffix.  If host=X parameter
is omitted or the connection to that machine fails, try everybody in
the hostlist concurrently, or with some configurable (and presumably
short) delay between one and then next.  Again, done.

Alternatively, change the rules for parsing the existing host=X
parameter so that we split it on some separator that isn't a valid
hostname character, and then strip off an optional :port syntax from
each entry; that value, if present, overrides port=X for that entry.

I think we're really tying ourselves in knots about problems that
really aren't very hard to solve here.  I'm sure some of these
proposals are better than others and the idea thing may be something
else again.  But if NASA can send a space probe 7.5 billion kilometers
to a frigid spheroid in the outer solar system without crashing into
anything or having any catastrophic software or hardware failures, I
bet we can come up with a convenient way to specify multiple IP
addresses.  I'd like the story of this feature to resemble a work by
e.e. cummings more than it does one by Robert Jordan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Robert Haas wrote:

> Alternatively, change the rules for parsing the existing host=X
> parameter so that we split it on some separator that isn't a valid
> hostname character, and then strip off an optional :port syntax from
> each entry; that value, if present, overrides port=X for that entry.

: is not a valid character in hostnames, so I think this makes sense.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Thu, Sep 3, 2015 at 6:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Maybe someday we should have all that, but I think for right now
that's complicating things unnecessarily.  I think the best proposal
so far is to allow the host=X option to be repeated multiple times.
If you repeat the host=X option N times, you can also repeat the
port=X option exactly N times, or else you can specify it just once.
Done.

But this already breaks backwards-compatibility with any clients who belief that whatever value specified the latest takes precedence.  I'm not arguing that there are such use cases in the wild or that it's entirely sane thing to do, but still.

More importantly, this will break any code that tries to parse the conninfo string and produce a hashmap from it for modification.

Alternatively, leave the host=X option alone and add a new option
hostlist=X, allowing a comma-separated list of names or IPs, with each
hostname or IP allowed an optional :port suffix.  If host=X parameter
is omitted or the connection to that machine fails, try everybody in
the hostlist concurrently, or with some configurable (and presumably
short) delay between one and then next.  Again, done.

The exact behavior in case of both host/port and hostlist are specified becomes really tricky then.  It's already tricky enough, if you recall the service files -- how are they going to come into play here?

I believe the less there are implicit workings in the way libpq connects, the better.

Alternatively, change the rules for parsing the existing host=X
parameter so that we split it on some separator that isn't a valid
hostname character, and then strip off an optional :port syntax from
each entry; that value, if present, overrides port=X for that entry.

It's tempting to use ':' as the separator here, but it's still valid for directory names and host can be one in case of UN*X sockets.

--
Alex

Re: Proposal: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Shulgin, Oleksandr wrote:

> > Alternatively, change the rules for parsing the existing host=X
> > parameter so that we split it on some separator that isn't a valid
> > hostname character, and then strip off an optional :port syntax from
> > each entry; that value, if present, overrides port=X for that entry.
> 
> It's tempting to use ':' as the separator here, but it's still valid for
> directory names and host can be one in case of UN*X sockets.

I think that's rare enough that we could just say that if you want to
have a : in a directory name used for local connections, you have to
escape the : character.  This is going to be pretty easy to detect as a
problem because of the obvious error message ("cannot parse "pg" in
/usr/sockets:pg as a port number"), except in the even rarer case that
the only stuff after the colon is digits.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: Implement failover on libpq connect level.

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> Alternatively, change the rules for parsing the existing host=X
> parameter so that we split it on some separator that isn't a valid
> hostname character, and then strip off an optional :port syntax from
> each entry; that value, if present, overrides port=X for that entry.

Using a ':' should work just fine for that.  Having only one option for
how all the connections are done (concurrently, round robin, etc) and an
option for the timeout works for now.

Thanks!

Stephen

Re: Proposal: Implement failover on libpq connect level.

From
Christopher Browne
Date:
On 3 September 2015 at 12:57, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote
>
> On Thu, Sep 3, 2015 at 6:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>
>> Maybe someday we should have all that, but I think for right now
>> that's complicating things unnecessarily.  I think the best proposal
>> so far is to allow the host=X option to be repeated multiple times.
>> If you repeat the host=X option N times, you can also repeat the
>> port=X option exactly N times, or else you can specify it just once.
>> Done.
>
>
> But this already breaks backwards-compatibility with any clients who belief that whatever value specified the latest takes precedence.  I'm not arguing that there are such use cases in the wild or that it's entirely sane thing to do, but still.
>
> More importantly, this will break any code that tries to parse the conninfo string and produce a hashmap from it for modification.

The notion of an "ordered hashmap" makes me break out in hives...

>> Alternatively, leave the host=X option alone and add a new option
>> hostlist=X, allowing a comma-separated list of names or IPs, with each
>> hostname or IP allowed an optional :port suffix.  If host=X parameter
>> is omitted or the connection to that machine fails, try everybody in
>> the hostlist concurrently, or with some configurable (and presumably
>> short) delay between one and then next.  Again, done.
>
>
> The exact behavior in case of both host/port and hostlist are specified becomes really tricky then.  It's already tricky enough, if you recall the service files -- how are they going to come into play here?
>
> I believe the less there are implicit workings in the way libpq connects, the better.

In that case, let's have a New Option, and expressly break with the implicit bits.

The new option ONLY accepts URIs, but allows it to be submitted multiple times.

psql --uri postgresql://postgres@favehost:5432/some_db_name --uri postgresql://postgres@favehost:5432/another_db_name --uri postgresql://postgres@favehost:5432/third_db_name --uri postgresql://postgres@favehost:5432/fourth_backup_db

Parsing conninfo strings is no fun.  I'm finding I prefer using URIs.  They may even be easier to parse, not that I have thus far cared; using URIs tends to mean I don't *need* to parse anything.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Proposal: Implement failover on libpq connect level.

From
Stephen Frost
Date:
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Shulgin, Oleksandr wrote:
>
> > > Alternatively, change the rules for parsing the existing host=X
> > > parameter so that we split it on some separator that isn't a valid
> > > hostname character, and then strip off an optional :port syntax from
> > > each entry; that value, if present, overrides port=X for that entry.
> >
> > It's tempting to use ':' as the separator here, but it's still valid for
> > directory names and host can be one in case of UN*X sockets.
>
> I think that's rare enough that we could just say that if you want to
> have a : in a directory name used for local connections, you have to
> escape the : character.  This is going to be pretty easy to detect as a
> problem because of the obvious error message ("cannot parse "pg" in
> /usr/sockets:pg as a port number"), except in the even rarer case that
> the only stuff after the colon is digits.

If we really want to worry about this, we could simply check if the
directory exists with the ':5433' or whatever at the end and, if it
does, use whatever the port specification is.  If that directory doesn't
exist, and one without the ':5433' does, then we try that directory and
that port.

Personally, I agree with Alvaro that it's really just overkill to worry
about though.

Thanks!

Stephen

Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Sep 3, 2015 at 12:57 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Thu, Sep 3, 2015 at 6:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Maybe someday we should have all that, but I think for right now
>> that's complicating things unnecessarily.  I think the best proposal
>> so far is to allow the host=X option to be repeated multiple times.
>> If you repeat the host=X option N times, you can also repeat the
>> port=X option exactly N times, or else you can specify it just once.
>> Done.
>
> But this already breaks backwards-compatibility with any clients who belief
> that whatever value specified the latest takes precedence.  I'm not arguing
> that there are such use cases in the wild or that it's entirely sane thing
> to do, but still.

Yep.  If we care about backward compatibility, there can be a new
option that must be specified to get the new behavior.  We can also
decide not to care about this case.

> More importantly, this will break any code that tries to parse the conninfo
> string and produce a hashmap from it for modification.

That is true, but I am not sure I agree that it is important.  Switch
to a hashmap whose values are arrays.

>> Alternatively, leave the host=X option alone and add a new option
>> hostlist=X, allowing a comma-separated list of names or IPs, with each
>> hostname or IP allowed an optional :port suffix.  If host=X parameter
>> is omitted or the connection to that machine fails, try everybody in
>> the hostlist concurrently, or with some configurable (and presumably
>> short) delay between one and then next.  Again, done.
>
> The exact behavior in case of both host/port and hostlist are specified
> becomes really tricky then.  It's already tricky enough, if you recall the
> service files -- how are they going to come into play here?

It doesn't seem that tricky to me, but maybe I'm biased by having just
invented it 5 minutes ago.

> I believe the less there are implicit workings in the way libpq connects,
> the better.

I don't disagree with that as a general rule - only when it keeps us
from implementing useful features.

>>> Alternatively, change the rules for parsing the existing host=X
>> parameter so that we split it on some separator that isn't a valid
>> hostname character, and then strip off an optional :port syntax from
>> each entry; that value, if present, overrides port=X for that entry.
>
> It's tempting to use ':' as the separator here, but it's still valid for
> directory names and host can be one in case of UN*X sockets.

The directory name is only likely to contain : on Windows, and Windows
doesn't support UNIX sockets.

All of these objections seem pretty thin to me.  I'd accept any of
them as a reason for preferring one alternative over another, but I
don't accept that the presence of a few problems of this magnitude
means we should give up on the feature.  It's a good enough feature
that it is worth the possibility of slightly inconveniencing someone
running in an unusual configuration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
<p dir="ltr"><br /> On Sep 3, 2015 7:30 PM, "Robert Haas" <<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br /> ><br /> > All of these objections
seempretty thin to me.  I'd accept any of<br /> > them as a reason for preferring one alternative over another, but
I<br/> > don't accept that the presence of a few problems of this magnitude<br /> > means we should give up on
thefeature.  It's a good enough feature<br /> > that it is worth the possibility of slightly inconveniencing
someone<br/> > running in an unusual configuration.<p dir="ltr">I give up.<p dir="ltr">Though I still don't see any
compellingreason for this to be in libpq itself. By the way, what about mixing conninfo and uris - should this not be
allowed?<pdir="ltr">-<br /> Alex 

Re: Proposal: Implement failover on libpq connect level.

From
Bruce Momjian
Date:
On Thu, Sep  3, 2015 at 10:56:42AM -0400, Robert Haas wrote:
> The amount of opposition to this feature is remarkable considering
> that it's available in Oracle, SQL Server, MongoDB, Cassandra, and
> MySQL.  See for example:
> 
> http://docs.mongodb.org/manual/reference/connection-string/
> https://datastax.github.io/python-driver/getting_started.html
> 
> This is a small patch with minimal to no downside implementing a
> feature that is present in most or all of the major competing
> products.  We're really doing ourselves a disservice if we reject it.
> I think it would be far better to progress to talking about what
> design we'd be comfortable with, rather than kidding ourselves that a
> feature that everyone else has and which somebody has taken the time
> to implement (thus, obviously it has value for them) and which has
> been discussed to general approval at PGCon developer meetings and
> which has been endorsed on this thread by three committers is somehow
> something that nobody really needs.  Seriously?

As much as I like to disagree with Robert, I can't in this case.  ;-)

It is annoying for less capable database to say they have high
availability when that just involves having a client library that can
connect to multiple hosts.  Yes, we can do this in DNS, but that is all
happening at a different layer.

Now, the counter-argument is that this is the wrong layer to do it, and
we will end up adding tons of configurations variables to libpq to
control this.

We are clearly not adding this just because JDBC has it --- we are
adding it because it allows for more complex server configurations. 
Could this ability be more powerfully done with DNS or a connection
pooler, yes, but not everyone wants that complexity.  For me, this libpq
change has a simple user API with a small amount of code change that
give us a simple solution to a common problem.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Discussion summary: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
First of all, I wish to thank all the participants of the discussion
for their useful suggestions and critics.

I see that people are really interested in this feature.

Now, list of syntax and behavior changes against original proposal:

1. There is similar feature in JDBC, so syntax of URL-style connect
string should be compatible with JDBC:
   postgresql://host:port,host:port/dbname...

2. In the param=value style connect string, there should be way to
specify different non-standard port for each host in the cluster.

So on, it seems that people came to conclusion, that allowing host:port
syntax as value of the host parameter shouldn't break anything.

So, port should be determined following way:
If there is :portnumber in the host string, this portnumber takes
precedence over anything else for this particular host. To avoid
problems with file system objects with strange names, colon is
interpreted as port number separator only if value of the host
parameter doesn't begin with slash. If it begins with slash, it is
handled exactly as it handled now.
If port is not explicitly specified in the host string, then use
parameter port= (for all hosts, which are not accompanied with own port
number), if no port parameter in the connect string, use same chain of
fallbacks up to compiled in standard value, as in use now.

3. Simultaneous connection to all the hosts in cluster shouldn't be
default. By default hosts should be tried in order they are specified
in the connect string. To minimize syntax changes there should be
single parameter 'hostorder' which specifies order of hosts with
possible choices
sequential(default), parallel and random.

4. There should be a way to allow cluster software or system
administrator some time to perform failover procedures and promote one
of standbys to master. So, there should be additional parameter,
failover_timeout, which specifies number of seconds connection attempts
should be retried, if no usable master server found.

5. Parameter readonly (which allows to distinguish between failover and
load balancing of readonly clients) should be implemented as in
original proposal.



--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Proposal: Implement failover on libpq connect level.

From
"Daniel Verite"
Date:
    Victor Wagner wrote:

> It would just take a bit more time for client and a bit more load for
> server - to make sure that this connection is read-write by
> issuing
>
>    show transaction_read_only
>
> statement before considering connection useful.

If the purpose of the feature is to wait for a failover to complete,
shouldn't it check for pg_is_in_recovery() rather than transaction_read_only ?

That's because a database or user can be made read-only-on-connect
on an otherwise read-write instance by issuing ALTER DATABASE dbname SET default_transaction_read_only TO on;
The same for a user with ALTER USER.

In that case,  transaction_read_only  would be OFF after connecting,
both on the master and on a slave, independantly of any failover
in progress or finished or not having occurred at all.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Proposal: Implement failover on libpq connect level.

From
Victor Wagner
Date:
В Mon, 07 Sep 2015 17:32:48 +0200
"Daniel Verite" <daniel@manitou-mail.org> пишет:

>     Victor Wagner wrote:
>
> > It would just take a bit more time for client and a bit more load
> > for server - to make sure that this connection is read-write by
> > issuing
> >
> >    show transaction_read_only
> >
> > statement before considering connection useful.
>
> If the purpose of the feature is to wait for a failover to complete,
> shouldn't it check for pg_is_in_recovery() rather than
>  transaction_read_only ?
>

Purpose of this feature is to distinguish between master and standby
servers.

This allows failover system to work with standby servers accepting
client connections, and even to create system where read-only clients
can be loadbalanced among several hot backup servers, and read-write
clients work with master, but do not need reconfiguration when
failover happens.

pg_is_in_recovery() is really better. But it seems that chapter 25 of
documentation should be improved and this function mentioned in the
section 25.5.1 (Hot Standby / User Overview)



--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Proposal: Implement failover on libpq connect level.

From
Kevin Grittner
Date:
Bruce Momjian <bruce@momjian.us> wrote:

> It is annoying for less capable database to say they have high
> availability when that just involves having a client library that
> can connect to multiple hosts.

This sounds like the "But all the *other* kids are doing it!"
argument, which comes up often.  We generally resist doing
something solely on that basis, so the rest of the email is really
what matters, I think, much as this does gall.

> Yes, we can do this in DNS, but that is all happening at a
> different layer.

More than that, there are technical reasons that can be a bad
solution.  As just one example, the servers might well be in
different domains.

> Now, the counter-argument is that this is the wrong layer to do
> it, and we will end up adding tons of configurations variables to
> libpq to control this.

Yeah, we definitely *don't* want to implement some sort of failover
manager in every connector -- that way madness lies.

> We are clearly not adding this just because JDBC has it --- we
> are adding it because it allows for more complex server
> configurations.

I think what we need is a clear description of use cases where we
think this is the solution, and some clear boundaries to the scope
-- so it is also clear what kinds of problems this is *not*
intended to solve.

> Could this ability be more powerfully done with DNS or a
> connection pooler, yes, but not everyone wants that complexity.
> For me, this libpq change has a simple user API with a small
> amount of code change that give us a simple solution to a common
> problem.

I'm not saying we shouldn't have something like this; but we need a
clear definition of that common problem we are solving.  I don't
think I've seen that yet.  I've seen various spins on solutions 
described, from which I can infer various possible problems; but to 
pick the best version of this as *the* solution I think we need a 
clear statement of the problem itself.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Sep 8, 2015 at 9:29 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I'm not saying we shouldn't have something like this; but we need a
> clear definition of that common problem we are solving.  I don't
> think I've seen that yet.  I've seen various spins on solutions
> described, from which I can infer various possible problems; but to
> pick the best version of this as *the* solution I think we need a
> clear statement of the problem itself.

I think the problem we should be trying to solve is: Given a set of
server IPs, connect to one that is up.

I believe this comes up in several different scenarios.

Example #1: I need to move my database server to a different IP
address.  I could have clients connect by name, but using DNS sucks
for the reasons already discussed.  I could move the server and then
change all the client connect strings afterwards, but then I'm bound
to be down for longer than necessary.  Instead, I'll change all of my
connect strings FIRST, to specify that either IP is acceptable.  Then
I'll move the server to the new IP, and all of them will quickly find
that the old IP is down and the new IP is up and start using that
instead.  At leisure, I can remove the old IP from connect strings
(and then re-purpose that IP).

Example #2: I am using EnterpriseDB's xDB multi-master replication, or
2ndQuadrant's BDR, to replicate among geographically distributed
database servers.  I want clients to connect to their local server,
but if it is down, I want them to connect to one of the other masters.
Connecting to the local server first minimizes replication conflicts,
since most transactions on a given application server will be for data
local to that geography, but being willing to fall back to some other
server maximizes availability when my local server goes down.

Example #3: I have a master and and 3 SR standbys, all on different
subnets.  Periodically, I fail over, so that the master role moves
around.  Each server has an IP which can be used for read-only
connections.  Each also has a virtual IP which is up when it is the
write master and down when it is a standby.  Read-only queries are 90%
of my traffic, and eventual consistency is fine.  So, for a read
query, I want to pick among the IPs that are up; for write IPs, I want
to find the one place that writes can be performed, but that might be
any of 4 virtual IPs.

I'm sure there are more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Kevin Grittner
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> I think the problem we should be trying to solve is: Given a set
> of server IPs, connect to one that is up.
>
> I believe this comes up in several different scenarios.
>
> Example #1: [single server; changing IP address gracefully]
>
> Example #2: [xDB/BDR client uses local master if up; else a remote]
>
> Example #3: [SR/HS with changing primary]

For all of those it is clear that we do not need (or want!)
heartbeat, STONITH, fencing, etc. to be handled by the connector.
If the above are the sorts of problems we are trying to solve, a
very simple solution is the best.  I know you outlined several; I'm
not sure it would matter all that much which one we used -- any
would work and someone should Just Do It.

> I'm sure there are more.

Ay, there's the rub.  Some people seemed to be suggesting that this
should be far more than what you describe above.  That would, IMO,
be a mistake.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Sep 9, 2015 at 4:30 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think the problem we should be trying to solve is: Given a set
>> of server IPs, connect to one that is up.
>>
>> I believe this comes up in several different scenarios.
>>
>> Example #1: [single server; changing IP address gracefully]
>>
>> Example #2: [xDB/BDR client uses local master if up; else a remote]
>>
>> Example #3: [SR/HS with changing primary]
>
> For all of those it is clear that we do not need (or want!)
> heartbeat, STONITH, fencing, etc. to be handled by the connector.
> If the above are the sorts of problems we are trying to solve, a
> very simple solution is the best.  I know you outlined several; I'm
> not sure it would matter all that much which one we used -- any
> would work and someone should Just Do It.
>
>> I'm sure there are more.
>
> Ay, there's the rub.  Some people seemed to be suggesting that this
> should be far more than what you describe above.  That would, IMO,
> be a mistake.

It sounds like we are pretty much on the same page.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:

> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
>
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
>
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).
>

Attached patch which implements client library failover and
loadbalancing as was described in the proposal
<20150818041850.GA5092@wagner.pp.ru>.

This patch implements following fuctionality:

1. It is allowed to specify several hosts in the connect string, either
in URL-style (separated by comma) or in param=value form (several host
parameters).

2. Each host parameter can be accompanied with port specifier separated
by colon. Port, specified such way takes precedence of port parameter or
default port for this particular host only.

3. There is hostorder parameter with two possible valies 'sequential'
and 'random' (default sequential). 'parallel' hostorder described in the
proposal is not yet implemented in this version of patch.

4. In the URL-style connect string parameter loadBalanceHosts=true is
considered equal to 'hostorder=random' for compatibility with jdbc.

5. Added new parameter readonly=boolean. If this parameter is false (the
default) upon successful connection check is performed that backend is
not in the recovery state. If so, connection is not considered usable
and next host is tried.

6. Added new parameter falover_timeout. If no usable host is found and
this parameter is specified, hosts are retried cyclically until this
timeout expires. It gives some time for claster management software to
promote one of standbys to master if master fails. By default there is
no retries.

Some implementation notes:

1. Field  pghost in the PGconn structure now stores comma separated list
of hosts, which is parsed in the connectDBStart. So, expected results of
some tests in src/interfaces/libpq/test are changed.

2. Url with colon  but no port number after the host no more considered
valid.

3. With hostorder=random we have to seed libc random number gernerator.
Some care was taken to not to lose entropy if generator was
initialized by app before connection, and to ensure different random
values if several connections are made from same application in one
second (even in single thread). RNG is seeded by xor of current time,
random value from this rng before seeding (if it was seeded earlier, it
keeps entropy) and address of the connection structure. May be it worth
effort adding thread id or pid, but there is no portable way to doing
so, so it would need testing on all supported platform.




Attachment

Re: Patch: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Wed, Oct 14, 2015 at 12:41 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:

> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
>
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
>
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).
>

Attached patch which implements client library failover and
loadbalancing as was described in the proposal
<20150818041850.GA5092@wagner.pp.ru>.

This patch implements following fuctionality:

1. It is allowed to specify several hosts in the connect string, either
in URL-style (separated by comma) or in param=value form (several host
parameters).

So what exactly would happen with this command: PGHOST=host1 psql -h host2

Or this one: PGHOST=host1 psql host=host2

What about service files?

2. Each host parameter can be accompanied with port specifier separated
by colon. Port, specified such way takes precedence of port parameter or
default port for this particular host only.

3. There is hostorder parameter with two possible valies 'sequential'
and 'random' (default sequential). 'parallel' hostorder described in the
proposal is not yet implemented in this version of patch.

4. In the URL-style connect string parameter loadBalanceHosts=true is
considered equal to 'hostorder=random' for compatibility with jdbc.

5. Added new parameter readonly=boolean. If this parameter is false (the
default) upon successful connection check is performed that backend is
not in the recovery state. If so, connection is not considered usable
and next host is tried.

6. Added new parameter falover_timeout. If no usable host is found and
this parameter is specified, hosts are retried cyclically until this
timeout expires. It gives some time for claster management software to
promote one of standbys to master if master fails. By default there is
no retries.

Some implementation notes:

1. Field  pghost in the PGconn structure now stores comma separated list
of hosts, which is parsed in the connectDBStart. So, expected results of
some tests in src/interfaces/libpq/test are changed.

 trying postgresql://[::1]:12345/db
-dbname='db' host='::1' port='12345' (inet)
+dbname='db' host='[::1]:12345' (inet) 

Such a change doesn't look really nice to me.


2. Url with colon  but no port number after the host no more considered
valid.

We could live with that, but is there a good reason for backward compatibility break in this case?

--
Alex

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.10.14 at 14:47:46 +0200, Shulgin, Oleksandr wrote:

> 
> So what exactly would happen with this command: PGHOST=host1 psql -h host2
> 
> Or this one: PGHOST=host1 psql host=host2

The right thing - value of the PGHOST environment variable is ignored,
and explicitely specified host is used.

> What about service files?

Oops, I've missed something here. Multiple host lines in the service
file do not produce list of alternate hosts.

I'll address this case in the next version of patch.

However, comma-separated list of host in the one host string in the
service file works.


> 
>  trying postgresql://[::1]:12345/db
> -dbname='db' host='::1' port='12345' (inet)
> +dbname='db' host='[::1]:12345' (inet)
> 
> Such a change doesn't look really nice to me.

Really, any alternative would be worse. Only alternative I can imagine
is to replace pghost field in the PGConn structure by array or linked
list of structures with host and port fields.

Now I decided to keep such array local to connectDBStart function, and
store list of host-port pair as string. 

It takes some CPU cycles to reparse this list each time when
reconnection is needed, but this is negligible compared with DNS
resolution of each host name. And we cannot avoid repeated DNS
resolution, because we have to take into account possibility of
DNS-based failover.


> > 2. Url with colon  but no port number after the host no more considered
> > valid.
> >
> 
> We could live with that, but is there a good reason for backward
> compatibility break in this case?

Because we can. Really, I don't see any useful semantic of host with
colon at the end. Null host name with colon and port have useful
semantic (if there is just one host in the  connect string), so it kept
as it was.

In the old version of connection-string parsing code stripping lone
colon of the host name was done by generic code (because port number
have gone to another variable anyway). Now handling of lone
colon is special case, because we just check syntax of host-port pair
list. 

And I have to choose how to handle it - either complain or silently
strip it away. Complaining was similier. We compute port number and
check if it is between 1 and 65535. Zero-length port number is not
integer between 1 and 65535.


> --
> Alex

--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Craig Ringer
Date:
On 14 October 2015 at 18:41, Victor Wagner <vitus@wagner.pp.ru> wrote:

> 5. Added new parameter readonly=boolean. If this parameter is false (the
> default) upon successful connection check is performed that backend is
> not in the recovery state. If so, connection is not considered usable
> and next host is tried.

What constitutes "failed" as far as this is concerned?

Like the PgJDBC approach I wonder how much this'll handle in practice
and how it'll go with less clear-cut failures like disk-full on a
replica that's a member of the failover pool, long delays before
no-route-to-host errors, dns problems, etc.

Had any chance to simulate network failures?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.10.15 at 17:13:46 +0800, Craig Ringer wrote:

> On 14 October 2015 at 18:41, Victor Wagner <vitus@wagner.pp.ru> wrote:
> 
> > 5. Added new parameter readonly=boolean. If this parameter is false (the
> > default) upon successful connection check is performed that backend is
> > not in the recovery state. If so, connection is not considered usable
> > and next host is tried.
> 
> What constitutes "failed" as far as this is concerned?

failed means "select pg_is_in_recovery() returns true". 
The only reason of adding this parameter is to state that we 
can use  connection to warm-backup slave (or if we cannot).

> Like the PgJDBC approach I wonder how much this'll handle in practice
> and how it'll go with less clear-cut failures like disk-full on a
> replica that's a member of the failover pool, long delays before
> no-route-to-host errors, dns problems, etc.

Really I don't think that disk-full and other such node errors are
problems of client library. It is cluster management software which
should check for these errors and shut down erroneous nodes or at least
disable their listening for connection.

As for long-timeouts on network failures, they can be handled by
existing connect_timeout parameter. 

May be it is worth effort to rewrite this parameter handling. Now it
only used in blocking functions which call connectDBComplete
(PQconnectdb[Params], PQreset and PQsetdbLogin). May be it should be hidden into
PQconnectPoll state machine, so event-driven applications could get its
handling for free.

Really, my initial proposal included hostorder=parallel especially for
this situation. But I've decided not to implement it right now, because
it requires changes in the libpq public API.
> Had any chance to simulate network failures?

Testing failover code is really tricky thing. All testing I've done yet
was in the mode "Shudown one of the nodes and see what happens".
It is not too hard to simulate network failures using iptables, but
requires considerable planning of the test scenario.


--                                   Victor Wagner <vitus@wagner.pp.ru>



Patch (2): Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:

>
> Attached patch which implements client library failover and
> loadbalancing as was described in the proposal
> <20150818041850.GA5092@wagner.pp.ru>.
>

I'm sending imporoved verison of patch. As Olexander Shulgin noted,
previous version of patch lacks support for service files.

Now support for service files is implemented and multiple host
statements in the service file are allowed.
--
                                   Victor Wagner <vitus@wagner.pp.ru>

Attachment

Re: Patch (2): Implement failover on libpq connect level.

From
Thom Brown
Date:
On 21 October 2015 at 10:07, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
>
>>
>> Attached patch which implements client library failover and
>> loadbalancing as was described in the proposal
>> <20150818041850.GA5092@wagner.pp.ru>.
>>
>
> I'm sending imporoved verison of patch. As Olexander Shulgin noted,
> previous version of patch lacks support for service files.
>
> Now support for service files is implemented and multiple host
> statements in the service file are allowed.

This is causing breakage:

$ pg_basebackup -v -x -D standby1 -h localhost -p 5532 -U rep_user
row number 0 is out of range 0..-1
Segmentation fault

The log contains this:

2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:  received
replication command: SELECT pg_is_in_recovery()
2015-10-22 13:51:41.380 BST - rep_user - [unknown]ERROR:  syntax error
2015-10-22 13:51:41.380 BST - rep_user - [unknown]LOG:  could not
receive data from client: Connection reset by peer
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:  unexpected
EOF on client connection
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:
shmem_exit(0): 1 before_shmem_exit callbacks to make
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:
shmem_exit(0): 7 on_shmem_exit callbacks to make
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:
proc_exit(0): 3 callbacks to make
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:  exit(0)
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:
shmem_exit(-1): 0 before_shmem_exit callbacks to make
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:
shmem_exit(-1): 0 on_shmem_exit callbacks to make
2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG:
proc_exit(-1): 0 callbacks to make
2015-10-22 13:51:41.380 BST -  - DEBUG:  reaping dead processes
2015-10-22 13:51:41.380 BST -  - DEBUG:  server process (PID 24303)
exited with exit code 0

-- 
Thom



Re: Patch (2): Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Thu, 22 Oct 2015 14:33:11 +0100
Thom Brown <thom@linux.com> wrote:

> On 21 October 2015 at 10:07, Victor Wagner <vitus@wagner.pp.ru> wrote:
> > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
> >
> >>
> >> Attached patch which implements client library failover and
> >> loadbalancing as was described in the proposal
> >> <20150818041850.GA5092@wagner.pp.ru>.
> >>
> >
> > I'm sending imporoved verison of patch. As Olexander Shulgin noted,
> > previous version of patch lacks support for service files.
> >
> > Now support for service files is implemented and multiple host
> > statements in the service file are allowed.
> 
> This is causing breakage:
> 
> $ pg_basebackup -v -x -D standby1 -h localhost -p 5532 -U rep_user
> row number 0 is out of range 0..-1

It seems that pg_basebackup should always specify readonly attribute
for the connection. 

Your data directory is named standby1, so I suspect
that you are trying to make backup from read-only standby instance of
the base. 

Can you check if problem persists in your setup with command

pg_basebackup -v -x -D standby1 \ -d "host=localhost port=5532 user=rep_user readonly=1"


> Segmentation fault

Of course, this shouldn't happen even if libpq is severely misused.
And it is almost evident what should be done to fix it.
But making sure that my patch doesn't interfere with operation of
pg_basebackup is a bit more complicated.





Re: Patch (2): Implement failover on libpq connect level.

From
Korry Douglas
Date:


> On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
>
>> Attached patch which implements client library failover and
>> loadbalancing as was described in the proposal
>> <20150818041850.GA5092@wagner.pp.ru>.
>>
> I'm sending imporoved verison of patch. As Olexander Shulgin noted,
> previous version of patch lacks support for service files.
>
> Now support for service files is implemented and multiple host
> statements in the service file are allowed.

A couple of minor nits:

When you call pg_is_in_recovery(), you should schema-qualify the 
function name, just in case some other version of that function exists 
in the search_path.

Also, pg_is_in_recovery() returns a boolean value - PQgetvalue() will 
not return "true" or "false", it will return "t" or "f".

And, you have a bit of garbage in the patch (the patch inserts 
UNIXSOCK_PATH(portstr, portnum, conn->pgunixsocket); in the header 
comment at the top of fe-connect.c).

                        -- Korry



Re: Patch (2): Implement failover on libpq connect level.

From
Thom Brown
Date:
On 23 October 2015 at 12:52, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Thu, 22 Oct 2015 14:33:11 +0100
> Thom Brown <thom@linux.com> wrote:
>
>> On 21 October 2015 at 10:07, Victor Wagner <vitus@wagner.pp.ru> wrote:
>> > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
>> >
>> >>
>> >> Attached patch which implements client library failover and
>> >> loadbalancing as was described in the proposal
>> >> <20150818041850.GA5092@wagner.pp.ru>.
>> >>
>> >
>> > I'm sending imporoved verison of patch. As Olexander Shulgin noted,
>> > previous version of patch lacks support for service files.
>> >
>> > Now support for service files is implemented and multiple host
>> > statements in the service file are allowed.
>>
>> This is causing breakage:
>>
>> $ pg_basebackup -v -x -D standby1 -h localhost -p 5532 -U rep_user
>> row number 0 is out of range 0..-1
>
> It seems that pg_basebackup should always specify readonly attribute
> for the connection.
>
> Your data directory is named standby1, so I suspect
> that you are trying to make backup from read-only standby instance of
> the base.

No, the -D option on pg_basebackup specifies the directory to receive
the backup into, not from.  This is backing up a primary instance.

> Can you check if problem persists in your setup with command
>
> pg_basebackup -v -x -D standby1 \
>   -d "host=localhost port=5532 user=rep_user readonly=1"

Yes, this works:

$ pg_basebackup -v -x -D standby1 -d "host=localhost port=5532
user=rep_user readonly=1"
transaction log start point: 0/2000028 on timeline 1
transaction log end point: 0/2000130
pg_basebackup: base backup completed

Thom



Re: Patch (2): Implement failover on libpq connect level.

From
Victor Wagner
Date:
В Fri, 23 Oct 2015 22:14:56 +0100
Thom Brown <thom@linux.com> пишет:

c> >
> > pg_basebackup -v -x -D standby1 \
> >   -d "host=localhost port=5532 user=rep_user readonly=1"
>
> Yes, this works:
>
> $ pg_basebackup -v -x -D standby1 -d "host=localhost port=5532
> user=rep_user readonly=1"
> transaction log start point: 0/2000028 on timeline 1
> transaction log end point: 0/2000130
> pg_basebackup: base backup completed
>
> Thom

Thanks for your help.

I was unable to reproduce segfault with this situation on Linux x86-64..

I got message:

row number 0 is out of range 0..-1

and connection failed. Can you tell me a bit more of your setup.
At least which architecture you are compiling Postgres for.


--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch (2): Implement failover on libpq connect level.

From
Victor Wagner
Date:
В Fri, 23 Oct 2015 16:02:33 -0400
Korry Douglas <korry.douglas@enterprisedb.com> пишет:

d> > Now support for service files is implemented and multiple host
> > statements in the service file are allowed.
>
> A couple of minor nits:
>
> When you call pg_is_in_recovery(), you should schema-qualify the
> function name, just in case some other version of that function
> exists in the search_path.
>
> Also, pg_is_in_recovery() returns a boolean value - PQgetvalue() will
> not return "true" or "false", it will return "t" or "f".
>
> And, you have a bit of garbage in the patch (the patch inserts
> UNIXSOCK_PATH(portstr, portnum, conn->pgunixsocket); in the header
> comment at the top of fe-connect.c).
>

Thanks, I'd address this issues in the next version of path.



--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch (2): Implement failover on libpq connect level.

From
Robert Haas
Date:
On Fri, Oct 23, 2015 at 4:02 PM, Korry Douglas
<korry.douglas@enterprisedb.com> wrote:
> When you call pg_is_in_recovery(), you should schema-qualify the function
> name, just in case some other version of that function exists in the
> search_path.

I wonder whether it's really a good idea to put this kind of logic
into libpq at all.  I think there was some previous votes against
doing so, and I tend to agree with that viewpoint.  Shouldn't probing
for the state of the connection be the caller's job, not libpq's?  If
somebody wants to write a wrapper function around this that runs this
query after connecting - or any other query - they can do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Patch (3): Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:

> Attached patch which implements client library failover and
> loadbalancing as was described in the proposal
> <20150818041850.GA5092@wagner.pp.ru>.

New version of patch

1. Handles replication connections correctly (i.e doesn't attempt to
check readwrite mode if replication option is on)
2. Some minor improvement recommended by Korry Douglas  in
<562A9259.4060408@enterprisedb.com>

--
                                   Victor Wagner <vitus@wagner.pp.ru>

Attachment

Re: Patch (3): Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Oct 26, 2015 at 3:58 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
>> Attached patch which implements client library failover and
>> loadbalancing as was described in the proposal
>> <20150818041850.GA5092@wagner.pp.ru>.
>
> New version of patch
>
> 1. Handles replication connections correctly (i.e doesn't attempt to
> check readwrite mode if replication option is on)
> 2. Some minor improvement recommended by Korry Douglas  in
> <562A9259.4060408@enterprisedb.com>

Changing the subject breaks threading for anybody using gmail (and
maybe some other mail readers).  Could you try to avoid that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch (2): Implement failover on libpq connect level.

From
Thom Brown
Date:
On 24 October 2015 at 07:32, Victor Wagner <vitus@wagner.pp.ru> wrote:
> В Fri, 23 Oct 2015 22:14:56 +0100
> Thom Brown <thom@linux.com> пишет:
>
> c> >
>> > pg_basebackup -v -x -D standby1 \
>> >   -d "host=localhost port=5532 user=rep_user readonly=1"
>>
>> Yes, this works:
>>
>> $ pg_basebackup -v -x -D standby1 -d "host=localhost port=5532
>> user=rep_user readonly=1"
>> transaction log start point: 0/2000028 on timeline 1
>> transaction log end point: 0/2000130
>> pg_basebackup: base backup completed
>>
>> Thom
>
> Thanks for your help.
>
> I was unable to reproduce segfault with this situation on Linux x86-64..
>
> I got message:
>
> row number 0 is out of range 0..-1
>
> and connection failed. Can you tell me a bit more of your setup.
> At least which architecture you are compiling Postgres for.

Linux 64-bit, 3.11 kernel.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 10/14/15 6:41 AM, Victor Wagner wrote:
> 1. It is allowed to specify several hosts in the connect string, either
> in URL-style (separated by comma) or in param=value form (several host
> parameters).

I'm not fond of having URLs that are not valid URLs according to the
applicable standards.  Because then they can't be parsed or composed by
standard libraries.

Also, this assumes that all the components other than host and port are
the same.  Earlier there was a discussion about why the ports would ever
need to be different.  Well, why can't the database names be different?I could have use for that.

I think you should just accept multiple URLs.



Re: Patch (2): Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 10/24/15 7:55 AM, Robert Haas wrote:
> On Fri, Oct 23, 2015 at 4:02 PM, Korry Douglas
> <korry.douglas@enterprisedb.com> wrote:
>> When you call pg_is_in_recovery(), you should schema-qualify the function
>> name, just in case some other version of that function exists in the
>> search_path.
> 
> I wonder whether it's really a good idea to put this kind of logic
> into libpq at all.  I think there was some previous votes against
> doing so, and I tend to agree with that viewpoint.  Shouldn't probing
> for the state of the connection be the caller's job, not libpq's?  If
> somebody wants to write a wrapper function around this that runs this
> query after connecting - or any other query - they can do so.

Well, earlier there was a debate whether any of this should be in libpq.

I think asking for a read-only or read-write connection would be quite
useful, and there isn't really a lot of room for interpretation that
would lead to a lot of application-specific implementations.

That said, it might still be best to omit this from the first round of
the patch, to simplify the discussion.





Re: Patch: Implement failover on libpq connect level.

From
Christopher Browne
Date:

On 26 October 2015 at 16:25, Peter Eisentraut <peter_e@gmx.net> wrote:
On 10/14/15 6:41 AM, Victor Wagner wrote:
> 1. It is allowed to specify several hosts in the connect string, either
> in URL-style (separated by comma) or in param=value form (several host
> parameters).

I'm not fond of having URLs that are not valid URLs according to the
applicable standards.  Because then they can't be parsed or composed by
standard libraries.

Also, this assumes that all the components other than host and port are
the same.  Earlier there was a discussion about why the ports would ever
need to be different.  Well, why can't the database names be different?
 I could have use for that.

I think you should just accept multiple URLs.

I'd give a "+1" on this...

As an area of new behaviour, I don't see a big problem with declining to
support every wee bit of libpq configuration, and instead requiring the
use of URLs.

Trying to put "multiplicities" into each parameter (and then considering
it at the pg_service level, too) is WAY more complicated, and for a
feature where it seems to me that it is pretty reasonable to have a
series of fully qualified URLs.

Specifying several URLs should be easier to understand, easier to
test, easier to code, and easier to keep from blowing up badly.

I'll observe that this is the way that OpenLDAP supports specifying
multiple clients, so this technique is familiar in other contexts
where people are trying to accomplish the same kind of thing.
Sample docs, perhaps not authoritative, but useful enough...
<http://manpages.courier-mta.org/htmlman5/ldap.conf.5.html>
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Patch: Implement failover on libpq connect level.

From
"Shulgin, Oleksandr"
Date:
On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne <cbbrowne@gmail.com> wrote:

On 26 October 2015 at 16:25, Peter Eisentraut <peter_e@gmx.net> wrote:
On 10/14/15 6:41 AM, Victor Wagner wrote:
> 1. It is allowed to specify several hosts in the connect string, either
> in URL-style (separated by comma) or in param=value form (several host
> parameters).

I'm not fond of having URLs that are not valid URLs according to the
applicable standards.  Because then they can't be parsed or composed by
standard libraries.

Also, this assumes that all the components other than host and port are
the same.  Earlier there was a discussion about why the ports would ever
need to be different.  Well, why can't the database names be different?
 I could have use for that.

I think you should just accept multiple URLs.

I'd give a "+1" on this...

As an area of new behaviour, I don't see a big problem with declining to
support every wee bit of libpq configuration, and instead requiring the
use of URLs.

Trying to put "multiplicities" into each parameter (and then considering
it at the pg_service level, too) is WAY more complicated, and for a
feature where it seems to me that it is pretty reasonable to have a
series of fully qualified URLs.

Specifying several URLs should be easier to understand, easier to
test, easier to code, and easier to keep from blowing up badly.

Setting aside all other concerns, have a +1 from me on that too.

--
Alex

Re: Patch: Implement failover on libpq connect level.

From
Josh Berkus
Date:
On 10/27/2015 01:42 AM, Shulgin, Oleksandr wrote:
> On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne <cbbrowne@gmail.com
> <mailto:cbbrowne@gmail.com>> wrote:
> 
> 
>     On 26 October 2015 at 16:25, Peter Eisentraut <peter_e@gmx.net
>     <mailto:peter_e@gmx.net>> wrote:
> 
>         On 10/14/15 6:41 AM, Victor Wagner wrote:
>         > 1. It is allowed to specify several hosts in the connect string, either
>         > in URL-style (separated by comma) or in param=value form (several host
>         > parameters).
> 
>         I'm not fond of having URLs that are not valid URLs according to the
>         applicable standards.  Because then they can't be parsed or
>         composed by
>         standard libraries.
> 
>         Also, this assumes that all the components other than host and
>         port are
>         the same.  Earlier there was a discussion about why the ports
>         would ever
>         need to be different.  Well, why can't the database names be
>         different?
>          I could have use for that.
> 
>         I think you should just accept multiple URLs.
> 
> 
>     I'd give a "+1" on this...
> 
>     As an area of new behaviour, I don't see a big problem with declining to
>     support every wee bit of libpq configuration, and instead requiring the
>     use of URLs.
> 
>     Trying to put "multiplicities" into each parameter (and then considering
>     it at the pg_service level, too) is WAY more complicated, and for a
>     feature where it seems to me that it is pretty reasonable to have a
>     series of fully qualified URLs.
> 
>     Specifying several URLs should be easier to understand, easier to
>     test, easier to code, and easier to keep from blowing up badly.
> 
> 
> Setting aside all other concerns, have a +1 from me on that too.

I'm good with this.  +1


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 26 Oct 2015 16:25:57 -0400
Peter Eisentraut <peter_e@gmx.net> wrote:

> Also, this assumes that all the components other than host and port
> are the same.  Earlier there was a discussion about why the ports
> would ever need to be different.  Well, why can't the database names
> be different? I could have use for that.

Because of way postgresql replication is implemented.

This multihost feature is for clusters. Either for automatic switch
from one cluster node to another when master node fails and one of the
standbys get promoted to master, or for load balancing between hot
standby nodes.

Postgresql allows different replicas listen on different ports, because
each replica has its own postgresql conf, but repication works on the
node level, not on the database level, so all the databases from master
node would be replicated with their original names.




Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 10/28/15 4:18 AM, Victor Wagner wrote:
> On Mon, 26 Oct 2015 16:25:57 -0400
> Peter Eisentraut <peter_e@gmx.net> wrote:
> 
>> Also, this assumes that all the components other than host and port
>> are the same.  Earlier there was a discussion about why the ports
>> would ever need to be different.  Well, why can't the database names
>> be different? I could have use for that.
> 
> Because of way postgresql replication is implemented.

There are multiple types of PostgreSQL replication, and there will be
others in the future.




Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/28/15 4:18 AM, Victor Wagner wrote:
>> On Mon, 26 Oct 2015 16:25:57 -0400
>> Peter Eisentraut <peter_e@gmx.net> wrote:
>>
>>> Also, this assumes that all the components other than host and port
>>> are the same.  Earlier there was a discussion about why the ports
>>> would ever need to be different.  Well, why can't the database names
>>> be different? I could have use for that.
>>
>> Because of way postgresql replication is implemented.
>
> There are multiple types of PostgreSQL replication, and there will be
> others in the future.

That's true, but doesn't allowing every parameter to be multiply
specified greatly increase the implementation complexity for a pretty
marginal benefit?  I think host and IP would hit 98% of the use cases
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Fri, 30 Oct 2015 14:26:45 +0100
Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net>
> wrote:

> 
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit?  I think host and IP would hit 98% of the use cases
> here.

As far as I can tell from the experience of writing this patch, it
would greatly increase complexity.

If there should be only need to have multiple hosts, I could almost
completely incapsulate changes into DNS resolving code (which already
allows to handle several addresses). Need to support different port for
each host already required change of internal storage, and as a
consequence changes in the regression test suite
(src/interfaces/libpq/test/regress.out)

But both host and port are used in the same place - in the connect
system call. If we add possibility to different values per host for some
parameter, such as database name, which should be used significantly
later, i.e. during sending of first protocol message, size of patch
would grow may be twice.




Re: Patch: Implement failover on libpq connect level.

From
Christopher Browne
Date:
<div dir="ltr">On 30 October 2015 at 09:26, Robert Haas <<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br />><br />> On Thu, Oct 29, 2015 at
8:29PM, Peter Eisentraut <<a href="mailto:peter_e@gmx.net">peter_e@gmx.net</a>> wrote:<br />> > On 10/28/15
4:18AM, Victor Wagner wrote:<br />> >> On Mon, 26 Oct 2015 16:25:57 -0400<br />> >> Peter Eisentraut
<<ahref="mailto:peter_e@gmx.net">peter_e@gmx.net</a>> wrote:<br />> >><br />> >>> Also, this
assumesthat all the components other than host and port<br />> >>> are the same.  Earlier there was a
discussionabout why the ports<br />> >>> would ever need to be different.  Well, why can't the database
names<br/>> >>> be different? I could have use for that.<br />> >><br />> >> Because of
waypostgresql replication is implemented.<br />> ><br />> > There are multiple types of PostgreSQL
replication,and there will be<br />> > others in the future.<br />><br />> That's true, but doesn't
allowingevery parameter to be multiply<br />> specified greatly increase the implementation complexity for a
pretty<br/>> marginal benefit?  I think host and IP would hit 98% of the use cases<br />> here.<br /><br />I
thinkit makes the feature WORSE.  I am getting more and more convinced<br />that the Correct Solution is for this
featureto be handled by submitting<br />multiple URIs, and my argument isn't even based on any aspects of<br
/>implementationcomplexity.<br /><br />Take as example the case where I have two database servers I want to<br />be
considered.<br/><br />a) Database with URI<br />   postgresql://<a
href="http://cbbrowne@server-a.example.info:5432/my-first-database">cbbrowne@server-a.example.info:5432/my-first-database</a><br
/><br/>b) Database with URL<br />   postgresql://<a
href="http://robert@server-b.example.info:7032/my-second-database">robert@server-b.example.info:7032/my-second-database</a><br
/><br/>With all the "per-variable multiplicities", this would turn into a combinatorial explosion of combinations, some
2^4,or 16 possible servers, some of which likely don't exist, and others of which aren't proper (e.g. - trying to
connectto database a) using robert's credentials).<br /><br />Possibly some of those combinations are outright
improper.<br/><br />I'm not going to claim it's terribly wise to be doing the cross-credential bit; I'd think it likely
tobe rather dumb for the application to use one user when connecting to one database and another when connecting to
another. But the notion of it being nondeterministic is just awful.<br /><br />I head back to... "the way OpenLDAP does
this"...<br/>  They let you specify multiple LDAP servers...<br />    ldap://<a
href="http://server1.example.info:389">server1.example.info:389</a>ldap://<a
href="http://server2.example.info:389">server2.example.info:389</a><br/>where URIs are separated by whitespace.<br
/><br/>Seems like Best Goodness for Postgres to do something analogous...<br />   postgresql://<a
href="http://cbbrowne@server-a.example.info:5432/my-first-database">cbbrowne@server-a.example.info:5432/my-first-database</a>
postgresql://<a
href="http://robert@server-b.example.info:7032/my-second-database">robert@server-b.example.info:7032/my-second-database</a><br
/><br/>Is it a bit long?  Sure.  But it is unambiguous, and requires *only* whitespace parsing to separate the URIs. 
I'dthink it fine for Postgres to support this via multiple "-d" options; that would be about as good.<br /><br />That
cancover 100% of cases, and I don't see it needing to interact specially with odd bits such as "was that a replication
slot?" You can always choose to shoot yourself in the foot, but this doesn't spin the roulette for you...<br />--<br
/>Whenconfronted by a difficult problem, solve it by reducing it to the<br />question, "How would the Lone Ranger
handlethis?"</div> 

Re: Patch: Implement failover on libpq connect level.

From
David Fetter
Date:
On Fri, Oct 30, 2015 at 11:29:09AM -0400, Christopher Browne wrote:
> On 30 October 2015 at 09:26, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > > On 10/28/15 4:18 AM, Victor Wagner wrote:
> > >> On Mon, 26 Oct 2015 16:25:57 -0400
> > >> Peter Eisentraut <peter_e@gmx.net> wrote:
> > >>
> > >>> Also, this assumes that all the components other than host and port
> > >>> are the same.  Earlier there was a discussion about why the ports
> > >>> would ever need to be different.  Well, why can't the database names
> > >>> be different? I could have use for that.
> > >>
> > >> Because of way postgresql replication is implemented.
> > >
> > > There are multiple types of PostgreSQL replication, and there will be
> > > others in the future.
> >
> > That's true, but doesn't allowing every parameter to be multiply
> > specified greatly increase the implementation complexity for a pretty
> > marginal benefit?  I think host and IP would hit 98% of the use cases
> > here.
> 
> I think it makes the feature WORSE.  I am getting more and more convinced
> that the Correct Solution is for this feature to be handled by submitting
> multiple URIs, and my argument isn't even based on any aspects of
> implementation complexity.
> 
> Take as example the case where I have two database servers I want to
> be considered.
> 
> a) Database with URI
>    postgresql://cbbrowne@server-a.example.info:5432/my-first-database
> 
> b) Database with URL
>    postgresql://robert@server-b.example.info:7032/my-second-database
> 
> With all the "per-variable multiplicities", this would turn into a
> combinatorial explosion of combinations, some 2^4, or 16 possible servers,
> some of which likely don't exist, and others of which aren't proper (e.g. -
> trying to connect to database a) using robert's credentials).
> 
> Possibly some of those combinations are outright improper.

Let's say that the chances of their all both existing and being proper
are close enough to zero not to matter.

> Seems like Best Goodness for Postgres to do something analogous...
>    postgresql://cbbrowne@server-a.example.info:5432/my-first-database
> postgresql://robert@server-b.example.info:7032/my-second-database

Yes.

> Is it a bit long?  Sure.  But it is unambiguous, and requires *only*
> whitespace parsing to separate the URIs.  I'd think it fine for Postgres to
> support this via multiple "-d" options; that would be about as good.

Indeed.  Is there any risk of losing ordering information in the
standard command line processing libraries?

> That can cover 100% of cases, and I don't see it needing to interact
> specially with odd bits such as "was that a replication slot?"  You can
> always choose to shoot yourself in the foot, but this doesn't spin the
> roulette for you...

An evocative image, if not a pretty one.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Patch: Implement failover on libpq connect level.

From
Josh Berkus
Date:
On 10/30/2015 08:29 AM, Christopher Browne wrote:
> I think it makes the feature WORSE.  I am getting more and more convinced
> that the Correct Solution is for this feature to be handled by submitting
> multiple URIs, and my argument isn't even based on any aspects of
> implementation complexity.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 10/30/15 9:26 AM, Robert Haas wrote:
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit?

Well, the way I would have approached is that after all the parsing you
end up with a list of PQconninfoOption arrays and you you loop over that
list and call connectDBStart() until you have a connection that
satisfies you.

I see that the patch doesn't do that and it looks quite messy as a result.




Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Nov 2, 2015 at 12:15 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/30/15 9:26 AM, Robert Haas wrote:
>> That's true, but doesn't allowing every parameter to be multiply
>> specified greatly increase the implementation complexity for a pretty
>> marginal benefit?
>
> Well, the way I would have approached is that after all the parsing you
> end up with a list of PQconninfoOption arrays and you you loop over that
> list and call connectDBStart() until you have a connection that
> satisfies you.
>
> I see that the patch doesn't do that and it looks quite messy as a result.

Hmm, I see.  That approach certainly seems worth considering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On 2015.11.02 at 12:15:48 -0500, Peter Eisentraut wrote:

>> That's true, but doesn't allowing every parameter to be multiply
>> specified greatly increase the implementation complexity for a pretty
>> marginal benefit?  

>Well, the way I would have approached is that after all the parsing you
>end up with a list of PQconninfoOption arrays and you you loop over that
>list and call connectDBStart() until you have a connection that
>satisfies you.

>I see that the patch doesn't do that and it looks quite messy as a
>result.

There are some drawbacks with this approach

1. You are suggesting usage of two nested loops, instead of one straight
loop - outer loop over the URLs in the connect string, inner loop over
the IP addresses each URL resolves into. (This inner loop already
presents there for several versions of the postgres).

2. You are suggesting that it should be possible to specify all options 
separately for each of the fallback hosts. 

But some options control how
next host should be choosen (i.e. use random order for load-balancing
or sequential order for high availability), so they should be specified
only once per connect string.

Some other options are
connection level rather then node-level (i.e. if you are using
replication mode of the connection, there is no sense to specify it for
one host and not specify for other). These options should be better
specified such way, that it would be not possible to use conflicting
values.

Third category of options are specified per-cluster much more often
than per node. But are quite often changed from compiled in default.
Typically there are authentication credentials (even you use
trigger-based replication, user information would probably be
replicated), database name (if you use WAL-level replication), client
encoding (which depends on more on the client than on server), etc.

It would be pain if we would need specify them for each host separately.
Syntax, implemented in the my patch allows even to specify
nonstandard-port once for all hosts (using port= parameter in the query
string) and override it for particular host in the list using colon
syntax.

3. Last, but not least. There is more than one postgresql client
implementation. Apart of libpq we have jdbc, native GO language client
etc. And I think that maintaining compatibility of the connect string
between them is good thing. So, I've modelled syntax of multihost
feature in the libpq URL-style syntax after jdbc syntax.
--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Nov 5, 2015 at 10:12 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> There are some drawbacks with this approach
>
> 1. You are suggesting usage of two nested loops, instead of one straight
> loop - outer loop over the URLs in the connect string, inner loop over
> the IP addresses each URL resolves into. (This inner loop already
> presents there for several versions of the postgres).

This isn't really a problem.

> 2. You are suggesting that it should be possible to specify all options
> separately for each of the fallback hosts.
>
> But some options control how
> next host should be choosen (i.e. use random order for load-balancing
> or sequential order for high availability), so they should be specified
> only once per connect string.

But this seems like a point worthy of consideration.

> Third category of options are specified per-cluster much more often
> than per node. But are quite often changed from compiled in default.
> Typically there are authentication credentials (even you use
> trigger-based replication, user information would probably be
> replicated), database name (if you use WAL-level replication), client
> encoding (which depends on more on the client than on server), etc.
>
> It would be pain if we would need specify them for each host separately.
> Syntax, implemented in the my patch allows even to specify
> nonstandard-port once for all hosts (using port= parameter in the query
> string) and override it for particular host in the list using colon
> syntax.

This, too.

> 3. Last, but not least. There is more than one postgresql client
> implementation. Apart of libpq we have jdbc, native GO language client
> etc. And I think that maintaining compatibility of the connect string
> between them is good thing. So, I've modelled syntax of multihost
> feature in the libpq URL-style syntax after jdbc syntax.

Also this.

I'm not sure what the right decision is here - I can see both sides of
it to some degree.  But I think your points deserve to be taken
seriously.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Craig Ringer
Date:
On 6 November 2015 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote:

>> But some options control how
>> next host should be choosen (i.e. use random order for load-balancing
>> or sequential order for high availability), so they should be specified
>> only once per connect string.
>
> But this seems like a point worthy of consideration.

This makes me think that trying to wedge this into the current API
using a funky connection string format might be a mistake.

Lots of these issues would go away if you could provide more than just
a connstring.

>> Third category of options are specified per-cluster much more often
>> than per node. But are quite often changed from compiled in default.
>
> This, too.

Like this. If you have a global set of connection options, then
per-connection options, it's a lot simpler.

I guess that can be hacked in with a more dramatic change in the
connstring format, otherwise, incorporating subsections or something.
I'm not keen on doing anything like that, but there are all sorts of
options...

"global:[user=fred port=5432] host1:[host=somehost user=user1]
host2:[host=localhost]"


(puts head in brown paper bag)

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Craig Ringer wrote:
> On 6 November 2015 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote:
> 
> >> But some options control how
> >> next host should be choosen (i.e. use random order for load-balancing
> >> or sequential order for high availability), so they should be specified
> >> only once per connect string.
> >
> > But this seems like a point worthy of consideration.
> 
> This makes me think that trying to wedge this into the current API
> using a funky connection string format might be a mistake.
> 
> Lots of these issues would go away if you could provide more than just
> a connstring.

Yes, I agree.  I wonder if the failover aspect couldn't be better
covered by something more powerful than a single URI, such as the
service file format.  Maybe just allow the contents of a service file to
be passed as a "connection string", so that the application/environment
can continue to maintain the connection info as a string somewhere
instead of having to have an actual file.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Fri, Nov 6, 2015 at 10:38 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Craig Ringer wrote:
>> On 6 November 2015 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> >> But some options control how
>> >> next host should be choosen (i.e. use random order for load-balancing
>> >> or sequential order for high availability), so they should be specified
>> >> only once per connect string.
>> >
>> > But this seems like a point worthy of consideration.
>>
>> This makes me think that trying to wedge this into the current API
>> using a funky connection string format might be a mistake.
>>
>> Lots of these issues would go away if you could provide more than just
>> a connstring.
>
> Yes, I agree.  I wonder if the failover aspect couldn't be better
> covered by something more powerful than a single URI, such as the
> service file format.  Maybe just allow the contents of a service file to
> be passed as a "connection string", so that the application/environment
> can continue to maintain the connection info as a string somewhere
> instead of having to have an actual file.

This gets pretty far from the concept of a connection string, which is
supposed to be a sort of universal format for telling anything
PostgreSQL how to find the database server.  For example, psql accepts
a connection string, but you wouldn't want to have to pass the entire
contents of a file, newlines and all, as a command-line argument.
Now, we could design some kind of new connection string format that
would work, like:

psql 'alternatives failovertime=1 (host=192.168.10.49 user=rhaas)
(host=192.168.10.52 user=bob)'

There are a couple of problems with this.  One, it can't be parsed
anything like a normal connection string.  Lots of software will have
to be rewritten to use whatever parser we come up with for the new
format.  Two, it's nothing like the existing JDBC syntax that already
does more or less what people want here.  Three, I think that
implementing it is likely to be an extraordinarily large project.  The
whole data structure that libpq uses to store a parsed connection
string probably needs to change in fairly major ways.

So, I really wonder why we're not happy with the ability to substitute
out just the host and IP.

http://www.postgresql.org/message-id/4DFA5A86.9030301@acm.org
https://jdbc.postgresql.org/documentation/94/connect.html (bottom of page)

Yes, it's possible that some people might need to change some other
parameter when the host or IP changes.  But those people aren't
prohibited from writing their own function that tries to connect to
multiple PostgreSQL servers one after the other using any system
they'd like.  I don't think we really need to cater to every possible
use case in core.  The JDBC feature has been out for several years and
people seem to like it OK - the fact that we can think of things it
doesn't do doesn't make it a bad idea.  I'd rather have a feature that
can do 95% of what people want in the real world next month than 99%
of what people want in 2019, and I'm starting to think we're setting
the bar awfully high.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Craig Ringer
Date:
On 7 November 2015 at 04:59, Robert Haas <robertmhaas@gmail.com> wrote:

> So, I really wonder why we're not happy with the ability to substitute
> out just the host and IP.

I tend to agree. That solves 95% of the problem and doesn't foreclose
solving the other 5% some other way if someone really cares later.

I'd rather not see convoluted and complex connstrings, per my prior
post. The JDBC extended URL style seems good enough.

IMO the biggest problem is that client-side failover is way more
complex than "got -ECONNREFUSED, try next host". I think we're all
focusing on not being able to twiddle arbitrary settings while
ignoring the real problem with client failover, i.e. making it
actually work in the real world.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 9 Nov 2015 15:14:02 +0800
Craig Ringer <craig@2ndquadrant.com> wrote:


> 
> I'd rather not see convoluted and complex connstrings, per my prior
> post. The JDBC extended URL style seems good enough.

It is what my patch implements.
> IMO the biggest problem is that client-side failover is way more
> complex than "got -ECONNREFUSED, try next host". I think we're all
> focusing on not being able to twiddle arbitrary settings while
> ignoring the real problem with client failover, i.e. making it
> actually work in the real world.
> 
I've tried to deal with some of these problems. 

My patch have support for following things:

1. Check whether database instance is in the recovery/standby mode and
try to find another one if so.
2. Let cluster management software to have some time to promote one of
the standbys to master. I.e. there can be failover timeout specified to
allow retry after some time if no working master found.

Really there is room for some improvements in handling of connect
timeout (which became much more important thing when ability to try
next host appears). Now it is handled only by blocking-mode connect
functions, not by async state machine. But I decided to publish patch
without these improvements to get feedback from community.





Re: Patch (3): Implement failover on libpq connect level.

From
Thom Brown
Date:
On 26 October 2015 at 07:58, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
>
>> Attached patch which implements client library failover and
>> loadbalancing as was described in the proposal
>> <20150818041850.GA5092@wagner.pp.ru>.
>
> New version of patch
>
> 1. Handles replication connections correctly (i.e doesn't attempt to
> check readwrite mode if replication option is on)
> 2. Some minor improvement recommended by Korry Douglas  in
> <562A9259.4060408@enterprisedb.com>

This patch doesn't apply.  On line 636, this appears:

<<<<<<< BEGIN MERGE CONFLICT: local copy shown first <<<<<<<<<<<<<<<
+                                       if (value[0] ==  't')
======= COMMON ANCESTOR content follows ============================
+                                       if (strcmp(value, "true") == 0)
======= MERGED IN content follows ==================================
+                                       if (value[0]=='t')
>>>>>>> END MERGE CONFLICT >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Thom



Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 11/6/15 3:59 PM, Robert Haas wrote:
> So, I really wonder why we're not happy with the ability to substitute
> out just the host and IP.

One of my concerns is that the proposed URLs are not valid URLs anymore
that can be parsed or composed with a URL library, which would be sad.

The other issue is that I think it's taking the implementation down the
wrong path.



Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 11/5/15 10:12 AM, Victor Wagner wrote:
> 2. You are suggesting that it should be possible to specify all options 
> separately for each of the fallback hosts. 

I'm not necessarily suggesting that.  I think it could very well be

conn="postgresql://foo postgresql://bar order=random"




Re: Patch (3): Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Tue, 17 Nov 2015 19:42:42 +0000
Thom Brown <thom@linux.com> wrote:

>
> This patch doesn't apply.  On line 636, this appears:


It doesn't apply to current state of the master branch due
to changes in fe-connect.c made by Tom Lane in the commit c40591885.
Unfortunately these changes appears in the source just line by line
with my changes. I'm attaching here updated version of the patch,
applicable to master later that 11-12-2015.



Attachment

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 17, 2015 at 3:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 11/6/15 3:59 PM, Robert Haas wrote:
>> So, I really wonder why we're not happy with the ability to substitute
>> out just the host and IP.
>
> One of my concerns is that the proposed URLs are not valid URLs anymore
> that can be parsed or composed with a URL library, which would be sad.

I agree that's sad.  But compatibility with JDBC is happy, which is
something to think about, at least.

> The other issue is that I think it's taking the implementation down the
> wrong path.

Well, I don't object to the idea of having some kind of a function
that can take a set of connection strings or URLs and try them all.
But then it's not really a connection string any more.  I mean, with
the proposal where we allow the host option to be multiply specified,
you can just plug this feature into any utility whatever that
understands connection strings, and regardless of whether you are
using the classic connection string format or the URL format, it will
work.  And it's a pretty simple extension of what we support already.

psql 'host=foo host=bar failovertime=1'

If we do what you're proposing, then I'm really not clear on how this
works.  Certainly, there can be new libpq functions that take a list
of connection strings or URLs as an argument... but what would this
look like if you wanted to invoke psql this way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Korry Douglas
Date:
> I've tried to deal with some of these problems.
>
> My patch have support for following things:
>
> 1. Check whether database instance is in the recovery/standby mode and
> try to find another one if so.
> 2. Let cluster management software to have some time to promote one of
> the standbys to master. I.e. there can be failover timeout specified to
> allow retry after some time if no working master found.
>
> Really there is room for some improvements in handling of connect
> timeout (which became much more important thing when ability to try
> next host appears). Now it is handled only by blocking-mode connect
> functions, not by async state machine. But I decided to publish patch
> without these improvements to get feedback from community.
A bit of testing on this turns up a problem.

Consider a connection string that specifies two hosts and a read/write 
connection:
  postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0

If the first host is a healthy standby (meaning that I can connect to it 
but pg_is_in_recovery() returns 't'), the state machine will never move 
on to the second host.

The problem seems to be in PQconnectPoll() in the case for 
CONNECTION_AUTH_OK, specifically this code:
  /* We can release the address list now. */  pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
conn->addrlist= NULL;  conn->addr_cur = NULL;
 

That frees up the list of alternative host addresses.  The state machine 
then progresses to CONNECTION_CHECK_RO (which invokes 
pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the response 
from the server).  Since we just connected to a standby replica, 
pg_is_in_recovery() returns 't' and the state changes to 
CONNECTION_NEEDED.  The next call to try_next_address() will fail to 
find a next address because we freed the list in the case for 
CONNECTION_AUTH_OK.

A related issue:  when the above sequence occurs, no error message is 
returned (because the case for CONNECTION_NEEDED thinks "an appropriate 
error message is already set up").

In short, if you successfully connect to standby replica (and specify 
readonly=0), the remaining hosts are ignored, even though one of those 
hosts is a master.

And one comment about the code itself - in connectDBStart(), you've 
added quite a bit of code to parse multiple hosts/ports. I would 
recommend adding a comment that shows the expected format, and then 
choosing better variable names (better than 'p', 'q', and 'r'); perhaps 
the variable names could refer to components of the connection string 
that you are parsing (like 'host', 'port', 'delimiter', ...).  That 
would make the code much easier to read/maintain.

Thanks.

                -- Korry




Re: Patch: Implement failover on libpq connect level.

From
Korry Douglas
Date:


>
>> I've tried to deal with some of these problems.
>>
>> My patch have support for following things:
>>
>> 1. Check whether database instance is in the recovery/standby mode and
>> try to find another one if so.
>> 2. Let cluster management software to have some time to promote one of
>> the standbys to master. I.e. there can be failover timeout specified to
>> allow retry after some time if no working master found.
>>
>> Really there is room for some improvements in handling of connect
>> timeout (which became much more important thing when ability to try
>> next host appears). Now it is handled only by blocking-mode connect
>> functions, not by async state machine. But I decided to publish patch
>> without these improvements to get feedback from community.
> A bit of testing on this turns up a problem.
>
> Consider a connection string that specifies two hosts and a read/write 
> connection:
>
>   postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0
>
> If the first host is a healthy standby (meaning that I can connect to 
> it but pg_is_in_recovery() returns 't'), the state machine will never 
> move on to the second host.
>
> The problem seems to be in PQconnectPoll() in the case for 
> CONNECTION_AUTH_OK, specifically this code:
>
>   /* We can release the address list now. */
>   pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
>   conn->addrlist = NULL;
>   conn->addr_cur = NULL;
>
> That frees up the list of alternative host addresses.  The state 
> machine then progresses to CONNECTION_CHECK_RO (which invokes 
> pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the 
> response from the server).  Since we just connected to a standby 
> replica, pg_is_in_recovery() returns 't' and the state changes to 
> CONNECTION_NEEDED.  The next call to try_next_address() will fail to 
> find a next address because we freed the list in the case for 
> CONNECTION_AUTH_OK.
>
> A related issue:  when the above sequence occurs, no error message is 
> returned (because the case for CONNECTION_NEEDED thinks "an 
> appropriate error message is already set up").
>
> In short, if you successfully connect to standby replica (and specify 
> readonly=0), the remaining hosts are ignored, even though one of those 
> hosts is a master.

A follow-up - the conn->addrlist is also freed when the case for 
CONNECTION_CHECK_RW decides that conn->status != CONNECTION_OK and calls 
closePGConn().

                -- Korry



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 07 Dec 2015 15:26:33 -0500
Korry Douglas <korry.douglas@enterprisedb.com> wrote:


> The problem seems to be in PQconnectPoll() in the case for
> CONNECTION_AUTH_OK, specifically this code:
>
>    /* We can release the address list now. */
>    pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
>    conn->addrlist = NULL;
>    conn->addr_cur = NULL;
> That frees up the list of alternative host addresses.  The state
> machine then progresses to CONNECTION_CHECK_RO (which invokes
> pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the

Thank you for pointing to this problem. I've overlooked it. Probably
I should improve my testing scenario.

I', attaching new version of the patch, which, hopefully, handles
address list freeing correctly.




--
                                   Victor Wagner <vitus@wagner.pp.ru>

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Teodor Sigaev
Date:
Sorry, but there is something wrong with your patch:
% patch -p1 -C < ~/Downloads/libpq-failover-5.patch
....
--------------------------
|diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
...
Hunk #18 succeeded at 2804.
patch: **** malformed patch at line 666: <<<<<<< BEGIN MERGE CONFLICT: local 
copy shown first <<<<<<<<<<<<<<<

Victor Wagner wrote:
> On Mon, 07 Dec 2015 15:26:33 -0500
> Korry Douglas <korry.douglas@enterprisedb.com> wrote:
>
>
>> The problem seems to be in PQconnectPoll() in the case for
>> CONNECTION_AUTH_OK, specifically this code:
>>
>>     /* We can release the address list now. */
>>     pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
>>     conn->addrlist = NULL;
>>     conn->addr_cur = NULL;
>> That frees up the list of alternative host addresses.  The state
>> machine then progresses to CONNECTION_CHECK_RO (which invokes
>> pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the
>
> Thank you for pointing to this problem. I've overlooked it. Probably
> I should improve my testing scenario.
>
> I', attaching new version of the patch, which, hopefully, handles
> address list freeing correctly.
>
>
>
>
>
>
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 21 Dec 2015 17:18:37 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> Sorry, but there is something wrong with your patch:
> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch

Really, somehow broken version of the patch got into message.

Here is correct one.


Attachment

Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Victor Wagner wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev <teodor@sigaev.ru> wrote:
> 
> > Sorry, but there is something wrong with your patch:
> > % patch -p1 -C < ~/Downloads/libpq-failover-5.patch
> 
> Really, somehow broken version of the patch got into message.

It would be good to add a few test cases of this new functionality to
libpq's regress.in file.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Michael Paquier
Date:
On Mon, Dec 21, 2015 at 11:50 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev <teodor@sigaev.ru> wrote:
>
>> Sorry, but there is something wrong with your patch:
>> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch
>
> Really, somehow broken version of the patch got into message.
>
> Here is correct one.

Patch is moved to next CF as you are still working on it.
-- 
Michael



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 21 December 2015 at 14:50, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev <teodor@sigaev.ru> wrote:
>
>> Sorry, but there is something wrong with your patch:
>> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch
>
> Really, somehow broken version of the patch got into message.
>
> Here is correct one.

The segfault issue I originally reported now appears to be resolved.

But now I have another one:

psql
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
-c 'show port'

Segmentation fault

This is where no nodes are available.  If I remove hostorder=random,
or replace it with hostorder=sequential, it doesn't segfault.
However, it then tries to connect to PGHOST on PGPORT repeatedly, even
if I bring up one of the nodes it should be looking for.  Not only
this, but it seems to do it forever if failover_timeout is greater
than 0.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Tue, 19 Jan 2016 14:34:54 +0000
Thom Brown <thom@linux.com> wrote:

> The segfault issue I originally reported now appears to be resolved.
> 
> But now I have another one:
> 
> psql
>
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
> -c 'show port'
> 
> Segmentation fault

This segfault appears to be a simple programming error. 
which comes out only when  we need to retry address list (which happens
when no node is available at the time of first scan)

There is code which prevents random hostorder to connect same node
twice a row
       while (current != NULL)       {           if (current == conn->addr_cur)           {               current =
current->ai_next;          }           count++;           if ((rand()&0xffff) < 0x10000 / count)           {
  choice = current;           }           current = current->ai_next;       }
 


There should be continue;  after first current = current->ai_next;
> This is where no nodes are available.  If I remove hostorder=random,
> or replace it with hostorder=sequential, it doesn't segfault.
> However, it then tries to connect to PGHOST on PGPORT repeatedly, even
> if I bring up one of the nodes it should be looking for.  Not only
> this, but it seems to do it forever if failover_timeout is greater
> than 0.

But this turns out to be more complicated matter, so I'm not ready to
provide next version of patch yet.








Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Tue, 19 Jan 2016 14:34:54 +0000
Thom Brown <thom@linux.com> wrote:

>
> The segfault issue I originally reported now appears to be resolved.
>
> But now I have another one:
>
> psql
>
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
> -c 'show port'

Here is new version of the patch. Now I've reworked hostorder=random and
it seems to work as well as sequential. failover_timeout works too.
I've also found a case when attempt to connect fail when receiving
FATAL message from server which is not properly up yet. So, it is fixed
too.

Addititonally, error messages from all failed connect attempts are not
accumulated now. Only last one is returned.





--
                                   Victor Wagner <vitus@wagner.pp.ru>

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
You're editing the expected file for the libpq-regress thingy, but you
haven't added any new lines to test the new capability.  I think it'd be
good to add some there.  (I already said this earlier in the thread; is
there any reason you ignored it the first time?)

If the test program requires improvement to handle the new stuff, let's
do that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Tue, 19 Jan 2016 14:34:54 +0000
> Thom Brown <thom@linux.com> wrote:
>
>>
>> The segfault issue I originally reported now appears to be resolved.
>>
>> But now I have another one:
>>
>> psql
>>
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
>> -c 'show port'
>
> Here is new version of the patch. Now I've reworked hostorder=random and
> it seems to work as well as sequential. failover_timeout works too.
> I've also found a case when attempt to connect fail when receiving
> FATAL message from server which is not properly up yet. So, it is fixed
> too.
>
> Addititonally, error messages from all failed connect attempts are not
> accumulated now. Only last one is returned.

I can't connect to a standby with the patch applied:

thom@swift:~/Development/test$ psql -p 5531 postgres
psql: thom@swift:~/Development/test$

No error message, nothing in the logs.  I find this is the case with
any standby, but doesn't affect primaries.

So this has broken existing functionality somewhere.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 23 January 2016 at 03:32, Thom Brown <thom@linux.com> wrote:
> On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote:
>> On Tue, 19 Jan 2016 14:34:54 +0000
>> Thom Brown <thom@linux.com> wrote:
>>
>>>
>>> The segfault issue I originally reported now appears to be resolved.
>>>
>>> But now I have another one:
>>>
>>> psql
>>>
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
>>> -c 'show port'
>>
>> Here is new version of the patch. Now I've reworked hostorder=random and
>> it seems to work as well as sequential. failover_timeout works too.
>> I've also found a case when attempt to connect fail when receiving
>> FATAL message from server which is not properly up yet. So, it is fixed
>> too.
>>
>> Addititonally, error messages from all failed connect attempts are not
>> accumulated now. Only last one is returned.
>
> I can't connect to a standby with the patch applied:
>
> thom@swift:~/Development/test$ psql -p 5531 postgres
> psql: thom@swift:~/Development/test$
>
> No error message, nothing in the logs.  I find this is the case with
> any standby, but doesn't affect primaries.
>
> So this has broken existing functionality somewhere.

Okay, I've tested my original complaints, and they appear to be
resolved.  The timeout works fine, the sequential and random orders
behave as expected, and the readonly setting is also working.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Fri, 22 Jan 2016 16:36:15 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> You're editing the expected file for the libpq-regress thingy, but you
> haven't added any new lines to test the new capability.  I think it'd
> be good to add some there.  (I already said this earlier in the
> thread; is there any reason you ignored it the first time?)

I seriously doubt that this program can be used to test new
capabilities.

All it does, it calls PQconninfoParse and than examines some fields of
PGconn structure.

The only reason I've to modify expected output is that I changed usage
of one of these field, and keep there comma-separated list of host:port
pairs instead of just hostname.

Thus contents this field after parsing of some existing URIs is
changed, while semantic of URIs is same.

If I add some new uris, than only thing I can test is that comma is
properly copied from the URI to this field. And may be that some syntax
errors are properly detected.

> If the test program requires improvement to handle the new stuff,
> let's do that.

The only improvement I can think of, is to examine list of the addrinfo
structures into which host list is eventually parsed. But it is quite
problematic, because it depends on many factors which are outside of
our control. 

It stores addresses resolved via OS's name resolver.

For example, if we specify 'localhost' there it can be parsed into one
or to records, depending on presence of IPv6 support.

If we use some other hostname here, we'll rely on internet connectivity
and DNS system. And we cannot ensure that any name to IP mapping will
persist for a long enough time. 

So, I think that new functionality need other approach for testing.
There should be test of real connection to real temporary cluster.
Probably, based on Perl TAP framework which is actively used in the
Postgres recently.
> 



--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 23 January 2016 at 03:32, Thom Brown <thom@linux.com> wrote:
> On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote:
>> On Tue, 19 Jan 2016 14:34:54 +0000
>> Thom Brown <thom@linux.com> wrote:
>>
>>>
>>> The segfault issue I originally reported now appears to be resolved.
>>>
>>> But now I have another one:
>>>
>>> psql
>>>
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
>>> -c 'show port'
>>
>> Here is new version of the patch. Now I've reworked hostorder=random and
>> it seems to work as well as sequential. failover_timeout works too.
>> I've also found a case when attempt to connect fail when receiving
>> FATAL message from server which is not properly up yet. So, it is fixed
>> too.
>>
>> Addititonally, error messages from all failed connect attempts are not
>> accumulated now. Only last one is returned.
>
> I can't connect to a standby with the patch applied:
>
> thom@swift:~/Development/test$ psql -p 5531 postgres
> psql: thom@swift:~/Development/test$
>
> No error message, nothing in the logs.  I find this is the case with
> any standby, but doesn't affect primaries.
>
> So this has broken existing functionality somewhere.

Okay, I've tested this again with additional logging.  Again, I'm just
running "psql -p 5531 postgres", which connects to a standby.  This
immediately exits psql, and the logs show:

2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOG:  00000:
connection received: host=[local]
2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOCATION:
BackendInitialize, postmaster.c:4081
2016-01-24 15:04:59.880 GMT - thom - postgres LOG:  00000: connection
authorized: user=thom database=postgres
2016-01-24 15:04:59.880 GMT - thom - postgres LOCATION:
PerformAuthentication, postinit.c:272
2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000: statement:
SELECT pg_catalog.pg_is_in_recovery()
2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
exec_simple_query, postgres.c:935
2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000: duration: 0.583 ms
2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
exec_simple_query, postgres.c:1164
2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000:
disconnection: session time: 0:00:00.007 user=thom database=postgres
host=[local]
2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
log_disconnections, postgres.c:4458

This shouldn't be checking whether it's a standby.  I also noticed that with:

psql
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535/postgres?hostorder=random&readonly=1'
-c 'show port'

The standby at port 5533 shows in the logs that it's checking whether
it's a standby when it happens to hit it.  Shouldn't this be
unnecessary if we've set "readonly" to 1?  The result of the query
doesn't appear to be useful for anything.

Another thing I've noticed is that the PORT variable (shown by \set)
always shows PGPORT, but I expect it to equal the port of whichever
host we successfully connected to.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 24 January 2016 at 15:30, Thom Brown <thom@linux.com> wrote:
> On 23 January 2016 at 03:32, Thom Brown <thom@linux.com> wrote:
>> On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote:
>>> On Tue, 19 Jan 2016 14:34:54 +0000
>>> Thom Brown <thom@linux.com> wrote:
>>>
>>>>
>>>> The segfault issue I originally reported now appears to be resolved.
>>>>
>>>> But now I have another one:
>>>>
>>>> psql
>>>>
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
>>>> -c 'show port'
>>>
>>> Here is new version of the patch. Now I've reworked hostorder=random and
>>> it seems to work as well as sequential. failover_timeout works too.
>>> I've also found a case when attempt to connect fail when receiving
>>> FATAL message from server which is not properly up yet. So, it is fixed
>>> too.
>>>
>>> Addititonally, error messages from all failed connect attempts are not
>>> accumulated now. Only last one is returned.
>>
>> I can't connect to a standby with the patch applied:
>>
>> thom@swift:~/Development/test$ psql -p 5531 postgres
>> psql: thom@swift:~/Development/test$
>>
>> No error message, nothing in the logs.  I find this is the case with
>> any standby, but doesn't affect primaries.
>>
>> So this has broken existing functionality somewhere.
>
> Okay, I've tested this again with additional logging.  Again, I'm just
> running "psql -p 5531 postgres", which connects to a standby.  This
> immediately exits psql, and the logs show:
>
> 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOG:  00000:
> connection received: host=[local]
> 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOCATION:
> BackendInitialize, postmaster.c:4081
> 2016-01-24 15:04:59.880 GMT - thom - postgres LOG:  00000: connection
> authorized: user=thom database=postgres
> 2016-01-24 15:04:59.880 GMT - thom - postgres LOCATION:
> PerformAuthentication, postinit.c:272
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000: statement:
> SELECT pg_catalog.pg_is_in_recovery()
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
> exec_simple_query, postgres.c:935
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000: duration: 0.583 ms
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
> exec_simple_query, postgres.c:1164
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000:
> disconnection: session time: 0:00:00.007 user=thom database=postgres
> host=[local]
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
> log_disconnections, postgres.c:4458
>
> This shouldn't be checking whether it's a standby.  I also noticed that with:
>
> psql
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535/postgres?hostorder=random&readonly=1'
> -c 'show port'
>
> The standby at port 5533 shows in the logs that it's checking whether
> it's a standby when it happens to hit it.  Shouldn't this be
> unnecessary if we've set "readonly" to 1?  The result of the query
> doesn't appear to be useful for anything.
>
> Another thing I've noticed is that the PORT variable (shown by \set)
> always shows PGPORT, but I expect it to equal the port of whichever
> host we successfully connected to.

Actually, the same goes for the HOST variable, which is currently
showing the list of hosts:ports.

Output of \set variables without patch:

HOST = '127.0.0.1'
PORT = '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'

And with patch:

HOST = '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
PORT = '5488'

They're both wrong, but I'm hoping we can just show the right information here.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Sun, 24 Jan 2016 15:30:22 +0000
Thom Brown <thom@linux.com> wrote:

в
> Okay, I've tested this again with additional logging.  Again, I'm just
> running "psql -p 5531 postgres", which connects to a standby.  This
> immediately exits psql, and the logs show:

> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  00000: statement:
> SELECT pg_catalog.pg_is_in_recovery()
> This shouldn't be checking whether it's a standby.  I also noticed
> that with:

This is, of course, incompatibility with previous behavior. Probably,
I should modify this patch, so it would imply readonly flag if only one
host/port pair is specified in the command line.

Now it does check for standby regardless of number of hosts specified.



--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Sun, 24 Jan 2016 15:58:10 +0000
Thom Brown <thom@linux.com> wrote:


> 
> Output of \set variables without patch:
> 
> HOST = '127.0.0.1'
> PORT =
> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
> 
> And with patch:
> 
> HOST =
> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
> PORT = '5488'
> 
> They're both wrong, but I'm hoping we can just show the right
> information here.

I think we should show right information here, but it is not so simple.

Problem is that I never keep symbolic representation of individual
host/port pair. And when we connect successfully, we have only struct
sockaddr representation of the it, which contain right IP
address, but doesn't contain symbolic host name.

Moreover, one hostname from connect string can produce more than one
addrinfo structures. For example, on the machines with IPv6 support,
'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and
[::1] IPv6, and produces two records. 

So would do any name, which have both A and AAAA records in DNS. And
nothing prevent domain administrator to put more than one A record for
same hostname into DNS zone.


So, it is just same information which can be retrieved from the backend
via 

select inet_client_addr();
select inet_client_port();

What is really interesting for HOST and PORT variable - it is the name
of host and port number used to make actual connection, as they were
specified in the connect string or service file.


> 
> Thom



--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 24 January 2016 at 19:53, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Sun, 24 Jan 2016 15:58:10 +0000
> Thom Brown <thom@linux.com> wrote:
>
>
>>
>> Output of \set variables without patch:
>>
>> HOST = '127.0.0.1'
>> PORT =
>> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>>
>> And with patch:
>>
>> HOST =
>> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>> PORT = '5488'
>>
>> They're both wrong, but I'm hoping we can just show the right
>> information here.
>
> I think we should show right information here, but it is not so simple.
>
> Problem is that I never keep symbolic representation of individual
> host/port pair. And when we connect successfully, we have only struct
> sockaddr representation of the it, which contain right IP
> address, but doesn't contain symbolic host name.
>
> Moreover, one hostname from connect string can produce more than one
> addrinfo structures. For example, on the machines with IPv6 support,
> 'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and
> [::1] IPv6, and produces two records.
>
> So would do any name, which have both A and AAAA records in DNS. And
> nothing prevent domain administrator to put more than one A record for
> same hostname into DNS zone.
>
>
> So, it is just same information which can be retrieved from the backend
> via
>
> select inet_client_addr();
> select inet_client_port();

I think you mean:

select inet_server_addr();
select inet_server_port();

> What is really interesting for HOST and PORT variable - it is the name
> of host and port number used to make actual connection, as they were
> specified in the connect string or service file.

And this is probably not the correct thing for it to report.  The
documentation says "The database server host you are currently
connected to." and "The database server port to which you are
currently connected.", so yeah, I'd expect to see those set to
whatever those 2 functions resolve to.  That being said, if one
connects via a domain socket, those appear to come back blank with
those functions, yet HOST and PORT report the correct information in
those cases (without passing in multiple hosts).  Is that a
pre-existing bug?

Thom



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 24 January 2016 at 20:11, Thom Brown <thom@linux.com> wrote:
> On 24 January 2016 at 19:53, Victor Wagner <vitus@wagner.pp.ru> wrote:
>> On Sun, 24 Jan 2016 15:58:10 +0000
>> Thom Brown <thom@linux.com> wrote:
>>
>>
>>>
>>> Output of \set variables without patch:
>>>
>>> HOST = '127.0.0.1'
>>> PORT =
>>> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>>>
>>> And with patch:
>>>
>>> HOST =
>>> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>>> PORT = '5488'
>>>
>>> They're both wrong, but I'm hoping we can just show the right
>>> information here.
>>
>> I think we should show right information here, but it is not so simple.
>>
>> Problem is that I never keep symbolic representation of individual
>> host/port pair. And when we connect successfully, we have only struct
>> sockaddr representation of the it, which contain right IP
>> address, but doesn't contain symbolic host name.
>>
>> Moreover, one hostname from connect string can produce more than one
>> addrinfo structures. For example, on the machines with IPv6 support,
>> 'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and
>> [::1] IPv6, and produces two records.
>>
>> So would do any name, which have both A and AAAA records in DNS. And
>> nothing prevent domain administrator to put more than one A record for
>> same hostname into DNS zone.
>>
>>
>> So, it is just same information which can be retrieved from the backend
>> via
>>
>> select inet_client_addr();
>> select inet_client_port();
>
> I think you mean:
>
> select inet_server_addr();
> select inet_server_port();
>
>> What is really interesting for HOST and PORT variable - it is the name
>> of host and port number used to make actual connection, as they were
>> specified in the connect string or service file.
>
> And this is probably not the correct thing for it to report.  The
> documentation says "The database server host you are currently
> connected to." and "The database server port to which you are
> currently connected.", so yeah, I'd expect to see those set to
> whatever those 2 functions resolve to.  That being said, if one
> connects via a domain socket, those appear to come back blank with
> those functions, yet HOST and PORT report the correct information in
> those cases (without passing in multiple hosts).  Is that a
> pre-existing bug?

I've just checked, and can see that this doesn't appear to be a bug.
As per network.c:

/** IP address that the server accepted the connection on (NULL if Unix socket)*/

and

/** port that the server accepted the connection on (NULL if Unix socket)*/

Thom



Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Victor Wagner wrote:
> On Fri, 22 Jan 2016 16:36:15 -0300
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> 
> > You're editing the expected file for the libpq-regress thingy, but you
> > haven't added any new lines to test the new capability.  I think it'd
> > be good to add some there.  (I already said this earlier in the
> > thread; is there any reason you ignored it the first time?)
> 
> I seriously doubt that this program can be used to test new
> capabilities.
> 
> All it does, it calls PQconninfoParse and than examines some fields of
> PGconn structure.

Ah, you're right, I didn't remember that.

> If I add some new uris, than only thing I can test is that comma is
> properly copied from the URI to this field. And may be that some syntax
> errors are properly detected.

Yeah, we should do that.

> So, I think that new functionality need other approach for testing.
> There should be test of real connection to real temporary cluster.
> Probably, based on Perl TAP framework which is actively used in the
> Postgres recently.

Yes, agreed.  So please have a look at that one and share your opinion
about it.  It'd be useful.

Meanwhile I'm moving the patch to the next commitfest.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
David Steele
Date:
Hi Victor<small>,</small><br /><br /><div class="moz-cite-prefix">On 2/1/16 5:05 PM, Alvaro Herrera wrote:<br
/></div><blockquotecite="mid:%3C20160201220540.GA99599@alvherre.pgsql%3E" type="cite"><pre wrap="">Victor Wagner
wrote:
</pre><blockquote type="cite"><pre wrap="">On Fri, 22 Jan 2016 16:36:15 -0300
Alvaro Herrera <a class="moz-txt-link-rfc2396E"
href="mailto:alvherre@2ndquadrant.com"><alvherre@2ndquadrant.com></a>wrote:
 

</pre><blockquote type="cite"><pre wrap="">You're editing the expected file for the libpq-regress thingy, but you
haven't added any new lines to test the new capability.  I think it'd
be good to add some there.  (I already said this earlier in the
thread; is there any reason you ignored it the first time?)
</pre></blockquote><pre wrap="">
I seriously doubt that this program can be used to test new
capabilities.

All it does, it calls PQconninfoParse and than examines some fields of
PGconn structure.
</pre></blockquote><pre wrap="">
Ah, you're right, I didn't remember that.

</pre><blockquote type="cite"><pre wrap="">If I add some new uris, than only thing I can test is that comma is
properly copied from the URI to this field. And may be that some syntax
errors are properly detected.
</pre></blockquote><pre wrap="">
Yeah, we should do that.

</pre><blockquote type="cite"><pre wrap="">So, I think that new functionality need other approach for testing.
There should be test of real connection to real temporary cluster.
Probably, based on Perl TAP framework which is actively used in the
Postgres recently.
</pre></blockquote><pre wrap="">
Yes, agreed.  So please have a look at that one and share your opinion
about it.  It'd be useful.

Meanwhile I'm moving the patch to the next commitfest.
</pre></blockquote><br /> There hasn't been any movement on this patch in a while.  Will you have a new tests ready for
reviewsoon? <br /><br /> Thanks, <br /><pre class="moz-signature" cols="72">-- 
 
-David
<a class="moz-txt-link-abbreviated" href="mailto:david@pgmasters.net">david@pgmasters.net</a></pre>

Re: Patch: Implement failover on libpq connect level.

From
Simon Riggs
Date:
On 9 March 2016 at 23:11, David Steele <david@pgmasters.net> wrote:
 
There hasn't been any movement on this patch in a while.  Will you have a new tests ready for review soon? 

I see the value in this feature, but the patch itself needs work and probably some slimming down/reality and a good spellcheck.

If someone takes this on soon it can go into 9.6, otherwise I vote to reject this early to avoid wasting review time. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Patch: Implement failover on libpq connect level.

From
David Steele
Date:
On 3/10/16 7:34 AM, Simon Riggs wrote:
> On 9 March 2016 at 23:11, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     There hasn't been any movement on this patch in a while.  Will you
>     have a new tests ready for review soon?
>
>
> I see the value in this feature, but the patch itself needs work and
> probably some slimming down/reality and a good spellcheck.
>
> If someone takes this on soon it can go into 9.6, otherwise I vote to
> reject this early to avoid wasting review time.

Since there has been no response from the author I have marked this 
patch "returned with feedback".  Please feel free to resubmit for 9.7!

-- 
-David
david@pgmasters.net



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Thu, Mar 17, 2016 at 4:47 AM, David Steele <david@pgmasters.net> wrote:
>Since there has been no response from the author I have marked this patch "returned with feedback".  Please feel free >to resubmit for 9.7!
I have started to work on this patch, and tried to fix some of the issues discussed above. The most recent patch 06 has fixed many issues which was raised previously which include indefinite looping, crashes. And, some of the issues which remain pending are.

1. Connection status functions PQport, PQhost do not give corresponding values of established connection. -- Have attached the patch for same.

2. Default value of readonly parameter is 0, which means should connect to master only. So absence of parameter in connection string in simple cases like "./psql -p PORT database" fails to connect to hot standby server.  I think since if readonly=1 means connect to any. and readonly=0 means connect to master only, we should change the default value  to 1, to handle the cases when parameter is not specified.

JFYI Interestingly PostgreSql JDBC driver have following options targetServerType=any|master|slave|preferSlave for same purpose.

Also did a little bit of patch clean-up.

Also found some minor issues related to build which I am working on.
1. make check-world @src/interfaces/ecpg/test/connect fails with following error:
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O0 -pthread -D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS test1.o -L../../ecpglib -L../../pgtypeslib -L../../../../../src/interfaces/libpq -L../../../../../src/port -L../../../../../src/common -Wl,--as-needed -Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
../../../../../src/interfaces/libpq/libpq.so: undefined reference to `pg_usleep'
collect2: error: ld returned 1 exit status
make[3]: *** [test1] Error 1
make[3]: Leaving directory `/home/mithun/mynewpost/p1/src/interfaces/ecpg/test/connect'

patch has used pg_usleep() which is in L../../../../../src/port  I think dependency is not captured rightly.

Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Fri, 26 Aug 2016 10:10:33 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> On Thu, Mar 17, 2016 at 4:47 AM, David Steele <david@pgmasters.net>
> wrote:
> >Since there has been no response from the author I have marked this
> >patch
> "returned with feedback".  Please feel free >to resubmit for 9.7!
> I have started to work on this patch, and tried to fix some of the
> issues discussed above. The most recent patch 06 has fixed many
> issues which was raised previously which include indefinite looping,
> crashes. And, some of the issues which remain pending are.

Although I have not resubmutted my patch yet, I've continue to work on
it, so you can see my last version on 

https://www.wagner.pp.ru/fossil/pgfailover.

It seems that I've already fixed some of issues you mentioned below.
Also I've added testsuite, which is incomplete.

> JFYI Interestingly PostgreSql JDBC driver have following options
> targetServerType=any|master|slave|preferSlave for same purpose.

Probably we should havbe a comptibility alias.


--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Fri, Aug 26, 2016 at 10:10 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
>../../../../../src/interfaces/libpq/libpq.so: undefined reference to `pg_usleep'
 
As similar to psql I have added -lpq for compilation. So now ecpg tests compiles and make check-world has passed.
--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Fri, 26 Aug 2016 10:10:33 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> On Thu, Mar 17, 2016 at 4:47 AM, David Steele <david@pgmasters.net>
> wrote:
> >Since there has been no response from the author I have marked this
> >patch  
> "returned with feedback".  Please feel free >to resubmit for 9.7!
> I have started to work on this patch, and tried to fix some of the
> issues discussed above. The most recent patch 06 has fixed many
> issues which was raised previously which include indefinite looping,
> crashes. And, some of the issues which remain pending are.
> 
> 1. Connection status functions PQport, PQhost do not give
> corresponding values of established connection. -- Have attached the
> patch for same.
> 
> 2. Default value of readonly parameter is 0, which means should
> connect to master only. So absence of parameter in connection string
> in simple cases like "./psql -p PORT database" fails to connect to
> hot standby server.  I think since if readonly=1 means connect to
> any. and readonly=0 means connect to master only, we should change
> the default value  to 1, to handle the cases when parameter is not
> specified.
> 
> JFYI Interestingly PostgreSql JDBC driver have following options
> targetServerType=any|master|slave|preferSlave for same purpose.
> 
> Also did a little bit of patch clean-up.
> 
> Also found some minor issues related to build which I am working on.
> 1. make check-world @src/interfaces/ecpg/test/connect fails with
> following error:
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -g -O0 -pthread -D_REENTRANT
> -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS test1.o -L../../ecpglib
> -L../../pgtypeslib -L../../../../../src/interfaces/libpq
> -L../../../../../src/port -L../../../../../src/common -Wl,--as-needed
> -Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags
> -lecpg -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm
> -o test1 ../../../../../src/interfaces/libpq/libpq.so: undefined
> reference to `pg_usleep'
> collect2: error: ld returned 1 exit status
> make[3]: *** [test1] Error 1
> make[3]: Leaving directory `/home/mithun/mynewpost/p1/
> src/interfaces/ecpg/test/connect'
> 
> patch has used pg_usleep() which is in L../../../../../src/port  I
> think dependency is not captured rightly.
> 
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com




Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Tue, 30 Aug 2016 14:54:57 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> On Fri, Aug 26, 2016 at 10:10 AM, Mithun Cy
> <mithun.cy@enterprisedb.com> wrote:
> >  
> > >rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags
> > >-lecpg  
> > -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o
> > test1  
> > >../../../../../src/interfaces/libpq/libpq.so: undefined reference
> > >to  
> > `pg_usleep'
> >  
> 
> As similar to psql I have added -lpq for compilation. So now ecpg
> tests compiles and make check-world has passed.

Thanks, I've added this to 7-th (yet unpublished here) version of my
patch, available on my site.




Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
<p dir="ltr">On Aug 31, 2016 1:44 PM, "Victor Wagner" <<a
href="mailto:vitus@wagner.pp.ru">vitus@wagner.pp.ru</a>>wrote:<br /> > Thanks, I've added this to 7-th (yet
unpublishedhere) version of my<br /> > patch.<br /> Hi victor, just wanted know what your plan for your patch 07.
Wouldyou like to submit it to the community. I have just signed up as a reviewer for your patch.<br /> 

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 5 Sep 2016 07:59:02 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> On Aug 31, 2016 1:44 PM, "Victor Wagner" <vitus@wagner.pp.ru> wrote:
> > Thanks, I've added this to 7-th (yet unpublished here) version of my
> > patch.
> Hi victor, just wanted know what your plan for your patch 07. Would
> you like to submit it to the community. I have just signed up as a
> reviewer for your patch.

As community shows an interest with this patch, and it has been
included in the current commitfest, I've to submit it.

This version of patch includes

1. Most important - a tap test suite (really small enough and has a
room for improvements).

2. Compatibility with older versions - now readonly check is skipped if
only one hostname is specified in the connect string

3. pg_host and pg_port variables now show host name and port library
was connected to.

4. Your fix with compilation of ecpg tests.

Patch is attached.



--
                                   Victor Wagner <vitus@wagner.pp.ru>

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Aleksander Alekseev
Date:
Hello, Victor.

> As community shows an interest with this patch, and it has been
> included in the current commitfest, I've to submit it.

Naturally we show interest in this patch. It's a great feature that
would be very nice to have!

> This version of patch includes
> 
> 1. Most important - a tap test suite (really small enough and has a
> room for improvements).
> 
> 2. Compatibility with older versions - now readonly check is skipped if
> only one hostname is specified in the connect string
> 
> 3. pg_host and pg_port variables now show host name and port library
> was connected to.
> 
> 4. Your fix with compilation of ecpg tests.
> 
> Patch is attached.

After a brief examination I've found following ways to improve the
patch.

1) It looks like code is not properly formatted.

2)

> +      <term><literal>readonly</literal></term>
> +      <listitem>
> +      <para>
> +      If this parameter is 0 (the default), upon successful connection
> +      library checks if host is in recovery state, and if it is so, 
> +      tries next host in the connect string. If this parameter is
> +      non-zero, connection to warm standby nodes are considered
> +      successful.

IIRC the are "warm" and "hot" standby's in PostgreSQL terminology. The
difference is that you can't execute any queries on a "warm" standby. So
the description is probably wrong. I suggest to fix it to just "...
connection to standby nodes are...".

3)

> +     <term><literal>falover_timeout</literal></term>
> +     <listitem>
> +     <para>
> +     Maximum time to cycilically retry all the hosts in commect string.
> +     (as decimal integer number of seconds). If not specified, then
> +     hosts are tried just once.
> +     </para>

Typos: cycilically, commect

4) 

>  static int 
>  connectDBStart(PGconn *conn)
>  {
> +       struct nodeinfo
> +       {
> +               char       *host;
> +               char       *port;
> +       };

Not sure if all compilers we support can parse this. I suggest to
declare nodinfo structure outside of procedure, just to be safe.

5)

> +               nodes = calloc(sizeof(struct nodeinfo), 2); 
> +               nodes->host = strdup(conn->pghostaddr);
>                 hint.ai_family = AF_UNSPEC;

> +               /* Parse comma-separated list of host-port pairs into
> +                  function-local array of records */
> +               nodes = malloc(sizeof(struct nodeinfo) * 4); 


>                 /* pghostaddr and pghost are NULL, so use Unix domain
>  * socket */
> -               node = NULL;
> +               nodes = calloc(sizeof(struct nodeinfo), 2); 

> +       sprintf(port,"%d", htons(((struct sockaddr_in
> *)ai->ai_addr)->sin_port));
> +       return strdup(port);

You should check return values of malloc, calloc and strdup procedures,
or use safe pg_* equivalents.

6) 

> +                       for (p = addrs; p->ai_next != NULL; p =
>   p->ai_next);
> +                       p->ai_next = this_node_addrs;

Looks a bit scary. I suggest to add an empty scope ( {} ) and a comment
to make our intention clear here.

7) Code compiles with following warnings (CLang 3.8):

> 1 warning generated.
> fe-connect.c:5239:39: warning: if statement has empty body
> [-Wempty-body]
>                                                                    errorMessage,
> false, true));
>                                                                                               ^
> fe-connect.c:5239:39: note: put the semicolon on a separate line to
> silence this warning
> 1 warning generated.

8) get_next_element procedure implementation is way too smart (read
"complicated"). You could probably just store current list length and
generate a random number between 0 and length-1.

-- 
Best regards,
Aleksander Alekseev



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:

On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
>After a brief examination I've found following ways to improve the patch.
Adding to above comments.

1)
+ /*
+ * consult connection options and check if RO connection is OK
+ * RO connection is OK if readonly connection is explicitely
+ * requested or if replication option is set, or just one
+ * host is specified in the connect string.
+ */
+ if ((conn->pghost==NULL || strchr(conn->pghost,',')==NULL) ||
+ ((conn->read_only && conn->read_only[0] > '0'))
+ ||(conn->replication && conn->replication[0])
+ )
+ {

Now if only one host is in connection string and it ask for read_write connection(connect to master) I mean read_only is set 0 explicitly. With above logic we will allow it to connect to standby?. I still think psql connection to standby should be handled by changing the default value of readonly to 1 (which means connect to any). Thinking further probably replacing readonly parameter with targetServerType=any|master (with default being any) should clear some confusions and bring consistency since same is used in JDBC multi host connection string.

2)
@@ -1460,33 +1538,80 @@ connectDBStart(PGconn *conn)
   portstr,
   (int) (UNIXSOCK_PATH_BUFLEN - 1));
  conn->options_valid = false;
+ free(nodes->port);

nodes->port was not allocated at this point.

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Tue, 6 Sep 2016 07:58:28 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> 
> Now if only one host is in connection string and it ask for read_write
> connection(connect to master) I mean read_only is set 0 explicitly.
> With above logic we will allow it to connect to standby?. I still
> think psql connection to standby should be handled by changing the
> default value of readonly to 1 (which means connect to any). 

It would definitely be more logical, but I haven't found easy way to do
it. May be I should return to this place and rethink. I don't like and
idea to explicitely ignore connection string parameter due to some
condition. 

Really, I think, test for replication connection can be either
removed from this part of code now.  


> Thinking
> further probably replacing readonly parameter with
> targetServerType=any|master (with default being any) should clear
> some confusions and bring consistency since same is used in JDBC
> multi host connection string.

It seems that in this context change readonly=0|1 to
targetServerType=any|master  makes sense.


> 2)
> @@ -1460,33 +1538,80 @@ connectDBStart(PGconn *conn)
>    portstr,
>    (int) (UNIXSOCK_PATH_BUFLEN - 1));
>   conn->options_valid = false;
> + free(nodes->port);
> 
> nodes->port was not allocated at this point.
> 

I'll recheck it.

--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 5 Sep 2016 14:03:11 +0300
Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:

> Hello, Victor.
> 
> 
> 1) It looks like code is not properly formatted.
> 

Thanks for pointing to the documentation and formatting problems. I'll
fix them 
> >  static int 
> >  connectDBStart(PGconn *conn)
> >  {
> > +       struct nodeinfo
> > +       {
> > +               char       *host;
> > +               char       *port;
> > +       };  
> 
> Not sure if all compilers we support can parse this. I suggest to
> declare nodinfo structure outside of procedure, just to be safe.
> 

Block-local structs  are part of C language standard. I don't remember
when they appeared first (may be in K&R C), but at least since C89 AFAIR
they are allowed explicitely.

And using most local scope possible is always a good thing.

So, I'll leave this structure function local for now.


> 
> You should check return values of malloc, calloc and strdup
> procedures, or use safe pg_* equivalents.

Thanks, I'll fix this one.
> 6) 
> 
> > +                       for (p = addrs; p->ai_next != NULL; p =
> >   p->ai_next);
> > +                       p->ai_next = this_node_addrs;  
> 
> Looks a bit scary. I suggest to add an empty scope ( {} ) and a
> comment to make our intention clear here.

Ok, it really would make code more clear.
> 7) Code compiles with following warnings (CLang 3.8):
> 
> > 1 warning generated.
> > fe-connect.c:5239:39: warning: if statement has empty body
> > [-Wempty-body]
> >                                                                    errorMessage,
> > false, true));
> >                                                                                               ^
> > fe-connect.c:5239:39: note: put the semicolon on a separate line to
> > silence this warning
> > 1 warning generated.  

Oops, it looks like logic error, which should be fixed. Thanks for
testing my patch by more picky compiler.

> 8) get_next_element procedure implementation is way too smart (read
> "complicated"). You could probably just store current list length and
> generate a random number between 0 and length-1.

No, algorithm here is more complicated. It must ensure that there would
not be second attempt to connect to host, for which unsuccessful
connection attempt was done. So, there is list rearrangement.

Algorithm for pick random list element by single pass is quite trivial.





Re: Patch: Implement failover on libpq connect level.

From
Aleksander Alekseev
Date:
> > 8) get_next_element procedure implementation is way too smart (read
> > "complicated"). You could probably just store current list length and
> > generate a random number between 0 and length-1.
> 
> No, algorithm here is more complicated. It must ensure that there would
> not be second attempt to connect to host, for which unsuccessful
> connection attempt was done. So, there is list rearrangement.
> 
> Algorithm for pick random list element by single pass is quite trivial.

Great! In this case I would be _trivial_ for you to write a comment that
describes how this procedure works, what makes you think that it gives a
good distribution in all possible cases (e.g. if there is more than
0x10000 elements in a list - why not), etc. Right? :)

-- 
Best regards,
Aleksander Alekseev



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Tue, 6 Sep 2016 07:58:28 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev <
> a.alekseev@postgrespro.ru> wrote:
> >After a brief examination I've found following ways to improve the
> >patch.
> Adding to above comments.
>
I'm sending new version of patch.

I've replaced readonly option with target_server_type (with JDBC
compatibility alias targetServerType),

use logic of setting defaults based on number of hosts in the connect
string instead of complicated condition in the state machine,

added checks for return values of memory allocation function.

Also, I've solved pg_usleep problem by linking pgsleep.c into libpq
along with some other libpgport objects which are already linked there.

Thus client applications don't need to link with libpgport and libpq
shared library is self-containted.

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Aleksander Alekseev
Date:
Hello, Victor.

> I'm sending new version of patch.
> 
> I've replaced readonly option with target_server_type (with JDBC
> compatibility alias targetServerType), 
> 
> use logic of setting defaults based on number of hosts in the connect
> string instead of complicated condition in the state machine,
> 
> added checks for return values of memory allocation function.
> 
> Also, I've solved pg_usleep problem by linking pgsleep.c into libpq
> along with some other libpgport objects which are already linked there.
> 
> Thus client applications don't need to link with libpgport and libpq
> shared library is self-containted.

This patch doesn't apply to master branch:

```
$ git apply ~/temp/libpq-failover-8.patch
/home/eax/temp/libpq-failover-8.patch:184: trailing whitespace.
check: 
/home/eax/temp/libpq-failover-8.patch:252: trailing whitespace./* 
/home/eax/temp/libpq-failover-8.patch:253: trailing whitespace. *  Validate target_server_mode option 
/home/eax/temp/libpq-failover-8.patch:254: trailing whitespace. */ 
/home/eax/temp/libpq-failover-8.patch:306: trailing whitespace.        appendPQExpBuffer(&conn->errorMessage, 
error: src/interfaces/libpq/t/001-multihost.pl: already exists in
working directory

$ git diff
```

-- 
Best regards,
Aleksander Alekseev



Re: Patch: Implement failover on libpq connect level.

From
Aleksander Alekseev
Date:
> Hello, Victor.
> 
> > I'm sending new version of patch.
> > 
> > I've replaced readonly option with target_server_type (with JDBC
> > compatibility alias targetServerType), 
> > 
> > use logic of setting defaults based on number of hosts in the connect
> > string instead of complicated condition in the state machine,
> > 
> > added checks for return values of memory allocation function.
> > 
> > Also, I've solved pg_usleep problem by linking pgsleep.c into libpq
> > along with some other libpgport objects which are already linked there.
> > 
> > Thus client applications don't need to link with libpgport and libpq
> > shared library is self-containted.
> 
> This patch doesn't apply to master branch:
> 
> ```
> $ git apply ~/temp/libpq-failover-8.patch
> /home/eax/temp/libpq-failover-8.patch:184: trailing whitespace.
> check: 
> /home/eax/temp/libpq-failover-8.patch:252: trailing whitespace.
>     /* 
> /home/eax/temp/libpq-failover-8.patch:253: trailing whitespace.
>      *  Validate target_server_mode option 
> /home/eax/temp/libpq-failover-8.patch:254: trailing whitespace.
>      */ 
> /home/eax/temp/libpq-failover-8.patch:306: trailing whitespace.
>             appendPQExpBuffer(&conn->errorMessage, 
> error: src/interfaces/libpq/t/001-multihost.pl: already exists in
> working directory
> 
> $ git diff
> ```

Oops. Sorry, my mistake. I forgot to check for untracked files.
Everything is fine.

-- 
Best regards,
Aleksander Alekseev



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:

On Wed, Sep 7, 2016 at 7:26 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> No, algorithm here is more complicated. It must ensure that there would
> not be second attempt to connect to host, for which unsuccessful
> connection attempt was done. So, there is list rearrangement.
>Algorithm for pick random list element by single pass is quite trivial.

If concern is only about excluding address which was tried previously. Then I think we can try this way, randomly permute given address list (for example fisher-yates shuffle) before trying any of those address in it. After that you can treat the new list exactly like we do for sequential connections. I think this makes code less complex.
--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Fri, 9 Sep 2016 11:20:59 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> If concern is only about excluding address which was tried
> previously. Then I think we can try this way, randomly permute given
> address list (for example fisher-yates shuffle) before trying any of
> those address in it. After that you can treat the new list exactly
> like we do for sequential connections. I think this makes code less
> complex.

Random permutation is much more computationally complex than random
picking. 

Alexander suggests to use other random pick algorithm than I use.

I use algorithm which works with lists of unknown length and chooses
line of such list with equal probability. This algorithm requires one
random number for each element of the list, but we are using C library
rand function which uses quite cheap linear congruental pseudo random
numbers, so random number generation is infinitely cheap than network
connection attempt.

As this algorithm doesn't need beforehand knowledge of the list length,
it is easy to wrap this algorithm into loop, which modifies the list,
moving already tried elements out of scope of next picking.
(it is really quite alike fisher-yates algorithm).

Alexander suggests to store somewhere number of elements of the list,
than generate one random number and pick element with this number.

Unfortunately, it doesn't save much effort in the linked list.
At average we'll need to scan half of the list to find desired element.
So, it is O(N) anyway.

Now, the main point is that we have two different timeframes of
operation.

1. Attempt to connect. It is short period, and we can assume that state
of servers doesn't change. So if, server is down, we should ignore it
until the end of this attempt to connect.

2. Attempt to failover. If no good candidate was found, we might need
to retry connection until one of standbys would be promoted to master
(with targetServerType=master) or some host comes up. See documentation
of failover_timeout configuration parameter.

It means that we cannot just throw away those hosts from list which
didn't respond during this connect attempt.
So, using random permutation instead  of random pick wouln't help.
And keeping somewhere number of elements in the list wouldn't help
either unless we change linked list to completely different data
structure which allows random access.
-- 





Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Fri, Sep 9, 2016 at 4:49 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> Random permutation is much more computationally complex than random
> picking.

It really isn't.  The pseudocode given at
https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle is all of 4
lines long, and one of those lines is a comment.  The C implementation
is longer, but not by much.

Mind you, I haven't read the patch, so I don't know whether using
something like this would make the code simpler or more complicated.
However, if the two modes of operation are (1) try all hosts in random
order and (2) try all hosts in the listed order, it's worth noting
that the code for the second thing can be used to implement the first
thing just by shuffling the list before you begin.

> So, using random permutation instead  of random pick wouln't help.
> And keeping somewhere number of elements in the list wouldn't help
> either unless we change linked list to completely different data
> structure which allows random access.

Is there a good reason not to use an array rather than a linked list?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
I have some more comments on libpq-failover-8.patch

+ /* FIXME need to check that port is numeric */

Is this still applicable?.

1)

+ /*

+ * if there is more than one host in the connect string and

+ * target_server_type is not specified explicitely, set

+ * target_server_type to "master"

+ */

I think we need comments to know why we change default value based on number of elements in connection string. why default in not “any" when node count > 1.


2)

+ /* loop over all the host specs in the node variable */

+ for (node = nodes; node->host != NULL || node->port != NULL; node++)

  {

I think instead of loop if we can move these code into a separate function say pg_add_to_addresslist(host, port, &addrs) this helps in removing temp variables like "struct node info” and several heap calls around it.


3)

+static char *

+get_port_from_addrinfo(struct addrinfo * ai)

+{

+ char port[6];

+

+ sprintf(port, "%d", htons(((struct sockaddr_in *) ai->ai_addr)->sin_port));

+ return strdup(port);

+}

Need some comments for this function.

We use strdup in many places no where we handle memory allocation failure.


4)

+ /*

+ * consult connection options and check if RO connection is OK RO

+ * connection is OK if readonly connection is explicitely

+ * requested or if replication option is set, or just one host is

+ * specified in the connect string.

+ */

+ if (conn->pghost == NULL || !conn->target_server_type ||

+ strcmp(conn->target_server_type, "any") == 0)

Comments not in sink with code.


On Wed, Sep 14, 2016 at 12:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 9, 2016 at 4:49 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> Random permutation is much more computationally complex than random
> picking.

It really isn't.  The pseudocode given at
https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle is all of 4
lines long, and one of those lines is a comment.  The C implementation
is longer, but not by much.

Mind you, I haven't read the patch, so I don't know whether using
something like this would make the code simpler or more complicated.
However, if the two modes of operation are (1) try all hosts in random
order and (2) try all hosts in the listed order, it's worth noting
that the code for the second thing can be used to implement the first
thing just by shuffling the list before you begin.

> So, using random permutation instead  of random pick wouln't help.
> And keeping somewhere number of elements in the list wouldn't help
> either unless we change linked list to completely different data
> structure which allows random access.

Is there a good reason not to use an array rather than a linked list?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Sun, 25 Sep 2016 17:31:53 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> I have some more comments on libpq-failover-8.patch
>
> + /* FIXME need to check that port is numeric */
>
> Is this still applicable?.
>

Unfortunately, it was. I've fixed this problem in 9-th version of patch
(attached)
>
> I think we need comments to know why we change default value based on
> number of elements in connection string. why default in not “any"
> when node count > 1.

Fixed.

>
> + /* loop over all the host specs in the node variable */
>
> + for (node = nodes; node->host != NULL || node->port != NULL; node++)
>
>   {
>
> I think instead of loop if we can move these code into a separate
> function say pg_add_to_addresslist(host, port, &addrs) this helps in
> removing temp variables like "struct node info” and several heap
> calls around it.

For some reason DNS resolving was concentrated in one place before my
changes. So, I've tried to not change this decision.


> 3)
>
> +static char *
>
> +get_port_from_addrinfo(struct addrinfo * ai)
>
>
> Need some comments for this function.

Done.

> We use strdup in many places no where we handle memory allocation
> failure.

Added some more memory allocation error checks.
>
> Comments not in sink with code.

Fixed.


Attachment

Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
This patch do not apply on latest code. it fails as follows
libpq-failover-9.patch:176: trailing whitespace.
thread.o pgsleep.o
libpq-failover-9.patch:184: trailing whitespace.
check: 
libpq-failover-9.patch:185: trailing whitespace.
$(prove_check)
libpq-failover-9.patch:186: trailing whitespace.

libpq-failover-9.patch:194: trailing whitespace.
rm -rf tmp_check
error: patch failed: doc/src/sgml/libpq.sgml:792
error: doc/src/sgml/libpq.sgml: patch does not apply
error: patch failed: src/interfaces/libpq/Makefile:36
error: src/interfaces/libpq/Makefile: patch does not apply
error: patch failed: src/interfaces/libpq/fe-connect.c:299
error: src/interfaces/libpq/fe-connect.c: patch does not apply
error: patch failed: src/interfaces/libpq/libpq-fe.h:62
error: src/interfaces/libpq/libpq-fe.h: patch does not apply
error: patch failed: src/interfaces/libpq/libpq-int.h:334
error: src/interfaces/libpq/libpq-int.h: patch does not apply
error: patch failed: src/interfaces/libpq/test/expected.out:1
error: src/interfaces/libpq/test/expected.out: patch does not apply
error: patch failed: src/test/perl/PostgresNode.pm:398
error: src/test/perl/PostgresNode.pm: patch does not apply


On Tue, Sep 27, 2016 at 2:49 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
1).
>* if there is more than one host in the connect string and
>* target_server_type is not specified explicitely, set
>* target_server_type to "master", because default mode of
>* operation is failover, and so, we need to connect to RW
>* host, and keep other nodes of the cluster in the connect
>* string just in case master would fail and one of standbys
>* would be promoted to master.

I am slightly confused. As per this target_server_type=master means user is expecting failover. What if user pass target_server_type=any and request sequential connection isn't this also a case for failover?.  I think by default it should be "any" for any number of hosts. And parameter "sequential and random will decide whether we want just failover or with load-balance.

2).

>For some reason DNS resolving was concentrated in one place before my
>changes. So, I've tried to not change this decision.

My intention was not to have a replacement function for "pg_getaddrinfo_all", I just suggested to

have a local function which call pg_getaddrinfo_all for every host, port pair read earlier. By this way we need not to maintain nodes struct. This also reduces complexity of connectDBStart.

FUNC (host, port, addrs)

{

    CALL pg_getaddrinfo_all(host, port, newaddrs);

   addrs-> ai_next = newaddrs;

}

3).

I think you should run a spellcheck once. And, there are some formatting issue with respect to comments and curly braces of controlled blocks which need to be fixed.

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Thu, 29 Sep 2016 23:45:52 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> This patch do not apply on latest code. it fails as follows

It's strange. I have no problems applying it to commit
fd321a1dfd64d30

Ok, some trailing whitespace and mixing of tabs and spaces
which git apply doesn't like really present in the patch.

I'm attached hear version with these issues resolved.


> I am slightly confused. As per this target_server_type=master means
> user is expecting failover. What if user pass target_server_type=any
> and request sequential connection isn't this also a case for
> failover?.  I think by default it should be "any" for any number of
> hosts. And parameter "sequential and random will decide whether we
> want just failover or with load-balance.

I don't agree with this. In the first versions of the patch it refuses
connect to readonly server even if it is only one, because I think that
read-write connection is what user typically expect.

When user tries to connect to cluster (specifying many hosts in the
connect string), it can be by default assumed that he wants master.

But backward compatibility is more
important thing, so I now assume, that user tries to connect just one
node, and this node is read only, user knows what he is doing.




Attachment

Re: Patch: Implement failover on libpq connect level.

From
Michael Paquier
Date:
On Fri, Sep 30, 2016 at 5:44 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> But backward compatibility is more
> important thing, so I now assume, that user tries to connect just one
> node, and this node is read only, user knows what he is doing.

Moved this patch to next CF.

(Note that I am in CF-vacuuming mode this morning, I'll try to close it asap.)
-- 
Michael



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> wrote:
>Ok, some trailing whitespace and mixing of tabs and spaces
>which git apply doesn't like really present in the patch.
>I'm attached hear version with these issues resolved.

There were some more trailing spaces and spaces used for tabs in patch 09. I have fixed same along with formatting issues related to comments and { of controlled blocks.

>But backward compatibility is more important thing, so I now assume, that user tries to connect just onenode, and this node is read only, user knows
>what he is doing.
Okay but for me consistency is also important. Since we agree to disagree on some of the comments and others have not expressed any problem I am moving it to committer.

--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Thu, 13 Oct 2016 12:30:59 +0530
Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru>
> wrote:

> Okay but for me consistency is also important. Since we agree to
> disagree on some of the comments and others have not expressed any
> problem I am moving it to committer.


Thank you for your efforts improving my patch




Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Thu, 13 Oct 2016 12:30:59 +0530
> Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru>
>> wrote:
>> Okay but for me consistency is also important. Since we agree to
>> disagree on some of the comments and others have not expressed any
>> problem I am moving it to committer.
>
> Thank you for your efforts improving my patch

I haven't deeply reviewed this patch, but on a quick read-through it
certainly seems to need a lot of cleanup work.

+      <varlistentry id="libpq-connect-hostorder" xreflabel="hostorder">
+      <term><literal>hostorder</literal></term>
+      <listitem>

I don't think that putting everything at the same indentation level
matches the style of the surrounding documentation.

@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname      </para>      </listitem>     </varlistentry>
-     <varlistentry id="libpq-connect-connect-timeout"
xreflabel="connect_timeout">      <term><literal>connect_timeout</literal></term>      <listitem>

Spurious hunk.

+                /* Host has explicitely specified port */

Spelling.  This same error is repeated elsewhere.

+                /* Check if port is numeric */
+                for (nptr=r+1;nptr<q;nptr++)
+                {
+                    if (*nptr<'0' ||*nptr >'9')

Formatting.  Consider using pgindent to get this into correct PostgreSQL style.

+             * behavoir.

Spelling.

+ * the beginning (and putting its addres into given pointer.

Spelling.

+ *    Returns 1 if address is choosen and 0 if there are no more addresses

Spelling.

+         * Initialize random number generator in case if nobody have done it
+         * before. Use value from rand along with time in case random number

Grammar ("in case nobody has done it already").  Also, is it really OK
for libpq to call srand() as a side effect?  I think that seems rather
unfriendly.

+             * taget_server_type is set to 'any' or is not exlicitely

Multiple spelling mistakes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 13 October 2016 at 10:53, Victor Wagner <vitus@wagner.pp.ru> wrote:
> On Thu, 13 Oct 2016 12:30:59 +0530
> Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>
>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru>
>> wrote:
>
>> Okay but for me consistency is also important. Since we agree to
>> disagree on some of the comments and others have not expressed any
>> problem I am moving it to committer.
>
>
> Thank you for your efforts improving my patch

I've found various spelling and grammatical issues in the docs
section, which I've corrected.  I've attached a revised patch with
these changes.

One thing I'm wondering is whether we should be using the term
"master", as that's usually paired with "slave", whereas, nowadays,
there's a tendency to refer to them as "primary" and "standby".  I
know we use "master" in some other places in the documentation, but it
seems inconsistent to have "master" as a parameter value, but then
having "primary" used in some other configuration parameters.

I'd also avoid referring to "the library", and just describe what
happens without making reference to what's making it happen.

Regards

Thom

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Oct 26, 2015 at 4:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/14/15 6:41 AM, Victor Wagner wrote:
>> 1. It is allowed to specify several hosts in the connect string, either
>> in URL-style (separated by comma) or in param=value form (several host
>> parameters).
>
> I'm not fond of having URLs that are not valid URLs according to the
> applicable standards.  Because then they can't be parsed or composed by
> standard libraries.

I did a little bit more research on this topic and found out a few
things that are interesting, at least to me.  First, our documentation
reference RFC3986.  According to RFC3986:

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

hier-part   = "//" authority path-abempty                 / path-absolute                 / path-rootless
 / path-empty
 

authority   = [ userinfo "@" ] host [ ":" port ]

host        = IP-literal / IPv4address / reg-name

reg-name    = *( unreserved / pct-encoded / sub-delims )

sub-delims include comma but not colon, so I think that
postgresql://host1,host2,host3/ is a perfectly good URL, and so is
postgresql://host1,host2,host3:1111/ but
postgresql://host1:1234,host2:3456/ is not a valid URL because the :
after host1 terminates the "host" portion of the URL.  The port can't
contain anything but digits, so 1234 has to be the port, but then
there's nothing to do with the portion between the comma and the
following slash, so it is indeed an invalid URI as far as I can tell.

However, PostgreSQL's JDBC driver isn't alone in supporting something
like this.  The MySQL JDBC driver does the same thing:

http://lists.mysql.com/cluster/249
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html

MongoDB refers to their connection string as a URI, but it uses
exactly this syntax:

https://docs.mongodb.com/manual/reference/connection-string/

Couchbase's syntax is also quite similar, though it's not clear that
they allow port numbers:

http://developer.couchbase.com/documentation/server/4.1/developer-guide/connecting.html

Of course, since this is a very common need and it's not obvious how
to satisfy it within the confines of the URI specification, people
have developed a variety of other answers, such as (a) multiple URIs
separated by commas, which is a terrible idea because comma can occur
*within* URIs, (b) separating multiple host names with a double-dash,
(c) including a parameter in the "query" portion of the URI to specify
alternate host names, (d) defining a new failover: URI scheme that
acts as a container for multiple connection URIs of whatever form is
normally supported, and in the case of Oracle (e) creating some
frightening monstrosity of proprietary syntax that I don't (care to)
understand.

All in all, I'm still feeling pretty good about trying to support the
same syntax that our JDBC driver already does.  It's certainly not a
perfect solution, but it is at least compatible with MySQL's JDBC
driver and with MongoDB, and in a world where everybody has picked a
different approach that's not too bad.  Hey, maybe if we use the same
syntax as MongoDB they'll let us hang out with the cool kids...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Peter van Hardenberg
Date:
On Wed, Oct 19, 2016 at 3:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 26, 2015 at 4:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/14/15 6:41 AM, Victor Wagner wrote:
All in all, I'm still feeling pretty good about trying to support the
same syntax that our JDBC driver already does.  It's certainly not a
perfect solution, but it is at least compatible with MySQL's JDBC
driver and with MongoDB, and in a world where everybody has picked a
different approach that's not too bad.  Hey, maybe if we use the same
syntax as MongoDB they'll let us hang out with the cool kids...


They will never let us hang out with the cool kids. Don't worry though, the cool kids are too busy figuring out why their cluster is out of consensus to pay attention to much else.

Supporting different ports on different servers would be a much appreciated feature (I can't remember if it was Kafka or Cassandra that didn't do this and it was very annoying.)

Remember, as the connection string gets more complicated, psql supports the Postgres URL format as a single command-line argument and we may want to begin encouraging people to use that syntax instead.

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Oct 19, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
>> On Thu, 13 Oct 2016 12:30:59 +0530
>> Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru>
>>> wrote:
>>> Okay but for me consistency is also important. Since we agree to
>>> disagree on some of the comments and others have not expressed any
>>> problem I am moving it to committer.
>>
>> Thank you for your efforts improving my patch
>
> I haven't deeply reviewed this patch, but on a quick read-through it
> certainly seems to need a lot of cleanup work.

Some more comments:

- I am pretty doubtful that the changes to connectOptions2() work as
intended.  I think that's only going to be called before actualhost
and actualport could possibly get set.  I don't think we keep
reinvoking this function for every new host we try.

- It's pretty clear that this isn't going to work if the host list
includes a mix of hostnames and UNIX socket addresses.  The code that
handles the UNIX socket address case is totally separate from the code
that handles the multiple-hostname case.

- This creates a sort of definitional problem for functions like
PQhost().  The documentation says of this and other related functions:
"These values are fixed for the life of the PGconn object."  But with
this patch, that would no longer be true.  So at least the
documentation needs an update.  Actually, though, I think the problem
is more fundamental than that.  If the user asks for PQhost(conn), do
they want the list of all hosts, or the one to which we're currently
connected?  It might be either, depending on what they intend to do
with the information.  If they mean to construct a new connection
string, they might well want them all; but something like psql's
\conninfo only shows the current one.  There's also the question of
what "the current one" means if we're not connected to any server at
all.  I'm not sure exactly how to solve these problems, but they need
some thought.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Thom Brown
Date:
On 20 October 2016 at 01:15, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Oct 19, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> >> On Thu, 13 Oct 2016 12:30:59 +0530
> >> Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> >>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru>
> >>> wrote:
> >>> Okay but for me consistency is also important. Since we agree to
> >>> disagree on some of the comments and others have not expressed any
> >>> problem I am moving it to committer.
> >>
> >> Thank you for your efforts improving my patch
> >
> > I haven't deeply reviewed this patch, but on a quick read-through it
> > certainly seems to need a lot of cleanup work.
>
> Some more comments:
>
> - I am pretty doubtful that the changes to connectOptions2() work as
> intended.  I think that's only going to be called before actualhost
> and actualport could possibly get set.  I don't think we keep
> reinvoking this function for every new host we try.
>
> - It's pretty clear that this isn't going to work if the host list
> includes a mix of hostnames and UNIX socket addresses.  The code that
> handles the UNIX socket address case is totally separate from the code
> that handles the multiple-hostname case.

So should it be the case that it disallows UNIX socket addresses
entirely?  I can't imagine a list of UNIX socket addresses being that
useful.

Thom



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Mon, 24 Oct 2016 11:36:31 +0100
Thom Brown <thom@linux.com> wrote:


> > - It's pretty clear that this isn't going to work if the host list
> > includes a mix of hostnames and UNIX socket addresses.  The code
> > that handles the UNIX socket address case is totally separate from
> > the code that handles the multiple-hostname case.  
> 
> So should it be the case that it disallows UNIX socket addresses
> entirely?  I can't imagine a list of UNIX socket addresses being that
> useful.

Really, patch was implemented as TCP-only from the beginning.

I know only one case, where list of UNIX sockets is useful - test suite.
I have to include modifications to PostgresNode.pm in the patch to
allow testing with TCP sockets.

Also, I can imagine that people would want include one unix socket into
list of TCP ones, i.e. one of replicas running on the same host as
application server. But patch doesn't support it. May be it should be
clarified in the documentation, that this is not supported.




Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Wed, 19 Oct 2016 20:15:38 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> 
> Some more comments:
> 
> - I am pretty doubtful that the changes to connectOptions2() work as
> intended.  I think that's only going to be called before actualhost
> and actualport could possibly get set.  I don't think we keep
> reinvoking this function for every new host we try.

I would investigate this. Probably test cases for .pgpass file should
be added.

> - It's pretty clear that this isn't going to work if the host list
> includes a mix of hostnames and UNIX socket addresses.  The code that
> handles the UNIX socket address case is totally separate from the code
> that handles the multiple-hostname case.

Muitihost feature was never intended to work with unix-domain socket. 
May be it should be clarified in the docs, or even improved error
message emitted when unix socket occurs in the multiple host list.
> - This creates a sort of definitional problem for functions like
> PQhost().  The documentation says of this and other related functions:
> "These values are fixed for the life of the PGconn object."  But with
> this patch, that would no longer be true.  So at least the
> documentation needs an update.  Actually, though, I think the problem

I've missed this phrase, documentation should be updated.

> is more fundamental than that.  If the user asks for PQhost(conn), do
> they want the list of all hosts, or the one to which we're currently
> connected?  It might be either, depending on what they intend to do
> with the information.  If they mean to construct a new connection
> string, they might well want them all; but something like psql's
> \conninfo only shows the current one.  There's also the question of
> what "the current one" means if we're not connected to any server at
> all.  I'm not sure exactly how to solve these problems, but they need
> some thought.

I've thought that no one would need to reconstruct connect string from
conninfo object.  May be I've missed some use cases. 

Name PQhost suggest that it'll return just one host. So, I've decided
to interpret it as  'currently choosen host'. 

If we really need function which lists all the potentially connected
hosts, it should be new function. 

I hope this would introduce no backward incompatibility, because no
existing setup uses multihost connect strings.

Thanks for your critique.
        Victor.



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Oct 19, 2016 at 7:26 PM, Peter van Hardenberg <pvh@pvh.ca> wrote:
> Supporting different ports on different servers would be a much appreciated
> feature (I can't remember if it was Kafka or Cassandra that didn't do this
> and it was very annoying.)
>
> Remember, as the connection string gets more complicated, psql supports the
> Postgres URL format as a single command-line argument and we may want to
> begin encouraging people to use that syntax instead.

While I was experimenting with this today, I discovered a problem of
interpretation related to IPv6 addresses.  Internally, a postgresql://
URL and a connection string are converted into the same format, so
postgresql://a,b/ means just the same thing as host=a,b.  I thought
that could similarly decide that postgresql://a:123,b:456/ is going to
get translated to host=a:123,b:456 and then there can be further code
to parse that into a list of host-and-port pairs.  However, if you do
that, then something like host=1:2:3::4:5:6 is fundamentally
ambiguous.  That :6 is equally valid either as part of the IP address
or as a trailing port number specification, and there's no a priori
way to know which it is.  Today, since the host part can't include a
port specifier, it's regarded as part of the IP address, and I think
it would probably be a bad idea to change that, as I believe Victor's
patch would.  He seems to have it in mind that we could allow things
like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be
helpful for the future but doesn't avoid changing the meaning of
connection strings that work today.

So now I think that to make this work correctly, we're going to need
to change both the URL parser and also add parsing for the host and
port.  Let's say the user says this:

postgresql://[1::2]:3,[4::5],[6::7]::8/

What I think we need to do is translate that into this:

host=1::2,4::5,6::7 port=3,,8

Note the double-comma, indicating a blank port number for the second
URL component.  When we parse those host and port strings, we can
match up each host with the corresponding port number again.  Of
course, the user could also skip the URL format and directly specify a
connection string.  And then they might write one where the host and
port parts don't have the same number of components, like this:

host=a,b,c port=3,4
or
host=a,b port=3,4,5

It is obvious what is meant if multiple hosts are given but only a
single port number is specified; it is also obvious what is meant if
the number of ports is equal to the number of hosts.  It is not
obvious what it means if there are multiple ports but the number
doesn't equal the number of hosts.  I think we can either treat that
case as an error or else do the following: if there are extra port
specifiers, ignore them; if there are extra host specifiers, use the
last port in the list for all of the remaining hosts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Victor Wagner
Date:
On Wed, 19 Oct 2016 12:04:27 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru>
> wrote:
> > On Thu, 13 Oct 2016 12:30:59 +0530
> > Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> >> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru>
> >> wrote:
> >> Okay but for me consistency is also important. Since we agree to
> >> disagree on some of the comments and others have not expressed any
> >> problem I am moving it to committer.
> >
> > Thank you for your efforts improving my patch
>
> I haven't deeply reviewed this patch, but on a quick read-through it
> certainly seems to need a lot of cleanup work.
>

I've did some cleanup, i.e. running pgindent, spell-checking comments
and indenting sgml and attaching cleaned up version here.

I've also re-added test file t/001-multihost.pl which get lost from
previous post.

Thanks for your suggestions.



--
                                   Victor Wagner <vitus@wagner.pp.ru>

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 10/24/16 6:36 AM, Thom Brown wrote:
> So should it be the case that it disallows UNIX socket addresses
> entirely?  I can't imagine a list of UNIX socket addresses being that
> useful.

But maybe a mixed list of Unix-domain sockets and TCP connections?

The nice thing is that is it currently transparent and you don't have to
wonder.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 10/24/16 11:57 AM, Robert Haas wrote:
> Today, since the host part can't include a
> port specifier, it's regarded as part of the IP address, and I think
> it would probably be a bad idea to change that, as I believe Victor's
> patch would.  He seems to have it in mind that we could allow things
> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be
> helpful for the future but doesn't avoid changing the meaning of
> connection strings that work today.

Let's keep in mind here that the decision to allow database names to
contain a connection parameter substructure has caused some security
issues.  Let's not create more levels of ambiguity and the need to pass
around override flags.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Robert Haas wrote:

> While I was experimenting with this today, I discovered a problem of
> interpretation related to IPv6 addresses.  Internally, a postgresql://
> URL and a connection string are converted into the same format, so
> postgresql://a,b/ means just the same thing as host=a,b.  I thought
> that could similarly decide that postgresql://a:123,b:456/ is going to
> get translated to host=a:123,b:456 and then there can be further code
> to parse that into a list of host-and-port pairs.  However, if you do
> that, then something like host=1:2:3::4:5:6 is fundamentally
> ambiguous.  That :6 is equally valid either as part of the IP address
> or as a trailing port number specification, and there's no a priori
> way to know which it is.  Today, since the host part can't include a
> port specifier, it's regarded as part of the IP address, and I think
> it would probably be a bad idea to change that, as I believe Victor's
> patch would.  He seems to have it in mind that we could allow things
> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be
> helpful for the future but doesn't avoid changing the meaning of
> connection strings that work today.

Umm, my recollection regarding IPv6 parsing in the URI syntax is that
those must appear inside square brackets -- it's not valid to have the
IPv6 address outside brackets, and the port number is necessarily
outside the brackets.  So there's no ambiguity.  If the current patch is
allowing IPv6 address to appear outside of brackets, that seems like a
bug to me.

The string conninfo spec should not accept port numbers in the "host"
part.  (Does it?)

> So now I think that to make this work correctly, we're going to need
> to change both the URL parser and also add parsing for the host and
> port.  Let's say the user says this:
> 
> postgresql://[1::2]:3,[4::5],[6::7]::8/

(There's a double colon before 8, I suppose that's just a typo.)

> What I think we need to do is translate that into this:
> 
> host=1::2,4::5,6::7 port=3,,8
> 
> Note the double-comma, indicating a blank port number for the second
> URL component.

Sounds reasonable.

> And then they might write one where the host and
> port parts don't have the same number of components, like this:
> 
> host=a,b,c port=3,4
> or
> host=a,b port=3,4,5
> 
> It is obvious what is meant if multiple hosts are given but only a
> single port number is specified; it is also obvious what is meant if
> the number of ports is equal to the number of hosts.

Agreed -- seems like we can accept both those cases.

> It is not obvious what it means if there are multiple ports but the
> number doesn't equal the number of hosts.

I think we should reject the case of differing number of elements and
neither host nor port is a singleton, as an error.  The suggestion to
ignore some parts seems too error-prone.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Amit Kapila
Date:
On Tue, Oct 25, 2016 at 2:10 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>
>
>> It is not obvious what it means if there are multiple ports but the
>> number doesn't equal the number of hosts.
>
> I think we should reject the case of differing number of elements and
> neither host nor port is a singleton, as an error.  The suggestion to
> ignore some parts seems too error-prone.
>

+1. I also think returning error is better option in above case.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Oct 24, 2016 at 4:40 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Umm, my recollection regarding IPv6 parsing in the URI syntax is that
> those must appear inside square brackets -- it's not valid to have the
> IPv6 address outside brackets, and the port number is necessarily
> outside the brackets.  So there's no ambiguity.  If the current patch is
> allowing IPv6 address to appear outside of brackets, that seems like a
> bug to me.

I agree.

> The string conninfo spec should not accept port numbers in the "host"
> part.  (Does it?)

Not currently, but the proposed patch would cause it to do so.

>> So now I think that to make this work correctly, we're going to need
>> to change both the URL parser and also add parsing for the host and
>> port.  Let's say the user says this:
>>
>> postgresql://[1::2]:3,[4::5],[6::7]::8/
>
> (There's a double colon before 8, I suppose that's just a typo.)

Oh, yeah.  Right.

>> It is not obvious what it means if there are multiple ports but the
>> number doesn't equal the number of hosts.
>
> I think we should reject the case of differing number of elements and
> neither host nor port is a singleton, as an error.  The suggestion to
> ignore some parts seems too error-prone.

OK, can do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Oct 24, 2016 at 4:15 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 10/24/16 11:57 AM, Robert Haas wrote:
>> Today, since the host part can't include a
>> port specifier, it's regarded as part of the IP address, and I think
>> it would probably be a bad idea to change that, as I believe Victor's
>> patch would.  He seems to have it in mind that we could allow things
>> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be
>> helpful for the future but doesn't avoid changing the meaning of
>> connection strings that work today.
>
> Let's keep in mind here that the decision to allow database names to
> contain a connection parameter substructure has caused some security
> issues.  Let's not create more levels of ambiguity and the need to pass
> around override flags.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Oct 24, 2016 at 11:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> So now I think that to make this work correctly, we're going to need
> to change both the URL parser and also add parsing for the host and
> port.  Let's say the user says this:
>
> postgresql://[1::2]:3,[4::5],[6::7]::8/
>
> What I think we need to do is translate that into this:
>
> host=1::2,4::5,6::7 port=3,,8

So here's a patch implementing that.  This seems to work, but I
haven't tested it in incredible detail at this point, so testing from
others would be greatly appreciated (the handling of .pgpass files, in
particular, I have not tested).  This is loosely inspired by Victor's
patch but rewritten from scratch to fix the various design issues
which I've commented on in previous emails on this thread.

Some notes on decisions that I made while implementing this:

- Only the host= and port= parameter support multiple, comma-separated
values.  In particular, you can only specify a single password.
However, you can still end up with different passwords for connections
to different servers if the passwords are read from .pgpass.

- The PQhost(), PQport(), and PQpass() now functions return the
information for the host to which we actually connected or to the
first one in the list if we haven't tried to connect yet.  Looking
through where these functions are actually used in our source code,
this behavior seems more useful than any other I could imagine.  On
the other hand, PQreset() will retry the connection starting with the
first host in the original list.

- The hostorder, target_server_type, and failover_timeout parameters
from Victor's patch are not present in this version.  I think we can
add those things in future patches once the basic functionality is
committed.  Or maybe we'll decide that we don't want those things or
that we want something different instead, which is fine, too.  The
point is, they don't need to be there in the first patch.

- As requested, it's an error if you specify N hosts and K port
numbers where N != K.  However, K = 1 is always allowed; that is, if
only a single port number is specified, it applies to all hosts.

- I have made an attempt to update the documentation appropriately but
there might be other sections which also need to be adjusted.

Let me know your thoughts.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Wed, Oct 26, 2016 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Let me know your thoughts.

One small issue. I tried to run make installcheck after applying patch there seems to be a invalid write access in code (resulting in crash).

==118997== Invalid write of size 1
==118997==    at 0x4E3DDF1: connectOptions2 (fe-connect.c:884)
==118997==    by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608)
==118997==    by 0x4E3D553: PQconnectdbParams (fe-connect.c:465)
==118997==    by 0x413067: main (startup.c:245)
==118997==  Address 0x5dc4014 is 0 bytes after a block of size 4 alloc'd
==118997==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==118997==    by 0x4E3DD3C: connectOptions2 (fe-connect.c:880)
==118997==    by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608)
==118997==    by 0x4E3D553: PQconnectdbParams (fe-connect.c:465)
==118997==    by 0x413067: main (startup.c:245)

After locally fixing this by allocating an extra space for string terminal character. make installcheck  tests pass.
-(char *) malloc(sizeof(char) * (e - s));
+ (char *) malloc(sizeof(char) * (e - s + 1));

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Oct 27, 2016 at 9:46 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Wed, Oct 26, 2016 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Let me know your thoughts.
>
> One small issue. I tried to run make installcheck after applying patch there
> seems to be a invalid write access in code (resulting in crash).
>
> ==118997== Invalid write of size 1
> ==118997==    at 0x4E3DDF1: connectOptions2 (fe-connect.c:884)
> ==118997==    by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608)
> ==118997==    by 0x4E3D553: PQconnectdbParams (fe-connect.c:465)
> ==118997==    by 0x413067: main (startup.c:245)
> ==118997==  Address 0x5dc4014 is 0 bytes after a block of size 4 alloc'd
> ==118997==    at 0x4C29BFD: malloc (in
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==118997==    by 0x4E3DD3C: connectOptions2 (fe-connect.c:880)
> ==118997==    by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608)
> ==118997==    by 0x4E3D553: PQconnectdbParams (fe-connect.c:465)
> ==118997==    by 0x413067: main (startup.c:245)
>
> After locally fixing this by allocating an extra space for string terminal
> character. make installcheck  tests pass.
> -(char *) malloc(sizeof(char) * (e - s));
> + (char *) malloc(sizeof(char) * (e - s + 1));

Thanks.  Here's a new version with a fix for that issue and also a fix
for PQconnectionNeedsPassword(), which was busted in v1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Thu, Oct 27, 2016 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Thanks.  Here's a new version with a fix for that issue and also a fix
> for PQconnectionNeedsPassword(), which was busted in v1.
I did some more testing of the patch for both URI and (host, port) parameter pairs. I did test for ipv4, ipv6, UDS type of sockets, Also tested different password using .pgpass. I did not see any major issue with. All seems to work as defined. Tested the behaviors of API PQport, PQhost, PQpass, PQreset all works as defined.

I also have following observation which can be considered if it is valid.

1. URI do not support UDS, where as (host,port) pair support same.
But there is one problem with this, instead of reporting it, we discard the URI input (@conninfo_uri_parse_options) and try to connect to default unix socket

[mithun@localhost bin]$ ./psql postgres:///home/mithun:5555,127.0.0.1:5555/postgres -U mithun
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
server is running on both the sockets.

2. If we fail to connect we get an error message for each of the addrinfo we tried to connect, wondering if we could simplify same. Just a suggestion.
[mithun@localhost bin]$ ./psql postgres://,,,,,,,/postgres
psql: could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 1, 2016 at 2:03 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Thu, Oct 27, 2016 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Thanks.  Here's a new version with a fix for that issue and also a fix
>> for PQconnectionNeedsPassword(), which was busted in v1.
> I did some more testing of the patch for both URI and (host, port) parameter
> pairs. I did test for ipv4, ipv6, UDS type of sockets, Also tested different
> password using .pgpass. I did not see any major issue with. All seems to
> work as defined. Tested the behaviors of API PQport, PQhost, PQpass, PQreset
> all works as defined.
>
> I also have following observation which can be considered if it is valid.
>
> 1. URI do not support UDS, where as (host,port) pair support same.
> But there is one problem with this, instead of reporting it, we discard the
> URI input (@conninfo_uri_parse_options) and try to connect to default unix
> socket
>
> [mithun@localhost bin]$ ./psql
> postgres:///home/mithun:5555,127.0.0.1:5555/postgres -U mithun
> psql: could not connect to server: No such file or directory
> Is the server running locally and accepting
> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
> server is running on both the sockets.

That's the wrong syntax.  If you look in
https://www.postgresql.org/docs/devel/static/libpq-connect.html under
"32.1.1.2. Connection URIs", it gives an example of how to include a
slash in a pathname.  You have to use %2F, or else the URL parser will
think you're starting a new section of the URI.

I believe this works fine if you use the correct syntax.

> 2. If we fail to connect we get an error message for each of the addrinfo we
> tried to connect, wondering if we could simplify same. Just a suggestion.
> [mithun@localhost bin]$ ./psql postgres://,,,,,,,/postgres
> psql: could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (::1) and accepting
> TCP/IP connections on port 5432?
> could not connect to server: Connection refused
> Is the server running on host "" (127.0.0.1) and accepting
> TCP/IP connections on port 5432?

That output seems fine to me.  In a real connection string, you're not
likely to have so many duplicated addresses, and it's good for the
error message to make clear which addresses were tried and what
happened for each one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:

On Tue, Nov 1, 2016 at 6:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>That's the wrong syntax.  If you look in
> "32.1.1.2. Connection URIs", it gives an example of how to include a
> slash in a pathname.  You have to use %2F, or else the URL parser will
> think you're starting a new section of the URI.
>I believe this works fine if you use the correct syntax.

Sorry that was a mistake from me. Now it appears fine.

Starting program: /home/mithun/libpqbin/bin/./psql postgres://%2fhome%2fmithun:5555/postgres -U mithun1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/home/mithun/.s.PGSQL.5555"?


Accidentally when testing further on this line I found a crash when invalid encoding "%2" (not recognised as hexa) was used.
Test:
Starting program: /home/mithun/libpqbin/bin/./psql postgres://%2home%2mithun:5555/postgres -U mithun1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bb8d4f in PQpass (conn=0x68aaa0) at fe-connect.c:5582
5582 password = conn->connhost[conn->whichhost].password;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.el7_2.6.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 readline-6.2-9.el7.x86_64

There can be PGconn which have no connhost. 
exposed API's PQpass, PQreset->connectDBStart access conn->connhost without checking whether it is set.
 
>That output seems fine to me.  In a real connection string, you're not
>likely to have so many duplicated addresses, and it's good for the
>error message to make clear which addresses were tried and what
>happened for each one.

Agree, Thanks.

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 1, 2016 at 9:43 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Starting program: /home/mithun/libpqbin/bin/./psql
>> postgres://%2fhome%2fmithun:5555/postgres -U mithun1
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> psql: could not connect to server: No such file or directory
>> Is the server running locally and accepting
>> connections on Unix domain socket "/home/mithun/.s.PGSQL.5555"?
>
> Accidentally when testing further on this line I found a crash when invalid
> encoding "%2" (not recognised as hexa) was used.
> Test:
>>
>> Starting program: /home/mithun/libpqbin/bin/./psql
>> postgres://%2home%2mithun:5555/postgres -U mithun1

You seem to be confused about how to escape characters in URLs.  The
format is a % followed by exactly two hex digits.  In this case, that
would give you %2h and %2m, and h and m are not hex characters, so the
error message is right.  Your input is not valid.

> There can be PGconn which have no connhost.
> exposed API's PQpass, PQreset->connectDBStart access conn->connhost without
> checking whether it is set.

Can you provide a concrete test scenario or some test code that fails?connhost is supposed to be getting set in
connectOptions2(),which is
 
run (AIUI) when the PGconn is first created.  So I think that it will
be set in all scenarios of the type you mention, but I might be
missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:

On Tue, Nov 1, 2016 at 7:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Starting program: /home/mithun/libpqbin/bin/./psql
>> postgres://%2home%2mithun:5555/postgres -U mithun1
>Can you provide a concrete test scenario or some test code that fails?
>connhost is supposed to be getting set in connectOptions2(), which is
>run (AIUI) when the PGconn is first created.  So I think that it will
>be set in all scenarios of the type you mention, but I might be
>missing something.

Sorry if my sentence is confusing 
If I give a proper hexadecimal encoding % followed by 2 hexadigit (i.e. for e.g %2f, %2a) every thing is fine. When I pass a invalid hexadigit encoding eg:  %2h, %2m among the host string e.g "postgres://%2home%2mithun:5555/postgres". then "PQconnectdbParams()" fails before calling connectOptions2(). In that case failed PQconnectdbParams() also return a PGconn where connhost is not set. If we call PQpass(), PQReset() on such a PGconn we get a crash.

A simple test case which crash is:
./psql  'postgres://%2hxxx:5555/postgres'

Call stack:
--------------
#0  0x00007ffff7bb8d4f in PQpass (conn=0x68aa10) at fe-connect.c:5582
#1  0x00007ffff7bb907a in PQconnectionNeedsPassword (conn=0x68aa10) at fe-connect.c:5727
#2  0x00000000004130aa in main (argc=2, argv=0x7fffffffdff8) at startup.c:250

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 1, 2016 at 11:21 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Nov 1, 2016 at 7:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Starting program: /home/mithun/libpqbin/bin/./psql
>>> postgres://%2home%2mithun:5555/postgres -U mithun1
>>Can you provide a concrete test scenario or some test code that fails?
>>connhost is supposed to be getting set in connectOptions2(), which is
>>run (AIUI) when the PGconn is first created.  So I think that it will
>>be set in all scenarios of the type you mention, but I might be
>>missing something.
>
> Sorry if my sentence is confusing
> If I give a proper hexadecimal encoding % followed by 2 hexadigit (i.e. for
> e.g %2f, %2a) every thing is fine. When I pass a invalid hexadigit encoding
> eg:  %2h, %2m among the host string e.g
> "postgres://%2home%2mithun:5555/postgres". then "PQconnectdbParams()" fails
> before calling connectOptions2(). In that case failed PQconnectdbParams()
> also return a PGconn where connhost is not set. If we call PQpass(),
> PQReset() on such a PGconn we get a crash.
>
> A simple test case which crash is:
> ./psql  'postgres://%2hxxx:5555/postgres'
>
> Call stack:
> --------------
> #0  0x00007ffff7bb8d4f in PQpass (conn=0x68aa10) at fe-connect.c:5582
> #1  0x00007ffff7bb907a in PQconnectionNeedsPassword (conn=0x68aa10) at
> fe-connect.c:5727
> #2  0x00000000004130aa in main (argc=2, argv=0x7fffffffdff8) at
> startup.c:250

Ah, nuts.  Thanks, good catch.  Should be fixed in the attached version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Tue, Nov 1, 2016 at 9:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Ah, nuts.  Thanks, good catch.  Should be fixed in the attached version.

I repeated the test on new patch, It works fine now, Also did some more negative tests forcibly failing some internal calls. All tests have passed. This patch works as described and look good to me. 

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Nov 2, 2016 at 1:59 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Nov 1, 2016 at 9:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Ah, nuts.  Thanks, good catch.  Should be fixed in the attached version.
>
> I repeated the test on new patch, It works fine now, Also did some more
> negative tests forcibly failing some internal calls. All tests have passed.
> This patch works as described and look good to me.

Great, committed.  There's still potentially more work to be done
here, because my patch omits some features that were present in
Victor's original submission, like setting the failover timeout,
optionally randomizing the order of the hosts, and distinguishing
between master and standby servers; Victor, or anyone, please feel
free to submit separate patches for those things.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Thu, Nov 3, 2016 at 7:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Great, committed.  There's still potentially more work to be done
> here, because my patch omits some features that were present in
> Victor's original submission, like setting the failover timeout,
> optionally randomizing the order of the hosts, and distinguishing
> between master and standby servers; Victor, or anyone, please feel
> free to submit separate patches for those things.

Among the remaining things I have worked on failover to new master idea. Below patch implement that idea. This is taken from Victors patch but rewritten by me to do some cleanup. As in Victor's patch we have a new connection parameter "target_server_type", It can take 2 values 1. "any" 2. "master" with DEFAULT as "any". If it's has the value "any" we can connect to any of the host server (both master(primary) and slave(standby)). If the value is "master" then we try to connect to master(primary) only.
NOTE: Parameter name is inspired and taken from PostgreSql JDBC Driver.

The main difference between Victor's and this new patch is Default value of parameter target_server_type. In Victor's patch if number of host in connection string is 1 then default value is "any" (This was done to make sure old psql connect to standby as it is now). If it is greater than 1 then default value is set as "master". For me this appeared slightly inconsistent having default value as "any" for any number of connection appeared more appropriate which is also backward compatible. And, if user want failover to master he should ask for it by setting target_server_type=master in connection string.

This patch do not include load balancing feature by trying to connect to host in random fashion.

This patch also do not have failover timeout feature. Victor's implementation of same is not a strict one. i. e. 1 walk of all of the host is allowed even if timer might have expired. Only on second walk we check for timeout. I thought application can manage the failover timeout instead libpq doing it. So avoided the same in this patch. If others find that I am wrong to think that way, will add the same. 

-- 
Thanks and Regards
Mithun C Y
Attachment

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
> Among the remaining things I have worked on failover to new master idea.
> Below patch implement that idea. This is taken from Victors patch but
> rewritten by me to do some cleanup. As in Victor's patch we have a new
> connection parameter "target_server_type", It can take 2 values 1. "any"
> 2. "master" with DEFAULT as "any". If it's has the value "any" we can connect
> to any of the host server (both master(primary) and slave(standby)). If
> the value is "master" then we try to connect to master(primary) only.
> NOTE: Parameter name is inspired and taken from PostgreSql JDBC Driver
> <https://jdbc.postgresql.org/documentation/head/connect.html> .

I'm interested to review this patch (but I haven't read it yet, I'm reading Robert's patch now.)  Are you planning a
newCommitFest entry?
 

Why don't you add "standby" and "prefer_standby" as the target_server_type value?  Are you thinking that those values
areuseful with load balancing feature?
 


> The main difference between Victor's and this new patch is Default value
> of parameter target_server_type. In Victor's patch if number of host in
> connection string is 1 then default value is "any" (This was done to make
> sure old psql connect to standby as it is now). If it is greater than 1
> then default value is set as "master". For me this appeared slightly
> inconsistent having default value as "any" for any number of connection
> appeared more appropriate which is also backward compatible. And, if user
> want failover to master he should ask for it by setting
> target_server_type=master in connection string.

That's sensible, I agree.


Regards
Takayuki Tsunakawa


Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Thu, Nov 10, 2016 at 1:11 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
> Why don't you add "standby" and "prefer_standby" as the target_server_type value?  Are you thinking that those values are useful with load balancing > feature?
Yes this patch will only address failover to new master, values "master" and "any" appeared sufficient for that case.

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
> Yes this patch will only address failover to new master, values "master"
> and "any" appeared sufficient for that case.

Do you mean that unlike pgJDBC "standby" and "prefer_standby" are useless, or they are useful but you don't have time
toimplement it and want to do it in the near future?  Do you mind if I do it if time permits me?  I think they are
usefulwithout load balancing feature, when the user has multiple standbys for HA.
 

Could you add a new entry in CommitFest 2017-1? I'm afraid we can't track the status of your patch because the original
patchin this thread has already been committed.
 

Regards
Takayuki Tsunakawa


Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Great, committed.  There's still potentially more work to be done here,
> because my patch omits some features that were present in Victor's original
> submission, like setting the failover timeout, optionally randomizing the
> order of the hosts, and distinguishing between master and standby servers;
> Victor, or anyone, please feel free to submit separate patches for those
> things.

I did a few tests with ECPG.  I'm satisfied with the current behavior, but someone says different.  I'd like to share
theresult.
 

The following literal connection strings succeeded.  host1 is a server where PostgreSQL is not running, and host2 is
whereit's running.  I could connect to the database server on host2.
 

EXEC SQL CONNECT TO 'tcp:postgresql://host1,host2:5450/postgres';
EXEC SQL CONNECT TO 'tcp:postgresql://host1,host2:5450,5450/postgres';

EXEC SQL CONNECT TO 'postgres@host1,host2:5450';
EXEC SQL CONNECT TO 'postgres@host1,host2:5450,5450';


EXEC SQL CONNECT TO 'tcp:postgresql://?service=my_service';

~/.pg_service.conf
[my_service]
host=host1,host2
port=5450  # and port=5450,5450 case
dbname=postgres


But this one makes PQconnectdbParams() fail, because the passed "host" is "host1:5450,host2" and "port" is "5450".
ECPGconnect()'sparser is different from libpq's.  However, the tcp:postgresql:// syntax is not described a URL in the
manual,so I think it's sufficient to just describe the syntax in the ECPG article.
 

EXEC SQL CONNECT TO 'tcp:postgresql://host1:5450,host2:5450/postgres';


And without the single quote like below, ecpg fails to precompile the source file.  I also think it's enough to state
inthe manual "quote the connection target if you specify multiple hosts or ports".
 

EXEC SQL CONNECT TO tcp:postgresql://host1,host2:5450,5450/postgres;

connect.ec:12: ERROR: syntax error at or near ","


Comments?


Regards
Takayuki Tsunakawa


Re: Patch: Implement failover on libpq connect level.

From
Peter Eisentraut
Date:
On 11/9/16 10:05 AM, Mithun Cy wrote:
> As in Victor's patch we have a new connection parameter
> "target_server_type", It can take 2 values 1. "any" 2. "master" with
> DEFAULT as "any". If it's has the value "any" we can connect to any of
> the host server (both master(primary) and slave(standby)). If the value
> is "master" then we try to connect to master(primary) only.

We tend to use the term "primary" instead of "master".

Will this work with logical replication?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Great, committed.  There's still potentially more work to be done here,
> because my patch omits some features that were present in Victor's original
> submission, like setting the failover timeout, optionally randomizing the
> order of the hosts, and distinguishing between master and standby servers;
> Victor, or anyone, please feel free to submit separate patches for those
> things.

The attached patch fixes some bugs and make a clarification for doc.  Could you check and test the authentication stuff
asI don't have an environment at hand?
 

(1) fe-connect.c
There was a memory leak.

(2) fe-secure_openssl.c, fe-auth.c
GSSAPI/SSPI/SSL authentication requires the target host name, but the code uses conn->pghost which contains a
comma-separatedlist of host names.
 

(3) libpq.sgml
Added sentences to clarify connect_timeout when it is used with multiple hosts.

BTW, let me two questions.

Q1: Is there any reason why hostaddr doesn't accept multiple IP addresses?

Q2: pg_isready (and hence PQping() family) reports success when one host is alive and other hosts are down.  Is this
intended? I think this behavior is correct.
 
e.g. host1 is down and host2 is alive.
$ pg_isready -h host1,host2
host1,host2:5450 - accepting connections
$ echo $?
0

Regards
Takayuki Tsunakawa


Attachment

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
Hi, Mithun

Before going deeper into the patch, let me give you some findings.

(1)
PGconn->target_server_type is not freed in freePGconn().


(2)
Could you add PGTARGETSERVERTYPE environment variable?  Like other variables, it will ease testing, since users can
changethe behavior without altering the connection string here and there.
 


(3)
I think it would be better to expose the server state via ParameterStatus protocol message like
standard_conforming_strings,instead of running "SELECT pg_is_in_recovery()".  We shouldn't want to add one round trip
tocheck the server type (master, standby).  postmaster can return the server type based on its state (pmState); PM_RUN
ismaster, and PM_HOT_STANDBY is standby.  In addition, as an impractical concern, DBA can revoke EXECUTE privilege on
pg_is_in_recovery()from non-superusers.
 

Regards
Takayuki Tsunakawa


Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Fri, Nov 11, 2016 at 7:33 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> We tend to use the term "primary" instead of "master".

Thanks, I will use primary instead of master in my next patch.

>Will this work with logical replication?

I have not tested with logical replication. Currently we identify the primary to connect based on result of "SELECT pg_is_in_recovery()". So I think it works. Do you want me test a particular setup?

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Mon, Nov 14, 2016 at 11:31 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
> PGconn->target_server_type is not freed in freePGconn().

Thanks, will fix in new patch.

>Could you add PGTARGETSERVERTYPE environment variable?  Like other variables, it will ease testing, since users can change the behavior >without altering the connection string here and there.

Okay, will add one.

> I think it would be better to expose the server state via ParameterStatus protocol message like standard_conforming_strings, instead of running > >"SELECT pg_is_in_recovery()".  We shouldn't want to add one round trip to check the server type (master, standby).  postmaster can return the server >type based on its state (pmState); PM_RUN is master, and PM_HOT_STANDBY is standby.  In addition, as an impractical concern, DBA can revoke >EXECUTE privilege on pg_is_in_recovery() from non-superusers.

If you are suggesting me to change in protocol messages, I think that would not be backward compatible to older version servers. I also think such level of protocol changes will not be allowed. with connection status CONNECTION_SETENV used for protocol version 2.0 setup, we sent some query  like
"select pg_catalog.pg_client_encoding()" for same. So I think using "SELECT pg_is_in_recovery()" should be fine.
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
> If you are suggesting me to change in protocol messages, I think that would
> not be backward compatible to older version servers. I also think such level
> of protocol changes will not be allowed. with connection status
> CONNECTION_SETENV used for protocol version 2.0 setup, we sent some query
> like "select pg_catalog.pg_client_encoding()" for same. So I think using
> "SELECT pg_is_in_recovery()" should be fine.

No, there's no concern about compatibility.  Please look at this:

https://www.postgresql.org/docs/devel/static/protocol-flow.html#PROTOCOL-ASYNC

[Excerpt]
----------------------------------------
ParameterStatus messages will be generated whenever the active value changes for any of the parameters the backend
believesthe frontend should know about. Most commonly this occurs in response to a SET SQL command executed by the
frontend,and this case is effectively synchronous — but it is also possible for parameter status changes to occur
becausethe administrator changed a configuration file and then sent the SIGHUP signal to the server. Also, if a SET
commandis rolled back, an appropriate ParameterStatus message will be generated to report the current effective value.
 

At present there is a hard-wired set of parameters for which ParameterStatus will be generated: they are
server_version,server_encoding, client_encoding, application_name, is_superuser, session_authorization, DateStyle,
IntervalStyle,TimeZone, integer_datetimes, and standard_conforming_strings. (server_encoding, TimeZone, and
integer_datetimeswere not reported by releases before 8.0; standard_conforming_strings was not reported by releases
before8.1; IntervalStyle was not reported by releases before 8.4; application_name was not reported by releases before
9.0.)Note that server_version, server_encoding and integer_datetimes are pseudo-parameters that cannot change after
startup.This set might change in the future, or even become configurable. Accordingly, a frontend should simply ignore
ParameterStatusfor parameters that it does not understand or care about.
 
----------------------------------------



Regards
Takayuki Tsunakawa


Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Mon, Nov 14, 2016 at 1:37 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
 > No, there's no concern about compatibility.  Please look at this:
Thanks, my concern is suppose you have 3 server in cluster A(new version), B(new version), C(old version). If we implement as above only new servers will send ParameterStatus message to indicate what type of server we are connected. Server C will not send same. So we will not be able to use new feature "failover to new master" for such a kind of cluster.

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
> Thanks, my concern is suppose you have 3 server in cluster A(new version),
> B(new version), C(old version). If we implement as above only new servers
> will send ParameterStatus message to indicate what type of server we are
> connected. Server C will not send same. So we will not be able to use new
> feature "failover to new master" for such a kind of cluster.

No, the streaming replication requires the same major release for all member servers, so there's no concern about the
mixed-versioncluster.
 

Sorry, pmState can only be used in postmaster.  In our context, postgres can use RecoveryInProgress().  Anyway, in
additionto the reduced round trip, the libpq code would be much simpler.
 

Regards
Takayuki Tsunakawa



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Fri, Nov 11, 2016 at 7:33 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> We tend to use the term "primary" instead of "master".
>
> Thanks, I will use primary instead of master in my next patch.
>
>>Will this work with logical replication?
>
> I have not tested with logical replication. Currently we identify the
> primary to connect based on result of "SELECT pg_is_in_recovery()". So I
> think it works. Do you want me test a particular setup?

If logical replication is in use, none of the servers involved would
be in recovery.  I'm not sure what command would need to be used to
assess whether we've got a master or a standby, but probably not that
one.  This gets at one of my earlier complaints about this part of the
functionality, which is that hardcoding that particular SQL statement
into libpq seems like a giant hack.  However, I'm not sure what to do
about it.  The functionality is clearly useful, because JDBC has it,
and Victor proposed this patch to add it to libpq, and - totally
independently of any of that - EnterpriseDB has a customer who has
requested libpq support for this as well.  So I am tempted to just
hold my nose and hard-code the SQL as JDBC is presumably already
doing.  If we figure out what the equivalent for logical replication
would be we can add something to cover that case, too.  It's ugly, but
I don't have a better idea, and I think there's value in being
compatible with what JDBC has already done (even if it's not what we
would have chosen to do tabula rasa).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Nov 14, 2016 at 8:09 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
>> Thanks, my concern is suppose you have 3 server in cluster A(new version),
>> B(new version), C(old version). If we implement as above only new servers
>> will send ParameterStatus message to indicate what type of server we are
>> connected. Server C will not send same. So we will not be able to use new
>> feature "failover to new master" for such a kind of cluster.
>
> No, the streaming replication requires the same major release for all member servers, so there's no concern about the
mixed-versioncluster.
 

True, but there is a concern about a newer libpq connecting to older
servers.  If we mimic what JDBC is already doing, we'll be compatible
and you'll be able to use this feature with a v10 libpq without
worrying about whether the target server is also v10.  If we invent
something new on the server side, then you'll need to be sure you have
both a v10 libpq and v10 server.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> > I have not tested with logical replication. Currently we identify the
> > primary to connect based on result of "SELECT pg_is_in_recovery()". So I
> > think it works. Do you want me test a particular setup?
> 
> If logical replication is in use, none of the servers involved would
> be in recovery.  I'm not sure what command would need to be used to
> assess whether we've got a master or a standby, but probably not that
> one.  This gets at one of my earlier complaints about this part of the
> functionality, which is that hardcoding that particular SQL statement
> into libpq seems like a giant hack.  However, I'm not sure what to do
> about it.  The functionality is clearly useful, because JDBC has it,
> and Victor proposed this patch to add it to libpq, and - totally
> independently of any of that - EnterpriseDB has a customer who has
> requested libpq support for this as well.  So I am tempted to just
> hold my nose and hard-code the SQL as JDBC is presumably already
> doing.  If we figure out what the equivalent for logical replication
> would be we can add something to cover that case, too.  It's ugly, but
> I don't have a better idea, and I think there's value in being
> compatible with what JDBC has already done (even if it's not what we
> would have chosen to do tabula rasa).

I would rather come up with something that works in both cases that we
can extend internally later, say pg_is_primary_node() or something like
that instead; and we implement it initially by returning the inverse of
pg_is_in_recovery() for replicated non-logical flocks, while we figure
out what to do in logical replication.  Otherwise it will be harder to
change later if we embed it in libpq, and we may be forced into
supporting nonsensical situations such as having pg_is_in_recovery()
return true for logical replication primary nodes.

FWIW I'm not sure "primary" is the right term here either.  I think what
we really want to know is whether the node can accept writes; maybe
"pg_is_writable_node".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> > I have not tested with logical replication. Currently we identify the
>> > primary to connect based on result of "SELECT pg_is_in_recovery()". So I
>> > think it works. Do you want me test a particular setup?
>>
>> If logical replication is in use, none of the servers involved would
>> be in recovery.  I'm not sure what command would need to be used to
>> assess whether we've got a master or a standby, but probably not that
>> one.  This gets at one of my earlier complaints about this part of the
>> functionality, which is that hardcoding that particular SQL statement
>> into libpq seems like a giant hack.  However, I'm not sure what to do
>> about it.  The functionality is clearly useful, because JDBC has it,
>> and Victor proposed this patch to add it to libpq, and - totally
>> independently of any of that - EnterpriseDB has a customer who has
>> requested libpq support for this as well.  So I am tempted to just
>> hold my nose and hard-code the SQL as JDBC is presumably already
>> doing.  If we figure out what the equivalent for logical replication
>> would be we can add something to cover that case, too.  It's ugly, but
>> I don't have a better idea, and I think there's value in being
>> compatible with what JDBC has already done (even if it's not what we
>> would have chosen to do tabula rasa).
>
> I would rather come up with something that works in both cases that we
> can extend internally later, say pg_is_primary_node() or something like
> that instead; and we implement it initially by returning the inverse of
> pg_is_in_recovery() for replicated non-logical flocks, while we figure
> out what to do in logical replication.  Otherwise it will be harder to
> change later if we embed it in libpq, and we may be forced into
> supporting nonsensical situations such as having pg_is_in_recovery()
> return true for logical replication primary nodes.

I don't think we'll be backed into a corner like that, because we can
always make this contingent on server version.  libpq will have that
available.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Catalin Iacob
Date:
On Tue, Nov 15, 2016 at 5:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I would rather come up with something that works in both cases that we
>> can extend internally later, say pg_is_primary_node() or something like
>> that instead; and we implement it initially by returning the inverse of
>> pg_is_in_recovery() for replicated non-logical flocks, while we figure
>> out what to do in logical replication.  Otherwise it will be harder to
>> change later if we embed it in libpq, and we may be forced into
>> supporting nonsensical situations such as having pg_is_in_recovery()
>> return true for logical replication primary nodes.
>
> I don't think we'll be backed into a corner like that, because we can
> always make this contingent on server version.  libpq will have that
> available.

But even with server version checking code, that code will be inside
libpq so there will be old libpq versions in the field that won't know
the proper query to send to new server versions.



Re: Patch: Implement failover on libpq connect level.

From
Catalin Iacob
Date:
On Tue, Nov 15, 2016 at 3:42 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> FWIW I'm not sure "primary" is the right term here either.  I think what
> we really want to know is whether the node can accept writes; maybe
> "pg_is_writable_node".

This made me think of another complication: what about cascading
replication where the middle node is both a master and a slave? You'd
probably not want to fallback to the middle node even though it is a
master for its downstream.



Re: Patch: Implement failover on libpq connect level.

From
Alvaro Herrera
Date:
Catalin Iacob wrote:
> On Tue, Nov 15, 2016 at 3:42 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > FWIW I'm not sure "primary" is the right term here either.  I think what
> > we really want to know is whether the node can accept writes; maybe
> > "pg_is_writable_node".
> 
> This made me think of another complication: what about cascading
> replication where the middle node is both a master and a slave? You'd
> probably not want to fallback to the middle node even though it is a
> master for its downstream.

Exactly my thought, hence the above suggestion.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 15, 2016 at 12:53 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote:
> On Tue, Nov 15, 2016 at 5:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> I would rather come up with something that works in both cases that we
>>> can extend internally later, say pg_is_primary_node() or something like
>>> that instead; and we implement it initially by returning the inverse of
>>> pg_is_in_recovery() for replicated non-logical flocks, while we figure
>>> out what to do in logical replication.  Otherwise it will be harder to
>>> change later if we embed it in libpq, and we may be forced into
>>> supporting nonsensical situations such as having pg_is_in_recovery()
>>> return true for logical replication primary nodes.
>>
>> I don't think we'll be backed into a corner like that, because we can
>> always make this contingent on server version.  libpq will have that
>> available.
>
> But even with server version checking code, that code will be inside
> libpq so there will be old libpq versions in the field that won't know
> the proper query to send to new server versions.

Good point.  pg_is_writable_node() sounds good, then, and we can still
send pg_is_in_recovery() if we're connected to a pre-10 version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
> So I am tempted to just
> hold my nose and hard-code the SQL as JDBC is presumably already
doing.  

JDBC is sending "show transaction_read_only" to find whether it is master or not.
Victor's patch also started with it, but later it was transformed into pg_is_in_recovery
by him as it appeared more appropriate to identify the master / slave.

ConnectionFactoryImpl.java:685

+ private boolean isMaster(QueryExecutor queryExecutor, Logger logger)
+ throws SQLException, IOException {
+    byte[][] results = SetupQueryRunner.run(queryExecutor, "show transaction_read_only", true);
+    String value = queryExecutor.getEncoding().decode(results[0]);
+    return value.equalsIgnoreCase("off");
}

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Tatsuo Ishii
Date:
> JDBC is sending "show transaction_read_only" to find whether it is master
> or not.
> Victor's patch also started with it, but later it was transformed into
> pg_is_in_recovery
> by him as it appeared more appropriate to identify the master / slave.

In my understanding pg_is_in_recovery() returns true if it's a standby
node. However, even if it returns other than true, the server is not
necessarily a primary. Even it's not configured as a streaming
replication primary, it returns other than true.

So if your intention is finding a primary, I am not sure if
pg_is_in_recovery() is the best solution.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 15, 2016 at 11:31 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> > So I am tempted to just
>> > hold my nose and hard-code the SQL as JDBC is presumably already
>> doing.
>
> JDBC is sending "show transaction_read_only" to find whether it is master or
> not.
> Victor's patch also started with it, but later it was transformed into
> pg_is_in_recovery
> by him as it appeared more appropriate to identify the master / slave.

Hmm, let's go back to the JDBC method, then.  "show
transaction_read_only" will return true on a standby, but presumably
also on any other non-writable node.  You could even force it to be
true artificially if you wanted to force traffic off of a node, using
ALTER {SYSTEM|USER ...|DATABASE ..} SET default_transaction_read_only
= on

I think that would address Alvaro's concern, and it's nicer anyway if
libpq and JDBC are doing the same thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Kevin Grittner
Date:
[moving this branch of discussion to pgsql-jdbc]

On Tue, Nov 15, 2016 at 10:31 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> JDBC is sending "show transaction_read_only" to find whether it
> is master or not.

If true, that's insane.  That can be different on each connection
to the cluster and can change tens of thousands of times per second
on any connection!

I know of one large shop that sets default_transaction_read_only =
true because 99% of their transactions are read only and they use
serializable transactions -- which run faster and with less
contention when transactions which don't need to write are flagged
as read only.  It seems safer to them to only turn off the read
only property for transactions which might need to write.

> Victor's patch also started with it, but later it was transformed into
> pg_is_in_recovery
> by him as it appeared more appropriate to identify the master / slave.

I don't know whether that is ideal, but it is sure a lot better
than something which can change with every transaction -- or even
within a transaction (in one direction).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Patch: Implement failover on libpq connect level.

From
Catalin Iacob
Date:
On Wed, Nov 16, 2016 at 2:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm, let's go back to the JDBC method, then.  "show
> transaction_read_only" will return true on a standby, but presumably
> also on any other non-writable node.  You could even force it to be
> true artificially if you wanted to force traffic off of a node, using
> ALTER {SYSTEM|USER ...|DATABASE ..} SET default_transaction_read_only
> = on
>
> I think that would address Alvaro's concern, and it's nicer anyway if
> libpq and JDBC are doing the same thing.

Not sure I agree that using this is a good idea in the first place.

But if we end up using this, I really think the docs should be very
explicit about what's implemented and not just say master/any. With
the master/any docs in the patch I would be *very* surprised if a
master is skipped only because it was configured with
default_transaction_read_only = on.



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Nov 16, 2016 at 12:00 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote:
> On Wed, Nov 16, 2016 at 2:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Hmm, let's go back to the JDBC method, then.  "show
>> transaction_read_only" will return true on a standby, but presumably
>> also on any other non-writable node.  You could even force it to be
>> true artificially if you wanted to force traffic off of a node, using
>> ALTER {SYSTEM|USER ...|DATABASE ..} SET default_transaction_read_only
>> = on
>>
>> I think that would address Alvaro's concern, and it's nicer anyway if
>> libpq and JDBC are doing the same thing.
>
> Not sure I agree that using this is a good idea in the first place.
>
> But if we end up using this, I really think the docs should be very
> explicit about what's implemented and not just say master/any. With
> the master/any docs in the patch I would be *very* surprised if a
> master is skipped only because it was configured with
> default_transaction_read_only = on.

It seems like it is going to be difficult to please everyone here
100%, because there are multiple conflicting priorities.  But we can
definitely document whatever choices we make.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> On Mon, Nov 14, 2016 at 8:09 PM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > From: pgsql-hackers-owner@postgresql.org
> >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
> >> Thanks, my concern is suppose you have 3 server in cluster A(new
> >> version), B(new version), C(old version). If we implement as above
> >> only new servers will send ParameterStatus message to indicate what
> >> type of server we are connected. Server C will not send same. So we
> >> will not be able to use new feature "failover to new master" for such
> a kind of cluster.
> >
> > No, the streaming replication requires the same major release for all
> member servers, so there's no concern about the mixed-version cluster.
> 
> True, but there is a concern about a newer libpq connecting to older servers.
> If we mimic what JDBC is already doing, we'll be compatible and you'll be
> able to use this feature with a v10 libpq without worrying about whether
> the target server is also v10.  If we invent something new on the server
> side, then you'll need to be sure you have both a v10 libpq and v10 server.

Do we really want to enable libpq failover against pre-V10 servers?  I don't think so, as libpq is a part of PostgreSQL
andlibpq failover is a new feature in PostgreSQL 10.  At least, as one user, I don't want PostgreSQL to sacrifice
anotherround trip to establish a connection.  As a developer, I don't want libpq code more complex than necessary (the
proposedpatch adds a new state to the connection state machine.)  And I think it's natural for the server to return the
serverattribute (primary/standby, writable, etc.) as a response to the Startup message like server_version,
standard_conforming_stringsand server_encoding.
 

Regards
Takayuki Tsunakawa



Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tatsuo Ishii
> In my understanding pg_is_in_recovery() returns true if it's a standby node.
> However, even if it returns other than true, the server is not necessarily
> a primary. Even it's not configured as a streaming replication primary,
> it returns other than true.
> 
> So if your intention is finding a primary, I am not sure if
> pg_is_in_recovery() is the best solution.

Yes, I don't think pg_is_in_recovery() is the best, but there doesn't seem to be a better solution.
pg_is_in_recovery(),as its name clearly suggests, returns true if the server is performing recovery.  For example, it
returnstrue if hot_standby=on is present in postgresql.conf and the recovery from backup is in progress.  It's not a
standby.

Regards
Takayuki Tsunakawa




Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Hmm, let's go back to the JDBC method, then.  "show transaction_read_only"
> will return true on a standby, but presumably also on any other non-writable
> node.  You could even force it to be true artificially if you wanted to
> force traffic off of a node, using ALTER {SYSTEM|USER ...|DATABASE ..} SET
> default_transaction_read_only = on
> 
> I think that would address Alvaro's concern, and it's nicer anyway if libpq
> and JDBC are doing the same thing.

If you prefer consistency between libpq and JDBC, then we could correct JDBC.  People here should know the server state
well,and be able to figure out a good specification.
 

Regards
Takayuki Tsunakawa



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> Do we really want to enable libpq failover against pre-V10 servers?  I don't think so, as libpq is a part of
PostgreSQLand libpq failover is a new feature in PostgreSQL 10.  At least, as one user, I don't want PostgreSQL to
sacrificeanother round trip to establish a connection.  As a developer, I don't want libpq code more complex than
necessary(the proposed patch adds a new state to the connection state machine.)  And I think it's natural for the
serverto return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like
server_version,standard_conforming_strings and server_encoding. 

Well, generally speaking, a new feature that works against older
server is better than one that doesn't.  Of course, if that entails
making other compromises then you have to decide whether it's worth
it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only
exist in older versions so if we pick either of those methods then it
will just work.  If we decide to invent some completely new method of
distinguishing masters from standbys, then it might not, but that
would be a drawback of such a choice, not a benefit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Thu, Nov 17, 2016 at 8:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>but SELECT pg_is_in_recovery() and SHOW transaction_read_only
>exist in older versions so if we pick either of those methods then it
>will just work.

I am adding next version of the patch it have following fixes.
Tsunakawa's comments
1.  PGconn->target_server_type is now freed in freePGconn()
2.  Added PGTARGETSERVERTYPE.

Additional comments from others
3.  Moved from SELECT pg_is_in_recovery() to SHOW transaction_read_only now should handle different kind of replication, as we recognise server to which writable connection can be made as primary. Very exactly like JDBC driver. Also documented about it.
4. renamed words from master to primary.

--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy
> I am adding next version of the patch it have following fixes.
> Tsunakawa's comments
> 
> 1.  PGconn->target_server_type is now freed in freePGconn() 2.  Added
> PGTARGETSERVERTYPE.
> 
> 
> Additional comments from others
> 3.  Moved from SELECT pg_is_in_recovery() to SHOW transaction_read_only
> now should handle different kind of replication, as we recognise server
> to which writable connection can be made as primary. Very exactly like JDBC
> driver. Also documented about it.
> 4. renamed words from master to primary.

Thank you.  The following items need addressing.  Some of them require some more discussion to reach consensus, and I
hopethey will settle down soon.  After checking the progress for a week or so, I'll mark the CommitFest entry as "ready
forcommitter" or "waiting on author".
 

(1)
+        server. Set this to <literaL>any</literal>, if you want to connect to
+        A server is recognized as a primary/standby by observering whether it

Typo.  <literaL. -> <literal>, and "observering" -> "observing".


(2)
+    {"target_server_type", "PGTARGETSERVERTYPE", NULL, NULL,
+        "Target server type", "", 6,

Looking at existing parameters, the default value is defined as a macro, and the display label is a sequence of words
separatedby "-".  i.e.
 

+    {"target_server_type", "PGTARGETSERVERTYPE", DefaultTargetServerType, NULL,
+        "Target-Server-Type", "", 6,


(3)
Please avoid adding another round trip by using a GUC_REPORTed variable (ParameterStatus entry).  If you want to
supportthis libpq failover with pre-10 servers, you can switch the method of determining the primary based on the
serverversion.  But I don't think it's worth supporting older servers at the price of libpq code complexity.
 


(4)
Please consider supporting "standby" and "prefer_standby" like PgJDBC.  They are useful without load balancing when
multiplestandbys are used for HA.
 


(5)
I haven't tracked the progress of logical replication, but will target_server_type be likely to be usable with it?  How
willtarget_server_type fit logical replication?
 

Regards
Takayuki Tsunakawa



Re: Patch: Implement failover on libpq connect level.

From
Craig Ringer
Date:
On 17 November 2016 at 10:57, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> Do we really want to enable libpq failover against pre-V10 servers?  I don't think so, as libpq is a part of
PostgreSQLand libpq failover is a new feature in PostgreSQL 10.  At least, as one user, I don't want PostgreSQL to
sacrificeanother round trip to establish a connection.  As a developer, I don't want libpq code more complex than
necessary(the proposed patch adds a new state to the connection state machine.)  And I think it's natural for the
serverto return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like
server_version,standard_conforming_strings and server_encoding. 
>
> Well, generally speaking, a new feature that works against older
> server is better than one that doesn't.  Of course, if that entails
> making other compromises then you have to decide whether it's worth
> it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only
> exist in older versions so if we pick either of those methods then it
> will just work.  If we decide to invent some completely new method of
> distinguishing masters from standbys, then it might not, but that
> would be a drawback of such a choice, not a benefit.

We can and probably should have both.

If the server tells us on connect whether it's a standby or not, use that.

Otherwise, ask it.

That way we don't pay the round-trip cost and get the log spam when
talking to newer servers that send us something useful in the startup
packet, but we can still query it on older servers. Graceful fallback.

Every round trip is potentially very expensive. Having libpq do them
unnecessarily is bad.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Nov 17, 2016 at 10:08 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 17 November 2016 at 10:57, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki
>> <tsunakawa.takay@jp.fujitsu.com> wrote:
>>> Do we really want to enable libpq failover against pre-V10 servers?  I don't think so, as libpq is a part of
PostgreSQLand libpq failover is a new feature in PostgreSQL 10.  At least, as one user, I don't want PostgreSQL to
sacrificeanother round trip to establish a connection.  As a developer, I don't want libpq code more complex than
necessary(the proposed patch adds a new state to the connection state machine.)  And I think it's natural for the
serverto return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like
server_version,standard_conforming_strings and server_encoding. 
>>
>> Well, generally speaking, a new feature that works against older
>> server is better than one that doesn't.  Of course, if that entails
>> making other compromises then you have to decide whether it's worth
>> it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only
>> exist in older versions so if we pick either of those methods then it
>> will just work.  If we decide to invent some completely new method of
>> distinguishing masters from standbys, then it might not, but that
>> would be a drawback of such a choice, not a benefit.
>
> We can and probably should have both.
>
> If the server tells us on connect whether it's a standby or not, use that.
>
> Otherwise, ask it.
>
> That way we don't pay the round-trip cost and get the log spam when
> talking to newer servers that send us something useful in the startup
> packet, but we can still query it on older servers. Graceful fallback.
>
> Every round trip is potentially very expensive. Having libpq do them
> unnecessarily is bad.

True, but raising the bar for this feature so that it doesn't get done
is also bad.  It can be improved in a later patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Craig Ringer
Date:
On 19 November 2016 at 01:29, Robert Haas <robertmhaas@gmail.com> wrote:

>> We can and probably should have both.
>>
>> If the server tells us on connect whether it's a standby or not, use that.
>>
>> Otherwise, ask it.
>>
>> That way we don't pay the round-trip cost and get the log spam when
>> talking to newer servers that send us something useful in the startup
>> packet, but we can still query it on older servers. Graceful fallback.
>>
>> Every round trip is potentially very expensive. Having libpq do them
>> unnecessarily is bad.
>
> True, but raising the bar for this feature so that it doesn't get done
> is also bad.  It can be improved in a later patch.

Good point. Starting with the followup query method seems fine.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:

On Fri, Nov 18, 2016 at 6:39 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
 >Typo.  <literaL. -> <literal>, and "observering" -> "observing".

Thanks fixed.
 
> +       {"target_server_type", "PGTARGETSERVERTYPE", DefaultTargetServerType, NULL,
> +               "Target-Server-Type", "", 6,

Thanks fixed.
  
> Please avoid adding another round trip by using a GUC_REPORTed variable (ParameterStatus entry).  If you want to support this libpq failover with >pre-10 servers, you can switch the method of determining the primary based on the server version.  But I don't think it's worth supporting older servers > at the price of libpq code complexity.

Currently there is no consensus around this. For now, making this patch to address failover to next primary as similar to JDBC seems sufficient for me.
On next proposal of patch I think we can try to extend as you have proposed

>Please consider supporting "standby" and "prefer_standby" like PgJDBC.  They are useful without load balancing when multiple standbys are used for HA.

I think they become more relevant with load-balancing. And, making it usable when we extend this feature to have load-balancing makes sense to me. 

> I haven't tracked the progress of logical replication, but will target_server_type be likely to be usable with it?  How will target_server_type fit logical       > replication?

I tried to check logical replication WIP patch, not very sure how to accomodate same.

--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: Mithun Cy [mailto:mithun.cy@enterprisedb.com]
> > +       {"target_server_type", "PGTARGETSERVERTYPE",
> DefaultTargetServerType, NULL,
> > +               "Target-Server-Type", "", 6,
> 
> Thanks fixed.

+    {"target_server_type", "PGTARGETSERVERTYPE", NULL, NULL,

The default value field is still NULL.


> > Please avoid adding another round trip by using a GUC_REPORTed variable
> (ParameterStatus entry).  If you want to support this libpq failover with
> >pre-10 servers, you can switch the method of determining the primary based
> on the server version.  But I don't think it's worth supporting older
> servers > at the price of libpq code complexity.
> 
> Currently there is no consensus around this. For now, making this patch
> to address failover to next primary as similar to JDBC seems sufficient
> for me.
> On next proposal of patch I think we can try to extend as you have proposed

I don't think show transaction is correct, because it's affected by default_transaction_read_only and ALTER
DATABASE/USERSET transaction_read_only.  transaction_read_only is a transaction attribute, not the server type.  What
wewant to use to determine the connection target should be not only whether the transaction is read only, but also
otherattributes such as whether temporary tables can be used (only standby), maintenance commands can be executed
(VACUUMand ANALYZE can run when transaction_read_only is on, but not on standby), etc.  And how about other DBMSs?  Do
wereally want to determine the target based on transaction_read_only while e.g. Oracle is based on primary/standby?
 

If you really want the transaction_read_only attribute for your application, then your app should execute "SET
default_transaction_read_only= on" upon connection, or ALTER DATABASE/USER SET default_transaction_read_only = on in
advance.


Regards
Takayuki Tsunakawa






Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> On Thu, Nov 17, 2016 at 10:08 PM, Craig Ringer <craig@2ndquadrant.com>
> wrote:
> > We can and probably should have both.
> >
> > If the server tells us on connect whether it's a standby or not, use that.
> >
> > Otherwise, ask it.
> >
> > That way we don't pay the round-trip cost and get the log spam when
> > talking to newer servers that send us something useful in the startup
> > packet, but we can still query it on older servers. Graceful fallback.
> >
> > Every round trip is potentially very expensive. Having libpq do them
> > unnecessarily is bad.
> 
> True, but raising the bar for this feature so that it doesn't get done is
> also bad.  It can be improved in a later patch.

I thought you are very strict about performance, so I hesitate to believe you forgive the additional round trip.  libpq
failoveris a new feature in v10, so I think it should provide the best user experience for v10 client+server users from
thestart.  If the concern is the time for development, then support for older releases can be added in a later patch.
 

There are still several months left for v10.  Why don't we try the best?  Could you share the difficulty?


Regards
Takayuki Tsunakawa



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Sun, Nov 20, 2016 at 8:48 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
>> True, but raising the bar for this feature so that it doesn't get done is
>> also bad.  It can be improved in a later patch.
>
> I thought you are very strict about performance, so I hesitate to believe you forgive the additional round trip.
libpqfailover is a new feature in v10, so I think it should provide the best user experience for v10 client+server
usersfrom the start.  If the concern is the time for development, then support for older releases can be added in a
laterpatch. 
>
> There are still several months left for v10.  Why don't we try the best?  Could you share the difficulty?

I am very strict about regressing the performance of things that we
already have, but I try not to make a policy that a new feature must
be as fast as it could ever be.  That could result in us having very
few new features.  Of course, it also matters how frequently the
overhead is likely to be incurred.  A little overhead on each tuple
visibility check is a much bigger problem than a little overhead on
each CREATE TABLE statement, which in turn is a much bigger problem
than a little overhead on each connection attempt.  For good
performance, connections must last a while, so a little extra time
setting one up doesn't seem like a showstopper to me.  Anyway, anybody
who finds this mechanism too expensive doesn't have to use it; they
can implement something else instead.

Also, I am not saying that we should not change this in time for v10.
I'm saying that I don't think it should be a requirement for the next
patch to be committed in this area to introduce a whole new mechanism
for determining whether something is a master or a standby.  Love it
or hate it, pgsql-jdbc has already implemented something in this area
and it does something useful -- without requiring a wire protocol
change.  Now you and Kevin are trying to say that what they did is all
wrong, but I don't agree.  There are very many users for whom the
pgsql-jdbc approach will do exactly what they need, and no doubt some
for whom it won't.  Getting a patch that mimics that approach
committed is *progress*.  Improving it afterwards, whether for v10 or
some later release, is also good.

There is a saying that one should not let the perfect be the enemy of
the good.  I believe that saying applies here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> I am very strict about regressing the performance of things that we already
> have, but I try not to make a policy that a new feature must be as fast
> as it could ever be.  That could result in us having very few new features.

I see.  I like your attitude toward new features.  But I don't think now is the time to compromise this feature and
rushto the commit.
 


> Also, I am not saying that we should not change this in time for v10.
> I'm saying that I don't think it should be a requirement for the next patch
> to be committed in this area to introduce a whole new mechanism for
> determining whether something is a master or a standby.  Love it or hate
> it, pgsql-jdbc has already implemented something in this area and it does
> something useful -- without requiring a wire protocol change.  Now you and
> Kevin are trying to say that what they did is all wrong, but I don't agree.
> There are very many users for whom the pgsql-jdbc approach will do exactly
> what they need, and no doubt some for whom it won't.  Getting a patch that
> mimics that approach committed is *progress*.  Improving it afterwards,
> whether for v10 or some later release, is also good.

transaction_read_only=on does not mean the standby.  As the manual article on hot standby says, they are different.

https://www.postgresql.org/docs/devel/static/hot-standby.html

[Excerpt]
--------------------------------------------------
In normal operation, “read-only” transactions are allowed to update sequences and to use LISTEN, UNLISTEN, and NOTIFY,
soHot Standby sessions operate under slightly tighter restrictions than ordinary read-only sessions. It is possible
thatsome of these restrictions might be loosened in a future release.
 
...
Users will be able to tell whether their session is read-only by issuing SHOW transaction_read_only. In addition, a set
offunctions (Table 9.79, “Recovery Information Functions”) allow users to access information about the standby server.
Theseallow you to write programs that are aware of the current state of the database. These can be used to monitor the
progressof recovery, or to allow you to write complex programs that restore the database to particular states.
 
--------------------------------------------------


I'm afraid that if the current patch is committed, you will lose interest in the ideal solution.  Then if the current
patchis out as v10, there would be a concern about incompatibility when we pursue the ideal solution in a later
release. That is, "should we continue to report that this server is standby even if it's actually a primary with
transaction_read_onlyis on, to maintain compatibility with the older release."
 

If you want to connect to a server where the transaction is read-only, then shouldn't the connection parameter be
somethinglike "target_session_attrs=readonly"?  That represents exactly what the code does.
 


> There is a saying that one should not let the perfect be the enemy of the
> good.  I believe that saying applies here.

True, so I suggested not including the support for older servers for a while.  Shall we find the real enemy -- what's
preventingthe ideal solution?  I know my knowledge is still far less than you, so I may be missing something difficult.
So, I'd like Mithun to share the difficulty.
 

Regards
Takayuki Tsunakawa



Re: Patch: Implement failover on libpq connect level.

From
Craig Ringer
Date:
On 21 November 2016 at 00:08, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

>> Please avoid adding another round trip by using a GUC_REPORTed variable
>> (ParameterStatus entry).  If you want to support this libpq failover with
>> >pre-10 servers, you can switch the method of determining the primary based
>> on the server version.  But I don't think it's worth supporting older
>> servers > at the price of libpq code complexity.
>
> Currently there is no consensus around this. For now, making this patch to
> address failover to next primary as similar to JDBC seems sufficient for me.
> On next proposal of patch I think we can try to extend as you have proposed

FWIW, I agree. Working first, then improved.

>> I haven't tracked the progress of logical replication, but will
>> target_server_type be likely to be usable with it?  How will
>> target_server_type fit logical       > replication?
>
> I tried to check logical replication WIP patch, not very sure how to
> accomodate same.

Logical replication downstreams won't force transactions to read-only.
A significant part of the appeal of logical replication is that you
can use temp tables and unlogged tables on the downstream, and even
use persistent tables so long as they don't clash with the upstream.
More sophisticated models even allow the downstream to safely write to
replicated tables by doing conflict resolution.

So as it stands, this patch would consider any logical replication
downstream a 'master'.

That's fine for now IMO. Determining whether a server is a logical
replica isn't that simple, nor is it a clear-cut yes/no answer, and I
think it's out of the scope of this patch to deal with it. Figure it
out once logical replication lands.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
An updated patch with some fixes for bugs reported earlier,

A. failover_to_new_master_v4.patch
Default value "any" is added to target_server_type parameter during its definition.

B. libpq-failover-smallbugs_02.patch
Fixed the issue raised by [PATCH] pgpassfile connection option, Now added host and port name along with pgpass file error message when ever authentication fails and added same to file libpq-failover-smallbugs.patch (bugs reported and fixed by Tsunakawa). I did review of original bugs reported and fixes it seems okay for me. I could not test GSS and SSPI authentication as it appears they needed windows.


Attachment

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Sun, Nov 13, 2016 at 9:02 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
>> Great, committed.  There's still potentially more work to be done here,
>> because my patch omits some features that were present in Victor's original
>> submission, like setting the failover timeout, optionally randomizing the
>> order of the hosts, and distinguishing between master and standby servers;
>> Victor, or anyone, please feel free to submit separate patches for those
>> things.
>
> The attached patch fixes some bugs and make a clarification for doc.  Could you check and test the authentication
stuffas I don't have an environment at hand?
 

I don't have an environment at hand, either, but I think your fixes
are correct, so I have committed them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Catalin Iacob
Date:
On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> transaction_read_only=on does not mean the standby.  As the manual article on hot standby says, they are different.

> I'm afraid that if the current patch is committed, you will lose interest in the ideal solution.

I agree with Robert that lacking the ideal solution shouldn't block a
good enough solution today. But I find the current patch as it stands
too confusing to pass the "good enough" bar.

> Then if the current patch is out as v10, there would be a concern about incompatibility when we pursue the ideal
solutionin a later release.  That is, "should we continue to report > that this server is standby even if it's actually
aprimary with transaction_read_only is on, to maintain compatibility with the older release." 

Agreed. I am worried that this will end up being a wart:
target_server_type=primary doesn't really mean primary it means that
by default the session is not read only which by the way is usually
the case for primaries but that can be changed.

> If you want to connect to a server where the transaction is read-only, then shouldn't the connection parameter be
somethinglike "target_session_attrs=readonly"?  That represents exactly what the code does. 

FWIW I find this to be a reasonable compromise. To keep the analogy
with the current patch it would be more something like
"target_session_attrs=read_write|any" and the docs could point out "by
the way, if you didn't tweak default_transaction_read_only read_write
will only pick a primary so this can be used for failover".
The functionality is the same but changing the name removes the
ambiguity in the semantics and leaves the door open for a future real
primary/replica functionality which includes logical replication with
whatever semantics are deemed appropriate then.

We do lose pgjdbc compatibility but the current patch doesn't have
that 100% anyway: in pgjdbc it's called targetServerType and it
accepts master instead of primary and it also accepts slave and
preferSlave
I notice at https://github.com/pgjdbc/www/blob/master/documentation/head/connect.md
that pgjdbc says "The master/slave distinction is currently done by
observing if the server allows writes" which, due to the "currently"
part seems to acknowledge this is not ideal and leaves the door open
for changes in semantics but I don't think changing semantics is going
to fly for Postgres itself.



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Wed, Nov 23, 2016 at 10:19 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote:
   On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
 >> If you want to connect to a server where the transaction is read-only, then shouldn't the connection parameter be something like  >>"target_session_attrs=readonly"?  That represents exactly what the code does.

 >FWIW I find this to be a reasonable compromise. To keep the analogy
 >with the current patch it would be more something like "target_session_attrs=read_write|any".

I have taken this suggestion now renamed target_server_type to target_session_attrs with possible 2 values "read-write", "any".
May be we could expand to "readonly" and "prefer-readonly" in next patch proposal. Attaching the patch for same.
--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Nov 24, 2016 at 7:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Wed, Nov 23, 2016 at 10:19 PM, Catalin Iacob <iacobcatalin@gmail.com>
> wrote:
>    On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>  >> If you want to connect to a server where the transaction is read-only,
> then shouldn't the connection parameter be something like
>>>"target_session_attrs=readonly"?  That represents exactly what the code
> does.
>
>  >FWIW I find this to be a reasonable compromise. To keep the analogy
>  >with the current patch it would be more something like
> "target_session_attrs=read_write|any".
>
> I have taken this suggestion now renamed target_server_type to
> target_session_attrs with possible 2 values "read-write", "any".

I didn't hear any objections to this approach and, after some thought,
I think it's good.  So I've committed this after rewriting the
documentation, some cosmetic cleanup, and changing the logic so that
if one IP for a host turns out to be read-only, we skip all remaining
IPs for that host, not just the one that we tested already.  This
seems likely to be what users will want.

I recognize that this isn't going to be please everybody 100%, but
there's still room for followup patches to improve things further.
One thing I really like about the target_session_attrs renaming is
that it seems to leave the door open to a variety of things we might
want to do later.  The "read-write" value we now support is closely
related to the query we use for testing ("show
transaction_read_only").  If we later want to filter based on some
other server property, we can add additional values with associated
SQL commands for each.  Of course that could get a little crazy if
overdone, but hopefully we won't overdo it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Kuntal Ghosh
Date:
On Thu, Nov 24, 2016 at 5:46 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I have taken this suggestion now renamed target_server_type to
> target_session_attrs with possible 2 values "read-write", "any".
> May be we could expand to "readonly" and "prefer-readonly" in next patch
> proposal. Attaching the patch for same.
I was doing some testing with the patch and I found some inconsistency
in the error message.
I've a read-only server running on port 5433 and no server on 5436 and 5438.

command: bin/psql
'postgresql://localhost:5436,localhost:5433,localhost:5438/postgres?target_session_attrs=read-write'

I get the following error message.

psql: could not make a writable connection to server "localhost:5433"
could not connect to server: Connection refused   Is the server running on host "localhost" (::1) and accepting
TCP/IPconnections on port 5438?
 
could not connect to server: Connection refused   Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IPconnections on port 5438?
 

It didn't show any error message for port 5436. But, if I modify the
connection string as following:

command: bin/psql
'postgresql://localhost:5433,localhost:5436,localhost:5438/postgres?target_session_attrs=read-write'

I get the following error message:

psql: could not make a writable connection to server "localhost:5433"
could not connect to server: Connection refused   Is the server running on host "localhost" (::1) and accepting
TCP/IPconnections on port 5436?
 
could not connect to server: Connection refused   Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IPconnections on port 5436?
 
could not connect to server: Connection refused   Is the server running on host "localhost" (::1) and accepting
TCP/IPconnections on port 5438?
 
could not connect to server: Connection refused   Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IPconnections on port 5438?
 

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 29, 2016 at 2:19 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> On Thu, Nov 24, 2016 at 5:46 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> I have taken this suggestion now renamed target_server_type to
>> target_session_attrs with possible 2 values "read-write", "any".
>> May be we could expand to "readonly" and "prefer-readonly" in next patch
>> proposal. Attaching the patch for same.
> I was doing some testing with the patch and I found some inconsistency
> in the error message.
> I've a read-only server running on port 5433 and no server on 5436 and 5438.
>
> command: bin/psql
> 'postgresql://localhost:5436,localhost:5433,localhost:5438/postgres?target_session_attrs=read-write'
>
> I get the following error message.
>
> psql: could not make a writable connection to server "localhost:5433"
> could not connect to server: Connection refused
>     Is the server running on host "localhost" (::1) and accepting
>     TCP/IP connections on port 5438?
> could not connect to server: Connection refused
>     Is the server running on host "localhost" (127.0.0.1) and accepting
>     TCP/IP connections on port 5438?
>
> It didn't show any error message for port 5436. But, if I modify the
> connection string as following:
>
> command: bin/psql
> 'postgresql://localhost:5433,localhost:5436,localhost:5438/postgres?target_session_attrs=read-write'
>
> I get the following error message:
>
> psql: could not make a writable connection to server "localhost:5433"
> could not connect to server: Connection refused
>     Is the server running on host "localhost" (::1) and accepting
>     TCP/IP connections on port 5436?
> could not connect to server: Connection refused
>     Is the server running on host "localhost" (127.0.0.1) and accepting
>     TCP/IP connections on port 5436?
> could not connect to server: Connection refused
>     Is the server running on host "localhost" (::1) and accepting
>     TCP/IP connections on port 5438?
> could not connect to server: Connection refused
>     Is the server running on host "localhost" (127.0.0.1) and accepting
>     TCP/IP connections on port 5438?

Hmm, maybe the query buffer is getting cleared someplace in there.  We
might need to save/restore it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Tue, Nov 29, 2016 at 3:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Nov 29, 2016 at 2:19 PM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
>> On Thu, Nov 24, 2016 at 5:46 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>> I have taken this suggestion now renamed target_server_type to
>>> target_session_attrs with possible 2 values "read-write", "any".
>>> May be we could expand to "readonly" and "prefer-readonly" in next patch
>>> proposal. Attaching the patch for same.
>> I was doing some testing with the patch and I found some inconsistency
>> in the error message.
>> I've a read-only server running on port 5433 and no server on 5436 and 5438.
>>
>> command: bin/psql
>> 'postgresql://localhost:5436,localhost:5433,localhost:5438/postgres?target_session_attrs=read-write'
>>
>> I get the following error message.
>>
>> psql: could not make a writable connection to server "localhost:5433"
>> could not connect to server: Connection refused
>>     Is the server running on host "localhost" (::1) and accepting
>>     TCP/IP connections on port 5438?
>> could not connect to server: Connection refused
>>     Is the server running on host "localhost" (127.0.0.1) and accepting
>>     TCP/IP connections on port 5438?
>>
>> It didn't show any error message for port 5436. But, if I modify the
>> connection string as following:
>>
>> command: bin/psql
>> 'postgresql://localhost:5433,localhost:5436,localhost:5438/postgres?target_session_attrs=read-write'
>>
>> I get the following error message:
>>
>> psql: could not make a writable connection to server "localhost:5433"
>> could not connect to server: Connection refused
>>     Is the server running on host "localhost" (::1) and accepting
>>     TCP/IP connections on port 5436?
>> could not connect to server: Connection refused
>>     Is the server running on host "localhost" (127.0.0.1) and accepting
>>     TCP/IP connections on port 5436?
>> could not connect to server: Connection refused
>>     Is the server running on host "localhost" (::1) and accepting
>>     TCP/IP connections on port 5438?
>> could not connect to server: Connection refused
>>     Is the server running on host "localhost" (127.0.0.1) and accepting
>>     TCP/IP connections on port 5438?
>
> Hmm, maybe the query buffer is getting cleared someplace in there.  We
> might need to save/restore it.

Not the query buffer.  conn->errorMessage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Wed, Nov 30, 2016 at 1:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
 >On Tue, Nov 29, 2016 at 2:19 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
 >> I was doing some testing with the patch and I found some inconsistency
 >> in the error message.

 > Hmm, maybe the query buffer is getting cleared someplace in there.  We
 > might need to save/restore it.
 
 Thanks, send query resets the errorMessage. Will fix same.

 PQsendQuery()->PQsendQueryStart()->resetPQExpBuffer(&conn->errorMessage);


Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Wed, Nov 30, 2016 at 7:14 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks, send query resets the errorMessage. Will fix same.
PQsendQuery()->PQsendQueryStart()->resetPQExpBuffer(&conn->errorMessage);

Please find the patch which fixes this bug. conn->errorMessage as per definition only stores current error message, a new member
conn->savedMessage is introduced to save and restore error messages related to previous hosts which we failed to connect.

--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Thu, Dec 1, 2016 at 5:00 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Wed, Nov 30, 2016 at 7:14 AM, Mithun Cy <mithun.cy@enterprisedb.com>
> wrote:
>> Thanks, send query resets the errorMessage. Will fix same.
>> PQsendQuery()->PQsendQueryStart()->resetPQExpBuffer(&conn->errorMessage);
>
> Please find the patch which fixes this bug. conn->errorMessage as per
> definition only stores current error message, a new member
> conn->savedMessage is introduced to save and restore error messages related
> to previous hosts which we failed to connect.

Couldn't this just be a variable in PQconnectPoll(), instead of adding
a new structure member?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:

On Fri, Dec 2, 2016 at 8:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Couldn't this just be a variable in PQconnectPoll(), instead of adding
> a new structure member?

I have fixed same with a local variable in PQconnectPoll, Initally I thought savedMessage should have same visiblitly as errorMessage so we can restore the errorMessage even outside PQconnectPoll. But that seems not required. Attacting the new patch which fixes the same.

--
Thanks and Regards
Mithun C Y

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Sun, Dec 4, 2016 at 11:00 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Fri, Dec 2, 2016 at 8:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Couldn't this just be a variable in PQconnectPoll(), instead of adding
>> a new structure member?
>
> I have fixed same with a local variable in PQconnectPoll, Initally I thought
> savedMessage should have same visiblitly as errorMessage so we can restore
> the errorMessage even outside PQconnectPoll. But that seems not required.
> Attacting the new patch which fixes the same.

I think that you need a restoreErrorMessage call here:

                        /* Skip any remaining addresses for this host. */
                        conn->addr_cur = NULL;
                        if (conn->whichhost + 1 < conn->nconnhost)
                        {
                            conn->status = CONNECTION_NEEDED;
                            restoreErrorMessage(conn, &savedMessage);
                            goto keep_going;
                        }

Updated patch attached with that change, the removal of a useless
hunk, and the removal of "inline" from the new functions, which seems
like second-guessing of whatever decision the compiler chooses to
make.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Patch: Implement failover on libpq connect level.

From
Gavin Flower
Date:
On 05/12/16 17:00, Mithun Cy wrote:
[...]
> errorMessage even outside PQconnectPoll. But that seems not required. 
> Attacting the new patch which fixes the same.
>
> -- 
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
[...]

Is that meant to be 'attaching' or 'attacking'???
<ducks, and runs away VERY quickly>


MORE SERIOUSLY
thanks for your and others efforts for making pg great!


Cheers,
Gavin



Re: Patch: Implement failover on libpq connect level.

From
Mithun Cy
Date:
On Mon, Dec 5, 2016 at 11:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>I think that you need a restoreErrorMessage call here:
>                        /* Skip any remaining addresses for this host. */
>                        conn->addr_cur = NULL;
>                        if (conn->whichhost + 1 < conn->nconnhost)
>                        {
>                            conn->status = CONNECTION_NEEDED
>                            restoreErrorMessage(conn, &savedMessage);
>                            goto keep_going;
>                        }

Right after seeing transaction is read-only we have restored the saved message so I think we do not need one more restore there.
  if (strncmp(val, "on", 2) == 0)
  {
  PQclear(res);
+ restoreErrorMessage(conn, &savedMessage);

--
Thanks and Regards
Mithun C Y

Re: Patch: Implement failover on libpq connect level.

From
Robert Haas
Date:
On Mon, Dec 5, 2016 at 1:59 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Mon, Dec 5, 2016 at 11:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>I think that you need a restoreErrorMessage call here:
>>                        /* Skip any remaining addresses for this host. */
>>                        conn->addr_cur = NULL;
>>                        if (conn->whichhost + 1 < conn->nconnhost)
>>                        {
>>                            conn->status = CONNECTION_NEEDED
>>                            restoreErrorMessage(conn, &savedMessage);
>>                            goto keep_going;
>>                        }
>
> Right after seeing transaction is read-only we have restored the saved
> message so I think we do not need one more restore there.
>   if (strncmp(val, "on", 2) == 0)
>   {
>   PQclear(res);
> + restoreErrorMessage(conn, &savedMessage);

D'oh!  You're correct, of course.

Committed without that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company