Thread: Proposal: Implement failover on libpq connect level.
Rationale
=========

Since the introduction of WAL-based replication into PostgreSQL, it has been possible to create high-availability and load-balancing clusters.

However, there is no support for failover in the client libraries, so the only way to provide failover that is transparent to the client application is IP address migration. This approach has limitations: it requires that the master and backup servers reside in the same subnet, or it may not be feasible for other reasons.

Commercial RDBMS, such as Oracle, employ a more flexible approach. They allow multiple servers to be specified in the connect string, so if the primary server is not available, the client library tries to connect to the other ones.

This approach allows geographically distributed failover clusters and is also a cheap way to implement load balancing (which is not possible with IP address migration).

Proposed change
===============

Allow multiple hosts to be specified in the libpq connect string. Make libpq attempt to connect to all hosts simultaneously, or in random order, and use the server which successfully establishes a connection first.

Syntax
------

A libpq connect string can be either a set of keyword=value pairs or a URL.

In the first form, the keyword host could simply be allowed to appear multiple times:

"host=main-server host=standby1 host=standby2 port=5432 dbname=database"

In the second form, the host can be specified either in the authority part of the URL or in the query parameters:

postgresql://user@host/database
postgresql:///database?host=hostname&user=username

If the host is specified as a query parameter, it is also possible to allow multiple host parameters without breaking the existing syntax:

postgresql:///database?host=main-server&host=standby1&host=standby2

In order to support load-balancing clusters, two additional parameters should be added: readonly=boolean and loadbalancing=boolean.

Support for these syntax extensions is added to the PQconnectdb, PQconnectdbParams, PQconnectStart and PQconnectStartParams functions, but not to PQsetdb/PQsetdblogin.

Behavior
--------

If PQconnectdb encounters a connect string with multiple hosts specified, it attempts to establish a connection to all of these hosts simultaneously and begins to work with the server which responds first, unless the loadbalancing parameter is true.

If the loadbalancing parameter is true, it tries the servers sequentially in random order.

If the readonly parameter is false, then after authenticating with a server it executes "show transaction_read_only" to find out whether the current connection is to the master or to a hot standby, and the connection is considered successful only if the server allows read-write transactions.

This allows a mix of clients which write to the database and clients which perform read-only access. Read-only clients would be load-balanced between the master and the standby servers, and read-write clients would connect only to the master (whichever server has this role at the moment of connection).

Information about the alternate servers should be stored in the PGconn structure. The function PQreset should be able to take advantage of the new syntax and possibly open a connection to the new master, if a failover occurred during the lifetime of the connection.

Possible drawbacks
==================

Compatibility
-------------

The proposed patch requires no modifications to the server or protocol, and the modification of the synchronous functions (PQconnectdb, PQconnectdbParams) doesn't introduce incompatible changes to the client library.
Even if a connect string with multiple hosts is erroneously used with a version of libpq which does not support this feature, it is not an error: libpq simply uses the last host specified in the connect string.

There could be some compatibility problems with asynchronous connections created with the PQconnectStart functions. The problem is that we are trying to establish several connections at once, so there are several sockets which should be integrated into the application event loop. Even if we try the servers in some particular order (such as a randomized order during load balancing), the file descriptor of the socket can change during execution of PQconnectPoll, and existing applications are not prepared for that.

Performance impact
------------------

The performance impact seems to be negligible.

1. If the connect string contains only one host, the only complication is the maintenance of the data structure which can possibly hold more than one host name. The connection process itself would not be affected.

2. If there is a pure high-availability cluster, i.e. the standby servers do not accept client connections on the specified port, there is no extra load on the standby servers and almost none on the client (only several unsuccessful connect calls).

3. If there is a load-balancing cluster, there is no performance impact for read-only clients, but each read-write client causes the standby servers to process an extra connection up to the point where the server can report the read-only state of the transaction (i.e. including the SSL handshake and PostgreSQL authentication).

Typically, in situations where read-only clients should be load-balanced using this feature, there are many more read-only clients than read-write ones, so the extra load related to read-write connections seems justified by the simplification of client configuration.

--
Victor Wagner <vitus@wagner.pp.ru>
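For concreteness, here is a minimal client-side sketch of the behaviour the proposal describes, written against the existing libpq API (PQconnectdb, PQexec, PQstatus). The host names, port, database name and connect_timeout value are placeholders, and the sequential loop is only an approximation of the proposed simultaneous-connect behaviour; nothing here is new libpq functionality.

/* Sketch: client-side failover over a fixed host list using existing libpq.
 * The candidate hosts and the read-write check mirror the proposal above;
 * they are illustrative placeholders, not an existing libpq feature.
 */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <libpq-fe.h>

static PGconn *
connect_first_writable(const char *hosts[], int nhosts)
{
    for (int i = 0; i < nhosts; i++)
    {
        char conninfo[256];

        snprintf(conninfo, sizeof(conninfo),
                 "host=%s port=5432 dbname=database connect_timeout=5",
                 hosts[i]);

        PGconn *conn = PQconnectdb(conninfo);
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s: %s", hosts[i], PQerrorMessage(conn));
            PQfinish(conn);
            continue;
        }

        /* Equivalent of the proposed readonly=false check */
        PGresult *res = PQexec(conn, "show transaction_read_only");
        bool writable = (PQresultStatus(res) == PGRES_TUPLES_OK &&
                         strcmp(PQgetvalue(res, 0, 0), "off") == 0);
        PQclear(res);

        if (writable)
            return conn;        /* this host is the current master */

        PQfinish(conn);         /* hot standby: keep looking */
    }
    return NULL;
}

A caller might invoke it as connect_first_writable((const char *[]){"main-server", "standby1", "standby2"}, 3); the proposal would effectively fold this loop (or a simultaneous variant of it) into PQconnectdb itself.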
Victor Wagner wrote:
> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
>
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
>
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).

I wonder how useful this is at the present time.

If the primary goes down and the client gets connected to the standby, it would have read-only access there. Most applications wouldn't cope well with that.

Once we have multi-master replication that can be used for fail-over, the picture will change. Then a feature like that would be very useful indeed.

> "host=main-server host=standby1 host=standby2 port=5432 dbname=database"

It seems a bit arbitrary to require that all servers use the same port.

Maybe parameters like host2, port2, host3, port3 etc. might be better.

Yours,
Laurenz Albe
Re: Proposal: Implement failover on libpq connect level.
> On 18 Aug 2015, at 10:32, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>
> Victor Wagner wrote:
>> Rationale
>> =========
>>
>> Since introduction of the WAL-based replication into the PostgreSQL, it is
>> possible to create high-availability and load-balancing clusters.
>>
>> However, there is no support for failover in the client libraries. So, only
>> way to provide transparent for client application failover is IP address
>> migration. This approach has some limitation, i.e. it requires that
>> master and backup servers reside in the same subnet or may not be
>> feasible for other reasons.
>>
>> Commercial RDBMS, such as Oracle, employ more flexible approach. They
>> allow to specify multiple servers in the connect string, so if primary
>> server is not available, client library tries to connect to other ones.
>>
>> This approach allows to use geographically distributed failover clusters
>> and also is a cheap way to implement load-balancing (which is not
>> possible with IP address migration).
>
> I wonder how useful this is at the present time.
>
> If the primary goes down and the client gets connected to the standby,
> it would have read-only access there. Most applications wouldn't cope
> well with that.
>
> Once we have multi-master replication that can be used for fail-over,
> the picture will change. Then a feature like that would be very useful indeed.
>
>> "host=main-server host=standby1 host=standby2 port=5432 dbname=database"
>
> It seems a bit arbitrary to require that all servers use the same port.
>
> Maybe parameters like host2, port2, host3, port3 etc. might be better.
>
> Yours,
> Laurenz Albe

I totally agree with Laurenz.

In addition to that you have the “problem” of transactions: if you fail over in the middle of a transaction, strange things might happen from the application point of view.

The good thing, however, is that stupid middleware is sometimes not able to handle failed connections. However, overall I think it is more of a danger than a benefit.

regards,
hans
Hans-Jürgen Schönig wrote:
> in addition to that you have the “problem” of transactions. if you failover in the middle
> of a transaction, strange things might happen from the application point of view.
>
> the good thing, however, is that stupid middleware is sometimes not able to handle
> failed connections. however, overall i think it is more of a danger than a benefit.

Maybe I misunderstood the original proposal, but my impression was that the alternative servers would be tried only at the time the connection is established, and there would be no such problems as you describe.

Those could only happen if libpq automatically tried to reconnect upon failure without the client noticing. So the stupid middleware would get an error message, but the reconnect would actually work.

Yours,
Laurenz Albe
Re: Proposal: Implement failover on libpq connect level.
> On 18 Aug 2015, at 11:19, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>
> Hans-Jürgen Schönig wrote:
>> in addition to that you have the “problem” of transactions. if you failover in the middle
>> of a transaction, strange things might happen from the application point of view.
>>
>> the good thing, however, is that stupid middleware is sometimes not able to handle
>> failed connections. however, overall i think it is more of a danger than a benefit.
>
> Maybe I misunderstood the original proposal, but my impression was that the alternative
> servers would be tried only at the time the connection is established, and there would be no
> such problems as you describe.

It would still leave the problem of having a read-only server on the other side unless you are using BDR or so.

regards,
hans
On Tue, Aug 18, 2015 at 6:07 AM, PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at> wrote:
>> On 18 Aug 2015, at 11:19, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>>
>> Hans-Jürgen Schönig wrote:
>>> in addition to that you have the “problem” of transactions. if you failover in the middle
>>> of a transaction, strange things might happen from the application point of view.
>>>
>>> the good thing, however, is that stupid middleware is sometimes not able to handle
>>> failed connections. however, overall i think it is more of a danger than a benefit.
>>
>> Maybe I misunderstood the original proposal, but my impression was that the alternative
>> servers would be tried only at the time the connection is established, and there would be no
>> such problems as you describe.
>
> it would still leave the problem of having a read only on the other side unless you are using BDR or so.

That doesn't make this a bad idea. Some people are using replication solutions that can cope with this already (EDB has a proprietary product, and I'm sure there are people using BDR, too) and, as the solutions get better and more widely deployed, more people will want to do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 17 August 2015 at 23:18, Victor Wagner <vitus@wagner.pp.ru> wrote:
>
> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.

This is not completely true; you can always use something like pgbouncer or other middleware to change the server to which clients connect. You still need to solve the fact that you will have a read-only server at the other side.

Something like repmgr + pgbouncer will work fine.

I agree that once/if we ever have multimaster included then this could be a good idea.

--
Jaime Casanova                www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
On Tue, Aug 18, 2015 at 12:53 PM, Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote:
> This is not completely true, you can always use something like
> pgbouncer or other middleware to change the server to which clients
> connect. you still need to solve the fact that you will have a
> read-only server at the other side.
>
> something like repmgr + pgbouncer will work fine.

Sure, but pgbouncer is an extra hop, and has its own foibles. There's real appeal to doing this in the client.

> i agree that once/if we ever have multimaster included then this could
> be a good idea

I think it has a lot of appeal *now*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> Behavior
> --------
>
> If PQconnectdb encounters a connect string with multiple hosts specified,
> it attempts to establish a connection to all of these hosts simultaneously
> and begins to work with the server which responds first, unless the
> loadbalancing parameter is true.
>
> If the loadbalancing parameter is true, it tries the servers sequentially
> in random order.
>
> If the readonly parameter is false, then after authenticating with a server
> it executes "show transaction_read_only" to find out whether the current
> connection is to the master or to a hot standby, and the connection is
> considered successful only if the server allows read-write transactions.
I wonder how the extended protocol is handled by this proposal. Suppose load balancing mode is enabled. PQprepare is executed on standby1. Then PQexecPrepared gets called. This may be executed on standby2, which will fail because there's no prepared statement created by the former PQprepare call.

Even if the simple protocol is used, the same thing can be said of SQL PREPARE/EXECUTE/DEALLOCATE. SQL BEGIN/COMMIT/ROLLBACK would be an even more interesting example in load balancing mode.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On 2015.08.18 at 08:32:28 +0000, Albe Laurenz wrote:

> I wonder how useful this is at the present time.
>
> If the primary goes down and the client gets connected to the standby,
> it would have read-only access there. Most applications wouldn't cope
> well with that.

It is supposed that somebody (either the system administrator or some cluster management software) has noticed the failure of the master and promoted one of the standbys to master.

So, the clients only have to find out which cluster node serves as the master right now.

The idea is that we don't need any extra administration actions such as IP migration to do it. Clients have a list of alternate servers and discover which one to work with by trial and error.

I consider the following situations in my proposal:

1. Warm standby - doesn't accept client connections at all unless promoted to master.

2. Hot standby - we have two types of clients: ones for which read-only access is sufficient, and others that need to connect only to the master. In this case the intention to write is explicitly stated in the connect string (readonly=false), and the connect procedure checks whether the node it tries to connect to allows writes.

It seems that most people discussing in this thread think in millisecond time intervals (failure and immediate reconnect). I was thinking about much longer time intervals: it would probably take seconds for the cluster management software to notice the server failure and promote a backup server to master, and it might be acceptable for an application to spend a minute or so trying to reconnect, but it would take hours to change the connect string on the clients - that would require a visit by a support engineer to each client terminal, if we are thinking of a distributed OLTP system such as a point-of-sale network with thick clients.

>> "host=main-server host=standby1 host=standby2 port=5432 dbname=database"
>
> It seems a bit arbitrary to require that all servers use the same port.
> Maybe parameters like host2, port2, host3, port3 etc. might be better.

I've thought about this approach. But the PostgreSQL administration guide insists that all servers in the cluster should have as identical a configuration as possible to simplify administration.

Moreover, I've seldom seen configurations where PostgreSQL accepts connections on a non-default port.

Using host1, host2 etc. would have unintended connotations, such as "this is the first, main server". I think that the client should treat all given servers as equal and let the cluster administration choose which one will accept connections.

--
Victor Wagner <vitus@wagner.pp.ru>
On 2015.08.19 at 08:28:32 +0530, Amit Kapila wrote:

> On Tue, Aug 18, 2015 at 9:48 AM, Victor Wagner <vitus@wagner.pp.ru> wrote:
> >
> > Behavior
> > --------
> >
> > If PQconnectdb encounters a connect string with multiple hosts specified,
> > it attempts to establish a connection to all of these hosts simultaneously
> > and begins to work with the server which responds first, unless the
> > loadbalancing parameter is true.
>
> I think here you are mixing the behaviour for a load-balancing solution and
> a failover solution. It seems to me that for a client-side failover solution
> (which is also known as Transparent Application Failover), the connection
> attempt to the second server should be done after the first connection is
> broken, as that provides more flexibility.

I think that the failover procedure should begin before the first connection is ever established.

When a client application starts, it has no way of knowing the current state of the server cluster - which of the servers is working as the master now. The application uses a connect string placed into its configuration file a long time ago, and changing this configuration might require special permissions which the user of the application doesn't have. But the user typically knows how to restart the application or reboot the terminal.

So, for spatially distributed networks with thick clients we can handle only initial connections, not connection resets. At the very least, the application author can always implement restoration of a connection as closing the old connection and establishing a new one.

So, when the application first establishes a connection, it has to be prepared to connect to any of the alternate hosts.

I don't think that making connections in sequential order provides much flexibility, but it can greatly increase startup time, because a connect to a host which is physically down fails only after a significant timeout. In the time the application waits for the first connect to fail, it could have completed session initialization with a working server several times over. Of course, connecting to the servers in sequential order is simpler to implement, and it allows even more mixing of load balancing with failover, because the code would be the same.

> Although both ideas (load balancing and failover) seem worth discussing,
> they are separate features and can be worked on separately. It will be
> easier to sort out the details as well that way.

Really, load balancing comes almost for free if we implement connecting to an alternate server for failover purposes. I'm not sure that in the case of hot standby, where only read-only transactions can be load-balanced, load balancing is very useful. I included it in the proposal only because it is very cheap to implement in this form.
On 2015.08.19 at 12:42:45 +0900, Tatsuo Ishii wrote:

> I wonder how extended protocol is handled by this proposal. Suppose
> load balacing mode is enabled. PQprepare is executed on standby1. Then
> PQexecPrepared gets called. This may be executed on standby2, which
> will fail because there's no prepared statement created by the former
> PQprepare call.

Here we are discussing load balancing on the client level, not on the statement level.

Suppose that we have 100 read-only clients and 3 standby servers + the master. If all clients specify all four servers in their connect strings and connect randomly to them, each server would have approximately 25 clients.

But once a connection is established, each client works with one server (at least until a communication failure occurs and it calls PQreset; in that case it has to re-prepare its statements anyway).
Victor Wagner wrote:
>> I wonder how useful this is at the present time.
>>
>> If the primary goes down and the client gets connected to the standby,
>> it would have read-only access there. Most applications wouldn't cope
>> well with that.

> It is supposed that somebody (either system administrator or some
> cluster management software) have noticed failure of master and promoted
> one of the standbys to master.
>
> So, clients have only to find out which cluster node serves as master
> just now.
>
> Idea is that we don't need any extra administration actions such as IP
> migration to do it. Clients have list of alternate servers and discover
> which one to work with by trial and error.

Yes, but that will only work reliably if the (read-only) standby does not allow connections before it is promoted.

> I consider in my proposal following situations:
>
> 1. Warm standby - doesn't accept client connection at all unless
> promoted to master.
>
> 2. Hot standby - we have two types of clients - one for which readonly
> access is sufficient, and other that need to connect only to master.
> In this case intention to write is explicitely stated in the connect
> string (readonly=false) and connect procedure would check if node it
> tries to connect allowed write.

I think that these are both valid use cases.

And as Robert said, there are people out there using BDR or other proprietary multi-master solutions, so there might well be an audience for this feature.

>>> "host=main-server host=standby1 host=standby2 port=5432 dbname=database"
>>
>> It seems a bit arbitrary to require that all servers use the same port.
>> Maybe parameters like host2, port2, host3, port3 etc. might be better.

> I've thought about this approach. But PostgreSQL administration guide
> insists that all servers in the cluster should have as identical
> configuration as possible to simplify administration.
>
> Moreover I've seldom have seen configurations where postgresql is
> accepting connection on non-default port.

We do it all the time.

> Using host1, host2 etc would have unintended connotations, such is "this
> is first, main server". I think that client should treat all given
> servers as equal and let cluster administration to choose which one
> would accept connection.

I don't think that my idea of "host2", "host3" is very appealing myself, but I still don't like your original proposal of having multiple "host" parameters.

One problem with that is that this syntax is already allowed, but your proposal would silently change the semantics. Today, if you have multiple "host" parameters, the last one wins. So with your modification in place, some connect strings that work today would start behaving in unexpected ways.

Maybe a better idea would be:

host=db1.myorg.com,db2.myorg.com port=5432,2345

Yours,
Laurenz Albe
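To make the pairing semantics of the proposed comma-separated lists explicit, here is a small illustrative sketch. The syntax itself is only a proposal in this thread, and the fall-back to a default port when the port list is shorter than the host list is an assumption, not something specified above.

/* Illustrative only: split the proposed comma-separated "host" and "port"
 * values into positional pairs. */
#include <stdio.h>
#include <string.h>

int
main(void)
{
    char hosts[] = "db1.myorg.com,db2.myorg.com";
    char ports[] = "5432,2345";
    char *hsave, *psave;
    char *h = strtok_r(hosts, ",", &hsave);
    char *p = strtok_r(ports, ",", &psave);

    while (h != NULL)
    {
        /* If the port list runs out, fall back to a default port -
         * one possible interpretation of the proposed syntax. */
        printf("candidate: host=%s port=%s\n", h, p ? p : "5432");
        h = strtok_r(NULL, ",", &hsave);
        if (p != NULL)
            p = strtok_r(NULL, ",", &psave);
    }
    return 0;
}

With the example string above this prints one host/port pair per candidate server, which is the semantics the proposal implies.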
On 2015.08.19 at 12:29:51 +0530, Amit Kapila wrote:

> > It seems that most people discussing in this thread think in millisecond
> > time intervals (failure and immediate reconnect).
>
> Why not have this as a separate parameter (*_timeout or something like
> that)?

Because it is not in the software configuration; it is in people's heads, or maybe in the organizational configuration of the environments we are talking about.

Each of us is imagining some use case for the discussed feature, and these cases are completely different and have different typical time intervals. I hadn't explicitly stated my use cases in the proposal, so people are thinking in terms of their own use cases, and this is very significant feedback for me.
Re: Proposal: Implement failover on libpq connect level.
On 2015.08.19 at 07:15:30 +0000, Albe Laurenz wrote:

> > Idea is that we don't need any extra administration actions such as IP
> > migration to do it. Clients have list of alternate servers and discover
> > which one to work with by trial and error.
>
> Yes, but that will only work reliably if the (read-only) standby does not
> allow connections before it is promoted.

It would just take a bit more time for the client and a bit more load for the server: we make sure that the connection is read-write by issuing a

show transaction_read_only

statement before considering the connection useful.

> And as Robert said, there are people out using BDR or other proprietary
> multi-master solutions, so there might well be an audience for this feature.

Unfortunately I have no experience with such solutions, so I'd greatly appreciate feedback from those people. I've modelled my proposal after another proprietary solution - Oracle RAC.

> One problem with that is that this syntax is already allowed, but
> your proposal would silently change the semantics.
> Today, if you have multiple "host" parameters, the last one wins.
> So with your modification in place, some connect strings that work today
> would start behaving in unexpected ways.

This is a serious argument. But what is the use case of these connect strings now? It seems to me that in most cases the last host in the connect string would be the only host which accepts connections, so it wins anyway.

> Maybe a better idea would be:
> host=db1.myorg.com,db2.myorg.com port=5432,2345

I've tried not to introduce new delimiters, but this syntax definitely has some advantages. At least it allows host-port pairs to be specified as two parallel lists.

Another idea is to allow a host-port pair as the argument of the host parameter:

host=db1.myorg.com:5432

This is consistent with URL syntax and system administrators are used to it. With a long list of hosts there is also less chance of making an error, as a host and its corresponding port come together.

But your variant allows the hostaddr parameter to be handled the same way as host and port.

--
Victor Wagner <vitus@wagner.pp.ru>
On 2015.08.19 at 12:55:15 +0530, Amit Kapila wrote:

> > I think that failover procedure should begin before first connection is
> > ever established.
>
> As far as I understand, failover gets initiated once the master server goes
> down or is not accessible due to some reason, so for such cases if you
> have the connection to both the servers then it might not work.

The master server might go down while the client is not even started yet, and when the client starts up, it has to find out which server to connect to now.

Consider point-of-sale terminals, bank offices or anything else that does not work around the clock. A clerk comes to her workplace in the morning, switches on the terminal and inserts her smartcard to authorize with the server. She doesn't need to know what the server name is or where it is located. Either the application finds the server automatically, or a support engineer has to be called to fix things.

Moreover, in some situations restarting the application (or even the client terminal) is an acceptable price for failover, as long as there is no need to manually fix the configuration.
Victor Wagner wrote:
> > > Idea is that we don't need any extra administration actions such as IP
> > > migration to do it. Clients have list of alternate servers and discover
> > > which one to work with by trial and error.
> >
> > Yes, but that will only work reliably if the (read-only) standby does not
> > allow connections before it is promoted.
>
> It would just take a bit more time for client and a bit more load for
> server - to make sure that this connection is read-write by
> issuing
>
> show transaction_read_only
>
> statement before considering connection useful.

That's not very comfortable, and a lot of middleware software won't easily learn the trick.

But even without that use case I think that the feature is probably worth the effort.

[about having multiple "host" parameters in the connection string]
> > One problem with that is that this syntax is already allowed, but
> > your proposal would silently change the semantics.
> > Today, if you have multiple "host" parameters, the last one wins.
> > So with your modification in place, some connect strings that work today
> > would start behaving in unexpected ways.
>
> This is serious argument. But what the use case of these connect strings
> now?
>
> It seems to me that in most cases last host in the connect string would
> be only host which accepts connections, so it wins anyway

I'm not saying that it is particularly wide-spread and useful; it could happen through careless editing of connection strings or by using a connection service file entry (http://www.postgresql.org/docs/current/static/libpq-pgservice.html) and overriding the host parameter on the command line.

> > Maybe a better idea would be:
> > host=db1.myorg.com,db2.myorg.com port=5432,2345
>
> I've tried not to introduce new delimiters. But this syntax definitely
> have some advantages. At least it allows to specify host-port pairs as
> two parallel lists.
>
> Other idea - allow to specify host-port pair as argument of host
> parameter.
>
> host=db1.myorg.com:5432
>
> It is consistent with URL syntax and system administrators are used to
> it. And with long list of hosts there is less chances to made an error
> as host and corresponding port come together.

I don't think that is very attractive as it confuses the distinction between "host" and "port". What would you do with

host=db1.myorg.com:2345 port=1234

Yours,
Laurenz Albe
On 2015.08.19 at 09:21:50 +0000, Albe Laurenz wrote:

> > > Yes, but that will only work reliably if the (read-only) standby does not
> > > allow connections before it is promoted.
> >
> > It would just take a bit more time for client and a bit more load for
> > server - to make sure that this connection is read-write by
> > issuing
> >
> > show transaction_read_only
> >
> > statement before considering connection useful.
>
> That's not very comfortable, and a lot of middleware software won't easily
> learn the trick.

It shouldn't be left to the middleware. It should be hidden inside PQconnectPoll. This function already handles a very complicated state-transition diagram, including authentication, SSL negotiation and so on. If we just add a couple of new states such as CONNECTION_ASK_RW_STATUS and CONNECTION_RW_STATUS_OK, it should be fine.

The application would just call PQconnectPoll repeatedly (either via PQconnectdb or explicitly when a readable/writable condition is detected on the socket integrated into the application event loop) until success or an unrecoverable error is reached. How many interactions with the server that takes is not the middleware's problem.

> > It seems to me that in most cases last host in the connect string would
> > be only host which accepts connections, so it wins anyway
>
> I'm not saying that it is particularly wide-spread and useful; it could
> happen through careless editing of connection strings or by using a
> connection service file entry
> (http://www.postgresql.org/docs/current/static/libpq-pgservice.html)
> and overriding the host parameter on the command line.

I don't think that host definitions from all possible sources (service file, environment, command line) should be collected together, although it is essential to be clear about when the host list is appended to and when it is replaced.

If we implement sequential trial of all hosts, then we could start with the last one, to provide compatibility with the existing behavior. In this case, if the last host is online, no change occurs.

Another idea is to enable multiple-host connections only if a special option (loadbalance or failover) is present in the connect string.

> > Other idea - allow to specify host-port pair as argument of host
> > parameter.
> >
> > host=db1.myorg.com:5432
> >
> > It is consistent with URL syntax and system administrators are used to
> > it. And with long list of hosts there is less chances to made an error
> > as host and corresponding port come together.
>
> I don't think that is very attactive as it confuses the distinction between
> "host" and "port". What would you do with
>
> host=db1.myorg.com:2345 port=1234

I don't think it adds any more confusion than the simultaneous existence of host and hostaddr, or the ability to specify host and port both in the host part of a URL and in the query parameters:

postgresql://user@host:5432/dbname?host=otherhost&port=2345

But really, you've convinced me that the syntax with two or three parallel lists (host, port and hostaddr) is the best, although we'll need to allow empty entries in the lists, such as

host=master.db.com,standby1.db.com,standby2.db.com port=,2345, hostaddr=,192.168.0.4,192.160.1.8

with the evident semantics: take the port, host and hostaddr elements with the same position, and if some of them are empty, behave as if only the host had been given and the corresponding parameter had not been specified at all.

--
Victor Wagner <vitus@wagner.pp.ru>
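For reference, the documented asynchronous connection loop Victor refers to looks roughly like the sketch below. PQconnectStart, PQconnectPoll and PQsocket are existing libpq calls; the extra internal states mentioned above (CONNECTION_ASK_RW_STATUS and so on) are hypothetical. The point is that a caller which already re-fetches the socket on every iteration, as the documentation requires, would not need to change:

/* Sketch of the standard asynchronous connection loop.  Only existing
 * libpq API is used here; note that the socket is re-fetched on every
 * iteration, since it may change between PQconnectPoll calls.
 */
#include <sys/select.h>
#include <stdio.h>
#include <libpq-fe.h>

static PGconn *
connect_async(const char *conninfo)
{
    PGconn *conn = PQconnectStart(conninfo);
    if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
        return conn;

    PostgresPollingStatusType st = PGRES_POLLING_WRITING;

    while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
    {
        int sock = PQsocket(conn);      /* may differ from previous rounds */
        fd_set rset, wset;

        FD_ZERO(&rset);
        FD_ZERO(&wset);
        if (st == PGRES_POLLING_READING)
            FD_SET(sock, &rset);
        else
            FD_SET(sock, &wset);

        if (select(sock + 1, &rset, &wset, NULL, NULL) < 0)
            break;                      /* signal or select() error */

        st = PQconnectPoll(conn);
    }

    if (st != PGRES_POLLING_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    return conn;
}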
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Victor Wagner wrote:
>> It would just take a bit more time for client and a bit more load for
>> server - to make sure that this connection is read-write by
>> issuing
>> show transaction_read_only
>> statement before considering connection useful.
>
> That's not very comfortable, and a lot of middleware software won't easily
> learn the trick.

That sort-of ties into what seems to me the main objection to this proposal, namely that there is already a way to do this sort of thing: DNS-based load balancing. All the clients think they connect to db.mycompany.com, but which server they actually get is determined by what IP address the DNS server tells them to use. This is a technology that is very well established, known to every large-site admin, and usable for every Internet-based service. Even if libpq had its own nonstandard way of doing something similar, the site admins would probably still need to use DNS load balancing for other services. In fact, they'd still need to use DNS balancing for Postgres, because not everything connects with libpq (think JDBC for instance).

So I think we ought to reject this proposal, full stop. I see no reason to re-invent this wheel, and there are good reasons not to.

regards, tom lane
On 2015-08-19 09:41:32 -0400, Tom Lane wrote:
> In fact, they'd still need to use DNS balancing for Postgres,
> because not everything connects with libpq (think JDBC for instance).

It already does support this though. https://jdbc.postgresql.org/documentation/head/connect.html :

> Connection Fail-over
>
> To support simple connection fail-over it is possible to define multiple
> endpoints (host and port pairs) in the connection url separated by
> commas. The driver will try to once connect to each of them in order
> until the connection succeeds. If none succeed, a normal connection
> exception is thrown.
>
> The syntax for the connection url is:
>
> jdbc:postgresql://host1:port1,host2:port2/database

> So I think we ought to reject this proposal, full stop. I see no
> reason to re-invent this wheel, and there are good reasons not to.

I don't really buy this argument. Allowing connections to several endpoints isn't exactly "new tech" either. A lot of database connectors do support something very close to the above pgjdbc feature.

Greetings,

Andres Freund
Re: Proposal: Implement failover on libpq connect level.
On 2015.08.19 at 15:35:17 +0100, Simon Riggs wrote:

> I think we do need some way of saying that a readonly connection is OK.

I had such a thing in my proposal (the boolean parameter readonly), but I haven't yet checked whether it is compatible with the jdbc syntax.

> So the default would be to connect to each in turn until we find the master.
> It should keep retrying for a period of time since for a short period it is
> possible there is no master. If you specify readonly, then a connection to

This is a very important addition - specifying that if no host is able to establish a read-write session, we should retry and give the server administration a chance to promote one of the standbys to master. Probably there should be an additional timeout parameter (we have connect_timeout, and this would be failover_timeout) with some reasonable default.
> Here we are discussing load-balancing on the client level, not on the
> statement level.

I see.

> Suppose that we have 100 readonly clients and 3 standby servers + master.
> If all clients specify all four servers in the their connect strings,
> and connect randomly to them, each server would have approximately 25
> clients.
>
> But once connection is established, each client works with one
> server (at least until communication failure occurs and it would call
> PQreset. In this case it has to reprepare statements anyway).

One downside of this is that if one of the standby servers is not responding, clients will be blocked waiting on that server every time before giving up the connection attempt. This could last for hours (for example, if the network cable is unplugged). I think round-robin DNS is better, because the DNS server will drop the entry corresponding to the broken server (or any solution which has a similar capability).

After all, this type of client-side solution is not very stable in a real-world environment IMO (I have heard the same opinion regarding HAProxy).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Wed, Aug 19, 2015 at 07:15:30AM +0000, Laurenz Albe wrote:
> Victor Wagner wrote:
> >> I wonder how useful this is at the present time.
>
> Maybe a better idea would be:
> host=db1.myorg.com,db2.myorg.com port=5432,2345

I think if we're going to provide multiple sets of connection info, we should just do that rather than trying to piece them together from constituent parts, where the former looks like:

host="service=foo sslmode=require",postgresql://bar.baz/mydb?sslmode=require,"host=quux.corge user=grault port=6433"

As far as I can tell, the only way a comma could sneak into these strings is if it were in a database name or similarly bizarre spot, in which case the usual quoting needed to handle it in general should handle it here.

It's not clear to me that libpq is the correct place to add this feature, as we have fairly large user bases--the JDBC world, for example--that don't use it.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2015.08.20 at 00:17:35 +0900, Tatsuo Ishii wrote:

> > But once connection is established, each client works with one
> > server (at least until communication failure occurs and it would call
> > PQreset. In this case it has to reprepare statements anyway).
>
> One downside of this is, if one of the standby servers is not
> responding, every time clients will be blocked by the server before
> giving up the connection trial. This could last for hours (for

This shouldn't happen. My proposal was to connect to all servers simultaneously, and then use the connection which is established first, closing the other ones.

Even if we don't do so (to properly randomize the server load, or to be compatible with jdbc), there is the connect_timeout parameter, so the client wouldn't sit and wait for hours while the system TCP/IP stack tries to connect to a nonexistent server.

> example, the network cable is plugged out). I think round robin DNS is
> better because the DNS server will drop the entry corresponding broken

A DNS server wouldn't drop anything unless explicitly told to (by an administrator or by some watchdog software which is able to speak the nsupdate protocol). And not every database owner has control over his own domain.

Moreover, DNS is a distributed system with aggressive caching. If our system is not local, the DNS records for the non-existent server would be cached by the DNS servers of the client's internet provider.
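As a small illustration of the bounded-timeout point, PQping (an existing libpq call) combined with connect_timeout gives a quick availability probe that fails after a few seconds instead of hanging; the host names below are placeholders, and whether such a probe is useful at all depends on the deployment:

/* Quick availability probe with a bounded timeout.  PQping and
 * connect_timeout are existing libpq features; the hosts are placeholders.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    const char *candidates[] = {
        "host=main-server port=5432 connect_timeout=3",
        "host=standby1 port=5432 connect_timeout=3",
    };

    for (int i = 0; i < 2; i++)
    {
        PGping p = PQping(candidates[i]);

        /* PQPING_OK: server reachable and accepting connections;
         * PQPING_NO_RESPONSE: no answer within the timeout. */
        printf("%s -> %s\n", candidates[i],
               p == PQPING_OK ? "up" :
               p == PQPING_REJECT ? "up but rejecting connections" :
               "no response");
    }
    return 0;
}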
Victor Wagner <vitus@wagner.pp.ru> writes:
> On 2015.08.20 at 00:17:35 +0900, Tatsuo Ishii wrote:
>> One downside of this is, if one of the standby servers is not
>> responding, every time clients will be blocked by the server before
>> giving up the connection trial. This could last for hours (for
>
> This shouldn't happen. My proposal was to connect all servers
> simultaneously, and then use that connection which would be established
> first closing other ones

That seems like seriously unfriendly behavior. It will trigger dropped-connection bleats in the server logs, not to mention wasted process forks.

regards, tom lane
> +1 for bringing the jdbc driver URI syntax into libpq, so that all interfaces
> can be optionally specified this way. This doesn't preclude the use of
> ipfailover, in fact it might work well together. If you don't like it, don't
> use it.

+1

Another thought: multiple hosts in the URI could be used in a simple configuration for read-only clients. I have run into customers who manage two connections per process - one to the master and one to one of several slaves.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That sort-of ties into what seems to me the main objection to this
> proposal, namely that there is already a way to do this sort of thing:
> DNS-based load balancing. All the clients think they connect to
> db.mycompany.com, but which server they actually get is determined by
> what IP address the DNS server tells them to use.

But that kinda sucks. I mean, suppose I have three servers, A, B, and C. I point db.mycompany.com to A, which is the master; then A dies. Under your proposal, whatever script I use to control failover now has to change the DNS records to repoint db.mycompany.com to B, my newly-promoted master. It's quite possible that some machines on the network, or some processes, will have the old IP address cached, and it may be several minutes before those caches time out. In the meantime, I'm down: even if I bounce the application servers, they may just try to reconnect to A.

Victor's proposal is far more convenient. When A goes offline, the servers automatically begin trying to connect to B and C. Let's suppose I use iptables or something like that to prevent connections to B and C as long as A is online. Or pg_hba.conf or whatever. But once I'm sure A is dead, I can promote B and reconfigure it to allow connections *and I'm done*. At most, I need to restart my application servers. I don't need access to the DNS server - which the guys in IT are unlikely to provide to a lowly DBA anyhow. I don't have to worry about stale caches. Everything just works.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That sort-of ties into what seems to me the main objection to this
> > proposal, namely that there is already a way to do this sort of thing:
> > DNS-based load balancing. All the clients think they connect to
> > db.mycompany.com, but which server they actually get is determined by
> > what IP address the DNS server tells them to use.
>
> But that kinda sucks. I mean, suppose I have three servers, A, B, and
> C. I point db.mycompany.com to A, which is the master; then A dies.
> Under your proposal, whatever script I use to control failover now has
> to change the DNS records to repoint db.mycompany.com to B, my new,
> and newly-promoted, new master. It's quite possible that some
> machines on the network, or some processes, will have the old IP
> address cached, and it may be several minutes before those caches time
> out. In the meantime, I'm down: even if I bounce the application
> servers, they may just try to reconnect to A.

The solution to this part seems to be to lower the TTL, which seems easy enough.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 19, 2015 at 11:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Always try with the first server specified in connection string and if that
> is not available try with second and so on. I think for the case of failover,
> the design shouldn't be much complicated and it is a standard thing provided
> by most of the client-side drivers in other databases. Considering what
> currently PostgreSQL offers in terms of high-availability functionality, for
> load-balancing, we need to be careful of many more things like redirecting
> read-queries to standby's, write statements should be executed via connection
> to master.

This can be really slow, though, if the master server is completely offline rather than actively rejecting connections.

Maybe a good idea would be to have a new connection string parameter, failover_time. If not specified or set to a negative value, we use only the last-specified host=X parameter, just as now. If set to zero, we try to connect to everything at once, as Victor proposes. If set to a value >0, we try to connect to the servers one after another, initiating each connection attempt after the number of milliseconds specified by the parameter. So then you can do

host=A host=B host=C failover_time=100

and this will launch a connection attempt against A immediately; if that hasn't completed within 100ms, then we'll also start one targeting server B, and then after another 100ms we'll try C. ISTM that this would cater to several different use cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
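A rough sketch of the staggered attempts such a failover_time parameter implies, driven from the client side with existing libpq asynchronous calls (PQconnectStart, PQconnectPoll, PQsocket); failover_time itself is hypothetical, and the time bookkeeping here is deliberately coarse:

/* Sketch: launch a new asynchronous attempt every 100ms and keep the first
 * one that completes.  Only existing libpq API is used; the parameter being
 * discussed would move this logic inside libpq.
 */
#include <poll.h>
#include <libpq-fe.h>

#define STAGGER_MS 100
#define MAX_HOSTS  16

PGconn *
connect_staggered(const char *conninfo[], int n)
{
    PGconn *conn[MAX_HOSTS] = {0};
    PostgresPollingStatusType st[MAX_HOSTS];
    int started = 0, elapsed = 0;

    if (n > MAX_HOSTS)
        n = MAX_HOSTS;

    for (;;)
    {
        /* Launch one more attempt every STAGGER_MS milliseconds. */
        if (started < n && elapsed >= started * STAGGER_MS)
        {
            conn[started] = PQconnectStart(conninfo[started]);
            if (conn[started] != NULL && PQstatus(conn[started]) == CONNECTION_BAD)
            {
                PQfinish(conn[started]);
                conn[started] = NULL;
            }
            st[started] = PGRES_POLLING_WRITING;
            started++;
        }

        struct pollfd pfd[MAX_HOSTS];
        for (int i = 0; i < started; i++)
        {
            pfd[i].fd = conn[i] ? PQsocket(conn[i]) : -1;   /* -1 is ignored */
            pfd[i].events = (st[i] == PGRES_POLLING_READING) ? POLLIN : POLLOUT;
        }

        poll(pfd, started, STAGGER_MS);
        elapsed += STAGGER_MS;          /* coarse bookkeeping is enough for a sketch */

        for (int i = 0; i < started; i++)
        {
            if (conn[i] == NULL ||
                !(pfd[i].revents & (POLLIN | POLLOUT | POLLERR | POLLHUP)))
                continue;

            st[i] = PQconnectPoll(conn[i]);
            if (st[i] == PGRES_POLLING_OK)
            {
                for (int j = 0; j < started; j++)   /* close the losing attempts */
                    if (j != i && conn[j])
                        PQfinish(conn[j]);
                return conn[i];
            }
            if (st[i] == PGRES_POLLING_FAILED)
            {
                PQfinish(conn[i]);
                conn[i] = NULL;
            }
        }

        if (started == n)               /* all attempts launched: any still alive? */
        {
            int alive = 0;
            for (int i = 0; i < n; i++)
                if (conn[i])
                    alive = 1;
            if (!alive)
                return NULL;
        }
    }
}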
Robert Haas wrote:
> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That sort-of ties into what seems to me the main objection to this
> > proposal, namely that there is already a way to do this sort of thing:
> > DNS-based load balancing. All the clients think they connect to
> > db.mycompany.com, but which server they actually get is determined by
> > what IP address the DNS server tells them to use.
>
> But that kinda sucks. I mean, suppose I have three servers, A, B, and
> C. I point db.mycompany.com to A, which is the master; then A dies.
> Under your proposal, whatever script I use to control failover now has
> to change the DNS records to repoint db.mycompany.com to B, my new,
> and newly-promoted, new master. It's quite possible that some
> machines on the network, or some processes, will have the old IP
> address cached, and it may be several minutes before those caches time
> out. In the meantime, I'm down: even if I bounce the application
> servers, they may just try to reconnect to A.
The solution to this part seems to be to lower the TTL, which seems
easy enough.
On Tue, Sep 1, 2015 at 1:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Wed, Aug 19, 2015 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > That sort-of ties into what seems to me the main objection to this
>> > proposal, namely that there is already a way to do this sort of thing:
>> > DNS-based load balancing. All the clients think they connect to
>> > db.mycompany.com, but which server they actually get is determined by
>> > what IP address the DNS server tells them to use.
>>
>> But that kinda sucks. I mean, suppose I have three servers, A, B, and
>> C. I point db.mycompany.com to A, which is the master; then A dies.
>> Under your proposal, whatever script I use to control failover now has
>> to change the DNS records to repoint db.mycompany.com to B, my new,
>> and newly-promoted, new master. It's quite possible that some
>> machines on the network, or some processes, will have the old IP
>> address cached, and it may be several minutes before those caches time
>> out. In the meantime, I'm down: even if I bounce the application
>> servers, they may just try to reconnect to A.
>
> The solution to this part seems to be to lower the TTL, which seems
> easy enough.

In theory, yeah. In practice, not all systems obey the TTL, and in my experience, that's actually a fairly common problem. Sometimes the TTL gets enforced separately at multiple levels, so that all of the old records don't go away for 2 or 3 times the TTL, or occasionally completely random intervals of time thoroughly unrelated to the TTL you configured. And that assumes that the guy who controls the DNS server is willing to configure a different TTL for you, which is not always the case.

It also assumes that guy is OK granting access to modify DNS records to an automated system running on the database server machines. That may be OK if the database server is THE ONE THING that needs treatment of this type, but if the company supports 50 or 100 services that all need failover handling, suddenly giving all of those things the ability to reconfigure the DNS server sounds like a pretty poor plan. Plus, there may be multiple copies of the DNS server in different geographies, all cloned from a master at the central office. When the central office dies, you lose not only the main database server but also the main DNS server. That's OK, because the backup DNS servers still have copies of all the data from the master ... but you can't make changes until the master is back up.

All of these problems can be solved if you're willing to put enough time and energy into it. For example, Akamai has (or had, at the time I worked there) a service that did very robust geographical load-balancing and failover. So you could, like, go buy that, and maybe it would solve your problem. By now, there are probably other companies offering similar services. I have no doubt that similar solutions can be crafted from purely open-source software, and there may very well be great tools available for this that weren't around the last time I worked as a network administrator. But I think it's quite wrong to assume that the infrastructure for this is available and usable everywhere, because in my experience, that's far from the case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
> But I think it's quite wrong to assume that the infrastructure for
> this is available and usable everywhere, because in my experience,
> that's far from the case.

Especially when the alternative is a rather short patch implementing an otherwise widely available feature.
>
> On Tue, Sep 1, 2015 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2015-09-01 14:07:19 -0400, Robert Haas wrote:
>> > But I think it's quite wrong to assume that the infrastructure for
>> > this is available and usable everywhere, because in my experience,
>> > that's far from the case.
>>
>> Especially when the alternative is a rather short patch implementing an
>> otherwise widely available feature.
>
>
> But that won't actually help in the case described by Robert: if the master server A failed, the client has no idea if B or C would become the new master.
>
> Unless it actually tries to connect them in turn and check for the result of pg_is_in_recovery(). I think that brings enough complexity for keeping this outside of libpq. Also think about all the extra flexibility people will likely want to have: number of retries, delay between retries, delay backoff, etc., to the point we'll have to support some sort of client code retry_policy_callback.
>
> For read-only clients you might want to include a number of slave hostnames, and let the connector choose one, but then again you can't achieve load-balancing on the client side, you're better off using round-robin DNS. To add or remove a slave you only need to update DNS, and not configuration on all the clients.
>
> For the master failover I think a common technique is to just move the floating IP address from the old master to the new one. This doesn't require touching the DNS record.
I <3
postgresql://postgres@localhost:7093/myfavedatabase
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On Wed, Sep 2, 2015 at 4:52 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > On Tue, Sep 1, 2015 at 8:12 PM, Andres Freund <andres@anarazel.de> wrote: >> >> On 2015-09-01 14:07:19 -0400, Robert Haas wrote: >> > But I think it's quite wrong to assume that the infrastructure for >> > this is available and usable everywhere, because in my experience, >> > that's far from the case. >> >> Especially when the alternative is a rather short patch implementing an >> otherwise widely available feature. > > But that won't actually help in the case described by Robert: if the master > server A failed, the client has no idea if B or C would become the new > master. Sure it does. You just need to ensure that whichever of those is the new master accepts connections, and the other one doesn't. There are lots of ways to do this; e.g. give the machine a second IP that accepts connections only when the machine is the designated master, and have read-write clients connect to that IP, and read-only clients connect to the machine's main IP. Andres's point is the same as mine: we ought to accept this feature, in some form, because it's really quite useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 3, 2015 at 4:00 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > I believe that having a floating IP for the master is much more practical > approach and it doesn't require any patch to libpq or modification of the > client connection settings. I think that's a great approach if all the machines are on the same subnet. If they are in different datacenters, it doesn't work. The amount of opposition to this feature is remarkable considering that it's available in Oracle, SQL Server, MongoDB, Cassandra, and MySQL. See for example: http://docs.mongodb.org/manual/reference/connection-string/ https://datastax.github.io/python-driver/getting_started.html This is a small patch with minimal to no downside implementing a feature that is present in most or all of the major competing products. We're really doing ourselves a disservice if we reject it. I think it would be far better to progress to talking about what design we'd be comfortable with, rather than kidding ourselves that a feature that everyone else has and which somebody has taken the time to implement (thus, obviously it has value for them) and which has been discussed to general approval at PGCon developer meetings and which has been endorsed on this thread by three committers is somehow something that nobody really needs. Seriously? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 3, 2015 at 3:56 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 3, 2015 at 4:00 AM, Shulgin, Oleksandr > <oleksandr.shulgin@zalando.de> wrote: >> I believe that having a floating IP for the master is much more practical >> approach and it doesn't require any patch to libpq or modification of the >> client connection settings. > > I think that's a great approach if all the machines are on the same > subnet. If they are in different datacenters, it doesn't work. > > The amount of opposition to this feature is remarkable considering > that it's available in Oracle, SQL Server, MongoDB, Cassandra, and > MySQL. See for example: > > http://docs.mongodb.org/manual/reference/connection-string/ > https://datastax.github.io/python-driver/getting_started.html > > This is a small patch with minimal to no downside implementing a > feature that is present in most or all of the major competing > products. We're really doing ourselves a disservice if we reject it. > I think it would be far better to progress to talking about what > design we'd be comfortable with, rather than kidding ourselves that a > feature that everyone else has and which somebody has taken the time > to implement (thus, obviously it has value for them) and which has > been discussed to general approval at PGCon developer meetings and > which has been endorsed on this thread by three committers is somehow > something that nobody really needs. Seriously? +100 -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Thu, Sep 3, 2015 at 4:00 AM, Shulgin, Oleksandr > <oleksandr.shulgin@zalando.de> wrote: > > I believe that having a floating IP for the master is much more practical > > approach and it doesn't require any patch to libpq or modification of the > > client connection settings. > > I think that's a great approach if all the machines are on the same > subnet. If they are in different datacenters, it doesn't work. Anycast could technically be used to address that issue, but there's a whole host of reasons why that would be quite painful for a PG connection. > I think it would be far better to progress to talking about what > design we'd be comfortable with, rather than kidding ourselves that a > feature that everyone else has and which somebody has taken the time > to implement (thus, obviously it has value for them) and which has > been discussed to general approval at PGCon developer meetings and > which has been endorsed on this thread by three committers is somehow > something that nobody really needs. Seriously? Agreed. For my part, I like the JDBC configuration approach and definitely would ask that we support 'host:port' options since not all servers will be on the same port. I don't agree with Tom's concern regarding the simultaneous connection to all servers at once (yes, it's a bit unfriendly, but I don't see that as a reason to not provide that choice and there's a lot of reasons why you'd want it). What would be nice is a better way to configure these more complicated options than the single string or even the current very simple pg_service.conf file. For example, a service name which could define *other* service names to try along with a plan for how to connect to them (round robin, simultaneously, read/write only, etc) and perhaps also support specifying multiple service names to the 'service' parameter. I'd prefer that we support all different configuration options through the 'single string' method also, but I'm not convinced that's a hard requirement. Thanks! Stephen
On Thu, Sep 3, 2015 at 11:42 AM, Stephen Frost <sfrost@snowman.net> wrote: >> > I believe that having a floating IP for the master is much more practical >> > approach and it doesn't require any patch to libpq or modification of the >> > client connection settings. >> >> I think that's a great approach if all the machines are on the same >> subnet. If they are in different datacenters, it doesn't work. > > Anycast could technically be used to address that issue, but there's a > whole host of reasons why that would be quite painful for a PG > connection. /me rolls eyes. >> I think it would be far better to progress to talking about what >> design we'd be comfortable with, rather than kidding ourselves that a >> feature that everyone else has and which somebody has taken the time >> to implement (thus, obviously it has value for them) and which has >> been discussed to general approval at PGCon developer meetings and >> which has been endorsed on this thread by three committers is somehow >> something that nobody really needs. Seriously? > > Agreed. For my part, I like the JDBC configuration approach and > definitely would ask that we support 'host:port' options since not all > servers will be on the same port. I don't agree with Tom's concern > regarding the simultaneous connection to all servers at once (yes, it's > a bit unfriendly, but I don't see that as a reason to not provide that > choice and there's a lot of reasons why you'd want it). Yep. And it can even be configurable behavior, as I suggested upthread. > What would be nice is a better way to configure these more complicated > options than the single string or even the current very simple > pg_service.conf file. For example, a service name which could define > *other* service names to try along with a plan for how to connect to > them (round robin, simultaneously, read/write only, etc) and perhaps > also support specifying multiple service names to the 'service' > parameter. I'd prefer that we support all different configuration > options through the 'single string' method also, but I'm not convinced > that's a hard requirement. Maybe someday we should have all that, but I think for right now that's complicating things unnecessarily. I think the best proposal so far is to allow the host=X option to be repeated multiple times. If you repeat the host=X option N times, you can also repeat the port=X option exactly N times, or else you can specify it just once. Done. Alternatively, leave the host=X option alone and add a new option hostlist=X, allowing a comma-separated list of names or IPs, with each hostname or IP allowed an optional :port suffix. If host=X parameter is omitted or the connection to that machine fails, try everybody in the hostlist concurrently, or with some configurable (and presumably short) delay between one and then next. Again, done. Alternatively, change the rules for parsing the existing host=X parameter so that we split it on some separator that isn't a valid hostname character, and then strip off an optional :port syntax from each entry; that value, if present, overrides port=X for that entry. I think we're really tying ourselves in knots about problems that really aren't very hard to solve here. I'm sure some of these proposals are better than others and the idea thing may be something else again. 
But if NASA can send a space probe 7.5 billion kilometers to a frigid spheroid in the outer solar system without crashing into anything or having any catastrophic software or hardware failures, I bet we can come up with a convenient way to specify multiple IP addresses. I'd like the story of this feature to resemble a work by e.e. cummings more than it does one by Robert Jordan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > Alternatively, change the rules for parsing the existing host=X > parameter so that we split it on some separator that isn't a valid > hostname character, and then strip off an optional :port syntax from > each entry; that value, if present, overrides port=X for that entry. : is not a valid character in hostnames, so I think this makes sense. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Shulgin, Oleksandr wrote: > > Alternatively, change the rules for parsing the existing host=X > > parameter so that we split it on some separator that isn't a valid > > hostname character, and then strip off an optional :port syntax from > > each entry; that value, if present, overrides port=X for that entry. > > It's tempting to use ':' as the separator here, but it's still valid for > directory names and host can be one in case of UN*X sockets. I think that's rare enough that we could just say that if you want to have a : in a directory name used for local connections, you have to escape the : character. This is going to be pretty easy to detect as a problem because of the obvious error message ("cannot parse "pg" in /usr/sockets:pg as a port number"), except in the even rarer case that the only stuff after the colon is digits. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
* Robert Haas (robertmhaas@gmail.com) wrote: > Alternatively, change the rules for parsing the existing host=X > parameter so that we split it on some separator that isn't a valid > hostname character, and then strip off an optional :port syntax from > each entry; that value, if present, overrides port=X for that entry. Using a ':' should work just fine for that. Having only one option for how all the connections are done (concurrently, round robin, etc) and an option for the timeout works for now. Thanks! Stephen
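For concreteness, here is a minimal standalone sketch (not taken from any patch; split_host_list and the fixed buffer sizes are invented for illustration) of splitting a comma-separated host list with optional :port suffixes, skipping values that start with a slash so Unix-socket directories keep their colons:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* One parsed entry: host name (or socket directory) plus its port. */
    struct host_entry
    {
        char host[256];
        int  port;
    };

    /*
     * Split "host1:5433,host2,/var/run/postgresql" into entries.  A trailing
     * ":NNNN" overrides default_port, but only when the value does not start
     * with '/'.  Bracketed IPv6 addresses like "[::1]:5432" happen to work;
     * bare IPv6 addresses would need extra care.
     */
    static int
    split_host_list(const char *list, int default_port,
                    struct host_entry *out, int max_entries)
    {
        int n = 0;

        while (*list && n < max_entries)
        {
            const char *comma = strchr(list, ',');
            size_t      len = comma ? (size_t) (comma - list) : strlen(list);
            char       *colon;

            if (len >= sizeof(out[n].host))
                len = sizeof(out[n].host) - 1;   /* sketch: silently truncate */
            memcpy(out[n].host, list, len);
            out[n].host[len] = '\0';
            out[n].port = default_port;

            /* A leading '/' means a socket directory: leave its colons alone. */
            if (out[n].host[0] != '/' &&
                (colon = strrchr(out[n].host, ':')) != NULL)
            {
                out[n].port = atoi(colon + 1);   /* real code must validate this */
                *colon = '\0';
            }
            n++;
            list = comma ? comma + 1 : list + strlen(list);
        }
        return n;
    }

    int
    main(void)
    {
        struct host_entry hosts[8];
        int i, n = split_host_list("main-server:5433,standby1,/var/run/postgresql",
                                   5432, hosts, 8);

        for (i = 0; i < n; i++)
            printf("host=%s port=%d\n", hosts[i].host, hosts[i].port);
        return 0;
    }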
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > Shulgin, Oleksandr wrote: > > > > Alternatively, change the rules for parsing the existing host=X > > > parameter so that we split it on some separator that isn't a valid > > > hostname character, and then strip off an optional :port syntax from > > > each entry; that value, if present, overrides port=X for that entry. > > > > It's tempting to use ':' as the separator here, but it's still valid for > > directory names and host can be one in case of UN*X sockets. > > I think that's rare enough that we could just say that if you want to > have a : in a directory name used for local connections, you have to > escape the : character. This is going to be pretty easy to detect as a > problem because of the obvious error message ("cannot parse "pg" in > /usr/sockets:pg as a port number"), except in the even rarer case that > the only stuff after the colon is digits. If we really want to worry about this, we could simply check if the directory exists with the ':5433' or whatever at the end and, if it does, use whatever the port specification is. If that directory doesn't exist, and one without the ':5433' does, then we try that directory and that port. Personally, I agree with Alvaro that it's really just overkill to worry about though. Thanks! Stephen
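Stephen's existence check could be sketched roughly like this, assuming we only treat a trailing ":NNNN" as a port when the literal path, colon and all, does not name an existing directory (colon_is_port_suffix is an invented name; number validation and error handling are omitted):

    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    /*
     * Decide whether a trailing ":NNNN" on a host value is part of a
     * socket-directory name or a port override.  If the full string names an
     * existing directory, take it literally; otherwise treat it as a port.
     */
    static bool
    colon_is_port_suffix(const char *value)
    {
        struct stat st;
        const char *colon = strrchr(value, ':');

        if (value[0] != '/')            /* not a Unix-socket directory at all */
            return colon != NULL;
        if (colon == NULL)
            return false;
        if (stat(value, &st) == 0 && S_ISDIR(st.st_mode))
            return false;               /* "/u/sockets:5433" really exists */
        return true;                    /* otherwise interpret ":5433" as a port */
    }

    int
    main(void)
    {
        printf("%d\n", colon_is_port_suffix("/var/run/postgresql:5433"));
        printf("%d\n", colon_is_port_suffix("dbhost:5433"));
        return 0;
    }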
On Thu, Sep 3, 2015 at 12:57 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > On Thu, Sep 3, 2015 at 6:02 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Maybe someday we should have all that, but I think for right now >> that's complicating things unnecessarily. I think the best proposal >> so far is to allow the host=X option to be repeated multiple times. >> If you repeat the host=X option N times, you can also repeat the >> port=X option exactly N times, or else you can specify it just once. >> Done. > > But this already breaks backwards-compatibility with any clients who belief > that whatever value specified the latest takes precedence. I'm not arguing > that there are such use cases in the wild or that it's entirely sane thing > to do, but still. Yep. If we care about backward compatibility, there can be a new option that must be specified to get the new behavior. We can also decide not to care about this case. > More importantly, this will break any code that tries to parse the conninfo > string and produce a hashmap from it for modification. That is true, but I am not sure I agree that it is important. Switch to a hashmap whose values are arrays. >> Alternatively, leave the host=X option alone and add a new option >> hostlist=X, allowing a comma-separated list of names or IPs, with each >> hostname or IP allowed an optional :port suffix. If host=X parameter >> is omitted or the connection to that machine fails, try everybody in >> the hostlist concurrently, or with some configurable (and presumably >> short) delay between one and then next. Again, done. > > The exact behavior in case of both host/port and hostlist are specified > becomes really tricky then. It's already tricky enough, if you recall the > service files -- how are they going to come into play here? It doesn't seem that tricky to me, but maybe I'm biased by having just invented it 5 minutes ago. > I believe the less there are implicit workings in the way libpq connects, > the better. I don't disagree with that as a general rule - only when it keeps us from implementing useful features. >>> Alternatively, change the rules for parsing the existing host=X >> parameter so that we split it on some separator that isn't a valid >> hostname character, and then strip off an optional :port syntax from >> each entry; that value, if present, overrides port=X for that entry. > > It's tempting to use ':' as the separator here, but it's still valid for > directory names and host can be one in case of UN*X sockets. The directory name is only likely to contain : on Windows, and Windows doesn't support UNIX sockets. All of these objections seem pretty thin to me. I'd accept any of them as a reason for preferring one alternative over another, but I don't accept that the presence of a few problems of this magnitude means we should give up on the feature. It's a good enough feature that it is worth the possibility of slightly inconveniencing someone running in an unusual configuration. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<p dir="ltr"><br /> On Sep 3, 2015 7:30 PM, "Robert Haas" <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br /> ><br /> > All of these objections seempretty thin to me. I'd accept any of<br /> > them as a reason for preferring one alternative over another, but I<br/> > don't accept that the presence of a few problems of this magnitude<br /> > means we should give up on thefeature. It's a good enough feature<br /> > that it is worth the possibility of slightly inconveniencing someone<br/> > running in an unusual configuration.<p dir="ltr">I give up.<p dir="ltr">Though I still don't see any compellingreason for this to be in libpq itself. By the way, what about mixing conninfo and uris - should this not be allowed?<pdir="ltr">-<br /> Alex
On Thu, Sep 3, 2015 at 10:56:42AM -0400, Robert Haas wrote: > The amount of opposition to this feature is remarkable considering > that it's available in Oracle, SQL Server, MongoDB, Cassandra, and > MySQL. See for example: > > http://docs.mongodb.org/manual/reference/connection-string/ > https://datastax.github.io/python-driver/getting_started.html > > This is a small patch with minimal to no downside implementing a > feature that is present in most or all of the major competing > products. We're really doing ourselves a disservice if we reject it. > I think it would be far better to progress to talking about what > design we'd be comfortable with, rather than kidding ourselves that a > feature that everyone else has and which somebody has taken the time > to implement (thus, obviously it has value for them) and which has > been discussed to general approval at PGCon developer meetings and > which has been endorsed on this thread by three committers is somehow > something that nobody really needs. Seriously? As much as I like to disagree with Robert, I can't in this case. ;-) It is annoying for less capable database to say they have high availability when that just involves having a client library that can connect to multiple hosts. Yes, we can do this in DNS, but that is all happening at a different layer. Now, the counter-argument is that this is the wrong layer to do it, and we will end up adding tons of configurations variables to libpq to control this. We are clearly not adding this just because JDBC has it --- we are adding it because it allows for more complex server configurations. Could this ability be more powerfully done with DNS or a connection pooler, yes, but not everyone wants that complexity. For me, this libpq change has a simple user API with a small amount of code change that give us a simple solution to a common problem. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
First of all, I wish to thank all the participants of the discussion for their useful suggestions and criticism. I see that people are really interested in this feature.

Here is the list of syntax and behavior changes against the original proposal:

1. There is a similar feature in JDBC, so the syntax of the URL-style connect string should be compatible with JDBC:

postgresql://host:port,host:port/dbname...

2. In the param=value style connect string, there should be a way to specify a different non-standard port for each host in the cluster. It seems that people came to the conclusion that allowing host:port syntax as the value of the host parameter shouldn't break anything. So the port should be determined the following way: if there is a :portnumber in the host string, this port number takes precedence over anything else for this particular host. To avoid problems with file system objects with strange names, the colon is interpreted as a port separator only if the value of the host parameter doesn't begin with a slash. If it begins with a slash, it is handled exactly as it is handled now. If the port is not explicitly specified in the host string, the port= parameter is used (for all hosts which are not accompanied by their own port number); if there is no port parameter in the connect string, the same chain of fallbacks up to the compiled-in standard value is used, as now.

3. Simultaneous connection to all the hosts in the cluster shouldn't be the default. By default, hosts should be tried in the order they are specified in the connect string. To minimize syntax changes there should be a single parameter 'hostorder' which specifies the order of hosts, with possible choices sequential (default), parallel and random.

4. There should be a way to allow the cluster software or system administrator some time to perform failover procedures and promote one of the standbys to master. So there should be an additional parameter, failover_timeout, which specifies the number of seconds connection attempts should be retried if no usable master server is found.

5. The readonly parameter (which allows distinguishing between failover and load balancing of read-only clients) should be implemented as in the original proposal.

-- Victor Wagner <vitus@wagner.pp.ru>
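To make the proposed user-facing syntax concrete, a client under this scheme might connect roughly as below. hostorder, failover_timeout and readonly are the parameters proposed in this message, not options of any released libpq, and the host names are invented, so treat this purely as a sketch of the intended syntax:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* URL form, JDBC-compatible host:port list (proposed syntax). */
        PGconn *rw = PQconnectdb(
            "postgresql://app@dbhost1:5432,dbhost2:5433/appdb"
            "?hostorder=sequential&failover_timeout=30");

        /* keyword=value form: read-only client load-balanced across nodes. */
        PGconn *ro = PQconnectdb(
            "host=dbhost1:5432 host=dbhost2:5433 host=dbhost3 "
            "dbname=appdb user=app hostorder=random readonly=1");

        if (PQstatus(rw) != CONNECTION_OK)
            fprintf(stderr, "read-write connect failed: %s", PQerrorMessage(rw));
        if (PQstatus(ro) != CONNECTION_OK)
            fprintf(stderr, "read-only connect failed: %s", PQerrorMessage(ro));

        PQfinish(rw);
        PQfinish(ro);
        return 0;
    }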
Victor Wagner wrote: > It would just take a bit more time for client and a bit more load for > server - to make sure that this connection is read-write by > issuing > > show transaction_read_only > > statement before considering connection useful. If the purpose of the feature is to wait for a failover to complete, shouldn't it check for pg_is_in_recovery() rather than transaction_read_only ? That's because a database or user can be made read-only-on-connect on an otherwise read-write instance by issuing ALTER DATABASE dbname SET default_transaction_read_only TO on; The same for a user with ALTER USER. In that case, transaction_read_only would be OFF after connecting, both on the master and on a slave, independantly of any failover in progress or finished or not having occurred at all. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Mon, 07 Sep 2015 17:32:48 +0200 "Daniel Verite" <daniel@manitou-mail.org> wrote:

> Victor Wagner wrote:
>
> > It would just take a bit more time for client and a bit more load
> > for server - to make sure that this connection is read-write by
> > issuing
> >
> > show transaction_read_only
> >
> > statement before considering connection useful.
>
> If the purpose of the feature is to wait for a failover to complete,
> shouldn't it check for pg_is_in_recovery() rather than
> transaction_read_only ?

The purpose of this feature is to distinguish between master and standby servers. This allows a failover system to work with standby servers that accept client connections, and even to create a system where read-only clients are load-balanced among several hot standby servers while read-write clients work with the master, with no reconfiguration needed when failover happens.

pg_is_in_recovery() is really better. But it seems that chapter 25 of the documentation should be improved and this function mentioned in section 25.5.1 (Hot Standby / User Overview).

-- Victor Wagner <vitus@wagner.pp.ru>
Bruce Momjian <bruce@momjian.us> wrote: > It is annoying for less capable database to say they have high > availability when that just involves having a client library that > can connect to multiple hosts. This sounds like the "But all the *other* kids are doing it!" argument, which comes up often. We generally resist doing something solely on that basis, so the rest of the email is really what matters, I think, much as this does gall. > Yes, we can do this in DNS, but that is all happening at a > different layer. More than that, there are technical reasons that can be a bad solution. As just one example, the servers might well be in different domains. > Now, the counter-argument is that this is the wrong layer to do > it, and we will end up adding tons of configurations variables to > libpq to control this. Yeah, we definitely *don't* want to implement some sort of failover manager in every connector -- that way madness lies. > We are clearly not adding this just because JDBC has it --- we > are adding it because it allows for more complex server > configurations. I think what we need is a clear description of use cases where we think this is the solution, and some clear boundaries to the scope -- so it is also clear what kinds of problems this is *not* intended to solve. > Could this ability be more powerfully done with DNS or a > connection pooler, yes, but not everyone wants that complexity. > For me, this libpq change has a simple user API with a small > amount of code change that give us a simple solution to a common > problem. I'm not saying we shouldn't have something like this; but we need a clear definition of that common problem we are solving. I don't think I've seen that yet. I've seen various spins on solutions described, from which I can infer various possible problems; but to pick the best version of this as *the* solution I think we need a clear statement of the problem itself. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Sep 8, 2015 at 9:29 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > I'm not saying we shouldn't have something like this; but we need a > clear definition of that common problem we are solving. I don't > think I've seen that yet. I've seen various spins on solutions > described, from which I can infer various possible problems; but to > pick the best version of this as *the* solution I think we need a > clear statement of the problem itself. I think the problem we should be trying to solve is: Given a set of server IPs, connect to one that is up. I believe this comes up in several different scenarios. Example #1: I need to move my database server to a different IP address. I could have clients connect by name, but using DNS sucks for the reasons already discussed. I could move the server and then change all the client connect strings afterwards, but then I'm bound to be down for longer than necessary. Instead, I'll change all of my connect strings FIRST, to specify that either IP is acceptable. Then I'll move the server to the new IP, and all of them will quickly find that the old IP is down and the new IP is up and start using that instead. At leisure, I can remove the old IP from connect strings (and then re-purpose that IP). Example #2: I am using EnterpriseDB's xDB multi-master replication, or 2ndQuadrant's BDR, to replicate among geographically distributed database servers. I want clients to connect to their local server, but if it is down, I want them to connect to one of the other masters. Connecting to the local server first minimizes replication conflicts, since most transactions on a given application server will be for data local to that geography, but being willing to fall back to some other server maximizes availability when my local server goes down. Example #3: I have a master and and 3 SR standbys, all on different subnets. Periodically, I fail over, so that the master role moves around. Each server has an IP which can be used for read-only connections. Each also has a virtual IP which is up when it is the write master and down when it is a standby. Read-only queries are 90% of my traffic, and eventual consistency is fine. So, for a read query, I want to pick among the IPs that are up; for write IPs, I want to find the one place that writes can be performed, but that might be any of 4 virtual IPs. I'm sure there are more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > I think the problem we should be trying to solve is: Given a set > of server IPs, connect to one that is up. > > I believe this comes up in several different scenarios. > > Example #1: [single server; changing IP address gracefully] > > Example #2: [xDB/BDR client uses local master if up; else a remote] > > Example #3: [SR/HS with changing primary] For all of those it is clear that we do not need (or want!) heartbeat, STONITH, fencing, etc. to be handled by the connector. If the above are the sorts of problems we are trying to solve, a very simple solution is the best. I know you outlined several; I'm not sure it would matter all that much which one we used -- any would work and someone should Just Do It. > I'm sure there are more. Ay, there's the rub. Some people seemed to be suggesting that this should be far more than what you describe above. That would, IMO, be a mistake. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 9, 2015 at 4:30 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> I think the problem we should be trying to solve is: Given a set >> of server IPs, connect to one that is up. >> >> I believe this comes up in several different scenarios. >> >> Example #1: [single server; changing IP address gracefully] >> >> Example #2: [xDB/BDR client uses local master if up; else a remote] >> >> Example #3: [SR/HS with changing primary] > > For all of those it is clear that we do not need (or want!) > heartbeat, STONITH, fencing, etc. to be handled by the connector. > If the above are the sorts of problems we are trying to solve, a > very simple solution is the best. I know you outlined several; I'm > not sure it would matter all that much which one we used -- any > would work and someone should Just Do It. > >> I'm sure there are more. > > Ay, there's the rub. Some people seemed to be suggesting that this > should be far more than what you describe above. That would, IMO, > be a mistake. It sounds like we are pretty much on the same page. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:
> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
>
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
>
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).
>
Attached patch which implements client library failover and
loadbalancing as was described in the proposal
<20150818041850.GA5092@wagner.pp.ru>.
This patch implements the following functionality:
1. It is allowed to specify several hosts in the connect string, either
in URL-style (separated by comma) or in param=value form (several host
parameters).
2. Each host parameter can be accompanied by a port specifier separated by a
colon. A port specified this way takes precedence over the port parameter or
the default port, for this particular host only.
3. There is a hostorder parameter with two possible values, 'sequential'
and 'random' (default sequential). The 'parallel' hostorder described in the
proposal is not yet implemented in this version of the patch.
4. In the URL-style connect string, the parameter loadBalanceHosts=true is
treated as equivalent to 'hostorder=random' for compatibility with JDBC.
5. Added a new parameter readonly=boolean. If this parameter is false (the
default), a check is performed upon successful connection that the backend is
not in the recovery state. If it is, the connection is not considered usable
and the next host is tried.
6. Added a new parameter failover_timeout. If no usable host is found and
this parameter is specified, hosts are retried cyclically until this timeout
expires. This gives the cluster management software some time to promote one
of the standbys to master if the master fails. By default there are no
retries.
Some implementation notes:
1. The pghost field in the PGconn structure now stores a comma-separated list
of hosts, which is parsed in connectDBStart. So the expected results of
some tests in src/interfaces/libpq/test are changed.
2. A URL with a colon but no port number after the host is no longer
considered valid.
3. With hostorder=random we have to seed the libc random number generator.
Some care was taken not to lose entropy if the generator was initialized by
the application before connecting, and to ensure different random values if
several connections are made from the same application within one second
(even in a single thread). The RNG is seeded by the XOR of the current time,
a random value from this RNG taken before seeding (if it was seeded earlier,
this keeps the entropy) and the address of the connection structure. Maybe it
is worth the effort to add the thread id or pid, but there is no portable way
of doing so, so it would need testing on all supported platforms.
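The seeding scheme in note 3 might look roughly like the sketch below; it assumes plain libc rand()/srand() and mixes in the connection-struct address so that connections opened within the same second differ. It is an illustration of the described idea, not the patch code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <time.h>

    /*
     * Re-seed libc's generator for hostorder=random.  XOR-ing in rand() keeps
     * whatever entropy the application already put into the generator, and the
     * connection pointer separates connections opened in the same second.
     */
    static void
    seed_host_shuffle(const void *conn)
    {
        unsigned int seed;

        seed = (unsigned int) time(NULL);
        seed ^= (unsigned int) rand();
        seed ^= (unsigned int) (uintptr_t) conn;
        srand(seed);
    }

    int
    main(void)
    {
        int placeholder;            /* stands in for the PGconn being set up */

        seed_host_shuffle(&placeholder);
        printf("start with host #%d\n", rand() % 3);   /* e.g. three candidates */
        return 0;
    }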
On 2015.10.14 at 14:47:46 +0200, Shulgin, Oleksandr wrote:
>
> So what exactly would happen with this command: PGHOST=host1 psql -h host2
>
> Or this one: PGHOST=host1 psql host=host2

The right thing - the value of the PGHOST environment variable is ignored, and the explicitly specified host is used.

> What about service files?

Oops, I've missed something here. Multiple host lines in the service file do not produce a list of alternate hosts. I'll address this case in the next version of the patch. However, a comma-separated list of hosts in a single host string in the service file works.

> > trying postgresql://[::1]:12345/db
> -dbname='db' host='::1' port='12345' (inet)
> +dbname='db' host='[::1]:12345' (inet)
>
> Such a change doesn't look really nice to me.

Really, any alternative would be worse. The only alternative I can imagine is to replace the pghost field in the PGconn structure by an array or linked list of structures with host and port fields. For now I decided to keep such an array local to the connectDBStart function, and store the list of host-port pairs as a string. It takes some CPU cycles to reparse this list each time a reconnection is needed, but this is negligible compared with the DNS resolution of each host name. And we cannot avoid repeated DNS resolution, because we have to take into account the possibility of DNS-based failover.

> > 2. Url with colon but no port number after the host no more considered
> > valid.
> >
>
> We could live with that, but is there a good reason for backward
> compatibility break in this case?

Because we can. Really, I don't see any useful semantics for a host with a colon at the end. A null host name with a colon and port has useful semantics (if there is just one host in the connect string), so that is kept as it was. In the old version of the connection-string parsing code, stripping a lone colon off the host name was done by generic code (because the port number went to another variable anyway). Now handling of a lone colon is a special case, because we just check the syntax of the host-port pair list. And I had to choose how to handle it - either complain or silently strip it away. Complaining was simpler: we compute the port number and check that it is between 1 and 65535, and a zero-length port number is not an integer between 1 and 65535.

-- Victor Wagner <vitus@wagner.pp.ru>
On 14 October 2015 at 18:41, Victor Wagner <vitus@wagner.pp.ru> wrote: > 5. Added new parameter readonly=boolean. If this parameter is false (the > default) upon successful connection check is performed that backend is > not in the recovery state. If so, connection is not considered usable > and next host is tried. What constitutes "failed" as far as this is concerned? Like the PgJDBC approach I wonder how much this'll handle in practice and how it'll go with less clear-cut failures like disk-full on a replica that's a member of the failover pool, long delays before no-route-to-host errors, dns problems, etc. Had any chance to simulate network failures? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015.10.15 at 17:13:46 +0800, Craig Ringer wrote: > On 14 October 2015 at 18:41, Victor Wagner <vitus@wagner.pp.ru> wrote: > > > 5. Added new parameter readonly=boolean. If this parameter is false (the > > default) upon successful connection check is performed that backend is > > not in the recovery state. If so, connection is not considered usable > > and next host is tried. > > What constitutes "failed" as far as this is concerned? failed means "select pg_is_in_recovery() returns true". The only reason of adding this parameter is to state that we can use connection to warm-backup slave (or if we cannot). > Like the PgJDBC approach I wonder how much this'll handle in practice > and how it'll go with less clear-cut failures like disk-full on a > replica that's a member of the failover pool, long delays before > no-route-to-host errors, dns problems, etc. Really I don't think that disk-full and other such node errors are problems of client library. It is cluster management software which should check for these errors and shut down erroneous nodes or at least disable their listening for connection. As for long-timeouts on network failures, they can be handled by existing connect_timeout parameter. May be it is worth effort to rewrite this parameter handling. Now it only used in blocking functions which call connectDBComplete (PQconnectdb[Params], PQreset and PQsetdbLogin). May be it should be hidden into PQconnectPoll state machine, so event-driven applications could get its handling for free. Really, my initial proposal included hostorder=parallel especially for this situation. But I've decided not to implement it right now, because it requires changes in the libpq public API. > Had any chance to simulate network failures? Testing failover code is really tricky thing. All testing I've done yet was in the mode "Shudown one of the nodes and see what happens". It is not too hard to simulate network failures using iptables, but requires considerable planning of the test scenario. -- Victor Wagner <vitus@wagner.pp.ru>
On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote:
>
> Attached patch which implements client library failover and
> loadbalancing as was described in the proposal
> <20150818041850.GA5092@wagner.pp.ru>.
>

I'm sending an improved version of the patch. As Oleksandr Shulgin noted, the previous version of the patch lacked support for service files.

Now support for service files is implemented and multiple host statements in the service file are allowed.

-- Victor Wagner <vitus@wagner.pp.ru>
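To illustrate what the service-file support is meant to allow, here is a hypothetical ~/.pg_service.conf entry (service name and hosts invented) and the corresponding libpq call; the repeated host lines forming a list of alternate hosts is the behavior claimed for this version of the patch:

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Assumed ~/.pg_service.conf entry (hypothetical names):
     *
     *   [mycluster]
     *   host=main-server
     *   host=standby1
     *   port=5432
     *   dbname=appdb
     *
     * With the revised patch, the repeated host lines are said to become the
     * list of alternate hosts for the "mycluster" service.
     */
    int
    main(void)
    {
        PGconn *conn = PQconnectdb("service=mycluster user=app");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }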
On 21 October 2015 at 10:07, Victor Wagner <vitus@wagner.pp.ru> wrote: > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: > >> >> Attached patch which implements client library failover and >> loadbalancing as was described in the proposal >> <20150818041850.GA5092@wagner.pp.ru>. >> > > I'm sending imporoved verison of patch. As Olexander Shulgin noted, > previous version of patch lacks support for service files. > > Now support for service files is implemented and multiple host > statements in the service file are allowed. This is causing breakage: $ pg_basebackup -v -x -D standby1 -h localhost -p 5532 -U rep_user row number 0 is out of range 0..-1 Segmentation fault The log contains this: 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: received replication command: SELECT pg_is_in_recovery() 2015-10-22 13:51:41.380 BST - rep_user - [unknown]ERROR: syntax error 2015-10-22 13:51:41.380 BST - rep_user - [unknown]LOG: could not receive data from client: Connection reset by peer 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: unexpected EOF on client connection 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: shmem_exit(0): 7 on_shmem_exit callbacks to make 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: proc_exit(0): 3 callbacks to make 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: exit(0) 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make 2015-10-22 13:51:41.380 BST - rep_user - [unknown]DEBUG: proc_exit(-1): 0 callbacks to make 2015-10-22 13:51:41.380 BST - - DEBUG: reaping dead processes 2015-10-22 13:51:41.380 BST - - DEBUG: server process (PID 24303) exited with exit code 0 -- Thom
On Thu, 22 Oct 2015 14:33:11 +0100 Thom Brown <thom@linux.com> wrote: > On 21 October 2015 at 10:07, Victor Wagner <vitus@wagner.pp.ru> wrote: > > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: > > > >> > >> Attached patch which implements client library failover and > >> loadbalancing as was described in the proposal > >> <20150818041850.GA5092@wagner.pp.ru>. > >> > > > > I'm sending imporoved verison of patch. As Olexander Shulgin noted, > > previous version of patch lacks support for service files. > > > > Now support for service files is implemented and multiple host > > statements in the service file are allowed. > > This is causing breakage: > > $ pg_basebackup -v -x -D standby1 -h localhost -p 5532 -U rep_user > row number 0 is out of range 0..-1 It seems that pg_basebackup should always specify readonly attribute for the connection. Your data directory is named standby1, so I suspect that you are trying to make backup from read-only standby instance of the base. Can you check if problem persists in your setup with command pg_basebackup -v -x -D standby1 \ -d "host=localhost port=5532 user=rep_user readonly=1" > Segmentation fault Of course, this shouldn't happen even if libpq is severely misused. And it is almost evident what should be done to fix it. But making sure that my patch doesn't interfere with operation of pg_basebackup is a bit more complicated.
> On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: > >> Attached patch which implements client library failover and >> loadbalancing as was described in the proposal >> <20150818041850.GA5092@wagner.pp.ru>. >> > I'm sending imporoved verison of patch. As Olexander Shulgin noted, > previous version of patch lacks support for service files. > > Now support for service files is implemented and multiple host > statements in the service file are allowed. A couple of minor nits: When you call pg_is_in_recovery(), you should schema-qualify the function name, just in case some other version of that function exists in the search_path. Also, pg_is_in_recovery() returns a boolean value - PQgetvalue() will not return "true" or "false", it will return "t" or "f". And, you have a bit of garbage in the patch (the patch inserts UNIXSOCK_PATH(portstr, portnum, conn->pgunixsocket); in the header comment at the top of fe-connect.c). -- Korry
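Putting those two corrections together, the standby check might read like the sketch below (server_is_in_recovery is an invented helper name; it assumes an already-established connection):

    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>
    #include <libpq-fe.h>

    /* Return true if the server we just connected to is a hot standby. */
    static bool
    server_is_in_recovery(PGconn *conn)
    {
        bool      in_recovery = true;   /* be conservative on error */
        PGresult *res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");

        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
            in_recovery = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
        PQclear(res);
        return in_recovery;
    }

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("host=standby1 dbname=appdb user=app");

        if (PQstatus(conn) == CONNECTION_OK)
            printf("in recovery: %s\n",
                   server_is_in_recovery(conn) ? "yes" : "no");
        PQfinish(conn);
        return 0;
    }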
On 23 October 2015 at 12:52, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Thu, 22 Oct 2015 14:33:11 +0100 > Thom Brown <thom@linux.com> wrote: > >> On 21 October 2015 at 10:07, Victor Wagner <vitus@wagner.pp.ru> wrote: >> > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: >> > >> >> >> >> Attached patch which implements client library failover and >> >> loadbalancing as was described in the proposal >> >> <20150818041850.GA5092@wagner.pp.ru>. >> >> >> > >> > I'm sending imporoved verison of patch. As Olexander Shulgin noted, >> > previous version of patch lacks support for service files. >> > >> > Now support for service files is implemented and multiple host >> > statements in the service file are allowed. >> >> This is causing breakage: >> >> $ pg_basebackup -v -x -D standby1 -h localhost -p 5532 -U rep_user >> row number 0 is out of range 0..-1 > > It seems that pg_basebackup should always specify readonly attribute > for the connection. > > Your data directory is named standby1, so I suspect > that you are trying to make backup from read-only standby instance of > the base. No, the -D option on pg_basebackup specifies the directory to receive the backup into, not from. This is backing up a primary instance. > Can you check if problem persists in your setup with command > > pg_basebackup -v -x -D standby1 \ > -d "host=localhost port=5532 user=rep_user readonly=1" Yes, this works: $ pg_basebackup -v -x -D standby1 -d "host=localhost port=5532 user=rep_user readonly=1" transaction log start point: 0/2000028 on timeline 1 transaction log end point: 0/2000130 pg_basebackup: base backup completed Thom
On Fri, 23 Oct 2015 22:14:56 +0100 Thom Brown <thom@linux.com> wrote:

> >
> > pg_basebackup -v -x -D standby1 \
> > -d "host=localhost port=5532 user=rep_user readonly=1"
>
> Yes, this works:
>
> $ pg_basebackup -v -x -D standby1 -d "host=localhost port=5532
> user=rep_user readonly=1"
> transaction log start point: 0/2000028 on timeline 1
> transaction log end point: 0/2000130
> pg_basebackup: base backup completed
>
> Thom

Thanks for your help.

I was unable to reproduce the segfault in this situation on Linux x86-64. I got the message:

row number 0 is out of range 0..-1

and the connection failed. Can you tell me a bit more about your setup, at least which architecture you are compiling Postgres for?

-- Victor Wagner <vitus@wagner.pp.ru>
On Fri, 23 Oct 2015 16:02:33 -0400 Korry Douglas <korry.douglas@enterprisedb.com> wrote:

> > Now support for service files is implemented and multiple host
> > statements in the service file are allowed.
>
> A couple of minor nits:
>
> When you call pg_is_in_recovery(), you should schema-qualify the
> function name, just in case some other version of that function
> exists in the search_path.
>
> Also, pg_is_in_recovery() returns a boolean value - PQgetvalue() will
> not return "true" or "false", it will return "t" or "f".
>
> And, you have a bit of garbage in the patch (the patch inserts
> UNIXSOCK_PATH(portstr, portnum, conn->pgunixsocket); in the header
> comment at the top of fe-connect.c).

Thanks, I'll address these issues in the next version of the patch.

-- Victor Wagner <vitus@wagner.pp.ru>
On Fri, Oct 23, 2015 at 4:02 PM, Korry Douglas <korry.douglas@enterprisedb.com> wrote: > When you call pg_is_in_recovery(), you should schema-qualify the function > name, just in case some other version of that function exists in the > search_path. I wonder whether it's really a good idea to put this kind of logic into libpq at all. I think there was some previous votes against doing so, and I tend to agree with that viewpoint. Shouldn't probing for the state of the connection be the caller's job, not libpq's? If somebody wants to write a wrapper function around this that runs this query after connecting - or any other query - they can do so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
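Such a caller-side wrapper could be as small as this sketch: try a list of candidate conninfo strings in order and keep the first connection that is up and, when a writable server is required, not in recovery. The names and connection strings are invented; this illustrates the wrapper idea, not anything in libpq:

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* Try each conninfo in turn; return the first usable connection or NULL. */
    static PGconn *
    connect_first_usable(const char *const *conninfos, int n, bool want_writable)
    {
        int i;

        for (i = 0; i < n; i++)
        {
            PGconn *conn = PQconnectdb(conninfos[i]);

            if (PQstatus(conn) == CONNECTION_OK)
            {
                bool usable = true;

                if (want_writable)
                {
                    /* same check as sketched above, inlined */
                    PGresult *res = PQexec(conn,
                        "SELECT pg_catalog.pg_is_in_recovery()");

                    usable = (PQresultStatus(res) == PGRES_TUPLES_OK &&
                              strcmp(PQgetvalue(res, 0, 0), "t") != 0);
                    PQclear(res);
                }
                if (usable)
                    return conn;
            }
            PQfinish(conn);
        }
        return NULL;
    }

    int
    main(void)
    {
        const char *const candidates[] = {
            "host=main-server dbname=appdb user=app connect_timeout=5",
            "host=standby1 dbname=appdb user=app connect_timeout=5",
        };
        PGconn *conn = connect_first_usable(candidates, 2, true);

        if (conn == NULL)
        {
            fprintf(stderr, "no usable server found\n");
            return 1;
        }
        /* ... use conn ... */
        PQfinish(conn);
        return 0;
    }

What a loop like this cannot easily reproduce is the simultaneous-connection behaviour discussed upthread, which is one argument for having the logic inside libpq rather than in every caller.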
On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: > Attached patch which implements client library failover and > loadbalancing as was described in the proposal > <20150818041850.GA5092@wagner.pp.ru>. New version of patch 1. Handles replication connections correctly (i.e doesn't attempt to check readwrite mode if replication option is on) 2. Some minor improvement recommended by Korry Douglas in <562A9259.4060408@enterprisedb.com> -- Victor Wagner <vitus@wagner.pp.ru>
On Mon, Oct 26, 2015 at 3:58 AM, Victor Wagner <vitus@wagner.pp.ru> wrote: > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: >> Attached patch which implements client library failover and >> loadbalancing as was described in the proposal >> <20150818041850.GA5092@wagner.pp.ru>. > > New version of patch > > 1. Handles replication connections correctly (i.e doesn't attempt to > check readwrite mode if replication option is on) > 2. Some minor improvement recommended by Korry Douglas in > <562A9259.4060408@enterprisedb.com> Changing the subject breaks threading for anybody using gmail (and maybe some other mail readers). Could you try to avoid that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 24 October 2015 at 07:32, Victor Wagner <vitus@wagner.pp.ru> wrote: > В Fri, 23 Oct 2015 22:14:56 +0100 > Thom Brown <thom@linux.com> пишет: > > c> > >> > pg_basebackup -v -x -D standby1 \ >> > -d "host=localhost port=5532 user=rep_user readonly=1" >> >> Yes, this works: >> >> $ pg_basebackup -v -x -D standby1 -d "host=localhost port=5532 >> user=rep_user readonly=1" >> transaction log start point: 0/2000028 on timeline 1 >> transaction log end point: 0/2000130 >> pg_basebackup: base backup completed >> >> Thom > > Thanks for your help. > > I was unable to reproduce segfault with this situation on Linux x86-64.. > > I got message: > > row number 0 is out of range 0..-1 > > and connection failed. Can you tell me a bit more of your setup. > At least which architecture you are compiling Postgres for. Linux 64-bit, 3.11 kernel. Thom
On 10/14/15 6:41 AM, Victor Wagner wrote: > 1. It is allowed to specify several hosts in the connect string, either > in URL-style (separated by comma) or in param=value form (several host > parameters). I'm not fond of having URLs that are not valid URLs according to the applicable standards. Because then they can't be parsed or composed by standard libraries. Also, this assumes that all the components other than host and port are the same. Earlier there was a discussion about why the ports would ever need to be different. Well, why can't the database names be different? I could have use for that. I think you should just accept multiple URLs.
On 10/24/15 7:55 AM, Robert Haas wrote: > On Fri, Oct 23, 2015 at 4:02 PM, Korry Douglas > <korry.douglas@enterprisedb.com> wrote: >> When you call pg_is_in_recovery(), you should schema-qualify the function >> name, just in case some other version of that function exists in the >> search_path. > > I wonder whether it's really a good idea to put this kind of logic > into libpq at all. I think there was some previous votes against > doing so, and I tend to agree with that viewpoint. Shouldn't probing > for the state of the connection be the caller's job, not libpq's? If > somebody wants to write a wrapper function around this that runs this > query after connecting - or any other query - they can do so. Well, earlier there was a debate whether any of this should be in libpq. I think asking for a read-only or read-write connection would be quite useful, and there isn't really a lot of room for interpretation that would lead to a lot of application-specific implementations. That said, it might still be best to omit this from the first round of the patch, to simplify the discussion.
On 26 October 2015 at 16:25, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/14/15 6:41 AM, Victor Wagner wrote:
> > 1. It is allowed to specify several hosts in the connect string, either
> > in URL-style (separated by comma) or in param=value form (several host
> > parameters).
>
> I'm not fond of having URLs that are not valid URLs according to the
> applicable standards. Because then they can't be parsed or composed by
> standard libraries.
>
> Also, this assumes that all the components other than host and port are
> the same. Earlier there was a discussion about why the ports would ever
> need to be different. Well, why can't the database names be different?
> I could have use for that.
>
> I think you should just accept multiple URLs.

I'd give a "+1" on this...

As an area of new behaviour, I don't see a big problem with declining to support every wee bit of libpq configuration, and instead requiring the use of URLs.

Trying to put "multiplicities" into each parameter (and then considering it at the pg_service level, too) is WAY more complicated, and for a feature where it seems to me that it is pretty reasonable to have a series of fully qualified URLs.

Specifying several URLs should be easier to understand, easier to test, easier to code, and easier to keep from blowing up badly.
On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> Specifying several URLs should be easier to understand, easier to
> test, easier to code, and easier to keep from blowing up badly.

Setting aside all other concerns, have a +1 from me on that too.

--
Alex
On 10/27/2015 01:42 AM, Shulgin, Oleksandr wrote: > On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne <cbbrowne@gmail.com > <mailto:cbbrowne@gmail.com>> wrote: > > > On 26 October 2015 at 16:25, Peter Eisentraut <peter_e@gmx.net > <mailto:peter_e@gmx.net>> wrote: > > On 10/14/15 6:41 AM, Victor Wagner wrote: > > 1. It is allowed to specify several hosts in the connect string, either > > in URL-style (separated by comma) or in param=value form (several host > > parameters). > > I'm not fond of having URLs that are not valid URLs according to the > applicable standards. Because then they can't be parsed or > composed by > standard libraries. > > Also, this assumes that all the components other than host and > port are > the same. Earlier there was a discussion about why the ports > would ever > need to be different. Well, why can't the database names be > different? > I could have use for that. > > I think you should just accept multiple URLs. > > > I'd give a "+1" on this... > > As an area of new behaviour, I don't see a big problem with declining to > support every wee bit of libpq configuration, and instead requiring the > use of URLs. > > Trying to put "multiplicities" into each parameter (and then considering > it at the pg_service level, too) is WAY more complicated, and for a > feature where it seems to me that it is pretty reasonable to have a > series of fully qualified URLs. > > Specifying several URLs should be easier to understand, easier to > test, easier to code, and easier to keep from blowing up badly. > > > Setting aside all other concerns, have a +1 from me on that too. I'm good with this. +1 -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, 26 Oct 2015 16:25:57 -0400 Peter Eisentraut <peter_e@gmx.net> wrote: > Also, this assumes that all the components other than host and port > are the same. Earlier there was a discussion about why the ports > would ever need to be different. Well, why can't the database names > be different? I could have use for that. Because of the way PostgreSQL replication is implemented. This multihost feature is for clusters: either for an automatic switch from one cluster node to another when the master node fails and one of the standbys gets promoted to master, or for load balancing between hot standby nodes. PostgreSQL allows different replicas to listen on different ports, because each replica has its own postgresql.conf, but replication works on the node level, not on the database level, so all the databases from the master node are replicated with their original names.
On 10/28/15 4:18 AM, Victor Wagner wrote: > On Mon, 26 Oct 2015 16:25:57 -0400 > Peter Eisentraut <peter_e@gmx.net> wrote: > >> Also, this assumes that all the components other than host and port >> are the same. Earlier there was a discussion about why the ports >> would ever need to be different. Well, why can't the database names >> be different? I could have use for that. > > Because of way postgresql replication is implemented. There are multiple types of PostgreSQL replication, and there will be others in the future.
On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 10/28/15 4:18 AM, Victor Wagner wrote: >> On Mon, 26 Oct 2015 16:25:57 -0400 >> Peter Eisentraut <peter_e@gmx.net> wrote: >> >>> Also, this assumes that all the components other than host and port >>> are the same. Earlier there was a discussion about why the ports >>> would ever need to be different. Well, why can't the database names >>> be different? I could have use for that. >> >> Because of way postgresql replication is implemented. > > There are multiple types of PostgreSQL replication, and there will be > others in the future. That's true, but doesn't allowing every parameter to be multiply specified greatly increase the implementation complexity for a pretty marginal benefit? I think host and IP would hit 98% of the use cases here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 30 Oct 2015 14:26:45 +0100 Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net> > wrote: > > That's true, but doesn't allowing every parameter to be multiply > specified greatly increase the implementation complexity for a pretty > marginal benefit? I think host and IP would hit 98% of the use cases > here. As far as I can tell from the experience of writing this patch, it would greatly increase complexity. If only multiple hosts had to be supported, I could almost completely encapsulate the changes in the DNS resolution code (which already handles several addresses). The need to support a different port for each host already required a change to the internal storage, and consequently changes to the regression test suite (src/interfaces/libpq/test/regress.out). But both host and port are used in the same place - the connect system call. If we allow a different value per host for a parameter such as the database name, which is used significantly later, i.e. when the first protocol message is sent, the size of the patch might roughly double.
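The split Victor describes - host and port being consumed by the connect system call, while parameters such as the database name only matter later, when the startup message is written to the established socket - can be seen in a minimal socket-level sketch. This is not libpq code; the function name and the omission of error details are simplifying assumptions.

    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    /*
     * Open a TCP connection to one host:port candidate.  Only host and port
     * are needed up to this point; dbname, user, etc. would be used only
     * afterwards, when the startup packet is sent on the returned socket.
     */
    static int
    connect_one(const char *host, const char *port)
    {
        struct addrinfo hints, *res, *ai;
        int     fd = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;

        for (ai = res; ai != NULL; ai = ai->ai_next)
        {
            fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (fd >= 0 && connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
                break;                  /* host and port are consumed here */
            if (fd >= 0)
            {
                close(fd);
                fd = -1;
            }
        }
        freeaddrinfo(res);
        return fd;                      /* -1 if every resolved address failed */
    }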
On 30 October 2015 at 09:26, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On 10/28/15 4:18 AM, Victor Wagner wrote:
> >> On Mon, 26 Oct 2015 16:25:57 -0400
> >> Peter Eisentraut <peter_e@gmx.net> wrote:
> >>
> >>> Also, this assumes that all the components other than host and port
> >>> are the same. Earlier there was a discussion about why the ports
> >>> would ever need to be different. Well, why can't the database names
> >>> be different? I could have use for that.
> >>
> >> Because of way postgresql replication is implemented.
> >
> > There are multiple types of PostgreSQL replication, and there will be
> > others in the future.
>
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit? I think host and IP would hit 98% of the use cases
> here.

I think it makes the feature WORSE. I am getting more and more convinced that the Correct Solution is for this feature to be handled by submitting multiple URIs, and my argument isn't even based on any aspects of implementation complexity.

Take as example the case where I have two database servers I want to be considered.

a) Database with URI
postgresql://cbbrowne@server-a.example.info:5432/my-first-database

b) Database with URL
postgresql://robert@server-b.example.info:7032/my-second-database

With all the "per-variable multiplicities", this would turn into a combinatorial explosion of combinations, some 2^4, or 16 possible servers, some of which likely don't exist, and others of which aren't proper (e.g. - trying to connect to database a) using robert's credentials).

Possibly some of those combinations are outright improper.

I'm not going to claim it's terribly wise to be doing the cross-credential bit; I'd think it likely to be rather dumb for the application to use one user when connecting to one database and another when connecting to another. But the notion of it being nondeterministic is just awful.

I head back to... "the way OpenLDAP does this"... They let you specify multiple LDAP servers...
ldap://server1.example.info:389 ldap://server2.example.info:389
where URIs are separated by whitespace.

Seems like Best Goodness for Postgres to do something analogous...
postgresql://cbbrowne@server-a.example.info:5432/my-first-database postgresql://robert@server-b.example.info:7032/my-second-database

Is it a bit long? Sure. But it is unambiguous, and requires *only* whitespace parsing to separate the URIs. I'd think it fine for Postgres to support this via multiple "-d" options; that would be about as good.

That can cover 100% of cases, and I don't see it needing to interact specially with odd bits such as "was that a replication slot?" You can always choose to shoot yourself in the foot, but this doesn't spin the roulette for you...
--
When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Fri, Oct 30, 2015 at 11:29:09AM -0400, Christopher Browne wrote: > On 30 October 2015 at 09:26, Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > > On 10/28/15 4:18 AM, Victor Wagner wrote: > > >> On Mon, 26 Oct 2015 16:25:57 -0400 > > >> Peter Eisentraut <peter_e@gmx.net> wrote: > > >> > > >>> Also, this assumes that all the components other than host and port > > >>> are the same. Earlier there was a discussion about why the ports > > >>> would ever need to be different. Well, why can't the database names > > >>> be different? I could have use for that. > > >> > > >> Because of way postgresql replication is implemented. > > > > > > There are multiple types of PostgreSQL replication, and there will be > > > others in the future. > > > > That's true, but doesn't allowing every parameter to be multiply > > specified greatly increase the implementation complexity for a pretty > > marginal benefit? I think host and IP would hit 98% of the use cases > > here. > > I think it makes the feature WORSE. I am getting more and more convinced > that the Correct Solution is for this feature to be handled by submitting > multiple URIs, and my argument isn't even based on any aspects of > implementation complexity. > > Take as example the case where I have two database servers I want to > be considered. > > a) Database with URI > postgresql://cbbrowne@server-a.example.info:5432/my-first-database > > b) Database with URL > postgresql://robert@server-b.example.info:7032/my-second-database > > With all the "per-variable multiplicities", this would turn into a > combinatorial explosion of combinations, some 2^4, or 16 possible servers, > some of which likely don't exist, and others of which aren't proper (e.g. - > trying to connect to database a) using robert's credentials). > > Possibly some of those combinations are outright improper. Let's say that the chances of their all both existing and being proper are close enough to zero not to matter. > Seems like Best Goodness for Postgres to do something analogous... > postgresql://cbbrowne@server-a.example.info:5432/my-first-database > postgresql://robert@server-b.example.info:7032/my-second-database Yes. > Is it a bit long? Sure. But it is unambiguous, and requires *only* > whitespace parsing to separate the URIs. I'd think it fine for Postgres to > support this via multiple "-d" options; that would be about as good. Indeed. Is there any risk of losing ordering information in the standard command line processing libraries? > That can cover 100% of cases, and I don't see it needing to interact > specially with odd bits such as "was that a replication slot?" You can > always choose to shoot yourself in the foot, but this doesn't spin the > roulette for you... An evocative image, if not a pretty one. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 10/30/2015 08:29 AM, Christopher Browne wrote: > I think it makes the feature WORSE. I am getting more and more convinced > that the Correct Solution is for this feature to be handled by submitting > multiple URIs, and my argument isn't even based on any aspects of > implementation complexity. +1 -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 10/30/15 9:26 AM, Robert Haas wrote: > That's true, but doesn't allowing every parameter to be multiply > specified greatly increase the implementation complexity for a pretty > marginal benefit? Well, the way I would have approached it is that after all the parsing you end up with a list of PQconninfoOption arrays, and you loop over that list and call connectDBStart() until you have a connection that satisfies you. I see that the patch doesn't do that, and it looks quite messy as a result.
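The shape Peter describes can be mimicked from the application side with the public API: parse each candidate into a PQconninfoOption array, then loop over the list until one yields a usable connection. In the patch itself such a loop would sit inside libpq around connectDBStart(); the helper below and its fixed 64-option cap are assumptions made only so the sketch stands on its own.

    #include <stddef.h>
    #include <libpq-fe.h>

    /*
     * Try a list of candidate connection strings: each is parsed into a
     * PQconninfoOption array, fed to PQconnectdbParams(), and the first
     * candidate that connects is returned.
     */
    static PGconn *
    connect_first_candidate(const char **conninfos, int n)
    {
        for (int i = 0; i < n; i++)
        {
            PQconninfoOption *opts = PQconninfoParse(conninfos[i], NULL);
            const char *keys[64];
            const char *vals[64];
            int         nopts = 0;
            PGconn     *conn;

            if (opts == NULL)
                continue;               /* syntactically broken candidate: skip it */

            /* Keep only the options that were actually set in the string. */
            for (PQconninfoOption *o = opts; o->keyword != NULL && nopts < 63; o++)
            {
                if (o->val != NULL)
                {
                    keys[nopts] = o->keyword;
                    vals[nopts] = o->val;
                    nopts++;
                }
            }
            keys[nopts] = NULL;
            vals[nopts] = NULL;

            conn = PQconnectdbParams(keys, vals, 0);
            PQconninfoFree(opts);
            if (PQstatus(conn) == CONNECTION_OK)
                return conn;            /* satisfied with this candidate */
            PQfinish(conn);
        }
        return NULL;
    }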
On Mon, Nov 2, 2015 at 12:15 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 10/30/15 9:26 AM, Robert Haas wrote: >> That's true, but doesn't allowing every parameter to be multiply >> specified greatly increase the implementation complexity for a pretty >> marginal benefit? > > Well, the way I would have approached is that after all the parsing you > end up with a list of PQconninfoOption arrays and you you loop over that > list and call connectDBStart() until you have a connection that > satisfies you. > > I see that the patch doesn't do that and it looks quite messy as a result. Hmm, I see. That approach certainly seems worth considering. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015.11.02 at 12:15:48 -0500, Peter Eisentraut wrote:
>> That's true, but doesn't allowing every parameter to be multiply
>> specified greatly increase the implementation complexity for a pretty
>> marginal benefit?
> Well, the way I would have approached is that after all the parsing you
> end up with a list of PQconninfoOption arrays and you loop over that
> list and call connectDBStart() until you have a connection that
> satisfies you.
> I see that the patch doesn't do that and it looks quite messy as a
> result.

There are some drawbacks to this approach.

1. You are suggesting two nested loops instead of one straight loop - an outer loop over the URLs in the connect string and an inner loop over the IP addresses each URL resolves into. (This inner loop has already been present there for several releases of Postgres.)

2. You are suggesting that it should be possible to specify all options separately for each of the fallback hosts. But some options control how the next host should be chosen (i.e. use random order for load balancing or sequential order for high availability), so they should be specified only once per connect string. Some other options are connection-level rather than node-level (i.e. if you are using the replication mode of the connection, there is no sense in specifying it for one host and not for another). These options are better specified in such a way that conflicting values are impossible.

A third category of options is specified per cluster much more often than per node, but is quite often changed from the compiled-in default: typically authentication credentials (even if you use trigger-based replication, user information would probably be replicated), the database name (if you use WAL-level replication), the client encoding (which depends more on the client than on the server), etc. It would be a pain if we had to specify them for each host separately. The syntax implemented in my patch even allows specifying a non-standard port once for all hosts (using the port= parameter in the query string) and overriding it for a particular host in the list using the colon syntax.

3. Last, but not least, there is more than one PostgreSQL client implementation. Apart from libpq we have JDBC, a native Go client, etc., and I think that maintaining compatibility of the connect string between them is a good thing. So I've modelled the syntax of the multihost feature in the libpq URL-style syntax after the JDBC syntax. -- Victor Wagner <vitus@wagner.pp.ru>
On Thu, Nov 5, 2015 at 10:12 AM, Victor Wagner <vitus@wagner.pp.ru> wrote: > There are some drawbacks with this approach > > 1. You are suggesting usage of two nested loops, instead of one straight > loop - outer loop over the URLs in the connect string, inner loop over > the IP addresses each URL resolves into. (This inner loop already > presents there for several versions of the postgres). This isn't really a problem. > 2. You are suggesting that it should be possible to specify all options > separately for each of the fallback hosts. > > But some options control how > next host should be choosen (i.e. use random order for load-balancing > or sequential order for high availability), so they should be specified > only once per connect string. But this seems like a point worthy of consideration. > Third category of options are specified per-cluster much more often > than per node. But are quite often changed from compiled in default. > Typically there are authentication credentials (even you use > trigger-based replication, user information would probably be > replicated), database name (if you use WAL-level replication), client > encoding (which depends on more on the client than on server), etc. > > It would be pain if we would need specify them for each host separately. > Syntax, implemented in the my patch allows even to specify > nonstandard-port once for all hosts (using port= parameter in the query > string) and override it for particular host in the list using colon > syntax. This, too. > 3. Last, but not least. There is more than one postgresql client > implementation. Apart of libpq we have jdbc, native GO language client > etc. And I think that maintaining compatibility of the connect string > between them is good thing. So, I've modelled syntax of multihost > feature in the libpq URL-style syntax after jdbc syntax. Also this. I'm not sure what the right decision is here - I can see both sides of it to some degree. But I think your points deserve to be taken seriously. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6 November 2015 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote: >> But some options control how >> next host should be choosen (i.e. use random order for load-balancing >> or sequential order for high availability), so they should be specified >> only once per connect string. > > But this seems like a point worthy of consideration. This makes me think that trying to wedge this into the current API using a funky connection string format might be a mistake. Lots of these issues would go away if you could provide more than just a connstring. >> Third category of options are specified per-cluster much more often >> than per node. But are quite often changed from compiled in default. > > This, too. Like this. If you have a global set of connection options, then per-connection options, it's a lot simpler. I guess that can be hacked in with a more dramatic change in the connstring format, otherwise, incorporating subsections or something. I'm not keen on doing anything like that, but there are all sorts of options... "global:[user=fred port=5432] host1:[host=somehost user=user1] host2:[host=localhost]" (puts head in brown paper bag) -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer wrote: > On 6 November 2015 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote: > > >> But some options control how > >> next host should be choosen (i.e. use random order for load-balancing > >> or sequential order for high availability), so they should be specified > >> only once per connect string. > > > > But this seems like a point worthy of consideration. > > This makes me think that trying to wedge this into the current API > using a funky connection string format might be a mistake. > > Lots of these issues would go away if you could provide more than just > a connstring. Yes, I agree. I wonder if the failover aspect couldn't be better covered by something more powerful than a single URI, such as the service file format. Maybe just allow the contents of a service file to be passed as a "connection string", so that the application/environment can continue to maintain the connection info as a string somewhere instead of having to have an actual file. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 6, 2015 at 10:38 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Craig Ringer wrote: >> On 6 November 2015 at 13:34, Robert Haas <robertmhaas@gmail.com> wrote: >> >> >> But some options control how >> >> next host should be choosen (i.e. use random order for load-balancing >> >> or sequential order for high availability), so they should be specified >> >> only once per connect string. >> > >> > But this seems like a point worthy of consideration. >> >> This makes me think that trying to wedge this into the current API >> using a funky connection string format might be a mistake. >> >> Lots of these issues would go away if you could provide more than just >> a connstring. > > Yes, I agree. I wonder if the failover aspect couldn't be better > covered by something more powerful than a single URI, such as the > service file format. Maybe just allow the contents of a service file to > be passed as a "connection string", so that the application/environment > can continue to maintain the connection info as a string somewhere > instead of having to have an actual file. This gets pretty far from the concept of a connection string, which is supposed to be a sort of universal format for telling anything PostgreSQL how to find the database server. For example, psql accepts a connection string, but you wouldn't want to have to pass the entire contents of a file, newlines and all, as a command-line argument. Now, we could design some kind of new connection string format that would work, like: psql 'alternatives failovertime=1 (host=192.168.10.49 user=rhaas) (host=192.168.10.52 user=bob)' There are a couple of problems with this. One, it can't be parsed anything like a normal connection string. Lots of software will have to be rewritten to use whatever parser we come up with for the new format. Two, it's nothing like the existing JDBC syntax that already does more or less what people want here. Three, I think that implementing it is likely to be an extraordinarily large project. The whole data structure that libpq uses to store a parsed connection string probably needs to change in fairly major ways. So, I really wonder why we're not happy with the ability to substitute out just the host and IP. http://www.postgresql.org/message-id/4DFA5A86.9030301@acm.org https://jdbc.postgresql.org/documentation/94/connect.html (bottom of page) Yes, it's possible that some people might need to change some other parameter when the host or IP changes. But those people aren't prohibited from writing their own function that tries to connect to multiple PostgreSQL servers one after the other using any system they'd like. I don't think we really need to cater to every possible use case in core. The JDBC feature has been out for several years and people seem to like it OK - the fact that we can think of things it doesn't do doesn't make it a bad idea. I'd rather have a feature that can do 95% of what people want in the real world next month than 99% of what people want in 2019, and I'm starting to think we're setting the bar awfully high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 7 November 2015 at 04:59, Robert Haas <robertmhaas@gmail.com> wrote: > So, I really wonder why we're not happy with the ability to substitute > out just the host and IP. I tend to agree. That solves 95% of the problem and doesn't foreclose solving the other 5% some other way if someone really cares later. I'd rather not see convoluted and complex connstrings, per my prior post. The JDBC extended URL style seems good enough. IMO the biggest problem is that client-side failover is way more complex than "got -ECONNREFUSED, try next host". I think we're all focusing on not being able to twiddle arbitrary settings while ignoring the real problem with client failover, i.e. making it actually work in the real world. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, 9 Nov 2015 15:14:02 +0800 Craig Ringer <craig@2ndquadrant.com> wrote: > > I'd rather not see convoluted and complex connstrings, per my prior > post. The JDBC extended URL style seems good enough. That is what my patch implements. > IMO the biggest problem is that client-side failover is way more > complex than "got -ECONNREFUSED, try next host". I think we're all > focusing on not being able to twiddle arbitrary settings while > ignoring the real problem with client failover, i.e. making it > actually work in the real world. > I've tried to deal with some of these problems. My patch has support for the following things: 1. Check whether the database instance is in recovery/standby mode and try to find another one if so. 2. Let cluster management software have some time to promote one of the standbys to master, i.e. a failover timeout can be specified to allow a retry after some time if no working master is found. There is certainly room for improvement in the handling of the connect timeout (which becomes a much more important thing once the ability to try the next host appears). Now it is handled only by the blocking-mode connect functions, not by the async state machine. But I decided to publish the patch without these improvements to get feedback from the community.
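With the patch applied, a client would exercise these behaviours through an ordinary connection string. A minimal sketch, assuming the parameter names used elsewhere in this thread (hostorder, readonly, failover_timeout) and placeholder host names:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /*
         * Try the listed hosts in random order, accept only a read-write
         * server, and allow retries within the failover timeout if no
         * working master is found right away.
         */
        PGconn *conn = PQconnectdb(
            "postgresql://app@db1.example.com:5432,db2.example.com:5432/appdb"
            "?hostorder=random&readonly=0&failover_timeout=5");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* ... use the connection as usual ... */
        PQfinish(conn);
        return 0;
    }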
On 26 October 2015 at 07:58, Victor Wagner <vitus@wagner.pp.ru> wrote: > On 2015.10.14 at 13:41:51 +0300, Victor Wagner wrote: > >> Attached patch which implements client library failover and >> loadbalancing as was described in the proposal >> <20150818041850.GA5092@wagner.pp.ru>. > > New version of patch > > 1. Handles replication connections correctly (i.e doesn't attempt to > check readwrite mode if replication option is on) > 2. Some minor improvement recommended by Korry Douglas in > <562A9259.4060408@enterprisedb.com> This patch doesn't apply. On line 636, this appears: <<<<<<< BEGIN MERGE CONFLICT: local copy shown first <<<<<<<<<<<<<<< + if (value[0] == 't') ======= COMMON ANCESTOR content follows ============================ + if (strcmp(value, "true") == 0) ======= MERGED IN content follows ================================== + if (value[0]=='t') >>>>>>> END MERGE CONFLICT >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thom
On 11/6/15 3:59 PM, Robert Haas wrote: > So, I really wonder why we're not happy with the ability to substitute > out just the host and IP. One of my concerns is that the proposed URLs are not valid URLs anymore that can be parsed or composed with a URL library, which would be sad. The other issue is that I think it's taking the implementation down the wrong path.
On 11/5/15 10:12 AM, Victor Wagner wrote: > 2. You are suggesting that it should be possible to specify all options > separately for each of the fallback hosts. I'm not necessarily suggesting that. I think it could very well be conn="postgresql://foo postgresql://bar order=random"
On Tue, 17 Nov 2015 19:42:42 +0000 Thom Brown <thom@linux.com> wrote: > > This patch doesn't apply. On line 636, this appears: It doesn't apply to the current state of the master branch due to changes in fe-connect.c made by Tom Lane in commit c40591885. Unfortunately those changes land in the source on the same lines as my changes. I'm attaching an updated version of the patch, applicable to master later than 11-12-2015.
Attachment
On Tue, Nov 17, 2015 at 3:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 11/6/15 3:59 PM, Robert Haas wrote: >> So, I really wonder why we're not happy with the ability to substitute >> out just the host and IP. > > One of my concerns is that the proposed URLs are not valid URLs anymore > that can be parsed or composed with a URL library, which would be sad. I agree that's sad. But compatibility with JDBC is happy, which is something to think about, at least. > The other issue is that I think it's taking the implementation down the > wrong path. Well, I don't object to the idea of having some kind of a function that can take a set of connection strings or URLs and try them all. But then it's not really a connection string any more. I mean, with the proposal where we allow the host option to be multiply specified, you can just plug this feature into any utility whatever that understands connection strings, and regardless of whether you are using the classic connection string format or the URL format, it will work. And it's a pretty simple extension of what we support already. psql 'host=foo host=bar failovertime=1' If we do what you're proposing, then I'm really not clear on how this works. Certainly, there can be new libpq functions that take a list of connection strings or URLs as an argument... but what would this look like if you wanted to invoke psql this way? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I've tried to deal with some of these problems. > > My patch have support for following things: > > 1. Check whether database instance is in the recovery/standby mode and > try to find another one if so. > 2. Let cluster management software to have some time to promote one of > the standbys to master. I.e. there can be failover timeout specified to > allow retry after some time if no working master found. > > Really there is room for some improvements in handling of connect > timeout (which became much more important thing when ability to try > next host appears). Now it is handled only by blocking-mode connect > functions, not by async state machine. But I decided to publish patch > without these improvements to get feedback from community. A bit of testing on this turns up a problem. Consider a connection string that specifies two hosts and a read/write connection: postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0 If the first host is a healthy standby (meaning that I can connect to it but pg_is_in_recovery() returns 't'), the state machine will never move on to the second host. The problem seems to be in PQconnectPoll() in the case for CONNECTION_AUTH_OK, specifically this code: /* We can release the address list now. */ pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist); conn->addrlist= NULL; conn->addr_cur = NULL; That frees up the list of alternative host addresses. The state machine then progresses to CONNECTION_CHECK_RO (which invokes pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the response from the server). Since we just connected to a standby replica, pg_is_in_recovery() returns 't' and the state changes to CONNECTION_NEEDED. The next call to try_next_address() will fail to find a next address because we freed the list in the case for CONNECTION_AUTH_OK. A related issue: when the above sequence occurs, no error message is returned (because the case for CONNECTION_NEEDED thinks "an appropriate error message is already set up"). In short, if you successfully connect to standby replica (and specify readonly=0), the remaining hosts are ignored, even though one of those hosts is a master. And one comment about the code itself - in connectDBStart(), you've added quite a bit of code to parse multiple hosts/ports. I would recommend adding a comment that shows the expected format, and then choosing better variable names (better than 'p', 'q', and 'r'); perhaps the variable names could refer to components of the connection string that you are parsing (like 'host', 'port', 'delimiter', ...). That would make the code much easier to read/maintain. Thanks. -- Korry
> >> I've tried to deal with some of these problems. >> >> My patch have support for following things: >> >> 1. Check whether database instance is in the recovery/standby mode and >> try to find another one if so. >> 2. Let cluster management software to have some time to promote one of >> the standbys to master. I.e. there can be failover timeout specified to >> allow retry after some time if no working master found. >> >> Really there is room for some improvements in handling of connect >> timeout (which became much more important thing when ability to try >> next host appears). Now it is handled only by blocking-mode connect >> functions, not by async state machine. But I decided to publish patch >> without these improvements to get feedback from community. > A bit of testing on this turns up a problem. > > Consider a connection string that specifies two hosts and a read/write > connection: > > postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0 > > If the first host is a healthy standby (meaning that I can connect to > it but pg_is_in_recovery() returns 't'), the state machine will never > move on to the second host. > > The problem seems to be in PQconnectPoll() in the case for > CONNECTION_AUTH_OK, specifically this code: > > /* We can release the address list now. */ > pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist); > conn->addrlist = NULL; > conn->addr_cur = NULL; > > That frees up the list of alternative host addresses. The state > machine then progresses to CONNECTION_CHECK_RO (which invokes > pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the > response from the server). Since we just connected to a standby > replica, pg_is_in_recovery() returns 't' and the state changes to > CONNECTION_NEEDED. The next call to try_next_address() will fail to > find a next address because we freed the list in the case for > CONNECTION_AUTH_OK. > > A related issue: when the above sequence occurs, no error message is > returned (because the case for CONNECTION_NEEDED thinks "an > appropriate error message is already set up"). > > In short, if you successfully connect to standby replica (and specify > readonly=0), the remaining hosts are ignored, even though one of those > hosts is a master. A follow-up - the conn->addrlist is also freed when the case for CONNECTION_CHECK_RW decides that conn->status != CONNECTION_OK and calls closePGConn(). -- Korry
On Mon, 07 Dec 2015 15:26:33 -0500 Korry Douglas <korry.douglas@enterprisedb.com> wrote: > The problem seems to be in PQconnectPoll() in the case for > CONNECTION_AUTH_OK, specifically this code: > > /* We can release the address list now. */ > pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist); > conn->addrlist = NULL; > conn->addr_cur = NULL; > That frees up the list of alternative host addresses. The state > machine then progresses to CONNECTION_CHECK_RO (which invokes > pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the Thank you for pointing out this problem; I had overlooked it. Probably I should improve my testing scenario. I'm attaching a new version of the patch which, hopefully, handles address list freeing correctly. -- Victor Wagner <vitus@wagner.pp.ru>
Attachment
Sorry, but there is something wrong with your patch: % patch -p1 -C < ~/Downloads/libpq-failover-5.patch .... -------------------------- |diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c ... Hunk #18 succeeded at 2804. patch: **** malformed patch at line 666: <<<<<<< BEGIN MERGE CONFLICT: local copy shown first <<<<<<<<<<<<<<< Victor Wagner wrote: > On Mon, 07 Dec 2015 15:26:33 -0500 > Korry Douglas <korry.douglas@enterprisedb.com> wrote: > > >> The problem seems to be in PQconnectPoll() in the case for >> CONNECTION_AUTH_OK, specifically this code: >> >> /* We can release the address list now. */ >> pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist); >> conn->addrlist = NULL; >> conn->addr_cur = NULL; >> That frees up the list of alternative host addresses. The state >> machine then progresses to CONNECTION_CHECK_RO (which invokes >> pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the > > Thank you for pointing to this problem. I've overlooked it. Probably > I should improve my testing scenario. > > I', attaching new version of the patch, which, hopefully, handles > address list freeing correctly. > > > > > > > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Mon, 21 Dec 2015 17:18:37 +0300 Teodor Sigaev <teodor@sigaev.ru> wrote: > Sorry, but there is something wrong with your patch: > % patch -p1 -C < ~/Downloads/libpq-failover-5.patch Indeed, somehow a broken version of the patch got into the message. Here is the correct one.
Attachment
Victor Wagner wrote: > On Mon, 21 Dec 2015 17:18:37 +0300 > Teodor Sigaev <teodor@sigaev.ru> wrote: > > > Sorry, but there is something wrong with your patch: > > % patch -p1 -C < ~/Downloads/libpq-failover-5.patch > > Really, somehow broken version of the patch got into message. It would be good to add a few test cases of this new functionality to libpq's regress.in file. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 21, 2015 at 11:50 PM, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Mon, 21 Dec 2015 17:18:37 +0300 > Teodor Sigaev <teodor@sigaev.ru> wrote: > >> Sorry, but there is something wrong with your patch: >> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch > > Really, somehow broken version of the patch got into message. > > Here is correct one. Patch is moved to next CF as you are still working on it. -- Michael
On 21 December 2015 at 14:50, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Mon, 21 Dec 2015 17:18:37 +0300 > Teodor Sigaev <teodor@sigaev.ru> wrote: > >> Sorry, but there is something wrong with your patch: >> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch > > Really, somehow broken version of the patch got into message. > > Here is correct one. The segfault issue I originally reported now appears to be resolved. But now I have another one: psql 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5' -c 'show port' Segmentation fault This is where no nodes are available. If I remove hostorder=random, or replace it with hostorder=sequential, it doesn't segfault. However, it then tries to connect to PGHOST on PGPORT repeatedly, even if I bring up one of the nodes it should be looking for. Not only this, but it seems to do it forever if failover_timeout is greater than 0. Thom
On Tue, 19 Jan 2016 14:34:54 +0000 Thom Brown <thom@linux.com> wrote:
> The segfault issue I originally reported now appears to be resolved.
>
> But now I have another one:
>
> psql
> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5'
> -c 'show port'
>
> Segmentation fault

This segfault appears to be a simple programming error, which shows up only when we need to retry the address list (which happens when no node is available at the time of the first scan). There is code which prevents the random hostorder from connecting to the same node twice in a row:

    while (current != NULL)
    {
        if (current == conn->addr_cur)
        {
            current = current->ai_next;
        }
        count++;
        if ((rand() & 0xffff) < 0x10000 / count)
        {
            choice = current;
        }
        current = current->ai_next;
    }

There should be a continue; after the first current = current->ai_next;

> This is where no nodes are available. If I remove hostorder=random,
> or replace it with hostorder=sequential, it doesn't segfault.
> However, it then tries to connect to PGHOST on PGPORT repeatedly, even
> if I bring up one of the nodes it should be looking for. Not only
> this, but it seems to do it forever if failover_timeout is greater
> than 0.

But this turns out to be a more complicated matter, so I'm not ready to provide the next version of the patch yet.
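For clarity, here is the loop with the correction Victor describes (the missing continue after skipping the current address), shown as a short sketch. The initialisation from conn->addrlist and the added declarations are assumptions about the surrounding try_next_address logic; only the loop body comes from the message above.

    /*
     * Corrected candidate selection (sketch): skip the address we are
     * currently on, then pick one of the remaining entries with equal
     * probability (reservoir sampling).
     */
    struct addrinfo *current = conn->addrlist;      /* assumed starting point */
    struct addrinfo *choice = NULL;
    int         count = 0;

    while (current != NULL)
    {
        if (current == conn->addr_cur)
        {
            current = current->ai_next;
            continue;                   /* the missing "continue" noted above */
        }
        count++;
        if ((rand() & 0xffff) < 0x10000 / count)
            choice = current;
        current = current->ai_next;
    }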
On Tue, 19 Jan 2016 14:34:54 +0000 Thom Brown <thom@linux.com> wrote: > > The segfault issue I originally reported now appears to be resolved. > > But now I have another one: > > psql > 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5' > -c 'show port' Here is a new version of the patch. I've now reworked hostorder=random and it seems to work as well as sequential; failover_timeout works too. I've also found a case where the connection attempt fails when receiving a FATAL message from a server which is not properly up yet, so that is fixed too. Additionally, error messages from all failed connect attempts are no longer accumulated; only the last one is returned. -- Victor Wagner <vitus@wagner.pp.ru>
Attachment
You're editing the expected file for the libpq-regress thingy, but you haven't added any new lines to test the new capability. I think it'd be good to add some there. (I already said this earlier in the thread; is there any reason you ignored it the first time?) If the test program requires improvement to handle the new stuff, let's do that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Tue, 19 Jan 2016 14:34:54 +0000 > Thom Brown <thom@linux.com> wrote: > >> >> The segfault issue I originally reported now appears to be resolved. >> >> But now I have another one: >> >> psql >> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5' >> -c 'show port' > > Here is new version of the patch. Now I've reworked hostorder=random and > it seems to work as well as sequential. failover_timeout works too. > I've also found a case when attempt to connect fail when receiving > FATAL message from server which is not properly up yet. So, it is fixed > too. > > Addititonally, error messages from all failed connect attempts are not > accumulated now. Only last one is returned. I can't connect to a standby with the patch applied: thom@swift:~/Development/test$ psql -p 5531 postgres psql: thom@swift:~/Development/test$ No error message, nothing in the logs. I find this is the case with any standby, but doesn't affect primaries. So this has broken existing functionality somewhere. Thom
On 23 January 2016 at 03:32, Thom Brown <thom@linux.com> wrote: > On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote: >> On Tue, 19 Jan 2016 14:34:54 +0000 >> Thom Brown <thom@linux.com> wrote: >> >>> >>> The segfault issue I originally reported now appears to be resolved. >>> >>> But now I have another one: >>> >>> psql >>> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5' >>> -c 'show port' >> >> Here is new version of the patch. Now I've reworked hostorder=random and >> it seems to work as well as sequential. failover_timeout works too. >> I've also found a case when attempt to connect fail when receiving >> FATAL message from server which is not properly up yet. So, it is fixed >> too. >> >> Addititonally, error messages from all failed connect attempts are not >> accumulated now. Only last one is returned. > > I can't connect to a standby with the patch applied: > > thom@swift:~/Development/test$ psql -p 5531 postgres > psql: thom@swift:~/Development/test$ > > No error message, nothing in the logs. I find this is the case with > any standby, but doesn't affect primaries. > > So this has broken existing functionality somewhere. Okay, I've tested my original complaints, and they appear to be resolved. The timeout works fine, the sequential and random orders behave as expected, and the readonly setting is also working. Thom
On Fri, 22 Jan 2016 16:36:15 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > You're editing the expected file for the libpq-regress thingy, but you > haven't added any new lines to test the new capability. I think it'd > be good to add some there. (I already said this earlier in the > thread; is there any reason you ignored it the first time?) I seriously doubt that this program can be used to test the new capabilities. All it does is call PQconninfoParse and then examine some fields of the PGconn structure. The only reason I had to modify the expected output is that I changed the usage of one of these fields: it now keeps a comma-separated list of host:port pairs instead of just a hostname. Thus the contents of this field after parsing some existing URIs have changed, while the semantics of the URIs are the same. If I add some new URIs, the only thing I can test is that the comma is properly copied from the URI into this field, and maybe that some syntax errors are properly detected. > If the test program requires improvement to handle the new stuff, > let's do that. The only improvement I can think of is to examine the list of addrinfo structures into which the host list is eventually parsed. But that is quite problematic, because it depends on many factors outside our control: it stores addresses resolved via the OS's name resolver. For example, if we specify 'localhost' there, it can be parsed into one or two records, depending on the presence of IPv6 support. If we use some other hostname, we rely on internet connectivity and the DNS system, and we cannot ensure that any name-to-IP mapping will persist for long enough. So I think the new functionality needs a different approach to testing: a test which makes a real connection to a real temporary cluster, probably based on the Perl TAP framework which has recently been actively used in Postgres. -- Victor Wagner <vitus@wagner.pp.ru>
On 23 January 2016 at 03:32, Thom Brown <thom@linux.com> wrote: > On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote: >> On Tue, 19 Jan 2016 14:34:54 +0000 >> Thom Brown <thom@linux.com> wrote: >> >>> >>> The segfault issue I originally reported now appears to be resolved. >>> >>> But now I have another one: >>> >>> psql >>> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5' >>> -c 'show port' >> >> Here is new version of the patch. Now I've reworked hostorder=random and >> it seems to work as well as sequential. failover_timeout works too. >> I've also found a case when attempt to connect fail when receiving >> FATAL message from server which is not properly up yet. So, it is fixed >> too. >> >> Addititonally, error messages from all failed connect attempts are not >> accumulated now. Only last one is returned. > > I can't connect to a standby with the patch applied: > > thom@swift:~/Development/test$ psql -p 5531 postgres > psql: thom@swift:~/Development/test$ > > No error message, nothing in the logs. I find this is the case with > any standby, but doesn't affect primaries. > > So this has broken existing functionality somewhere. Okay, I've tested this again with additional logging. Again, I'm just running "psql -p 5531 postgres", which connects to a standby. This immediately exits psql, and the logs show: 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOG: 00000: connection received: host=[local] 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOCATION: BackendInitialize, postmaster.c:4081 2016-01-24 15:04:59.880 GMT - thom - postgres LOG: 00000: connection authorized: user=thom database=postgres 2016-01-24 15:04:59.880 GMT - thom - postgres LOCATION: PerformAuthentication, postinit.c:272 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: statement: SELECT pg_catalog.pg_is_in_recovery() 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION: exec_simple_query, postgres.c:935 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: duration: 0.583 ms 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION: exec_simple_query, postgres.c:1164 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: disconnection: session time: 0:00:00.007 user=thom database=postgres host=[local] 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION: log_disconnections, postgres.c:4458 This shouldn't be checking whether it's a standby. I also noticed that with: psql 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535/postgres?hostorder=random&readonly=1' -c 'show port' The standby at port 5533 shows in the logs that it's checking whether it's a standby when it happens to hit it. Shouldn't this be unnecessary if we've set "readonly" to 1? The result of the query doesn't appear to be useful for anything. Another thing I've noticed is that the PORT variable (shown by \set) always shows PGPORT, but I expect it to equal the port of whichever host we successfully connected to. Thom
On 24 January 2016 at 15:30, Thom Brown <thom@linux.com> wrote: > On 23 January 2016 at 03:32, Thom Brown <thom@linux.com> wrote: >> On 22 January 2016 at 19:30, Victor Wagner <vitus@wagner.pp.ru> wrote: >>> On Tue, 19 Jan 2016 14:34:54 +0000 >>> Thom Brown <thom@linux.com> wrote: >>> >>>> >>>> The segfault issue I originally reported now appears to be resolved. >>>> >>>> But now I have another one: >>>> >>>> psql >>>> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random&readonly=1&failover_timeout=5' >>>> -c 'show port' >>> >>> Here is new version of the patch. Now I've reworked hostorder=random and >>> it seems to work as well as sequential. failover_timeout works too. >>> I've also found a case when attempt to connect fail when receiving >>> FATAL message from server which is not properly up yet. So, it is fixed >>> too. >>> >>> Addititonally, error messages from all failed connect attempts are not >>> accumulated now. Only last one is returned. >> >> I can't connect to a standby with the patch applied: >> >> thom@swift:~/Development/test$ psql -p 5531 postgres >> psql: thom@swift:~/Development/test$ >> >> No error message, nothing in the logs. I find this is the case with >> any standby, but doesn't affect primaries. >> >> So this has broken existing functionality somewhere. > > Okay, I've tested this again with additional logging. Again, I'm just > running "psql -p 5531 postgres", which connects to a standby. This > immediately exits psql, and the logs show: > > 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOG: 00000: > connection received: host=[local] > 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOCATION: > BackendInitialize, postmaster.c:4081 > 2016-01-24 15:04:59.880 GMT - thom - postgres LOG: 00000: connection > authorized: user=thom database=postgres > 2016-01-24 15:04:59.880 GMT - thom - postgres LOCATION: > PerformAuthentication, postinit.c:272 > 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: statement: > SELECT pg_catalog.pg_is_in_recovery() > 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION: > exec_simple_query, postgres.c:935 > 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: duration: 0.583 ms > 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION: > exec_simple_query, postgres.c:1164 > 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: > disconnection: session time: 0:00:00.007 user=thom database=postgres > host=[local] > 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION: > log_disconnections, postgres.c:4458 > > This shouldn't be checking whether it's a standby. I also noticed that with: > > psql 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535/postgres?hostorder=random&readonly=1' > -c 'show port' > > The standby at port 5533 shows in the logs that it's checking whether > it's a standby when it happens to hit it. Shouldn't this be > unnecessary if we've set "readonly" to 1? The result of the query > doesn't appear to be useful for anything. > > Another thing I've noticed is that the PORT variable (shown by \set) > always shows PGPORT, but I expect it to equal the port of whichever > host we successfully connected to. Actually, the same goes for the HOST variable, which is currently showing the list of hosts:ports. 
Output of \set variables without patch: HOST = '127.0.0.1' PORT = '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' And with patch: HOST = '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' PORT = '5488' They're both wrong, but I'm hoping we can just show the right information here. Thom
On Sun, 24 Jan 2016 15:30:22 +0000 Thom Brown <thom@linux.com> wrote: > Okay, I've tested this again with additional logging. Again, I'm just > running "psql -p 5531 postgres", which connects to a standby. This > immediately exits psql, and the logs show: > 2016-01-24 15:04:59.886 GMT - thom - postgres LOG: 00000: statement: > SELECT pg_catalog.pg_is_in_recovery() > This shouldn't be checking whether it's a standby. I also noticed > that with: This is, of course, an incompatibility with previous behavior. Probably I should modify the patch so that it implies the readonly flag if only one host/port pair is specified in the command line. Right now it checks for a standby regardless of the number of hosts specified. -- Victor Wagner <vitus@wagner.pp.ru>
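For reference, the probe being discussed is the one the new code runs right after authentication. A minimal standalone libpq client performing the same check is sketched below; the connection parameters are taken from Thom's example, and this is an illustration of the check rather than the patch's actual code (build with something like cc check_standby.c -lpq).

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Connect to the standby from Thom's example. */
    PGconn     *conn = PQconnectdb("host=127.0.0.1 port=5531 dbname=postgres");
    PGresult   *res;
    int         in_recovery;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* The same query the patched libpq issues after authenticating. */
    res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }

    /* 't' means hot standby; with readonly=0 the patch would move on to the next host. */
    in_recovery = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
    printf("standby: %s\n", in_recovery ? "yes" : "no");

    PQclear(res);
    PQfinish(conn);
    return 0;
}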
On Sun, 24 Jan 2016 15:58:10 +0000 Thom Brown <thom@linux.com> wrote: > > Output of \set variables without patch: > > HOST = '127.0.0.1' > PORT = > '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' > > And with patch: > > HOST = > '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' > PORT = '5488' > > They're both wrong, but I'm hoping we can just show the right > information here. I think we should show the right information here, but it is not so simple. The problem is that I never keep a symbolic representation of an individual host/port pair. When we connect successfully, we have only the struct sockaddr representation of it, which contains the right IP address but not the symbolic host name. Moreover, one hostname from the connect string can produce more than one addrinfo structure. For example, on machines with IPv6 support, the 'localhost' hostname is resolved into both the 127.0.0.1 IPv4 address and the [::1] IPv6 address, and produces two records. The same goes for any name which has both A and AAAA records in DNS, and nothing prevents a domain administrator from putting more than one A record for the same hostname into a DNS zone. So, it is just the same information which can be retrieved from the backend via select inet_client_addr(); select inet_client_port(); What is really interesting for the HOST and PORT variables is the name of the host and the port number used to make the actual connection, as they were specified in the connect string or service file. > > Thom -- Victor Wagner <vitus@wagner.pp.ru>
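The point about one connect-string host expanding into several addresses is easy to reproduce with plain getaddrinfo(); the short program below only illustrates that resolver behaviour and is not code from the patch.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int
main(void)
{
    struct addrinfo hint, *res, *p;
    char        addr[64];

    memset(&hint, 0, sizeof(hint));
    hint.ai_family = AF_UNSPEC;      /* both IPv4 and IPv6 */
    hint.ai_socktype = SOCK_STREAM;

    if (getaddrinfo("localhost", "5432", &hint, &res) != 0)
        return 1;

    /* On a dual-stack machine this typically prints both 127.0.0.1 and ::1. */
    for (p = res; p != NULL; p = p->ai_next)
    {
        if (getnameinfo(p->ai_addr, p->ai_addrlen, addr, sizeof(addr),
                        NULL, 0, NI_NUMERICHOST) == 0)
            printf("candidate address: %s\n", addr);
    }

    freeaddrinfo(res);
    return 0;
}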
On 24 January 2016 at 19:53, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Sun, 24 Jan 2016 15:58:10 +0000 > Thom Brown <thom@linux.com> wrote: > > >> >> Output of \set variables without patch: >> >> HOST = '127.0.0.1' >> PORT = >> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' >> >> And with patch: >> >> HOST = >> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' >> PORT = '5488' >> >> They're both wrong, but I'm hoping we can just show the right >> information here. > > I think we should show right information here, but it is not so simple. > > Problem is that I never keep symbolic representation of individual > host/port pair. And when we connect successfully, we have only struct > sockaddr representation of the it, which contain right IP > address, but doesn't contain symbolic host name. > > Moreover, one hostname from connect string can produce more than one > addrinfo structures. For example, on the machines with IPv6 support, > 'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and > [::1] IPv6, and produces two records. > > So would do any name, which have both A and AAAA records in DNS. And > nothing prevent domain administrator to put more than one A record for > same hostname into DNS zone. > > > So, it is just same information which can be retrieved from the backend > via > > select inet_client_addr(); > select inet_client_port(); I think you mean: select inet_server_addr(); select inet_server_port(); > What is really interesting for HOST and PORT variable - it is the name > of host and port number used to make actual connection, as they were > specified in the connect string or service file. And this is probably not the correct thing for it to report. The documentation says "The database server host you are currently connected to." and "The database server port to which you are currently connected.", so yeah, I'd expect to see those set to whatever those 2 functions resolve to. That being said, if one connects via a domain socket, those appear to come back blank with those functions, yet HOST and PORT report the correct information in those cases (without passing in multiple hosts). Is that a pre-existing bug? Thom
On 24 January 2016 at 20:11, Thom Brown <thom@linux.com> wrote: > On 24 January 2016 at 19:53, Victor Wagner <vitus@wagner.pp.ru> wrote: >> On Sun, 24 Jan 2016 15:58:10 +0000 >> Thom Brown <thom@linux.com> wrote: >> >> >>> >>> Output of \set variables without patch: >>> >>> HOST = '127.0.0.1' >>> PORT = >>> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' >>> >>> And with patch: >>> >>> HOST = >>> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535' >>> PORT = '5488' >>> >>> They're both wrong, but I'm hoping we can just show the right >>> information here. >> >> I think we should show right information here, but it is not so simple. >> >> Problem is that I never keep symbolic representation of individual >> host/port pair. And when we connect successfully, we have only struct >> sockaddr representation of the it, which contain right IP >> address, but doesn't contain symbolic host name. >> >> Moreover, one hostname from connect string can produce more than one >> addrinfo structures. For example, on the machines with IPv6 support, >> 'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and >> [::1] IPv6, and produces two records. >> >> So would do any name, which have both A and AAAA records in DNS. And >> nothing prevent domain administrator to put more than one A record for >> same hostname into DNS zone. >> >> >> So, it is just same information which can be retrieved from the backend >> via >> >> select inet_client_addr(); >> select inet_client_port(); > > I think you mean: > > select inet_server_addr(); > select inet_server_port(); > >> What is really interesting for HOST and PORT variable - it is the name >> of host and port number used to make actual connection, as they were >> specified in the connect string or service file. > > And this is probably not the correct thing for it to report. The > documentation says "The database server host you are currently > connected to." and "The database server port to which you are > currently connected.", so yeah, I'd expect to see those set to > whatever those 2 functions resolve to. That being said, if one > connects via a domain socket, those appear to come back blank with > those functions, yet HOST and PORT report the correct information in > those cases (without passing in multiple hosts). Is that a > pre-existing bug? I've just checked, and can see that this doesn't appear to be a bug. As per network.c: /** IP address that the server accepted the connection on (NULL if Unix socket)*/ and /** port that the server accepted the connection on (NULL if Unix socket)*/ Thom
Victor Wagner wrote: > On Fri, 22 Jan 2016 16:36:15 -0300 > Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > You're editing the expected file for the libpq-regress thingy, but you > > haven't added any new lines to test the new capability. I think it'd > > be good to add some there. (I already said this earlier in the > > thread; is there any reason you ignored it the first time?) > > I seriously doubt that this program can be used to test new > capabilities. > > All it does, it calls PQconninfoParse and than examines some fields of > PGconn structure. Ah, you're right, I didn't remember that. > If I add some new uris, than only thing I can test is that comma is > properly copied from the URI to this field. And may be that some syntax > errors are properly detected. Yeah, we should do that. > So, I think that new functionality need other approach for testing. > There should be test of real connection to real temporary cluster. > Probably, based on Perl TAP framework which is actively used in the > Postgres recently. Yes, agreed. So please have a look at that one and share your opinion about it. It'd be useful. Meanwhile I'm moving the patch to the next commitfest. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Victor, On 2/1/16 5:05 PM, Alvaro Herrera wrote: > Victor Wagner wrote: >> On Fri, 22 Jan 2016 16:36:15 -0300 >> Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> You're editing the expected file for the libpq-regress thingy, but you >>> haven't added any new lines to test the new capability. I think it'd >>> be good to add some there. (I already said this earlier in the >>> thread; is there any reason you ignored it the first time?) >> I seriously doubt that this program can be used to test new >> capabilities. All it does, it calls PQconninfoParse and than examines some fields of >> PGconn structure. > Ah, you're right, I didn't remember that. >> If I add some new uris, than only thing I can test is that comma is >> properly copied from the URI to this field. And may be that some syntax >> errors are properly detected. > Yeah, we should do that. >> So, I think that new functionality need other approach for testing. >> There should be test of real connection to real temporary cluster. >> Probably, based on Perl TAP framework which is actively used in the >> Postgres recently. > Yes, agreed. So please have a look at that one and share your opinion about it. It'd be useful. Meanwhile I'm moving the patch to the next commitfest. There hasn't been any movement on this patch in a while. Will you have new tests ready for review soon? Thanks, -- -David david@pgmasters.net
On 3/10/16 7:34 AM, Simon Riggs wrote: > On 9 March 2016 at 23:11, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > There hasn't been any movement on this patch in a while. Will you > have a new tests ready for review soon? > > > I see the value in this feature, but the patch itself needs work and > probably some slimming down/reality and a good spellcheck. > > If someone takes this on soon it can go into 9.6, otherwise I vote to > reject this early to avoid wasting review time. Since there has been no response from the author I have marked this patch "returned with feedback". Please feel free to resubmit for 9.7! -- -David david@pgmasters.net
On Fri, 26 Aug 2016 10:10:33 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Thu, Mar 17, 2016 at 4:47 AM, David Steele <david@pgmasters.net> > wrote: > >Since there has been no response from the author I have marked this > >patch > "returned with feedback". Please feel free >to resubmit for 9.7! > I have started to work on this patch, and tried to fix some of the > issues discussed above. The most recent patch 06 has fixed many > issues which was raised previously which include indefinite looping, > crashes. And, some of the issues which remain pending are. Although I have not resubmitted my patch yet, I've continued to work on it, so you can see my latest version at https://www.wagner.pp.ru/fossil/pgfailover. It seems that I've already fixed some of the issues you mentioned below. I've also added a test suite, which is incomplete. > JFYI Interestingly PostgreSql JDBC driver have following options > targetServerType=any|master|slave|preferSlave for same purpose. Probably we should have a compatibility alias. -- Victor Wagner <vitus@wagner.pp.ru>
On Fri, 26 Aug 2016 10:10:33 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Thu, Mar 17, 2016 at 4:47 AM, David Steele <david@pgmasters.net> > wrote: > >Since there has been no response from the author I have marked this > >patch > "returned with feedback". Please feel free >to resubmit for 9.7! > I have started to work on this patch, and tried to fix some of the > issues discussed above. The most recent patch 06 has fixed many > issues which was raised previously which include indefinite looping, > crashes. And, some of the issues which remain pending are. > > 1. Connection status functions PQport, PQhost do not give > corresponding values of established connection. -- Have attached the > patch for same. > > 2. Default value of readonly parameter is 0, which means should > connect to master only. So absence of parameter in connection string > in simple cases like "./psql -p PORT database" fails to connect to > hot standby server. I think since if readonly=1 means connect to > any. and readonly=0 means connect to master only, we should change > the default value to 1, to handle the cases when parameter is not > specified. > > JFYI Interestingly PostgreSql JDBC driver have following options > targetServerType=any|master|slave|preferSlave for same purpose. > > Also did a little bit of patch clean-up. > > Also found some minor issues related to build which I am working on. > 1. make check-world @src/interfaces/ecpg/test/connect fails with > following error: > gcc -Wall -Wmissing-prototypes -Wpointer-arith > -Wdeclaration-after-statement -Wendif-labels > -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing > -fwrapv -fexcess-precision=standard -g -O0 -pthread -D_REENTRANT > -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS test1.o -L../../ecpglib > -L../../pgtypeslib -L../../../../../src/interfaces/libpq > -L../../../../../src/port -L../../../../../src/common -Wl,--as-needed > -Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags > -lecpg -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm > -o test1 ../../../../../src/interfaces/libpq/libpq.so: undefined > reference to `pg_usleep' > collect2: error: ld returned 1 exit status > make[3]: *** [test1] Error 1 > make[3]: Leaving directory `/home/mithun/mynewpost/p1/ > src/interfaces/ecpg/test/connect' > > patch has used pg_usleep() which is in L../../../../../src/port I > think dependency is not captured rightly. > > Thanks and Regards > Mithun C Y > EnterpriseDB: http://www.enterprisedb.com
On Tue, 30 Aug 2016 14:54:57 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Fri, Aug 26, 2016 at 10:10 AM, Mithun Cy > <mithun.cy@enterprisedb.com> wrote: > > > > >rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags > > >-lecpg > > -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm -o > > test1 > > >../../../../../src/interfaces/libpq/libpq.so: undefined reference > > >to > > `pg_usleep' > > > > As similar to psql I have added -lpq for compilation. So now ecpg > tests compiles and make check-world has passed. Thanks, I've added this to 7-th (yet unpublished here) version of my patch, available on my site.
On Aug 31, 2016 1:44 PM, "Victor Wagner" <vitus@wagner.pp.ru> wrote: > Thanks, I've added this to 7-th (yet unpublished here) version of my > patch. Hi Victor, I just wanted to know what your plan is for your patch 07. Would you like to submit it to the community? I have just signed up as a reviewer for your patch.
On Mon, 5 Sep 2016 07:59:02 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Aug 31, 2016 1:44 PM, "Victor Wagner" <vitus@wagner.pp.ru> wrote: > > Thanks, I've added this to 7-th (yet unpublished here) version of my > > patch. > Hi victor, just wanted know what your plan for your patch 07. Would > you like to submit it to the community. I have just signed up as a > reviewer for your patch. As the community shows interest in this patch, and it has been included in the current commitfest, I have to submit it. This version of the patch includes 1. Most important - a TAP test suite (really small, and it has room for improvement). 2. Compatibility with older versions - the readonly check is now skipped if only one hostname is specified in the connect string. 3. The pg_host and pg_port variables now show the host name and port the library connected to. 4. Your fix for compilation of the ecpg tests. Patch is attached. -- Victor Wagner <vitus@wagner.pp.ru>
Attachment
Hello, Victor. > As community shows an interest with this patch, and it has been > included in the current commitfest, I've to submit it. Naturally we show interest in this patch. It's a great feature that would be very nice to have! > This version of patch includes > > 1. Most important - a tap test suite (really small enough and has a > room for improvements). > > 2. Compatibility with older versions - now readonly check is skipped if > only one hostname is specified in the connect string > > 3. pg_host and pg_port variables now show host name and port library > was connected to. > > 4. Your fix with compilation of ecpg tests. > > Patch is attached. After a brief examination I've found following ways to improve the patch. 1) It looks like code is not properly formatted. 2) > + <term><literal>readonly</literal></term> > + <listitem> > + <para> > + If this parameter is 0 (the default), upon successful connection > + library checks if host is in recovery state, and if it is so, > + tries next host in the connect string. If this parameter is > + non-zero, connection to warm standby nodes are considered > + successful. IIRC the are "warm" and "hot" standby's in PostgreSQL terminology. The difference is that you can't execute any queries on a "warm" standby. So the description is probably wrong. I suggest to fix it to just "... connection to standby nodes are...". 3) > + <term><literal>falover_timeout</literal></term> > + <listitem> > + <para> > + Maximum time to cycilically retry all the hosts in commect string. > + (as decimal integer number of seconds). If not specified, then > + hosts are tried just once. > + </para> Typos: cycilically, commect 4) > static int > connectDBStart(PGconn *conn) > { > + struct nodeinfo > + { > + char *host; > + char *port; > + }; Not sure if all compilers we support can parse this. I suggest to declare nodinfo structure outside of procedure, just to be safe. 5) > + nodes = calloc(sizeof(struct nodeinfo), 2); > + nodes->host = strdup(conn->pghostaddr); > hint.ai_family = AF_UNSPEC; > + /* Parse comma-separated list of host-port pairs into > + function-local array of records */ > + nodes = malloc(sizeof(struct nodeinfo) * 4); > /* pghostaddr and pghost are NULL, so use Unix domain > * socket */ > - node = NULL; > + nodes = calloc(sizeof(struct nodeinfo), 2); > + sprintf(port,"%d", htons(((struct sockaddr_in > *)ai->ai_addr)->sin_port)); > + return strdup(port); You should check return values of malloc, calloc and strdup procedures, or use safe pg_* equivalents. 6) > + for (p = addrs; p->ai_next != NULL; p = > p->ai_next); > + p->ai_next = this_node_addrs; Looks a bit scary. I suggest to add an empty scope ( {} ) and a comment to make our intention clear here. 7) Code compiles with following warnings (CLang 3.8): > 1 warning generated. > fe-connect.c:5239:39: warning: if statement has empty body > [-Wempty-body] > errorMessage, > false, true)); > ^ > fe-connect.c:5239:39: note: put the semicolon on a separate line to > silence this warning > 1 warning generated. 8) get_next_element procedure implementation is way too smart (read "complicated"). You could probably just store current list length and generate a random number between 0 and length-1. -- Best regards, Aleksander Alekseev
On Tue, 6 Sep 2016 07:58:28 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > > Now if only one host is in connection string and it ask for read_write > connection(connect to master) I mean read_only is set 0 explicitly. > With above logic we will allow it to connect to standby?. I still > think psql connection to standby should be handled by changing the > default value of readonly to 1 (which means connect to any). It would definitely be more logical, but I haven't found an easy way to do it. Maybe I should return to this place and rethink it. I don't like the idea of explicitly ignoring a connection string parameter due to some condition. Really, I think the test for a replication connection can just be removed from this part of the code now. > Thinking > further probably replacing readonly parameter with > targetServerType=any|master (with default being any) should clear > some confusions and bring consistency since same is used in JDBC > multi host connection string. It seems that in this context changing readonly=0|1 to targetServerType=any|master makes sense. > 2) > @@ -1460,33 +1538,80 @@ connectDBStart(PGconn *conn) > portstr, > (int) (UNIXSOCK_PATH_BUFLEN - 1)); > conn->options_valid = false; > + free(nodes->port); > > nodes->port was not allocated at this point. > I'll recheck it. -- Victor Wagner <vitus@wagner.pp.ru>
On Mon, 5 Sep 2016 14:03:11 +0300 Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Hello, Victor. > > > 1) It looks like code is not properly formatted. > Thanks for pointing to the documentation and formatting problems. I'll fix them. > > static int > > connectDBStart(PGconn *conn) > > { > > + struct nodeinfo > > + { > > + char *host; > > + char *port; > > + }; > > Not sure if all compilers we support can parse this. I suggest to > declare nodinfo structure outside of procedure, just to be safe. > Block-local structs are part of the C language standard. I don't remember when they first appeared (maybe in K&R C), but at least since C89 AFAIR they are explicitly allowed. And using the most local scope possible is always a good thing. So, I'll leave this structure function-local for now. > > You should check return values of malloc, calloc and strdup > procedures, or use safe pg_* equivalents. Thanks, I'll fix this one. > 6) > > > + for (p = addrs; p->ai_next != NULL; p = > > p->ai_next); > > + p->ai_next = this_node_addrs; > > Looks a bit scary. I suggest to add an empty scope ( {} ) and a > comment to make our intention clear here. Ok, it really would make the code clearer. > 7) Code compiles with following warnings (CLang 3.8): > > > 1 warning generated. > > fe-connect.c:5239:39: warning: if statement has empty body > > [-Wempty-body] > > errorMessage, > > false, true)); > > ^ > > fe-connect.c:5239:39: note: put the semicolon on a separate line to > > silence this warning > > 1 warning generated. Oops, it looks like a logic error, which should be fixed. Thanks for testing my patch with a pickier compiler. > 8) get_next_element procedure implementation is way too smart (read > "complicated"). You could probably just store current list length and > generate a random number between 0 and length-1. No, the algorithm here is more complicated. It must ensure that there is no second attempt to connect to a host for which an unsuccessful connection attempt was already made, so the list is rearranged. The algorithm for picking a random list element in a single pass is quite trivial.
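For readers following the discussion: the single-pass technique Victor describes is reservoir sampling with k = 1. A minimal C sketch over an illustrative linked list follows; the struct name and layout are assumptions for the example, not the patch's actual nodeinfo.

#include <stdlib.h>

/* Illustrative singly linked list of host candidates. */
struct hostnode
{
    const char *host;
    const char *port;
    struct hostnode *next;
};

/*
 * Pick one element uniformly at random from a list of unknown length in a
 * single pass: the i-th element replaces the current choice with
 * probability 1/i.
 */
static struct hostnode *
pick_random_host(struct hostnode *list)
{
    struct hostnode *chosen = NULL;
    struct hostnode *p;
    int         seen = 0;

    for (p = list; p != NULL; p = p->next)
    {
        if (rand() % ++seen == 0)
            chosen = p;
    }
    return chosen;
}

One rand() call is consumed per list element, which matches the remark above that pseudo-random numbers are far cheaper than a connection attempt.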
> > 8) get_next_element procedure implementation is way too smart (read > > "complicated"). You could probably just store current list length and > > generate a random number between 0 and length-1. > > No, the algorithm here is more complicated. It must ensure that there is > no second attempt to connect to a host for which an unsuccessful > connection attempt was already made, so the list is rearranged. > > The algorithm for picking a random list element in a single pass is quite trivial. Great! In this case it would be _trivial_ for you to write a comment that describes how this procedure works, what makes you think that it gives a good distribution in all possible cases (e.g. if there are more than 0x10000 elements in a list - why not), etc. Right? :) -- Best regards, Aleksander Alekseev
On Tue, 6 Sep 2016 07:58:28 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev < > a.alekseev@postgrespro.ru> wrote: > >After a brief examination I've found following ways to improve the > >patch. > Adding to above comments. > I'm sending a new version of the patch. I've replaced the readonly option with target_server_type (with the JDBC compatibility alias targetServerType), used logic of setting defaults based on the number of hosts in the connect string instead of a complicated condition in the state machine, and added checks for the return values of memory allocation functions. Also, I've solved the pg_usleep problem by linking pgsleep.c into libpq along with some other libpgport objects which are already linked there. Thus client applications don't need to link with libpgport, and the libpq shared library is self-contained.
Attachment
Hello, Victor. > I'm sending new version of patch. > > I've replaced readonly option with target_server_type (with JDBC > compatibility alias targetServerType), > > use logic of setting defaults based on number of hosts in the connect > string instead of complicated condition in the state machine, > > added checks for return values of memory allocation function. > > Also, I've solved pg_usleep problem by linking pgsleep.c into libpq > along with some other libpgport objects which are already linked there. > > Thus client applications don't need to link with libpgport and libpq > shared library is self-containted. This patch doesn't apply to master branch: ``` $ git apply ~/temp/libpq-failover-8.patch /home/eax/temp/libpq-failover-8.patch:184: trailing whitespace. check: /home/eax/temp/libpq-failover-8.patch:252: trailing whitespace./* /home/eax/temp/libpq-failover-8.patch:253: trailing whitespace. * Validate target_server_mode option /home/eax/temp/libpq-failover-8.patch:254: trailing whitespace. */ /home/eax/temp/libpq-failover-8.patch:306: trailing whitespace. appendPQExpBuffer(&conn->errorMessage, error: src/interfaces/libpq/t/001-multihost.pl: already exists in working directory $ git diff ``` -- Best regards, Aleksander Alekseev
> Hello, Victor. > > > I'm sending new version of patch. > > > > I've replaced readonly option with target_server_type (with JDBC > > compatibility alias targetServerType), > > > > use logic of setting defaults based on number of hosts in the connect > > string instead of complicated condition in the state machine, > > > > added checks for return values of memory allocation function. > > > > Also, I've solved pg_usleep problem by linking pgsleep.c into libpq > > along with some other libpgport objects which are already linked there. > > > > Thus client applications don't need to link with libpgport and libpq > > shared library is self-containted. > > This patch doesn't apply to master branch: > > ``` > $ git apply ~/temp/libpq-failover-8.patch > /home/eax/temp/libpq-failover-8.patch:184: trailing whitespace. > check: > /home/eax/temp/libpq-failover-8.patch:252: trailing whitespace. > /* > /home/eax/temp/libpq-failover-8.patch:253: trailing whitespace. > * Validate target_server_mode option > /home/eax/temp/libpq-failover-8.patch:254: trailing whitespace. > */ > /home/eax/temp/libpq-failover-8.patch:306: trailing whitespace. > appendPQExpBuffer(&conn->errorMessage, > error: src/interfaces/libpq/t/001-multihost.pl: already exists in > working directory > > $ git diff > ``` Oops. Sorry, my mistake. I forgot to check for untracked files. Everything is fine. -- Best regards, Aleksander Alekseev
On Fri, 9 Sep 2016 11:20:59 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > If concern is only about excluding address which was tried > previously. Then I think we can try this way, randomly permute given > address list (for example fisher-yates shuffle) before trying any of > those address in it. After that you can treat the new list exactly > like we do for sequential connections. I think this makes code less > complex. Random permutation is much more computationally complex than random picking. Aleksander suggests using a different random-pick algorithm than the one I use. I use an algorithm which works with lists of unknown length and chooses an element of such a list with equal probability. This algorithm requires one random number for each element of the list, but we are using the C library rand function, which uses quite cheap linear congruential pseudo-random numbers, so random number generation is far cheaper than a network connection attempt. As this algorithm doesn't need advance knowledge of the list length, it is easy to wrap it into a loop which modifies the list, moving already-tried elements out of the scope of the next pick (it is really quite like the Fisher-Yates algorithm). Aleksander suggests storing the number of elements of the list somewhere, then generating one random number and picking the element with this number. Unfortunately, that doesn't save much effort with a linked list. On average we'd need to scan half of the list to find the desired element, so it is O(N) anyway. Now, the main point is that we have two different timeframes of operation. 1. Attempt to connect. It is a short period, and we can assume that the state of the servers doesn't change. So if a server is down, we should ignore it until the end of this attempt to connect. 2. Attempt to fail over. If no good candidate was found, we might need to retry the connection until one of the standbys is promoted to master (with targetServerType=master) or some host comes up. See the documentation of the failover_timeout configuration parameter. It means that we cannot just throw away those hosts from the list which didn't respond during this connect attempt. So, using random permutation instead of random pick wouldn't help. And keeping the number of elements in the list somewhere wouldn't help either, unless we change the linked list to a completely different data structure which allows random access. --
On Fri, Sep 9, 2016 at 4:49 AM, Victor Wagner <vitus@wagner.pp.ru> wrote: > Random permutation is much more computationally complex than random > picking. It really isn't. The pseudocode given at https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle is all of 4 lines long, and one of those lines is a comment. The C implementation is longer, but not by much. Mind you, I haven't read the patch, so I don't know whether using something like this would make the code simpler or more complicated. However, if the two modes of operation are (1) try all hosts in random order and (2) try all hosts in the listed order, it's worth noting that the code for the second thing can be used to implement the first thing just by shuffling the list before you begin. > So, using random permutation instead of random pick wouln't help. > And keeping somewhere number of elements in the list wouldn't help > either unless we change linked list to completely different data > structure which allows random access. Is there a good reason not to use an array rather than a linked list? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
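For concreteness, the shuffle Robert refers to is only a few lines of C once the hosts are kept in an array of indexes; this sketch is illustrative and not taken from the patch.

#include <stdlib.h>

/*
 * Fisher-Yates shuffle of an array of host indexes.  After shuffling, the
 * hosts can simply be tried in array order, which is the simplification
 * suggested above.
 */
static void
shuffle_hosts(int *idx, int n)
{
    int         i;

    for (i = n - 1; i > 0; i--)
    {
        int         j = rand() % (i + 1);   /* 0 <= j <= i */
        int         tmp = idx[i];

        idx[i] = idx[j];
        idx[j] = tmp;
    }
}

The caller fills idx with 0..n-1 once before shuffling; the modulo bias of rand() is irrelevant for a handful of hosts.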
+ /* FIXME need to check that port is numeric */
Is this still applicable?
1)
+ /*
+ * if there is more than one host in the connect string and
+ * target_server_type is not specified explicitely, set
+ * target_server_type to "master"
+ */
I think we need comments to explain why we change the default value based on the number of elements in the connection string, i.e. why the default is not "any" when node count > 1.
2)
+ /* loop over all the host specs in the node variable */
+ for (node = nodes; node->host != NULL || node->port != NULL; node++)
{
I think that instead of the loop we can move this code into a separate function, say pg_add_to_addresslist(host, port, &addrs); this helps in removing temp variables like "struct nodeinfo" and several heap calls around it.
3)
+static char *
+get_port_from_addrinfo(struct addrinfo * ai)
+{
+ char port[6];
+
+ sprintf(port, "%d", htons(((struct sockaddr_in *) ai->ai_addr)->sin_port));
+ return strdup(port);
+}
Need some comments for this function.
We use strdup in many places, but nowhere do we handle memory allocation failure.
4)
+ /*
+ * consult connection options and check if RO connection is OK RO
+ * connection is OK if readonly connection is explicitely
+ * requested or if replication option is set, or just one host is
+ * specified in the connect string.
+ */
+ if (conn->pghost == NULL || !conn->target_server_type ||
+ strcmp(conn->target_server_type, "any") == 0)
Comments are not in sync with the code.
On Sun, 25 Sep 2016 17:31:53 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > I have some more comments on libpq-failover-8.patch > > + /* FIXME need to check that port is numeric */ > > Is this still applicable?. > Unfortunately, it was. I've fixed this problem in 9-th version of patch (attached) > > I think we need comments to know why we change default value based on > number of elements in connection string. why default in not “any" > when node count > 1. Fixed. > > + /* loop over all the host specs in the node variable */ > > + for (node = nodes; node->host != NULL || node->port != NULL; node++) > > { > > I think instead of loop if we can move these code into a separate > function say pg_add_to_addresslist(host, port, &addrs) this helps in > removing temp variables like "struct node info” and several heap > calls around it. For some reason DNS resolving was concentrated in one place before my changes. So, I've tried to not change this decision. > 3) > > +static char * > > +get_port_from_addrinfo(struct addrinfo * ai) > > > Need some comments for this function. Done. > We use strdup in many places no where we handle memory allocation > failure. Added some more memory allocation error checks. > > Comments not in sink with code. Fixed.
Attachment
2).
>For some reason DNS resolving was concentrated in one place before my
>changes. So, I've tried to not change this decision.
My intention was not to have a replacement function for "pg_getaddrinfo_all"; I just suggested having a local function which calls pg_getaddrinfo_all for every host/port pair read earlier. That way we need not maintain the nodes struct, and it also reduces the complexity of connectDBStart. (A C sketch of this idea appears after this message.)
FUNC (host, port, addrs)
{
CALL pg_getaddrinfo_all(host, port, newaddrs);
addrs-> ai_next = newaddrs;
}
3).
I think you should run a spellcheck once. Also, there are some formatting issues with respect to comments and the curly braces of controlled blocks which need to be fixed.
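A minimal C sketch of the helper suggested in item 2 above; the function name and error handling are illustrative assumptions, and it relies on pg_getaddrinfo_all() (declared in libpq/ip.h in the PostgreSQL tree) having the (host, service, hints, result) shape used elsewhere in fe-connect.c.

#include <netdb.h>              /* struct addrinfo */

/*
 * Resolve one host/port pair and append the resulting addrinfo chain to an
 * existing list, roughly what the FUNC pseudocode above describes.
 */
static int
add_to_addresslist(const char *host, const char *port,
                   const struct addrinfo *hint, struct addrinfo **addrs)
{
    struct addrinfo *newaddrs = NULL;
    int         ret;

    ret = pg_getaddrinfo_all(host, port, hint, &newaddrs);
    if (ret != 0 || newaddrs == NULL)
        return ret;             /* let the caller report the failure */

    if (*addrs == NULL)
        *addrs = newaddrs;
    else
    {
        struct addrinfo *tail = *addrs;

        /* walk to the end of the existing chain, then append */
        while (tail->ai_next != NULL)
            tail = tail->ai_next;
        tail->ai_next = newaddrs;
    }
    return 0;
}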
On Thu, 29 Sep 2016 23:45:52 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > This patch do not apply on latest code. it fails as follows It's strange; I have no problems applying it to commit fd321a1dfd64d30. OK, some trailing whitespace and mixing of tabs and spaces, which git apply doesn't like, really are present in the patch. I'm attaching here a version with these issues resolved. > I am slightly confused. As per this target_server_type=master means > user is expecting failover. What if user pass target_server_type=any > and request sequential connection isn't this also a case for > failover?. I think by default it should be "any" for any number of > hosts. And parameter "sequential and random will decide whether we > want just failover or with load-balance. I don't agree with this. In the first versions of the patch it refused to connect to a read-only server even if it was the only one, because I think a read-write connection is what users typically expect. When a user tries to connect to a cluster (specifying many hosts in the connect string), it can be assumed by default that he wants the master. But backward compatibility is a more important thing, so I now assume that if the user tries to connect to just one node and this node is read-only, the user knows what he is doing.
Attachment
On Fri, Sep 30, 2016 at 5:44 PM, Victor Wagner <vitus@wagner.pp.ru> wrote: > But backward compatibility is a more important thing, so I now assume that if the user tries to connect to just one node and this node is read-only, the user knows what he is doing. Moved this patch to the next CF. (Note that I am in CF-vacuuming mode this morning; I'll try to close it asap.) -- Michael
Attachment
On Thu, 13 Oct 2016 12:30:59 +0530 Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> > wrote: > Okay but for me consistency is also important. Since we agree to > disagree on some of the comments and others have not expressed any > problem I am moving it to committer. Thank you for your efforts improving my patch
On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Thu, 13 Oct 2016 12:30:59 +0530 > Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> >> wrote: >> Okay but for me consistency is also important. Since we agree to >> disagree on some of the comments and others have not expressed any >> problem I am moving it to committer. > > Thank you for your efforts improving my patch I haven't deeply reviewed this patch, but on a quick read-through it certainly seems to need a lot of cleanup work. + <varlistentry id="libpq-connect-hostorder" xreflabel="hostorder"> + <term><literal>hostorder</literal></term> + <listitem> I don't think that putting everything at the same indentation level matches the style of the surrounding documentation. @@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname </para> </listitem> </varlistentry> - <varlistentry id="libpq-connect-connect-timeout" xreflabel="connect_timeout"> <term><literal>connect_timeout</literal></term> <listitem> Spurious hunk. + /* Host has explicitely specified port */ Spelling. This same error is repeated elsewhere. + /* Check if port is numeric */ + for (nptr=r+1;nptr<q;nptr++) + { + if (*nptr<'0' ||*nptr >'9') Formatting. Consider using pgindent to get this into correct PostgreSQL style. + * behavoir. Spelling. + * the beginning (and putting its addres into given pointer. Spelling. + * Returns 1 if address is choosen and 0 if there are no more addresses Spelling. + * Initialize random number generator in case if nobody have done it + * before. Use value from rand along with time in case random number Grammar ("in case nobody has done it already"). Also, is it really OK for libpq to call srand() as a side effect? I think that seems rather unfriendly. + * taget_server_type is set to 'any' or is not exlicitely Multiple spelling mistakes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 13 October 2016 at 10:53, Victor Wagner <vitus@wagner.pp.ru> wrote: > On Thu, 13 Oct 2016 12:30:59 +0530 > Mithun Cy <mithun.cy@enterprisedb.com> wrote: > >> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> >> wrote: > >> Okay but for me consistency is also important. Since we agree to >> disagree on some of the comments and others have not expressed any >> problem I am moving it to committer. > > > Thank you for your efforts improving my patch I've found various spelling and grammatical issues in the docs section, which I've corrected. I've attached a revised patch with these changes. One thing I'm wondering is whether we should be using the term "master", as that's usually paired with "slave", whereas, nowadays, there's a tendency to refer to them as "primary" and "standby". I know we use "master" in some other places in the documentation, but it seems inconsistent to have "master" as a parameter value, but then having "primary" used in some other configuration parameters. I'd also avoid referring to "the library", and just describe what happens without making reference to what's making it happen. Regards Thom
Attachment
On Mon, Oct 26, 2015 at 4:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 10/14/15 6:41 AM, Victor Wagner wrote: >> 1. It is allowed to specify several hosts in the connect string, either >> in URL-style (separated by comma) or in param=value form (several host >> parameters). > > I'm not fond of having URLs that are not valid URLs according to the > applicable standards. Because then they can't be parsed or composed by > standard libraries. I did a little bit more research on this topic and found out a few things that are interesting, at least to me. First, our documentation reference RFC3986. According to RFC3986: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty authority = [ userinfo "@" ] host [ ":" port ] host = IP-literal / IPv4address / reg-name reg-name = *( unreserved / pct-encoded / sub-delims ) sub-delims include comma but not colon, so I think that postgresql://host1,host2,host3/ is a perfectly good URL, and so is postgresql://host1,host2,host3:1111/ but postgresql://host1:1234,host2:3456/ is not a valid URL because the : after host1 terminates the "host" portion of the URL. The port can't contain anything but digits, so 1234 has to be the port, but then there's nothing to do with the portion between the comma and the following slash, so it is indeed an invalid URI as far as I can tell. However, PostgreSQL's JDBC driver isn't alone in supporting something like this. The MySQL JDBC driver does the same thing: http://lists.mysql.com/cluster/249 https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html MongoDB refers to their connection string as a URI, but it uses exactly this syntax: https://docs.mongodb.com/manual/reference/connection-string/ Couchbase's syntax is also quite similar, though it's not clear that they allow port numbers: http://developer.couchbase.com/documentation/server/4.1/developer-guide/connecting.html Of course, since this is a very common need and it's not obvious how to satisfy it within the confines of the URI specification, people have developed a variety of other answers, such as (a) multiple URIs separated by commas, which is a terrible idea because comma can occur *within* URIs, (b) separating multiple host names with a double-dash, (c) including a parameter in the "query" portion of the URI to specify alternate host names, (d) defining a new failover: URI scheme that acts as a container for multiple connection URIs of whatever form is normally supported, and in the case of Oracle (e) creating some frightening monstrosity of proprietary syntax that I don't (care to) understand. All in all, I'm still feeling pretty good about trying to support the same syntax that our JDBC driver already does. It's certainly not a perfect solution, but it is at least compatible with MySQL's JDBC driver and with MongoDB, and in a world where everybody has picked a different approach that's not too bad. Hey, maybe if we use the same syntax as MongoDB they'll let us hang out with the cool kids... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 19, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> wrote: >> On Thu, 13 Oct 2016 12:30:59 +0530 >> Mithun Cy <mithun.cy@enterprisedb.com> wrote: >>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> >>> wrote: >>> Okay but for me consistency is also important. Since we agree to >>> disagree on some of the comments and others have not expressed any >>> problem I am moving it to committer. >> >> Thank you for your efforts improving my patch > > I haven't deeply reviewed this patch, but on a quick read-through it > certainly seems to need a lot of cleanup work. Some more comments: - I am pretty doubtful that the changes to connectOptions2() work as intended. I think that's only going to be called before actualhost and actualport could possibly get set. I don't think we keep reinvoking this function for every new host we try. - It's pretty clear that this isn't going to work if the host list includes a mix of hostnames and UNIX socket addresses. The code that handles the UNIX socket address case is totally separate from the code that handles the multiple-hostname case. - This creates a sort of definitional problem for functions like PQhost(). The documentation says of this and other related functions: "These values are fixed for the life of the PGconn object." But with this patch, that would no longer be true. So at least the documentation needs an update. Actually, though, I think the problem is more fundamental than that. If the user asks for PQhost(conn), do they want the list of all hosts, or the one to which we're currently connected? It might be either, depending on what they intend to do with the information. If they mean to construct a new connection string, they might well want them all; but something like psql's \conninfo only shows the current one. There's also the question of what "the current one" means if we're not connected to any server at all. I'm not sure exactly how to solve these problems, but they need some thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 20 October 2016 at 01:15, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Oct 19, 2016 at 12:04 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> wrote: > >> On Thu, 13 Oct 2016 12:30:59 +0530 > >> Mithun Cy <mithun.cy@enterprisedb.com> wrote: > >>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> > >>> wrote: > >>> Okay but for me consistency is also important. Since we agree to > >>> disagree on some of the comments and others have not expressed any > >>> problem I am moving it to committer. > >> > >> Thank you for your efforts improving my patch > > > > I haven't deeply reviewed this patch, but on a quick read-through it > > certainly seems to need a lot of cleanup work. > > Some more comments: > > - I am pretty doubtful that the changes to connectOptions2() work as > intended. I think that's only going to be called before actualhost > and actualport could possibly get set. I don't think we keep > reinvoking this function for every new host we try. > > - It's pretty clear that this isn't going to work if the host list > includes a mix of hostnames and UNIX socket addresses. The code that > handles the UNIX socket address case is totally separate from the code > that handles the multiple-hostname case. So should it be the case that it disallows UNIX socket addresses entirely? I can't imagine a list of UNIX socket addresses being that useful. Thom
On Mon, 24 Oct 2016 11:36:31 +0100 Thom Brown <thom@linux.com> wrote: > > - It's pretty clear that this isn't going to work if the host list > > includes a mix of hostnames and UNIX socket addresses. The code > > that handles the UNIX socket address case is totally separate from > > the code that handles the multiple-hostname case. > > So should it be the case that it disallows UNIX socket addresses > entirely? I can't imagine a list of UNIX socket addresses being that > useful. Really, the patch was implemented as TCP-only from the beginning. I know of only one case where a list of UNIX sockets is useful: the test suite. I had to include modifications to PostgresNode.pm in the patch to allow testing with TCP sockets. Also, I can imagine that people would want to include one Unix socket in a list of TCP ones, e.g. one of the replicas running on the same host as the application server. But the patch doesn't support it. Maybe it should be clarified in the documentation that this is not supported.
On Wed, 19 Oct 2016 20:15:38 -0400 Robert Haas <robertmhaas@gmail.com> wrote: > > Some more comments: > > - I am pretty doubtful that the changes to connectOptions2() work as > intended. I think that's only going to be called before actualhost > and actualport could possibly get set. I don't think we keep > reinvoking this function for every new host we try. I will investigate this. Probably test cases for the .pgpass file should be added. > - It's pretty clear that this isn't going to work if the host list > includes a mix of hostnames and UNIX socket addresses. The code that > handles the UNIX socket address case is totally separate from the code > that handles the multiple-hostname case. The multihost feature was never intended to work with Unix-domain sockets. Maybe it should be clarified in the docs, or an improved error message emitted when a Unix socket occurs in the multiple-host list. > - This creates a sort of definitional problem for functions like > PQhost(). The documentation says of this and other related functions: > "These values are fixed for the life of the PGconn object." But with > this patch, that would no longer be true. So at least the > documentation needs an update. Actually, though, I think the problem I've missed this phrase; the documentation should be updated. > is more fundamental than that. If the user asks for PQhost(conn), do > they want the list of all hosts, or the one to which we're currently > connected? It might be either, depending on what they intend to do > with the information. If they mean to construct a new connection > string, they might well want them all; but something like psql's > \conninfo only shows the current one. There's also the question of > what "the current one" means if we're not connected to any server at > all. I'm not sure exactly how to solve these problems, but they need > some thought. I thought that no one would need to reconstruct a connect string from the conninfo object. Maybe I've missed some use cases. The name PQhost suggests that it returns just one host, so I've decided to interpret it as 'the currently chosen host'. If we really need a function which lists all the potentially connected hosts, it should be a new function. I hope this introduces no backward incompatibility, because no existing setup uses multihost connect strings. Thanks for your critique. Victor.
On Wed, Oct 19, 2016 at 7:26 PM, Peter van Hardenberg <pvh@pvh.ca> wrote: > Supporting different ports on different servers would be a much appreciated > feature (I can't remember if it was Kafka or Cassandra that didn't do this > and it was very annoying.) > > Remember, as the connection string gets more complicated, psql supports the > Postgres URL format as a single command-line argument and we may want to > begin encouraging people to use that syntax instead. While I was experimenting with this today, I discovered a problem of interpretation related to IPv6 addresses. Internally, a postgresql:// URL and a connection string are converted into the same format, so postgresql://a,b/ means just the same thing as host=a,b. I thought that could similarly decide that postgresql://a:123,b:456/ is going to get translated to host=a:123,b:456 and then there can be further code to parse that into a list of host-and-port pairs. However, if you do that, then something like host=1:2:3::4:5:6 is fundamentally ambiguous. That :6 is equally valid either as part of the IP address or as a trailing port number specification, and there's no a priori way to know which it is. Today, since the host part can't include a port specifier, it's regarded as part of the IP address, and I think it would probably be a bad idea to change that, as I believe Victor's patch would. He seems to have it in mind that we could allow things like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be helpful for the future but doesn't avoid changing the meaning of connection strings that work today. So now I think that to make this work correctly, we're going to need to change both the URL parser and also add parsing for the host and port. Let's say the user says this: postgresql://[1::2]:3,[4::5],[6::7]::8/ What I think we need to do is translate that into this: host=1::2,4::5,6::7 port=3,,8 Note the double-comma, indicating a blank port number for the second URL component. When we parse those host and port strings, we can match up each host with the corresponding port number again. Of course, the user could also skip the URL format and directly specify a connection string. And then they might write one where the host and port parts don't have the same number of components, like this: host=a,b,c port=3,4 or host=a,b port=3,4,5 It is obvious what is meant if multiple hosts are given but only a single port number is specified; it is also obvious what is meant if the number of ports is equal to the number of hosts. It is not obvious what it means if there are multiple ports but the number doesn't equal the number of hosts. I think we can either treat that case as an error or else do the following: if there are extra port specifiers, ignore them; if there are extra host specifiers, use the last port in the list for all of the remaining hosts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
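To make the bracket rule concrete, the standalone sketch below splits a single host component into host and port, treating an unbracketed string with more than one colon as a bare IPv6 address; it only illustrates the parsing question being discussed and is not code from the patch.

#include <stdio.h>
#include <string.h>

static void
split_host_port(const char *spec, char *host, size_t hostlen,
                char *port, size_t portlen)
{
    host[0] = '\0';
    port[0] = '\0';

    if (spec[0] == '[')
    {
        /* "[ipv6]" or "[ipv6]:port": the brackets make the port unambiguous. */
        const char *close = strchr(spec, ']');

        if (close == NULL)
            return;             /* malformed; real code would report an error */
        snprintf(host, hostlen, "%.*s", (int) (close - spec - 1), spec + 1);
        if (close[1] == ':')
            snprintf(port, portlen, "%s", close + 2);
    }
    else
    {
        /* Exactly one colon: host:port.  More than one: bare IPv6, all host. */
        const char *colon = strrchr(spec, ':');

        if (colon != NULL && colon == strchr(spec, ':'))
        {
            snprintf(host, hostlen, "%.*s", (int) (colon - spec), spec);
            snprintf(port, portlen, "%s", colon + 1);
        }
        else
            snprintf(host, hostlen, "%s", spec);
    }
}

int
main(void)
{
    char        host[256], port[16];

    split_host_port("[1:2:3::4:5]:6", host, sizeof(host), port, sizeof(port));
    printf("host=%s port=%s\n", host, port);    /* host=1:2:3::4:5 port=6 */
    split_host_port("1:2:3::4:5:6", host, sizeof(host), port, sizeof(port));
    printf("host=%s port=%s\n", host, port);    /* ambiguous form: kept entirely as host */
    return 0;
}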
On Wed, 19 Oct 2016 12:04:27 -0400 Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner <vitus@wagner.pp.ru> > wrote: > > On Thu, 13 Oct 2016 12:30:59 +0530 > > Mithun Cy <mithun.cy@enterprisedb.com> wrote: > >> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner <vitus@wagner.pp.ru> > >> wrote: > >> Okay but for me consistency is also important. Since we agree to > >> disagree on some of the comments and others have not expressed any > >> problem I am moving it to committer. > > > > Thank you for your efforts improving my patch > > I haven't deeply reviewed this patch, but on a quick read-through it > certainly seems to need a lot of cleanup work. > I've done some cleanup, i.e. running pgindent, spell-checking comments and indenting the sgml, and I'm attaching the cleaned-up version here. I've also re-added the test file t/001-multihost.pl, which got lost from the previous post. Thanks for your suggestions. -- Victor Wagner <vitus@wagner.pp.ru>
Attachment
On 10/24/16 6:36 AM, Thom Brown wrote: > So should it be the case that it disallows UNIX socket addresses > entirely? I can't imagine a list of UNIX socket addresses being that > useful. But maybe a mixed list of Unix-domain sockets and TCP connections? The nice thing is that it is currently transparent and you don't have to wonder. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/24/16 11:57 AM, Robert Haas wrote: > Today, since the host part can't include a > port specifier, it's regarded as part of the IP address, and I think > it would probably be a bad idea to change that, as I believe Victor's > patch would. He seems to have it in mind that we could allow things > like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be > helpful for the future but doesn't avoid changing the meaning of > connection strings that work today. Let's keep in mind here that the decision to allow database names to contain a connection parameter substructure has caused some security issues. Let's not create more levels of ambiguity and the need to pass around override flags. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Robert Haas wrote: > While I was experimenting with this today, I discovered a problem of > interpretation related to IPv6 addresses. Internally, a postgresql:// > URL and a connection string are converted into the same format, so > postgresql://a,b/ means just the same thing as host=a,b. I thought > that could similarly decide that postgresql://a:123,b:456/ is going to > get translated to host=a:123,b:456 and then there can be further code > to parse that into a list of host-and-port pairs. However, if you do > that, then something like host=1:2:3::4:5:6 is fundamentally > ambiguous. That :6 is equally valid either as part of the IP address > or as a trailing port number specification, and there's no a priori > way to know which it is. Today, since the host part can't include a > port specifier, it's regarded as part of the IP address, and I think > it would probably be a bad idea to change that, as I believe Victor's > patch would. He seems to have it in mind that we could allow things > like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be > helpful for the future but doesn't avoid changing the meaning of > connection strings that work today. Umm, my recollection regarding IPv6 parsing in the URI syntax is that those must appear inside square brackets -- it's not valid to have the IPv6 address outside brackets, and the port number is necessarily outside the brackets. So there's no ambiguity. If the current patch is allowing IPv6 address to appear outside of brackets, that seems like a bug to me. The string conninfo spec should not accept port numbers in the "host" part. (Does it?) > So now I think that to make this work correctly, we're going to need > to change both the URL parser and also add parsing for the host and > port. Let's say the user says this: > > postgresql://[1::2]:3,[4::5],[6::7]::8/ (There's a double colon before 8, I suppose that's just a typo.) > What I think we need to do is translate that into this: > > host=1::2,4::5,6::7 port=3,,8 > > Note the double-comma, indicating a blank port number for the second > URL component. Sounds reasonable. > And then they might write one where the host and > port parts don't have the same number of components, like this: > > host=a,b,c port=3,4 > or > host=a,b port=3,4,5 > > It is obvious what is meant if multiple hosts are given but only a > single port number is specified; it is also obvious what is meant if > the number of ports is equal to the number of hosts. Agreed -- seems like we can accept both those cases. > It is not obvious what it means if there are multiple ports but the > number doesn't equal the number of hosts. I think we should reject the case of differing number of elements and neither host nor port is a singleton, as an error. The suggestion to ignore some parts seems too error-prone. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 25, 2016 at 2:10 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: > > >> It is not obvious what it means if there are multiple ports but the >> number doesn't equal the number of hosts. > > I think we should reject the case of differing number of elements and > neither host nor port is a singleton, as an error. The suggestion to > ignore some parts seems too error-prone. > +1. I also think returning error is better option in above case. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 24, 2016 at 4:40 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Umm, my recollection regarding IPv6 parsing in the URI syntax is that > those must appear inside square brackets -- it's not valid to have the > IPv6 address outside brackets, and the port number is necessarily > outside the brackets. So there's no ambiguity. If the current patch is > allowing IPv6 address to appear outside of brackets, that seems like a > bug to me. I agree. > The string conninfo spec should not accept port numbers in the "host" > part. (Does it?) Not currently, but the proposed patch would cause it to do so. >> So now I think that to make this work correctly, we're going to need >> to change both the URL parser and also add parsing for the host and >> port. Let's say the user says this: >> >> postgresql://[1::2]:3,[4::5],[6::7]::8/ > > (There's a double colon before 8, I suppose that's just a typo.) Oh, yeah. Right. >> It is not obvious what it means if there are multiple ports but the >> number doesn't equal the number of hosts. > > I think we should reject the case of differing number of elements and > neither host nor port is a singleton, as an error. The suggestion to > ignore some parts seems too error-prone. OK, can do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 24, 2016 at 4:15 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 10/24/16 11:57 AM, Robert Haas wrote: >> Today, since the host part can't include a >> port specifier, it's regarded as part of the IP address, and I think >> it would probably be a bad idea to change that, as I believe Victor's >> patch would. He seems to have it in mind that we could allow things >> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be >> helpful for the future but doesn't avoid changing the meaning of >> connection strings that work today. > > Let's keep in mind here that the decision to allow database names to > contain a connection parameter substructure has caused some security > issues. Let's not create more levels of ambiguity and the need to pass > around override flags. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 24, 2016 at 11:57 AM, Robert Haas <robertmhaas@gmail.com> wrote: > So now I think that to make this work correctly, we're going to need > to change both the URL parser and also add parsing for the host and > port. Let's say the user says this: > > postgresql://[1::2]:3,[4::5],[6::7]::8/ > > What I think we need to do is translate that into this: > > host=1::2,4::5,6::7 port=3,,8 So here's a patch implementing that. This seems to work, but I haven't tested it in incredible detail at this point, so testing from others would be greatly appreciated (the handling of .pgpass files, in particular, I have not tested). This is loosely inspired by Victor's patch but rewritten from scratch to fix the various design issues which I've commented on in previous emails on this thread. Some notes on decisions that I made while implementing this: - Only the host= and port= parameters support multiple, comma-separated values. In particular, you can only specify a single password. However, you can still end up with different passwords for connections to different servers if the passwords are read from .pgpass. - The PQhost(), PQport(), and PQpass() functions now return the information for the host to which we actually connected or to the first one in the list if we haven't tried to connect yet. Looking through where these functions are actually used in our source code, this behavior seems more useful than any other I could imagine. On the other hand, PQreset() will retry the connection starting with the first host in the original list. - The hostorder, target_server_type, and failover_timeout parameters from Victor's patch are not present in this version. I think we can add those things in future patches once the basic functionality is committed. Or maybe we'll decide that we don't want those things or that we want something different instead, which is fine, too. The point is, they don't need to be there in the first patch. - As requested, it's an error if you specify N hosts and K port numbers where N != K. However, K = 1 is always allowed; that is, if only a single port number is specified, it applies to all hosts. - I have made an attempt to update the documentation appropriately but there might be other sections which also need to be adjusted. Let me know your thoughts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
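For readers who want to try the patch, here is a minimal caller-side sketch of the behavior described above (the host names and database name are placeholders; this is not part of the patch itself):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* one port applies to both hosts; N ports for N hosts also works */
	PGconn	   *conn = PQconnectdb(
		"host=primary.example.com,standby.example.com port=5432 "
		"dbname=postgres connect_timeout=5");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}
	/* per the notes above, these report the server actually chosen */
	printf("connected to %s port %s\n", PQhost(conn), PQport(conn));
	PQfinish(conn);
	return 0;
}

Something like cc test.c -I$(pg_config --includedir) -L$(pg_config --libdir) -lpq should build it against a patched libpq.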
On Thu, Oct 27, 2016 at 9:46 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Wed, Oct 26, 2016 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Let me know your thoughts. > > One small issue. I tried to run make installcheck after applying patch there > seems to be a invalid write access in code (resulting in crash). > > ==118997== Invalid write of size 1 > ==118997== at 0x4E3DDF1: connectOptions2 (fe-connect.c:884) > ==118997== by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608) > ==118997== by 0x4E3D553: PQconnectdbParams (fe-connect.c:465) > ==118997== by 0x413067: main (startup.c:245) > ==118997== Address 0x5dc4014 is 0 bytes after a block of size 4 alloc'd > ==118997== at 0x4C29BFD: malloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==118997== by 0x4E3DD3C: connectOptions2 (fe-connect.c:880) > ==118997== by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608) > ==118997== by 0x4E3D553: PQconnectdbParams (fe-connect.c:465) > ==118997== by 0x413067: main (startup.c:245) > > After locally fixing this by allocating an extra space for string terminal > character. make installcheck tests pass. > -(char *) malloc(sizeof(char) * (e - s)); > + (char *) malloc(sizeof(char) * (e - s + 1)); Thanks. Here's a new version with a fix for that issue and also a fix for PQconnectionNeedsPassword(), which was busted in v1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, Nov 1, 2016 at 2:03 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Thu, Oct 27, 2016 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Thanks. Here's a new version with a fix for that issue and also a fix >> for PQconnectionNeedsPassword(), which was busted in v1. > I did some more testing of the patch for both URI and (host, port) parameter > pairs. I did test for ipv4, ipv6, UDS type of sockets, Also tested different > password using .pgpass. I did not see any major issue with. All seems to > work as defined. Tested the behaviors of API PQport, PQhost, PQpass, PQreset > all works as defined. > > I also have following observation which can be considered if it is valid. > > 1. URI do not support UDS, where as (host,port) pair support same. > But there is one problem with this, instead of reporting it, we discard the > URI input (@conninfo_uri_parse_options) and try to connect to default unix > socket > > [mithun@localhost bin]$ ./psql > postgres:///home/mithun:5555,127.0.0.1:5555/postgres -U mithun > psql: could not connect to server: No such file or directory > Is the server running locally and accepting > connections on Unix domain socket "/tmp/.s.PGSQL.5432"? > server is running on both the sockets. That's the wrong syntax. If you look in https://www.postgresql.org/docs/devel/static/libpq-connect.html under "32.1.1.2. Connection URIs", it gives an example of how to include a slash in a pathname. You have to use %2F, or else the URL parser will think you're starting a new section of the URI. I believe this works fine if you use the correct syntax. > 2. If we fail to connect we get an error message for each of the addrinfo we > tried to connect, wondering if we could simplify same. Just a suggestion. > [mithun@localhost bin]$ ./psql postgres://,,,,,,,/postgres > psql: could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? 
> could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (::1) and accepting > TCP/IP connections on port 5432? > could not connect to server: Connection refused > Is the server running on host "" (127.0.0.1) and accepting > TCP/IP connections on port 5432? That output seems fine to me. In a real connection string, you're not likely to have so many duplicated addresses, and it's good for the error message to make clear which addresses were tried and what happened for each one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
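To spell out the syntax Robert points to, here is a tiny sketch (the socket directory and database name are just examples): a Unix-socket directory in URI form has to be percent-encoded, or passed as a host query parameter, so that its slashes are not taken as the start of the URI path.

#include <libpq-fe.h>

/* "/var/run/postgresql" percent-encoded as %2Fvar%2Frun%2Fpostgresql */
PGconn *
connect_via_socket_dir(void)
{
	return PQconnectdb(
		"postgresql://%2Fvar%2Frun%2Fpostgresql:5432/postgres");
}

/* equivalent form that avoids percent-encoding altogether */
PGconn *
connect_via_host_param(void)
{
	return PQconnectdb(
		"postgresql:///postgres?host=/var/run/postgresql&port=5432");
}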
Starting program: /home/mithun/libpqbin/bin/./psql postgres://%2fhome%2fmithun:5555/postgres -U mithun1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/home/mithun/.s.PGSQL.5555"?
Starting program: /home/mithun/libpqbin/bin/./psql postgres://%2home%2mithun:5555/postgres -U mithun1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bb8d4f in PQpass (conn=0x68aaa0) at fe-connect.c:5582
5582 password = conn->connhost[conn->whichhost].password;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.el7_2.6.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 readline-6.2-9.el7.x86_64
On Tue, Nov 1, 2016 at 9:43 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> Starting program: /home/mithun/libpqbin/bin/./psql >> postgres://%2fhome%2fmithun:5555/postgres -U mithun1 >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/lib64/libthread_db.so.1". >> psql: could not connect to server: No such file or directory >> Is the server running locally and accepting >> connections on Unix domain socket "/home/mithun/.s.PGSQL.5555"? > > Accidentally when testing further on this line I found a crash when invalid > encoding "%2" (not recognised as hexa) was used. > Test: >> >> Starting program: /home/mithun/libpqbin/bin/./psql >> postgres://%2home%2mithun:5555/postgres -U mithun1 You seem to be confused about how to escape characters in URLs. The format is a % followed by exactly two hex digits. In this case, that would give you %2h and %2m, and h and m are not hex characters, so the error message is right. Your input is not valid. > There can be PGconn which have no connhost. > exposed API's PQpass, PQreset->connectDBStart access conn->connhost without > checking whether it is set. Can you provide a concrete test scenario or some test code that fails? connhost is supposed to be getting set in connectOptions2(), which is run (AIUI) when the PGconn is first created. So I think that it will be set in all scenarios of the type you mention, but I might be missing something. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 1, 2016 at 11:21 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Tue, Nov 1, 2016 at 7:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> Starting program: /home/mithun/libpqbin/bin/./psql >>> postgres://%2home%2mithun:5555/postgres -U mithun1 >>Can you provide a concrete test scenario or some test code that fails? >>connhost is supposed to be getting set in connectOptions2(), which is >>run (AIUI) when the PGconn is first created. So I think that it will >>be set in all scenarios of the type you mention, but I might be >>missing something. > > Sorry if my sentence is confusing > If I give a proper hexadecimal encoding % followed by 2 hexadigit (i.e. for > e.g %2f, %2a) every thing is fine. When I pass a invalid hexadigit encoding > eg: %2h, %2m among the host string e.g > "postgres://%2home%2mithun:5555/postgres". then "PQconnectdbParams()" fails > before calling connectOptions2(). In that case failed PQconnectdbParams() > also return a PGconn where connhost is not set. If we call PQpass(), > PQReset() on such a PGconn we get a crash. > > A simple test case which crash is: > ./psql 'postgres://%2hxxx:5555/postgres' > > Call stack: > -------------- > #0 0x00007ffff7bb8d4f in PQpass (conn=0x68aa10) at fe-connect.c:5582 > #1 0x00007ffff7bb907a in PQconnectionNeedsPassword (conn=0x68aa10) at > fe-connect.c:5727 > #2 0x00000000004130aa in main (argc=2, argv=0x7fffffffdff8) at > startup.c:250 Ah, nuts. Thanks, good catch. Should be fixed in the attached version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Wed, Nov 2, 2016 at 1:59 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Tue, Nov 1, 2016 at 9:42 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Ah, nuts. Thanks, good catch. Should be fixed in the attached version. > > I repeated the test on new patch, It works fine now, Also did some more > negative tests forcibly failing some internal calls. All tests have passed. > This patch works as described and look good to me. Great, committed. There's still potentially more work to be done here, because my patch omits some features that were present in Victor's original submission, like setting the failover timeout, optionally randomizing the order of the hosts, and distinguishing between master and standby servers; Victor, or anyone, please feel free to submit separate patches for those things. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy > Among the remaining things I have worked on failover to new master idea. > Below patch implement that idea. This is taken from Victors patch but > rewritten by me to do some cleanup. As in Victor's patch we have a new > connection parameter "target_server_type", It can take 2 values 1. "any" > 2. "master" with DEFAULT as "any". If it's has the value "any" we can connect > to any of the host server (both master(primary) and slave(standby)). If > the value is "master" then we try to connect to master(primary) only. > NOTE: Parameter name is inspired and taken from PostgreSql JDBC Driver > <https://jdbc.postgresql.org/documentation/head/connect.html> . I'm interested to review this patch (but I haven't read it yet, I'm reading Robert's patch now.) Are you planning a new CommitFest entry? Why don't you add "standby" and "prefer_standby" as the target_server_type value? Are you thinking that those values are useful with load balancing feature? > The main difference between Victor's and this new patch is Default value > of parameter target_server_type. In Victor's patch if number of host in > connection string is 1 then default value is "any" (This was done to make > sure old psql connect to standby as it is now). If it is greater than 1 > then default value is set as "master". For me this appeared slightly > inconsistent having default value as "any" for any number of connection > appeared more appropriate which is also backward compatible. And, if user > want failover to master he should ask for it by setting > target_server_type=master in connection string. That's sensible, I agree. Regards Takayuki Tsunakawa
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy > Yes this patch will only address failover to new master, values "master" > and "any" appeared sufficient for that case. Do you mean that unlike pgJDBC "standby" and "prefer_standby" are useless, or they are useful but you don't have time to implement it and want to do it in the near future? Do you mind if I do it if time permits me? I think they are useful without load balancing feature, when the user has multiple standbys for HA. Could you add a new entry in CommitFest 2017-1? I'm afraid we can't track the status of your patch because the original patch in this thread has already been committed. Regards Takayuki Tsunakawa
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Great, committed. There's still potentially more work to be done here, > because my patch omits some features that were present in Victor's original > submission, like setting the failover timeout, optionally randomizing the > order of the hosts, and distinguishing between master and standby servers; > Victor, or anyone, please feel free to submit separate patches for those > things. I did a few tests with ECPG. I'm satisfied with the current behavior, but someone may disagree. I'd like to share the result. The following literal connection strings succeeded. host1 is a server where PostgreSQL is not running, and host2 is where it's running. I could connect to the database server on host2. EXEC SQL CONNECT TO 'tcp:postgresql://host1,host2:5450/postgres'; EXEC SQL CONNECT TO 'tcp:postgresql://host1,host2:5450,5450/postgres'; EXEC SQL CONNECT TO 'postgres@host1,host2:5450'; EXEC SQL CONNECT TO 'postgres@host1,host2:5450,5450'; EXEC SQL CONNECT TO 'tcp:postgresql://?service=my_service'; ~/.pg_service.conf [my_service] host=host1,host2 port=5450 # and port=5450,5450 case dbname=postgres But this one makes PQconnectdbParams() fail, because the passed "host" is "host1:5450,host2" and "port" is "5450". ECPGconnect()'s parser is different from libpq's. However, the tcp:postgresql:// syntax is not described as a URL in the manual, so I think it's sufficient to just describe the syntax in the ECPG article. EXEC SQL CONNECT TO 'tcp:postgresql://host1:5450,host2:5450/postgres'; And without the single quote like below, ecpg fails to precompile the source file. I also think it's enough to state in the manual "quote the connection target if you specify multiple hosts or ports". EXEC SQL CONNECT TO tcp:postgresql://host1,host2:5450,5450/postgres; connect.ec:12: ERROR: syntax error at or near "," Comments? Regards Takayuki Tsunakawa
On 11/9/16 10:05 AM, Mithun Cy wrote: > As in Victor's patch we have a new connection parameter > "target_server_type", It can take 2 values 1. "any" 2. "master" with > DEFAULT as "any". If it's has the value "any" we can connect to any of > the host server (both master(primary) and slave(standby)). If the value > is "master" then we try to connect to master(primary) only. We tend to use the term "primary" instead of "master". Will this work with logical replication? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Great, committed. There's still potentially more work to be done here, > because my patch omits some features that were present in Victor's original > submission, like setting the failover timeout, optionally randomizing the > order of the hosts, and distinguishing between master and standby servers; > Victor, or anyone, please feel free to submit separate patches for those > things. The attached patch fixes some bugs and makes a clarification to the doc. Could you check and test the authentication stuff, as I don't have an environment at hand? (1) fe-connect.c There was a memory leak. (2) fe-secure_openssl.c, fe-auth.c GSSAPI/SSPI/SSL authentication requires the target host name, but the code uses conn->pghost which contains a comma-separated list of host names. (3) libpq.sgml Added sentences to clarify connect_timeout when it is used with multiple hosts. BTW, let me ask two questions. Q1: Is there any reason why hostaddr doesn't accept multiple IP addresses? Q2: pg_isready (and hence PQping() family) reports success when one host is alive and other hosts are down. Is this intended? I think this behavior is correct. e.g. host1 is down and host2 is alive. $ pg_isready -h host1,host2 host1,host2:5450 - accepting connections $ echo $? 0 Regards Takayuki Tsunakawa
Attachment
Hi, Mithun Before going deeper into the patch, let me give you some findings. (1) PGconn->target_server_type is not freed in freePGconn(). (2) Could you add PGTARGETSERVERTYPE environment variable? Like other variables, it will ease testing, since users can change the behavior without altering the connection string here and there. (3) I think it would be better to expose the server state via ParameterStatus protocol message like standard_conforming_strings, instead of running "SELECT pg_is_in_recovery()". We shouldn't want to add one round trip to check the server type (master, standby). postmaster can return the server type based on its state (pmState); PM_RUN is master, and PM_HOT_STANDBY is standby. In addition, as an impractical concern, DBA can revoke EXECUTE privilege on pg_is_in_recovery() from non-superusers. Regards Takayuki Tsunakawa
"select pg_catalog.pg_client_encoding()" for same. So I think using "SELECT pg_is_in_recovery()" should be fine.
-
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy > If you are suggesting me to change in protocol messages, I think that would > not be backward compatible to older version servers. I also think such level > of protocol changes will not be allowed. with connection status > CONNECTION_SETENV used for protocol version 2.0 setup, we sent some query > like "select pg_catalog.pg_client_encoding()" for same. So I think using > "SELECT pg_is_in_recovery()" should be fine. No, there's no concern about compatibility. Please look at this: https://www.postgresql.org/docs/devel/static/protocol-flow.html#PROTOCOL-ASYNC [Excerpt] ---------------------------------------- ParameterStatus messages will be generated whenever the active value changes for any of the parameters the backend believes the frontend should know about. Most commonly this occurs in response to a SET SQL command executed by the frontend, and this case is effectively synchronous — but it is also possible for parameter status changes to occur because the administrator changed a configuration file and then sent the SIGHUP signal to the server. Also, if a SET command is rolled back, an appropriate ParameterStatus message will be generated to report the current effective value. At present there is a hard-wired set of parameters for which ParameterStatus will be generated: they are server_version, server_encoding, client_encoding, application_name, is_superuser, session_authorization, DateStyle, IntervalStyle, TimeZone, integer_datetimes, and standard_conforming_strings. (server_encoding, TimeZone, and integer_datetimes were not reported by releases before 8.0; standard_conforming_strings was not reported by releases before 8.1; IntervalStyle was not reported by releases before 8.4; application_name was not reported by releases before 9.0.) Note that server_version, server_encoding and integer_datetimes are pseudo-parameters that cannot change after startup. This set might change in the future, or even become configurable. Accordingly, a frontend should simply ignore ParameterStatus for parameters that it does not understand or care about. ---------------------------------------- Regards Takayuki Tsunakawa
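For what it's worth, the client side of the mechanism quoted above is already exposed through PQparameterStatus(); the sketch below only illustrates how a GUC_REPORTed server attribute would be consumed without an extra round trip. The parameter name used here is invented for illustration, since no such reported setting exists at this point.

#include <string.h>
#include <libpq-fe.h>

/*
 * Returns 1 if the (hypothetical) reported parameter says the session is
 * read-only, 0 if it says writable, and -1 if the server did not report
 * it, in which case the caller would fall back to issuing a query.
 */
static int
reported_read_only_state(PGconn *conn)
{
	const char *val = PQparameterStatus(conn, "session_read_only_state");

	if (val == NULL)
		return -1;
	return strcmp(val, "on") == 0 ? 1 : 0;
}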
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy > Thanks, my concern is suppose you have 3 server in cluster A(new version), > B(new version), C(old version). If we implement as above only new servers > will send ParameterStatus message to indicate what type of server we are > connected. Server C will not send same. So we will not be able to use new > feature "failover to new master" for such a kind of cluster. No, the streaming replication requires the same major release for all member servers, so there's no concern about the mixed-version cluster. Sorry, pmState can only be used in postmaster. In our context, postgres can use RecoveryInProgress(). Anyway, in addition to the reduced round trip, the libpq code would be much simpler. Regards Takayuki Tsunakawa
On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Fri, Nov 11, 2016 at 7:33 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> We tend to use the term "primary" instead of "master". > > Thanks, I will use primary instead of master in my next patch. > >>Will this work with logical replication? > > I have not tested with logical replication. Currently we identify the > primary to connect based on result of "SELECT pg_is_in_recovery()". So I > think it works. Do you want me test a particular setup? If logical replication is in use, none of the servers involved would be in recovery. I'm not sure what command would need to be used to assess whether we've got a master or a standby, but probably not that one. This gets at one of my earlier complaints about this part of the functionality, which is that hardcoding that particular SQL statement into libpq seems like a giant hack. However, I'm not sure what to do about it. The functionality is clearly useful, because JDBC has it, and Victor proposed this patch to add it to libpq, and - totally independently of any of that - EnterpriseDB has a customer who has requested libpq support for this as well. So I am tempted to just hold my nose and hard-code the SQL as JDBC is presumably already doing. If we figure out what the equivalent for logical replication would be we can add something to cover that case, too. It's ugly, but I don't have a better idea, and I think there's value in being compatible with what JDBC has already done (even if it's not what we would have chosen to do tabula rasa). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 14, 2016 at 8:09 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy >> Thanks, my concern is suppose you have 3 server in cluster A(new version), >> B(new version), C(old version). If we implement as above only new servers >> will send ParameterStatus message to indicate what type of server we are >> connected. Server C will not send same. So we will not be able to use new >> feature "failover to new master" for such a kind of cluster. > > No, the streaming replication requires the same major release for all member servers, so there's no concern about the mixed-versioncluster. True, but there is a concern about a newer libpq connecting to older servers. If we mimic what JDBC is already doing, we'll be compatible and you'll be able to use this feature with a v10 libpq without worrying about whether the target server is also v10. If we invent something new on the server side, then you'll need to be sure you have both a v10 libpq and v10 server. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > > I have not tested with logical replication. Currently we identify the > > primary to connect based on result of "SELECT pg_is_in_recovery()". So I > > think it works. Do you want me test a particular setup? > > If logical replication is in use, none of the servers involved would > be in recovery. I'm not sure what command would need to be used to > assess whether we've got a master or a standby, but probably not that > one. This gets at one of my earlier complaints about this part of the > functionality, which is that hardcoding that particular SQL statement > into libpq seems like a giant hack. However, I'm not sure what to do > about it. The functionality is clearly useful, because JDBC has it, > and Victor proposed this patch to add it to libpq, and - totally > independently of any of that - EnterpriseDB has a customer who has > requested libpq support for this as well. So I am tempted to just > hold my nose and hard-code the SQL as JDBC is presumably already > doing. If we figure out what the equivalent for logical replication > would be we can add something to cover that case, too. It's ugly, but > I don't have a better idea, and I think there's value in being > compatible with what JDBC has already done (even if it's not what we > would have chosen to do tabula rasa). I would rather come up with something that works in both cases that we can extend internally later, say pg_is_primary_node() or something like that instead; and we implement it initially by returning the inverse of pg_is_in_recovery() for replicated non-logical flocks, while we figure out what to do in logical replication. Otherwise it will be harder to change later if we embed it in libpq, and we may be forced into supporting nonsensical situations such as having pg_is_in_recovery() return true for logical replication primary nodes. FWIW I'm not sure "primary" is the right term here either. I think what we really want to know is whether the node can accept writes; maybe "pg_is_writable_node". -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> > I have not tested with logical replication. Currently we identify the >> > primary to connect based on result of "SELECT pg_is_in_recovery()". So I >> > think it works. Do you want me test a particular setup? >> >> If logical replication is in use, none of the servers involved would >> be in recovery. I'm not sure what command would need to be used to >> assess whether we've got a master or a standby, but probably not that >> one. This gets at one of my earlier complaints about this part of the >> functionality, which is that hardcoding that particular SQL statement >> into libpq seems like a giant hack. However, I'm not sure what to do >> about it. The functionality is clearly useful, because JDBC has it, >> and Victor proposed this patch to add it to libpq, and - totally >> independently of any of that - EnterpriseDB has a customer who has >> requested libpq support for this as well. So I am tempted to just >> hold my nose and hard-code the SQL as JDBC is presumably already >> doing. If we figure out what the equivalent for logical replication >> would be we can add something to cover that case, too. It's ugly, but >> I don't have a better idea, and I think there's value in being >> compatible with what JDBC has already done (even if it's not what we >> would have chosen to do tabula rasa). > > I would rather come up with something that works in both cases that we > can extend internally later, say pg_is_primary_node() or something like > that instead; and we implement it initially by returning the inverse of > pg_is_in_recovery() for replicated non-logical flocks, while we figure > out what to do in logical replication. Otherwise it will be harder to > change later if we embed it in libpq, and we may be forced into > supporting nonsensical situations such as having pg_is_in_recovery() > return true for logical replication primary nodes. I don't think we'll be backed into a corner like that, because we can always make this contingent on server version. libpq will have that available. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 15, 2016 at 5:58 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> I would rather come up with something that works in both cases that we >> can extend internally later, say pg_is_primary_node() or something like >> that instead; and we implement it initially by returning the inverse of >> pg_is_in_recovery() for replicated non-logical flocks, while we figure >> out what to do in logical replication. Otherwise it will be harder to >> change later if we embed it in libpq, and we may be forced into >> supporting nonsensical situations such as having pg_is_in_recovery() >> return true for logical replication primary nodes. > > I don't think we'll be backed into a corner like that, because we can > always make this contingent on server version. libpq will have that > available. But even with server version checking code, that code will be inside libpq so there will be old libpq versions in the field that won't know the proper query to send to new server versions.
On Tue, Nov 15, 2016 at 3:42 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > FWIW I'm not sure "primary" is the right term here either. I think what > we really want to know is whether the node can accept writes; maybe > "pg_is_writable_node". This made me think of another complication: what about cascading replication where the middle node is both a master and a slave? You'd probably not want to fallback to the middle node even though it is a master for its downstream.
Catalin Iacob wrote: > On Tue, Nov 15, 2016 at 3:42 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > FWIW I'm not sure "primary" is the right term here either. I think what > > we really want to know is whether the node can accept writes; maybe > > "pg_is_writable_node". > > This made me think of another complication: what about cascading > replication where the middle node is both a master and a slave? You'd > probably not want to fallback to the middle node even though it is a > master for its downstream. Exactly my thought, hence the above suggestion. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 15, 2016 at 12:53 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote: > On Tue, Nov 15, 2016 at 5:58 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> I would rather come up with something that works in both cases that we >>> can extend internally later, say pg_is_primary_node() or something like >>> that instead; and we implement it initially by returning the inverse of >>> pg_is_in_recovery() for replicated non-logical flocks, while we figure >>> out what to do in logical replication. Otherwise it will be harder to >>> change later if we embed it in libpq, and we may be forced into >>> supporting nonsensical situations such as having pg_is_in_recovery() >>> return true for logical replication primary nodes. >> >> I don't think we'll be backed into a corner like that, because we can >> always make this contingent on server version. libpq will have that >> available. > > But even with server version checking code, that code will be inside > libpq so there will be old libpq versions in the field that won't know > the proper query to send to new server versions. Good point. pg_is_writable_node() sounds good, then, and we can still send pg_is_in_recovery() if we're connected to a pre-10 version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
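A sketch of the version-contingent choice Robert mentions, under the assumption that a new SQL function along the lines of pg_is_writable_node() were added in v10 (the function is purely hypothetical here); older servers would keep the existing pg_is_in_recovery() probe:

#include <libpq-fe.h>

/* pick the writability probe based on the server we are connected to */
static const char *
writability_probe_query(PGconn *conn)
{
	if (PQserverVersion(conn) >= 100000)
		return "SELECT pg_is_writable_node()";	/* hypothetical v10+ function */

	return "SELECT NOT pg_is_in_recovery()";
}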
> JDBC is sending "show transaction_read_only" to find whether it is master > or not. > Victor's patch also started with it, but later it was transformed into > pg_is_in_recovery > by him as it appeared more appropriate to identify the master / slave. In my understanding pg_is_in_recovery() returns true if it's a standby node. However, even if it returns other than true, the server is not necessarily a primary. Even if it's not configured as a streaming replication primary, it returns other than true. So if your intention is finding a primary, I am not sure if pg_is_in_recovery() is the best solution. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Tue, Nov 15, 2016 at 11:31 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> > So I am tempted to just >> > hold my nose and hard-code the SQL as JDBC is presumably already >> doing. > > JDBC is sending "show transaction_read_only" to find whether it is master or > not. > Victor's patch also started with it, but later it was transformed into > pg_is_in_recovery > by him as it appeared more appropriate to identify the master / slave. Hmm, let's go back to the JDBC method, then. "show transaction_read_only" will return true on a standby, but presumably also on any other non-writable node. You could even force it to be true artificially if you wanted to force traffic off of a node, using ALTER {SYSTEM|USER ...|DATABASE ..} SET default_transaction_read_only = on I think that would address Alvaro's concern, and it's nicer anyway if libpq and JDBC are doing the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
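A rough sketch of the JDBC-style check being discussed, written with the synchronous API for brevity (inside libpq this would run through the asynchronous connection state machine rather than PQexec()):

#include <string.h>
#include <libpq-fe.h>

/* return 1 if this session accepts writes, 0 otherwise */
static int
connection_is_writable(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "show transaction_read_only");
	int			writable = 0;

	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
		writable = (strcmp(PQgetvalue(res, 0, 0), "off") == 0);
	PQclear(res);
	return writable;
}

A caller implementing target_server_type=master semantics would close the connection and move on to the next host whenever this returns 0.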
[moving this branch of discussion to pgsql-jdbc] On Tue, Nov 15, 2016 at 10:31 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > JDBC is sending "show transaction_read_only" to find whether it > is master or not. If true, that's insane. That can be different on each connection to the cluster and can change tens of thousands of times per second on any connection! I know of one large shop that sets default_transaction_read_only = true because 99% of their transactions are read only and they use serializable transactions -- which run faster and with less contention when transactions which don't need to write are flagged as read only. It seems safer to them to only turn off the read only property for transactions which might need to write. > Victor's patch also started with it, but later it was transformed into > pg_is_in_recovery > by him as it appeared more appropriate to identify the master / slave. I don't know whether that is ideal, but it is sure a lot better than something which can change with every transaction -- or even within a transaction (in one direction). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 16, 2016 at 2:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Hmm, let's go back to the JDBC method, then. "show > transaction_read_only" will return true on a standby, but presumably > also on any other non-writable node. You could even force it to be > true artificially if you wanted to force traffic off of a node, using > ALTER {SYSTEM|USER ...|DATABASE ..} SET default_transaction_read_only > = on > > I think that would address Alvaro's concern, and it's nicer anyway if > libpq and JDBC are doing the same thing. Not sure I agree that using this is a good idea in the first place. But if we end up using this, I really think the docs should be very explicit about what's implemented and not just say master/any. With the master/any docs in the patch I would be *very* surprised if a master is skipped only because it was configured with default_transaction_read_only = on.
On Wed, Nov 16, 2016 at 12:00 PM, Catalin Iacob <iacobcatalin@gmail.com> wrote: > On Wed, Nov 16, 2016 at 2:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Hmm, let's go back to the JDBC method, then. "show >> transaction_read_only" will return true on a standby, but presumably >> also on any other non-writable node. You could even force it to be >> true artificially if you wanted to force traffic off of a node, using >> ALTER {SYSTEM|USER ...|DATABASE ..} SET default_transaction_read_only >> = on >> >> I think that would address Alvaro's concern, and it's nicer anyway if >> libpq and JDBC are doing the same thing. > > Not sure I agree that using this is a good idea in the first place. > > But if we end up using this, I really think the docs should be very > explicit about what's implemented and not just say master/any. With > the master/any docs in the patch I would be *very* surprised if a > master is skipped only because it was configured with > default_transaction_read_only = on. It seems like it is going to be difficult to please everyone here 100%, because there are multiple conflicting priorities. But we can definitely document whatever choices we make. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > On Mon, Nov 14, 2016 at 8:09 PM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: > > From: pgsql-hackers-owner@postgresql.org > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy > >> Thanks, my concern is suppose you have 3 server in cluster A(new > >> version), B(new version), C(old version). If we implement as above > >> only new servers will send ParameterStatus message to indicate what > >> type of server we are connected. Server C will not send same. So we > >> will not be able to use new feature "failover to new master" for such > a kind of cluster. > > > > No, the streaming replication requires the same major release for all > member servers, so there's no concern about the mixed-version cluster. > > True, but there is a concern about a newer libpq connecting to older servers. > If we mimic what JDBC is already doing, we'll be compatible and you'll be > able to use this feature with a v10 libpq without worrying about whether > the target server is also v10. If we invent something new on the server > side, then you'll need to be sure you have both a v10 libpq and v10 server. Do we really want to enable libpq failover against pre-V10 servers? I don't think so, as libpq is a part of PostgreSQL and libpq failover is a new feature in PostgreSQL 10. At least, as one user, I don't want PostgreSQL to sacrifice another round trip to establish a connection. As a developer, I don't want libpq code more complex than necessary (the proposed patch adds a new state to the connection state machine.) And I think it's natural for the server to return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like server_version, standard_conforming_strings and server_encoding. Regards Takayuki Tsunakawa
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tatsuo Ishii > In my understanding pg_is_in_recovery() returns true if it's a standby node. > However, even if it returns other than true, the server is not necessarily > a primary. Even it's not configured as a streaming replication primary, > it returns other than true. > > So if your intention is finding a primary, I am not sure if > pg_is_in_recovery() is the best solution. Yes, I don't think pg_is_in_recovery() is the best, but there doesn't seem to be a better solution. pg_is_in_recovery(), as its name clearly suggests, returns true if the server is performing recovery. For example, it returns true if hot_standby=on is present in postgresql.conf and the recovery from backup is in progress. It's not a standby. Regards Takayuki Tsunakawa
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Hmm, let's go back to the JDBC method, then. "show transaction_read_only" > will return true on a standby, but presumably also on any other non-writable > node. You could even force it to be true artificially if you wanted to > force traffic off of a node, using ALTER {SYSTEM|USER ...|DATABASE ..} SET > default_transaction_read_only = on > > I think that would address Alvaro's concern, and it's nicer anyway if libpq > and JDBC are doing the same thing. If you prefer consistency between libpq and JDBC, then we could correct JDBC. People here should know the server state well, and be able to figure out a good specification. Regards Takayuki Tsunakawa
On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > Do we really want to enable libpq failover against pre-V10 servers? I don't think so, as libpq is a part of PostgreSQLand libpq failover is a new feature in PostgreSQL 10. At least, as one user, I don't want PostgreSQL to sacrificeanother round trip to establish a connection. As a developer, I don't want libpq code more complex than necessary(the proposed patch adds a new state to the connection state machine.) And I think it's natural for the serverto return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like server_version,standard_conforming_strings and server_encoding. Well, generally speaking, a new feature that works against older server is better than one that doesn't. Of course, if that entails making other compromises then you have to decide whether it's worth it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only exist in older versions so if we pick either of those methods then it will just work. If we decide to invent some completely new method of distinguishing masters from standbys, then it might not, but that would be a drawback of such a choice, not a benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2. Added PGTARGETSERVERTYPE.
Attachment
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy > I am adding next version of the patch it have following fixes. > Tsunakawa's comments > > 1. PGconn->target_server_type is now freed in freePGconn() 2. Added > PGTARGETSERVERTYPE. > > > Additional comments from others > 3. Moved from SELECT pg_is_in_recovery() to SHOW transaction_read_only > now should handle different kind of replication, as we recognise server > to which writable connection can be made as primary. Very exactly like JDBC > driver. Also documented about it. > 4. renamed words from master to primary. Thank you. The following items need addressing. Some of them require some more discussion to reach consensus, and I hope they will settle down soon. After checking the progress for a week or so, I'll mark the CommitFest entry as "ready for committer" or "waiting on author". (1) + server. Set this to <literaL>any</literal>, if you want to connect to + A server is recognized as a primary/standby by observering whether it Typo. <literaL. -> <literal>, and "observering" -> "observing". (2) + {"target_server_type", "PGTARGETSERVERTYPE", NULL, NULL, + "Target server type", "", 6, Looking at existing parameters, the default value is defined as a macro, and the display label is a sequence of words separated by "-". i.e. + {"target_server_type", "PGTARGETSERVERTYPE", DefaultTargetServerType, NULL, + "Target-Server-Type", "", 6, (3) Please avoid adding another round trip by using a GUC_REPORTed variable (ParameterStatus entry). If you want to support this libpq failover with pre-10 servers, you can switch the method of determining the primary based on the server version. But I don't think it's worth supporting older servers at the price of libpq code complexity. (4) Please consider supporting "standby" and "prefer_standby" like PgJDBC. They are useful without load balancing when multiple standbys are used for HA. (5) I haven't tracked the progress of logical replication, but will target_server_type be likely to be usable with it? How will target_server_type fit logical replication? Regards Takayuki Tsunakawa
On 17 November 2016 at 10:57, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: >> Do we really want to enable libpq failover against pre-V10 servers? I don't think so, as libpq is a part of PostgreSQLand libpq failover is a new feature in PostgreSQL 10. At least, as one user, I don't want PostgreSQL to sacrificeanother round trip to establish a connection. As a developer, I don't want libpq code more complex than necessary(the proposed patch adds a new state to the connection state machine.) And I think it's natural for the serverto return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like server_version,standard_conforming_strings and server_encoding. > > Well, generally speaking, a new feature that works against older > server is better than one that doesn't. Of course, if that entails > making other compromises then you have to decide whether it's worth > it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only > exist in older versions so if we pick either of those methods then it > will just work. If we decide to invent some completely new method of > distinguishing masters from standbys, then it might not, but that > would be a drawback of such a choice, not a benefit. We can and probably should have both. If the server tells us on connect whether it's a standby or not, use that. Otherwise, ask it. That way we don't pay the round-trip cost and get the log spam when talking to newer servers that send us something useful in the startup packet, but we can still query it on older servers. Graceful fallback. Every round trip is potentially very expensive. Having libpq do them unnecessarily is bad. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
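The graceful fallback Craig describes could look roughly like this, reusing the two sketches earlier in the thread (the reported parameter name is still hypothetical):

#include <string.h>
#include <libpq-fe.h>

/* helper from the earlier "show transaction_read_only" sketch */
extern int	connection_is_writable(PGconn *conn);

static int
connection_is_writable_with_fallback(PGconn *conn)
{
	/* prefer a value the server reported at startup: no extra round trip */
	const char *reported = PQparameterStatus(conn, "session_read_only_state");

	if (reported != NULL)
		return strcmp(reported, "on") != 0;

	/* older server that reports nothing: ask it */
	return connection_is_writable(conn);
}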
On Thu, Nov 17, 2016 at 10:08 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 17 November 2016 at 10:57, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki >> <tsunakawa.takay@jp.fujitsu.com> wrote: >>> Do we really want to enable libpq failover against pre-V10 servers? I don't think so, as libpq is a part of PostgreSQLand libpq failover is a new feature in PostgreSQL 10. At least, as one user, I don't want PostgreSQL to sacrificeanother round trip to establish a connection. As a developer, I don't want libpq code more complex than necessary(the proposed patch adds a new state to the connection state machine.) And I think it's natural for the serverto return the server attribute (primary/standby, writable, etc.) as a response to the Startup message like server_version,standard_conforming_strings and server_encoding. >> >> Well, generally speaking, a new feature that works against older >> server is better than one that doesn't. Of course, if that entails >> making other compromises then you have to decide whether it's worth >> it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only >> exist in older versions so if we pick either of those methods then it >> will just work. If we decide to invent some completely new method of >> distinguishing masters from standbys, then it might not, but that >> would be a drawback of such a choice, not a benefit. > > We can and probably should have both. > > If the server tells us on connect whether it's a standby or not, use that. > > Otherwise, ask it. > > That way we don't pay the round-trip cost and get the log spam when > talking to newer servers that send us something useful in the startup > packet, but we can still query it on older servers. Graceful fallback. > > Every round trip is potentially very expensive. Having libpq do them > unnecessarily is bad. True, but raising the bar for this feature so that it doesn't get done is also bad. It can be improved in a later patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 19 November 2016 at 01:29, Robert Haas <robertmhaas@gmail.com> wrote: >> We can and probably should have both. >> >> If the server tells us on connect whether it's a standby or not, use that. >> >> Otherwise, ask it. >> >> That way we don't pay the round-trip cost and get the log spam when >> talking to newer servers that send us something useful in the startup >> packet, but we can still query it on older servers. Graceful fallback. >> >> Every round trip is potentially very expensive. Having libpq do them >> unnecessarily is bad. > > True, but raising the bar for this feature so that it doesn't get done > is also bad. It can be improved in a later patch. Good point. Starting with the followup query method seems fine. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
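For illustration, a minimal sketch of the followup-query approach as seen from a client application (not the libpq-internal code under discussion): connect, run SHOW transaction_read_only, and treat a server that answers "off" as writable. The host strings and the helper name are placeholders rather than anything from the patches in this thread.

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/* Return 1 if the server behind conninfo accepts read-write sessions, else 0. */
static int
is_writable(const char *conninfo)
{
    PGconn   *conn = PQconnectdb(conninfo);
    PGresult *res;
    int       writable = 0;

    if (PQstatus(conn) == CONNECTION_OK)
    {
        /* This query is the extra round trip being discussed. */
        res = PQexec(conn, "SHOW transaction_read_only");
        if (PQresultStatus(res) == PGRES_TUPLES_OK &&
            strcmp(PQgetvalue(res, 0, 0), "off") == 0)
            writable = 1;
        PQclear(res);
    }
    PQfinish(conn);
    return writable;
}

int
main(void)
{
    /* Placeholder candidate hosts; try them until one is writable. */
    const char *candidates[] = {
        "host=main-server dbname=postgres",
        "host=standby1 dbname=postgres",
    };

    for (int i = 0; i < 2; i++)
        if (is_writable(candidates[i]))
            printf("writable server: %s\n", candidates[i]);
    return 0;
}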
From: Mithun Cy [mailto:mithun.cy@enterprisedb.com] > > + {"target_server_type", "PGTARGETSERVERTYPE", > DefaultTargetServerType, NULL, > > + "Target-Server-Type", "", 6, > > Thanks fixed. + {"target_server_type", "PGTARGETSERVERTYPE", NULL, NULL, The default value field is still NULL. > > Please avoid adding another round trip by using a GUC_REPORTed variable > (ParameterStatus entry). If you want to support this libpq failover with > > pre-10 servers, you can switch the method of determining the primary based > on the server version. But I don't think it's worth supporting older > servers > at the price of libpq code complexity. > > Currently there is no consensus around this. For now, making this patch > to address failover to next primary as similar to JDBC seems sufficient > for me. > On next proposal of patch I think we can try to extend as you have proposed I don't think SHOW transaction_read_only is correct, because it's affected by default_transaction_read_only and ALTER DATABASE/USER SET transaction_read_only. transaction_read_only is a transaction attribute, not the server type. What we want to use to determine the connection target should be not only whether the transaction is read only, but also other attributes such as whether temporary tables can be used (not possible on standby), whether maintenance commands can be executed (VACUUM and ANALYZE can run when transaction_read_only is on, but not on standby), etc. And how about other DBMSs? Do we really want to determine the target based on transaction_read_only while e.g. Oracle is based on primary/standby? If you really want the transaction_read_only attribute for your application, then your app should execute "SET default_transaction_read_only = on" upon connection, or ALTER DATABASE/USER SET default_transaction_read_only = on in advance. Regards Takayuki Tsunakawa
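For comparison, the zero-round-trip alternative mentioned here (a GUC_REPORTed setting delivered in a ParameterStatus message) would look roughly like this from the client side. server_version is a real reported parameter; "session_read_only" is purely an assumption for the sake of the example and is not reported by any existing server.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("host=main-server dbname=postgres");

    if (PQstatus(conn) == CONNECTION_OK)
    {
        /*
         * GUC_REPORTed settings arrive during connection startup, so reading
         * them via PQparameterStatus costs no additional round trip.
         */
        printf("server_version: %s\n",
               PQparameterStatus(conn, "server_version"));

        /* Hypothetical parameter; PQparameterStatus returns NULL if unknown. */
        const char *ro = PQparameterStatus(conn, "session_read_only");
        printf("session_read_only: %s\n", ro ? ro : "(not reported)");
    }
    PQfinish(conn);
    return 0;
}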
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > On Thu, Nov 17, 2016 at 10:08 PM, Craig Ringer <craig@2ndquadrant.com> > wrote: > > We can and probably should have both. > > > > If the server tells us on connect whether it's a standby or not, use that. > > > > Otherwise, ask it. > > > > That way we don't pay the round-trip cost and get the log spam when > > talking to newer servers that send us something useful in the startup > > packet, but we can still query it on older servers. Graceful fallback. > > > > Every round trip is potentially very expensive. Having libpq do them > > unnecessarily is bad. > > True, but raising the bar for this feature so that it doesn't get done is > also bad. It can be improved in a later patch. I thought you are very strict about performance, so I hesitate to believe you forgive the additional round trip. libpq failover is a new feature in v10, so I think it should provide the best user experience for v10 client+server users from the start. If the concern is the time for development, then support for older releases can be added in a later patch. There are still several months left for v10. Why don't we try the best? Could you share the difficulty? Regards Takayuki Tsunakawa
On Sun, Nov 20, 2016 at 8:48 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: >> True, but raising the bar for this feature so that it doesn't get done is >> also bad. It can be improved in a later patch. > > I thought you are very strict about performance, so I hesitate to believe you forgive the additional round trip. libpq failover is a new feature in v10, so I think it should provide the best user experience for v10 client+server users from the start. If the concern is the time for development, then support for older releases can be added in a later patch. > > There are still several months left for v10. Why don't we try the best? Could you share the difficulty? I am very strict about regressing the performance of things that we already have, but I try not to make a policy that a new feature must be as fast as it could ever be. That could result in us having very few new features. Of course, it also matters how frequently the overhead is likely to be incurred. A little overhead on each tuple visibility check is a much bigger problem than a little overhead on each CREATE TABLE statement, which in turn is a much bigger problem than a little overhead on each connection attempt. For good performance, connections must last a while, so a little extra time setting one up doesn't seem like a showstopper to me. Anyway, anybody who finds this mechanism too expensive doesn't have to use it; they can implement something else instead. Also, I am not saying that we should not change this in time for v10. I'm saying that I don't think it should be a requirement for the next patch to be committed in this area to introduce a whole new mechanism for determining whether something is a master or a standby. Love it or hate it, pgsql-jdbc has already implemented something in this area and it does something useful -- without requiring a wire protocol change. Now you and Kevin are trying to say that what they did is all wrong, but I don't agree. There are very many users for whom the pgsql-jdbc approach will do exactly what they need, and no doubt some for whom it won't. Getting a patch that mimics that approach committed is *progress*. Improving it afterwards, whether for v10 or some later release, is also good. There is a saying that one should not let the perfect be the enemy of the good. I believe that saying applies here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > I am very strict about regressing the performance of things that we already > have, but I try not to make a policy that a new feature must be as fast > as it could ever be. That could result in us having very few new features. I see. I like your attitude toward new features. But I don't think now is the time to compromise this feature and rush to the commit. > Also, I am not saying that we should not change this in time for v10. > I'm saying that I don't think it should be a requirement for the next patch > to be committed in this area to introduce a whole new mechanism for > determining whether something is a master or a standby. Love it or hate > it, pgsql-jdbc has already implemented something in this area and it does > something useful -- without requiring a wire protocol change. Now you and > Kevin are trying to say that what they did is all wrong, but I don't agree. > There are very many users for whom the pgsql-jdbc approach will do exactly > what they need, and no doubt some for whom it won't. Getting a patch that > mimics that approach committed is *progress*. Improving it afterwards, > whether for v10 or some later release, is also good. transaction_read_only=on does not mean the standby. As the manual article on hot standby says, they are different. https://www.postgresql.org/docs/devel/static/hot-standby.html [Excerpt] -------------------------------------------------- In normal operation, “read-only” transactions are allowed to update sequences and to use LISTEN, UNLISTEN, and NOTIFY, so Hot Standby sessions operate under slightly tighter restrictions than ordinary read-only sessions. It is possible that some of these restrictions might be loosened in a future release. ... Users will be able to tell whether their session is read-only by issuing SHOW transaction_read_only. In addition, a set of functions (Table 9.79, “Recovery Information Functions”) allow users to access information about the standby server. These allow you to write programs that are aware of the current state of the database. These can be used to monitor the progress of recovery, or to allow you to write complex programs that restore the database to particular states. -------------------------------------------------- I'm afraid that if the current patch is committed, you will lose interest in the ideal solution. Then if the current patch is out as v10, there would be a concern about incompatibility when we pursue the ideal solution in a later release. That is, "should we continue to report that this server is standby even if it's actually a primary with transaction_read_only is on, to maintain compatibility with the older release." If you want to connect to a server where the transaction is read-only, then shouldn't the connection parameter be something like "target_session_attrs=readonly"? That represents exactly what the code does. > There is a saying that one should not let the perfect be the enemy of the > good. I believe that saying applies here. True, so I suggested not including the support for older servers for a while. Shall we find the real enemy -- what's preventing the ideal solution? I know my knowledge is still far less than you, so I may be missing something difficult. So, I'd like Mithun to share the difficulty. Regards Takayuki Tsunakawa
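The point that transaction_read_only does not identify a standby can be demonstrated directly: a primary can be made to report it as on. A rough sketch, with a placeholder connection string and error checking omitted:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Assume this points at a primary, not a standby. */
    PGconn   *conn = PQconnectdb("host=main-server dbname=postgres");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Make new transactions in this session default to read-only. */
    PQclear(PQexec(conn, "SET default_transaction_read_only = on"));

    /* The next statement starts a fresh transaction, so this reports "on"... */
    res = PQexec(conn, "SHOW transaction_read_only");
    printf("transaction_read_only: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    /* ...even though the server is not in recovery. */
    res = PQexec(conn, "SELECT pg_is_in_recovery()");
    printf("pg_is_in_recovery:     %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    PQfinish(conn);
    return 0;
}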
On 21 November 2016 at 00:08, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> Please avoid adding another round trip by using a GUC_REPORTed variable >> (ParameterStatus entry). If you want to support this libpq failover with >> pre-10 servers, you can switch the method of determining the primary based >> on the server version. But I don't think it's worth supporting older >> servers >> at the price of libpq code complexity. > > Currently there is no consensus around this. For now, making this patch to > address failover to next primary as similar to JDBC seems sufficient for me. > On next proposal of patch I think we can try to extend as you have proposed FWIW, I agree. Working first, then improved. >> I haven't tracked the progress of logical replication, but will >> target_server_type be likely to be usable with it? How will >> target_server_type fit logical replication? > > I tried to check logical replication WIP patch, not very sure how to > accomodate same. Logical replication downstreams won't force transactions to read-only. A significant part of the appeal of logical replication is that you can use temp tables and unlogged tables on the downstream, and even use persistent tables so long as they don't clash with the upstream. More sophisticated models even allow the downstream to safely write to replicated tables by doing conflict resolution. So as it stands, this patch would consider any logical replication downstream a 'master'. That's fine for now IMO. Determining whether a server is a logical replica isn't that simple, nor is it a clear-cut yes/no answer, and I think it's out of the scope of this patch to deal with it. Figure it out once logical replication lands. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Nov 13, 2016 at 9:02 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas >> Great, committed. There's still potentially more work to be done here, >> because my patch omits some features that were present in Victor's original >> submission, like setting the failover timeout, optionally randomizing the >> order of the hosts, and distinguishing between master and standby servers; >> Victor, or anyone, please feel free to submit separate patches for those >> things. > > The attached patch fixes some bugs and make a clarification for doc. Could you check and test the authentication stuff as I don't have an environment at hand? I don't have an environment at hand, either, but I think your fixes are correct, so I have committed them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > transaction_read_only=on does not mean the standby. As the manual article on hot standby says, they are different. > I'm afraid that if the current patch is committed, you will lose interest in the ideal solution. I agree with Robert that lacking the ideal solution shouldn't block a good enough solution today. But I find the current patch as it stands too confusing to pass the "good enough" bar. > Then if the current patch is out as v10, there would be a concern about incompatibility when we pursue the ideal solution in a later release. That is, "should we continue to report > that this server is standby even if it's actually a primary with transaction_read_only is on, to maintain compatibility with the older release." Agreed. I am worried that this will end up being a wart: target_server_type=primary doesn't really mean primary; it means that by default the session is not read only, which by the way is usually the case for primaries but that can be changed. > If you want to connect to a server where the transaction is read-only, then shouldn't the connection parameter be something like "target_session_attrs=readonly"? That represents exactly what the code does. FWIW I find this to be a reasonable compromise. To keep the analogy with the current patch it would be more something like "target_session_attrs=read_write|any" and the docs could point out "by the way, if you didn't tweak default_transaction_read_only read_write will only pick a primary so this can be used for failover". The functionality is the same but changing the name removes the ambiguity in the semantics and leaves the door open for a future real primary/replica functionality which includes logical replication with whatever semantics are deemed appropriate then. We do lose pgjdbc compatibility but the current patch doesn't have that 100% anyway: in pgjdbc it's called targetServerType and it accepts master instead of primary, and it also accepts slave and preferSlave. I notice at https://github.com/pgjdbc/www/blob/master/documentation/head/connect.md that pgjdbc says "The master/slave distinction is currently done by observing if the server allows writes", which, due to the "currently" part, seems to acknowledge this is not ideal and leaves the door open for changes in semantics, but I don't think changing semantics is going to fly for Postgres itself.
On Thu, Nov 24, 2016 at 7:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Wed, Nov 23, 2016 at 10:19 PM, Catalin Iacob <iacobcatalin@gmail.com> > wrote: > On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: > >> If you want to connect to a server where the transaction is read-only, > then shouldn't the connection parameter be something like >>>"target_session_attrs=readonly"? That represents exactly what the code > does. > > >FWIW I find this to be a reasonable compromise. To keep the analogy > >with the current patch it would be more something like > "target_session_attrs=read_write|any". > > I have taken this suggestion now renamed target_server_type to > target_session_attrs with possible 2 values "read-write", "any". I didn't hear any objections to this approach and, after some thought, I think it's good. So I've committed this after rewriting the documentation, some cosmetic cleanup, and changing the logic so that if one IP for a host turns out to be read-only, we skip all remaining IPs for that host, not just the one that we tested already. This seems likely to be what users will want. I recognize that this isn't going to please everybody 100%, but there's still room for followup patches to improve things further. One thing I really like about the target_session_attrs renaming is that it seems to leave the door open to a variety of things we might want to do later. The "read-write" value we now support is closely related to the query we use for testing ("show transaction_read_only"). If we later want to filter based on some other server property, we can add additional values with associated SQL commands for each. Of course that could get a little crazy if overdone, but hopefully we won't overdo it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
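For readers following the thread, the committed behaviour can be exercised from any libpq client roughly as follows; the host names and ports are placeholders.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * Hosts are tried in order; a connection is accepted only if
     * "SHOW transaction_read_only" on it reports "off".
     */
    PGconn *conn = PQconnectdb(
        "postgresql://main-server:5432,standby1:5432/postgres"
        "?target_session_attrs=read-write");

    if (PQstatus(conn) == CONNECTION_OK)
        printf("connected to writable server %s:%s\n",
               PQhost(conn), PQport(conn));
    else
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}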
On Thu, Nov 24, 2016 at 5:46 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > I have taken this suggestion now renamed target_server_type to > target_session_attrs with possible 2 values "read-write", "any". > May be we could expand to "readonly" and "prefer-readonly" in next patch > proposal. Attaching the patch for same. I was doing some testing with the patch and I found some inconsistency in the error message. I've a read-only server running on port 5433 and no server on 5436 and 5438. command: bin/psql 'postgresql://localhost:5436,localhost:5433,localhost:5438/postgres?target_session_attrs=read-write' I get the following error message. psql: could not make a writable connection to server "localhost:5433" could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5438? could not connect to server: Connection refused Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5438? It didn't show any error message for port 5436. But, if I modify the connection string as following: command: bin/psql 'postgresql://localhost:5433,localhost:5436,localhost:5438/postgres?target_session_attrs=read-write' I get the following error message: psql: could not make a writable connection to server "localhost:5433" could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5436? could not connect to server: Connection refused Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5436? could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5438? could not connect to server: Connection refused Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5438? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 29, 2016 at 2:19 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > On Thu, Nov 24, 2016 at 5:46 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> I have taken this suggestion now renamed target_server_type to >> target_session_attrs with possible 2 values "read-write", "any". >> May be we could expand to "readonly" and "prefer-readonly" in next patch >> proposal. Attaching the patch for same. > I was doing some testing with the patch and I found some inconsistency > in the error message. > I've a read-only server running on port 5433 and no server on 5436 and 5438. > > command: bin/psql > 'postgresql://localhost:5436,localhost:5433,localhost:5438/postgres?target_session_attrs=read-write' > > I get the following error message. > > psql: could not make a writable connection to server "localhost:5433" > could not connect to server: Connection refused > Is the server running on host "localhost" (::1) and accepting > TCP/IP connections on port 5438? > could not connect to server: Connection refused > Is the server running on host "localhost" (127.0.0.1) and accepting > TCP/IP connections on port 5438? > > It didn't show any error message for port 5436. But, if I modify the > connection string as following: > > command: bin/psql > 'postgresql://localhost:5433,localhost:5436,localhost:5438/postgres?target_session_attrs=read-write' > > I get the following error message: > > psql: could not make a writable connection to server "localhost:5433" > could not connect to server: Connection refused > Is the server running on host "localhost" (::1) and accepting > TCP/IP connections on port 5436? > could not connect to server: Connection refused > Is the server running on host "localhost" (127.0.0.1) and accepting > TCP/IP connections on port 5436? > could not connect to server: Connection refused > Is the server running on host "localhost" (::1) and accepting > TCP/IP connections on port 5438? > could not connect to server: Connection refused > Is the server running on host "localhost" (127.0.0.1) and accepting > TCP/IP connections on port 5438? Hmm, maybe the query buffer is getting cleared someplace in there. We might need to save/restore it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 29, 2016 at 3:14 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Nov 29, 2016 at 2:19 PM, Kuntal Ghosh > <kuntalghosh.2007@gmail.com> wrote: >> On Thu, Nov 24, 2016 at 5:46 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >>> I have taken this suggestion now renamed target_server_type to >>> target_session_attrs with possible 2 values "read-write", "any". >>> May be we could expand to "readonly" and "prefer-readonly" in next patch >>> proposal. Attaching the patch for same. >> I was doing some testing with the patch and I found some inconsistency >> in the error message. >> I've a read-only server running on port 5433 and no server on 5436 and 5438. >> >> command: bin/psql >> 'postgresql://localhost:5436,localhost:5433,localhost:5438/postgres?target_session_attrs=read-write' >> >> I get the following error message. >> >> psql: could not make a writable connection to server "localhost:5433" >> could not connect to server: Connection refused >> Is the server running on host "localhost" (::1) and accepting >> TCP/IP connections on port 5438? >> could not connect to server: Connection refused >> Is the server running on host "localhost" (127.0.0.1) and accepting >> TCP/IP connections on port 5438? >> >> It didn't show any error message for port 5436. But, if I modify the >> connection string as following: >> >> command: bin/psql >> 'postgresql://localhost:5433,localhost:5436,localhost:5438/postgres?target_session_attrs=read-write' >> >> I get the following error message: >> >> psql: could not make a writable connection to server "localhost:5433" >> could not connect to server: Connection refused >> Is the server running on host "localhost" (::1) and accepting >> TCP/IP connections on port 5436? >> could not connect to server: Connection refused >> Is the server running on host "localhost" (127.0.0.1) and accepting >> TCP/IP connections on port 5436? >> could not connect to server: Connection refused >> Is the server running on host "localhost" (::1) and accepting >> TCP/IP connections on port 5438? >> could not connect to server: Connection refused >> Is the server running on host "localhost" (127.0.0.1) and accepting >> TCP/IP connections on port 5438? > > Hmm, maybe the query buffer is getting cleared someplace in there. We > might need to save/restore it. Not the query buffer. conn->errorMessage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Thanks, send query resets the errorMessage. Will fix same. PQsendQuery()->PQsendQueryStart()->resetPQExpBuffer(&conn->errorMessage);
On Thu, Dec 1, 2016 at 5:00 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Wed, Nov 30, 2016 at 7:14 AM, Mithun Cy <mithun.cy@enterprisedb.com> > wrote: >> Thanks, send query resets the errorMessage. Will fix same. >> PQsendQuery()->PQsendQueryStart()->resetPQExpBuffer(&conn->errorMessage); > > Please find the patch which fixes this bug. conn->errorMessage as per > definition only stores current error message, a new member > conn->savedMessage is introduced to save and restore error messages related > to previous hosts which we failed to connect. Couldn't this just be a variable in PQconnectPoll(), instead of adding a new structure member? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
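To make the save/restore idea concrete, here is a rough sketch in the style of libpq's PQExpBuffer API. The helper names mirror the ones discussed in the thread, but the signatures and placement are assumptions, not the actual fe-connect.c code.

#include "pqexpbuffer.h"

/*
 * Sketch: stash the current error text so that a later connection attempt
 * can accumulate its own messages, then put the saved text back in front
 * if every host ends up failing.
 */
static void
saveErrorMessage(PQExpBuffer errorMessage, PQExpBuffer saved)
{
    initPQExpBuffer(saved);
    appendPQExpBufferStr(saved, errorMessage->data);
    resetPQExpBuffer(errorMessage);
}

static void
restoreErrorMessage(PQExpBuffer errorMessage, PQExpBuffer saved)
{
    /* Prepend the saved text to whatever accumulated since the save. */
    appendPQExpBufferStr(saved, errorMessage->data);
    resetPQExpBuffer(errorMessage);
    appendPQExpBufferStr(errorMessage, saved->data);
    termPQExpBuffer(saved);
}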
On Sun, Dec 4, 2016 at 11:00 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Fri, Dec 2, 2016 at 8:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Couldn't this just be a variable in PQconnectPoll(), instead of adding >> a new structure member? > > I have fixed same with a local variable in PQconnectPoll, Initally I thought > savedMessage should have same visiblitly as errorMessage so we can restore > the errorMessage even outside PQconnectPoll. But that seems not required. > Attacting the new patch which fixes the same. I think that you need a restoreErrorMessage call here: /* Skip any remaining addresses for this host. */ conn->addr_cur = NULL; if (conn->whichhost + 1 < conn->nconnhost) { conn->status = CONNECTION_NEEDED; restoreErrorMessage(conn, &savedMessage); goto keep_going; } Updated patch attached with that change, the removal of a useless hunk, and the removal of "inline" from the new functions, which seems like second-guessing of whatever decision the compiler chooses to make. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 05/12/16 17:00, Mithun Cy wrote: [...] > errorMessage even outside PQconnectPoll. But that seems not required. > Attacting the new patch which fixes the same. > > -- > Thanks and Regards > Mithun C Y > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/> [...] Is that meant to be 'attaching' or 'attacking'??? <ducks, and runs away VERY quickly> MORE SERIOUSLY thanks for your and others efforts for making pg great! Cheers, Gavin
On Mon, Dec 5, 2016 at 1:59 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Mon, Dec 5, 2016 at 11:23 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>I think that you need a restoreErrorMessage call here: >> /* Skip any remaining addresses for this host. */ >> conn->addr_cur = NULL; >> if (conn->whichhost + 1 < conn->nconnhost) >> { >> conn->status = CONNECTION_NEEDED >> restoreErrorMessage(conn, &savedMessage); >> goto keep_going; >> } > > Right after seeing transaction is read-only we have restored the saved > message so I think we do not need one more restore there. > if (strncmp(val, "on", 2) == 0) > { > PQclear(res); > + restoreErrorMessage(conn, &savedMessage); D'oh! You're correct, of course. Committed without that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company