Thread: Automatic Client Failover
When primary server fails, it would be good if the clients connected to the primary knew to reconnect to the standby servers automatically.

We might want to specify that centrally and then send the redirection address to the client when it connects. Sounds like lots of work though.

Seems fairly straightforward to specify a standby connection service at client level: .pgreconnect, or pgreconnect.conf. No config, then option not used.

Would work with various forms of replication.

Implementation would be to make PQreset() try secondary connection if the primary one fails to reset. Of course you can program this manually, but the feature is that you wouldn't need to, nor would you need to request changes to 27 different interfaces either.

Good? Bad? Ugly?

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
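A minimal sketch of the PQreset() fallback Simon describes, assuming the standby conninfo has already been read from the proposed .pgreconnect file (the helper itself is illustrative, not existing libpq API):

    /* Sketch only: retry the primary, then fall back to a standby.
     * "standby_conninfo" would come from the hypothetical .pgreconnect
     * file; only the PQ* calls themselves are real libpq API. */
    #include <stdio.h>
    #include <libpq-fe.h>

    static PGconn *
    reset_with_fallback(PGconn *conn, const char *standby_conninfo)
    {
        PQreset(conn);                      /* first, retry the primary */
        if (PQstatus(conn) == CONNECTION_OK)
            return conn;

        fprintf(stderr, "primary reset failed: %s", PQerrorMessage(conn));
        PQfinish(conn);                     /* give up on the primary */

        conn = PQconnectdb(standby_conninfo);   /* try the standby */
        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "standby connect failed: %s",
                    PQerrorMessage(conn));
        return conn;
    }

The point of putting this inside PQreset() rather than in application code is exactly the one made above: every interface built on libpq would inherit the behavior for free.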
On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > When primary server fails, it would be good if the clients connected to > the primary knew to reconnect to the standby servers automatically. This would be a nice feature which many people I've talked to have asked for. In Oracle-land, it's called Transparent Application Failover (TAF) and it gives you a lot of options, including the ability to write your own callbacks when a failover is detected. +1 -- Jonah H. Harris, Senior DBA myYearbook.com
On Monday 04 August 2008 14:08, Simon Riggs wrote:
> When primary server fails, it would be good if the clients connected to
> the primary knew to reconnect to the standby servers automatically.
>
> We might want to specify that centrally and then send the redirection
> address to the client when it connects. Sounds like lots of work though.
>
> Seems fairly straightforward to specify a standby connection service at
> client level: .pgreconnect, or pgreconnect.conf
> No config, then option not used.

Well, it's less simple, but you can already do this with pgPool on the client machine.

-- Josh Berkus PostgreSQL San Francisco
On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote: > Well, it's less simple, but you can already do this with pgPool on the > client machine. Yeah, but if you have tens or hundreds of clients, you wouldn't want to be installing/managing a pgpool on each. Similarly, I think an application should have the option of being notified of a connection change; I know that wasn't in Simon's proposal, but I've found it necessary in several applications which rely on things such as temporary tables. You don't want the app just blowing up because a table doesn't exist; you want to be able to handle it gracefully. -- Jonah H. Harris, Senior DBA myYearbook.com
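The notification Jonah describes can be approximated today at the application level; a rough sketch, where the hook type and rebuild_session() are invented for illustration (nothing like this exists in libpq):

    #include <libpq-fe.h>

    /* Hypothetical application-level hook, in the spirit of Oracle TAF
     * callbacks: called after a reconnect so the application can
     * re-create session-local state (temp tables, prepared statements,
     * SET values) that the failover silently destroyed. */
    typedef void (*failover_hook)(PGconn *newconn);

    static void
    rebuild_session(PGconn *newconn)
    {
        PGresult *res = PQexec(newconn,
            "CREATE TEMP TABLE session_scratch (id int, val text)");
        PQclear(res);
        /* ... re-prepare statements, restore SET values, etc. ... */
    }

    static void
    after_reconnect(PGconn *newconn, failover_hook hook)
    {
        if (PQstatus(newconn) == CONNECTION_OK && hook != NULL)
            hook(newconn);      /* e.g. hook = rebuild_session */
    }

Without such a hook, the first query against a vanished temp table simply fails, which is the "blowing up" scenario described above.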
"Jonah H. Harris" <jonah.harris@gmail.com> writes: > On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote: >> Well, it's less simple, but you can already do this with pgPool on the >> client machine. > Yeah, but if you have tens or hundreds of clients, you wouldn't want > to be installing/managing a pgpool on each. Huh? The pgpool is on the server, not on the client side. There is one really bad consequence of the oversimplified failover design that Simon proposes, which is that clients might try to fail over for reasons other than a primary server failure. (Think network partition.) You really want any such behavior to be managed centrally, IMHO. regards, tom lane
On Mon, 2008-08-04 at 22:08 +0100, Simon Riggs wrote:
> When primary server fails, it would be good if the clients connected to
> the primary knew to reconnect to the standby servers automatically.
>
> We might want to specify that centrally and then send the redirection
> address to the client when it connects. Sounds like lots of work though.

One way to do it is _outside_ of the client, by having a separately managed subnet for logical DB addresses. So when a failover occurs, you move that logical DB address to the new host, flush ARP caches and just reconnect. This also solves the case of inadvertent failover in case of an unrelated network failure.

> Seems fairly straightforward to specify a standby connection service at
> client level: .pgreconnect, or pgreconnect.conf
> No config, then option not used.
>
> Would work with various forms of replication.
>
> Implementation would be to make PQreset() try secondary connection if
> the primary one fails to reset. Of course you can program this manually,
> but the feature is that you wouldn't need to, nor would you need to
> request changes to 27 different interfaces either.
>
> Good? Bad? Ugly?
Hi,

On Aug 5, 2008, at 01:13, Tom Lane wrote:
> There is one really bad consequence of the oversimplified failover
> design that Simon proposes, which is that clients might try to fail over
> for reasons other than a primary server failure. (Think network
> partition.) You really want any such behavior to be managed centrally,
> IMHO.

Then, what about having pgbouncer capability in -core? This would probably mean, AFAIUI, that the listen()ing process would no longer be the postmaster but a specialized one, with the portable poll()/select()/... loop that is now known as pgbouncer. Existing pgbouncer would have to be expanded to:
- provide a backward compatible mode (session pooling, release the server session at client closing time)
- allow configuring several backend servers and trying the next one on certain conditions
- add hooks for clients to know when some events happen (failure of current master, automatic switchover, etc.)

Existing pgbouncer hooks and next ones could be managed with catalog tables, as we have special options tables for autovacuum, e.g. pg_connection_pool, which could contain arbitrary SQL for new backend fork, backend closing, failover, switchover, etc.; and maybe the client hooks would be NOTIFY messages sent from the backend at its initiative.

Would we then have the centrally managed behavior Tom is mentioning? I'm understanding this in 2 ways:
- this extension would be able to distinguish failure cases where we are able to do an automatic failover from "hard" crashes (impacting the listener)
- when we have read-only slave(s), pgbouncer will be able to redirect ro statements to them.

Maybe it would even be useful to see about Markus' work in Postgres-R and its inter-backend communication system allowing the executor to require more than one backend working on a single query. The pgbouncer-inherited system would then be a pre-forked backend pooling manager too...

Once more, I hope that giving (not so) random ideas here as a (not yet) pgsql hacker is helping the project more than it's disturbing real work...

Regards,
-- dim
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> On Aug 5, 2008, at 01:13, Tom Lane wrote:
>> There is one really bad consequence of the oversimplified failover
>> design that Simon proposes, which is that clients might try to fail
>> over for reasons other than a primary server failure. (Think network
>> partition.) You really want any such behavior to be managed
>> centrally, IMHO.

> Then, what about having pgbouncer capability in -core? This would
> probably mean, AFAIUI, that the listen()ing process would no longer
> be the postmaster but a specialized one,

Huh? The problem case is that the primary server goes down, which would certainly mean that a pgbouncer instance on the same machine goes with it. So it seems to me that integrating pgbouncer is 100% backwards.

Failover that actually works is not something we can provide with trivial changes to Postgres. It's really a major project in its own right: you need heartbeat detection, STONITH capability, IP address redirection, etc. I think we should be recommending external failover-management project(s) instead of offering a half-baked home-grown solution. Searching freshmeat for "failover" finds plenty of potential candidates, but not having used any of them I'm not sure which are worth closer investigation.

regards, tom lane
Tom,

> Failover that actually works is not something we can provide with
> trivial changes to Postgres.

I think the proposal was for an extremely simple "works 75% of the time" failover solution. While I can see the attraction of that, the consequences of having failover *not* work are pretty severe.

On the other hand, we will need to deal with this for the built-in replication project.

-- Josh Berkus PostgreSQL San Francisco
On Mon, Aug 04, 2008 at 05:17:59PM -0400, Jonah H. Harris wrote:
> On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > When primary server fails, it would be good if the clients connected to
> > the primary knew to reconnect to the standby servers automatically.
>
> This would be a nice feature which many people I've talked to have
> asked for. In Oracle-land, it's called Transparent Application
> Failover (TAF) and it gives you a lot of options, including the
> ability to write your own callbacks when a failover is detected.

This might be better done as part of a proxy server, e.g. pgbouncer or pgpool, than as part of postgresql or libpq. I like the concept, but the logic to determine when a failover has occurred is complex, and a client will often not have access to enough information to make this determination accurately. postgresql could have hooks to support this though, i.e. to determine when a standby thinks it has become the master.

-dg

-- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
Josh Berkus <josh@agliodbs.com> writes: > I think the proposal was for an extremely simple "works 75% of the time" > failover solution. While I can see the attraction of that, the > consequences of having failover *not* work are pretty severe. Exactly. The point of failover (or any other HA feature) is to get several nines worth of reliability. "It usually works" is simply not playing in the right league. > On the other hand, we will need to deal with this for the built-in > replication project. Nope, that's orthogonal. A failover solution depends on having a master and a slave database, but it has nothing directly to do with how those DBs are synchronized. regards, tom lane
On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > I think the proposal was for an extremely simple "works 75% of the time"
> > failover solution. While I can see the attraction of that, the
> > consequences of having failover *not* work are pretty severe.
>
> Exactly. The point of failover (or any other HA feature) is to get
> several nines worth of reliability. "It usually works" is simply
> not playing in the right league.

Why would you all presume that I haven't thought about the things you mention? Where did I say "...and this would be the only feature required for full and correct HA failover."? The post is specifically about Client Failover, as the title clearly states.

Your comments were illogical anyway, since if it was so bad a technique then it would not work for pgpool either, since it is also a client. If pgpool can do this, why can't another client? Why can't *all* clients?

With correctly configured other components the primary will shut down if it is no longer the boss. The client will then be disconnected. If it switches to its secondary connection, we can have an option to read session_replication_role to ensure that this is set to origin. This covers the case where the client has lost connection with the primary, though it is still up, yet can reach the standby, which has not changed state.

DB2, SQL Server and Oracle all provide this feature, BTW. We don't need to follow, but we should do that consciously. I'm comfortable with us deciding not to do it, if that is our considered judgement.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
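On the client side, the session_replication_role check Simon describes could be as simple as this (a sketch; error paths trimmed):

    #include <string.h>
    #include <libpq-fe.h>

    /* After switching to the secondary connection, refuse to proceed
     * unless the server we reached reports itself as the origin. */
    static int
    connected_to_origin(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SHOW session_replication_role");
        int       ok;

        ok = (PQresultStatus(res) == PGRES_TUPLES_OK &&
              strcmp(PQgetvalue(res, 0, 0), "origin") == 0);
        PQclear(res);
        return ok;
    }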
Greg

On 5-Aug-08, at 12:15 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> There is one really bad consequence of the oversimplified failover
> design that Simon proposes, which is that clients might try to fail over
> for reasons other than a primary server failure. (Think network
> partition.) You really want any such behavior to be managed centrally,
> IMHO.

The alternative to a centrally managed failover system is one based on a quorum system. At first glance it seems to me that would fit our use case better. But the point remains that we would be better off adopting a complete system than trying to reinvent one.
On Tuesday, August 5, 2008, Tom Lane wrote:
> Huh? The problem case is that the primary server goes down, which would
> certainly mean that a pgbouncer instance on the same machine goes with
> it. So it seems to me that integrating pgbouncer is 100% backwards.

With all due respect, it seems to me you're missing an important piece of the scheme here: I certainly failed to explain it correctly. Of course, I'm not sure (by and large) that detailing what I have in mind will answer your concerns, but still...

What I have in mind is having the pgbouncer listening process at both the master and slave sites. So your clients can already connect to the slave for normal operations, and the listener process simply connects them to the master, transparently. When we later provide RO slaves, some queries could be processed locally instead of being sent to the master. The point being that the client does not have to care whether it's connecting to a master or a slave; -core knows what it can handle for the client and handles it (proxying the connection).

Now, that does not solve client-side automatic failover per se; it's another way to think about it:
- both master & slave accept connections in any mode
- master & slave are able to "speak" to each other (live link)
- when the master knows it's crashing (elog(FATAL)), it can say so to the slave
- when told so, the slave can switch to master

It obviously only catches some errors on the master, the ones we're able to log about. So it does nothing on its own for allowing HA in case of a master crash. But...

> Failover that actually works is not something we can provide with
> trivial changes to Postgres. It's really a major project in its
> own right: you need heartbeat detection, STONITH capability,
> IP address redirection, etc. I think we should be recommending
> external failover-management project(s) instead of offering a
> half-baked home-grown solution. Searching freshmeat for "failover"
> finds plenty of potential candidates, but not having used any of
> them I'm not sure which are worth closer investigation.

We have worked here with heartbeat, and automating failover is hard. Not only for technical reasons, but also because:
- current PostgreSQL offers no sync replication, so switching means trading or losing the D in ACID,
- you do not want to lose any committed data.

If 8.4 resolves this, failover implementation will be a lot easier.

Where I see my proposal fitting in is the ability to handle part of the smartness in -core directly, so the hard part of STONITH/failover/switchback could be implemented in cooperation with -core, not by playing tricks against it. For example, switching back when the master gets back online would only mean the master telling the slave to redirect queries to it as soon as it's ready --- which still leaves the hard part, syncing back the data. Having clients able to blindly connect to the master or any slave, and having the current cluster-topology smartness in -core, would certainly help here, even if it doesn't fulfill all HA goals.

Of course, in the case of a master hard crash, we still have to make sure it won't restart on its own, and we have to have an external way to get a chosen slave to become the master. I'm even envisioning that -core could help STONITH projects by having something like the recovery.conf file for the master to restart in not-up-to-date slave mode. Whether we implement resyncing to the new master in -core or from external scripts is another concern, but certainly -core could help here (even if not in 8.4, of course).
I'm still thinking that this proposal has a place in the scheme of an integrated HA solution and offers interesting bits.

Regards,
-- dim
On Tue, 2008-08-05 at 07:52 +0100, Simon Riggs wrote:
> Why would you all presume that I haven't thought about the things you
> mention? Where did I say "...and this would be the only feature required
> for full and correct HA failover."? The post is specifically about Client
> Failover, as the title clearly states.

I guess having the title "Automatic Client Failover" suggests to most readers that you are trying to solve the client side separately from the server.

> Your comments were illogical anyway, since if it was so bad a technique
> then it would not work for pgpool either, since it is also a client. If
> pgpool can do this, why can't another client? Why can't *all* clients?

IIRC pgpool was itself a poor man's replication solution, so doing failover _is_ its point.

> With correctly configured other components the primary will shut down if
> it is no longer the boss. The client will then be disconnected. If it
> switches to its secondary connection, we can have an option to read
> session_replication_role to ensure that this is set to origin.

Probably this should not be an option, but a must.

Maybe session_replication_role should be a DBA-defined function, so that the same client failover mechanism can be applied to different replication solutions, both server-built-in and external:

create function session_replication_role()
returns enum('master','ro-slave','please-wait-coming-online','...')
$$ ...

> This covers the case where the client has lost connection with the
> primary, though it is still up, yet can reach the standby, which has
> not changed state.
>
> DB2, SQL Server and Oracle all provide this feature, BTW. We don't need
> to follow, but we should do that consciously. I'm comfortable with us
> deciding not to do it, if that is our considered judgement.

The main argument seemed to be that it can't be "Automatic Client-ONLY Failover."

--------------
Hannu
On Tue, 2008-08-05 at 11:50 +0300, Hannu Krosing wrote:
> I guess having the title "Automatic Client Failover" suggests to most
> readers that you are trying to solve the client side separately from
> the server.

Yes, that's right: separately. Why would anybody presume I meant "and by the way you can turn off all other HA measures not mentioned here"? Not mentioning a topic means no change or no impact in that area, at least on all other hackers threads.

> IIRC pgpool was itself a poor man's replication solution, so doing
> failover _is_ its point.

Agreed.

> Probably this should not be an option, but a must.

Perhaps, but some people doing read-only queries don't really care which one they are connected to.

> Maybe session_replication_role should be a DBA-defined function, so that
> the same client failover mechanism can be applied to different
> replication solutions, both server-built-in and external:
>
> create function session_replication_role()
> returns enum('master','ro-slave','please-wait-coming-online','...')
> $$ ...

Maybe; the trouble is that "please wait, coming online" is the message a Hot Standby would give also. Happy to list out all the states so we can make this work for everyone.

> The main argument seemed to be that it can't be "Automatic Client-ONLY
> Failover."

No argument. Never was. It can't be.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Hi,

Tom Lane wrote:
> Huh? The pgpool is on the server, not on the client side.

Not necessarily. Having pgpool on the client side works just as well.

> There is one really bad consequence of the oversimplified failover
> design that Simon proposes, which is that clients might try to fail over
> for reasons other than a primary server failure.

Why is that? It's just fine for a client to (re)connect to another server due to a flaky connection to the current server. I had something pretty similar in mind for Postgres-R. (Except that we should definitely allow specifying more than just a primary and a secondary server.)

> (Think network partition.)

Uh... well, yeah, of course the servers themselves need to exchange their state and make sure they only accept clients if they are up and running (as seen by the cluster). That's what the 'view' of a GCS is all about. Or STONITH, for that matter.

> You really want any such behavior to be managed centrally, IMHO.

Controlling that client behavior reliably would involve using multiple (at least N+1) connections to different servers, so you can control the client even if N of the servers fail. That's certainly more complex than what Simon proposed.

Speaking in terms of orthogonality, client failover is orthogonal to the (cluster-wide) server state management. Which in turn is orthogonal to how the nodes replicate data. (Modulo some side effects, like nodes lagging behind for async replication...)

Regards Markus Wanner
Hi,

Simon Riggs wrote:
> On Tue, 2008-08-05 at 11:50 +0300, Hannu Krosing wrote:
>> I guess having the title "Automatic Client Failover" suggests to most
>> readers that you are trying to solve the client side separately from
>> the server.
>
> Yes, that's right: separately. Why would anybody presume I meant "and by
> the way you can turn off all other HA measures not mentioned here"? Not
> mentioning a topic means no change or no impact in that area, at least
> on all other hackers threads.

I think the pgbouncer-in-core idea caused some confusion here.

IMO the client failover method is very similar to what DNS round-robin setups do for webservers: even if clients might fail over 'automatically', you still have to maintain the server states (which servers do you list in the DNS?) and care about 'replication' of your site to the webservers.

Regards Markus Wanner
On Tuesday, August 5, 2008, Markus Wanner wrote:
>> (Think network partition.)
>
> Uh... well, yeah, of course the servers themselves need to exchange
> their state and make sure they only accept clients if they are up and
> running (as seen by the cluster). That's what the 'view' of a GCS is all
> about. Or STONITH, for that matter.

That's where I'm thinking that some -core smartness would make this part simpler, hence the confusion (sorry about that) on the thread.

If slave nodes were able to accept connections and redirect them to the master, the client wouldn't need to care about connecting to a master or a slave, just about connecting to a live node. So the proposal for Automatic Client Failover becomes much simpler.

-- dim
Hi,

Dimitri Fontaine wrote:
> If slave nodes were able to accept connections and redirect them to the
> master, the client wouldn't need to care about connecting to a master or
> a slave, just about connecting to a live node.

I've thought about that as well, but think about it this way: to protect against N failing nodes, you need to forward *every* request through N living nodes before actually hitting the node which processes the query. To me, that sounds like an awful lot of traffic within the cluster, which can easily be avoided with automatic client failover.

(Why are you stating that only slaves need to redirect? What happens in case of a master failure?)

> So the proposal for Automatic Client Failover becomes much simpler.

I'm arguing it's the other way around: taking down a node of the cluster becomes much simpler with ACF, because clients automatically reconnect to another node themselves. The servers don't need to care.

Regards Markus Wanner
On Tuesday, August 5, 2008, Markus Wanner wrote:
> I've thought about that as well, but think about it this way: to protect
> against N failing nodes, you need to forward *every* request through N
> living nodes before actually hitting the node which processes the
> query. To me, that sounds like an awful lot of traffic within the
> cluster, which can easily be avoided with automatic client failover.
>
> (Why are you stating that only slaves need to redirect? What happens
> in case of a master failure?)

I'm thinking in terms of a single-master, multiple-slaves scenario... In the single-master case, each slave only needs to know who the current master is and whether it can itself process read-only queries (locally) or not. You seem to be thinking in terms of multi-master, where the choosing of a master node is a different concern, as a failing master does not imply slave promotion.

> > So the proposal for Automatic Client Failover becomes much simpler.
>
> I'm arguing it's the other way around: taking down a node of the cluster
> becomes much simpler with ACF, because clients automatically reconnect
> to another node themselves. The servers don't need to care.

Well, in the single-master case I'm not sure I agree, but in the case of a multi-master configuration, it does seem that choosing some alive master is a client task.

Now what about the multi-master, multi-slave case? Does such a configuration make sense? If this ever becomes possible (2 active/active master servers, with some slaves for long-running queries, e.g.), then you may want the ACF-enabled connection routine to choose to connect to any master or slave in the pool, and have the slave be itself an ACF client targeting some alive master.

Does this still make sense?

-- dim
Hi,

Dimitri Fontaine wrote:
> I'm thinking in terms of a single-master, multiple-slaves scenario...
> In the single-master case, each slave only needs to know who the current
> master is and whether it can itself process read-only queries (locally)
> or not.

I don't think that's as trivial as you make it sound. I'd rather put it as: all nodes need to agree on exactly one master node at any given point in time. However, IMO that has nothing to do with automatic client failover.

> You seem to be thinking in terms of multi-master, where the choosing of a
> master node is a different concern, as a failing master does not imply
> slave promotion.

I'm thinking about the problem which ACF tries to solve: connection losses between the client and one of the servers (no matter if it's a master or a slave). As opposed to a traditional single-node database, there might be other servers available to connect to, once a client has lost the current connection (and thus suspects the server behind that connection to have gone down).

Redirecting writing transactions from slaves to the master node solves another problem. Being able to 'rescue' such forwarded connections in case of a failure of the master is just a nice side effect. But it doesn't solve the problem of connection losses between a client and the master.

> Well, in the single-master case I'm not sure I agree, but in the case of
> a multi-master configuration, it does seem that choosing some alive
> master is a client task.

Given a failure of the master server, how do you expect clients which were connected to that master server to "fail over"? Some way or another, they need to be able to (re)connect to one of the slaves (which possibly turned into the new master by then). Of course, you can load that burden onto the application, and simply let it try to connect to another server upon connection failures. AFAIU Simon is proposing to put that logic into libpq. I see merits in that for multiple replication solutions, and don't think anything exclusively server-sided could solve the same issue (because the client currently has only one connection to one server, which might fail at any time).

[ Please note that you still need the retry loop in the application. It mainly saves having to care about the list of servers and server states in the app. ]

> Now what about the multi-master, multi-slave case? Does such a
> configuration make sense?

Heh.. I'm glad you are asking. ;-)

IMO the only reason for master-slave replication is ease of implementation. It's certainly not something a sane end-user is ever requesting by himself, because he needs that "feature". After all, not being able to run writing queries on certain nodes is not a feature, but a bare limitation.

In your question, you are implicitly assuming an existing multi-master implementation. Given my reasoning, this would make an additional master-slave replication pretty useless. Thus I'm claiming that such a configuration does not make sense.

> If this ever becomes possible (2 active/active master servers, with some
> slaves for long-running queries, e.g.), then you may want the ACF-enabled
> connection routine to choose to connect to any master or slave in the pool,

You can do the same with multi-master replication, without any disadvantage.

> and have the slave be itself an ACF client targeting some alive master.

Huh? ACF for master-slave communication? That implies that slaves are connected to the master(s) via libpq, which I think is not such a good fit.

Regards Markus Wanner
On Tuesday, August 5, 2008, Markus Wanner wrote:
> Dimitri Fontaine wrote:
> > I'm thinking in terms of a single-master, multiple-slaves scenario...
> > In the single-master case, each slave only needs to know who the
> > current master is and whether it can itself process read-only queries
> > (locally) or not.
>
> I don't think that's as trivial as you make it sound. I'd rather put it
> as: all nodes need to agree on exactly one master node at any given
> point in time. However, IMO that has nothing to do with automatic client
> failover.

Agreed; the idea was to try to help ACF by reducing what I understood was its realm. It seems I'm misunderstanding the perimeter of the proposed change... And as for the apparent triviality, it resides only in the concept; when you're confronted with nodes acting as master or slave depending on context (session_replication_role), it becomes more interesting.

> I'm thinking about the problem which ACF tries to solve: connection
> losses between the client and one of the servers (no matter if it's a
> master or a slave). As opposed to a traditional single-node database,
> there might be other servers available to connect to, once a client has
> lost the current connection (and thus suspects the server behind that
> connection to have gone down).
>
> Redirecting writing transactions from slaves to the master node solves
> another problem. Being able to 'rescue' such forwarded connections in
> case of a failure of the master is just a nice side effect. But it
> doesn't solve the problem of connection losses between a client and the
> master.

Agreed. It simply allows the ACF part not to bother with master(s)/slave(s) topology, which still looks like a great win to me.

> Given a failure of the master server, how do you expect clients which
> were connected to that master server to "fail over"? Some way or
> another, they need to be able to (re)connect to one of the slaves (which
> possibly turned into the new master by then).

Yes, you still need ACF; I'm sure I never wanted to say anything against this.

> IMO the only reason for master-slave replication is ease of
> implementation. It's certainly not something a sane end-user is ever
> requesting by himself, because he needs that "feature". After all, not
> being able to run writing queries on certain nodes is not a feature, but
> a bare limitation.

I'm not agreeing here. I have replication needs where some data are only to be edited by an admin backoffice, then replicated to servers. Those servers also write data (logs) which are to be sent to the main server (now a slave), which will compute stats on-the-fly (trigger-based, at replication receiving time).

Now, this configuration needs to be resistant to network failure of any node, central one included. So I don't want synchronous replication, thanks. And I don't want multi-master either, as I WANT to forbid central to edit data from the servers, and to forbid servers to edit data coming from the backoffice. Now, I certainly would appreciate having the central server not being a SPOF by having two masters both active at any time.

Of course, if I want HA, whatever features and failure autodetection PostgreSQL gives me, I still need ACF. And if I get master/slave instead of master/master, I need STONITH and heartbeat or equivalent. I was just trying to propose ideas for making those external parts as easy as possible to get right with whatever integrated solution comes from -core.

> In your question, you are implicitly assuming an existing multi-master
> implementation. Given my reasoning, this would make an additional
> master-slave replication pretty useless. Thus I'm claiming that such a
> configuration does not make sense.

I disagree here, see above.

> Huh? ACF for master-slave communication? That implies that slaves are
> connected to the master(s) via libpq, which I think is not such a good fit.

I'm using londiste (from Skytools), a master/slaves replication solution in python. I'm not sure whether the psycopg component is using libpq or implementing the FE protocol itself, but it seems to me that in any case it would be a candidate to benefit from Simon's proposal.

Regards,
-- dim
Hi,

Dimitri Fontaine wrote:
>> Redirecting writing transactions from slaves to the master node solves
>> another problem. Being able to 'rescue' such forwarded connections in
>> case of a failure of the master is just a nice side effect. But it
>> doesn't solve the problem of connection losses between a client and the
>> master.
>
> Agreed. It simply allows the ACF part not to bother with master(s)/slave(s)
> topology, which still looks like a great win to me.

Hm.. yeah, for master-slave replication I'm slowly beginning to see merit in it. However, given the lacking use of master-slave...

> Yes, you still need ACF; I'm sure I never wanted to say anything against
> this.

Ah, okay. I thought you were proposing an alternative.

>> IMO the only reason for master-slave replication is ease of
>> implementation. It's certainly not something a sane end-user is ever
>> requesting by himself, because he needs that "feature". After all, not
>> being able to run writing queries on certain nodes is not a feature, but
>> a bare limitation.
>
> I'm not agreeing here.

Somehow, I just knew it.. ;-)

> I have replication needs where some data are only to be edited by an admin
> backoffice, then replicated to servers. Those servers also write data (logs)
> which are to be sent to the main server (now a slave), which will compute
> stats on-the-fly (trigger-based, at replication receiving time).

Sure, you can currently do that because there exist master-slave replication solutions which can do that. And that's perfectly fine. Comparing that with concepts of a nonexistent multi-master replication solution is not fair by definition.
Hi,

(sorry... I'm typing too fast and hitting the wrong keys... continuing the previous mail now...)

Dimitri Fontaine wrote:
> Now, this configuration needs to be resistant to network failure of any
> node,

Yeah, increasing availability is the primary purpose of doing replication.

> central one included. So I don't want synchronous replication, thanks.

I do not understand that reasoning. Synchronous replication is certainly *more* resilient to network failures, as it does *not* lose any data on failover.

However, you are speaking about "logs" and "stats". That certainly sounds like data you can afford to lose during a failover, because you can easily recreate it. And as asynchronous replication is faster, that's why you should prefer async replication here, IMO.

> And I don't want multi-master either, as I WANT to forbid central to
> edit data from the servers, and to forbid servers to edit data coming
> from the backoffice.

Well, I'd say you are (ab)using replication as an access-control method. That's not quite what it's made for, but you can certainly use it that way.

As I understand master-slave replication, a slave should be able to take over from the master in case that one fails. In that case, the slave must suddenly become writable and your access control is void. In case you are preventing that, you are using replication only to transfer data and not to increase availability. That's fine, but it's quite a different use case. And something I admittedly haven't thought about. Thanks for pointing me to this use case of replication.

We could probably combine Postgres-R (for multi-master replication) with londiste (to transfer selected data) asynchronously to other nodes.

> Of course, if I want HA, whatever features and failure autodetection
> PostgreSQL gives me, I still need ACF.

Agreed.

> And if I get master/slave instead of master/master, I need STONITH and
> heartbeat or equivalent.

A two-node setup with STONITH has the disadvantage that you need manual intervention to bring up a crashed node again. (To remove the bullet from inside its head.) I'm thus recommending to use at least three nodes for any kind of high-availability setup, even if the third one only serves as a quorum and doesn't hold a replica of the data. It allows automation of node recovery, which does not only ease administration, but eliminates a possible source of errors.

> I was just trying to propose ideas for making those external parts as
> easy as possible to get right with whatever integrated solution comes
> from -core.

Yeah, that'd be great. However, ISTM that it's not quite clear yet what solution will get integrated into -core.

>> Huh? ACF for master-slave communication? That implies that slaves are
>> connected to the master(s) via libpq, which I think is not such a good
>> fit.
>
> I'm using londiste (from Skytools), a master/slaves replication solution
> in python. I'm not sure whether the psycopg component is using libpq or
> implementing the FE protocol itself, but it seems to me that in any case
> it would be a candidate to benefit from Simon's proposal.

Hm.. yeah, that might be true. On the other hand, the servers in the cluster need to keep track of their state anyway, so there's not that much to be gained here.

Regards Markus Wanner
On Tuesday, August 5, 2008, Markus Wanner wrote:
> I do not understand that reasoning. Synchronous replication is
> certainly *more* resilient to network failures, as it does *not* lose
> any data on failover.
>
> However, you are speaking about "logs" and "stats". That certainly
> sounds like data you can afford to lose during a failover, because you
> can easily recreate it. And as asynchronous replication is faster,
> that's why you should prefer async replication here, IMO.

That's not exactly it: I want to keep any of the database servers from erroring out whenever a network failure happens. Sync is not an answer here.

> Well, I'd say you are (ab)using replication as an access-control
> method. That's not quite what it's made for, but you can certainly use
> it that way.

The fact that I need those controls led me to this replication design.

> As I understand master-slave replication, a slave should be able to take
> over from the master in case that one fails. In that case, the slave
> must suddenly become writable and your access control is void.
>
> In case you are preventing that, you are using replication only to
> transfer data and not to increase availability. That's fine, but it's
> quite a different use case. And something I admittedly haven't thought
> about. Thanks for pointing me to this use case of replication.

That's exactly it: I'm not using replication as a way for a slave to take over from the master in case of failure, but to spread data availability where I need it, and without requiring a central server to be accessible (SPOF).

> Hm.. yeah, that might be true. On the other hand, the servers in the
> cluster need to keep track of their state anyway, so there's not that
> much to be gained here.

In the case of a slave replicated node which is there to replace the master when it goes offline, yes, the slave needs to know it's a slave. PITR-based solutions achieve this by having the slave eternally in recovery mode; once it passes that step, it's a master. Slony, AFAIUI, will soon be using the session_replication_role GUC to decide about its "state". Here it's more interesting, since a single server can act as a master for some data and as a slave for some others, and the triggers to run are not the same depending on the role.

Of course, with multi-master replication, the client can INSERT to any member of the cluster and the same triggers will get run; you're not after disabling replication triggers if you're acting as a slave. But as you mention it, we don't yet have a multi-master production setup.

I still hope it'll get on the radar sooner rather than later, though ;)

-- dim
Hi,

Dimitri Fontaine wrote:
> That's not exactly it: I want to keep any of the database servers from
> erroring out whenever a network failure happens. Sync is not an answer
> here.

So, you want your base data to remain readable on the slaves, even if a slave loses its connection to the master, right? However, this does not depend on any timing property of the replication of writing transactions (i.e. sync vs async). Instead, it's very well possible for any kind of replication solution to continue allowing read-only access on nodes which lost connection to the primary or to the majority of the cluster. Such a node will fall behind with its snapshot of the data if the primary continues writing.

> That's exactly it: I'm not using replication as a way for a slave to
> take over from the master in case of failure, but to spread data
> availability where I need it, and without requiring a central server to
> be accessible (SPOF).

I understand. So this is increasing "read-only availability", sort of, which is what's possible with today's tools. I'm still claiming that you rather want to increase overall availability, once that's possible. But arguing about nonexistent solutions is pretty pointless.

> But as you mention it, we don't yet have a multi-master production
> setup.
>
> I still hope it'll get on the radar sooner rather than later, though ;)

Well, it's certainly on *my* radar ;-)

Regards Markus Wanner
Simon Riggs wrote: > When primary server fails, it would be good if the clients connected to > the primary knew to reconnect to the standby servers automatically. > > We might want to specify that centrally and then send the redirection > address to the client when it connects. Sounds like lots of work though. > > Seems fairly straightforward to specify a standby connection service at > client level: .pgreconnect, or pgreconnect.conf > No config, then option not used. > > Would work with various forms of replication. > > Implementation would be to make PQreset() try secondary connection if > the primary one fails to reset. Of course you can program this manually, > but the feature is that you wouldn't need to, nor would you need to > request changes to 27 different interfaces either. I assumed share/pg_service.conf would help in this regard; place the file on a central server and modify that so everyone connects to another server. Perhaps we could even add round-robin functionality to that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Fri, 2008-08-15 at 12:24 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > When primary server fails, it would be good if the clients connected to
> > the primary knew to reconnect to the standby servers automatically.
> >
> > We might want to specify that centrally and then send the redirection
> > address to the client when it connects. Sounds like lots of work though.
> >
> > Seems fairly straightforward to specify a standby connection service at
> > client level: .pgreconnect, or pgreconnect.conf
> > No config, then option not used.
> >
> > Would work with various forms of replication.
> >
> > Implementation would be to make PQreset() try secondary connection if
> > the primary one fails to reset. Of course you can program this manually,
> > but the feature is that you wouldn't need to, nor would you need to
> > request changes to 27 different interfaces either.
>
> I assumed share/pg_service.conf would help in this regard; place the
> file on a central server and modify that so everyone connects to another
> server. Perhaps we could even add round-robin functionality to that.

I do want to keep it as simple as possible, but we do need a way that will work without reconfiguration at the time of danger. It needs to be preconfigured and tested, then change controlled so we all know it works.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > > > Implementation would be to make PQreset() try secondary connection if > > > the primary one fails to reset. Of course you can program this manually, > > > but the feature is that you wouldn't need to, nor would you need to > > > request changes to 27 different interfaces either. > > > > I assumed share/pg_service.conf would help in this regard; place the > > file on a central server and modify that so everyone connects to another > > server. Perhaps we could even add round-robin functionality to that. > > I do want to keep it as simple as possible, but we do need a way that > will work without reconfiguration at the time of danger. It needs to be > preconfigured and tested, then change controlled so we all know it > works. OK, so using share/pg_service.conf as an implementation example, how would this work? The application supplies multiple service names and libpq tries attaching to each one in the list until one works? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
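A sketch of the loop Bruce describes, with placeholder service names (each assumed to be a section in pg_service.conf):

    /* Try a list of pg_service.conf service names in order and return
     * the first connection that succeeds.  pg_service.conf would
     * contain sections such as:
     *
     *   [primary]
     *   host=db1.example.com
     *   dbname=app
     *
     *   [standby]
     *   host=db2.example.com
     *   dbname=app
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    static PGconn *
    connect_first_working(const char *services[], int n)
    {
        int i;

        for (i = 0; i < n; i++)
        {
            char    conninfo[128];
            PGconn *conn;

            snprintf(conninfo, sizeof(conninfo), "service=%s", services[i]);
            conn = PQconnectdb(conninfo);
            if (PQstatus(conn) == CONNECTION_OK)
                return conn;    /* this one answered */
            PQfinish(conn);     /* dead or unreachable, try the next */
        }
        return NULL;            /* nothing worked */
    }

Called with a list such as "primary" then "standby", this gives the try-each-in-turn behavior; the grouping and daisy-chaining discussed below are just orderings of the same list.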
On Fri, 2008-08-15 at 14:25 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > > > Implementation would be to make PQreset() try secondary connection if
> > > > the primary one fails to reset. Of course you can program this manually,
> > > > but the feature is that you wouldn't need to, nor would you need to
> > > > request changes to 27 different interfaces either.
> > >
> > > I assumed share/pg_service.conf would help in this regard; place the
> > > file on a central server and modify that so everyone connects to another
> > > server. Perhaps we could even add round-robin functionality to that.
> >
> > I do want to keep it as simple as possible, but we do need a way that
> > will work without reconfiguration at the time of danger. It needs to be
> > preconfigured and tested, then change controlled so we all know it
> > works.
>
> OK, so using share/pg_service.conf as an implementation example, how
> would this work? The application supplies multiple service names and
> libpq tries attaching to each one in the list until one works?

This could work in one of two ways (maybe more):
* supply a group for each service. If the main service goes down, try the other services in your group
* supply a secondary service for each main service. If the primary goes down, we look at the secondary service

It might also be possible to daisy-chain the retries, so after trying the secondary/others we move on to the secondary's secondary, in much the same way that a telephone hunt group works.

This was suggested as a complementary feature alongside other server-side features I'm working on. I'm not thinking of doing this myself, since I know much less about the client-side code than I'd need to do this in the time available. Also, I'm not sure whether it is unpopular or simply misunderstood.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support