Thread: Automatic Client Failover

Automatic Client Failover

From
Simon Riggs
Date:
When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

We might want to specify that centrally and then send the redirection
address to the client when it connects. Sounds like lots of work though.

Seems fairly straightforward to specify a standby connection service at
client level: .pgreconnect, or pgreconnect.conf
No config, then option not used.

Would work with various forms of replication.

Implementation would be to make PQreset() try secondary connection if
the primary one fails to reset. Of course you can program this manually,
but the feature is that you wouldn't need to, nor would you need to
request changes to 27 different interfaces either.
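
For illustration, a minimal sketch of the manual fallback an application can
already code today with libpq, and which the proposed feature would make
unnecessary; the connection strings are made-up examples:

#include <stdio.h>
#include <libpq-fe.h>

/* Try the primary conninfo first; if that fails, fall back to the standby.
 * Both connection strings are invented for the example. */
static PGconn *
connect_with_fallback(const char *primary, const char *standby)
{
    PGconn *conn = PQconnectdb(primary);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "primary unavailable: %s", PQerrorMessage(conn));
        PQfinish(conn);
        conn = PQconnectdb(standby);    /* second chance: the standby */
    }
    return conn;                        /* caller still checks PQstatus() */
}

int
main(void)
{
    PGconn *conn = connect_with_fallback("host=db1 dbname=app",
                                         "host=db2 dbname=app");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "no server reachable: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}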

Good? Bad? Ugly? 

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Automatic Client Failover

From
"Jonah H. Harris"
Date:
On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> When primary server fails, it would be good if the clients connected to
> the primary knew to reconnect to the standby servers automatically.

This would be a nice feature which many people I've talked to have
asked for.  In Oracle-land, it's called Transparent Application
Failover (TAF) and it gives you a lot of options, including the
ability to write your own callbacks when a failover is detected.

+1

-- 
Jonah H. Harris, Senior DBA
myYearbook.com


Re: Automatic Client Failover

From
Josh Berkus
Date:
On Monday 04 August 2008 14:08, Simon Riggs wrote:
> When primary server fails, it would be good if the clients connected to
> the primary knew to reconnect to the standby servers automatically.
>
> We might want to specify that centrally and then send the redirection
> address to the client when it connects. Sounds like lots of work though.
>
> Seems fairly straightforward to specify a standby connection service at
> client level: .pgreconnect, or pgreconnect.conf
> No config, then option not used.

Well, it's less simple, but you can already do this with pgPool on the 
client machine.


-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco


Re: Automatic Client Failover

From
"Jonah H. Harris"
Date:
On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Well, it's less simple, but you can already do this with pgPool on the
> client machine.

Yeah, but if you have tens or hundreds of clients, you wouldn't want
to be installing/managing a pgpool on each.  Similarly, I think an
application should have the option of being notified of a connection
change; I know that wasn't in Simon's proposal, but I've found it
necessary in several applications which rely on things such as
temporary tables.  You don't want the app just blowing up because a
table doesn't exist; you want to be able to handle it gracefully.
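
To make that concrete, a bare-bones sketch of the kind of graceful handling
meant here: after libpq reports a broken connection, reset it and rebuild
session-local state (the temp table name is invented for the example):

#include <libpq-fe.h>

/* Reset a broken connection and recreate session-local objects (here a
 * temp table called "scratch") instead of letting later queries fail. */
static int
recover_session(PGconn *conn)
{
    PGresult *res;
    int       ok;

    if (PQstatus(conn) == CONNECTION_BAD)
        PQreset(conn);                  /* re-establish the connection */

    if (PQstatus(conn) != CONNECTION_OK)
        return 0;                       /* still down; give up for now */

    res = PQexec(conn, "CREATE TEMP TABLE scratch(id int)");
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

    PQclear(res);
    return ok;
}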

-- 
Jonah H. Harris, Senior DBA
myYearbook.com


Re: Automatic Client Failover

From
Tom Lane
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Well, it's less simple, but you can already do this with pgPool on the
>> client machine.

> Yeah, but if you have tens or hundreds of clients, you wouldn't want
> to be installing/managing a pgpool on each.

Huh?  The pgpool is on the server, not on the client side.

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure.  (Think network
partition.)  You really want any such behavior to be managed centrally,
IMHO.
        regards, tom lane


Re: Automatic Client Failover

From
Hannu Krosing
Date:
On Mon, 2008-08-04 at 22:08 +0100, Simon Riggs wrote:
> When primary server fails, it would be good if the clients connected to
> the primary knew to reconnect to the standby servers automatically.
> 
> We might want to specify that centrally and then send the redirection
> address to the client when it connects. Sounds like lots of work though.

One way to do it is _outside_ of client, by having a separately managed
subnet for logical DB addresses. So when a failover occurs, then you
move that logical DB address to the new host, flush ARP caches and just
reconnect.

This also solves the case of inadvertent failover in case of unrelated
network failure.

> Seems fairly straightforward to specify a standby connection service at
> client level: .pgreconnect, or pgreconnect.conf
> No config, then option not used.
> 
> Would work with various forms of replication.
> 
> Implementation would be to make PQreset() try secondary connection if
> the primary one fails to reset. Of course you can program this manually,
> but the feature is that you wouldn't need to, nor would you need to
> request changes to 27 different interfaces either.
> 
> Good? Bad? Ugly? 
> 
> -- 
>  Simon Riggs           www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
> 
> 



Re: Automatic Client Failover

From
Dimitri Fontaine
Date:

Hi,

On 5 Aug 08, at 01:13, Tom Lane wrote:
> There is one really bad consequence of the oversimplified failover
> design that Simon proposes, which is that clients might try to fail
> over
> for reasons other than a primary server failure.  (Think network
> partition.)  You really want any such behavior to be managed
> centrally,
> IMHO.


Then, what about having pgbouncer capability in -core? This would
probably mean, AFAIUI, that the listen()ing process would no longer
be the postmaster but a specialized one, with the portable poll()/
select()/... process that is now known as pgbouncer.

Existing pgbouncer would have to be expanded to:
- provide a backward compatible mode (session pooling, release server
  session at client closing time)
- allow to configure several backend servers and to try the next one on
  certain conditions
- add hooks for clients to know when some events happen (failure of
  current master, automatic switchover, etc.)

Existing pgbouncer hooks and next ones could be managed with catalog
tables, as we have a special options table for autovacuum, e.g.
pg_connection_pool, which could contain arbitrary SQL for new backend
fork, backend closing, failover, switchover, etc.; and maybe the client
hooks would be NOTIFY messages sent from the backend at its initiative.

Would we then have the centrally managed behavior Tom is mentioning?
I'm understanding this in 2 ways:
- this extension would be able to distinguish failure cases where we are
  able to do an automatic failover from "hard" crashes (impacting the
  listener)
- when we have read-only slave(s), pgbouncer will be able to redirect
  read-only statements to them

Maybe it would even be useful to see about Markus' work in Postgres-R
and its inter-backend communication system allowing the executor to
require more than one backend working on a single query. The pgbouncer
inherited system would then be a pre-forked backend pooling manager
too...

Once more, I hope that giving (not so) random ideas here as a (not
yet) pgsql hacker is helping the project more than it's disturbing
real work...

Regards,
--
dim



Re: Automatic Client Failover

From
Tom Lane
Date:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> On 5 Aug 08, at 01:13, Tom Lane wrote:
>> There is one really bad consequence of the oversimplified failover
>> design that Simon proposes, which is that clients might try to fail
>> over for reasons other than a primary server failure.  (Think network
>> partition.)  You really want any such behavior to be managed
>> centrally, IMHO.

> Then, what about having pgbouncer capability into -core. This would  
> probably mean, AFAIUI,  than the listen()ing process would no longer  
> be postmaster but a specialized one,

Huh?  The problem case is that the primary server goes down, which would
certainly mean that a pgbouncer instance on the same machine goes with
it.  So it seems to me that integrating pgbouncer is 100% backwards.

Failover that actually works is not something we can provide with
trivial changes to Postgres.  It's really a major project in its
own right: you need heartbeat detection, STONITH capability,
IP address redirection, etc.  I think we should be recommending
external failover-management project(s) instead of offering a
half-baked home-grown solution.  Searching freshmeat for "failover"
finds plenty of potential candidates, but not having used any of
them I'm not sure which are worth closer investigation.
        regards, tom lane


Re: Automatic Client Failover

From
Josh Berkus
Date:
Tom,

> Failover that actually works is not something we can provide with
> trivial changes to Postgres. 

I think the proposal was for an extremely simple "works 75% of the time" 
failover solution.  While I can see the attraction of that, the 
consequences of having failover *not* work are pretty severe.

On the other hand, we will need to deal with this for the built-in 
replication project.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco


Re: Automatic Client Failover

From
daveg
Date:
On Mon, Aug 04, 2008 at 05:17:59PM -0400, Jonah H. Harris wrote:
> On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > When primary server fails, it would be good if the clients connected to
> > the primary knew to reconnect to the standby servers automatically.
> 
> This would be a nice feature which many people I've talked to have
> asked for.  In Oracle-land, it's called Transparent Application
> Failover (TAF) and it gives you a lot of options, including the
> ability to write your own callbacks when a failover is detected.

This might be better done as part of a proxy server, eg pgbouncer, pgpool
than as part of postgresql or libpq. I like the concept, but the logic to
determine when a failover has occurred is complex and a client will often
not have access to enough information to make this determination accurately.

postgresql could have hooks to support this though, ie to determine when a
standby thinks it has become the master.

-dg

-- 
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.


Re: Automatic Client Failover

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I think the proposal was for an extremely simple "works 75% of the time" 
> failover solution.  While I can see the attraction of that, the 
> consequences of having failover *not* work are pretty severe.

Exactly.  The point of failover (or any other HA feature) is to get
several nines worth of reliability.  "It usually works" is simply
not playing in the right league.

> On the other hand, we will need to deal with this for the built-in 
> replication project.

Nope, that's orthogonal.  A failover solution depends on having a master
and a slave database, but it has nothing directly to do with how those
DBs are synchronized.
        regards, tom lane


Re: Automatic Client Failover

From
Simon Riggs
Date:
On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > I think the proposal was for an extremely simple "works 75% of the time" 
> > failover solution.  While I can see the attraction of that, the 
> > consequences of having failover *not* work are pretty severe.
> 
> Exactly.  The point of failover (or any other HA feature) is to get
> several nines worth of reliability.  "It usually works" is simply
> not playing in the right league.

Why would you all presume that I haven't thought about the things you
mention? Where did I say "...and this would be the only feature required
for full and correct HA failover." The post is specifically about Client
Failover, as the title clearly states.

Your comments were illogical anyway, since if it was so bad a technique
then it would not work for pgpool either, since it is also a client. If
pgpool can do this, why can't another client? Why can't *all* clients?

With correctly configured other components the primary will shut down if
it is no longer the boss. The client will then be disconnected. If it
switches to its secondary connection, we can have an option to read
session_replication_role to ensure that this is set to origin. This
covers the case where the client has lost connection with primary,
though it is still up, yet can reach the standby which has not changed
state.
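
A rough sketch of what that check could look like from libpq, assuming the
client already holds the reconnected secondary connection in conn:

#include <string.h>
#include <libpq-fe.h>

/* After switching to the secondary connection, confirm that the server we
 * reached is really acting as the origin (i.e. has been promoted). */
static int
connected_to_origin(PGconn *conn)
{
    PGresult *res = PQexec(conn, "SHOW session_replication_role");
    int       is_origin = 0;

    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        is_origin = (strcmp(PQgetvalue(res, 0, 0), "origin") == 0);

    PQclear(res);
    return is_origin;
}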

DB2, SQLServer and Oracle all provide this feature, BTW. We don't need
to follow, but we should do that consciously. I'm comfortable with us
deciding not to do it, if that is our considered judgement.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Automatic Client Failover

From
"Greg Stark"
Date:

Greg

On 5-Aug-08, at 12:15 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> There is one really bad consequence of the oversimplified failover
> design that Simon proposes, which is that clients might try to fail  
> over
> for reasons other than a primary server failure.  (Think network
> partition.)  You really want any such behavior to be managed  
> centrally,
> IMHO.

The alternative to a cwnrallu managed   failover system is one based  
on a quorum system. At first glance it seems to me that would fit our  
use case better. But the point remains that we would be better off  
adopting a complete system than trying to reinvent one. 


Re: Automatic Client Failover

From
Dimitri Fontaine
Date:
On Tuesday, August 05, 2008, Tom Lane wrote:
> Huh?  The problem case is that the primary server goes down, which would
> certainly mean that a pgbouncer instance on the same machine goes with
> it.  So it seems to me that integrating pgbouncer is 100% backwards.

With all due respect, it seems to me you're missing an important piece of the
scheme here: I certainly failed to explain correctly. Of course, I'm not sure
(by and large) that detailing what I have in mind will answer your concerns,
but still...

What I have in mind is having the pgbouncer listening process both at master
and slave sites. So your clients can already connect to slave for normal
operations, and the listener process simply connects them to the master,
transparently.
When we later provide RO slaves, some queries could be processed locally
instead of getting sent to the master.
The point being that the client does not have to care itself whether it's
connecting to a master or a slave, -core knows what it can handle for the
client and handles it (proxying the connection).

Now, that does not solve the client side automatic failover per-se, it's
another way to think about it:
- both master & slave accept connections in any mode
- master & slave are able to "speak" to each other (life link)
- when the master knows it's crashing (elog(FATAL)), it can say so to the slave
- when said so, the slave can switch to master

It obviously only catches some errors on master, the ones we're able to log
about. So it does nothing on its own for allowing HA in case of master crash.
But...

> Failover that actually works is not something we can provide with
> trivial changes to Postgres.  It's really a major project in its
> own right: you need heartbeat detection, STONITH capability,
> IP address redirection, etc.  I think we should be recommending
> external failover-management project(s) instead of offering a
> half-baked home-grown solution.  Searching freshmeat for "failover"
> finds plenty of potential candidates, but not having used any of
> them I'm not sure which are worth closer investigation.

We have worked here with heartbeat, and automating failover is hard. Not for
technical reasons only, also because:
- current PostgreSQL offers no sync replication, switching means trading or
  losing the D in ACID,
- you do not want to lose any committed data.

If 8.4 resolve this, failover implementation will be a lot easier.

Where I see my proposal fitting is the ability to handle a part of the smartness
in -core directly, so the hard part of the STONITH/failover/switchback could
be implemented in cooperation with -core, not playing tricks against it.

For example, switching back when the master gets back online would only mean
for the master to tell the slave to now redirect the queries to it as soon as
it's ready --- which still is the hard part, syncing back the data.

Having clients able to blindly connect to the master or any slave, and having
the current cluster topology smartness in -core, would certainly help here,
even if not fulfilling all HA goals.

Of course, in the case of a hard master crash, we still have to make sure it
won't restart on its own, and we have to have an external way to get a chosen
slave to become the master.

I'm even envisioning that -core could help STONITH projects by having something
like the recovery.conf file for the master to restart in not-up-to-date slave
mode. Whether we implement resyncing to the new master in -core or from
external scripts is another concern, but certainly -core could help here
(even if not in 8.4, of course).

I'm still thinking that this proposal has a place in the scheme of an
integrated HA solution and offers interesting bits.

Regards,
--
dim

Re: Automatic Client Failover

From
Hannu Krosing
Date:
On Tue, 2008-08-05 at 07:52 +0100, Simon Riggs wrote:
> On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> > > I think the proposal was for an extremely simple "works 75% of the time" 
> > > failover solution.  While I can see the attraction of that, the 
> > > consequences of having failover *not* work are pretty severe.
> > 
> > Exactly.  The point of failover (or any other HA feature) is to get
> > several nines worth of reliability.  "It usually works" is simply
> > not playing in the right league.
> 
> Why would you all presume that I haven't thought about the things you
> mention? Where did I say "...and this would be the only feature required
> for full and correct HA failover." The post is specifically about Client
> Failover, as the title clearly states.

I guess having the title "Automatic Client Failover" suggests to most
readers that you are trying to solve the client side separately from the
server.

> Your comments were illogical anyway, since if it was so bad a technique
> then it would not work for pgpool either, since it is also a client. If
> pgpool can do this, why can't another client? Why can't *all* clients?

IIRC pgpool was itself a poor-man's replication solution, so it _is_ the
point of doing failover.

> With correctly configured other components the primary will shut down if
> it is no longer the boss. The client will then be disconnected. If it
> switches to its secondary connection, we can have an option to read
> session_replication_role to ensure that this is set to origin. 

Probably this should not be an option, but a must.

maybe session_replication_role should be a DBA-defined function, so that
the same client failover mechanism can be applied to different
replication solutions, both server-built-in and external.

create function session_replication_role() 
returns enum('master','ro-slave','please-wait-coming-online','...')
$$
...


> This
> covers the case where the client has lost connection with primary,
> though it is still up, yet can reach the standby which has not changed
> state.
> 
> DB2, SQLServer and Oracle all provide this feature, BTW. We don't need
> to follow, but we should do that consciously. I'm comfortable with us
> deciding not to do it, if that is our considered judgement.

The main argument seemed to be, that it can't be "Automatic Client-ONLY
Failover."

--------------
Hannu







Re: Automatic Client Failover

From
Simon Riggs
Date:
On Tue, 2008-08-05 at 11:50 +0300, Hannu Krosing wrote:
> On Tue, 2008-08-05 at 07:52 +0100, Simon Riggs wrote:
> > On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:
> > > Josh Berkus <josh@agliodbs.com> writes:
> > > > I think the proposal was for an extremely simple "works 75% of the time" 
> > > > failover solution.  While I can see the attraction of that, the 
> > > > consequences of having failover *not* work are pretty severe.
> > > 
> > > Exactly.  The point of failover (or any other HA feature) is to get
> > > several nines worth of reliability.  "It usually works" is simply
> > > not playing in the right league.
> > 
> > Why would you all presume that I haven't thought about the things you
> > mention? Where did I say "...and this would be the only feature required
> > for full and correct HA failover." The post is specifically about Client
> > Failover, as the title clearly states.
> 
> I guess having the title "Automatic Client Failover" suggest to most
> readers, that you are trying to solve the client side separately from
> server. 

Yes, that's right: separately. Why would anybody presume I meant "and by
the way you can turn off all other HA measures not mentioned here"? Not
mentioning a topic means no change or no impact in that area, at least
on all other hackers threads.

> > Your comments were illogical anyway, since if it was so bad a technique
> > then it would not work for pgpool either, since it is also a client. If
> > pgpool can do this, why can't another client? Why can't *all* clients?
> 
> IIRC pgpool was itself a poor-mans replication solution, so it _is_ the
> point of doing failover.

Agreed. 

> > With correctly configured other components the primary will shut down if
> > it is no longer the boss. The client will then be disconnected. If it
> > switches to its secondary connection, we can have an option to read
> > session_replication_role to ensure that this is set to origin. 
> 
> Probably this should not be an option, but a must.

Perhaps, but some people doing read only queries don't really care which
one they are connected to. 

> maybe session_replication_role should be a DBA-defined function, so that
> the same client failover mechanism can be applied to different
> replication solutions, both server-built-in and external.
> 
> create function session_replication_role() 
> returns enum('master','ro-slave','please-wait-coming-online','...')
> $$
> ...

Maybe, trouble is "please wait coming online" is the message a Hot
Standby would give also. Happy to list out all the states so we can make
this work for everyone.

> > This
> > covers the case where the client has lost connection with primary,
> > though it is still up, yet can reach the standby which has not changed
> > state.
> > 
> > DB2, SQLServer and Oracle all provide this feature, BTW. We don't need
> > to follow, but we should do that consciously. I'm comfortable with us
> > deciding not to do it, if that is our considered judgement.
> 
> The main argument seemed to be, that it can't be "Automatic Client-ONLY
> Failover."

No argument. Never was. It can't be. 

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Tom Lane wrote:
> Huh?  The pgpool is on the server, not on the client side.

Not necessarily. Having pgpool on the client side works just as well.

> There is one really bad consequence of the oversimplified failover
> design that Simon proposes, which is that clients might try to fail over
> for reasons other than a primary server failure.

Why is that? It's just fine for a client to (re)connect to another 
server due to a fluky connection to the current server. I had something 
pretty similar in mind for Postgres-R. (Except that we should definitely 
allow to specify more than just a primary and a secondary server.)
> (Think network partition.)

Uh... well, yeah, of course the servers themselves need to exchange 
their state and make sure they only accept clients if they are up and 
running (as seen by the cluster). That's what the 'view' of a GCS is all 
about. Or STONITH, for that matter.
> You really want any such behavior to be managed centrally, IMHO.

Controlling that client behavior reliably would involve using multiple 
(at least N+1) connections to different servers, so you can control the 
client even if N of the servers fail. That's certainly more complex than 
what Simon proposed.

Speaking in terms of orthogonality, client failover is orthogonal to the 
(cluster-wide) server state management. Which in turn is orthogonal to 
how the nodes replicate data. (Modulo some side effects like nodes 
lagging behind for async replication...)

Regards

Markus Wanner



Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Greg Stark wrote:
> a cwnrallu

What is that?

Regards

Markus Wanner


Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Simon Riggs wrote:
> On Tue, 2008-08-05 at 11:50 +0300, Hannu Krosing wrote:
>> I guess having the title "Automatic Client Failover" suggest to most
>> readers, that you are trying to solve the client side separately from
>> server. 
> 
> Yes, that's right: separately. Why would anybody presume I meant "and by
> the way you can turn off all other HA measures not mentioned here"? Not
> mentioning a topic means no change or no impact in that area, at least
> on all other hackers threads.

I think the pgbouncer-in-core idea caused some confusion here.

IMO the client failover method is very similar to what DNS round-robin setups do
for webservers: even if clients might failover 'automatically', you 
still have to maintain the server states (which servers do you list in 
the DNS?) and care about 'replication' of your site to the webservers.

Regards

Markus Wanner



Re: Automatic Client Failover

From
Dimitri Fontaine
Date:
On Tuesday, August 05, 2008, Markus Wanner wrote:
>  > (Think network partition.)
>
> Uh... well, yeah, of course the servers themselves need to exchange
> their state and make sure they only accept clients if they are up and
> running (as seen by the cluster). That's what the 'view' of a GCS is all
> about. Or STONITH, for that matter.

That's where I'm thinking that some -core smartness would make this part
simpler, hence the confusion (sorry about that) on the thread.

If slave nodes were able to accept connections and redirect them to the master, the
client wouldn't need to care about connecting to master or slave, just to
connect to a live node.

So the proposal for Automatic Client Failover becomes much simpler.
--
dim

Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Dimitri Fontaine wrote:
> If slave nodes were able to accept connection and redirect them to master, the 
> client wouldn't need to care about connecting to master or slave, just to 
> connect to a live node.

I've thought about that as well, but think about it this way: to protect 
against N failing nodes, you need to forward *every* request through N 
living nodes, before actually hitting the node which processes the 
query. To me, that sounds like an awful lot of traffic within the 
cluster, which can easily be avoided with automatic client failover.

(Why are you stating, that only slaves need to redirect? What is 
happening in case of a master failure?)

> So the proposal for Automatic Client Failover becomes much more simpler.

I'm arguing it's the other way around: taking down a node of the cluster 
becomes much simpler with ACF, because clients automatically reconnect 
to another node themselves. The servers don't need to care.

Regards

Markus Wanner



Re: Automatic Client Failover

From
Dimitri Fontaine
Date:
On Tuesday, August 05, 2008, Markus Wanner wrote:
> I've thought about that as well, but think about it this way: to protect
> against N failing nodes, you need to forward *every* request through N
> living nodes, before actually hitting the node which processes the
> query. To me, that sounds like an awful lot of traffic within the
> cluster, which can easily be avoided with automatic client failover.
>
> (Why are you stating, that only slaves need to redirect? What is
> happening in case of a master failure?)

I'm thinking in terms of a single-master, multiple-slaves scenario...
In the single-master case, each slave only needs to know who the current master is
and whether it can itself process read-only queries (locally) or not.

You seem to be thinking in terms of multi-master, where the choosing of a
master node is a different concern, as a failing master does not imply slave
promotion.

> > So the proposal for Automatic Client Failover becomes much more simpler.
>
> I'm arguing it's the other way around: taking down a node of the cluster
> becomes much simpler with ACF, because clients automatically reconnect
> to another node themselves. The servers don't need to care.

Well, in the single-master case I'm not sure I agree, but in the case of a
multi-master configuration, it does seem that choosing some alive master is
a client task.

Now what about the multi-master multi-slave case? Does such a configuration make
sense?
If this ever becomes possible (2 active/active master servers, with some
slaves for long running queries, e.g.), then you may want the ACF-enabled
connection routine to choose to connect to any master or slave in the pool,
and have the slave itself be an ACF client to target some alive master.

Does this still make sense?
--
dim

Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Dimitri Fontaine wrote:
> I'm thinking in term of single master multiple slaves scenario...
> In single master case, each slave only needs to know who the current master is 
> and if itself can process read-only queries (locally) or not.

I don't think that's as trivial as you make it sound. I'd rather put it 
as: all nodes need to agree on exactly one master node at any given 
point in time. However, IMO that has nothing to do with automatic client 
failover.

> You seem to be thinking in term of multi-master, where the choosing of a 
> master node is a different concern, as a failing master does not imply slave 
> promotion.

I'm thinking about the problem which ACF tries to solve: connection
losses between the client and one of the servers (no matter if it's a 
master or a slave). As opposed to a traditional single-node database, 
there might be other servers available to connect to, once a client lost 
the current connection (and thus suspects the server behind that 
connection to have gone down).

Redirecting writing transactions from slaves to the master node solves 
another problem. Being able to 'rescue' such forwarded connections in 
case of a failure of the master is just a nice side effect. But it 
doesn't solve the problem of connection losses between a client and the 
master.

> Well, in the single master case I'm not sure to agree, but in the case of 
> multi master configuration, it well seems that choosing some alive master is 
> a client task.

Given a failure of the master server, how do you expect clients, which 
were connected to that master server, to "failover"? Some way or 
another, they need to be able to (re)connect to one of the slaves (which 
possibly turned into the new master by then).

Of course, you can load that burden on the application, and simply let 
that try to connect to another server upon connection failures. AFAIU 
Simon is proposing to put that logic into libpq. I see merits in that 
for multiple replication solutions and don't think anything exclusively 
server-sided could solve the same issue (because the client currently 
only has one connection to one server, which might fail at any time).

[ Please note that you still need the retry-loop in the application. It 
mainly saves having to care about the list of servers and server states 
in the app. ]

> Now what about multi-master multi-slave case? Does such a configuration have 
> sense?

Heh.. I'm glad you are asking. ;-)

IMO the only reason for master-slave replication is ease of 
implementation. It's certainly not something a sane end-user is ever
requesting by himself, because he needs that "feature". After all, not 
being able to run writing queries on certain nodes is not a feature, but 
a bare limitation.

In your question, you are implicitly assuming an existing multi-master 
implementation. Given my reasoning, this would make an additional 
master-slave replication pretty useless. Thus I'm claiming that such a 
configuration does not make sense.

> It this ever becomes possible (2 active/active masters servers, with some 
> slaves for long running queries, e.g.), then you may want the ACF-enabled 
> connection routine to choose to connect to any master or slave in the pool, 

You can do the same with multi-master replication, without any disadvantage.

> and have the slave be itself an AFC client to target some alive master.

Huh? ACF for master-slave communication? That implies that slaves are
connected to the master(s) via libpq, which I think is not such a good fit.

Regards

Markus Wanner


Re: Automatic Client Failover

From
Dimitri Fontaine
Date:
On Tuesday, August 05, 2008, Markus Wanner wrote:
> Dimitri Fontaine wrote:
> > I'm thinking in term of single master multiple slaves scenario...
> > In single master case, each slave only needs to know who the current
> > master is and if itself can process read-only queries (locally) or not.
>
> I don't think that's as trivial as you make it sound. I'd rather put it
> as: all nodes need to agree on exactly one master node at any given
> point in time. However, IMO that has nothing to do with automatic client
> failover.

Agreed, the idea is to help ACF by reducing what I understood was
its realm. It seems I'm misunderstanding the perimeter of the proposed
change...

And as for the apparent triviality, it resides only in the concept; when
you're confronted with nodes acting as master or slave depending on context
(session_replication_role) it becomes more interesting.

> I'm thinking about the problem which AFC tries to solve: connection
> losses between the client and one of the servers (no matter if it's a
> master or a slave). As opposed to a traditional single-node database,
> there might be other servers available to connect to, once a client lost
> the current connection (and thus suspects the server behind that
> connection to have gone down).
>
> Redirecting writing transactions from slaves to the master node solves
> another problem. Being able to 'rescue' such forwarded connections in
> case of a failure of the master is just a nice side effect. But it
> doesn't solve the problem of connection losses between a client and the
> master.

Agreed. It simply allows the ACF part not to bother with master(s)/slave(s)
topology, which still looks like a great win for me.

> Given a failure of the master server, how do you expect clients, which
> were connected to that master server, to "failover"? Some way or
> another, they need to be able to (re)connect to one of the slaves (which
> possibly turned into the new master by then).

Yes, you still need ACF, I'm sure I never wanted to say anything against this.

> IMO the only reason for master-slave replication is ease of
> implementation. It's certainly not something a sane end-users is ever
> requesting by himself, because he needs that "feature". After all, not
> being able to run writing queries on certain nodes is not a feature, but
> a bare limitation.

I'm not agreeing here.
I have replication needs where some data are only to be edited by an admin
backoffice, then replicated to servers. Those servers also write data (logs)
which are to be sent to the main server (now a slave) which will compute
stats on-the-fly (trigger based at replication receiving).

Now, this configuration needs to be resistant to network failure of any node,
central one included. So I don't want synchronous replication, thanks. And I
don't want multi-master either, as I WANT to forbid central to edit data from
the servers, and to forbid servers to edit data coming from the backoffice.

Now, I certainly would appreciate having the central server not being a SPOF
by having two masters both active at any time.

Of course, if I want HA, whatever features and failure autodetection
PostgreSQL gives me, I still need ACF. And if I get master/slave instead of
master/master, I need STONITH and heartbeat or equivalent.
I was just trying to propose ideas for having those external parts as easy as
possible to get right with whatever integrated solution comes from -core.

> In your question, you are implicitly assuming an existing multi-master
> implementation. Given my reasoning, this would make an additional
> master-slave replication pretty useless. Thus I'm claiming that such a
> configuration does not make sense.

I disagree here, see above.

> Huh? AFC for master-slave communication? That implies that slaves are
> connected to the master(s) via libpq, which I think is not such a good fit.

I'm using londiste (from Skytools), a master/slaves replication solution in
python. I'm not sure whether the psycopg component is using libpq or
implementing the fe protocol itself, but it seems to me in any case it would
be a candidate to benefit from Simon's proposal.

Regards,
--
dim

Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Dimitri Fontaine wrote:
>> Redirecting writing transactions from slaves to the master node solves
>> another problem. Being able to 'rescue' such forwarded connections in
>> case of a failure of the master is just a nice side effect. But it
>> doesn't solve the problem of connection losses between a client and the
>> master.
> 
> Agreed. It simply allows the ACF part not to bother with master(s) slave(s) 
> topology, which still looks as a great win for me.

Hm.. yeah, for master-slave replication I'm slowly beginning to see 
merit in it. However, given the lacking use of master-slave...

> Yes, you still need ACF, I'm sure I never wanted to say anything against this.

Ah, okay. I thought you were proposing an alternative.

>> IMO the only reason for master-slave replication is ease of
>> implementation. It's certainly not something a sane end-users is ever
>> requesting by himself, because he needs that "feature". After all, not
>> being able to run writing queries on certain nodes is not a feature, but
>> a bare limitation.
> 
> I'm not agreeing here.

Somehow, I just knew it..  ;-)

> I have replication needs where some data are only yo be edited by an admin 
> backoffice, then replicated to servers. Those servers also write data (logs) 
> which are to be sent to the main server (now a slave) which will compute 
> stats on-the-fly (trigger based at replication receiving).

Sure, you can currently do that because there exist master-slave 
replication solutions which can do that. And that's perfectly fine.

Comparing that with concepts of an inexistent multi-master replication 
solution is not fair by definition.




> 
> Now, this configuration needs to be resistant to network failure of any node, 
> central one included. So I don't want synchronous replication, thanks. And I 
> don't want multi-master either, as I WANT to forbid central to edit data from 
> the servers, and to forbid servers to edit data coming from the backoffice.
> 
> Now, I certainly would appreciate having the central server not being a SPOF 
> by having two masters both active at any time.
> 
> Of course, if I want HA, whatever features and failure autodetection 
> PostgreSQL gives me, I still need ACF. And if I get master/slave instead of 
> master/master, I need STONITH and hearbeat or equivalent.
> I was just trying to propose ideas for having those external part as easy as 
> possible to get right with whatever integrated solution comes from -core.
> 
>> In your question, you are implicitly assuming an existing multi-master
>> implementation. Given my reasoning, this would make an additional
>> master-slave replication pretty useless. Thus I'm claiming that such a
>> configuration does not make sense.
> 
> I disagree here, see above.
> 
>> Huh? AFC for master-slave communication? That implies that slaves are
>> connected to the master(s) via libpq, which I think is not such a good fit.
> 
> I'm using londiste (from Skytools), a master/slaves replication solution in 
> python. I'm not sure whether the psycopg component is using libpq or 
> implementing the fe protocol itself, but it seems to me in any case it would 
> be a candidate to benefit from Simon's proposal.
> 
> Regards,



Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

(sorry... I'm typing too fast and hitting the wrong keys... continuing 
the previous mail now...)

Dimitri Fontaine wrote:
> Now, this configuration needs to be resistant to network failure of any node, 

Yeah, increasing availability is the primary purpose of doing replication.

> central one included. So I don't want synchronous replication, thanks.

I do not understand that reasoning. Synchronous replication is
certainly *more* resilient to network failures, as it does *not* lose
any data on failover.

However, you are speaking about "logs" and "stats". That certainly 
sounds like data you can afford to lose during a failover, because you
can easily recreate it. And as asynchronous replication is faster, 
that's why you should prefer async replication here, IMO.

> And I 
> don't want multi-master either, as I WANT to forbid central to edit data from 
> the servers, and to forbid servers to edit data coming from the backoffice.

Well, I'd say you are (ab)using replication as an access controlling 
method. That's not quite what it's made for, but you can certainly use 
it that way.

As I understand master-slave replication, a slave should be able to take 
over from the master in case that one fails. In that case, the slave 
must suddenly become writable and your access controlling is void.

In case you are preventing that, you are using replication only to 
transfer data and not to increase availability. That's fine, but it's 
quite a different use case. And something I admittedly haven't thought 
about. Thanks for pointing me to this use case of replication.

We could probably combine Postgres-R (for multi-master replication) with 
londiste (to transfer selected data) asynchronously to other nodes.

> Of course, if I want HA, whatever features and failure autodetection 
> PostgreSQL gives me, I still need ACF.

Agreed.

> And if I get master/slave instead of 
> master/master, I need STONITH and hearbeat or equivalent.

A two-node setup with STONITH has the disadvantage, that you need manual 
intervention to bring up a crashed node again. (To remove the bullet 
from inside its head).

I'm thus recommending to use at least three nodes for any kind of 
high-availability setup. Even if the third one only serves as a quorum 
and doesn't hold a replica of the data. It allows automation of node 
recovery, which does not only ease administration, but eliminates a 
possible source of errors.

> I was just trying to propose ideas for having those external part as easy as 
> possible to get right with whatever integrated solution comes from -core.

Yeah, that'd be great.

However, ISTM that it's not quite clear, yet, what solution will get 
integrated into -core.

>> Huh? AFC for master-slave communication? That implies that slaves are
>> connected to the master(s) via libpq, which I think is not such a good fit.
> 
> I'm using londiste (from Skytools), a master/slaves replication solution in 
> python. I'm not sure whether the psycopg component is using libpq or 
> implementing the fe protocol itself, but it seems to me in any case it would 
> be a candidate to benefit from Simon's proposal.

Hm.. yeah, that might be true. On the other hand, the servers in the 
cluster need to keep track of their state anyway, so there's not that 
much to be gained here.

Regards

Markus Wanner



Re: Automatic Client Failover

From
Dimitri Fontaine
Date:
On Tuesday, August 05, 2008, Markus Wanner wrote:
> I do not understanding that reasoning. Synchronous replication is
> certainly *more* resilient to network failures, as it does *not* loose
> any data on failover.
>
> However, you are speaking about "logs" and "stats". That certainly
> sounds like data you can afford to loose during a failover, because you
> can easily recreate it. And as asynchronous replication is faster,
> that's why you should prefer async replication here, IMO.

That's not exactly this, I want to preserve any of the database servers from
erroring whenever a network failure happens. Sync is not an answer here.

> Well, I'd say you are (ab)using replication as an access controlling
> method. That's not quite what it's made for, but you can certainly use
> it that way.

The fact that I need those controls led me to this replication design.

> As I understand master-slave replication, a slave should be able to take
> over from the master in case that one fails. In that case, the slave
> must suddenly become writable and your access controlling is void.
>
> In case you are preventing that, you are using replication only to
> transfer data and not to increase availability. That's fine, but it's
> quite a different use case. And something I admittedly haven't thought
> about. Thanks for pointing me to this use case of replication.

That's exactly it: I'm not using replication as a way for a slave to take over
the master in case of failure, but to spread data availability where I need
it, and without requiring a central server to be accessible (SPOF).

> Hm.. yeah, that might be true. On the other hand, the servers in the
> cluster need to keep track of their state anyway, so there's not that
> much to be gained here.

In the case of a slave replicated node which is there to replace the master
when it goes offline, yes, the slave needs to know it's a slave. PITR-based
solutions achieve this by having the slave eternally in recovery mode; by the
time it passes this step it's a master.
Slony, AFAIUI, will soon be using the session_replication_role GUC to decide
about its "state". Here it's more interresting since a single server can acts
as a master for some data and as a slave for some others, and the triggers to
run are not the same depending on the role.

Of course, with multi-master replication, the client can INSERT to any member
of the cluster and the same triggers will get run; you're not after disabling
replication triggers if you're acting as a slave.
don't yet have a multi-master production setup.

I still hope it'll get on the radar sooner than later, though ;)
--
dim

Re: Automatic Client Failover

From
Markus Wanner
Date:
Hi,

Dimitri Fontaine wrote:
> That's not exactly this, I want to preserve any of the database servers from 
> erroring whenever a network failure happens. Sync is not an answer here.

So, you want your base data to remain readable on the slaves, even if they
lose connection to the master, right?

However, this is not dependent on any timing property of replicating
writing transactions (i.e. sync vs async). Instead, it's very well
possible for any kind of replication solution to continue allowing
read-only access to nodes which lost connection to the primary or to the 
majority of the cluster. Such a node will fall behind with its snapshot 
of the data, if the primary continues writing.

> That's exactly it: I'm not using replication as a way for a slave to takeover 
> the master in case of failure, but to spread data availability where I need 
> it, and without requiring a central server to be accessible (SPOF).

I understand. So this is increasing "read-only availability", sort of, 
which is what's possible with today's tools. I'm still claiming that you 
rather want to increase overall availability, once that's possible. But 
arguing about inexistent solutions is pretty pointless.

> But as you mention it, we 
> don't yet have a multi-master production setup.
> 
> I still hope it'll get on the radar sooner than later, though ;)

Well, it's certainly on *my* radar ;-)

Regards

Markus Wanner


Re: Automatic Client Failover

From
Bruce Momjian
Date:
Simon Riggs wrote:
> When primary server fails, it would be good if the clients connected to
> the primary knew to reconnect to the standby servers automatically.
> 
> We might want to specify that centrally and then send the redirection
> address to the client when it connects. Sounds like lots of work though.
> 
> Seems fairly straightforward to specify a standby connection service at
> client level: .pgreconnect, or pgreconnect.conf
> No config, then option not used.
> 
> Would work with various forms of replication.
> 
> Implementation would be to make PQreset() try secondary connection if
> the primary one fails to reset. Of course you can program this manually,
> but the feature is that you wouldn't need to, nor would you need to
> request changes to 27 different interfaces either.

I assumed share/pg_service.conf would help in this regard;  place the
file on a central server and modify that so everyone connects to another
server. Perhaps we could even add round-robin functionality to that.
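
For illustration, such a pg_service.conf could carry one entry per candidate
server (host names invented):

[appdb]
host=db-primary.example.com
port=5432
dbname=appdb

[appdb_standby]
host=db-standby.example.com
port=5432
dbname=appdb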

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Automatic Client Failover

From
Simon Riggs
Date:
On Fri, 2008-08-15 at 12:24 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > When primary server fails, it would be good if the clients connected to
> > the primary knew to reconnect to the standby servers automatically.
> > 
> > We might want to specify that centrally and then send the redirection
> > address to the client when it connects. Sounds like lots of work though.
> > 
> > Seems fairly straightforward to specify a standby connection service at
> > client level: .pgreconnect, or pgreconnect.conf
> > No config, then option not used.
> > 
> > Would work with various forms of replication.
> > 
> > Implementation would be to make PQreset() try secondary connection if
> > the primary one fails to reset. Of course you can program this manually,
> > but the feature is that you wouldn't need to, nor would you need to
> > request changes to 27 different interfaces either.
> 
> I assumed share/pg_service.conf would help in this regard;  place the
> file on a central server and modify that so everyone connects to another
> server. Perhaps we could even add round-robin functionality to that.

I do want to keep it as simple as possible, but we do need a way that
will work without reconfiguration at the time of danger. It needs to be
preconfigured and tested, then change controlled so we all know it
works.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Automatic Client Failover

From
Bruce Momjian
Date:
Simon Riggs wrote:
> > > Implementation would be to make PQreset() try secondary connection if
> > > the primary one fails to reset. Of course you can program this manually,
> > > but the feature is that you wouldn't need to, nor would you need to
> > > request changes to 27 different interfaces either.
> > 
> > I assumed share/pg_service.conf would help in this regard;  place the
> > file on a central server and modify that so everyone connects to another
> > server. Perhaps we could even add round-robin functionality to that.
> 
> I do want to keep it as simple as possible, but we do need a way that
> will work without reconfiguration at the time of danger. It needs to be
> preconfigured and tested, then change controlled so we all know it
> works.

OK, so using share/pg_service.conf as an implementation example, how
would this work?  The application supplies multiple service names and
libpq tries attaching to each one in the list until one works?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Automatic Client Failover

From
Simon Riggs
Date:
On Fri, 2008-08-15 at 14:25 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > > > Implementation would be to make PQreset() try secondary connection if
> > > > the primary one fails to reset. Of course you can program this manually,
> > > > but the feature is that you wouldn't need to, nor would you need to
> > > > request changes to 27 different interfaces either.
> > > 
> > > I assumed share/pg_service.conf would help in this regard;  place the
> > > file on a central server and modify that so everyone connects to another
> > > server. Perhaps we could even add round-robin functionality to that.
> > 
> > I do want to keep it as simple as possible, but we do need a way that
> > will work without reconfiguration at the time of danger. It needs to be
> > preconfigured and tested, then change controlled so we all know it
> > works.
> 
> OK, so using share/pg_service.conf as an implementation example, how
> would this work?  The application supplies multiple service names and
> libpq tries attaching to each one in the list until one works?

This could work in one of two ways (maybe more):
* supply a group for each service. If the main service goes down, try the other
services in your group
* supply a secondary service for each main service. If the primary goes down
we look at the secondary service

It might also be possible to daisy-chain the retries, so after trying
the secondary/others we move onto the secondary's secondary in much the
same way that a telephone hunt group works.
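
A minimal sketch of the client-side loop both variants boil down to, assuming
hypothetical service names defined in pg_service.conf:

#include <stdio.h>
#include <libpq-fe.h>

/* Try each configured service name in order until one accepts the
 * connection; "appdb" and "appdb_standby" would be pg_service.conf entries. */
static PGconn *
connect_hunt_group(const char *const services[])
{
    int i;

    for (i = 0; services[i] != NULL; i++)
    {
        char    conninfo[64];
        PGconn *conn;

        snprintf(conninfo, sizeof(conninfo), "service=%s", services[i]);
        conn = PQconnectdb(conninfo);
        if (PQstatus(conn) == CONNECTION_OK)
            return conn;                    /* connected: stop hunting */

        fprintf(stderr, "%s unavailable: %s", services[i],
                PQerrorMessage(conn));
        PQfinish(conn);
    }
    return NULL;                            /* every service failed */
}

/* usage:
 *   const char *hunt[] = { "appdb", "appdb_standby", NULL };
 *   PGconn *conn = connect_hunt_group(hunt);
 */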

This was suggested as a complementary feature alongside other
server-side features I'm working on. I'm not thinking of doing this
myself, since I know much less about the client side code than I'd need
to do this in the time available. Also, I'm not sure whether it is
unpopular or simply misunderstood.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support