Re: Replication Ideas - Mailing list pgsql-general

From	Chris Travers
Subject	Re: Replication Ideas
Date	August 25, 2003 17:06:41
Msg-id	3F4A420E.6090604@travelamericas.com
In response to	Re: Replication Ideas (Ron Johnson <ron.l.johnson@cox.net>)
Responses	Re: Replication Ideas (Ron Johnson <ron.l.johnson@cox.net>) Re: Replication Ideas (Alvaro Herrera <alvherre@dcc.uchile.cl>)
List	pgsql-general

Ron Johnson wrote:

>This is vaguely similar to Two Phase Commit, which is a sine qua
>non of distributed transactions, which is the s.q.n. of multi-master
>replication.
>
>
>

I may be wrong, but if I recall correctly, one of the problems with a
standard 2-phase commit is that if one server goes down, the other
masters cannot commit their transactions.  This would make a clustered
database server have a downtime equivalent to the total downtime of all
of its nodes.  This is a real problem.  Of course my understanding of
Two Phase Commit may be incorrect, in which case, I would appreciate it
if someone could point out where I am wrong.
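To put rough numbers on that worry, here is a small sketch.  It assumes
(my assumption, not something established in this thread) that every node
has the same availability and that nodes fail independently.  The
function names are made up for illustration.

```python
def all_nodes_up(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that can only commit when EVERY node is up
    (the strict two-phase-commit case described above)."""
    return node_availability ** nodes


def any_node_up(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that keeps working while at least ONE
    node is up (the hot-failover goal)."""
    return 1 - (1 - node_availability) ** nodes


if __name__ == "__main__":
    a, n = 0.99, 3  # hypothetical: three nodes, each up 99% of the time
    print(f"strict 2PC : {all_nodes_up(a, n):.6f}")
    print(f"failover   : {any_node_up(a, n):.6f}")
```

With small per-node downtime d, the strict case is down roughly n * d of
the time, which is the "total downtime of all of its nodes" effect.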

It had occurred to me that the issue was one of failure handling more
than one of concept.  I.e. the problem is how one node's failure is
handled rather than the fundamental structure of Two Phase Commit.  If a
single node fails, we don't want that to take down the whole cluster,
and I have actually revised my logic a bit more (to make it even
safer).  In this I assume that:

1:  General failures on any one node are rare
2:  A failure is more likely to prevent a transaction from being
committed than allow one to be committed.

This hot-failover solution must be transparent from the client's
perspective, i.e. the client should not have to choose a different
server should one go down and should not need to know when a server
comes back up.  This also means that we need to assume that a load
balancing solution can be a part of the clustering solution.  I would
assume that this would require a shared IP address for the public
interface of the server and a private communications channel where each
node has a separate IP address (similar to Microsoft's implementation of
Network Load Balancing).  Also, different transactions within a single
connection should be able to be handled by different nodes, so if one
node goes down, users don't have to reconnect.

So here is my suggested logic for high availability/load balanced
clustering:

1:  All nodes recognize each user connection and delegate transactions
rather than connections.

2:  At the beginning of a transaction, nodes decide who will take it.
Any operation which does not change the information or schema of the
database is handled exclusively on that node.  Other operations are
distributed across nodes.

3:  When the transaction is committed, the nodes "vote" on whether the
commitment of the transaction is valid. Majority rules, and the minority
must remove themselves from the cluster until they can synchronize their
databases with the existing masters.  If the vote is split 50/50 (e.g.
one node fails in a two-node cluster), success is considered more likely
to be valid than failure, and the node(s) which failed to commit the
transaction must remove themselves from the cluster until they can recover.
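
The voting rule in step 3 could be sketched roughly as follows.  This is
only an illustration of the rule as stated, not a worked-out protocol:
the function name and the node names are hypothetical, and it ignores how
votes are collected or how a removed node resynchronizes.

```python
from typing import Dict, List, Tuple


def resolve_commit(votes: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Decide the outcome of a commit vote.

    votes maps a node name to True if that node was able to commit the
    transaction locally, False otherwise.

    Returns (committed, removed), where removed lists the nodes that were
    on the losing side and must leave the cluster until they resync.
    """
    if not votes:
        raise ValueError("at least one node must vote")

    succeeded = [node for node, ok in votes.items() if ok]
    failed = [node for node, ok in votes.items() if not ok]

    # Majority rules; a 50/50 split is resolved in favour of success.
    committed = len(succeeded) >= len(failed)
    removed = failed if committed else succeeded
    return committed, sorted(removed)


if __name__ == "__main__":
    # Two-node cluster, one node fails: the commit stands and the
    # failing node drops out.
    print(resolve_commit({"node_a": True, "node_b": False}))
```

The tie-break follows assumption 2 above: a failure is more likely to
block a valid commit than to let an invalid one through.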

Best Wishes,
Chris Travers
