Re: Postgres Replication - Mailing list pgsql-hackers

From reinoud@xs4all.nl (Reinoud van Leeuwen)
Subject Re: Postgres Replication
Date
Msg-id 3b403d96.562404297@192.168.1.10
Whole thread Raw
In response to Postgres Replication  (Darren Johnson <djohnson@greatbridge.com>)
Responses Re: Postgres Replication  (Alex Pilosov <alex@pilosoft.com>)
List pgsql-hackers
On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

>We have been researching replication for several months now, and
>I have some opinions to share to the community for feedback,
>discussion, and/or participation. Our goal is to get a replication
>solution for PostgreSQL that will meet most needs of users
>and applications alike (mission impossible theme here :). 
>
>My research work along with others contributors has been collected
>and presented here http://www.greatbridge.org/genpage?replication_top
>If there is something missing, especially PostgreSQL related
>work, I would like to know about it, and my apologies to any
>one who got left off the list. This work is ongoing and doesn't
>draw a conclusion, which IMHO should be left up to the user,
>but I'm offering my opinions to spur discussion and/or feed back
>from this list, and try not to offend any one.
>
>Here's my opinion: of the approaches we've surveyed, the most
>promising one is the Postgres-R project from the Information and
>Communication Systems Group, ETH  in Zurich, Switzerland, originally 
>produced by Bettina Kemme, Gustavo Alonso, and others.  Although 
>Postgres-R is a synchronous approach, I believe it is the closest to 
>the goal mentioned above. Here is an abstract of the advantages.
>
>1) Postgres-R is built on the PostgreSQL-6.4.2 code base.  The 
>replication 
>functionality is an optional parameter, so there will be insignificant 
>overhead for non replication situations. The replication and 
>communication
>managers are the two new modules added to the PostgreSQL code base.
>
>2) The replication manager's main function is controlling the
>replication protocol via a message handling process. It receives
>messages from the local and remote backends and forwards write
>sets and decision messages via the communication manager to the
>other servers. The replication manager controls all the transactions 
>running on the local server by keeping track of the states, including 
>which protocol phase (read, send, lock, or write) the transaction is
>in. The replication manager maintains a two way channel
>implemented as buffered sockets to each backend.

what does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...

>
>3) The main task of the communication manager is to provide simple
>socket based interface between the replication manager and the
>group communication system (currently Ensemble). The
>communication system is a cluster of servers connected via
>the communication manager.  The replication manager also maintains
>three one-way channels to the communication system: a broadcast
>channel to send messages, a total-order channel to receive
>totally orders write sets, and a no-order channel to listen for
>decision messages from the communication system. Decision
>messages can be received at any time where the reception of
>totally ordered write sets can be blocked in certain phases.
>
>4) Based on a two phase locking approach, all dead lock situations
>are local and detectable by Postgres-R code base, and aborted.

Does this imply locking over different servers? That would mean a
grinding halt when a network outage occurs...

>5) The write set messages used to send database changes to other
>servers, can use either the SQL statements or the actual tuples
>changed. This is a parameter based on number of tuples changed
>by a transaction. While sending the tuple changes reduces
>overhead in query parse, plan and execution, there is a negative
>effect in sending a large write set across the network.
>
>6) Postgres-R uses a synchronous approach that keeps the data on 
>all sites consistent and provides serializability. The user does not 
>have to bother with conflict resolution, and receives the same 
>correctness and consistency of a centralized system.
>
>7) Postgres-R could be part of a good fault-resilient and load 
>distribution 
>solution.  It is peer-to-peer based and incurs low overhead propagating 
>updates to the other cluster members.  All replicated databases locally 
>process queries.
>
>8) Compared to other synchronous replication strategies (e.g., standard 
>distributed 2-phase-locking + 2-phase-commit), Postgres-R has much 
>better performance using 2-phase-locking.

Coming from a Sybase background I have some experience with
replication. The way it works in Sybase Replication server is as
follows:
- for each replicated database, there is a "log reader" process that
reads the WAL and captures only *committed transactions* to the
replication server. (it does not make much sense to replicate other
things IMHO :-).
- the replication server stores incoming data in a que ("stable
device"), until it is sure it has reached its final destination

- a replication server can send data to another replication server in
a compact (read: WAN friendly) way. A chain of replication servers can
be made, depending on network architecture)

- the final replication server makes a almost standard client
connection to the target database and translates the compact
transactions back to SQL statements. By using masks, extra
functionality can be built in. 

This kind of architecture has several advantages:
- only committed transactions are replicated which saves overhead
- it does not have very much impact on performance of the source
server (apart from reading the WAL)
- since every replication server has a stable device, data is stored
when the network is down and nothing gets lost (nor stops performing)
- because only the log reader and the connection from the final
replication server are RDBMS specific, it is possible to replicate
from MS to Oracle using a Sybase replication server (or different
versions etc).

I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible
but does require some funky setup and maintenance. (but it is better
that letting offices on different continents working on the same
database :-)

just my 2 EURO cts  :-)


-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen       reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________


pgsql-hackers by date:

Previous
From: Alex Pilosov
Date:
Subject: Re: inet/cidr type comparisons
Next
From: Alex Pilosov
Date:
Subject: Re: Postgres Replication