Thread: Postgres Replication
We have been researching replication for several months now, and I have some opinions to share with the community for feedback, discussion, and/or participation. Our goal is to get a replication solution for PostgreSQL that will meet most needs of users and applications alike (mission impossible theme here :).

My research work, along with that of other contributors, has been collected and presented here: http://www.greatbridge.org/genpage?replication_top
If there is something missing, especially PostgreSQL-related work, I would like to know about it, and my apologies to anyone who got left off the list. This work is ongoing and doesn't draw a conclusion, which IMHO should be left up to the user, but I'm offering my opinions to spur discussion and/or feedback from this list, and will try not to offend anyone.

Here's my opinion: of the approaches we've surveyed, the most promising one is the Postgres-R project from the Information and Communication Systems Group, ETH Zurich, Switzerland, originally produced by Bettina Kemme, Gustavo Alonso, and others. Although Postgres-R is a synchronous approach, I believe it is the closest to the goal mentioned above. Here is a summary of the advantages.

1) Postgres-R is built on the PostgreSQL-6.4.2 code base. The replication functionality is an optional parameter, so there is insignificant overhead for non-replication situations. The replication and communication managers are the two new modules added to the PostgreSQL code base.

2) The replication manager's main function is controlling the replication protocol via a message-handling process. It receives messages from the local and remote backends and forwards write sets and decision messages via the communication manager to the other servers. The replication manager controls all the transactions running on the local server by keeping track of their states, including which protocol phase (read, send, lock, or write) each transaction is in.
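The phase bookkeeping described in point 2 can be sketched roughly as follows. This is a minimal illustration, not code from Postgres-R; all names are invented:

```python
from enum import Enum, auto

class Phase(Enum):
    READ = auto()   # local reads, no coordination yet
    SEND = auto()   # write set broadcast to the group
    LOCK = auto()   # locks acquired in the total order
    WRITE = auto()  # changes applied and committed

class ReplicationManager:
    """Tracks the protocol phase of every transaction on the local server."""

    ORDER = [Phase.READ, Phase.SEND, Phase.LOCK, Phase.WRITE]

    def __init__(self):
        self.states = {}  # transaction id -> Phase

    def begin(self, txn_id):
        self.states[txn_id] = Phase.READ

    def advance(self, txn_id):
        current = self.states[txn_id]
        self.states[txn_id] = self.ORDER[self.ORDER.index(current) + 1]

    def phase(self, txn_id):
        return self.states[txn_id]
```

The point of tracking the phase is that the manager must know, for each transaction, whether an incoming write set or decision message may be delivered to it yet.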
The replication manager maintains a two-way channel, implemented as buffered sockets, to each backend.

3) The main task of the communication manager is to provide a simple socket-based interface between the replication manager and the group communication system (currently Ensemble). The communication system is a cluster of servers connected via the communication manager. The replication manager also maintains three one-way channels to the communication system: a broadcast channel to send messages, a total-order channel to receive totally ordered write sets, and a no-order channel to listen for decision messages from the communication system. Decision messages can be received at any time, whereas the reception of totally ordered write sets can be blocked in certain phases.

4) Based on a two-phase-locking approach, all deadlock situations are local, detectable by the Postgres-R code base, and aborted.

5) The write-set messages used to send database changes to other servers can contain either the SQL statements or the actual tuples changed. This is a parameter based on the number of tuples changed by a transaction. While sending the tuple changes reduces the overhead of query parse, plan, and execution on the receiving servers, there is a negative effect in sending a large write set across the network.

6) Postgres-R uses a synchronous approach that keeps the data on all sites consistent and provides serializability. The user does not have to bother with conflict resolution, and receives the same correctness and consistency as a centralized system.

7) Postgres-R could be part of a good fault-resilient and load-distribution solution. It is peer-to-peer based and incurs low overhead propagating updates to the other cluster members. All replicated databases process queries locally.

8) Compared to other synchronous replication strategies (e.g., standard distributed 2-phase-locking + 2-phase-commit), Postgres-R has much better performance using 2-phase-locking.
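The encoding choice in point 5 is essentially a threshold decision. A minimal sketch, with an invented function name and an illustrative threshold value (the real parameter and its default are not given above):

```python
def encode_write_set(statements, changed_tuples, tuple_threshold=100):
    """Choose how to ship a transaction's changes to the other servers.

    Sending the changed tuples spares the remote servers the
    parse/plan/execute cost, but for a large change set the SQL text
    is cheaper to send across the network.  The threshold is the
    tunable parameter the text mentions; 100 is purely illustrative.
    """
    if len(changed_tuples) <= tuple_threshold:
        return ("tuples", changed_tuples)
    return ("sql", statements)
```

A single-statement transaction touching thousands of rows would ship as SQL text, while a typical small update would ship its tuples directly.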
There are some issues that are not currently addressed by Postgres-R, but some enhancements made to PostgreSQL since the 6.4.2 tree are very favorable to addressing these shortcomings.

1) The addition of WAL in 7.1 provides the information needed for recovering failed/off-line servers; currently, all the servers would have to be stopped, and a copy would be used to get all the servers synchronized before starting again.

2) Being synchronous, Postgres-R would not be a good solution for off-line/WAN scenarios where asynchronous replication is required. There are some theories on this issue which involve servers connecting to and disconnecting from the cluster.

3) As in any serialized synchronous approach, there is a change in the flow of execution of a transaction; while most of these changes can be handled by calling newly developed functions at certain points, synchronous replica control is tightly coupled with concurrency control. Hence, especially in PostgreSQL 7.2, some parts of the concurrency control (MVCC) might have to be adjusted. This can lead to slightly more complicated maintenance than a system that does not change the backend.

4) Partial replication is not addressed.

Any feedback on this post will be appreciated.

Thanks,
Darren
On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

>[...]
>2) The replication manager's main function is controlling the
>replication protocol via a message-handling process. It receives
>messages from the local and remote backends and forwards write
>sets and decision messages via the communication manager to the
>other servers.
>The replication manager controls all the transactions
>running on the local server by keeping track of their states, including
>which protocol phase (read, send, lock, or write) each transaction is
>in. The replication manager maintains a two-way channel,
>implemented as buffered sockets, to each backend.

What does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...

>[...]
>4) Based on a two-phase-locking approach, all deadlock situations
>are local, detectable by the Postgres-R code base, and aborted.

Does this imply locking over different servers? That would mean a
grinding halt when a network outage occurs...

>[...]
>8) Compared to other synchronous replication strategies (e.g., standard
>distributed 2-phase-locking + 2-phase-commit), Postgres-R has much
>better performance using 2-phase-locking.

Coming from a Sybase background, I have some experience with
replication. The way it works in Sybase Replication Server is as
follows:

- for each replicated database, there is a "log reader" process that
reads the WAL and captures only *committed transactions* to the
replication server (it does not make much sense to replicate other
things IMHO :-)

- the replication server stores incoming data in a queue ("stable
device") until it is sure it has reached its final destination

- a replication server can send data to another replication server in
a compact (read: WAN-friendly) way. A chain of replication servers can
be made, depending on network architecture

- the final replication server makes an almost standard client
connection to the target database and translates the compact
transactions back to SQL statements. By using masks, extra
functionality can be built in.
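The pipeline just described (log reader, stable queue, then delivery to the target) might be sketched as follows. The names are invented for illustration; this is not Sybase's actual interface:

```python
import queue

def log_reader(wal_records):
    """Forward only *committed* transactions, as the log reader does."""
    return [r for r in wal_records if r.get("committed")]

class StableQueue:
    """Stand-in for the replication server's 'stable device': changes
    are held until the final destination has received them."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, txn):
        self._q.put(txn)

    def drain(self, apply_txn, target_online):
        """Deliver queued transactions only while the target is
        reachable; anything undelivered stays queued for later."""
        delivered = 0
        while target_online and not self._q.empty():
            apply_txn(self._q.get())
            delivered += 1
        return delivered
```

The queue is what makes the design tolerant of outages: a down network stops delivery but loses nothing.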
This kind of architecture has several advantages:

- only committed transactions are replicated, which saves overhead

- it does not have very much impact on performance of the source
server (apart from reading the WAL)

- since every replication server has a stable device, data is stored
when the network is down and nothing gets lost (nor stops performing)

- because only the log reader and the connection from the final
replication server are RDBMS-specific, it is possible to replicate
from MS to Oracle using a Sybase replication server (or different
versions, etc.)

I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible,
but it does require some funky setup and maintenance. (But it is
better than letting offices on different continents work on the same
database :-)

just my 2 EURO cts :-)
--
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________
On Mon, 11 Jun 2001, Reinoud van Leeuwen wrote:

> what does "manager controls all the transactions" mean? I hope it does
> *not* mean that a bug in the manager would cause transactions not to
> commit...

Well, yeah, it does. Bugs are a fact of life. :)

> >4) Based on a two-phase-locking approach, all deadlock situations
> >are local, detectable by the Postgres-R code base, and aborted.
>
> Does this imply locking over different servers? That would mean a
> grinding halt when a network outage occurs...

Don't know, but see below.

> Coming from a Sybase background, I have some experience with
> replication. The way it works in Sybase Replication Server is as
> follows:
> [...]
> This kind of architecture has several advantages:
> - only committed transactions are replicated which saves overhead
> [...]
>
> I do not know how much of this is patented or copyrighted, but the
> architecture seems elegant and robust to me. I have done
> implementations of bi-directional replication too. It *is* possible
> but does require some funky setup and maintenance.

Yes, the above architecture is what almost every vendor of replication
software uses. And I'm sure if you worked much with Sybase, you hate the
garbage that their repserver is :).

The architectures of Postgres-R and repserver are fundamentally
different for a good reason: repserver only wants to replicate committed
transactions, while Postgres-R is more of a 'clustering' solution
(albeit they don't say this word) and is capable of doing much more than
a simple repserver. I.e., you can safely put half of your clients on the
second server in a replicated Postgres-R cluster without worrying that a
conflict (or a weird locking situation) may occur. Try that with Sybase:
it is fundamentally designed for one-way replication, and the fact that
you can do one-way replication in both directions doesn't mean it's safe
to do so!

I'm not sure how Postgres-R handles network problems. To be useful, a
good replication solution must have an option of "no network -> no
updates" as well as "no network -> queue updates and send them later".
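Those two failure-handling options could be expressed as something like the following. This is a hypothetical sketch with invented names, not the behavior of any existing replicator:

```python
def handle_update(update, network_up, policy, pending):
    """Apply one of the two behaviours discussed above when the
    cluster link is down: refuse the update outright, or queue it
    for delivery once the link returns.

    'policy' is one of "no-updates" or "queue" (invented names).
    """
    if network_up:
        return "replicated"
    if policy == "no-updates":
        raise ConnectionError("cluster unreachable; update rejected")
    pending.append(update)  # policy == "queue"
    return "queued"
```

The "no-updates" policy preserves strict consistency at the cost of availability; the "queue" policy keeps accepting writes but reintroduces the conflict problem when the queues are replayed.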
However, it is far easier to add queuing to a correct 'eager locking'
database than it is to add proper locking to a queue-based replicator.

-alex
Thanks for the feedback. I'll try to address both your issues here.

>> what does "manager controls all the transactions" mean?

The replication manager controls the transactions by serializing the
write-set messages. This ensures all transactions are committed in the
same order on each server, so bugs here are not allowed ;-)

>>> 4) Based on a two-phase-locking approach, all deadlock situations
>>> are local, detectable by the Postgres-R code base, and aborted.
>>
>> Does this imply locking over different servers? That would mean a
>> grinding halt when a network outage occurs...

There is a branch of the Postgres-R code that has some failure detection
implemented, so we will have to merge this functionality with the
version of Postgres-R we have and test this issue. I'll let you know the
results.

>> - the replication server stores incoming data in a queue ("stable
>> device"), until it is sure it has reached its final destination

I like this idea for recovering servers that have been down for a short
period of time, using WAL to recover transactions missed during the
outage.

>> This kind of architecture has several advantages:
>> - only committed transactions are replicated which saves overhead
>> [...]
There are some issues with the "log reader" approach:

1) The databases are not synchronized until the log reader completes its
processing.

2) I'm not sure about Sybase, but the log reader sends SQL statements to
the other servers, which are then parsed, planned, and executed. This
overhead could be avoided if only the tuple changes were replicated.

3) It works fine for read-only situations, but peer-to-peer applications
using this approach must be designed with a conflict resolution scheme.

Don't get me wrong, I believe we can learn from the replication
techniques used by commercial databases like Sybase, and try to
implement the good ones in PostgreSQL. Postgres-R is a synchronous
approach which outperforms the traditional approaches to synchronous
replication. Being based on PostgreSQL-6.4.2, getting this approach into
the 7.2 tree might be better than reinventing the wheel.

Thanks again,
Darren