Thread: Postgres Replication
We have been researching replication for several months now, and I have some opinions to share with the community for feedback, discussion, and/or participation. Our goal is to get a replication solution for PostgreSQL that will meet most needs of users and applications alike (mission impossible theme here :).

My research work, along with that of other contributors, has been collected and presented here: http://www.greatbridge.org/genpage?replication_top
If there is something missing, especially PostgreSQL-related work, I would like to know about it, and my apologies to anyone who got left off the list. This work is ongoing and doesn't draw a conclusion, which IMHO should be left up to the user, but I'm offering my opinions to spur discussion and/or feedback from this list, and will try not to offend anyone.

Here's my opinion: of the approaches we've surveyed, the most promising one is the Postgres-R project from the Information and Communication Systems Group, ETH Zurich, Switzerland, originally produced by Bettina Kemme, Gustavo Alonso, and others. Although Postgres-R is a synchronous approach, I believe it is the closest to the goal mentioned above. Here is a summary of the advantages.

1) Postgres-R is built on the PostgreSQL-6.4.2 code base. The replication functionality is an optional parameter, so there is insignificant overhead for non-replication situations. The replication and communication managers are the two new modules added to the PostgreSQL code base.

2) The replication manager's main function is controlling the replication protocol via a message-handling process. It receives messages from the local and remote backends and forwards write sets and decision messages via the communication manager to the other servers. The replication manager controls all the transactions running on the local server by keeping track of their states, including which protocol phase (read, send, lock, or write) each transaction is in.
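The phase bookkeeping described in point 2 can be sketched roughly as follows. This is a minimal illustration, not code from Postgres-R; all names are invented:

```python
from enum import Enum, auto

class Phase(Enum):
    READ = auto()   # local reads, no coordination yet
    SEND = auto()   # write set broadcast to the group
    LOCK = auto()   # locks acquired in the total order
    WRITE = auto()  # changes applied and committed

class ReplicationManager:
    """Tracks the protocol phase of every transaction on the local server."""

    ORDER = [Phase.READ, Phase.SEND, Phase.LOCK, Phase.WRITE]

    def __init__(self):
        self.states = {}  # transaction id -> Phase

    def begin(self, txn_id):
        self.states[txn_id] = Phase.READ

    def advance(self, txn_id):
        current = self.states[txn_id]
        self.states[txn_id] = self.ORDER[self.ORDER.index(current) + 1]

    def phase(self, txn_id):
        return self.states[txn_id]
```

The point of tracking the phase is that the manager must know, for each transaction, whether an incoming write set or decision message may be delivered to it yet.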
The replication manager maintains a two-way channel, implemented as buffered sockets, to each backend.

3) The main task of the communication manager is to provide a simple socket-based interface between the replication manager and the group communication system (currently Ensemble). The communication system is a cluster of servers connected via the communication manager. The replication manager also maintains three one-way channels to the communication system: a broadcast channel to send messages, a total-order channel to receive totally ordered write sets, and a no-order channel to listen for decision messages from the communication system. Decision messages can be received at any time, whereas the reception of totally ordered write sets can be blocked in certain phases.

4) Based on a two-phase-locking approach, all deadlock situations are local, detectable by the Postgres-R code base, and aborted.

5) The write-set messages used to send database changes to other servers can contain either the SQL statements or the actual tuples changed. This is a parameter based on the number of tuples changed by a transaction. While sending the tuple changes reduces the overhead of query parse, plan, and execution on the receiving servers, there is a negative effect in sending a large write set across the network.

6) Postgres-R uses a synchronous approach that keeps the data on all sites consistent and provides serializability. The user does not have to bother with conflict resolution, and receives the same correctness and consistency as a centralized system.

7) Postgres-R could be part of a good fault-resilient and load-distribution solution. It is peer-to-peer based and incurs low overhead propagating updates to the other cluster members. All replicated databases process queries locally.

8) Compared to other synchronous replication strategies (e.g., standard distributed 2-phase-locking + 2-phase-commit), Postgres-R has much better performance using 2-phase-locking.
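The encoding choice in point 5 is essentially a threshold decision. A minimal sketch, with an invented function name and an illustrative threshold value (the real parameter and its default are not given above):

```python
def encode_write_set(statements, changed_tuples, tuple_threshold=100):
    """Choose how to ship a transaction's changes to the other servers.

    Sending the changed tuples spares the remote servers the
    parse/plan/execute cost, but for a large change set the SQL text
    is cheaper to send across the network.  The threshold is the
    tunable parameter the text mentions; 100 is purely illustrative.
    """
    if len(changed_tuples) <= tuple_threshold:
        return ("tuples", changed_tuples)
    return ("sql", statements)
```

A single-statement transaction touching thousands of rows would ship as SQL text, while a typical small update would ship its tuples directly.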
There are some issues that are not currently addressed by Postgres-R, but some enhancements made to PostgreSQL since the 6.4.2 tree are very favorable to addressing these shortcomings.

1) The addition of WAL in 7.1 provides the information needed for recovering failed/off-line servers; currently, all the servers would have to be stopped, and a copy would be used to get all the servers synchronized before starting again.

2) Being synchronous, Postgres-R would not be a good solution for off-line/WAN scenarios where asynchronous replication is required. There are some theories on this issue which involve servers connecting to and disconnecting from the cluster.

3) As in any serialized synchronous approach, there is a change in the flow of execution of a transaction; while most of these changes can be handled by calling newly developed functions at certain points, synchronous replica control is tightly coupled with concurrency control. Hence, especially in PostgreSQL 7.2, some parts of the concurrency control (MVCC) might have to be adjusted. This can lead to slightly more complicated maintenance than a system that does not change the backend.

4) Partial replication is not addressed.

Any feedback on this post will be appreciated.

Thanks,
Darren
On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

>[...]
>2) The replication manager's main function is controlling the
>replication protocol via a message-handling process. It receives
>messages from the local and remote backends and forwards write
>sets and decision messages via the communication manager to the
>other servers.
>The replication manager controls all the transactions
>running on the local server by keeping track of their states, including
>which protocol phase (read, send, lock, or write) each transaction is
>in. The replication manager maintains a two-way channel,
>implemented as buffered sockets, to each backend.

What does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...

>[...]
>4) Based on a two-phase-locking approach, all deadlock situations
>are local, detectable by the Postgres-R code base, and aborted.

Does this imply locking over different servers? That would mean a
grinding halt when a network outage occurs...

>[...]
>8) Compared to other synchronous replication strategies (e.g., standard
>distributed 2-phase-locking + 2-phase-commit), Postgres-R has much
>better performance using 2-phase-locking.

Coming from a Sybase background, I have some experience with
replication. The way it works in Sybase Replication Server is as
follows:

- for each replicated database, there is a "log reader" process that
reads the WAL and captures only *committed transactions* to the
replication server (it does not make much sense to replicate other
things IMHO :-)

- the replication server stores incoming data in a queue ("stable
device") until it is sure it has reached its final destination

- a replication server can send data to another replication server in
a compact (read: WAN-friendly) way. A chain of replication servers can
be made, depending on network architecture

- the final replication server makes an almost standard client
connection to the target database and translates the compact
transactions back to SQL statements. By using masks, extra
functionality can be built in.
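The pipeline just described (log reader, stable queue, then delivery to the target) might be sketched as follows. The names are invented for illustration; this is not Sybase's actual interface:

```python
import queue

def log_reader(wal_records):
    """Forward only *committed* transactions, as the log reader does."""
    return [r for r in wal_records if r.get("committed")]

class StableQueue:
    """Stand-in for the replication server's 'stable device': changes
    are held until the final destination has received them."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, txn):
        self._q.put(txn)

    def drain(self, apply_txn, target_online):
        """Deliver queued transactions only while the target is
        reachable; anything undelivered stays queued for later."""
        delivered = 0
        while target_online and not self._q.empty():
            apply_txn(self._q.get())
            delivered += 1
        return delivered
```

The queue is what makes the design tolerant of outages: a down network stops delivery but loses nothing.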
This kind of architecture has several advantages:

- only committed transactions are replicated, which saves overhead

- it does not have very much impact on performance of the source
server (apart from reading the WAL)

- since every replication server has a stable device, data is stored
when the network is down and nothing gets lost (nor stops performing)

- because only the log reader and the connection from the final
replication server are RDBMS-specific, it is possible to replicate
from MS to Oracle using a Sybase replication server (or different
versions, etc.)

I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible,
but it does require some funky setup and maintenance. (But it is
better than letting offices on different continents work on the same
database :-)

just my 2 EURO cts :-)
--
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________
On Mon, 11 Jun 2001, Reinoud van Leeuwen wrote:

> what does "manager controls all the transactions" mean? I hope it does
> *not* mean that a bug in the manager would cause transactions not to
> commit...

Well, yeah, it does. Bugs are a fact of life. :)

> >4) Based on a two-phase-locking approach, all deadlock situations
> >are local, detectable by the Postgres-R code base, and aborted.
>
> Does this imply locking over different servers? That would mean a
> grinding halt when a network outage occurs...

Don't know, but see below.

> Coming from a Sybase background, I have some experience with
> replication. The way it works in Sybase Replication Server is as
> follows:
> [...]
> This kind of architecture has several advantages:
> - only committed transactions are replicated which saves overhead
> [...]
>
> I do not know how much of this is patented or copyrighted, but the
> architecture seems elegant and robust to me. I have done
> implementations of bi-directional replication too. It *is* possible
> but does require some funky setup and maintenance.

Yes, the above architecture is what almost every vendor of replication
software uses. And I'm sure if you worked much with Sybase, you hate the
garbage that their repserver is :).

The architectures of Postgres-R and repserver are fundamentally
different for a good reason: repserver only wants to replicate committed
transactions, while Postgres-R is more of a 'clustering' solution
(albeit they don't say this word) and is capable of doing much more than
a simple repserver. I.e., you can safely put half of your clients on the
second server in a replicated Postgres-R cluster without worrying that a
conflict (or a weird locking situation) may occur. Try that with Sybase:
it is fundamentally designed for one-way replication, and the fact that
you can do one-way replication in both directions doesn't mean it's safe
to do so!

I'm not sure how Postgres-R handles network problems. To be useful, a
good replication solution must have an option of "no network -> no
updates" as well as "no network -> queue updates and send them later".
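Those two failure-handling options could be expressed as something like the following. This is a hypothetical sketch with invented names, not the behavior of any existing replicator:

```python
def handle_update(update, network_up, policy, pending):
    """Apply one of the two behaviours discussed above when the
    cluster link is down: refuse the update outright, or queue it
    for delivery once the link returns.

    'policy' is one of "no-updates" or "queue" (invented names).
    """
    if network_up:
        return "replicated"
    if policy == "no-updates":
        raise ConnectionError("cluster unreachable; update rejected")
    pending.append(update)  # policy == "queue"
    return "queued"
```

The "no-updates" policy preserves strict consistency at the cost of availability; the "queue" policy keeps accepting writes but reintroduces the conflict problem when the queues are replayed.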
However, it is far easier to add queuing to a correct 'eager locking'
database than it is to add proper locking to a queue-based replicator.

-alex
Thanks for the feedback. I'll try to address both your issues here.

>> what does "manager controls all the transactions" mean?

The replication manager controls the transactions by serializing the
write-set messages. This ensures all transactions are committed in the
same order on each server, so bugs here are not allowed ;-)

>>> 4) Based on a two-phase-locking approach, all deadlock situations
>>> are local, detectable by the Postgres-R code base, and aborted.
>>
>> Does this imply locking over different servers? That would mean a
>> grinding halt when a network outage occurs...

There is a branch of the Postgres-R code that has some failure detection
implemented, so we will have to merge this functionality with the
version of Postgres-R we have and test this issue. I'll let you know the
results.

>> - the replication server stores incoming data in a queue ("stable
>> device"), until it is sure it has reached its final destination

I like this idea for recovering servers that have been down for a short
period of time, using WAL to recover transactions missed during the
outage.

>> This kind of architecture has several advantages:
>> - only committed transactions are replicated which saves overhead
>> [...]
There are some issues with the "log reader" approach:

1) The databases are not synchronized until the log reader completes its
processing.

2) I'm not sure about Sybase, but the log reader sends SQL statements to
the other servers, which are then parsed, planned, and executed. This
overhead could be avoided if only the tuple changes were replicated.

3) It works fine for read-only situations, but peer-to-peer applications
using this approach must be designed with a conflict resolution scheme.

Don't get me wrong, I believe we can learn from the replication
techniques used by commercial databases like Sybase, and try to
implement the good ones in PostgreSQL. Postgres-R is a synchronous
approach which outperforms the traditional approaches to synchronous
replication. Being based on PostgreSQL-6.4.2, getting this approach into
the 7.2 tree might be better than reinventing the wheel.

Thanks again,
Darren