Thread: Conflict resolution in Multimaster replication(Postgres-R)
Hello, My basic question is: in multimaster replication, if each site goes ahead and does the modifications issued by the transaction and then sends the writeset to others in the group, how the ACID properties be maintained? Details: Suppose there are two sites in the group, lets say, A and B and are managing a database D. Two transactions TA and TB started in sites A and B respectively, at nearly same time, wanted to update same row of a table in the database. As, no locking structures and other concurrency handling structures are replicated each will go ahead and do the modifications in their corresponding databases and sends the writeset. Since, both writesets contain update to the same row, will the two transactions be rolled back or anything other than this happens? A more general question is: for Transactional isolation level 4(serializable level), the information such as locking of rows be transmitted across sites? If not, what is the mechanism to address concurrency with serializibility. Thanks, Srinivas
<font face="Verdana, Helvetica, Arial"><span style="font-size:12.0px">Hi Srinivas, <br /><br /> Multi-master replicationin Postgres-R is handled using a process called certification that ensures there are no serializability violations. Look at the paper by Kemme and Alonzo entitled “Don’t be Lazy, Be Consistent...” (<a href="http://www.cs.mcgill.ca/~kemme/papers/vldb00.html).">http://www.cs.mcgill.ca/~kemme/papers/vldb00.html).</a> In thefirst case you describe one transaction must abort if applying them would break serializability. <br /><br /> In thesecond case you describe, you must transmit read sets as well as write sets. The same sort of algorithm is applied asfor writes. <br /><br /> Please email me directly if you want more information. <br /><br /> Thanks, Robert<br /><br />On 9/3/08 4:02 PM, "M2Y" <mailtoyahoo@gmail.com> wrote:<br /><br /></span></font><blockquote><font face="Verdana,Helvetica, Arial"><span style="font-size:12.0px">Hello,<br /><br /> My basic question is: in multimaster replication,if each site goes<br /> ahead and does the modifications issued by the transaction and then<br /> sends the writesetto others in the group, how the ACID properties be<br /> maintained?<br /><br /> Details:<br /> Suppose there aretwo sites in the group, lets say, A and B and are<br /> managing a database D. Two transactions TA and TB started in sitesA<br /> and B respectively, at nearly same time, wanted to update same row of<br /> a table in the database. As, nolocking structures and other<br /> concurrency handling structures are replicated each will go ahead and<br /> do the modificationsin their corresponding databases and sends the<br /> writeset. Since, both writesets contain update to the samerow, will<br /> the two transactions be rolled back or anything other than this<br /> happens?<br /><br /> A more generalquestion is: for Transactional isolation level<br /> 4(serializable level), the information such as locking of rowsbe<br /> transmitted across sites? If not, what is the mechanism to address<br /> concurrency with serializibility.<br/><br /> Thanks,<br /> Srinivas<br /><br /> --<br /> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)<br/> To make changes to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></span></font></blockquote><fontface="Verdana, Helvetica, Arial"><span style="font-size:12.0px"><br /><br /> -- <br />Robert Hodges, CTO, Continuent, Inc.<br /> Email: robert.hodges@continuent.com<br /> Mobile: +1-510-501-3728 Skype: hodgesrm<br /></span></font>
Hello Srinivas, M2Y wrote: > My basic question is: in multimaster replication, if each site goes > ahead and does the modifications issued by the transaction and then > sends the writeset to others in the group, how the ACID properties be > maintained? Well, there are different approaches. With regard to locking, you can differentiate between pessimistic and optimistic locking. Synchronous replication solutions like two-phase commit and most statement based solutions do pessimistic locking: upon updating a row - and thus locking it - they make sure every other node locks the same row as well before proceeding. Asynchronous replication solutions (including the original Postgres-R algorithm just mentioned by Robert Hodges) do optimistic locking: they proceed with the transaction even if they don't know in advance if the same rows can be locked for that transaction on all other nodes. They optimistically assume it will be possible to lock them in most cases. In all other cases, they abort one of two conflicting transactions. Here you can distinguish even further between eager and lazy solutions: a lazy solution defers the check against conflicting transactions on other nodes to sometime after the commit. Upon detecting such a conflict, it must thus abort an already committed transaction (late abort). That's a violation of the ACID properties, but it's worthwhile in certain cases. On the other hand, the eager solution does the conflict checking and possible aborting *before* committing, thus maintaining full ACID properties. That method then suffers from an increased commit latency, dependent on the network. See [1] for more information. > A more general question is: for Transactional isolation level > 4(serializable level), the information such as locking of rows be > transmitted across sites? If not, what is the mechanism to address > concurrency with serializibility. Depends on the algorithm, but transferring every single lock is generally considered to be too expensive. Instead, locks are transferred indirectly by sending statements or changesets or such. What kind of replication are you interested in? Regards Markus Wanner [1]: Terms and Definitions for Database Replication: http://www.postgres-r.org/documentation/terms
Thank you very much Robert and Markus. > What kind of replication are you interested in? > Markus: It looks like the hybrid approach used by Postgres-R(as described in that paper) is good. Thanks, Srinivas
Hello Srinivas, M2Y wrote: > Markus: It looks like the hybrid approach used by Postgres-R(as > described in that paper) is good. Well, yeah. That's why am working on it ;-) You are very welcome to download the patch and dig into the sources. See www.postgres-r.org for more information. To answer your original question in more details: > Suppose there are two sites in the group, lets say, A and B and are > managing a database D. Two transactions TA and TB started in sites A > and B respectively, at nearly same time, wanted to update same row of > a table in the database. As, no locking structures and other > concurrency handling structures are replicated each will go ahead and > do the modifications in their corresponding databases and sends the > writeset. Correct so far. Note that both transactions might have applied changes, but they have not committed, yet. In eager mode we rely on the Group Communication System to deliver these two changesets [1] in the same order on both nodes. Let's say both receive TA's changeset first, then TB's. The backend which processed TA on node A can commit, because its changes don't conflict with anything else. The changeset of TB is forwarded to a helper backend, which tries to apply its changes. But the helper backend detects the conflict against TA and aborts (because it knows TA takes precedence on all other nodes as well). On node B, the backend which processed TB has to wait with its commit, because another changeset, namely TA's came in first. For that changeset a helper backend is started as well, which applies the changes of TA. During application of changes, that helper backend detects a conflict against the (yet uncommitted) changes of TB. As it knows its transaction TA takes precedence over TB (on all other nodesas well), it tells TB to abort and continues applying its own changes. I hope that was an understandable explanation. Regards Markus Wanner [1]: In the original Postgres-R paper, these are called writesets. But in my implementation, I've altered its meaning somewhat. Because of that (and because I admittedly like "changeset" better), I've decided to call them changesets now...