Thread: Consistency of distributed transactions
Hello,

I am interested in creating a system where Java EE distributed transactions would work with multiple Postgres databases. I'm having some difficulty understanding the transaction isolation guarantees that I would get from this configuration. Can I make my distributed transactions SERIALIZABLE, or is it possible that one distributed transaction could interfere with another?

In particular, I'm concerned about this scenario. Imagine that I have two databases. One database contains a list of savings accounts and their balances, while the other contains a list of current accounts and their balances. Fred wants to transfer £100 from his current account to his savings account, so the system begins a distributed transaction (A). Meanwhile, I have a transaction (B) which is going to total up all the liabilities of the bank. Now, things happen in this order:

1. Transaction B totals all the values in the current account database.
2. Transaction A debits Fred's current account. In the process, the current account database's MVCC implementation decides that transaction B will be regarded as taking place before transaction A.
3. Transaction A credits Fred's savings account.
4. Transaction A commits.
5. Transaction B totals all the values in the savings account database. Transaction A no longer exists, so -- on the savings account database -- transaction B is considered to execute later.
6. Transaction B gets the wrong answer, because there was no consistent ordering between A and B.

Is there a solution to this, or is the point that I'm simply asking too much? Perhaps the Java EE container is not promising consistency in the sense I'm talking about.

Thank you for any help you can give, and I hope I've managed to explain a complicated problem clearly!

Pete
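[The interleaving in the question can be reproduced with a toy model. This is only a sketch with invented names: each `Db` hands out a copy of its committed state on request, standing in for a per-database MVCC snapshot taken when a reader first touches that database. No real JDBC or JTA API is involved.]

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotAnomaly {

    // Toy "database": a reader's snapshot is whatever was committed at
    // the moment the reader first touched *this* database.
    static class Db {
        final Map<String, Integer> committed = new HashMap<>();
        Map<String, Integer> snapshot() { return new HashMap<>(committed); }
    }

    // Returns the total that transaction B computes.
    static int runScenario() {
        Db current = new Db();  // current accounts
        Db savings = new Db();  // savings accounts
        current.committed.put("fred", 500);
        savings.committed.put("fred", 200);  // true liabilities: 700

        // 1. B snapshots the current-account database *before* A commits.
        int bSeesCurrent = current.snapshot().get("fred");  // 500

        // 2-4. A debits the current account, credits savings, and commits.
        current.committed.put("fred", current.committed.get("fred") - 100);
        savings.committed.put("fred", savings.committed.get("fred") + 100);

        // 5. B snapshots the savings database *after* A has committed there.
        int bSeesSavings = savings.snapshot().get("fred");  // 300

        // 6. B totals 800, yet the bank's liabilities were 700 both before
        // and after the transfer: no single ordering of A and B explains
        // what B observed across the two databases.
        return bSeesCurrent + bSeesSavings;
    }

    public static void main(String[] args) {
        System.out.println(runScenario());  // 800, not 700
    }
}
```

The anomaly is exactly that each database orders A and B independently: the current-account database places B before A, the savings database places B after A.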
On 1/06/2011 4:52 AM, Pete Chown wrote:
> Is there a solution to this, or is the point that I'm simply asking too
> much? Perhaps the Java EE container is not promising consistency in the
> sense I'm talking about.

Distributed transactions will give you atomicity if done right - with two-phase commit (2PC) - but AFAIK will *NOT* give you consistency across the databases in question. You have to ensure consistency yourself, usually by locking data or taking snapshots.

If you want strong consistency guarantees between certain data, try to keep them in the same database. If you can't, you may need to be prepared to (try to) do your own consistency enforcement using application-coordinated record or table locking.

There are scaling limits to keeping things in one database on one machine, which is why some database vendors offer multi-master setups. Pg doesn't, at present. That said, multi-master configurations have their own performance and scalability problems, because providing consistency across a distributed multi-machine database adds a lot of overhead. They can be made faster than a single-machine setup, but (from what I've heard; I have no direct experience here) you usually have to throw lots of hardware at them and use low-latency interconnects.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
Craig Ringer wrote:
> On 1/06/2011 4:52 AM, Pete Chown wrote:
>> Is there a solution to this, or is the point that I'm simply asking too
>> much? Perhaps the Java EE container is not promising consistency in the
>> sense I'm talking about.
>
> Distributed transactions will give you atomicity if done right - with
> two-phase commit (2PC) - but AFAIK will *NOT* give you consistency
> across the databases in question. You have to ensure consistency
> yourself, usually by locking data or taking snapshots.
[snip]

My recollection of distributed transactions is that the manager tells all the players to do their thing. Then they each tell the manager "OK, did the deed". Then, if all the players say OK, each of them is told by the manager to go ahead and commit; otherwise they roll back. Sorry, I don't recall the procedure for commit failure, but I think you're in manual undo-land at that point.
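[The manager/players exchange Rob recalls can be sketched roughly like this. It is a toy model with invented names - a real JTA transaction manager and its XA resources also log decisions durably, handle timeouts, and run recovery - but the vote-then-decide shape is the same.]

```java
import java.util.List;

public class TwoPhaseCommit {

    // A "player" in Rob's description: an XA-capable resource.
    interface Participant {
        boolean prepare();   // "did the deed, ready to commit" -> vote yes/no
        void commit();
        void rollback();
    }

    // Phase 1: ask every participant to prepare (stops at the first "no").
    // Phase 2: commit only if all voted yes, otherwise roll everyone back.
    // Rolling back a participant that was never asked to prepare is safe.
    // Returns true if the transaction committed.
    static boolean run(List<Participant> players) {
        boolean allPrepared = players.stream().allMatch(Participant::prepare);
        if (allPrepared) {
            players.forEach(Participant::commit);
        } else {
            players.forEach(Participant::rollback);
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        Participant ok = new Participant() {
            public boolean prepare() { return true; }
            public void commit() {}
            public void rollback() {}
        };
        Participant broken = new Participant() {
            public boolean prepare() { return false; } // cannot guarantee commit
            public void commit() {}
            public void rollback() {}
        };
        System.out.println(run(List.of(ok, ok)));      // both commit
        System.out.println(run(List.of(ok, broken)));  // both roll back
    }
}
```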
On 06/01/2011 11:11 AM, Rob Sargent wrote:
> My recollection of distributed transactions is that the manager tells
> all players to do their thing. Then they each tell the manager "ok, did
> the deed". Then, if all players say ok, each of them is told by the
> manager to go ahead and commit, else rollback. Sorry, I don't recall the
> procedure for commit failure but I think you're in manual undo-land at
> that point.

Distributed transactions only work well when all participants (or all but one participant) support two-phase commit or a similar scheme where they can almost totally guarantee a successful commit. The idea is that each participant is told "I'm done doing work; prepare to commit, and tell me when you can guarantee - as far as is ever possible - that commit will succeed". Once each participant affirms, the transaction manager can tell each to go ahead and be - almost - totally confident that all will go well. See:

http://www.postgresql.org/docs/current/static/sql-prepare-transaction.html
http://en.wikipedia.org/wiki/Two-phase_commit_protocol

Of course, even with 2PC it's possible that something will go wrong when the final commit is issued, but it's (a) extremely unlikely and (b) likely that the failure can be remedied later by successfully applying the changes, rather than having to roll everything back across all the servers.

A good example of where 2PC is important: think ATMs. There are two transactions - the writing of the accounting records, and the vending of cash. It is utterly vital that both succeed or fail together, but one is a physical operation that cannot be rolled back or undone. So you do the account updates, do the logical equivalent of PREPARE TRANSACTION to make sure the changes are on record and safe but not yet applied, vend the cash, and then COMMIT PREPARED. If the cash roller jammed, you can ROLLBACK PREPARED instead.
(Of course, if the UPS has a hiccup between PREPARE TRANSACTION and COMMIT PREPARED, you don't know for sure whether you successfully vended the cash, only that you intended to. You have to examine the ATM to find out, but then you can decide whether to commit or roll back the prepared transaction(s), instead of having to shrug your shoulders and say "er, yeah, we vended some cash, but we don't know to whom or when, because we lost those open transactions in a power failure".)

--
Craig Ringer
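[That in-doubt case can be sketched as follows. The `PreparedStore` class and its names are invented for illustration only; in PostgreSQL the real equivalents are the pg_prepared_xacts view and the COMMIT PREPARED / ROLLBACK PREPARED commands. The boolean stands in for the external evidence - examining the ATM - that Craig describes.]

```java
import java.util.HashMap;
import java.util.Map;

public class InDoubtRecovery {

    // Durable record of transactions that prepared but never received the
    // final commit/rollback decision -- what survives the power failure.
    static class PreparedStore {
        final Map<String, String> prepared = new HashMap<>(); // gid -> description

        void prepare(String gid, String what) { prepared.put(gid, what); }

        // On restart: resolve each in-doubt transaction one way or the
        // other, based on evidence gathered outside the database.
        String resolve(String gid, boolean evidenceOfSuccess) {
            prepared.remove(gid);
            return evidenceOfSuccess ? "COMMIT PREPARED " + gid
                                     : "ROLLBACK PREPARED " + gid;
        }
    }

    public static void main(String[] args) {
        PreparedStore store = new PreparedStore();
        store.prepare("atm-42", "debit account, vend GBP 100");

        // ... power failure here: the commit decision never arrived ...

        // On restart the transaction is still listed as in doubt:
        System.out.println(store.prepared.keySet());

        // The operator examines the ATM; the cash went out, so the right
        // resolution is to commit the prepared transaction.
        System.out.println(store.resolve("atm-42", true));
    }
}
```

The point is that nothing is lost: the decision is merely deferred until the physical evidence is in.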
Craig Ringer wrote:
> Distributed transactions will give you atomicity if done right - with
> two-phase commit (2PC) - but AFAIK will *NOT* give you consistency
> across the databases in question.

That's useful to know -- thanks. At least now I know my idea won't work, so I can forget about it and try to think of something else. :-)

Pete