Thread: XA support (distributed transactions)
Hi, We (Fabalabs) are looking into adding XA support (distributed transactions) to Postgresql. I have been searching the web and came up with http://www.nabble.com/Re%3A-Postgres-XA-support-t226681.html#a633000 which cites some design difficulty. It is the case that the X/Open XA standard mandates "transaction branches" that are managed by calls to xa_start()/xa_end() in an out-of-band manner (via some kind of exported C VMT, sigh), in parallel to "native resource manager access" and so Michael Allman's objections (see link above) are correct when one assumes one single persistent postgresql connection per database prevailing, and a transaction manager that arbitrarily commands when to start/end transaction branches. What is the current status? ---------- Definitions: The goal of a Distributed Transaction is to enable a transaction to span multiple resources databases and still have the usual guarantees hold (ACID). Overview over X/Open Distributed Transactions: Parts: - Transaction Manager (1) - Resource Manager (n) - Application Program In chronological order, With a "global transaction", the Transaction Manager decides when to start one and makes up a globally unique id for the transaction branch. Then it notifies every Resource Manager that a transaction branch for a global transaction is to be started, passing the globally unique id along. From here, the Application Program accesses the Resource Manager as usual (SQL statements, ...), with the exception that transaction management commands are off-limits (BEGIN WORK, COMMIT WORK, ROLLBACK WORK). The Application Program accesses all other Resource Managers involved in the global transaction and does likewise. To commit the global transaction, the transaction manager is called and does: For all Resource Managers, notify it to dissociate the connection (of the "normal" native Postgresql connection) from the global transaction. For all Resource Managers, notify it that the global transaction is to be prepared to be committed (2-phase commit). Any failure results in rollback and error. On success, (From this point on, amnesia on part of the database about the prepared transaction is to be prevented) For all Resource Managers, notify it that the global transaction is to be committed. There are funny optimizations possible, but this is a very minimal picture of what is "supposed to be going on", just so we all have a basis for talk :) with kind regards, Danny Milosavljevic Fabalabs R&D
Attachment
Danny Milosavljevic wrote: > Hi, > > We (Fabalabs) are looking into adding XA support (distributed > transactions) to Postgresql. Postgres already supports 2 phase commits, which is can basis on which XA can be implemented I think. The "only" missing part is an transaction manager, but that wouldn't have to be integrated into postgres. As far as I understand things, the only thing you really need the transaction manager for is for automatic recovery if one member of a distributed transaction fails while the transaction is still in progress. In that case the transaction manager needs to either rollback the transaction, if it wasn't already prepared on all nodes, or commit it. The only difficulty I see in this "transaction manager" is that it will needto (persistenly) keep track of transactions, because due to the design of 2-phase-commit, you cannot deduce the faith of a transaction by just looking at the nodes. Image you find a transaction that is prepared in 3 out of 5 participating nodes. It might be that it was originally prepared on all 5 nodes, and already comitted on two of them, or it might have been comitted on only 3 nodes before the transaction manager itself crashed. In the first case you should commit the transaction on those 3 nodes, while you should roll it back in the second case. I believe there are open-source implementations of such transaction managers, but I don't have any links at hand. greetings, Florian Pflug
On 11/20/06, Florian G. Pflug <fgp@phlo.org> wrote: > Danny Milosavljevic wrote: > > Hi, > > > > We (Fabalabs) are looking into adding XA support (distributed > > transactions) to Postgresql. > > Postgres already supports 2 phase commits, which is can basis > on which XA can be implemented I think. The "only" missing > part is an transaction manager, but that wouldn't have to > be integrated into postgres. As far as I understand things, > the only thing you really need the transaction manager for > is for automatic recovery if one member of a distributed > transaction fails while the transaction is still in progress. > In that case the transaction manager needs to either rollback > the transaction, if it wasn't already prepared on all nodes, > or commit it. Hmm, a bit not on topic, but how big is 2PC overhead in PostgreSQL. I mean, I have an application which could benefit from 2PC but it is not mandatory -- it is connected to two DBs, and the second transaction is committed if first one succeeds. There is virtually no chance that the second commit won't succeed (rows are locked for update and all transactions lock rows in same order, etc, etc), but using 2PC would make a nice warm feeling. Then again, the process is quite stressful for that DB so I want to push as little overhead there as possible. Regards, Dawid
Dawid Kuroczko wrote: > On 11/20/06, Florian G. Pflug <fgp@phlo.org> wrote: >> Danny Milosavljevic wrote: >> > Hi, >> > >> > We (Fabalabs) are looking into adding XA support (distributed >> > transactions) to Postgresql. >> >> Postgres already supports 2 phase commits, which is can basis >> on which XA can be implemented I think. The "only" missing >> part is an transaction manager, but that wouldn't have to >> be integrated into postgres. As far as I understand things, >> the only thing you really need the transaction manager for >> is for automatic recovery if one member of a distributed >> transaction fails while the transaction is still in progress. >> In that case the transaction manager needs to either rollback >> the transaction, if it wasn't already prepared on all nodes, >> or commit it. > > Hmm, a bit not on topic, but how big is 2PC overhead in > PostgreSQL. I mean, I have an application which could > benefit from 2PC but it is not mandatory -- it is connected > to two DBs, and the second transaction is committed > if first one succeeds. There is virtually no chance that > the second commit won't succeed (rows are locked > for update and all transactions lock rows in same > order, etc, etc), but using 2PC would make a nice > warm feeling. Then again, the process is quite > stressful for that DB so I want to push as little > overhead there as possible. I think the biggest problem of XA without a transaction manager is that leftover 2PC-Transactions will hold their locks until a DBA manually resolves the situation by either comitting or rolling back the transaction. If the transaction only inserted new rows, than you might never realized that it's still there (because it's holding no row-level locks for rows visible to other transaction) until you notice that vacuum is no longer reclaiming free space (Because those old rows that vacuum could reclaim are still visible to your lingering 2pc transaction). So 2PC without some monitoring that ensures that "lost" transactions are dealt with seems like a guarantee for trouble to me. greetings, Florian Pflug
Danny Milosavljevic wrote: > Hi, > > We (Fabalabs) are looking into adding XA support (distributed > transactions) to Postgresql. > > I have been searching the web and came up with > http://www.nabble.com/Re%3A-Postgres-XA-support-t226681.html#a633000 > which cites some design difficulty. That mail discusses problems we had trying to implement XADataSource in the JDBC driver. Things have evolved from that, and we have a working solution that works. It doesn't support some of the advanced features, suspend/resume and transaction interleaving in particular, but support for those is rather flaky in other DBMS's as well, and it's not a problem in practice because J2EE application server implementations work well without them. Please read all the more recent discussions on pgsql-jdbc mailing list on the subject if you want to contribute. > It is the case that the X/Open XA standard mandates "transaction > branches" that are managed by calls to xa_start()/xa_end() in an > out-of-band manner (via some kind of exported C VMT, sigh), in parallel > to "native resource manager access" and so Michael Allman's objections > (see link above) are correct when one assumes one single persistent > postgresql connection per database prevailing, and a transaction manager > that arbitrarily commands when to start/end transaction branches. > > What is the current status? As I said above, we have working solution for the JDBC driver. I'm not sure if what we have would suffice for a native C XA client library. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Florian, > I believe there are open-source implementations of such transaction > managers, but I don't have any links at hand. In Java, the Sequoia Project. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote: > Florian, > >> I believe there are open-source implementations of such transaction >> managers, but I don't have any links at hand. > > In Java, the Sequoia Project. I haven't used Sequoia, but I don't think you can use it as a general purpose transaction manager. It does take care of replicating updates to all the databases it controls, but AFAIK you can't use it to perform ad hoc distributed transactions across resource managers. There is a lot of open source JTA (Java Transaction API) transaction manager implementations, though. JOTM and SimpleJTA to name a few. And all the J2EE application server implementations include a transaction manager as well. I'm not aware of any open source non-Java transaction managers. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Mon, 20 Nov 2006 18:47:12 +0100, Florian G. Pflug wrote: > Danny Milosavljevic wrote: >> Hi, >> >> We (Fabalabs) are looking into adding XA support (distributed >> transactions) to Postgresql. > > Postgres already supports 2 phase commits, which is can basis > on which XA can be implemented I think. Yes, among other things needed. > The "only" missing > part is an transaction manager, but that wouldn't have to > be integrated into postgres. [...] Our parent company, Fabasoft, already wrote a transaction manager ("Fabasoft Transaction Manager"). I doubt that it is Open Source but I'll ask. On the other hand, it's there and already works on Oracle so we aren't stepping in the dark, testing-/interface-wise. > The only difficulty I see in this "transaction manager" is that > it will need to (persistenly) keep track of transactions, Indeed, it does keep track of the state of the global transactions, in a sophisticated text file :). > because > due to the design of 2-phase-commit, you cannot deduce the faith > of a transaction by just looking at the nodes. But you can find out the prepared transactions of a two phase commit (there is a pg system table for that, "pg_prepared_xacts"). What's important is that if the database management system were to crash, the prepared entries in that table _need to persist_ and be available after a restart (so that the transaction manager can find out what actually worked and what didn't work per database before the connection broke - in XA slang, with a "XA RECOVER"). What's also important is that PREPARED transactions must not complete (not commit, not rollback) when the pq connection to the database is closed. This ensures that the database doesn't lose (nor commit) already prepared transactions in the unlikely but evil case of a crash/disconnect just between a 2PC PREPARE and a 2PC COMMIT. (Transactions not yet prepared can just vanish in thin air on the DBMS' discretion) Also, XA has some kind of semi-parallelism where one global transaction branch is supposed to be able to be suspended, another global transaction branch to be worked on "instead", then this one suspended too, then the original resumed, and so on, leading to something reminiscent of time slices :). I have no idea how far Postgresql supports this particularily tricky part, and whether it makes any sense to (or whether just a commit/rollback of a previously crashed-but-prepared transaction suffices). So my actual questions are: what is a postgresql transaction tied to, if anything? Or is it a first-class object? What happens to prepared-but-not-yet-committed 2PC transactions on a crash? regards, Danny Milosavljevic
Attachment
Danny Milosavljevic wrote: > > The "only" missing > > part is an transaction manager, but that wouldn't have to > > be integrated into postgres. [...] > > Our parent company, Fabasoft, already wrote a transaction manager > ("Fabasoft Transaction Manager"). I doubt that it is Open Source but > I'll ask. Are you a Java shop, using JTA, or are working with a X/Open XA spec compliant interface? If the former, I'd strongly suggest not to invent the wheel again. Implementing a transaction manager reliably and efficiently isn't as easy as it looks, and there's plenty of open source JTA implementations available. > > because > > due to the design of 2-phase-commit, you cannot deduce the faith > > of a transaction by just looking at the nodes. > > But you can find out the prepared transactions of a two phase commit > (there is a pg system table for that, "pg_prepared_xacts"). > > What's important is that if the database management system were to > crash, the prepared entries in that table _need to persist_ and be > available after a restart (so that the transaction manager can find out > what actually worked and what didn't work per database before the > connection broke - in XA slang, with a "XA RECOVER"). > > What's also important is that PREPARED transactions must not complete > (not commit, not rollback) when the pq connection to the database is > closed. This ensures that the database doesn't lose (nor commit) already > prepared transactions in the unlikely but evil case of a > crash/disconnect just between a 2PC PREPARE and a 2PC COMMIT. > (Transactions not yet prepared can just vanish in thin air on the DBMS' > discretion) We have all that in PostgreSQL. > Also, XA has some kind of semi-parallelism where one global transaction > branch is supposed to be able to be suspended, another global > transaction branch to be worked on "instead", then this one suspended > too, then the original resumed, and so on, leading to something > reminiscent of time slices :). I have no idea how far Postgresql > supports this particularily tricky part, and whether it makes any sense > to (or whether just a commit/rollback of a previously > crashed-but-prepared transaction suffices). We don't support that in PostgreSQL. That's what the thread on pgsql-jdbc was all about. There's been a lot of discussion on this, and the conclusion is: there's isn't anything interesting you can't do without it, and implementing it would require huge changes. What we have is sufficient for reliable distributed transaction. > So my actual questions are: > what is a postgresql transaction tied to, if anything? Or is it a > first-class object? > What happens to prepared-but-not-yet-committed 2PC transactions on a crash? After PREPARE TRANSACTION, the transaction is disassociated from the connection, and it'll stay prepared even after a server crash. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, Heikki Linnakangas wrote: > Danny Milosavljevic wrote: > >> > The "only" missing >> > part is an transaction manager, but that wouldn't have to >> > be integrated into postgres. [...] >> >> Our parent company, Fabasoft, already wrote a transaction manager >> ("Fabasoft Transaction Manager"). I doubt that it is Open Source but >> I'll ask. > > > Are you a Java shop, using JTA, or are working with a X/Open XA spec Fabasoft is an e-Government Document Management Provider, who has Administration Offices and such as customers (who need looong retention, search, versioning, signatures, paranoid access control, access logging, custom forms and so on). Fabasoft has its own object/components model and uses C/C++/Javascript code to implement methods there (the kernel is in C++). While it is possible to use a Java or dotnet gate to access it, I don't think that they are actually using those in production systems (at least not in the parts not written by the customers themselves). > compliant interface? If the former, I'd strongly suggest not to invent > the wheel again. Implementing a transaction manager reliably and > efficiently isn't as easy as it looks, and there's plenty of open source > JTA implementations available. Good to know, but the whole software already works fine for years in an Fabasoft Components -> Fabasoft Transaction Manager -> Oracle Database setting (with Distributed Transactions), so its not like we are starting anew :) From what I gather even PostgreSQL as it is now could be sufficient if the Transaction Manager were to use a protocol (like Pgsql-JDBC) does to talk to PostgreSQL. > >> [prepared transactions persist after a crash] >> [not to complete PREPARED transactions unilaterally] > > We have all that in PostgreSQL. > >> [time slices] > > We don't support that in PostgreSQL. That's what the thread on > pgsql-jdbc was all about. There's been a lot of discussion on this, and > the conclusion is: there's isn't anything interesting you can't do > without it, and implementing it would require huge changes. What we have > is sufficient for reliable distributed transaction. Yes, I think so, too. In fact, I'm off to a meeting with management to see whether that is fine. We'll see. > After PREPARE TRANSACTION, the transaction is disassociated from the > connection, and it'll stay prepared even after a server crash. Very nice. regards, Danny Milosavljevic
Attachment
Ühel kenal päeval, T, 2006-11-21 kell 10:12, kirjutas Danny Milosavljevic: > Hi, > > On Mon, 20 Nov 2006 18:47:12 +0100, Florian G. Pflug wrote: > > The only difficulty I see in this "transaction manager" is that > > it will need to (persistenly) keep track of transactions, > > Indeed, it does keep track of the state of the global transactions, in a > sophisticated text file :). How is this file kept ACID-compliant ? -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hi, Heikki Linnakangas wrote: > Danny Milosavljevic wrote: > >> Hi, >> >> We (Fabalabs) are looking into adding XA support (distributed >> transactions) to Postgresql. > [...] > > As I said above, we have working solution for the JDBC driver. I'm not > sure if what we have would suffice for a native C XA client library. Well, let's try and find out :) A quick meeting yielded that "we don't need interleaving / migration either", so I guess its ok. I am porting the JDBC XA stuff (as-is) to C, as a first step. regards, Danny Milosavljevic
Attachment
On Nov 21, 2006, at 4:09 AM, Heikki Linnakangas wrote: > Josh Berkus wrote: >> Florian, >>> I believe there are open-source implementations of such transaction >>> managers, but I don't have any links at hand. >> In Java, the Sequoia Project. > > I haven't used Sequoia, but I don't think you can use it as a > general purpose transaction manager. It does take care of > replicating updates to all the databases it controls, but AFAIK you > can't use it to perform ad hoc distributed transactions across > resource managers. I don't believe Sequoia uses 2PC/XA, either. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On 11/26/06, Jim Nasby <decibel@decibel.org> wrote: > > I don't believe Sequoia uses 2PC/XA, either. And it really doesn't. The mechanism used by Sequoia is based on statement replication across the cluster nodes, using group communication. There are controllers nodes responsable by the execution of the statements on the backend nodes, each controller having it's own backends that cannot be shared by different controllers. Each backend is a real database system accessed by the controller via JDBC. The group of controllers is designed to provide a abstraction to the group of distributed databases called a "virtual database". The controller node that receives the statement does a pre-processing of this statement, replacing functions like random(), now() and other specified in the configuration file by constant values and then does a broadcast of the statement to all controller nodes. Each controller in the group executes the statement on all it's backends and send the result of the execution to the controller responsable by the statement issued. There are internal mechanisms to guarantee ordered execution of the statements and proper execution of transactions across the virtual database, but I am only giving a short description of the funcionality based on the documentation available on the website and hands-on testing of the middleware that I'm doing at work for quite some time. -- João Eduardo Mikos http://www.icewall.org/~jeduardo/