Thread: XA support (distributed transactions)

XA support (distributed transactions)

From
Danny Milosavljevic
Date:
Hi,

We (Fabalabs) are looking into adding XA support (distributed
transactions) to Postgresql.

I have been searching the web and came up with
http://www.nabble.com/Re%3A-Postgres-XA-support-t226681.html#a633000
which cites some design difficulty.

It is the case that the X/Open XA standard mandates "transaction
branches" that are managed by calls to xa_start()/xa_end() in an
out-of-band manner (via some kind of exported C VMT, sigh), in parallel
to "native resource manager access" and so Michael Allman's objections
(see link above) are correct when one assumes one single persistent
postgresql connection per database prevailing, and a transaction manager
that arbitrarily commands when to start/end transaction branches.

What is the current status?

----------
Definitions:

The goal of a Distributed Transaction is to enable a transaction to span
multiple resources databases and still have the usual guarantees hold
(ACID).

Overview over X/Open Distributed Transactions:

Parts:
- Transaction Manager (1)
- Resource Manager (n)
- Application Program

In chronological order,

With a "global transaction", the Transaction Manager decides when to
start one and makes up a globally unique id for the transaction branch.
Then it notifies every Resource Manager that a transaction branch for a
global transaction is to be started, passing the globally unique id along.

  From here, the Application Program accesses the Resource Manager as
usual (SQL statements, ...), with the exception that transaction
management commands are off-limits (BEGIN WORK, COMMIT WORK, ROLLBACK WORK).

The Application Program accesses all other Resource Managers involved in
the global transaction and does likewise.

To commit the global transaction, the transaction manager is called and
does:

For all Resource Managers, notify it to dissociate the connection (of
the "normal" native Postgresql connection) from the global transaction.
For all Resource Managers, notify it that the global transaction is to
be prepared to be committed (2-phase commit).
Any failure results in rollback and error.
On success,
(From this point on, amnesia on part of the database about the prepared
transaction is to be prevented)
For all Resource Managers, notify it that the global transaction is to
be committed.

There are funny optimizations possible, but this is a very minimal
picture of what is "supposed to be going on", just so we all have a
basis for talk :)

with kind regards,
    Danny Milosavljevic
    Fabalabs R&D



Attachment

Re: XA support (distributed transactions)

From
"Florian G. Pflug"
Date:
Danny Milosavljevic wrote:
> Hi,
> 
> We (Fabalabs) are looking into adding XA support (distributed
> transactions) to Postgresql.

Postgres already supports 2 phase commits, which is can basis
on which XA can be implemented I think. The "only" missing
part is an transaction manager, but that wouldn't have to
be integrated into postgres. As far as I understand things,
the only thing you really need the transaction manager for
is for automatic recovery if one member of a distributed
transaction fails while the transaction is still in progress.
In that case the transaction manager needs to either rollback
the transaction, if it wasn't already prepared on all nodes,
or commit it.

The only difficulty I see in this "transaction manager" is that
it will needto (persistenly) keep track of transactions, because
due to the design of 2-phase-commit, you cannot deduce the faith
of a transaction by just looking at the nodes.

Image you find a transaction that is prepared in 3 out of 5
participating nodes. It might be that it was originally
prepared on all 5 nodes, and already comitted on two of them,
or it might have been comitted on only 3 nodes before the
transaction manager itself crashed. In the first case you
should commit the transaction on those 3 nodes, while you
should roll it back in the second case.

I believe there are open-source implementations of such transaction 
managers, but I don't have any links at hand.

greetings, Florian Pflug



Re: XA support (distributed transactions)

From
"Dawid Kuroczko"
Date:
On 11/20/06, Florian G. Pflug <fgp@phlo.org> wrote:
> Danny Milosavljevic wrote:
> > Hi,
> >
> > We (Fabalabs) are looking into adding XA support (distributed
> > transactions) to Postgresql.
>
> Postgres already supports 2 phase commits, which is can basis
> on which XA can be implemented I think. The "only" missing
> part is an transaction manager, but that wouldn't have to
> be integrated into postgres. As far as I understand things,
> the only thing you really need the transaction manager for
> is for automatic recovery if one member of a distributed
> transaction fails while the transaction is still in progress.
> In that case the transaction manager needs to either rollback
> the transaction, if it wasn't already prepared on all nodes,
> or commit it.

Hmm, a bit not on topic, but how big is 2PC overhead in
PostgreSQL.  I mean, I have an application which could
benefit from 2PC but it is not mandatory -- it is connected
to two DBs, and the second transaction is committed
if first one succeeds.  There is virtually no chance that
the second commit won't succeed (rows are locked
for update and all transactions lock rows in same
order, etc, etc), but using 2PC would make a nice
warm feeling.  Then again, the process is quite
stressful for that DB so I want to push as little
overhead there as possible.
  Regards,      Dawid


Re: XA support (distributed transactions)

From
"Florian G. Pflug"
Date:
Dawid Kuroczko wrote:
> On 11/20/06, Florian G. Pflug <fgp@phlo.org> wrote:
>> Danny Milosavljevic wrote:
>> > Hi,
>> >
>> > We (Fabalabs) are looking into adding XA support (distributed
>> > transactions) to Postgresql.
>>
>> Postgres already supports 2 phase commits, which is can basis
>> on which XA can be implemented I think. The "only" missing
>> part is an transaction manager, but that wouldn't have to
>> be integrated into postgres. As far as I understand things,
>> the only thing you really need the transaction manager for
>> is for automatic recovery if one member of a distributed
>> transaction fails while the transaction is still in progress.
>> In that case the transaction manager needs to either rollback
>> the transaction, if it wasn't already prepared on all nodes,
>> or commit it.
> 
> Hmm, a bit not on topic, but how big is 2PC overhead in
> PostgreSQL.  I mean, I have an application which could
> benefit from 2PC but it is not mandatory -- it is connected
> to two DBs, and the second transaction is committed
> if first one succeeds.  There is virtually no chance that
> the second commit won't succeed (rows are locked
> for update and all transactions lock rows in same
> order, etc, etc), but using 2PC would make a nice
> warm feeling.  Then again, the process is quite
> stressful for that DB so I want to push as little
> overhead there as possible.

I think the biggest problem of XA without a transaction manager
is that leftover 2PC-Transactions will hold their locks
until a DBA manually resolves the situation by either comitting
or rolling back the transaction. If the transaction only
inserted new rows, than you might never realized that it's
still there (because it's holding no row-level locks for
rows visible to other transaction) until you notice that
vacuum is no longer reclaiming free space (Because those
old rows that vacuum could reclaim are still visible to
your lingering 2pc transaction). So 2PC without some monitoring
that ensures that "lost" transactions are dealt with seems
like a guarantee for trouble to me.

greetings, Florian Pflug




Re: XA support (distributed transactions)

From
"Heikki Linnakangas"
Date:
Danny Milosavljevic wrote:
> Hi,
> 
> We (Fabalabs) are looking into adding XA support (distributed
> transactions) to Postgresql.
> 
> I have been searching the web and came up with
> http://www.nabble.com/Re%3A-Postgres-XA-support-t226681.html#a633000
> which cites some design difficulty.

That mail discusses problems we had trying to implement XADataSource in 
the JDBC driver. Things have evolved from that, and we have a working 
solution that works. It doesn't support some of the advanced features, 
suspend/resume and transaction interleaving in particular, but support 
for those is rather flaky in other DBMS's as well, and it's not a 
problem in practice because J2EE application server implementations work 
well without them. Please read all the more recent discussions on 
pgsql-jdbc mailing list on the subject if you want to contribute.

> It is the case that the X/Open XA standard mandates "transaction
> branches" that are managed by calls to xa_start()/xa_end() in an
> out-of-band manner (via some kind of exported C VMT, sigh), in parallel
> to "native resource manager access" and so Michael Allman's objections
> (see link above) are correct when one assumes one single persistent
> postgresql connection per database prevailing, and a transaction manager
> that arbitrarily commands when to start/end transaction branches.
> 
> What is the current status?

As I said above, we have working solution for the JDBC driver. I'm not 
sure if what we have would suffice for a native C XA client library.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XA support (distributed transactions)

From
Josh Berkus
Date:
Florian,

> I believe there are open-source implementations of such transaction
> managers, but I don't have any links at hand.

In Java, the Sequoia Project.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: XA support (distributed transactions)

From
"Heikki Linnakangas"
Date:
Josh Berkus wrote:
> Florian,
> 
>> I believe there are open-source implementations of such transaction
>> managers, but I don't have any links at hand.
> 
> In Java, the Sequoia Project.

I haven't used Sequoia, but I don't think you can use it as a general 
purpose transaction manager. It does take care of replicating updates to 
all the databases it controls, but AFAIK you can't use it to perform ad 
hoc distributed transactions across resource managers.

There is a lot of open source JTA (Java Transaction API) transaction 
manager implementations, though. JOTM and SimpleJTA to name a few. And 
all the J2EE application server implementations include a transaction 
manager as well.

I'm not aware of any open source non-Java transaction managers.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XA support (distributed transactions)

From
Danny Milosavljevic
Date:
Hi,

On Mon, 20 Nov 2006 18:47:12 +0100, Florian G. Pflug wrote:

 > Danny Milosavljevic wrote:
 >> Hi,
 >>
 >> We (Fabalabs) are looking into adding XA support (distributed
 >> transactions) to Postgresql.
 >
 > Postgres already supports 2 phase commits, which is can basis
 > on which XA can be implemented I think.

Yes, among other things needed.

 > The "only" missing
 > part is an transaction manager, but that wouldn't have to
 > be integrated into postgres. [...]

Our parent company, Fabasoft, already wrote a transaction manager
("Fabasoft Transaction Manager"). I doubt that it is Open Source but
I'll ask.

On the other hand, it's there and already works on Oracle so we aren't
stepping in the dark, testing-/interface-wise.

 > The only difficulty I see in this "transaction manager" is that

 > it will need to (persistenly) keep track of transactions,

Indeed, it does keep track of the state of the global transactions, in a
sophisticated text file :).

 > because
 > due to the design of 2-phase-commit, you cannot deduce the faith
 > of a transaction by just looking at the nodes.

But you can find out the prepared transactions of a two phase commit
(there is a pg system table for that, "pg_prepared_xacts").

What's important is that if the database management system were to
crash, the prepared entries in that table _need to persist_ and be
available after a restart (so that the transaction manager can find out
what actually worked and what didn't work per database before the
connection broke - in XA slang, with a "XA RECOVER").

What's also important is that PREPARED transactions must not complete
(not commit, not rollback) when the pq connection to the database is
closed. This ensures that the database doesn't lose (nor commit) already
prepared transactions in the unlikely but evil case of a
crash/disconnect just between a 2PC PREPARE and a 2PC COMMIT.
(Transactions not yet prepared can just vanish in thin air on the DBMS'
discretion)

Also, XA has some kind of semi-parallelism where one global transaction
branch is supposed to be able to be suspended, another global
transaction branch to be worked on "instead", then this one suspended
too, then the original resumed, and so on, leading to something
reminiscent of time slices :). I have no idea how far Postgresql
supports this particularily tricky part, and whether it makes any sense
to (or whether just a commit/rollback of a previously
crashed-but-prepared transaction suffices).

So my actual questions are:
what is a postgresql transaction tied to, if anything? Or is it a
first-class object?
What happens to prepared-but-not-yet-committed 2PC transactions on a crash?

regards,
   Danny Milosavljevic


Attachment

Re: XA support (distributed transactions)

From
"Heikki Linnakangas"
Date:
Danny Milosavljevic wrote:
>  > The "only" missing
>  > part is an transaction manager, but that wouldn't have to
>  > be integrated into postgres. [...]
> 
> Our parent company, Fabasoft, already wrote a transaction manager 
> ("Fabasoft Transaction Manager"). I doubt that it is Open Source but 
> I'll ask.

Are you a Java shop, using JTA, or are working with a X/Open XA spec 
compliant interface? If the former, I'd strongly suggest not to invent 
the wheel again. Implementing a transaction manager reliably and 
efficiently isn't as easy as it looks, and there's plenty of open source 
JTA implementations available.

>  > because
>  > due to the design of 2-phase-commit, you cannot deduce the faith
>  > of a transaction by just looking at the nodes.
> 
> But you can find out the prepared transactions of a two phase commit
> (there is a pg system table for that, "pg_prepared_xacts").
> 
> What's important is that if the database management system were to 
> crash, the prepared entries in that table _need to persist_ and be 
> available after a restart (so that the transaction manager can find out 
> what actually worked and what didn't work per database before the 
> connection broke - in XA slang, with a "XA RECOVER").
>
> What's also important is that PREPARED transactions must not complete 
> (not commit, not rollback) when the pq connection to the database is 
> closed. This ensures that the database doesn't lose (nor commit) already 
> prepared transactions in the unlikely but evil case of a 
> crash/disconnect just between a 2PC PREPARE and a 2PC COMMIT. 
> (Transactions not yet prepared can just vanish in thin air on the DBMS' 
> discretion)

We have all that in PostgreSQL.

> Also, XA has some kind of semi-parallelism where one global transaction 
> branch is supposed to be able to be suspended, another global 
> transaction branch to be worked on "instead", then this one suspended 
> too, then the original resumed, and so on, leading to something 
> reminiscent of time slices :). I have no idea how far Postgresql 
> supports this particularily tricky part, and whether it makes any sense 
> to (or whether just a commit/rollback of a previously 
> crashed-but-prepared transaction suffices).

We don't support that in PostgreSQL. That's what the thread on 
pgsql-jdbc was all about. There's been a lot of discussion on this, and 
the conclusion is: there's isn't anything interesting you can't do 
without it, and implementing it would require huge changes. What we have 
is sufficient for reliable distributed transaction.

> So my actual questions are:
> what is a postgresql transaction tied to, if anything? Or is it a
> first-class object?
> What happens to prepared-but-not-yet-committed 2PC transactions on a crash?

After PREPARE TRANSACTION, the transaction is disassociated from the 
connection, and it'll stay prepared even after a server crash.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XA support (distributed transactions)

From
Danny Milosavljevic
Date:
Hi,

Heikki Linnakangas wrote:
> Danny Milosavljevic wrote:
>
>>  > The "only" missing
>>  > part is an transaction manager, but that wouldn't have to
>>  > be integrated into postgres. [...]
>>
>> Our parent company, Fabasoft, already wrote a transaction manager
>> ("Fabasoft Transaction Manager"). I doubt that it is Open Source but
>> I'll ask.
>
>
> Are you a Java shop, using JTA, or are working with a X/Open XA spec

Fabasoft is an e-Government Document Management Provider, who has
Administration Offices and such as customers (who need looong retention,
search, versioning, signatures, paranoid access control, access logging,
custom forms and so on).

Fabasoft has its own object/components model and uses C/C++/Javascript
code to implement methods there (the kernel is in C++). While it is
possible to use a Java or dotnet gate to access it, I don't think that
they are actually using those in production systems (at least not in the
parts not written by the customers themselves).

> compliant interface? If the former, I'd strongly suggest not to invent
> the wheel again. Implementing a transaction manager reliably and
> efficiently isn't as easy as it looks, and there's plenty of open source
> JTA implementations available.

Good to know, but the whole software already works fine for years in an
Fabasoft Components -> Fabasoft Transaction Manager -> Oracle Database
setting (with Distributed Transactions), so its not like we are starting
anew :)

 From what I gather even PostgreSQL as it is now could be sufficient if
the Transaction Manager were to use a protocol (like Pgsql-JDBC) does to
talk to PostgreSQL.

>
>> [prepared transactions persist after a crash]
>> [not to complete PREPARED transactions unilaterally]
>
> We have all that in PostgreSQL.
>
>> [time slices]
>
> We don't support that in PostgreSQL. That's what the thread on
> pgsql-jdbc was all about. There's been a lot of discussion on this, and
> the conclusion is: there's isn't anything interesting you can't do
> without it, and implementing it would require huge changes. What we have
> is sufficient for reliable distributed transaction.

Yes, I think so, too. In fact, I'm off to a meeting with management to
see whether that is fine. We'll see.

> After PREPARE TRANSACTION, the transaction is disassociated from the
> connection, and it'll stay prepared even after a server crash.

Very nice.

regards,
   Danny Milosavljevic

Attachment

Re: XA support (distributed transactions)

From
Hannu Krosing
Date:
Ühel kenal päeval, T, 2006-11-21 kell 10:12, kirjutas Danny
Milosavljevic:
> Hi,
> 
> On Mon, 20 Nov 2006 18:47:12 +0100, Florian G. Pflug wrote:

>  > The only difficulty I see in this "transaction manager" is that
>  > it will need to (persistenly) keep track of transactions,
> 
> Indeed, it does keep track of the state of the global transactions, in a 
> sophisticated text file :).

How is this file kept ACID-compliant ?

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: XA support (distributed transactions)

From
Danny Milosavljevic
Date:
Hi,

Heikki Linnakangas wrote:
> Danny Milosavljevic wrote:
>
>> Hi,
>>
>> We (Fabalabs) are looking into adding XA support (distributed
>> transactions) to Postgresql.
> [...]
>
> As I said above, we have working solution for the JDBC driver. I'm not
> sure if what we have would suffice for a native C XA client library.

Well, let's try and find out :)

A quick meeting yielded that "we don't need interleaving / migration
either", so I guess its ok.

I am porting the JDBC XA stuff (as-is) to C, as a first step.

regards,
   Danny Milosavljevic


Attachment

Re: XA support (distributed transactions)

From
Jim Nasby
Date:
On Nov 21, 2006, at 4:09 AM, Heikki Linnakangas wrote:
> Josh Berkus wrote:
>> Florian,
>>> I believe there are open-source implementations of such transaction
>>> managers, but I don't have any links at hand.
>> In Java, the Sequoia Project.
>
> I haven't used Sequoia, but I don't think you can use it as a  
> general purpose transaction manager. It does take care of  
> replicating updates to all the databases it controls, but AFAIK you  
> can't use it to perform ad hoc distributed transactions across  
> resource managers.

I don't believe Sequoia uses 2PC/XA, either.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




Re: XA support (distributed transactions)

From
"J. Eduardo"
Date:
On 11/26/06, Jim Nasby <decibel@decibel.org> wrote:
>
> I don't believe Sequoia uses 2PC/XA, either.

And it really doesn't. The mechanism used by Sequoia is based on
statement replication across the cluster nodes, using group
communication. There are controllers nodes responsable by the
execution of the statements on the backend nodes, each controller
having it's own backends that cannot be shared by different
controllers. Each backend is a real database system accessed by the
controller via JDBC. The group of controllers is designed to provide a
abstraction to the group of distributed databases called a "virtual
database".

The controller node that receives the statement does a pre-processing
of this statement,  replacing functions like random(), now() and other
specified in the configuration file by constant values and then does a
broadcast of  the statement to all controller nodes. Each controller
in the group executes the statement on all it's backends and send the
result of the execution to the controller responsable by the statement
issued.

There are internal mechanisms to guarantee ordered execution of the
statements and proper execution of transactions across the virtual
database, but I am only giving a short description of the funcionality
based on the documentation available on the website and hands-on
testing of the middleware that I'm doing at work for quite some time.

--
João Eduardo Mikos
http://www.icewall.org/~jeduardo/