Thread: jdbc xa support

jdbc xa support

From

Michael Allman

Date:

19 July 2005, 20:34:40

Hi,

I'm implementing XA support in the postgres JDBC driver to complement the
backend two phase commit support in CVS.  Is anyone else working on this?

Michael

Re: jdbc xa support

From

Dave Cramer

Date:

19 July 2005, 21:40:35

Not sure, but look at the archives, there was some discussion about
various mechanisms.

Dave

Re: jdbc xa support

From

Heikki Linnakangas

Date:

20 July 2005, 13:31:44

I'm working on it. Glad to hear you're interested. I don't have much time to
work on it though, so I would be very happy if you took over.

I'll send you a copy of my workbench off-list so you can take a look. It's
work in progress, but I hope it helps.

I believe this is the discussion Dave mentioned:

http://archives.postgresql.org/pgsql-jdbc/2005-06/msg00165.php

- Heikki


Re: jdbc xa support

From

Michael Allman

Date:

20 July 2005, 22:22:02

Here's my first cut:

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050720.jar

At this point I know the documentation is sparse.  I'll try to improve
that situation soon.  Until then, I wanted to give everyone the
opportunity to take a first look at the code and the approach.

I also have some questions, some of which are embedded in the .java files
as comments.  If anyone has answers, please pass them along.

I'll let you chew on this and check back tomorrow.

Cheers,

Michael


Re: jdbc xa support

From

Heikki Linnakangas

Date:

21 July 2005, 16:15:37

Thanks.

So far so good. You still have all the same tough issues ahead as I do,
though.

I realize it's work in progress, but here are some comments in no particular
order:

1. To answer the question in the source code: PostgreSQL doesn't support
transaction timeouts.

2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You
can only use prepared statements for normal SELECT/UPDATE/DELETE commands.

3. How are you planning to handle transaction interleaving discussed
in the thread Dave mentioned?

4. recover is broken because it ignores the flags argument. That's going
to cause an endless loop in the transaction manager when it tries to
recover. See this discussion:
http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566

5. commit and rollback check that the transaction is found in
XID_TO_TRANSACTION_STATE_MAP. However, after a crash/recover cycle, the
map will be empty.

6. isSameRM considers two connections to the same database as different
RMs. I'm not sure what the implications of this are, but I feel that's not
right. I have the same issue in my implementation as well...
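
On point 2, one way around the lack of parameter binding for PREPARE
TRANSACTION is to put the identifier into the command text as a quoted
literal. The following is only a minimal sketch, not the driver's code: the
class PrepareSql, its method names and the Base64-based identifier layout are
all assumptions made for illustration. Restricting the identifier to a fixed
alphabet avoids having to escape quotes or backslashes at all.

```java
import java.util.Base64;

// Hypothetical helper: builds the text of a PREPARE TRANSACTION command
// without using a bind parameter.
final class PrepareSql {

    // Encodes the three parts of an Xid (format id, global transaction id,
    // branch qualifier) into a string made only of characters that are safe
    // inside a single-quoted SQL literal.
    static String gid(int formatId, byte[] gtrid, byte[] bqual) {
        Base64.Encoder enc = Base64.getEncoder();
        return formatId + "_" + enc.encodeToString(gtrid)
                + "_" + enc.encodeToString(bqual);
    }

    // Refuses anything outside the expected alphabet instead of trying to
    // escape it, then embeds the identifier as a literal.
    static String prepareTransaction(String gid) {
        if (!gid.matches("[A-Za-z0-9+/=_-]+")) {
            throw new IllegalArgumentException("unexpected character in gid");
        }
        return "PREPARE TRANSACTION '" + gid + "'";
    }
}
```

The resulting string can be sent with a plain Statement. The same identifier
would be used for the later COMMIT PREPARED or ROLLBACK PREPARED.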

Also take a look at a list of comments on my code by Mike Bonnet. Some of
them probably apply to your code as well. Those comments are about the
version that's on my pg webpage:

http://users.tkk.fi/~hlinnaka/pgsql/

The XA and JTA specifications are quite complicated. I'd like to see a
good set of test cases that exercise all possible scenarios and also
error conditions. We're also going to need testers with access to the
popular application servers so that we know our implementation works with
them. AFAIK, the only open source application server that does recovery
properly is the CVS head version of JOnAS.

Also, if we violate some parts of the specs (like the transaction
interleaving part), it's important to know exactly what the limitations
are and why. I started to write down the exact preconditions for each
method in the javadoc comments, and also to separate which preconditions
come from the specs and which are just implementation-specific
limitations.

- Heikki

Re: jdbc xa support

From

Michael Allman

Date:

21 July 2005, 22:04:44

Heikki,

Thanks for your feedback.  I am working on improvements and documentation.
The rest of my response is below.

On Thu, 21 Jul 2005, Heikki Linnakangas wrote:

> Thanks.
>
> So far so good. You still have all the same tough issues ahead as I do,
> though.
>
> I realize it's work in progress, but here's some comments in no particular
> order:
>
> 1. To answer the question in the source code: PostgreSQL doesn't support
> transaction timeouts

I figured.

> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You can
> only use prepared statements for normal SELECT/UPDATE/DELETE commands.

Doesn't the driver support client side prepared statements?

> 3. How are you planning to handle transaction interleaving discussed in the
> thread Dave mentioned?

I'm not.  PostgreSQL does not support this behavior, and I see no need to
pretend it does.  I think the appropriate thing to do is throw an
exception when the second start is called.
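
A minimal sketch of that approach, under stated assumptions: the class
BranchGuard and its method names are hypothetical, the branch is identified
by any object with a sensible equals(), and the choice of XAER_PROTO as the
error code is an assumption rather than something the thread settles. The
point it illustrates is that the connection stays tied to a branch from
start() until the branch is prepared, committed or rolled back, not merely
until end().

```java
import javax.transaction.xa.XAException;

// Hypothetical guard that one XAResource could keep per physical connection.
final class BranchGuard {
    // The branch whose backend transaction is open on this connection,
    // or null when the connection is free.
    private Object current;

    // Called from XAResource.start(): refuse a second branch while the
    // first one still owns the backend transaction.
    void start(Object xid) throws XAException {
        if (current != null) {
            throw new XAException(XAException.XAER_PROTO);
        }
        current = xid;
    }

    // Called from XAResource.end(): the backend transaction is still open,
    // so the connection is NOT released here.
    void end(Object xid) throws XAException {
        if (current == null || !current.equals(xid)) {
            throw new XAException(XAException.XAER_PROTO);
        }
    }

    // Called once the branch has been prepared, committed or rolled back;
    // only now can another branch start on this connection.
    void finished(Object xid) throws XAException {
        if (current == null || !current.equals(xid)) {
            throw new XAException(XAException.XAER_PROTO);
        }
        current = null;
    }
}
```

With this guard the sequence start(A), end(A), start(B) fails at the second
start, while start(A), end(A), prepare(A), start(B) goes through.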

I have serious doubts that any SQL database in the world supports this
behavior correctly.  If you know of one that does, I'd like to see its
magic.

> 4. recover is broken because it ignores the flags argument. That's going to
> cause an endless loop in the transaction manager when it tries to recover.
> See this discussion:
> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566

That is problematic.  The API for recovery is stateful, and, IMHO, poorly
designed.  If you look at the original DTP XA spec you'll see it makes
much more sense.

I don't know what to do about this yet.

> 5. commit and rollback check that the transaction is found in
> XID_TO_TRANSACTION_STATE_MAP. However, after a crash/recover cycle, the map
> will be empty.

I think I have a good solution to this.  I'll post it with my next
patchset, coming soon.  I'll elaborate then.

> 6. isSameRM considers two connections to the same database as different RMs.
> I'm not sure what the implications of this are, but I feel that's not right.
> I have the same issue in my implementation as well...

They're different RMs because you can't join a transaction across two
physical JDBC Connections.  Each XAResource instance is associated with
exactly one physical connection instance.

In light of the implementation, I could probably just define isSameRM() as
return this == otherXAResource . . .

> Also take a look at a list of comments on my code by Mike Bonnet. Some of
> them probably apply to your code as well. Those comments are about the
> version that's on my pg webpage:
>
> http://users.tkk.fi/~hlinnaka/pgsql/

I saw these.  Nothing jumped out at me.  The documentation is inadequate,
and I'll work on that.  The logging may be inadequate.  I don't know about
that.  I've used this kind of XA implementation before and this kind of
logging has been adequate for me to debug problems.

> The XA and JTA specifications are quite complicated. I'd like to see a good
> set of test cases that exercise all possible scenarios and also error
> conditions. We're also going to need testers with access to the popular
> application servers so that we know our implementation works with them.
> AFAIK, the only open source application server that does recovery properly is
> the CVS head version of JOnAS.

I have some cactus test cases for an XML database that has an XA driver.
I'm not feeling too motivated to port them to PostgreSQL.

> Also, if we violate some parts of the specs (like the transaction
> interleaving part), it's important to know exactly what the limitations are
> and why. I started to write down the exact preconditions for each method in
> the javadoc comments, and also  separate which preconditions come from the
> specs and which are just implementation-specific limitations.

I think the interleaving business is a non-issue.  I can't think of a real
world case where a transaction manager would do this.  Can you?

Besides, like I said, I doubt any other SQL database supports this.  I
know Berkeley DB does, but Berkeley DB lets you associate any database
call with any transaction, so it's easy.

JTA was written with more than just SQL databases in mind, and I don't
think we need to bend over backwards to implement some corner
functionality for a resource which, by its design, doesn't support it.

Thanks again.  I'll post another revision which fixes recovery and
addresses other issues soon.

Michael


Re: jdbc xa support

From

Michael Allman

Date:

21 July 2005, 22:20:05

I've posted a newer revision here:

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050721.jar

I modified the implementation of recover to put the recovered xids in the
transaction state map so that they'll be there when the TM tries to commit
or roll them back.

I renamed some variables and classes to bring them a little closer to
their actual meaning.

I added a small amount of code documentation.  Much more to come.

I think I made some other minor adjustments which I've forgotten.

Michael


Re: jdbc xa support

From

Oliver Jowett

Date:

21 July 2005, 23:02:24

Michael Allman wrote:

>> 4. recover is broken because it ignores the flags argument. That's
>> going to cause an endless loop in the transaction manager when it
>> tries to recover. See this discussion:
>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>
>
> That is problematic.  The API for recovery is stateful, and, IMHO,
> poorly designed.  If you look at the original DTP XA spec you'll see it
> makes much more sense.

Huh? TMSTARTRSCAN etc are in the original DTP XA spec too, assuming you
mean this one:

X/Open CAE Specification
Distributed Transaction Processing: The XA Specification
ISBN: 1 872630 24 3
X/Open Document Number: XO/CAE/91/300

-O

Re: jdbc xa support

From

Michael Allman

Date:

21 July 2005, 23:06:51

On Fri, 22 Jul 2005, Oliver Jowett wrote:

> Michael Allman wrote:
>
>>> 4. recover is broken because it ignores the flags argument. That's
>>> going to cause an endless loop in the transaction manager when it
>>> tries to recover. See this discussion:
>>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>>
>>
>> That is problematic.  The API for recovery is stateful, and, IMHO,
>> poorly designed.  If you look at the original DTP XA spec you'll see it
>> makes much more sense.
>
> Huh? TMSTARTRSCAN etc are in the original DTP XA spec too, assuming you
> mean this one:
>
> X/Open CAE Specification
> Distributed Transaction Processing: The XA Specification
> ISBN: 1 872630 24 3
> X/Open Document Number: XO/CAE/91/300

Yes, and if you look at the specification for the xa_recover() function 47
you see it's a lot different from its Java counterpart.

Michael

Re: jdbc xa support

From

Oliver Jowett

Date:

21 July 2005, 23:48:18

Michael Allman wrote:
> On Fri, 22 Jul 2005, Oliver Jowett wrote:
>
>> Michael Allman wrote:
>>
>>>> 4. recover is broken because it ignores the flags argument. That's
>>>> going to cause an endless loop in the transaction manager when it
>>>> tries to recover. See this discussion:
>>>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>>>
>>>
>>>
>>> That is problematic.  The API for recovery is stateful, and, IMHO,
>>> poorly designed.  If you look at the original DTP XA spec you'll see it
>>> makes much more sense.
>>
>>
>> Huh? TMSTARTRSCAN etc are in the original DTP XA spec too, assuming you
>> mean this one:
>>
>> X/Open CAE Specification
>> Distributed Transaction Processing: The XA Specification
>> ISBN: 1 872630 24 3
>> X/Open Document Number: XO/CAE/91/300
>
>
> Yes, and if you look at the specification for the xa_recover() function
> 47 you see it's a lot different from its Java counterpart.

Actually, I don't see that at all; the Java API is pretty much a direct
mapping of the C API. Quote from the C API:

===
So that all XIDs may be returned irrespective of the size of the array
xids, one or more xa_recover () calls may be used within a single
recovery scan. The flags parameter to xa_recover () defines when a
recovery scan should start or end, or start and end. The start of a
recovery scan moves a cursor to the start of a list of prepared and
heuristically completed transactions. Throughout the recovery scan the
cursor marks the current position in that list. Each call advances the
cursor past the set of XIDs it returns.
[...]
Upon success, xa_recover () places zero or more XIDs in the space
pointed to by xids. The function returns the number of XIDs it has
placed there. If this value is less than count, there are no more XIDs
to recover and the current scan ends.
[...]
Following are the valid settings of flags:
TMSTARTRSCAN
This flag indicates that xa_recover () should start a recovery scan for
the thread of control and position the cursor to the start of the list.
XIDs are returned from that point. If a recovery scan is already open,
the effect is as if the recovery scan were ended and then restarted.
TMENDRSCAN
This flag indicates that xa_recover () should end the recovery scan
after returning the XIDs. If this flag is used in conjunction with
TMSTARTRSCAN, the single xa_recover () call starts and then ends a scan.
TMNOFLAGS
This flag must be used when no other flags are set in flags. A recovery
scan must already be started. XIDs are returned starting at the current
cursor position.
===

Java API:

===
The flag parameter indicates where the recover scan should start or end,
or start and end. This method may be invoked one or more times during a
recovery scan. The resource manager maintains a cursor which marks the
current position of the prepared or heuristically completed transaction
list. Each invocation of the recover method moves the cursor passed the
set of Xids that are returned.
[...]
flag
One of TMSTARTRSCAN, TMENDRSCAN, TMNOFLAGS. TMNOFLAGS must be used
when no other flags are used.
TMSTARTRSCAN - indicates that the recovery
scan should be started at the beginning of the prepared or heuristically
completed transaction list.
TMENDRSCAN - indicates that the recovery scan should be ended after the
method returns the Xid list. If this flag is used in conjunction with
the TMSTARTRSCAN, this method invocation starts and ends the recovery
scan.
TMNOFLAGS - this flag must be used when no other flags are specified.
This flag may be used only if the recovery scan has already been
started. The list of Xids are returned
Returns: xid[]
The resource manager returns zero or more Xids for the transaction
branches that are currently in a prepared or heuristically completed
state.
===

The only significant difference is that the Java API doesn't talk about
how to signal no-more-XIDs. I'm guessing this is just an oversight.
The thread referred to above mentions using a null or zero-sized return
from recover() to signal that.

But both the Java and C versions are stateful and essentially
equivalent, so I'm not sure what your complaint is exactly.

-O
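
To make the scan protocol quoted above concrete, here is a rough sketch of
how a transaction manager might drive one complete recovery scan. It is an
illustration only: RecoverCall and RecoveryDriver are hypothetical names, and
treating a null or empty array as "no more ids" is the convention mentioned
in the forum thread, not something the Java API text states.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical interface with the same shape as XAResource.recover(int),
// generic so that it can be exercised without a real Xid implementation.
interface RecoverCall<T> {
    T[] recover(int flag) throws XAException;
}

final class RecoveryDriver {
    // Collects every id returned during a single recovery scan.
    static <T> List<T> scanAll(RecoverCall<T> rm) throws XAException {
        List<T> all = new ArrayList<>();
        // The first call opens the scan at the start of the list.
        T[] batch = rm.recover(XAResource.TMSTARTRSCAN);
        // Keep asking for more until the resource reports nothing further.
        while (batch != null && batch.length > 0) {
            for (T id : batch) {
                all.add(id);
            }
            batch = rm.recover(XAResource.TMNOFLAGS);
        }
        // Close the scan explicitly.
        rm.recover(XAResource.TMENDRSCAN);
        return all;
    }
}
```

A real resource could be passed as xaResource::recover. A recover() that
ignores the flag and returns the full list on every call never lets this loop
terminate, which is the endless loop described in point 4 earlier in the
thread.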

Re: jdbc xa support

From

Michael Allman

Date:

22 July 2005, 01:48:22

On Fri, 22 Jul 2005, Oliver Jowett wrote:

> Michael Allman wrote:
>> On Fri, 22 Jul 2005, Oliver Jowett wrote:
>>
>>> Michael Allman wrote:
>>>
>>>>> 4. recover is broken because it ignores the flags argument. That's
>>>>> going to cause an endless loop in the transaction manager when it
>>>>> tries to recover. See this discussion:
>>>>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>>>>
>>>>
>>>>
>>>> That is problematic.  The API for recovery is stateful, and, IMHO,
>>>> poorly designed.  If you look at the original DTP XA spec you'll see it
>>>> makes much more sense.
>>>
>>>
>>> Huh? TMSTARTRSCAN etc are in the original DTP XA spec too, assuming you
>>> mean this one:
>>>
>>> X/Open CAE Specification
>>> Distributed Transaction Processing: The XA Specification
>>> ISBN: 1 872630 24 3
>>> X/Open Document Number: XO/CAE/91/300
>>
>>
>> Yes, and if you look at the specification for the xa_recover() function
>> 47 you see it's a lot different from its Java counterpart.
>
> Actually, I don't see that at all; the Java API is pretty much a direct
> mapping of the C API. Quote from the C API:
>
> ===
> So that all XIDs may be returned irrespective of the size of the array
> xids, one or more xa_recover () calls may be used within a single
> recovery scan. The flags parameter to xa_recover () defines when a
> recovery scan should start or end, or start and end. The start of a
> recovery scan moves a cursor to the start of a list of prepared and
> heuristically completed transactions. Throughout the recovery scan the
> cursor marks the current position in that list. Each call advances the
> cursor past the set of XIDs it returns.
> [...]
> Upon success, xa_recover () places zero or more XIDs in the space
> pointed to by xids. The function returns the number of XIDs it has
> placed there. If this value is less than count, there are no more XIDs
> to recover and the current scan ends.
> [...]
> Following are the valid settings of flags:
> TMSTARTRSCAN
> This flag indicates that xa_recover () should start a recovery scan for
> the thread of control and position the cursor to the start of the list.
> XIDs are returned from that point. If a recovery scan is already open,
> the effect is as if the recovery scan were ended and then restarted.
> TMENDRSCAN
> This flag indicates that xa_recover () should end the recovery scan
> after returning the XIDs. If this flag is used in conjunction with
> TMSTARTRSCAN, the single xa_recover () call starts and then ends a scan.
> TMNOFLAGS
> This flag must be used when no other flags are set in flags. A recovery
> scan must already be started. XIDs are returned starting at the current
> cursor position.
> ===
>
> Java API:
>
> ===
> The flag parameter indicates where the recover scan should start or end,
> or start and end. This method may be invoked one or more times during a
> recovery scan. The resource manager maintains a cursor which marks the
> current position of the prepared or heuristically completed transaction
> list. Each invocation of the recover method moves the cursor passed the
> set of Xids that are returned.
> [...]
> flag
> One of TMSTARTRSCAN, TMENDRSCAN, TMNOFLAGS. TMNOFLAGS must be used
> when no other flags are used.
> TMSTARTRSCAN - indicates that the recovery
> scan should be started at the beginning of the prepared or heuristically
> completed transaction list.
> TMENDRSCAN - indicates that the recovery scan should be ended after the
> method returns the Xid list. If this flag is used in conjunction with
> the TMSTARTRSCAN, this method invocation starts and ends the recovery
> scan.
> TMNOFLAGS - this flag must be used when no other flags are specified.
> This flag may be used only if the recovery scan has already been
> started. The list of Xids are returned
> Returns: xid[]
> The resource manager returns zero or more Xids for the transaction
> branches that are currently in a prepared or heuristically completed
> state.
> ===
>
> The only significant difference is that the Java API doesn't talk about
> how to signal no-more-XIDs. I'm guessing this is just an oversight.
> The thread referred to above mentions using a null or zero-sized return
> from recover() to signal that.
>
> But both the Java and C versions are stateful and essentially
> equivalent, so I'm not sure what your complaint is exactly.

The C version takes an argument specifying the maximum number of xids to
recover.  The Java version does not.  Without this information, the Java
version not only looks silly but doesn't make a lot of sense either.  For
example, how many recovered xids should we return on the first call to
recover()?

Anyway, do you think my implementation of the recover() method violates
the JTA spec?

Michael

Re: jdbc xa support

From

Oliver Jowett

Date:

22 July 2005, 02:02:31

Michael Allman wrote:

> The C version takes an argument specifying the maximum number of xids to
> recover.  The Java version does not.

That's at least partly because the C version makes the caller allocate
the array.

>  Without this information, the Java
> version not only looks silly but doesn't make a lot of sense either.

It seems ok to me -- it puts the burden for selecting a suitable batch
size on the resource rather than the TM, but that's six of one, half a
dozen of the other. That also means the RM can generate the array at
whatever size is convenient, rather than having to internally buffer if
it retrieves xids in some larger block size than the TM selects.

It certainly doesn't make it more or less stateful, so I still don't
understand your original objection.

> For example, how many recovered xids should we return on the first call
> to recover()?

For our JDBC implementation? Set a fetchsize on your query to something
reasonable -- perhaps 500? -- and return up to that many Xids per call
until you hit the end of the resultset, then return empty arrays
thereafter until a new scan starts.

> Anyway, do you think my implementation of the recover() method violates
> the JTA spec?

The code in pgjdbcxa-20050721.jar appears to violate the spec, as you
completely ignore the flags argument. You need to track the recovery
scan state even if you decide to return all Xids in one array, because
subsequent calls shouldn't return those Xids again until a new scan is
started, per the API docs.

-O
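
The scan state described above could be tracked along these lines. This is a
sketch under assumptions, not the code from either jar: RecoveryCursor is a
hypothetical class, the list of prepared ids is assumed to have been read
from the backend already (a real resource would presumably re-read it when a
scan starts), and XAER_INVAL is used for both bad flags and "no scan open",
following the C specification's error list.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical cursor over a fixed snapshot of prepared transaction ids.
final class RecoveryCursor<T> {
    private final List<T> prepared;
    private final int batchSize;   // assumed to be positive, e.g. 500
    private int position = -1;     // -1 means no recovery scan is open

    RecoveryCursor(List<T> prepared, int batchSize) {
        this.prepared = prepared;
        this.batchSize = batchSize;
    }

    // Returns the next batch of ids and moves the cursor past them.
    List<T> recover(int flag) throws XAException {
        int known = XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN;
        if ((flag & ~known) != 0) {
            throw new XAException(XAException.XAER_INVAL); // unknown flag bits
        }
        if ((flag & XAResource.TMSTARTRSCAN) != 0) {
            position = 0; // open a scan, or restart one that is already open
        } else if (position < 0) {
            throw new XAException(XAException.XAER_INVAL); // no scan is open
        }
        int to = Math.min(position + batchSize, prepared.size());
        List<T> batch = new ArrayList<>(prepared.subList(position, to));
        // Advance past what was returned, or close the scan on TMENDRSCAN.
        position = (flag & XAResource.TMENDRSCAN) != 0 ? -1 : to;
        return batch;
    }
}
```

Setting batchSize to the size of the list gives the "return all of them"
variant: the first call returns everything and later calls in the same scan
return an empty list.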

Re: jdbc xa support

From

Michael Allman

Date:

22 July 2005, 02:28:13

On Fri, 22 Jul 2005, Oliver Jowett wrote:

> Michael Allman wrote:
>
>> The C version takes an argument specifying the maximum number of xids to
>> recover.  The Java version does not.
>
> That's at least partly because the C version makes the caller allocate the
> array.

In this situation it is understandable why the C method is stateful and
how it is supposed to work.  The caller tells the resource manager how
many xids it can handle at once by passing a length argument. It also
passes an XID* which it malloc()'s to hold

length * (3 * sizeof(long) + XIDDATASIZE * sizeof(char))

bytes, or something like that.

Without a length argument allowing the TM to tell the RM how many xids it
can take, it doesn't make sense to return "some" xids at a time.  Because
how many "some" is is determined by the RM, not the TM, and that defeats
the purpose of the C version's approach.

This is why I say the Java version doesn't make sense.

>>  Without this information, the Java version not only looks silly but
>> doesn't make a lot of sense either.
>
> It seems ok to me -- it puts the burden for selecting a suitable batch size
> on the resource rather than the TM, but that's six of one, half a dozen of
> the other. That also means the RM can generate the array at whatever size is
> convenient, rather than having to internally buffer if it retrieves xids in
> some larger block size than the TM selects.
>
> It certainly doesn't make it more or less stateful, so I still don't
> understand your original objection.
>
>> For example, how many recovered xids should we return on the first call to
>> recover()?
>
> For our JDBC implementation? Set a fetchsize on your query to something
> reasonable -- perhaps 500? -- and return up to that many Xids per call until
> you hit the end of the resultset, then return empty arrays thereafter until a
> new scan starts.

Why 500?  It's simpler to return all of them.

>> Anyway, do you think my implementation of the recover() method violates the
>> JTA spec?
>
> The code in pgjdbcxa-20050721.jar appears to violate the spec, as you
> completely ignore the flags argument. You need to track the recovery scan
> state even if you decide to return all Xids in one array, because subsequent
> calls shouldn't return those Xids again until a new scan is started, per the
> API docs.

I do track the recovery scan state.  Each call puts the scan cursor at the
end of the list.

I still don't see a violation of the API.

It looks like the JTA API is wrong or there's a typo.  If we follow the
spirit of the DTP spec it seems that the TMNOFLAGS flag means "return some
xids starting from where we last left off".  I'm still not sure what
TMENDRSCAN means.

Michael

Re: jdbc xa support

From

Oliver Jowett

Date:

22 July 2005, 02:56:37

Michael Allman wrote:

> Without a length argument allowing the TM to tell the RM how many xids
> it can take, it doesn't make sense to return "some" xids at a time.
> Because how many "some" is is determined by the RM, not the TM, and that
> defeats the purpose of the C version's approach.

Well, personal preference perhaps, but it seems fine to me to allow the RM
to break up the list into more manageable parts if it wants to. For
example, consider a Java RM that is implemented by JNI calls to a C
implementation of the XA interface.

>> For our JDBC implementation? Set a fetchsize on your query to
>> something reasonable -- perhaps 500? -- and return up to that many
>> Xids per call until you hit the end of the resultset, then return
>> empty arrays thereafter until a new scan starts.
>
> Why 500?  It's simpler to return all of them.

Sure, it doesn't really matter either way; it's just an implementation
decision. My only concern was if you're pulling in a million Xids at
once then you are going to bloat out the heap more than you need to --
but that's unlikely to happen in any real system anyway.

>>> Anyway, do you think my implementation of the recover() method
>>> violates the JTA spec?
>>
>>
>> The code in pgjdbcxa-20050721.jar appears to violate the spec, as you
>> completely ignore the flags argument. You need to track the recovery
>> scan state even if you decide to return all Xids in one array, because
>> subsequent calls shouldn't return those Xids again until a new scan is
>> started, per the API docs.
>
>
> I do track the recovery scan state.  Each call puts the scan cursor at
> the end of the list.

Uh, are we looking at the same code here? I don't see anything in the
code from pgjdbcxa-20050721.jar that records whether we are at the end
of the list or not between calls to recover(), and I don't see anything
that looks for TMSTARTRSCAN to reset that state. If I've missed it, can
you point it out to me?

> I still don't see a violation of the API.

The API says that if a TM does this:

   Xid[] xids_1 = resource.recover(TMSTARTRSCAN);
   Xid[] xids_2 = resource.recover(TMNOFLAGS);
   Xid[] xids_3 = resource.recover(TMNOFLAGS);

then xids_1, xids_2, and xids_3 reflect consecutive (and possibly empty
if you hit the end) parts of the recovery list. It seems your code does
not respect this -- it will return the full list of Xids repeatedly in
each of xids_1, xids_2 and xids_3.

Given that the way that the TM decides that the recovery scan is
complete is by looking for an empty array returned from recover(), your
code is going to hit exactly the infinite-loop case described in the
thread that Heikki posted.
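
To make the termination condition concrete, here is a minimal sketch of the
TM side of a recovery scan. It assumes a TM that keeps calling recover()
until it gets an empty array back; RecoveryLoopSketch, Recoverable, drain
and the maxCalls cap are hypothetical names used only for illustration, and
the real XAResource.recover() also declares XAException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class RecoveryLoopSketch {
    /** Hypothetical stand-in for the single XAResource method under discussion. */
    interface Recoverable {
        Xid[] recover(int flag);
    }

    /**
     * Starts a scan with TMSTARTRSCAN, continues with TMNOFLAGS and stops at
     * the first empty array. Returns null if no empty array shows up within
     * maxCalls calls; a TM without such a cap would loop forever.
     */
    static List<Xid> drain(Recoverable rm, int maxCalls) {
        List<Xid> seen = new ArrayList<>();
        int flag = XAResource.TMSTARTRSCAN;
        for (int call = 0; call < maxCalls; call++) {
            Xid[] batch = rm.recover(flag);
            if (batch.length == 0) {
                return seen; // the RM signalled the end of the scan
            }
            seen.addAll(Arrays.asList(batch));
            flag = XAResource.TMNOFLAGS; // continue the current scan
        }
        return null; // the RM never signalled the end of the scan
    }
}
```

An RM that returns the full list on every call, regardless of the flag,
never produces the empty array, so drain() only stops because of the cap.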

> It looks like the JTA API is wrong or there's a typo.  If we follow the
> spirit of the DTP spec it seems that the TMNOFLAGS flag means "return
> some xids starting from where we last left off".  I'm still not sure
> what TMENDRSCAN means.

The JTA specification is fairly clear about their meanings. TMNOFLAGS
does indeed mean "continue the current scan". TMENDRSCAN means "I'm done
with this scan", and allows RMs to free resources they have allocated
for the scan.

===
The flag parameter indicates where the recover scan should start or end,
or start and end. This method may be invoked one or more times during a
recovery scan. The resource manager maintains a cursor which marks the
current position of the prepared or heuristically completed transaction
list. Each invocation of the recover method moves the cursor past the
set of Xids that are returned.

[...]

TMSTARTRSCAN - indicates that the recovery scan should be started at the
beginning of the prepared or heuristically completed transaction list.

TMENDRSCAN - indicates that the recovery scan should be ended after the
method returns the Xid list. If this flag is used in conjunction with
the TMSTARTRSCAN, this method invocation starts and ends the recovery scan.

TMNOFLAGS - this flag must be used when no other flags are specified.
This flag may be used only if the recovery scan has already been
started. The list of Xids are returned
===
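
The cursor behaviour in the excerpt above could be sketched roughly as
follows. This is only an illustration under assumptions: CursorRecoverSketch,
its fields and the fixed batchSize are hypothetical, the prepared list is an
in-memory snapshot rather than a query result, and a TMNOFLAGS call with no
scan open simply returns an empty array instead of raising an error.

```java
import java.util.Arrays;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class CursorRecoverSketch {
    private final Xid[] prepared; // hypothetical snapshot of the prepared-transaction list
    private final int batchSize;  // how many Xids the RM chooses to return per call
    private int cursor;           // index of the next Xid to hand out

    CursorRecoverSketch(Xid[] prepared, int batchSize) {
        this.prepared = prepared.clone();
        this.batchSize = batchSize;
        this.cursor = this.prepared.length; // no scan open yet: nothing left to return
    }

    Xid[] recover(int flag) {
        if ((flag & XAResource.TMSTARTRSCAN) != 0) {
            cursor = 0; // (re)start the scan at the beginning of the list
        }
        int end = Math.min(cursor + batchSize, prepared.length);
        Xid[] batch = Arrays.copyOfRange(prepared, cursor, end);
        cursor = end; // move the cursor past the Xids just returned
        if ((flag & XAResource.TMENDRSCAN) != 0) {
            cursor = prepared.length; // scan ended: later TMNOFLAGS calls return nothing
        }
        return batch;
    }
}
```

With five prepared Xids and a batch size of two, successive calls with
TMSTARTRSCAN, TMNOFLAGS, TMNOFLAGS, TMNOFLAGS return 2, 2, 1 and 0 Xids.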

-O

Re: jdbc xa support

From

Michael Allman

Date:

22 July 2005, 03:25:56

On Fri, 22 Jul 2005, Oliver Jowett wrote:

>> I do track the recovery scan state.  Each call puts the scan cursor at the
>> end of the list.
>
> Uh, are we looking at the same code here? I don't see anything in the code
> from pgjdbcxa-20050721.jar that records whether we are at the end of the list
> or not between calls to recover(), and I don't see anything that looks for
> TMSTARTRSCAN to reset that state. If I've missed it, can you point it out to
> me?

You're right.  But so am I.  Since the code returns the complete list, the
"cursor" is always at the "end" of the "list" of prepared xids.  The code
simply starts at the beginning for each call to recover().

>> I still don't see a violation of the API.
>
> The API says that if a TM does this:
>
>  Xid[] xids_1 = resource.recover(TMSTARTRSCAN);
>  Xid[] xids_2 = resource.recover(TMNOFLAGS);
>  Xid[] xids_3 = resource.recover(TMNOFLAGS);
>
> then xids_1, xids_2, and xids_3 reflect consecutive (and possibly empty if
> you hit the end) parts of the recovery list. It seems your code does not
> respect this -- it will return the full list of Xids repeatedly in each of
> xids_1, xids_2 and xids_3.

I don't see anything in the JTA spec that says the TMNOFLAGS int means
anything other than that no other flag was passed to recover().  In the
DTP spec it says something about returning xids starting at the current
cursor position.

>> It looks like the JTA API is wrong or there's a typo.  If we follow the
>> spirit of the DTP spec it seems that the TMNOFLAGS flag means "return some
>> xids starting from where we last left off".  I'm still not sure what
>> TMENDRSCAN means.
>
> The JTA specification is fairly clear about their meanings. TMNOFLAGS does
> indeed mean "continue the current scan". TMENDRSCAN means "I'm done with this
> scan", and allows RMs to free resources they have allocated for the scan.

I added the lines

     if (flag != TMSTARTRSCAN) {
         return new Xid[0];
     }

to the top of the recover() method and posted a new version at

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722.jar

How does that look?

Michael

Re: jdbc xa support

From

Heikki Linnakangas

Date:

22 July 2005, 04:00:38

On Thu, 21 Jul 2005, Michael Allman wrote:

> On Thu, 21 Jul 2005, Heikki Linnakangas wrote:
>
>> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You
>> can only use prepared statements for normal SELECT/UPDATE/DELETE commands.
>
> Doesn't the driver support client side prepared statements?

No, they're server side. I tried that too at first, but it didn't work.

>> 3. How are you planning to handle transaction interleaving discussed in the
>> thread Dave mentioned?
>
> I'm not.  PostgreSQL does not support this behavior, and I see no need to
> pretend it does.  I think the appropriate thing to do is throw an exception
> when the second start is called.

I agree. However, I'd like to try it with the popular TMs to make sure
they work without it.

> I have serious doubts that any SQL database in the world supports this
> behavior correctly.  If you know of one that does, I'd like to see its magic.

I tested it on some SQL databases, and at least Oracle seems to support
it. DB2 fakes it by preparing early. Derby seems to support it, but it
only supports XA in embedded mode.

>> 4. recover is broken because it ignores the flags argument. That's going to
>> cause an endless loop in the transaction manager when it tries to recover.
>> See this discussion:
>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>
> That is problematic.  The API for recovery is stateful, and, IMHO, poorly
> designed.  If you look at the original DTP XA spec you'll see it makes much
> more sense.

I agree that it sucks.

> I don't know what to do about this yet.

The simplest implementation is one that returns all the recovered xids if
flags include TMSTARTRSCAN, and an empty array in all other cases. That
way, the internal implementation doesn't have to be stateful even though the
API is.

>> 6. isSameRM considers two connections to the same database as different
>> RMs. I'm not sure what the implications of this are, but I feel that's not
>> right. I have the same issue in my implementation as well...
>
> They're different RM's because you can't join a transaction across two
> physical JDBC Connections.  Each XAResource instance is associated with
> exactly one physical connection instance.

I don't think that's the correct definition of an RM. See section 2.2.4
of the XA specification. I think the Postgres database or cluster is one
RM. But as I said, I don't know what implications your implementation has.
It might work just fine, or not.

> In light of the implementation, I could probably just define isSameRM() as
> return this == otherXAResource . . .

Yep.

>> The XA and JTA specifications are quite complicated. I'd like to see a good
>> set of test cases that exercise all possible scenarios and also error
>> conditions. We're also going to need testers with access to the popular
>> application servers so that we know our implementation works with them.
>> AFAIK, the only open source application server that does recovery properly
>> is the CVS head version of JOnAS.
>
> I have some cactus test cases for an XML database that has an XA driver. I'm
> not feeling too motivated to port them to PostgreSQL.

Can you send them? I'd like to take a look, even if we can't use them
directly. Which XML database is that?

>> Also, if we violate some parts of the specs (like the transaction
>> interleaving part), it's important to know exactly what the limitations are
>> and why. I started to write down the exact preconditions for each method in
>> the javadoc comments, and also separate which preconditions come from the
>> specs and which are just implementation-specific limitations.
>
> I think the interleaving business is a non-issue.  I can't think of a real
> world case where a transaction manager would do this.  Can you?

Using interleaving, the application server could get away with a smaller
connection pool. It could recycle the connections right after the end call,
without waiting for the prepare/commit cycle.

I don't know if any application server does that in practice. If it turns
out to be a problem, we might get away with some clever locking. We
could make the second start call block and wait for the previous
transaction to finish.

> Besides, like I said, I doubt any other SQL database supports this.  I know
> Berkeley DB does, but Berkeley DB lets you associate any database call with
> any transaction, so it's easy.
>
> JTA was written with more than just SQL databases in mind, and I don't think
> we need to bend over backwards to implement some corner functionality for a
> resource which, by its design, doesn't support it.

I agree, if the application servers work without it.

- Heikki

Re: jdbc xa support

From

Michael Allman

Date:

22 July 2005, 16:15:44

On Fri, 22 Jul 2005, Heikki Linnakangas wrote:

> On Thu, 21 Jul 2005, Michael Allman wrote:
>
>> On Thu, 21 Jul 2005, Heikki Linnakangas wrote:
>>
>>> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You
>>> can only use prepared statements for normal SELECT/UPDATE/DELETE commands.
>>
>> Doesn't the driver support client side prepared statements?
>
> No, they're server side. I tried that too at first, but it didn't work.

I will make corrections.

>>> 3. How are you planning to handle transaction interleaving discussed in
>>> the thread Dave mentioned?
>>
>> I'm not.  PostgreSQL does not support this behavior, and I see no need to
>> pretend it does.  I think the appropriate thing to do is throw an exception
>> when the second start is called.
>
> I agree. However, I'd like to try it with the popular TMs to make sure they
> work without it.
>
>> I have serious doubts that any SQL database in the world supports this
>> behavior correctly.  If you know of one that does, I'd like to see its
>> magic.
>
> I tested it on some SQL databases, and at least Oracle seems to support it.
> DB2 fakes it by preparing early. Derby seems to support it, but it only
> supports XA in embedded mode.

If Oracle supports it, it's likely because they have some server-side
stored procedures that do something magical.  I don't know.  I'm not an
SQL expert, but I don't think SQL by itself supports the association of
discrete DML statements with arbitrary transactions.

You might want to check out SimpleJTA:

http://www.simplejta.org/

They have some XA driver notes.  Among them the following nugget:

<quote>
JTA specifications allow an XAResource object to be shared amongst multiple
concurrent transactions with the restriction that the resource can be
enlisted with a single transaction at a point in time. Resource sharing
amongst multiple transactions appears to cause a problem in Oracle in a
multi-threaded environment. Therefore, SimpleJTA is configured to defer
the reuse of an XAResource object by other transactions until the existing
transaction is completed, i.e., either committed or rolled back.
</quote>

>>> 4. recover is broken because it ignores the flags argument. That's going
>>> to cause an endless loop in the transaction manager when it tries to
>>> recover. See this discussion:
>>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>>
>> That is problematic.  The API for recovery is stateful, and, IMHO, poorly
>> designed.  If you look at the original DTP XA spec you'll see it makes much
>> more sense.
>
> I agree that it sucks.
>
>> I don't know what to do about this yet.
>
> The simplest implementation is one that returns all the recovered xids if
> flags include TMSTARTRSCAN, and an empty array in all other cases. That way,
> the internal implementation doesn't have to be stateful even though the API is.

I posted a new version last night that does this.  I think it works.

>>> 6. isSameRM considers two connections to the same database as different
>>> RMs. I'm not sure what the implications of this are, but I feel that's not
>>> right. I have the same issue in my implementation as well...
>>
>> They're different RM's because you can't join a transaction across two
>> physical JDBC Connections.  Each XAResource instance is associated with
>> exactly one physical connection instance.
>
> I don't think that's the correct definition of an RM. See section 2.2.4 of
> the XA specification. I think the Postgres database or cluster is one RM. But
> as I said, I don't know what implications your implementation has. It might
> work just fine, or not.

It's up to the implementor to define the scope of an "RM" and what
isSameRM() means --- hence the interface method.

The TM uses this method when it has another XAResource to enlist in the
transaction and wants to know if it should start another branch for it
(with start(newBranchXid, TMNOFLAGS)) or can join an existing transaction
branch (with start(existingBranchXid, TMJOIN)).
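
As a rough sketch of that decision on the TM side, under the assumption that
the TM keeps a list of the resources already enlisted in the transaction:
EnlistSketch, Rm, IdentityRm and startFlagFor are hypothetical names, and Rm
stands in for the real XAResource interface, whose isSameRM() also declares
XAException.

```java
import java.util.List;
import javax.transaction.xa.XAResource;

final class EnlistSketch {
    /** Hypothetical one-method stand-in for XAResource. */
    interface Rm {
        boolean isSameRM(Rm other);
    }

    /** An RM that only considers itself the same RM, i.e. this == other. */
    static final class IdentityRm implements Rm {
        @Override
        public boolean isSameRM(Rm other) {
            return this == other;
        }
    }

    /**
     * Picks the flag a TM would pass to start() for a newly enlisted resource:
     * TMJOIN if an already enlisted resource reports the same RM, otherwise
     * TMNOFLAGS so that a new branch is started.
     */
    static int startFlagFor(Rm candidate, List<Rm> enlisted) {
        for (Rm r : enlisted) {
            if (r.isSameRM(candidate)) {
                return XAResource.TMJOIN;
            }
        }
        return XAResource.TMNOFLAGS;
    }
}
```

With an identity-based isSameRM(), two distinct resources for the same
database always end up on separate branches.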

The DTP XA spec says a single RM *may* service multiple independent
resource domains.  There are RM's that work like this, e.g. Berkeley DB
where transactions are represented as first-class Objects which can be
passed around within the same environment.  However, PostgreSQL does not
support this behavior.  Again, you can't join a transaction across
physical database connections.

One possible alternative we might explore is allowing an XAResource
instance, say xaRes1, for the same database as another XAResource
instance, say xaRes2, to adopt the same physical connection instance as
xaRes2.  So xaRes2.isSameRM(xaRes1) would return true if the underlying
physical connections pointed to the same PostgreSQL database (with the
same user credentials).  Then if a TM tried to join xaRes2 to xaRes1's
transaction branch, we could implement xaRes1.start(existingBranchXid,
TMJOIN) to assign xaRes1.physicalConnection = xaRes2.physicalConnection.
Then they would share the same transaction branch and context.  How about
that?

>> In light of the implementation, I could probably just define isSameRM() as
>> return this == otherXAResource . . .
>
> Yep.
>
>>> The XA and JTA specifications are quite complicated. I'd like to see a
>>> good set of test cases that exercise all possible scenarios and also
>>> error conditions. We're also going to need testers with access to the
>>> popular application servers so that we know our implementation works with
>>> them. AFAIK, the only open source application server that does recovery
>>> properly is the CVS head version of JOnAS.
>>
>> I have some cactus test cases for an XML database that has an XA driver.
>> I'm not feeling too motivated to port them to PostgreSQL.
>
> Can you send them? I'd like to take a look, even if we can't use them
> directly. Which XML database is that?

It's for Berkeley DB XML.  I think they're in CVS:

http://berkeley-dbxml-adapter.dev.java.net/

However, these are high-level tests at the user API level.  Perhaps we
should write tests to XAResource directly?  I don't even think they'd need
to be cactus tests.

>>> Also, if we violate some parts of the specs (like the transaction
>>> interleaving part), it's important to know exactly what the limitations
>>> are and why. I started to write down the exact preconditions for each
>>> method in the javadoc comments, and also separate which preconditions
>>> come from the specs and which are just implementation-specific
>>> limitations.
>>
>> I think the interleaving business is a non-issue.  I can't think of a real
>> world case where a transaction manager would do this.  Can you?
>
> Using interleaving, the application server could get away with a smaller
> connection pool. It could recycle the connections right after the end call,
> without waiting for the prepare/commit cycle.

But then leave the first transaction hanging?  My gut tells me you should
resolve, either rollback or commit, a transaction as soon as you can.
Prepared transactions hold onto their locks, right?  Leaving them
unresolved would lead to poorer concurrency, not better.

I don't really understand the rationale behind this "interleaving" idea.

> I don't know if any application server does that in practice. If it turns out
> to be a problem, we might get away with some clever locking. We could make the
> second start call block and wait for the previous transaction to finish.
>
>> Besides, like I said, I doubt any other SQL database supports this.  I know
>> Berkeley DB does, but Berkeley DB lets you associate any database call with
>> any transaction, so it's easy.
>>
>> JTA was written with more than just SQL databases in mind, and I don't
>> think we need to bend over backwards to implement some corner functionality
>> for a resource which, by its design, doesn't support it.
>
> I agree, if the application servers work without it.
>
> - Heikki
>

Re: jdbc xa support

From

Michael Allman

Date:

22 July 2005, 20:07:35

I've uploaded a new version of my patch to

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722-2.jar

This version includes some bug fixes and a small number of (working) unit
tests.

It occurred to me recently that start(xid, TMJOIN) is broken.  Given that
this implementation doesn't support transaction branch joining, it should
probably just throw an XAException.  Of course, since isSameRM() returns
true only for identical PGXAResource instances a TM should never call
start(xid, TMJOIN).  Makes sense?
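
Roughly, the start() checks being discussed might look like the sketch
below. It is not the code in the jar: SingleBranchStartSketch and
branchFinished() are hypothetical names, and the choice of XAER_RMERR for an
unsupported join and XAER_PROTO for a second start on a busy connection is an
assumption, not something settled in this thread.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class SingleBranchStartSketch {
    private Xid current; // branch currently tied to the physical connection, or null

    void start(Xid xid, int flags) throws XAException {
        if ((flags & XAResource.TMJOIN) != 0) {
            // Joining an existing branch is not supported (error code is an assumption).
            throw new XAException(XAException.XAER_RMERR);
        }
        if (current != null) {
            // A previous branch has not been prepared, committed or rolled back yet,
            // so this would be interleaving (error code is an assumption).
            throw new XAException(XAException.XAER_PROTO);
        }
        current = xid;
    }

    /** Hypothetical hook, to be called once prepare, commit or rollback has completed. */
    void branchFinished() {
        current = null;
    }
}
```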

Michael

Re: jdbc xa support

From

Oliver Jowett

Date:

23 July 2005, 04:09:16

Michael Allman wrote:

> Since the code returns the complete list,
> the "cursor" is always at the "end" of the "list" of prepared xids.

Correct.

> The code simply starts at the beginning for each call to recover().

This however isn't true -- the cursor should only be reset if
TMSTARTRSCAN is specified, not on every call.

> I don't see anything in the JTA spec that says the TMNOFLAGS int means
> anything other than that no other flag was passed to recover().  In the
> DTP spec it says something about returning xids starting at the current
> cursor position.

See the description of recover() in the JTA spec:

>> The flag parameter indicates where the recover scan should start or end,
>> or start and end. This method may be invoked one or more times during a
>> recovery scan. The resource manager maintains a cursor which marks the
>> current position of the prepared or heuristically completed transaction
>> list. Each invocation of the recover method moves the cursor past the
>> set of Xids that are returned.
[...]
>> TMSTARTRSCAN - indicates that the recovery scan should be started at the beginning of the
>> prepared or heuristically completed transaction list.

So if TMSTARTRSCAN is specified, you move the cursor to the start of the
list.

Then (regardless of if TMSTARTRSCAN was specified) you generate an array
of Xids to return starting from the current cursor position, and move
the cursor forward past those Xids.

In your case, if you return the whole list when TMSTARTRSCAN is
specified, then that implies you should return an empty list when it's
not specified.

> I added the lines
>
>     if (flag != TMSTARTRSCAN) {
>         return new Xid[0];
>     }
>
> to the top of the recover() method and posted a new version at
>
> http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722.jar

Not quite -- it's a flag not an enumerated value -- the TM can specify
TMENDRSCAN|TMSTARTRSCAN to restart a scan currently in progress.

'if ((flag & TMSTARTRSCAN) == 0)' should work.
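
Put together, the stateless variant with the bitmask test could look
something like this sketch. StatelessRecoverSketch and its prepared field
are hypothetical; the array stands in for whatever query the driver runs to
list prepared transactions, and the real recover() also declares XAException.

```java
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class StatelessRecoverSketch {
    private final Xid[] prepared; // hypothetical snapshot of the prepared transactions

    StatelessRecoverSketch(Xid[] prepared) {
        this.prepared = prepared.clone();
    }

    Xid[] recover(int flag) {
        // Test the TMSTARTRSCAN bit rather than comparing for equality, so that
        // TMSTARTRSCAN | TMENDRSCAN is also treated as the start of a scan.
        if ((flag & XAResource.TMSTARTRSCAN) == 0) {
            return new Xid[0]; // everything was already returned when the scan started
        }
        return prepared.clone();
    }
}
```

With an equality test, a call with TMSTARTRSCAN | TMENDRSCAN would fall into
the empty-array branch and the TM would see no Xids at all.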

-O

Re: jdbc xa support

From

Oliver Jowett

Date:

23 July 2005, 04:13:07

Heikki Linnakangas wrote:
> On Thu, 21 Jul 2005, Michael Allman wrote:
>
>> On Thu, 21 Jul 2005, Heikki Linnakangas wrote:
>>
>>> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work.
>>> You can only use prepared statements for normal SELECT/UPDATE/DELETE
>>> commands.
>>
>> Doesn't the driver support client side prepared statements?
>
> No, they're server side. I tried that too at first, but it didn't work.

Yeah, it boils down to: you can only put a ? placeholder where there's a
PARAM terminal in the server's SQL grammar, as the driver translates the
placeholders to '$n' strings on Parse and uses Bind to pass the actual
values on each execution. COMMIT PREPARED etc take a Sconst, not a
PARAM, for their argument.
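
Since a placeholder is not an option here, one way around it is to build the
statement text with the identifier inlined as a string literal. The sketch
below assumes a made-up encoding of the Xid (format id plus Base64 of the two
byte arrays); PrepareSqlSketch, gid and prepareSql are hypothetical names and
this is not necessarily what any posted jar does.

```java
import java.util.Base64;

final class PrepareSqlSketch {
    /**
     * Hypothetical text encoding of an Xid. The Base64 alphabet contains
     * neither single quotes nor backslashes, so the result is safe to inline.
     */
    static String gid(int formatId, byte[] gtrid, byte[] bqual) {
        Base64.Encoder enc = Base64.getEncoder();
        return formatId + "_" + enc.encodeToString(gtrid) + "_" + enc.encodeToString(bqual);
    }

    /** Builds the statement with the gid as a literal instead of a ? placeholder. */
    static String prepareSql(String gid) {
        // Doubling single quotes is a defensive step for gids from other sources.
        return "PREPARE TRANSACTION '" + gid.replace("'", "''") + "'";
    }
}
```

The resulting string would then be run through a plain Statement rather than
a PreparedStatement with parameters.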

-O

Re: jdbc xa support

From

Heikki Linnakangas

Date:

23 July 2005, 04:56:05

On Fri, 22 Jul 2005, Michael Allman wrote:

> On Fri, 22 Jul 2005, Heikki Linnakangas wrote:
>
>> On Thu, 21 Jul 2005, Michael Allman wrote:
>>
>>> I have serious doubts that any SQL database in the world supports this
>>> behavior correctly.  If you know of one that does, I'd like to see its
>>> magic.
>>
>> I tested it on some SQL databases, and at least Oracle seems to support it.
>> DB2 fakes it by preparing early. Derby seems to support it, but it only
>> supports XA in embedded mode.
>
> If Oracle supports it, it's likely because they have some server-side stored
> procedures that do something magical.  I don't know.  I'm not an SQL expert,
> but I don't think SQL by itself supports the association of discrete DML
> statements with arbitrary transactions.

SQL spec doesn't say anything about two-phase commit or XA, so no, SQL
itself doesn't support any of that.

Just looked at MySQL/InnoDB, they have these commands to deal with XA:

XA BEGIN <xid> [JOIN | RESUME]
XA START TRANSACTION <xid> [JOIN | RESUME]
XA COMMIT <xid> [ONE PHASE]
XA END <xid> [SUSPEND [FOR MIGRATE]]
XA PREPARE <xid>
XA RECOVER
XA ROLLBACK <xid>

They have all the support in the backend, so their driver
implementation is trivial. (Of course, since it's MySQL, I wouldn't bet
that they actually work the way they should, but anyway :))

> You might want to check out SimpleJTA:
>
> http://www.simplejta.org/
>
> They have some XA driver notes.  Among them the following nugget:
>
> <quote>
> JTA specifications allow an XAResource object to shared amongst multiple
> concurrent transactions with the restriction that the resource can be
> enlisted with a single transaction at a point in time. Resource sharing
> amonst multiple transactions appears to cause a problem in Oracle in a
> multi-threaded environment. Therefore, SimpleJTA is configured to defer the
> reuse of an XAResource object by other transactions until the existing
> transaction is completed, i.e., either committed or rolled back.
> </quote>

I wouldn't be surprised if all the other TMs did the same. We'll have to
test it.

>> I agree that it sucks.
>>
>>> I don't know what to do about this yet.
>>
>> The simplest implementation is one that returns all the recovered xids if
>> flags include TMSTARTRSCAN, and an empty array in all other cases. That
>> way, the internal implementation doesn't have to be stateful even though the
>> API is.
>
> I posted a new version last night that does this.  I think it works.

It's legal to give TMSTARTRSCAN | TMENDRSCAN as flags. Otherwise,
looks good to me.

>>>> 6. isSameRM considers two connections to the same database as different
>>>> RMs. I'm not sure what the implications of this are, but I feel that's
>>>> not right. I have the same issue in my implementation as well...
>>>
>>> They're different RM's because you can't join a transaction across two
>>> physical JDBC Connections.  Each XAResource instance is associated with
>>> exactly one physical connection instance.
>>
>> I don't think that's the correct definition of an RM. See section 2.2.4 of
>> the XA specification. I think the Postgres database or cluster is one RM.
>> But as I said, I don't know what implications your implementation has. It
>> might work just fine, or not.
>
> It's up to the implementor to define the scope of an "RM" and what isSameRM()
> means --- hence the interface method.
>
> The TM uses this method when it has another XAResource to enlist in the
> transaction and wants to know if it should start another branch for it (with
> start(newBranchXid, TMNOFLAGS)) or can join an existing transaction branch
> (with start(existingBranchXid, TMJOIN)).
>
> The DTP XA spec says a single RM *may* service multiple independent resource
> domains.  There are RM's that work like this, e.g. Berkeley DB where
> transactions are represented as first-class Objects which can be passed
> around within the same environment.  However, PostgreSQL does not support
> this behavior.  Again, you can't join a transaction across physical database
> connections.

What's a resource domain? The way I understand it, a resource domain might
be a Postgres database or cluster.

> One possible alternative we might explore is allowing an XAResource instance,
> say xaRes1, for the same database as another XAResource instance, say xaRes2,
> to adopt the same physical connection instance as xaRes2.  So
> xaRes2.isSameRM(xaRes1) would return true if the underlying physical
> connections pointed to the same PostgreSQL database (with the same user
> credentials).  Then if a TM tried to join xaRes2 to xaRes1's transaction
> branch, we could implement xaRes1.start(existingBranchXid, TMJOIN) to assign
> xaRes1.physicalConnection = xaRes2.physicalConnection. Then they would share
> the same transaction branch and context.  How about that?

That sounds right. We'll need a global map of xids and
physicalConnections.

But let's see how far we can get with the simpler method, that is, just
define isSameRM as "return this == other", and not implement TMJOIN at
all.
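
As a rough sketch of that simpler method: IdentityRmSketch is a hypothetical stand-in that does not implement the full XAResource interface, and rejecting TMJOIN with XAER_INVAL is an assumption about which error code fits best, not something settled in this thread.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical stand-in for the driver's XAResource implementation.
final class IdentityRmSketch {
    // Two resources are the same RM only if they are the same object,
    // so a TM never has a reason to ask this resource to join a branch.
    boolean isSameRM(Object other) {
        return this == other;
    }

    // Flag check a start() method could run first. The XAER_INVAL error
    // code is an assumption.
    void checkStartFlags(int flags) throws XAException {
        if ((flags & XAResource.TMJOIN) != 0) {
            throw new XAException(XAException.XAER_INVAL);
        }
    }
}
```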

- Heikki

Re: jdbc xa support

From

Michael Allman

Date:

23 July 2005, 13:25:33

>> I added the lines
>>
>>     if (flag != TMSTARTRSCAN) {
>>         return new Xid[0];
>>     }
>>
>> to the top of the recover() method and posted a new version at
>>
>> http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722.jar
>
> Not quite -- it's a flag not an enumerated value -- the TM can specify
> TMENDRSCAN|TMSTARTRSCAN to restart a scan currently in progress.
>
> 'if ((flag & TMSTARTRSCAN) == 0)' should work.
>

I'll make a correction in the next patch.  Thanks.

Michael

Re: jdbc xa support

From

Michael Allman

Date:

23 July 2005, 13:46:14

On Sat, 23 Jul 2005, Heikki Linnakangas wrote:

> On Fri, 22 Jul 2005, Michael Allman wrote:
>
>> On Fri, 22 Jul 2005, Heikki Linnakangas wrote:
>>
>>> On Thu, 21 Jul 2005, Michael Allman wrote:
>>>
>>>> I have serious doubts that any SQL database in the world supports this
>>>> behavior correctly.  If you know of one that does, I'd like to see its
>>>> magic.
>>>
>>> I tested it on some SQL databases, and at least Oracle seems to support
>>> it. DB2 fakes it by preparing early. Derby seems to support it, but it
>>> only supports XA in embedded mode.
>>
>> If Oracle supports it, it's likely because they have some server-side
>> stored procedures that do something magical.  I don't know.  I'm not an SQL
>> expert, but I don't think SQL by itself supports the association of
>> discrete DML statements with arbitrary transactions.
>
> SQL spec doesn't say anything about two-phase commit or XA, so no, SQL itself
> doesn't support any of that.
>
> Just looked at MySQL/InnoDB, they have these commands to deal with XA:
>
> XA BEGIN <xid> [JOIN | RESUME]
> XA START TRANSACTION <xid> [JOIN | RESUME]
> XA COMMIT <xid> [ONE PHASE]
> XA END <xid> [SUSPEND [FOR MIGRATE]]
> XA PREPARE <xid>
> XA RECOVER
> XA ROLLBACK <xid>
>
> They have all the support in the backend, so their driver implementation is
> trivial. (Of course, since it's MySQL, I wouldn't bet that they actually work
> the way they should, but anyway :))

What version of MySQL is this?  I cannot find documentation for these
commands.

>> You might want to check out SimpleJTA:
>>
>> http://www.simplejta.org/
>>
>> They have some XA driver notes.  Among them the following nugget:
>>
>> <quote>
>> JTA specifications allow an XAResource object to shared amongst multiple
>> concurrent transactions with the restriction that the resource can be
>> enlisted with a single transaction at a point in time. Resource sharing
>> amonst multiple transactions appears to cause a problem in Oracle in a
>> multi-threaded environment. Therefore, SimpleJTA is configured to defer the
>> reuse of an XAResource object by other transactions until the existing
>> transaction is completed, i.e., either committed or rolled back.
>> </quote>
>
> I wouldn't be surprised if all the other TMs did the same. We'll have to test
> it.
>
>>> I agree that it sucks.
>>>
>>>> I don't know what to do about this yet.
>>>
>>> The simplest implementation is one that returns all the recovered xids if
>>> flags include TMSTARTRSCAN, and an empty array in all other cases. That
>>> way, the internal implementation doesn't have to be stateful even though the
>>> API is.
>>
>> I posted a new version last night that does this.  I think it works.
>
> It's legal to give TMSTARTRSCAN | TMENDRSCAN as flags. Otherwise, looks good
> to me.

Oliver pointed out the same thing.  I made a correction and will post it
later on.

>>>>> 6. isSameRM considers two connections to the same database as different
>>>>> RMs. I'm not sure what the implications of this are, but I feel that's
>>>>> not right. I have the same issue in my implementation as well...
>>>>
>>>> They're different RM's because you can't join a transaction across two
>>>> physical JDBC Connections.  Each XAResource instance is associated with
>>>> exactly one physical connection instance.
>>>
>>> I don't think that's the correct definition of an RM. See section 2.2.4 of
>>> the XA specification. I think the Postgres database or cluster is one RM.
>>> But as I said, I don't know what implications your implementation has. It
>>> might work just fine, or not.
>>
>> It's up to the implementor to define the scope of an "RM" and what
>> isSameRM() means --- hence the interface method.
>>
>> The TM uses this method when it has another XAResource to enlist in the
>> transaction and wants to know if it should start another branch for it
>> (with start(newBranchXid, TMNOFLAGS)) or can join an existing transaction
>> branch (with start(existingBranchXid, TMJOIN)).
>>
>> The DTP XA spec says a single RM *may* service multiple independent
>> resource domains.  There are RM's that work like this, e.g. Berkeley DB
>> where transactions are represented as first-class Objects which can be
>> passed around within the same environment.  However, PostgreSQL does not
>> support this behavior.  Again, you can't join a transaction across physical
>> database connections.
>
> What's a resource domain? The way I understand it, a resource domain might be
> a Postgres database or cluster.

Maybe I made that up.  I don't know.  I'm 99% sure the current
implementation satisfies the spec.

>> One possible alternative we might explore is allowing an XAResource
>> instance, say xaRes1, for the same database as another XAResource instance,
>> say xaRes2, to adopt the same physical connection instance as xaRes2.  So
>> xaRes2.isSameRM(xaRes1) would return true if the underlying physical
>> connections pointed to the same PostgreSQL database (with the same user
>> credentials).  Then if a TM tried to join xaRes2 to xaRes1's transaction
>> branch, we could implement xaRes1.start(existingBranchXid, TMJOIN) to
>> assign xaRes1.physicalConnection = xaRes2.physicalConnection. Then they
>> would share the same transaction branch and context.  How about that?
>
> That sounds right. We'll need a global map of xids and physicalConnections.

It's going to be more complicated than I first thought, maybe even
impossible.  We would need to do the switcharoo on the client's connection
handle, too.
Otherwise, they'd be sending updates on one physical connection and
committing the transaction on the other physical connection.

> But let's see how far we can get with the simpler method, that is, just
> define isSameRM as "return this == other", and not implement TMJOIN at all.

I'm still not sure we should change the implementation of isSameRM().
It's a little more defensive right now.  To do "return this == other"
would make a (currently valid) assumption about how PGXAResource is used.

Likewise, I think the implementation of start() with the TMJOIN flag is
correct, though with the current implementation it should never be called.

On a related note, can we get the backend to indicate if a transaction was
read-only when the "PREPARE TRANSACTION" statement is called?  (Also, I
believe preparing a read-only transaction should automatically "commit"
it.)  We could then return XA_RDONLY from the driver's prepare() method in
that case.  As it is, it either throws an exception or returns XA_OK.

Michael

Re: jdbc xa support

From

Michael Allman

Date:

23 July 2005, 14:06:16

On Sat, 23 Jul 2005, Oliver Jowett wrote:

> Not quite -- it's a flag not an enumerated value -- the TM can specify
> TMENDRSCAN|TMSTARTRSCAN to restart a scan currently in progress.
>
> 'if ((flag & TMSTARTRSCAN) == 0)' should work.

I just realized how ironic it is that recover(), the one method on
XAResource whose flag argument may be an |'d combination, names that
argument with the singular "flag", while end() and start(), which may
take only a single flag, use the plural "flags".  Ha ha ha!!!

Michael

Re: jdbc xa support

From

Heikki Linnakangas

Date:

23 July 2005, 18:19:04

On Sat, 23 Jul 2005, Michael Allman wrote:

> On Sat, 23 Jul 2005, Heikki Linnakangas wrote:
>>
>> Just looked at MySQL/InnoDB, they have these commands to deal with XA:
>>
>> XA BEGIN <xid> [JOIN | RESUME]
>> XA START TRANSACTION <xid> [JOIN | RESUME]
>> XA COMMIT <xid> [ONE PHASE]
>> XA END <xid> [SUSPEND [FOR MIGRATE]]
>> XA PREPARE <xid>
>> XA RECOVER
>> XA ROLLBACK <xid>
>>
>> They have all the support in the backend, so their driver implementation is
>> trivial. (Of course, since it's MySQL, I wouldn't bet that they actually
>> work the way they should, but anyway :))
>
> What version of MySQL is this?  I cannot find documentation for these
> commands.

I downloaded a nightly build of the jdbc driver. The filename seems to be
"mysql-connection-java-3.2-nightly-20050718.tar.gz". Maybe those commands
are so new that they are not documented? Or they are old and undocumented
anyway..

> On a related note, can we get the backend to indicate if a transaction was
> read-only when the "PREPARE TRANSACTION" statement is called?  (Also, I
> believe preparing a read-only transaction should automatically "commit" it.)
> We could then return XA_RDONLY from the driver's prepare() method in that
> case.  As it is, it either throws an exception or returns XA_OK.

It should be possible, but we're already past the feature freeze so it'll
have to wait for version 8.2. The backend already keeps track of which
transactions are read-only, and skips updating the clog for them. That
information just needs to be exposed.

- Heikki

Re: jdbc xa support

From

Oliver Jowett

Date:

23 July 2005, 20:28:29

Heikki Linnakangas wrote:

>> On a related note, can we get the backend to indicate if a transaction
>> was read-only when the "PREPARE TRANSACTION" statement is called?
>> (Also, I believe preparing a read-only transaction should
>> automatically "commit" it.) We could then return XA_RDONLY from the
>> driver's prepare() method in that case.  As it is, it either throws an
>> exception or returns XA_OK.
>
>
> It should be possible, but we're already past the feature freeze so
> it'll have to wait for version 8.2. The backend already keeps track
> of which transactions are read-only, and skips updating the clog for them.
> That information just needs to be exposed.

If the client uses Connection.setReadOnly() presumably we can track that
in the driver, which will catch some of the cases at least.
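
A rough sketch of what that driver-side tracking could look like. ReadOnlyVoteSketch and its methods are hypothetical and stand in for state the driver's connection and XAResource would share. It only votes read-only if the connection was read-only for the whole branch. One thing to keep in mind: after an XA_RDONLY vote the TM does not call commit() or rollback() for that branch, so the driver would have to end the backend transaction itself instead of issuing PREPARE TRANSACTION.

```java
import javax.transaction.xa.XAResource;

// Hypothetical helper, not the driver's actual code.
final class ReadOnlyVoteSketch {
    private boolean readOnly;            // mirrors Connection.setReadOnly()
    private boolean readOnlyForBranch;   // read-only since start() was called

    void setReadOnly(boolean value) {
        readOnly = value;
        if (!value) {
            // Conservative: once writes are allowed, stop voting read-only.
            readOnlyForBranch = false;
        }
    }

    // Called when the transaction branch starts.
    void start() {
        readOnlyForBranch = readOnly;
    }

    // Vote the driver's prepare() could return.
    int prepareVote() {
        return readOnlyForBranch ? XAResource.XA_RDONLY : XAResource.XA_OK;
    }
}
```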

-O

Re: jdbc xa support

From

Michael Allman

Date:

24 July 2005, 01:28:15

On Sun, 24 Jul 2005, Heikki Linnakangas wrote:

> On Sat, 23 Jul 2005, Michael Allman wrote:
>
>> On a related note, can we get the backend to indicate if a transaction was
>> read-only when the "PREPARE TRANSACTION" statement is called?  (Also, I
>> believe preparing a read-only transaction should automatically "commit"
>> it.) We could then return XA_RDONLY from the driver's prepare() method in
>> that case.  As it is, it either throws an exception or returns XA_OK.
>
> It should be possible, but we're already past the feature freeze so it'll
> have to wait for version 8.2. The backend already keeps track of which
> transactions are read-only, and skips updating the clog for them. That
> information just needs to be exposed.

Cool.  It would be a nice optimization to incorporate in the driver when
it becomes available in the backend.

Michael