Thread: jdbc xa support
Hi, I'm implementing XA support in the postgres JDBC driver to complement the backend two phase commit support in CVS. Is anyone else working on this? Michael
Not sure, but look at the archives; there was some discussion about various mechanisms.

Dave

On 19-Jul-05, at 7:34 PM, Michael Allman wrote:
> Hi,
>
> I'm implementing XA support in the postgres JDBC driver to complement the backend two phase commit support in CVS. Is anyone else working on this?
>
> Michael
I'm working on it. Glad to hear you're interested. I don't have much time to work on it though, so I would be very happy if you took over.

I'll send you a copy of my workbench off-list so you can take a look. It's work in progress, but I hope it helps.

I believe this is the discussion Dave mentioned:

http://archives.postgresql.org/pgsql-jdbc/2005-06/msg00165.php

- Heikki

On Tue, 19 Jul 2005, Dave Cramer wrote:
> Not sure, but look at the archives, there was some discussion about various mechanisms.
>
> Dave
Here's my first cut:

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050720.jar

At this point I know the documentation is sparse. I'll try to improve that situation soon. Until then, I wanted to give everyone the opportunity to take a first look at the code and the approach.

I also have some questions, some of which are embedded in the .java files as comments. If anyone has answers, please pass them along.

I'll let you chew on this and check back tomorrow.

Cheers,

Michael

On Wed, 20 Jul 2005, Heikki Linnakangas wrote:
> I'm working on it. Glad to hear you're interested. I don't have much time to work on it though, so I would be very happy if you took over.
>
> I'll send you a copy of my workbench off-list so you can take a look. It's work in progress, but I hope it helps.
>
> I believe this is the discussion Dave mentioned:
>
> http://archives.postgresql.org/pgsql-jdbc/2005-06/msg00165.php
>
> - Heikki
Thanks.

So far so good. You still have all the same tough issues ahead as I do, though.

I realize it's work in progress, but here are some comments in no particular order:

1. To answer the question in the source code: PostgreSQL doesn't support transaction timeouts.

2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You can only use prepared statements for normal SELECT/UPDATE/DELETE commands.

3. How are you planning to handle transaction interleaving discussed in the thread Dave mentioned?

4. recover is broken because it ignores the flags argument. That's going to cause an endless loop in the transaction manager when it tries to recover. See this discussion: http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566

5. commit and rollback check that the transaction is found in XID_TO_TRANSACTION_STATE_MAP. However, after a crash/recover cycle, the map will be empty.

6. isSameRM considers two connections to the same database as different RMs. I'm not sure what the implications of this are, but I feel that's not right. I have the same issue in my implementation as well...

Also take a look at a list of comments on my code by Mike Bonnet. Some of them probably apply to your code as well. Those comments are about the version that's on my pg webpage:

http://users.tkk.fi/~hlinnaka/pgsql/

The XA and JTA specifications are quite complicated. I'd like to see a good set of test cases that exercise all possible scenarios and also error conditions. We're also going to need testers with access to the popular application servers so that we know our implementation works with them. AFAIK, the only open source application server that does recovery properly is the CVS head version of JOnAS.

Also, if we violate some parts of the specs (like the transaction interleaving part), it's important to know exactly what the limitations are and why. I started to write down the exact preconditions for each method in the javadoc comments, and also to separate which preconditions come from the specs and which are just implementation-specific limitations.

On Wed, 20 Jul 2005, Michael Allman wrote:
> Here's my first cut:
>
> http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050720.jar
>
> At this point I know the documentation is sparse. I'll try to improve that situation soon. Until then, I wanted to give everyone the opportunity to take a first look at the code and the approach.
>
> I also have some questions, some of which are embedded in the .java files as comments. If anyone has answers, please pass them along.
>
> I'll let you chew on this and check back tomorrow.
>
> Cheers,
>
> Michael

- Heikki
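Editor's sketch for point 2 in the review above: since the command cannot carry a bind parameter, one option is to build the full command text on the client with the transaction id embedded as a quoted literal. This is only a minimal illustration under stated assumptions. The class and method names are hypothetical, the gid is assumed to be a string already derived from the Xid elsewhere, and backslashes are rejected outright rather than escaped, because how the server treats them inside a literal depends on its configuration.

```java
/** Hypothetical helper; the class and method names are illustrative only. */
final class PrepareTransactionSql {
    private PrepareTransactionSql() {}

    /** Quotes gid as an SQL string literal by doubling embedded single quotes. */
    static String quoteLiteral(String gid) {
        // Assumption: refuse backslashes and NULs instead of guessing how the
        // server is configured to treat escapes inside string literals.
        if (gid.indexOf('\\') >= 0 || gid.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("unsupported character in gid: " + gid);
        }
        return "'" + gid.replace("'", "''") + "'";
    }

    /** Builds the complete command text; there is no placeholder left to bind. */
    static String prepareTransaction(String gid) {
        return "PREPARE TRANSACTION " + quoteLiteral(gid);
    }

    public static void main(String[] args) {
        System.out.println(prepareTransaction("42_abc'def"));
    }
}
```

The resulting string would be sent through a plain Statement rather than a PreparedStatement, so nothing depends on whether the driver substitutes parameters on the client or on the server.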
Heikki,

Thanks for your feedback. I am working on improvements and documentation. The rest of my response is below.

On Thu, 21 Jul 2005, Heikki Linnakangas wrote:

> Thanks.
>
> So far so good. You still have all the same tough issues ahead as I do, though.
>
> I realize it's work in progress, but here's some comments in no particular order:
>
> 1. To answer the question in the source code: PostgreSQL doesn't support transaction timeouts

I figured.

> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You can only use prepared statements for normal SELECT/UPDATE/DELETE commands.

Doesn't the driver support client side prepared statements?

> 3. How are you planning to handle transaction interleaving discussed in the thread Dave mentioned?

I'm not. PostgreSQL does not support this behavior, and I see no need to pretend it does. I think the appropriate thing to do is throw an exception when the second start is called.

I have serious doubts that any SQL database in the world supports this behavior correctly. If you know of one that does, I'd like to see its magic.

> 4. recover is broken because it ignores the flags argument. That's going to cause an endless loop in the transaction manager when it tries to recover. See this discussion:
> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566

That is problematic. The API for recovery is stateful, and, IMHO, poorly designed. If you look at the original DTP XA spec you'll see it makes much more sense. I don't know what to do about this yet.

> 5. commit and rollback check that the transaction is found in XID_TO_TRANSACTION_STATE_MAP. However, after a crash/recover cycle, the map will be empty.

I think I have a good solution to this. I'll post it with my next patchset, coming soon. I'll elaborate then.

> 6. isSameRM considers two connections to the same database as different RMs. I'm not sure what the implications of this are, but I feel that's not right. I have the same issue in my implementation as well...

They're different RMs because you can't join a transaction across two physical JDBC Connections. Each XAResource instance is associated with exactly one physical connection instance. In light of the implementation, I could probably just define isSameRM() as return this == otherXAResource . . .

> Also take a look at a list of comments on my code by Mike Bonnet. Some of them probably apply to your code as well. Those comments are about the version that's on my pg webpage:
>
> http://users.tkk.fi/~hlinnaka/pgsql/

I saw these. Nothing jumped out at me. The documentation is inadequate, and I'll work on that. The logging may be inadequate. I don't know about that. I've used this kind of XA implementation before and this kind of logging has been adequate for me to debug problems.

> The XA and JTA specifications are quite complicated. I'd like to see a good set of test cases that exercise all possible scenarios and also error conditions. We're also going to need testers with access to the popular application servers so that we know our implementation works with them. AFAIK, the only open source application server that does recovery properly is the CVS head version of JOnAS.

I have some cactus test cases for an XML database that has an XA driver. I'm not feeling too motivated to port them to PostgreSQL.

> Also, if we violate some parts of the specs (like the transaction interleaving part), it's important to know exactly what the limitations are and why. I started to write down the exact preconditions for each method in the javadoc comments, and also separate which preconditions come from the specs and which are just implementation-specific limitations.

I think the interleaving business is a non-issue. I can't think of a real world case where a transaction manager would do this. Can you? Besides, like I said, I doubt any other SQL database supports this. I know Berkeley DB does, but Berkeley DB lets you associate any database call with any transaction, so it's easy. JTA was written with more than just SQL databases in mind, and I don't think we need to bend over backwards to implement some corner functionality for a resource which, by its design, doesn't support it.

Thanks again. I'll post another revision which fixes recovery and addresses other issues soon.

Michael
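Editor's sketch of the behavior Michael proposes for point 3, throwing when a second start arrives while the connection is still tied to another branch. It is not the code from the posted jar. The class name, the release() hook, and the choice of XAER_PROTO and XAER_INVAL as error codes are assumptions made for illustration; only the javax.transaction.xa types and constants are real.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical per-connection guard; not the code from the posted jar. */
final class SingleBranchGuard {
    // Non-null from start() until the branch has been prepared or completed.
    private Xid associated;

    void start(Xid xid, int flags) throws XAException {
        if (flags != XAResource.TMNOFLAGS) {
            // Assumption: TMJOIN and TMRESUME are not supported in this sketch.
            throw new XAException(XAException.XAER_INVAL);
        }
        if (associated != null) {
            // A second start on a busy connection would require interleaving.
            throw new XAException(XAException.XAER_PROTO);
        }
        associated = xid;
    }

    /** Stand-in for the point where prepare, commit, or rollback frees the connection. */
    void release() {
        associated = null;
    }

    /** Tiny Xid used by the demo only. */
    static Xid xid(final int n) {
        return new Xid() {
            public int getFormatId() { return 1; }
            public byte[] getGlobalTransactionId() { return new byte[] {(byte) n}; }
            public byte[] getBranchQualifier() { return new byte[] {0}; }
        };
    }

    /** Returns the error code of the rejected second start, or 0 if it was accepted. */
    static int secondStartErrorCode() {
        SingleBranchGuard guard = new SingleBranchGuard();
        try {
            guard.start(xid(1), XAResource.TMNOFLAGS);
            guard.start(xid(2), XAResource.TMNOFLAGS);
            return 0;
        } catch (XAException e) {
            return e.errorCode;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondStartErrorCode() == XAException.XAER_PROTO);
    }
}
```

Whichever error code is chosen, Heikki's request still applies: the javadoc would need to state that this is an implementation-specific limitation rather than something the specs require.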
I've posted a newer revision here:

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050721.jar

I modified the implementation of recover to put the recovered xid's in the transaction state map so that they'll be there when the tm tries to commit or roll them back.

I renamed some variable and class names to take them a little closer to their actual meaning.

I added a small amount of code documentation. Much more to come.

I think I made some other minor adjustments which I've forgotten.

Michael
Michael Allman wrote:

>> 4. recover is broken because it ignores the flags argument. That's going to cause an endless loop in the transaction manager when it tries to recover. See this discussion:
>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566
>
> That is problematic. The API for recovery is stateful, and, IMHO, poorly designed. If you look at the original DTP XA spec you'll see it makes much more sense.

Huh? TMSTARTRSCAN etc are in the original DTP XA spec too, assuming you mean this one:

X/Open CAE Specification
Distributed Transaction Processing: The XA Specification
ISBN: 1 872630 24 3
X/Open Document Number: XO/CAE/91/300

-O
On Fri, 22 Jul 2005, Oliver Jowett wrote:

> Huh? TMSTARTRSCAN etc are in the original DTP XA spec too, assuming you mean this one:
>
> X/Open CAE Specification
> Distributed Transaction Processing: The XA Specification
> ISBN: 1 872630 24 3
> X/Open Document Number: XO/CAE/91/300

Yes, and if you look at the specification for the xa_recover() function 47 you see it's a lot different from its Java counterpart.

Michael
Michael Allman wrote:

> Yes, and if you look at the specification for the xa_recover() function 47 you see it's a lot different from its Java counterpart.

Actually, I don't see that at all; the Java API is pretty much a direct mapping of the C API. Quote from the C API:

===
So that all XIDs may be returned irrespective of the size of the array xids, one or more xa_recover () calls may be used within a single recovery scan. The flags parameter to xa_recover () defines when a recovery scan should start or end, or start and end. The start of a recovery scan moves a cursor to the start of a list of prepared and heuristically completed transactions. Throughout the recovery scan the cursor marks the current position in that list. Each call advances the cursor past the set of XIDs it returns.
[...]
Upon success, xa_recover () places zero or more XIDs in the space pointed to by xids. The function returns the number of XIDs it has placed there. If this value is less than count, there are no more XIDs to recover and the current scan ends.
[...]
Following are the valid settings of flags:

TMSTARTRSCAN
    This flag indicates that xa_recover () should start a recovery scan for the thread of control and position the cursor to the start of the list. XIDs are returned from that point. If a recovery scan is already open, the effect is as if the recovery scan were ended and then restarted.
TMENDRSCAN
    This flag indicates that xa_recover () should end the recovery scan after returning the XIDs. If this flag is used in conjunction with TMSTARTRSCAN, the single xa_recover () call starts and then ends a scan.
TMNOFLAGS
    This flag must be used when no other flags are set in flags. A recovery scan must already be started. XIDs are returned starting at the current cursor position.
===

Java API:

===
The flag parameter indicates where the recover scan should start or end, or start and end. This method may be invoked one or more times during a recovery scan. The resource manager maintains a cursor which marks the current position of the prepared or heuristically completed transaction list. Each invocation of the recover method moves the cursor passed the set of Xids that are returned.
[...]
flag
    One of TMSTARTRSCAN, TMENDRSCAN, TMNOFLAGS. TMNOFLAGS must be used when no other flags are used.
    TMSTARTRSCAN - indicates that the recovery scan should be started at the beginning of the prepared or heuristically completed transaction list.
    TMENDRSCAN - indicates that the recovery scan should be ended after the method returns the Xid list. If this flag is used in conjunction with the TMSTARTRSCAN, this method invocation starts and ends the recovery scan.
    TMNOFLAGS - this flag must be used when no other flags are specified. This flag may be used only if the recovery scan has already been started. The list of Xids are returned
Returns: xid[]
    The resource manager returns zero or more Xids for the transaction branches that are currently in a prepared or heuristically completed state.
===

The only significant difference is that the Java API doesn't talk about how to signal no-more-XIDs. I'm guessing this is just an oversight. The thread referred to above mentions using a null or zero-sized return from recover() to signal that.

But both the Java and C versions are stateful and essentially equivalent, so I'm not sure what your complaint is exactly.

-O
On Fri, 22 Jul 2005, Oliver Jowett wrote:

> The only significant difference is that the Java API doesn't talk about how to signal no-more-XIDs .. I'm guessing this is just an oversight. The thread referred above to mentions using a null or zero-sized return from recover() to signal that.
>
> But both the Java and C versions are stateful and essentially equivalent, so I'm not sure what your complaint is exactly..

The C version takes an argument specifying the maximum number of xids to recover. The Java version does not. Without this information, the Java version not only looks silly but doesn't make a lot of sense either. For example, how many recovered xids should we return on the first call to recover()?

Anyway, do you think my implementation of the recover() method violates the JTA spec?

Michael
Michael Allman wrote:

> The C version takes an argument specifying the maximum number of xids to recover. The Java version does not.

That's at least partly because the C version makes the caller allocate the array.

> Without this information, the Java version not only looks silly but doesn't make a lot of sense either.

It seems ok to me -- it puts the burden for selecting a suitable batch size on the resource rather than the TM, but that's six of one, half a dozen of the other. That also means the RM can generate the array at whatever size is convenient, rather than having to buffer internally if it retrieves xids in some larger block size than the TM selects.

It certainly doesn't make it more or less stateful, so I still don't understand your original objection.

> For example, how many recovered xids should we return on the first call to recover()?

For our JDBC implementation? Set a fetchsize on your query to something reasonable -- perhaps 500? -- and return up to that many Xids per call until you hit the end of the resultset, then return empty arrays thereafter until a new scan starts.

> Anyway, do you think my implementation of the recover() method violates the JTA spec?

The code in pgjdbcxa-20050721.jar appears to violate the spec, as you completely ignore the flags argument. You need to track the recovery scan state even if you decide to return all Xids in one array, because subsequent calls shouldn't return those Xids again until a new scan is started, per the API docs.

-O
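Editor's sketch of the batching scheme Oliver describes: a cursor that TMSTARTRSCAN resets, that each call advances by at most one batch, and that yields empty arrays once the list is exhausted. To stay self-contained it works on plain strings held in memory rather than Xid objects read from the backend, and it signals misuse with IllegalStateException where a real XAResource would presumably throw an XAException. The class name and the batch size parameter are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.XAResource;

/** Hypothetical scan state; strings stand in for Xids read from the backend. */
final class RecoveryScan {
    private final List<String> prepared; // assumption: a snapshot of the prepared transactions
    private final int batchSize;
    private int cursor = -1;             // -1 means no scan is open

    RecoveryScan(List<String> prepared, int batchSize) {
        this.prepared = prepared;
        this.batchSize = batchSize;
    }

    String[] recover(int flags) {
        if ((flags & XAResource.TMSTARTRSCAN) != 0) {
            cursor = 0; // start a scan, or restart one that is already open
        } else if (cursor < 0) {
            // Sketch only: a real XAResource would report this via XAException.
            throw new IllegalStateException("no recovery scan in progress");
        }
        int end = Math.min(cursor + batchSize, prepared.size());
        String[] batch = prepared.subList(cursor, end).toArray(new String[0]);
        cursor = end; // advance past what was just returned
        if ((flags & XAResource.TMENDRSCAN) != 0) {
            cursor = -1; // the caller is done with this scan
        }
        return batch;
    }

    public static void main(String[] args) {
        RecoveryScan scan = new RecoveryScan(Arrays.asList("gid-1", "gid-2", "gid-3"), 2);
        System.out.println(Arrays.toString(scan.recover(XAResource.TMSTARTRSCAN)));
        System.out.println(Arrays.toString(scan.recover(XAResource.TMNOFLAGS)));
        System.out.println(Arrays.toString(scan.recover(XAResource.TMNOFLAGS)));
    }
}
```

Setting the batch size to the size of the list gives the "return everything at once" variant; the cursor is still needed so that the following TMNOFLAGS call comes back empty.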
On Fri, 22 Jul 2005, Oliver Jowett wrote:

> Michael Allman wrote:
>
>> The C version takes an argument specifying the maximum number of xids to recover. The Java version does not.
>
> That's at least partly because the C version makes the caller allocate the array.

In this situation it is understandable why the C method is stateful and how it is supposed to work. The caller tells the resource manager how many xids it can handle at once by passing a length argument. It also passes an XID* which it malloc()'s to hold length * (3 * sizeof(long) + XIDDATASIZE * sizeof(char)) bytes, or something like that.

Without a length argument allowing the TM to tell the RM how many xids it can take, it doesn't make sense to return "some" xids at a time, because how many "some" is gets determined by the RM, not the TM, and that defeats the purpose of the C version's approach. This is why I say the Java version doesn't make sense.

>> Without this information, the Java version not only looks silly but doesn't make a lot of sense either.
>
> It seems ok to me -- it puts the burden for selecting a suitable batch size on the resource rather than the TM, but that's six of one, half a dozen of the other. That also means the RM can generate the array at whatever size is convenient, rather than having to internally buffer if it retrieves xids in some larger block size than the TM selects.
>
> It certainly doesn't make it more or less stateful, so I still don't understand your original objection.
>
>> For example, how many recovered xids should we return on the first call to recover()?
>
> For our JDBC implementation? Set a fetchsize on your query to something reasonable -- perhaps 500? -- and return up to that many Xids per call until you hit the end of the resultset, then return empty arrays thereafter until a new scan starts.

Why 500? It's simpler to return all of them.

>> Anyway, do you think my implementation of the recover() method violates the JTA spec?
>
> The code in pgjdbcxa-20050721.jar appears to violate the spec, as you completely ignore the flags argument. You need to track the recovery scan state even if you decide to return all Xids in one array, because subsequent calls shouldn't return those Xids again until a new scan is started, per the API docs.

I do track the recovery scan state. Each call puts the scan cursor at the end of the list.

I still don't see a violation of the API.

It looks like the JTA API is wrong or there's a typo. If we follow the spirit of the DTP spec it seems that the TMNOFLAGS flag means "return some xids starting from where we last left off". I'm still not sure what TMENDRSCAN means.

Michael
Michael Allman wrote: > Without a length argument allowing the TM to tell the RM how many xids > it can take, it doesn't make sense to return "some" xids at a time. > Because how many "some" is is determined by the RM, not the TM, and that > defeats the purpose of the C version's approach. Well, personal preference perhaps, it seems fine to me to allow the RM to break up the list into more manageable parts if it wants to. For example, consider a Java RM that is implemented by JNI calls to a C implementation of the XA interface. >> For our JDBC implementation? Set a fetchsize on your query to >> something reasonable -- perhaps 500? -- and return up to that many >> Xids per call until you hit the end of the resultset, then return >> empty arrays thereafter until a new scan starts. > > Why 500? It's simpler to return all of them. Sure, it doesn't really matter either way, it's just an implementation decision. My only concern was if you're pulling in a million Xids at once then you are going to bloat out the heap more than you need to -- but that's unlikely to happen in any real system anyway. >>> Anyway, do you think my implementation of the recover() method >>> violates the JTA spec? >> >> >> The code in pgjdbcxa-20050721.jar appears to violate the spec, as you >> completely ignore the flags argument. You need to track the recovery >> scan state even if you decide to return all Xids in one array, because >> subsequent calls shouldn't return those Xids again until a new scan is >> started, per the API docs. > > > I do track the recovery scan state. Each call puts the scan cursor at > the end of the list. Uh, are we looking at the same code here? I don't see anything in the code from pgjdbcxa-20050721.jar that records whether we are at the end of the list or not between calls to recover(), and I don't see anything that looks for TMSTARTRSCAN to reset that state. If I've missed it, can you point it out to me? > I still don't see a violation of the API. 
The API says that if a TM does this: Xid[] xids_1 = resource.recover(TMSTARTRSCAN); Xid[] xids_2 = resource.recover(TMNOFLAGS); Xid[] xids_3 = resource.recover(TMNOFLAGS); then xids_1, xids_2, and xids_3 reflect consecutive (and possibly empty if you hit the end) parts of the recovery list. It seems your code does not respect this -- it will return the full list of Xids repeatedly in each of xids_1, xids_2 and xids_3. Given that the way that the TM decides that the recovery scan is complete is by looking for an empty array returned from recover(), your code is going to hit exactly the infinite-loop case described in the thread that Heikki posted. > It looks like the JTA API is wrong or there's a typo. If we follow the > spirit of the DTP spec it seems that the TMNOFLAGS flag means "return > some xids starting from where we last left off". I'm still not sure > what TMENDRSCAN means. The JTA specification is fairly clear about their meanings. TMNOFLAGS does indeed mean "continue the current scan". TMENDRSCAN means "I'm done with this scan", and allows RMs to free resources they have allocated for the scan. === The flag parameter indicates where the recover scan should start or end, or start and end. This method may be invoked one or more times during a recoveryscan. The resource manager maintains a cursor which marks the current position of the prepared or heuristically completed transaction list. Each invocation of the recover method moves the cursor past the set of Xids that are returned. [...] TMSTARTRSCAN - indicates that the recoveryscan should be started at the beginning of the prepared or heuristically completed transaction list. TMENDRSCAN - indicates that the recovery scan should be ended after the method returns the Xid list. If this flag is used in conjunction with the TMSTARTRSCAN, this method invocation starts and ends the recovery scan. TMNOFLAGS - this flag must be used when no other flags are specified. 
This flag may be used only if the recovery scan has already been started. The list of Xids are returned === -O
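The scan semantics quoted above can be sketched in Java roughly as follows. This is an illustration only, not code from pgjdbcxa: the class names `RecoveryCursor` and `RecoveryLoop`, the batch size, and the use of plain strings in place of `Xid` objects are all assumptions made to keep the example short. Only the flag constants come from `javax.transaction.xa.XAResource`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.XAResource;

// Hypothetical RM-side helper that keeps the scan cursor between recover() calls.
// Strings stand in for Xid objects to keep the sketch short.
class RecoveryCursor {
    private final List<String> prepared; // snapshot of prepared transaction ids
    private final int batchSize;         // assumed upper bound of ids per call
    private int cursor;                  // index of the next id to return
    private boolean scanOpen;            // true once a scan has been started

    RecoveryCursor(List<String> prepared, int batchSize) {
        this.prepared = new ArrayList<>(prepared);
        this.batchSize = batchSize;
    }

    String[] recover(int flag) {
        if ((flag & XAResource.TMSTARTRSCAN) != 0) {
            cursor = 0;       // TMSTARTRSCAN: rewind to the start of the list
            scanOpen = true;
        } else if (!scanOpen) {
            // A real XAResource would throw an XAException here.
            throw new IllegalStateException("no recovery scan in progress");
        }
        int end = Math.min(cursor + batchSize, prepared.size());
        String[] batch = prepared.subList(cursor, end).toArray(new String[0]);
        cursor = end;         // move the cursor past the ids just returned
        if ((flag & XAResource.TMENDRSCAN) != 0) {
            scanOpen = false; // TMENDRSCAN: the scan ends after this call
        }
        return batch;
    }
}

// Hypothetical TM-side loop: keeps calling recover() until it gets an empty array.
class RecoveryLoop {
    static List<String> scanAll(RecoveryCursor rm) {
        List<String> all = new ArrayList<>();
        String[] batch = rm.recover(XAResource.TMSTARTRSCAN);
        while (batch.length > 0) {
            all.addAll(Arrays.asList(batch));
            batch = rm.recover(XAResource.TMNOFLAGS);
        }
        return all;
    }
}
```

If `recover()` ignored the flag and returned the full list on every call, the `while` loop in `scanAll` would never see an empty array, which is the endless-loop case mentioned in the thread.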
On Fri, 22 Jul 2005, Oliver Jowett wrote: >> I do track the recovery scan state. Each call puts the scan cursor at the >> end of the list. > > Uh, are we looking at the same code here? I don't see anything in the code > from pgjdbcxa-20050721.jar that records whether we are at the end of the list > or not between calls to recover(), and I don't see anything that looks for > TMSTARTRSCAN to reset that state. If I've missed it, can you point it out to > me? You're right. But so am I. Since the code returns the complete list, the "cursor" is always at the "end" of the "list" of prepared xids. The code simply starts at the beginning for each call to recover(). >> I still don't see a violation of the API. > > The API says that if a TM does this: > > Xid[] xids_1 = resource.recover(TMSTARTRSCAN); > Xid[] xids_2 = resource.recover(TMNOFLAGS); > Xid[] xids_3 = resource.recover(TMNOFLAGS); > > then xids_1, xids_2, and xids_3 reflect consecutive (and possibly empty if > you hit the end) parts of the recovery list. It seems your code does not > respect this -- it will return the full list of Xids repeatedly in each of > xids_1, xids_2 and xids_3. I don't see anything in the JTA spec that says the TMNOFLAGS int means anything other than that no other flag was passed to recover(). In the DTP spec it says something about returning xids starting at the current cursor position. >> It looks like the JTA API is wrong or there's a typo. If we follow the >> spirit of the DTP spec it seems that the TMNOFLAGS flag means "return some >> xids starting from where we last left off". I'm still not sure what >> TMENDRSCAN means. > > The JTA specification is fairly clear about their meanings. TMNOFLAGS does > indeed mean "continue the current scan". TMENDRSCAN means "I'm done with this > scan", and allows RMs to free resources they have allocated for the scan. 
I added the lines

    if (flag != TMSTARTRSCAN) {
        return new Xid[0];
    }

to the top of the recover() method and posted a new version at

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722.jar

How does that look?

Michael
On Thu, 21 Jul 2005, Michael Allman wrote: > On Thu, 21 Jul 2005, Heikki Linnakangas wrote: > >> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You >> can only use prepared statements for normal SELECT/UPDATE/DELETE commands. > > Doesn't the driver support client side prepared statements? No, they're server side. I tried that too at first, but it didn't work. >> 3. How are you planning to handle transaction interleaving discussed in the >> thread Dave mentioned? > > I'm not. PostgreSQL does not support this behavior, and I see no need to > pretend it does. I think the appropriate thing to do is throw an exception > when the second start is called. I agree. However, I'd like to try it with the popular TMs to make sure they work without it. > I have serious doubts that any SQL database in the world supports this > behavior correctly. If you know of one that does, I'd like to see its magic. I tested it on some SQL databases, and at least Oracle seems to support it. DB2 fakes it by preparing early. Derby seems to support it, but it only supports XA in embedded mode. >> 4. recover is broken because it ignores the flags argument. That's going to >> cause an endless loop in the transaction manager when it tries to recover. >> See this discussion: >> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566 > > That is problematic. The API for recovery is stateful, and, IMHO, poorly > designed. If you look at the original DTP XA spec you'll see it makes much > more sense. I agree that it sucks. > I don't know what to do about this yet. The simplest implementation is one that returns all the recovered xids if flags include TMSTARTRSCAN, and an empty array in all other cases. That way, the internal implementation don't have to be stateful even though the API is. >> 6. isSameRM considers two connections to the same database as different >> RMs. I'm not sure what the implications of this are, but I feel that's not >> right. 
I have the same issue in my implementation as well... > > They're different RM's because you can't join a transaction across two > physical JDBC Connections. Each XAResource instance is associated with > exactly one physical connection instance. I don't think that's the correct definition of an RM. See section 2.2.4 of the XA specification. I think the Postgres database or cluster is one RM. But as I said, I don't know what implications your implementation has. It might work just fine, or not. > In light of the implementation, I could probably just define isSameRM() as > return this == otherXAResource . . . Yep. >> The XA and JTA specifications are quite complicated. I'd like to see a good >> set of test cases that exercise all possible scenarious and also error >> conditions. We're also going to need testers with access to the popular >> application servers so that we know our implementation works with them. >> AFAIK, the only open source application server that does recovery properly >> is the CVS head version of JOnAS. > > I have some cactus test cases for an XML database that has an XA driver. I'm > not feeling too motivated to port them to PostgreSQL. Can you send them? I'd like to take a look, even if we can't use them directly. Which XML database is that? >> Also, if we violate some parts of the specs (like the transaction >> interleaving part), it's important to know exactly what the limitations are >> and why. I started to write down the exact preconditions for each method in >> the javadoc comments, and also separate which preconditions come from the >> specs and which are just implementation-specific limitations. > > I think the interleaving business is a non-issue. I can't think of a real > world case where a transaction manager would do this. Can you? Using interleaving, the application server could get away with a smaller connection pool. It could recycle the connections right after end call, without waiting for the prepare/commit cycle. 
I don't know if any application server does that in practice. If it turns out to be a problem, we might get away by some clever locking. We could make the second start call block and wait for the previous transaction to finish. > Besides, like I said, I doubt any other SQL database supports this. I know > Berkeley DB does, but Berkeley DB lets you associate any database call with > any transaction, so it's easy. > > JTA was written with more than just SQL databases in mind, and I don't think > we need to bend over backwards to implement some corner functionality for a > resource which, by its design, doesn't support it. I agree, if the application servers work without it. - Heikki
On Fri, 22 Jul 2005, Heikki Linnakangas wrote: > On Thu, 21 Jul 2005, Michael Allman wrote: > >> On Thu, 21 Jul 2005, Heikki Linnakangas wrote: >> >>> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work. You >>> can only use prepared statements for normal SELECT/UPDATE/DELETE commands. >> >> Doesn't the driver support client side prepared statements? > > No, they're server side. I tried that too at first, but it didn't work. I will make corrections. >>> 3. How are you planning to handle transaction interleaving discussed in >>> the thread Dave mentioned? >> >> I'm not. PostgreSQL does not support this behavior, and I see no need to >> pretend it does. I think the appropriate thing to do is throw an exception >> when the second start is called. > > I agree. However, I'd like to try it with the popular TMs to make sure they > work without it. > >> I have serious doubts that any SQL database in the world supports this >> behavior correctly. If you know of one that does, I'd like to see its >> magic. > > I tested it on some SQL databases, and at least Oracle seems to support it. > DB2 fakes it by preparing early. Derby seems to support it, but it only > supports XA in embedded mode. If Oracle supports it, it's likely because they have some server-side stored procedures that do something magical. I don't know. I'm not an SQL expert, but I don't think SQL by itself supports the association of discrete DML statements with arbitrary transactions. You might want to check out SimpleJTA: http://www.simplejta.org/ They have some XA driver notes. Among them the following nugget: <quote> JTA specifications allow an XAResource object to shared amongst multiple concurrent transactions with the restriction that the resource can be enlisted with a single transaction at a point in time. Resource sharing amonst multiple transactions appears to cause a problem in Oracle in a multi-threaded environment. 
Therefore, SimpleJTA is configured to defer the reuse of an XAResource object by other transactions until the existing transaction is completed, i.e., either committed or rolled back. </quote> >>> 4. recover is broken because it ignores the flags argument. That's going >>> to cause an endless loop in the transaction manager when it tries to >>> recover. See this discussion: >>> http://forum.java.sun.com/thread.jspa?threadID=475468&messageID=2232566 >> >> That is problematic. The API for recovery is stateful, and, IMHO, poorly >> designed. If you look at the original DTP XA spec you'll see it makes much >> more sense. > > I agree that it sucks. > >> I don't know what to do about this yet. > > The simplest implementation is one that returns all the recovered xids if > flags include TMSTARTRSCAN, and an empty array in all other cases. That way, > the internal implementation don't have to be stateful even though the API is. I posted a new version last night that does this. I think it works. >>> 6. isSameRM considers two connections to the same database as different >>> RMs. I'm not sure what the implications of this are, but I feel that's not >>> right. I have the same issue in my implementation as well... >> >> They're different RM's because you can't join a transaction across two >> physical JDBC Connections. Each XAResource instance is associated with >> exactly one physical connection instance. > > I don't think that's the correct definition of an RM. See section 2.2.4 of > the XA specification. I think the Postgres database or cluster is one RM. But > as I said, I don't know what implications your implementation has. It might > work just fine, or not. It's up to the implementor to define the scope of an "RM" and what isSameRM() means --- hence the interface method. 
The TM uses this method when it has another XAResource to enlist in the transaction and wants to know if it should start another branch for it (with start(newBranchXid, TMNOFLAGS)) or can join an existing transaction branch (with start(existingBranchXid, TMJOIN)). The DTP XA spec says a single RM *may* service multiple independent resource domains. There are RM's that work like this, e.g. Berkeley DB where transactions are represented as first-class Objects which can be passed around within the same environment. However, PostgreSQL does not support this behavior. Again, you can't join a transaction across physical database connections. One possible alternative we might explore is allowing an XAResource instance, say xaRes1, for the same database as another XAResource instance, say xaRes2, to adopt the same physical connection instance as xaRes2. So xaRes2.isSameRM(xaRes1) would return true if the underlying physical connections pointed to the same PostgreSQL database (with the same user credentials). Then if a TM tried to join xaRes2 to xaRes1's transaction branch, we could implement xaRes1.start(existingBranchXid, TMJOIN) to assign xaRes1.physicalConnection = xaRes2.physicalConnection. Then they would share the same transaction branch and context. How about that? >> In light of the implementation, I could probably just define isSameRM() as >> return this == otherXAResource . . . > > Yep. > >>> The XA and JTA specifications are quite complicated. I'd like to see a >>> good set of test cases that exercise all possible scenarious and also >>> error conditions. We're also going to need testers with access to the >>> popular application servers so that we know our implementation works with >>> them. AFAIK, the only open source application server that does recovery >>> properly is the CVS head version of JOnAS. >> >> I have some cactus test cases for an XML database that has an XA driver. >> I'm not feeling too motivated to port them to PostgreSQL. 
> > Can you send them? I'd like to take a look, even if we can't use them > directly. Which XML database is that? It's for Berkeley DB XML. I think they're in CVS: http://berkeley-dbxml-adapter.dev.java.net/ However, these are high-level tests at the user API level. Perhaps we should write tests to XAResource directly? I don't even think they'd need to be cactus tests. >>> Also, if we violate some parts of the specs (like the transaction >>> interleaving part), it's important to know exactly what the limitations >>> are and why. I started to write down the exact preconditions for each >>> method in the javadoc comments, and also separate which preconditions >>> come from the specs and which are just implementation-specific >>> limitations. >> >> I think the interleaving business is a non-issue. I can't think of a real >> world case where a transaction manager would do this. Can you? > > Using interleaving, the application server could get away with a smaller > connection pool. It could recycle the connections right after end call, > without waiting for the prepare/commit cycle. But then leave the first transaction hanging? My gut tells me you should resolve, either rollback or commit, a transaction as soon as you can. Prepared transactions hold onto their locks, right? Leaving them unresolved would lead to poorer concurrency, not better. I don't really understand the rationale behind this "interleaving" idea. > I don't know if any application server does that in practice. If it turns out > to be a problem, we might get away by some clever locking. We could make the > second start call block and wait for the previous transaction to finish. > >> Besides, like I said, I doubt any other SQL database supports this. I know >> Berkeley DB does, but Berkeley DB lets you associate any database call with >> any transaction, so it's easy. 
>> >> JTA was written with more than just SQL databases in mind, and I don't >> think we need to bend over backwards to implement some corner functionality >> for a resource which, by its design, doesn't support it. > > I agree, if the application servers work without it. > > - Heikki >
I've uploaded a new version of my patch to

http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722-2.jar

This version includes some bug fixes and a small number of (working) unit tests.

It occurred to me recently that start(xid, TMJOIN) is broken. Given that this implementation doesn't support transaction branch joining, it should probably just throw an XAException. Of course, since isSameRM() returns true only for identical PGXAResource instances, a TM should never call start(xid, TMJOIN). Makes sense?

Michael
Michael Allman wrote:

> Since the code returns the complete list,
> the "cursor" is always at the "end" of the "list" of prepared xids.

Correct.

> The code simply starts at the beginning for each call to recover().

This however isn't true -- the cursor should only be reset if TMSTARTRSCAN is specified, not on every call.

> I don't see anything in the JTA spec that says the TMNOFLAGS int means
> anything other than that no other flag was passed to recover(). In the
> DTP spec it says something about returning xids starting at the current
> cursor position.

See the description of recover() in the JTA spec:

>> The flag parameter indicates where the recover scan should start or end,
>> or start and end. This method may be invoked one or more times during a
>> recovery scan. The resource manager maintains a cursor which marks the
>> current position of the prepared or heuristically completed transaction
>> list. Each invocation of the recover method moves the cursor past the
>> set of Xids that are returned. [...]
>> TMSTARTRSCAN - indicates that the recovery scan should be started at the beginning of the
>> prepared or heuristically completed transaction list.

So if TMSTARTRSCAN is specified, you move the cursor to the start of the list. Then (regardless of whether TMSTARTRSCAN was specified) you generate an array of Xids to return starting from the current cursor position, and move the cursor forward past those Xids.

In your case, if you return the whole list when TMSTARTRSCAN is specified, then that implies you should return an empty list when it's not specified.

> I added the lines
>
>     if (flag != TMSTARTRSCAN) {
>         return new Xid[0];
>     }
>
> to the top of the recover() method and posted a new version at
>
> http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722.jar

Not quite -- it's a flag, not an enumerated value -- the TM can specify TMENDRSCAN|TMSTARTRSCAN to restart a scan currently in progress.

'if ((flag & TMSTARTRSCAN) == 0)' should work.

-O
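The difference between the equality test and the bit test can be seen in a small side-by-side sketch. The class `StatelessRecover` and the `allPrepared` parameter are hypothetical, and strings stand in for `Xid` objects; only the flag constants are taken from `javax.transaction.xa.XAResource`.

```java
import javax.transaction.xa.XAResource;

// Hypothetical sketch of the stateless approach: the whole list is returned
// when a scan starts, and nothing is returned on any other call.
class StatelessRecover {
    // Variant using equality: treats the flag word as an enumerated value,
    // so TMSTARTRSCAN combined with TMENDRSCAN is not recognised as a start.
    static String[] recoverWithEquality(int flag, String[] allPrepared) {
        if (flag != XAResource.TMSTARTRSCAN) {
            return new String[0];
        }
        return allPrepared.clone();
    }

    // Variant using a bit test: looks only at the TMSTARTRSCAN bit.
    static String[] recoverWithBitmask(int flag, String[] allPrepared) {
        if ((flag & XAResource.TMSTARTRSCAN) == 0) {
            return new String[0];
        }
        return allPrepared.clone();
    }

    public static void main(String[] args) {
        String[] prepared = {"xid-1", "xid-2"};
        int startAndEnd = XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN;
        System.out.println(recoverWithEquality(startAndEnd, prepared).length);          // prints 0
        System.out.println(recoverWithBitmask(startAndEnd, prepared).length);           // prints 2
        System.out.println(recoverWithBitmask(XAResource.TMNOFLAGS, prepared).length);  // prints 0
    }
}
```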
Heikki Linnakangas wrote:

> On Thu, 21 Jul 2005, Michael Allman wrote:
>
>> On Thu, 21 Jul 2005, Heikki Linnakangas wrote:
>>
>>> 2. Using prepared statements like "PREPARE TRANSACTION ?" won't work.
>>> You can only use prepared statements for normal SELECT/UPDATE/DELETE
>>> commands.
>>
>> Doesn't the driver support client side prepared statements?
>
> No, they're server side. I tried that too at first, but it didn't work.

Yeah, it boils down to: you can only put a ? placeholder where there's a PARAM terminal in the server's SQL grammar, as the driver translates the placeholders to '$n' strings on Parse and uses Bind to pass the actual values on each execution. COMMIT PREPARED etc take a Sconst, not a PARAM, for their argument.

-O
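Since the transaction identifier has to be sent as a string constant rather than a bound parameter, the statement text has to be built by hand. A minimal sketch of that, assuming the identifier is already available as a string and that it is acceptable to reject backslashes outright (how they are treated inside a literal can depend on server settings); `TwoPhaseSql` and its methods are hypothetical names, not part of the driver:

```java
// Hypothetical helper that builds two-phase commit statements as plain SQL
// text, because the transaction id cannot be passed as a '?' parameter.
class TwoPhaseSql {
    // Quote a transaction id as an SQL string constant by doubling single quotes.
    static String quoteGid(String gid) {
        if (gid.indexOf('\\') >= 0 || gid.indexOf('\0') >= 0) {
            // Assumption of this sketch: refuse characters whose treatment
            // inside a literal may vary, rather than trying to escape them.
            throw new IllegalArgumentException("unsupported character in gid");
        }
        return "'" + gid.replace("'", "''") + "'";
    }

    static String prepareTransaction(String gid) {
        return "PREPARE TRANSACTION " + quoteGid(gid);
    }

    static String commitPrepared(String gid) {
        return "COMMIT PREPARED " + quoteGid(gid);
    }

    static String rollbackPrepared(String gid) {
        return "ROLLBACK PREPARED " + quoteGid(gid);
    }
}
```

The resulting text would then be run through a plain `java.sql.Statement` rather than a `PreparedStatement` with placeholders.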
On Fri, 22 Jul 2005, Michael Allman wrote:

> On Fri, 22 Jul 2005, Heikki Linnakangas wrote:
>
>> On Thu, 21 Jul 2005, Michael Allman wrote:
>>
>>> I have serious doubts that any SQL database in the world supports this
>>> behavior correctly. If you know of one that does, I'd like to see its
>>> magic.
>>
>> I tested it on some SQL databases, and at least Oracle seems to support it.
>> DB2 fakes it by preparing early. Derby seems to support it, but it only
>> supports XA in embedded mode.
>
> If Oracle supports it, it's likely because they have some server-side stored
> procedures that do something magical. I don't know. I'm not an SQL expert,
> but I don't think SQL by itself supports the association of discrete DML
> statements with arbitrary transactions.

The SQL spec doesn't say anything about two-phase commit or XA, so no, SQL
itself doesn't support any of that.

Just looked at MySQL/InnoDB, they have these commands to deal with XA:

XA BEGIN <xid> [JOIN | RESUME]
XA START TRANSACTION <xid> [JOIN | RESUME]
XA COMMIT <xid> [ONE PHASE]
XA END <xid> [SUSPEND [FOR MIGRATE]]
XA PREPARE <xid>
XA RECOVER
XA ROLLBACK <xid>

They have all the support in the backend, so their driver implementation is
trivial. (Of course, since it's MySQL, I wouldn't bet that they actually work
the way they should, but anyway :))

> You might want to check out SimpleJTA:
>
> http://www.simplejta.org/
>
> They have some XA driver notes. Among them the following nugget:
>
> <quote>
> JTA specifications allow an XAResource object to shared amongst multiple
> concurrent transactions with the restriction that the resource can be
> enlisted with a single transaction at a point in time. Resource sharing
> amonst multiple transactions appears to cause a problem in Oracle in a
> multi-threaded environment. Therefore, SimpleJTA is configured to defer the
> reuse of an XAResource object by other transactions until the existing
> transaction is completed, i.e., either committed or rolled back.
> </quote>

I wouldn't be surprised if all the other TMs did the same. We'll have to test
it.

>> I agree that it sucks.
>>
>>> I don't know what to do about this yet.
>>
>> The simplest implementation is one that returns all the recovered xids if
>> flags include TMSTARTRSCAN, and an empty array in all other cases. That
>> way, the internal implementation doesn't have to be stateful even though
>> the API is.
>
> I posted a new version last night that does this. I think it works.

It's legal to give TMSTARTRSCAN | TMENDRSCAN as flags. Otherwise, looks good
to me.

>>>> 6. isSameRM considers two connections to the same database as different
>>>> RMs. I'm not sure what the implications of this are, but I feel that's
>>>> not right. I have the same issue in my implementation as well...
>>>
>>> They're different RM's because you can't join a transaction across two
>>> physical JDBC Connections. Each XAResource instance is associated with
>>> exactly one physical connection instance.
>>
>> I don't think that's the correct definition of an RM. See section 2.2.4 of
>> the XA specification. I think the Postgres database or cluster is one RM.
>> But as I said, I don't know what implications your implementation has. It
>> might work just fine, or not.
>
> It's up to the implementor to define the scope of an "RM" and what isSameRM()
> means --- hence the interface method.
>
> The TM uses this method when it has another XAResource to enlist in the
> transaction and wants to know if it should start another branch for it (with
> start(newBranchXid, TMNOFLAGS)) or can join an existing transaction branch
> (with start(existingBranchXid, TMJOIN)).
>
> The DTP XA spec says a single RM *may* service multiple independent resource
> domains. There are RM's that work like this, e.g. Berkeley DB where
> transactions are represented as first-class Objects which can be passed
> around within the same environment. However, PostgreSQL does not support
> this behavior. Again, you can't join a transaction across physical database
> connections.

What's a resource domain? The way I understand it, a resource domain might be
a Postgres database or cluster.

> One possible alternative we might explore is allowing an XAResource instance,
> say xaRes1, for the same database as another XAResource instance, say xaRes2,
> to adopt the same physical connection instance as xaRes2. So
> xaRes2.isSameRM(xaRes1) would return true if the underlying physical
> connections pointed to the same PostgreSQL database (with the same user
> credentials). Then if a TM tried to join xaRes2 to xaRes1's transaction
> branch, we could implement xaRes1.start(existingBranchXid, TMJOIN) to assign
> xaRes1.physicalConnection = xaRes2.physicalConnection. Then they would share
> the same transaction branch and context. How about that?

That sounds right. We'll need a global map of xids and physicalConnections.

But let's see how far we can get with the simpler method, that is, just define
isSameRM as "return this == other", and not implement TMJOIN at all.

- Heikki
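For reference, the "simpler method" proposed above might look roughly like the sketch below. SimpleRm and checkStartFlags are hypothetical names, not taken from either implementation in this thread, and the choice of XAER_INVAL as the error code for a rejected TMJOIN is an assumption.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

/**
 * Sketch only: SimpleRm is a hypothetical stand-in for the driver's
 * XAResource class and covers just the two decisions discussed above.
 */
final class SimpleRm {
    /** Each instance counts as its own RM, so the TM never finds a joinable pair. */
    boolean isSameRM(Object other) {
        return this == other;
    }

    /**
     * Validate the flags passed to start(). TMJOIN is rejected because
     * joining an existing branch is deliberately not implemented.
     * XAER_INVAL is an assumed error code, not one the thread agreed on.
     */
    void checkStartFlags(int flags) throws XAException {
        if (flags == XAResource.TMJOIN) {
            throw new XAException(XAException.XAER_INVAL);
        }
    }
}
```

With isSameRM() defined by identity, a TM that follows the enlistment logic described earlier should always open a new branch with TMNOFLAGS, so the TMJOIN rejection is only a safety net.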
>> I added the lines
>>
>> if (flag != TMSTARTRSCAN) {
>>     return new Xid[0];
>> }
>>
>> to the top of the recover() method and posted a new version at
>>
>> http://www.allman.ms/pgjdbcxa/pgjdbcxa-20050722.jar
>
> Not quite -- it's a flag not an enumerated value -- the TM can specify
> TMENDRSCAN|TMSTARTRSCAN to restart a scan currently in progress.
>
> 'if ((flag & TMSTARTRSCAN) == 0)' should work.

I'll make a correction in the next patch. Thanks.

Michael
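To make the difference between the two checks concrete, here is a small sketch of a stateless recover() that tests the bit instead of comparing for equality. RecoverScan and its method names are hypothetical and are not taken from the posted jar; only the XAResource constants and the Xid interface come from javax.transaction.xa.

```java
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/**
 * Sketch only: RecoverScan is a hypothetical helper, not code from the
 * pgjdbcxa jar discussed in this thread.
 */
final class RecoverScan {
    private RecoverScan() {}

    /**
     * True when the TM asked to start (or restart) a recovery scan.
     * The argument is a bit mask, so TMSTARTRSCAN may arrive OR'ed with
     * TMENDRSCAN; an equality test would wrongly reject that combination.
     */
    static boolean startsScan(int flag) {
        return (flag & XAResource.TMSTARTRSCAN) != 0;
    }

    /**
     * Stateless recover(): return every prepared xid when a scan starts
     * and an empty array for any other call.
     */
    static Xid[] recover(int flag, Xid[] allPrepared) {
        return startsScan(flag) ? allPrepared : new Xid[0];
    }
}
```

With the equality test, a call with TMSTARTRSCAN | TMENDRSCAN would have returned an empty array; the bit test returns the full list for that call.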
On Sat, 23 Jul 2005, Heikki Linnakangas wrote:

> On Fri, 22 Jul 2005, Michael Allman wrote:
>
>> On Fri, 22 Jul 2005, Heikki Linnakangas wrote:
>>
>>> On Thu, 21 Jul 2005, Michael Allman wrote:
>>>
>>>> I have serious doubts that any SQL database in the world supports this
>>>> behavior correctly. If you know of one that does, I'd like to see its
>>>> magic.
>>>
>>> I tested it on some SQL databases, and at least Oracle seems to support
>>> it. DB2 fakes it by preparing early. Derby seems to support it, but it
>>> only supports XA in embedded mode.
>>
>> If Oracle supports it, it's likely because they have some server-side
>> stored procedures that do something magical. I don't know. I'm not an SQL
>> expert, but I don't think SQL by itself supports the association of
>> discrete DML statements with arbitrary transactions.
>
> SQL spec doesn't say anything about two-phase commit or XA, so no, SQL itself
> doesn't support any of that.
>
> Just looked at MySQL/InnoDB, they have these commands to deal with XA:
>
> XA BEGIN <xid> [JOIN | RESUME]
> XA START TRANSACTION <xid> [JOIN | RESUME]
> XA COMMIT <xid> [ONE PHASE]
> XA END <xid> [SUSPEND [FOR MIGRATE]]
> XA PREPARE <xid>
> XA RECOVER
> XA ROLLBACK <xid>
>
> They have all the support in the backend, so their driver implementation is
> trivial. (Of course, since it's MySQL, I wouldn't bet that they actually work
> the way they should, but anyway :))

What version of MySQL is this? I cannot find documentation for these commands.

>> You might want to check out SimpleJTA:
>>
>> http://www.simplejta.org/
>>
>> They have some XA driver notes. Among them the following nugget:
>>
>> <quote>
>> JTA specifications allow an XAResource object to shared amongst multiple
>> concurrent transactions with the restriction that the resource can be
>> enlisted with a single transaction at a point in time. Resource sharing
>> amonst multiple transactions appears to cause a problem in Oracle in a
>> multi-threaded environment. Therefore, SimpleJTA is configured to defer the
>> reuse of an XAResource object by other transactions until the existing
>> transaction is completed, i.e., either committed or rolled back.
>> </quote>
>
> I wouldn't be surprised if all the other TMs did the same. We'll have to test
> it.
>
>>> I agree that it sucks.
>>>
>>>> I don't know what to do about this yet.
>>>
>>> The simplest implementation is one that returns all the recovered xids if
>>> flags include TMSTARTRSCAN, and an empty array in all other cases. That
>>> way, the internal implementation doesn't have to be stateful even though
>>> the API is.
>>
>> I posted a new version last night that does this. I think it works.
>
> It's legal to give TMSTARTRSCAN | TMENDRSCAN as flags. Otherwise, looks good
> to me.

Oliver pointed out the same thing. I made a correction and will post it later
on.

>>>>> 6. isSameRM considers two connections to the same database as different
>>>>> RMs. I'm not sure what the implications of this are, but I feel that's
>>>>> not right. I have the same issue in my implementation as well...
>>>>
>>>> They're different RM's because you can't join a transaction across two
>>>> physical JDBC Connections. Each XAResource instance is associated with
>>>> exactly one physical connection instance.
>>>
>>> I don't think that's the correct definition of an RM. See section 2.2.4 of
>>> the XA specification. I think the Postgres database or cluster is one RM.
>>> But as I said, I don't know what implications your implementation has. It
>>> might work just fine, or not.
>>
>> It's up to the implementor to define the scope of an "RM" and what
>> isSameRM() means --- hence the interface method.
>>
>> The TM uses this method when it has another XAResource to enlist in the
>> transaction and wants to know if it should start another branch for it
>> (with start(newBranchXid, TMNOFLAGS)) or can join an existing transaction
>> branch (with start(existingBranchXid, TMJOIN)).
>>
>> The DTP XA spec says a single RM *may* service multiple independent
>> resource domains. There are RM's that work like this, e.g. Berkeley DB
>> where transactions are represented as first-class Objects which can be
>> passed around within the same environment. However, PostgreSQL does not
>> support this behavior. Again, you can't join a transaction across physical
>> database connections.
>
> What's a resource domain? The way I understand it, a resource domain might be
> a Postgres database or cluster.

Maybe I made that up. I don't know. I'm 99% sure the current implementation
satisfies the spec.

>> One possible alternative we might explore is allowing an XAResource
>> instance, say xaRes1, for the same database as another XAResource instance,
>> say xaRes2, to adopt the same physical connection instance as xaRes2. So
>> xaRes2.isSameRM(xaRes1) would return true if the underlying physical
>> connections pointed to the same PostgreSQL database (with the same user
>> credentials). Then if a TM tried to join xaRes2 to xaRes1's transaction
>> branch, we could implement xaRes1.start(existingBranchXid, TMJOIN) to
>> assign xaRes1.physicalConnection = xaRes2.physicalConnection. Then they
>> would share the same transaction branch and context. How about that?
>
> That sounds right. We'll need a global map of xids and physicalConnections.

It's going to be more complicated/impossible than I first thought. We would
need to do the switcharoo on the client's connection handle, too. Otherwise,
they'd be sending updates on one physical connection and committing the
transaction on the other physical connection.

> But let's see how far we can get with the simpler method, that is, just
> define isSameRM as "return this == other", and not implement TMJOIN at all.

I'm still not sure we should change the implementation of isSameRM(). It's a
little more defensive right now. To do "return this == other" would make a
(currently valid) assumption about how PGXAResource is used. Likewise, I think
the implementation of start() with the TMJOIN flag is correct, though with the
current implementation it should never be called.

On a related note, can we get the backend to indicate if a transaction was
read-only when the "PREPARE TRANSACTION" statement is called? (Also, I believe
preparing a read-only transaction should automatically "commit" it.) We could
then return XA_RDONLY from the driver's prepare() method in that case. As it
is, it either throws an exception or returns XA_OK.

Michael
On Sat, 23 Jul 2005, Oliver Jowett wrote:

> Not quite -- it's a flag not an enumerated value -- the TM can specify
> TMENDRSCAN|TMSTARTRSCAN to restart a scan currently in progress.
>
> 'if ((flag & TMSTARTRSCAN) == 0)' should work.

I just realized how ironic this is: recover(), the one method on XAResource
whose flag argument may be an |'d combination, names that argument "flag"
(singular), while end() and start(), which may take only a single flag, name
it "flags" (plural). Ha ha ha!!!

Michael
On Sat, 23 Jul 2005, Michael Allman wrote:

> On Sat, 23 Jul 2005, Heikki Linnakangas wrote:
>>
>> Just looked at MySQL/InnoDB, they have these commands to deal with XA:
>>
>> XA BEGIN <xid> [JOIN | RESUME]
>> XA START TRANSACTION <xid> [JOIN | RESUME]
>> XA COMMIT <xid> [ONE PHASE]
>> XA END <xid> [SUSPEND [FOR MIGRATE]]
>> XA PREPARE <xid>
>> XA RECOVER
>> XA ROLLBACK <xid>
>>
>> They have all the support in the backend, so their driver implementation is
>> trivial. (Of course, since it's MySQL, I wouldn't bet that they actually
>> work the way they should, but anyway :))
>
> What version of MySQL is this? I cannot find documentation for these
> commands.

I downloaded a nightly build of the JDBC driver. The filename seems to be
"mysql-connection-java-3.2-nightly-20050718.tar.gz". Maybe those commands are
so new that they are not documented yet? Or they are old and undocumented
anyway..

> On a related note, can we get the backend to indicate if a transaction was
> read-only when the "PREPARE TRANSACTION" statement is called? (Also, I
> believe preparing a read-only transaction should automatically "commit" it.)
> We could then return XA_RDONLY from the driver's prepare() method in that
> case. As it is, it either throws an exception or returns XA_OK.

It should be possible, but we're already past the feature freeze so it'll have
to wait for version 8.2. The backend already keeps track of which transactions
are read-only, and skips updating the clog for them. That information just
needs to be exposed.

- Heikki
Heikki Linnakangas wrote:

>> On a related note, can we get the backend to indicate if a transaction
>> was read-only when the "PREPARE TRANSACTION" statement is called?
>> (Also, I believe preparing a read-only transaction should
>> automatically "commit" it.) We could then return XA_RDONLY from the
>> driver's prepare() method in that case. As it is, it either throws an
>> exception or returns XA_OK.
>
> It should be possible, but we're already past the feature freeze so
> it'll have to wait for version 8.2. The backend already keeps track
> which transactions are read-only, and skips updating the clog for them.
> That information just needs to be exposed.

If the client uses Connection.setReadOnly() presumably we can track that
in the driver, which will catch some of the cases at least.

-O
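A rough sketch of what driver-side tracking could look like follows. ReadOnlyTracker and its method names are hypothetical. The sketch assumes the driver's connection wrapper forwards every Connection.setReadOnly() call to it and notifies it when a branch starts. It is deliberately conservative: a branch votes XA_RDONLY only if the connection was read-only for the branch's whole lifetime.

```java
import javax.transaction.xa.XAResource;

/**
 * Sketch only: ReadOnlyTracker is a hypothetical helper. It assumes the
 * driver calls setReadOnly() whenever the application calls
 * Connection.setReadOnly(), and branchStarted() from XAResource.start().
 */
final class ReadOnlyTracker {
    private boolean connectionReadOnly; // mirrors the last setReadOnly() call
    private boolean branchReadOnly;     // true only if read-only for the whole branch

    void setReadOnly(boolean readOnly) {
        connectionReadOnly = readOnly;
        if (!readOnly) {
            // The branch may write from now on, so it can no longer vote read-only.
            branchReadOnly = false;
        }
    }

    void branchStarted() {
        branchReadOnly = connectionReadOnly;
    }

    /** Value the driver's prepare() could return for the current branch. */
    int prepareVote() {
        return branchReadOnly ? XAResource.XA_RDONLY : XAResource.XA_OK;
    }
}
```

When prepareVote() returns XA_RDONLY, the driver would have to end the local transaction itself rather than issue PREPARE TRANSACTION, since a TM normally does not call commit() or rollback() on a branch that voted read-only.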
On Sun, 24 Jul 2005, Heikki Linnakangas wrote:

> On Sat, 23 Jul 2005, Michael Allman wrote:
>
>> On a related note, can we get the backend to indicate if a transaction was
>> read-only when the "PREPARE TRANSACTION" statement is called? (Also, I
>> believe preparing a read-only transaction should automatically "commit"
>> it.) We could then return XA_RDONLY from the driver's prepare() method in
>> that case. As it is, it either throws an exception or returns XA_OK.
>
> It should be possible, but we're already past the feature freeze so it'll
> have to wait for version 8.2. The backend already keeps track which
> transactions are read-only, and skips updating the clog for them. That
> information just needs to be exposed.

Cool. It would be a nice optimization to incorporate in the driver when it
becomes available in the backend.

Michael