Thread: Incorrect response code after XA recovery

Incorrect response code after XA recovery

From
Ondrej Chaloupka
Date:
Hi,

I would like to consult you about a problematic response returned by PostgreSQL after transaction recovery run by Narayana
(JBossTS).

I work on tests for Narayana and I hit an issue with PostgreSQL. The DB returns the incorrect code XAException.XA_HEURHAZ
when the TM does recovery after a crash of the JBoss EAP app server.
The exception is as follows:
Caused by: org.postgresql.util.PSQLException: ERROR: prepared transaction with identifier
"131072_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAAKDE=_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAALQAAAAAAAAAA" does not exist

It's run on PostgreSQL 9.2 but the older versions seem to be affected as well.

The problem occurs when TM runs on JTS transactions.

The idea of the test:
The test enlists two resources in a transaction. Prepare is called on the PostgreSQL resource. The app server
crashes before prepare is called on the second transaction participant. After the app server restarts, the TM tries to
recover the transaction. As the failure occurs during the prepare phase, rollback is expected.

The OTS specification requires both bottom-up and top-down recovery to be triggered by the recovering resource. This
causes two rollback calls to be made against the DB. The DB receives the first rollback call and does the rollback. The
second time, it returns the exceptional code. As the DB has already rolled back the transaction and forgotten about it,
the DB returns an error that no such transaction exists. But this seems to be against the OTS specification.
There are some more details in the following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=988724

Do you have any experience with such behaviour? Can I assume this is a problem in PostgreSQL? Or is there already
a bug for this issue in the Postgres bug tracking system?

Thank you
Ondra


Re: Incorrect response code after XA recovery

From
Tom Lane
Date:
Ondrej Chaloupka <ochaloup@redhat.com> writes:
> The OTS specification requires both bottom-up and top-down recovery to be triggered by the recovering resource. This
> causes two rollback calls to be made against the DB. The DB receives the first rollback call and does the rollback. The
> second time, it returns the exceptional code. As the DB has already rolled back the transaction and forgotten about it,
> the DB returns an error that no such transaction exists. But this seems to be against the OTS specification.

It's not likely that we would consider changing the behavior of ROLLBACK
PREPARED.  The alternatives we would have are (1) silently accept a
ROLLBACK against a non-existent transaction ID, or (2) remember every
rolled-back ID forever.  Neither seems sane in the least.

It seems to me that this is something client-side code, probably the XA
manager, would need to deal with.  The XA manager already has to track
uncommitted 2-phase transactions, and would furthermore have the best
idea of when it would be safe to forget about a rolled-back ID.

Right offhand it appears to me that that Red Hat bug is filed against
the correct component, and you need to push them harder to fix their
bug/shortcoming rather than claim it's our problem.

            regards, tom lane
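
As a rough illustration of the client-side handling suggested above, the following Java sketch shows how a
transaction manager might treat a rollback of an XID that the resource no longer knows. It assumes the resource
reports that case as XAException.XAER_NOTA and that the transaction runs under presumed-abort semantics, so an
unknown XID during rollback can be read as "already rolled back". The names TolerantRollback, Outcome, classify
and rollback are hypothetical and are not taken from Narayana or the PostgreSQL JDBC driver.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical helper for illustration only; not part of Narayana or pgjdbc.
final class TolerantRollback {

    enum Outcome { ROLLED_BACK, ALREADY_GONE, FAILED }

    // Decide what a failed rollback means from the XA error code alone.
    static Outcome classify(XAException e) {
        // Assumption: under presumed abort, an unknown XID means the branch
        // was rolled back (and forgotten) by an earlier call.
        return e.errorCode == XAException.XAER_NOTA
                ? Outcome.ALREADY_GONE
                : Outcome.FAILED;
    }

    // Roll back one branch and report the result instead of propagating the exception.
    static Outcome rollback(XAResource resource, Xid xid) {
        try {
            resource.rollback(xid);
            return Outcome.ROLLED_BACK;
        } catch (XAException e) {
            return classify(e);
        }
    }
}
```

In this sketch any error code other than XAER_NOTA, including XAER_RMERR, is classified as FAILED, so a manager
written this way can only recognise a repeated rollback if the resource reports the unknown XID as XAER_NOTA.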


Re: Incorrect response code after XA recovery

From
Tom Lane
Date:
Tom Jenkinson <tom.jenkinson@redhat.com> writes:
> A little bit of information in the linked bugzilla report is that the
> exception being returned has an XA error code of XAER_RMERR "An error
> occurred in rolling back the transaction branch. The resource manager is
> free to forget about the branch when returning this error so long as all
> accessing threads of control have been notified of the branch’s state."

> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
> is not known by the resource manager" be more accurate?

No idea, but in any case that's outside Postgres' purview.  It's barely
possible that the Postgres JDBC driver has something to do with that,
but it sounds more like the XA manager's turf.

            regards, tom lane


Re: Incorrect response code after XA recovery

From
Tom Lane
Date:
Tom Jenkinson <tom.jenkinson@redhat.com> writes:
> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>> No idea, but in any case that's outside Postgres' purview.  It's barely
>> possible that the Postgres JDBC driver has something to do with that,
>> but it sounds more like the XA manager's turf.

> I am not sure what you mean here as I don't know the structure of how
> the PostGres project is packaged, all I know is that the PostGres JDBC
> driver component appears to be returning an XAException with the
> message "Error rolling back prepared transaction" and an errorCode of
> XAException.XAER_RMERR rather than XAER_NOTA.

> Is there a different component within your bug tracking system  we
> should be using to raise this against the JDBC driver instead?

The folk who would fix anything in the JDBC driver tend to read
pgsql-jdbc sooner than pgsql-bugs, so cc'ing there for comment.

            regards, tom lane


Re: Incorrect response code after XA recovery

From
Tom Jenkinson
Date:
Hi Tom,

On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>> A little bit of information in the linked bugzilla report is that the
>> exception being returned has an XA error code of XAER_RMERR "An error
>> occurred in rolling back the transaction branch. The resource manager is
>> free to forget about the branch when returning this error so long as all
>> accessing threads of control have been notified of the branch’s state."
>
>> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
>> is not known by the resource manager" be more accurate?
>
> No idea, but in any case that's outside Postgres' purview.  It's barely
> possible that the Postgres JDBC driver has something to do with that,
> but it sounds more like the XA manager's turf.

I am not sure what you mean here as I don't know the structure of how
the PostGres project is packaged, all I know is that the PostGres JDBC
driver component appears to be returning an XAException with the
message "Error rolling back prepared transaction" and an errorCode of
XAException.XAER_RMERR rather than XAER_NOTA.

Is there a different component within your bug tracking system  we
should be using to raise this against the JDBC driver instead?

Thanks,
Tom


Re: Incorrect response code after XA recovery

From
Tom Jenkinson
Date:
Hi Tom,

A little bit of information in the linked bugzilla report is that the
exception being returned has an XA error code of XAER_RMERR "An error
occurred in rolling back the transaction branch. The resource manager is
free to forget about the branch when returning this error so long as all
accessing threads of control have been notified of the branch’s state."

That does not sound right to me, wouldn't XAER_NOTA "The specified XID
is not known by the resource manager" be more accurate?

Thanks,
Tom

On 29/07/13 14:50, Tom Lane wrote:
> Ondrej Chaloupka <ochaloup@redhat.com> writes:
>> The OTS specification requires both bottom-up and top-down recovery to be triggered by the recovering resource. This
>> causes two rollback calls to be made against the DB. The DB receives the first rollback call and does the rollback. The
>> second time, it returns the exceptional code. As the DB has already rolled back the transaction and forgotten about it,
>> the DB returns an error that no such transaction exists. But this seems to be against the OTS specification.
> It's not likely that we would consider changing the behavior of ROLLBACK
> PREPARED.  The alternatives we would have are (1) silently accept a
> ROLLBACK against a non-existent transaction ID, or (2) remember every
> rolled-back ID forever.  Neither seems sane in the least.
>
> It seems to me that this is something client-side code, probably the XA
> manager, would need to deal with.  The XA manager already has to track
> uncommitted 2-phase transactions, and would furthermore have the best
> idea of when it would be safe to forget about a rolled-back ID.
>
> Right offhand it appears to me that that Red Hat bug is filed against
> the correct component, and you need to push them harder to fix their
> bug/shortcoming rather than claim it's our problem.
>
>             regards, tom lane



Re: [GENERAL] Incorrect response code after XA recovery

From
Alban Hertroys
Date:
On Jul 29, 2013, at 16:57, Tom Jenkinson <tom.jenkinson@redhat.com> wrote:

> Hi Tom,
>
> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>>> A little bit of information in the linked bugzilla report is that the
>>> exception being returned has an XA error code of XAER_RMERR "An error
>>> occurred in rolling back the transaction branch. The resource manager is
>>> free to forget about the branch when returning this error so long as all
>>> accessing threads of control have been notified of the branch’s state."
>>
>>> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
>>> is not known by the resource manager" be more accurate?
>>
>> No idea, but in any case that's outside Postgres' purview.  It's barely
>> possible that the Postgres JDBC driver has something to do with that,
>> but it sounds more like the XA manager's turf.
>
> I am not sure what you mean here as I don't know the structure of how the PostGres project is packaged, all I know is
> that the PostGres JDBC driver component appears to be returning an XAException with the message "Error rolling back
> prepared transaction" and an errorCode of XAException.XAER_RMERR rather than XAER_NOTA.


Looking at the error codes, it appears that it isn't even the Postgres JDBC driver returning that error, but the XA
manager you're using, which is not a part of Postgres (nor is the JDBC driver, for that matter - that's a separate
project).

The errors you're quoting are from the XA manager and are about XA manager stuff. For all we know, the actual error
appears to be occurring in the XA manager and not in Postgres. It's possible that the XA manager error is a result of an
error that Postgres returned, but since the XA manager prints its own error message and not the original one, you'll
need to uncover those error messages before we can help you with them.

For all we know at this point, the error is with your XA manager, not with Postgres.

If you want to be sure, grep the source of the JDBC driver for those error codes; I doubt you'll find them in there.
Google was kind enough to point me here: http://jdbc.postgresql.org/development/git.html

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [GENERAL] Incorrect response code after XA recovery

From
Tom Jenkinson
Date:
Hi Alban,

I stripped down the code to a raw XA example using the latest postgres
driver available in maven central. It demonstrates that regardless of
what the codebase might suggest, it is certainly the case that postgres
is returning XAER_RMERR in the scenario where the resource manager no
longer knows about the Xid.

The code is available here:
https://github.com/tomjenkinson/xa-recovery/commit/944d45e86a91eacb9489843acfbf6a80f1b4b820

I hope that this helps,
Tom

On Mon 29 Jul 2013 18:52:31 BST, Alban Hertroys wrote:
> On Jul 29, 2013, at 16:57, Tom Jenkinson <tom.jenkinson@redhat.com> wrote:
>
>> Hi Tom,
>>
>> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>>> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>>>> A little bit of information in the linked bugzilla report is that the
>>>> exception being returned has an XA error code of XAER_RMERR "An error
>>>> occurred in rolling back the transaction branch. The resource manager is
>>>> free to forget about the branch when returning this error so long as all
>>>> accessing threads of control have been notified of the branch’s state."
>>>
>>>> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
>>>> is not known by the resource manager" be more accurate?
>>>
>>> No idea, but in any case that's outside Postgres' purview.  It's barely
>>> possible that the Postgres JDBC driver has something to do with that,
>>> but it sounds more like the XA manager's turf.
>>
>> I am not sure what you mean here as I don't know the structure of how the PostGres project is packaged, all I know
>> is that the PostGres JDBC driver component appears to be returning an XAException with the message "Error rolling back
>> prepared transaction" and an errorCode of XAException.XAER_RMERR rather than XAER_NOTA.
>
>
> Looking at the error codes, it appears that it isn't even the Postgres JDBC driver returning that error, but the XA
> manager you're using, which is not a part of Postgres (nor is the JDBC driver, for that matter - that's a separate
> project).
>
> The errors you're quoting are from the XA manager and are about XA manager stuff. For all we know, the actual error
> appears to be occurring in the XA manager and not in Postgres. It's possible that the XA manager error is a result of an
> error that Postgres returned, but since the XA manager prints its own error message and not the original one, you'll
> need to uncover those error messages before we can help you with them.
>
> For all we know at this point, the error is with your XA manager, not with Postgres.
>
> If you want to be sure, grep the source of the JDBC driver for those error codes; I doubt you'll find them in there.
> Google was kind enough to point me here: http://jdbc.postgresql.org/development/git.html
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>


Re: [GENERAL] Incorrect response code after XA recovery

From
Alvaro Herrera
Date:
Tom Jenkinson escribió:
> Hi Alban,
>
> I stripped down the code to a raw XA example using the latest
> postgres driver available in maven central. It demonstrates that
> regardless of what the codebase might suggest, it is certainly the
> case that postgres is returning XAER_RMERR in the scenario where the
> resource manager no longer knows about the Xid.
>
> The code is available here:
> https://github.com/tomjenkinson/xa-recovery/commit/944d45e86a91eacb9489843acfbf6a80f1b4b820

Those error codes do certainly appear in the PGXAConnection.java source
in the pgjdbc git.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: [JDBC] Incorrect response code after XA recovery

From
Jeremy Whiting
Date:
Hi Tom,
 The driver currently doesn't report back to the calling client (tm)
XAException.XAER_NOTA code as Ondrej and Tom Jenkinson have identified.
Instead it returns XAException.XAER_RMERR. See line 416

https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/xa/PGXAConnection.java#416

 which imo is used for general errors in the resource manager.

 I've written a test case that can be pulled into the pgjdbc testsuite
that will make verifying this error easier. It is based on the example
code Tom Jenkinson provided. A pull request has been created which can
be found here...

https://github.com/pgjdbc/pgjdbc/pull/73

 I am currently coding up a change to the driver in anticipation there
is agreement in the pgjdbc group to change the rollback method. Another
pull request will be created for that. Let's see what discussion and
decision is made by the more active members in pgjdbc.

Regards,
Jeremy

On 29/07/13 16:11, Tom Lane wrote:
> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>>> No idea, but in any case that's outside Postgres' purview.  It's barely
>>> possible that the Postgres JDBC driver has something to do with that,
>>> but it sounds more like the XA manager's turf.
>> I am not sure what you mean here as I don't know the structure of how
>> the PostGres project is packaged, all I know is that the PostGres JDBC
>> driver component appears to be returning an XAException with the
>> message "Error rolling back prepared transaction" and an errorCode of
>> XAException.XAER_RMERR rather than XAER_NOTA.
>> Is there a different component within your bug tracking system  we
>> should be using to raise this against the JDBC driver instead?
> The folk who would fix anything in the JDBC driver tend to read
> pgsql-jdbc sooner than pgsql-bugs, so cc'ing there for comment.
>
>             regards, tom lane
>
>




Re: [JDBC] Incorrect response code after XA recovery

From
Jeremy Whiting
Date:
Hello Tom,
  A quick update on progress. A second PR was created to provide a patch.

https://github.com/pgjdbc/pgjdbc/pull/76

Regards,
Jeremy

On 31/07/13 12:36, Jeremy Whiting wrote:
> Hi Tom,
>   The driver currently doesn't report back to the calling client (tm)
> XAException.XAER_NOTA code as Ondrej and Tom Jenkinson have identified.
> Instead it returns XAException.XAER_RMERR. See line 416
>
> https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/xa/PGXAConnection.java#416
>
>   which imo is used for general errors in the resource manager.
>
>   I've written a test case that can be pulled into the pgjdbc testsuite
> that will make verifying this error easier. It is based on the example
> code Tom Jenkinson provided. A pull request has been created which can
> be found here...
>
> https://github.com/pgjdbc/pgjdbc/pull/73
>
>   I am currently coding up a change to the driver in anticipation there
> is agreement in the pgjdbc group to change the rollback method. Another
> pull request will be created for that. Let's see what discussion and
> decision is made by the more active members in pgjdbc.
>
> Regards,
> Jeremy
>
> On 29/07/13 16:11, Tom Lane wrote:
>> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>>> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>>>> No idea, but in any case that's outside Postgres' purview.  It's barely
>>>> possible that the Postgres JDBC driver has something to do with that,
>>>> but it sounds more like the XA manager's turf.
>>> I am not sure what you mean here as I don't know the structure of how
>>> the PostGres project is packaged, all I know is that the PostGres JDBC
>>> driver component appears to be returning an XAException with the
>>> message "Error rolling back prepared transaction" and an errorCode of
>>> XAException.XAER_RMERR rather than XAER_NOTA.
>>> Is there a different component within your bug tracking system  we
>>> should be using to raise this against the JDBC driver instead?
>> The folk who would fix anything in the JDBC driver tend to read
>> pgsql-jdbc sooner than pgsql-bugs, so cc'ing there for comment.
>>
>>             regards, tom lane
>>
>>
>


--
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat


Re: [JDBC] Incorrect response code after XA recovery

From
Heikki Linnakangas
Date:
On 05.08.2013 17:58, Jeremy Whiting wrote:
> Hello Tom,
> A quick update on progress. A second PR was created to provide a patch.
>
> https://github.com/pgjdbc/pgjdbc/pull/76

Thanks. Looks good to me.

I wish the backend would throw a more specific error code for this,
42704 is used for many other errors as well. But at COMMIT/ROLLBACK
PREPARED, it's probably safe to assume that it means that the
transaction does not exist.

- Heikki
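
The mapping described above could be sketched roughly as follows. This is not the code from the pgjdbc pull
request; it is a minimal illustration that assumes the driver catches a java.sql.SQLException from
COMMIT/ROLLBACK PREPARED and inspects its SQLSTATE. The class and method names (RollbackErrorMapping,
xaErrorCodeFor, toXAException) are hypothetical.

```java
import java.sql.SQLException;
import javax.transaction.xa.XAException;

// Hypothetical illustration; not the actual PGXAConnection code.
final class RollbackErrorMapping {

    // SQLSTATE 42704 (undefined_object), the code mentioned above.
    static final String UNDEFINED_OBJECT = "42704";

    // Pick the XA error code for a failed COMMIT/ROLLBACK PREPARED.
    static int xaErrorCodeFor(SQLException e) {
        if (UNDEFINED_OBJECT.equals(e.getSQLState())) {
            // Assumption: at this point 42704 can only mean the prepared
            // transaction does not exist, i.e. the XID is unknown.
            return XAException.XAER_NOTA;
        }
        // Anything else stays a general resource manager error.
        return XAException.XAER_RMERR;
    }

    // Wrap the SQL error, keeping it as the cause for diagnostics.
    static XAException toXAException(SQLException e) {
        XAException xa = new XAException(xaErrorCodeFor(e));
        xa.initCause(e);
        return xa;
    }
}
```

Because 42704 is shared with other "undefined object" errors, a mapping like this only makes sense in the narrow
context of COMMIT/ROLLBACK PREPARED, as noted above.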