Thread: Incorrect response code after XA recovery

Incorrect response code after XA recovery

From
Ondrej Chaloupka
Date:
Hi,

I would like to consult with you about a problematic response returned by PostgreSQL after a transaction recovery run by
Narayana (JBossTS).

I work on tests for Narayana and I hit an issue with PostgreSQL. The DB returns the incorrect code XAException.XA_HEURHAZ
when the TM does recovery after a crash of the JBoss EAP app server.
The exception is the following:
Caused by: org.postgresql.util.PSQLException: ERROR: prepared transaction with identifier
"131072_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAAKDE=_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAALQAAAAAAAAAA" does not exist

It's run on PostgreSQL 9.2 but the older versions seem to be affected as well.

The problem occurs when TM runs on JTS transactions.

The idea of the test:
The test enlists two resources in a transaction. Prepare is called on the PostgreSQL resource. The app server
crashes before prepare is called on the second transaction participant. After the app server restarts, the TM tries to
recover the transaction. As the failure occurs during the prepare phase, rollback is expected.

The OTS specification requires both bottom up and top down recovery to be triggered by the recovering resource. This
causes two rollback calls to be made against the DB. The DB receives the first rollback call and performs the rollback.
The second time, it returns the exceptional code: as the DB has already rolled back the transaction and forgotten about
it, it returns an error that no such transaction exists. But this seems to be against the OTS specification.
There are some more details in the following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=988724

Do you have any experience with such behaviour? Can I assume this is a problem in PostgreSQL? Or is there already
a bug for this issue in the Postgres bug tracking system?

Thank you
Ondra
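
To make the sequence described above concrete, here is a minimal sketch of the double rollback. It uses a hypothetical in-memory class, `ForgetfulResourceManager`, in place of a real database; the class, its methods, and the branch name are assumptions made for illustration and are not PostgreSQL or pgjdbc code. The sketch reports the unknown branch as XAER_NOTA, which is only one of the codes discussed in this thread.

```java
import java.util.HashSet;
import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical in-memory stand-in for a resource manager. It is not
// PostgreSQL or pgjdbc code; it only mimics "forget the branch on rollback".
class ForgetfulResourceManager {
    private final Set<String> prepared = new HashSet<>();

    // Phase one of two-phase commit: remember the branch as prepared.
    void prepare(String gid) {
        prepared.add(gid);
    }

    // Roll back a prepared branch and forget it. An unknown gid is reported
    // as XAER_NOTA here; which code a real driver returns is the open question.
    void rollback(String gid) throws XAException {
        if (!prepared.remove(gid)) {
            throw new XAException(XAException.XAER_NOTA);
        }
    }

    // Replays the recovery scenario and returns the outcome of the second
    // rollback: XAResource.XA_OK on success, otherwise the XA error code.
    static int secondRollbackOutcome() {
        ForgetfulResourceManager rm = new ForgetfulResourceManager();
        rm.prepare("branch-1");
        try {
            rm.rollback("branch-1"); // first recovery pass rolls back and forgets
        } catch (XAException e) {
            throw new IllegalStateException("first rollback should succeed", e);
        }
        try {
            rm.rollback("branch-1"); // second pass finds nothing to roll back
            return XAResource.XA_OK;
        } catch (XAException e) {
            return e.errorCode;
        }
    }

    public static void main(String[] args) {
        boolean unknown = secondRollbackOutcome() == XAException.XAER_NOTA;
        System.out.println("second rollback reported unknown branch: " + unknown);
    }
}
```

The point of the sketch is only that a resource manager which forgets a branch on rollback cannot tell a repeated rollback from a rollback of an ID it never saw.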


Re: [BUGS] Incorrect response code after XA recovery

From
Tom Lane
Date:
Ondrej Chaloupka <ochaloup@redhat.com> writes:
> The OTS specification requires both bottom up and top down recovery to be triggered by the recovering resource. This
> causes that two rollback calls are done against the DB. DB receives rollback call and does the rollback. Then for the
> second time it returns the exceptional code. As the DB already rollbacked the transaction and forgot about it the DB
> returns error that no such transaction exists. But this seems to be against OTS specification.

It's not likely that we would consider changing the behavior of ROLLBACK
PREPARED.  The alternatives we would have are (1) silently accept a
ROLLBACK against a non-existent transaction ID, or (2) remember every
rolled-back ID forever.  Neither seems sane in the least.

It seems to me that this is something client-side code, probably the XA
manager, would need to deal with.  The XA manager already has to track
uncommitted 2-phase transactions, and would furthermore have the best
idea of when it would be safe to forget about a rolled-back ID.

Right offhand it appears to me that that Red Hat bug is filed against
the correct component, and you need to push them harder to fix their
bug/shortcoming rather than claim it's our problem.

            regards, tom lane
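
One way the client side could deal with a repeated rollback is sketched below. This is an assumption about a possible approach, not Narayana's actual recovery code: the class `TolerantRollback` and the method `rollbackIdempotently` are hypothetical names, and the sketch presumes the resource reports an unknown branch as XAER_NOTA. Whether treating XAER_NOTA as "already rolled back" is safe depends on what the transaction manager knows about the branch.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical helper for a transaction manager's recovery pass.
class TolerantRollback {

    // Rolls back the branch and returns true once the branch is known to be
    // gone, either because this call rolled it back or because the resource
    // no longer knows the Xid. Any other XA error is passed on to the caller.
    static boolean rollbackIdempotently(XAResource resource, Xid xid) throws XAException {
        try {
            resource.rollback(xid);
            return true;
        } catch (XAException e) {
            if (e.errorCode == XAException.XAER_NOTA) {
                // Assumption: an unknown Xid during recovery means an earlier
                // pass already rolled this branch back.
                return true;
            }
            throw e;
        }
    }
}
```

A recovery module would call `rollbackIdempotently` from both the bottom up and the top down pass; the second call then completes quietly instead of surfacing an error.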


Re: [BUGS] Incorrect response code after XA recovery

From
Tom Lane
Date:
Tom Jenkinson <tom.jenkinson@redhat.com> writes:
> A little bit of information in the linked bugzilla report is that the
> exception being returned has an XA error code of XAER_RMERR "An error
> occurred in rolling back the transaction branch. The resource manager is
> free to forget about the branch when returning this error so long as all
> accessing threads of control have been notified of the branch's state."

> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
> is not known by the resource manager" be more accurate?

No idea, but in any case that's outside Postgres' purview.  It's barely
possible that the Postgres JDBC driver has something to do with that,
but it sounds more like the XA manager's turf.

            regards, tom lane


Re: [BUGS] Incorrect response code after XA recovery

From
Tom Jenkinson
Date:
Hi Tom,

A little bit of information in the linked bugzilla report is that the
exception being returned has an XA error code of XAER_RMERR "An error
occurred in rolling back the transaction branch. The resource manager is
free to forget about the branch when returning this error so long as all
accessing threads of control have been notified of the branch’s state."

That does not sound right to me, wouldn't XAER_NOTA "The specified XID
is not known by the resource manager" be more accurate?

Thanks,
Tom

On 29/07/13 14:50, Tom Lane wrote:
> Ondrej Chaloupka <ochaloup@redhat.com> writes:
>> The OTS specification requires both bottom up and top down recovery to be triggered by the recovering resource. This
>> causes that two rollback calls are done against the DB. DB receives rollback call and does the rollback. Then for the
>> second time it returns the exceptional code. As the DB already rollbacked the transaction and forgot about it the DB
>> returns error that no such transaction exists. But this seems to be against OTS specification.
> It's not likely that we would consider changing the behavior of ROLLBACK
> PREPARED.  The alternatives we would have are (1) silently accept a
> ROLLBACK against a non-existent transaction ID, or (2) remember every
> rolled-back ID forever.  Neither seems sane in the least.
>
> It seems to me that this is something client-side code, probably the XA
> manager, would need to deal with.  The XA manager already has to track
> uncommitted 2-phase transactions, and would furthermore have the best
> idea of when it would be safe to forget about a rolled-back ID.
>
> Right offhand it appears to me that that Red Hat bug is filed against
> the correct component, and you need to push them harder to fix their
> bug/shortcoming rather than claim it's our problem.
>
>             regards, tom lane



Re: [BUGS] Incorrect response code after XA recovery

From
Tom Jenkinson
Date:
Hi Tom,

On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>> A little bit of information in the linked bugzilla report is that the
>> exception being returned has an XA error code of XAER_RMERR "An error
>> occurred in rolling back the transaction branch. The resource manager is
>> free to forget about the branch when returning this error so long as all
>> accessing threads of control have been notified of the branch’s state."
>
>> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
>> is not known by the resource manager" be more accurate?
>
> No idea, but in any case that's outside Postgres' purview.  It's barely
> possible that the Postgres JDBC driver has something to do with that,
> but it sounds more like the XA manager's turf.

I am not sure what you mean here as I don't know the structure of how
the PostGres project is packaged, all I know is that the PostGres JDBC
driver component appears to be returning an XAException with the
message "Error rolling back prepared transaction" and an errorCode of
XAException.XAER_RMERR rather than XAER_NOTA.

Is there a different component within your bug tracking system  we
should be using to raise this against the JDBC driver instead?

Thanks,
Tom
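
The distinction being asked for can be sketched as a mapping from the database error to an XA error code. This is a hypothetical illustration, not the pgjdbc source: the class `RollbackErrorMapping` and the method `toXAException` are invented names, and the sketch assumes the server reports a missing prepared transaction with SQLSTATE 42704 (undefined_object).

```java
import java.sql.SQLException;
import javax.transaction.xa.XAException;

// Hypothetical mapping from a failed ROLLBACK PREPARED to an XA error code.
class RollbackErrorMapping {

    // Assumption: SQLSTATE used when the prepared transaction does not exist.
    static final String UNDEFINED_OBJECT = "42704";

    // Returns XAER_NOTA when the database no longer knows the transaction,
    // and XAER_RMERR for any other failure. The original error is kept as
    // the cause so that the server message is not lost.
    static XAException toXAException(SQLException cause) {
        int code = UNDEFINED_OBJECT.equals(cause.getSQLState())
                ? XAException.XAER_NOTA
                : XAException.XAER_RMERR;
        XAException xa = new XAException(code);
        xa.initCause(cause);
        return xa;
    }

    public static void main(String[] args) {
        SQLException missing =
                new SQLException("prepared transaction does not exist", UNDEFINED_OBJECT);
        System.out.println(toXAException(missing).errorCode == XAException.XAER_NOTA);
    }
}
```

Keeping the SQLException as the cause also lets the caller see the original server message rather than only the XA-level text.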


Re: [BUGS] Incorrect response code after XA recovery

From
Alban Hertroys
Date:
On Jul 29, 2013, at 16:57, Tom Jenkinson <tom.jenkinson@redhat.com> wrote:

> Hi Tom,
>
> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>>> A little bit of information in the linked bugzilla report is that the
>>> exception being returned has an XA error code of XAER_RMERR "An error
>>> occurred in rolling back the transaction branch. The resource manager is
>>> free to forget about the branch when returning this error so long as all
>>> accessing threads of control have been notified of the branch’s state."
>>
>>> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
>>> is not known by the resource manager" be more accurate?
>>
>> No idea, but in any case that's outside Postgres' purview.  It's barely
>> possible that the Postgres JDBC driver has something to do with that,
>> but it sounds more like the XA manager's turf.
>
> I am not sure what you mean here as I don't know the structure of how the PostGres project is packaged, all I know is
> that the PostGres JDBC driver component appears to be returning an XAException with the message "Error rolling back
> prepared transaction" and an errorCode of XAException.XAER_RMERR rather than XAER_NOTA.


Looking at the error codes, it appears that it isn't even the Postgres JDBC driver returning that error, but the XA
manager you're using, which is not a part of Postgres (nor is the JDBC driver, for that matter - that's a separate
project).

The errors you're quoting are from the XA manager and are about XA manager stuff. For all we know, the actual error
appears to be occurring in the XA manager and not in Postgres. It's possible that the XA manager error is a result of an
error that Postgres returned, but since the XA manager prints its own error message and not the original one, you'll
need to uncover those error messages before we can help you with them.

For all we know at this point, the error is with your XA manager, not with Postgres.

If you want to be sure, grep the source of the JDBC driver for those error codes; I doubt you'll find them in there.
Google was kind enough to point me here: http://jdbc.postgresql.org/development/git.html

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [BUGS] Incorrect response code after XA recovery

From
Tom Jenkinson
Date:
Hi Alban,

I stripped down the code to a raw XA example using the latest postgres
driver available in maven central. It demonstrates that regardless of
what the codebase might suggest, it is certainly the case that postgres
is returning XAER_RMERR in the scenario where the resource manager no
longer knows about the Xid.

The code is available here:
https://github.com/tomjenkinson/xa-recovery/commit/944d45e86a91eacb9489843acfbf6a80f1b4b820

I hope that this helps,
Tom

On Mon 29 Jul 2013 18:52:31 BST, Alban Hertroys wrote:
> On Jul 29, 2013, at 16:57, Tom Jenkinson <tom.jenkinson@redhat.com> wrote:
>
>> Hi Tom,
>>
>> On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:
>>> Tom Jenkinson <tom.jenkinson@redhat.com> writes:
>>>> A little bit of information in the linked bugzilla report is that the
>>>> exception being returned has an XA error code of XAER_RMERR "An error
>>>> occurred in rolling back the transaction branch. The resource manager is
>>>> free to forget about the branch when returning this error so long as all
>>>> accessing threads of control have been notified of the branch’s state."
>>>
>>>> That does not sound right to me, wouldn't XAER_NOTA "The specified XID
>>>> is not known by the resource manager" be more accurate?
>>>
>>> No idea, but in any case that's outside Postgres' purview.  It's barely
>>> possible that the Postgres JDBC driver has something to do with that,
>>> but it sounds more like the XA manager's turf.
>>
>> I am not sure what you mean here as I don't know the structure of how the PostGres project is packaged, all I know
>> is that the PostGres JDBC driver component appears to be returning an XAException with the message "Error rolling back
>> prepared transaction" and an errorCode of XAException.XAER_RMERR rather than XAER_NOTA.
>
>
> Looking at the error codes, it appears that it isn't even the Postgres JDBC driver returning that error, but the XA
> manager you're using, which is not a part of Postgres (nor is the JDBC driver, for that matter - that's a separate
> project).
>
> The errors you're quoting are from the XA manager and are about XA manager stuff. For all we know, the actual error
> appears to be occurring in the XA manager and not in Postgres. It's possible that the XA manager error is a result of an
> error that Postgres returned, but since the XA manager prints its own error message and not the original one, you'll
> need to uncover those error messages before we can help you with them.
>
> For all we know at this point, the error is with your XA manager, not with Postgres.
>
> If you want to be sure, grep the source of the JDBC driver for those error codes; I doubt you'll find them in there.
> Google was kind enough to point me here: http://jdbc.postgresql.org/development/git.html
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>


Re: [BUGS] Incorrect response code after XA recovery

From
Alvaro Herrera
Date:
Tom Jenkinson escribió:
> Hi Alban,
>
> I stripped down the code to a raw XA example using the latest
> postgres driver available in maven central. It demonstrates that
> regardless of what the codebase might suggest, it is certainly the
> case that postgres is returning XAER_RMERR in the scenario where the
> resource manager no longer knows about the Xid.
>
> The code is available here:
> https://github.com/tomjenkinson/xa-recovery/commit/944d45e86a91eacb9489843acfbf6a80f1b4b820

Those error codes do certainly appear in the PGXAConnection.java source
in the pgjdbc git.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services