Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() inTBLOCK_ABORT_END - Mailing list pgsql-hackers

From Gilles Darold
Subject Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() inTBLOCK_ABORT_END
Date
Msg-id 3f50f63e-1ba7-d729-d016-a06afc0d167b@darold.net
Whole thread Raw
In response to [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() inTBLOCK_ABORT_END  (Dave Sharpe <dave.sharpe@lzlabs.com>)
Responses Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() inTBLOCK_ABORT_END  (Dave Sharpe <dave.sharpe@lzlabs.com>)
List pgsql-hackers
Le 25/03/2020 à 03:49, Dave Sharpe a écrit :

Hi pghackers,

This is my first time posting here ...  Gilles Darold and I are developing a new FDW which is based on the contrib/postgres_fdw. The postgres_fdw logic uses a RegisterXactCallback function to send local transactions remote. But I found that a registered XactCallback is not always called for a successful client ROLLBACK statement. So, a successful local ROLLBACK does not get sent remote by FDW, and now the local and remote transaction states are messed up, out of sync. The local database is "eating" the successful rollback.

 

I attach a git format-patch file 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch

The patch fixes the problem and is ready to commit as far as I can tell. It adds some comment lines and three lines of code to CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2) a new call to callXactCallbacks(). If the callback returns successful (which is usually the case), (3) the new code switches back to TBLOCK_ABORT_END, then continues with old code to CleanupTransaction() as before. If any callback does error out, the TBLOCK_ABORT was the correct block state for an error.

 

I have not added a regression test since my scenario involves a C module ... I didn't see such a regression test, but somebody can teach me where to put the C module and Makefile if a regression test should be added. Heads up that the scenario to hit this is a bit complex (to me) ... I attach the following unit test files:

1) eat_rollback.c, _PG_init() installs my_utility_hook() for INFO log for debugging.

RegisterSubXactCallback(mySubtransactionCallback) which injects some logging and an ERROR for savepoints, i.e., negative testing, e.g., like a remote FDW savepoint failed.

And RegisterXactTransaction(myTransactionCallback) with info logging.

2) Makefile, to make the eat_rollback.so

3) eat_rollback2.sql, drives the fail sequence, especially the SAVEPOINT, which errors out, then later a successful ROLLBACK gets incorrectly eaten (no CALLBACK info logging, demonstrates the bug), then another successful ROLLBACK (now there is CALLBACK info logging).

4) eat_rollback2.out, output without the fix, the rollback at line 68 is successful but there is not myTransactionCallback() INFO output

5) eat_rollback2.fixed, output after the fix, the rollback at line 68 is successful, and now there is a myTransactionCallback() INFO log. Success!

 

It first failed for me in v11.1, I suspect it failed since before then too. And it is still failing in current master.

 

Bye the way, we worked around the bug in our FDW by handling sub/xact in the utility hook. A transaction callback is still needed for implicit, internal rollbacks. Another advantage of the workaround is (I think) it reduces the code complexity and improves performance because the entire subxact stack is not unwound to drive each and every subtransaction level to remote. Also sub/transaction statements are sent remote as they arrive (local and remote are more "transactionally" synced, not stacked by FDW for later). And of course, the workaround doesn't hit the above bug, since our utility hook correctly handles the client ROLLBACK. If it makes sense to the community, I could try and patch contrib/postgres_fdw to do what we are doing. But note that I don't need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software defined mainframe with PostgreSQL (of all things).

 

Hope it helps!

 

Dave Sharpe

I don't necessarily agree with everything I say. (MM)

www.lzlabs.com


Hi,


As I'm involved in this thread I have given a review to this bug report and I don't think that there is a problem here but as a disclaimer my knowledge on internal transaction management is probably not enough to have a definitive opinion.


Actually the callback function is called when the error is thrown:

psql:eat_rollback2.sql:20: INFO:  00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <-------------------
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback2.sql:20: ERROR:  XX000: no no no
LOCATION:  mySubtransactionCallback, eat_rollback.c:65


this is probably why the callback is not called on the subsequent ROLLBACK execution because abort processing is already done (src/backend/access/transam/xact.c:3890).


With this simple test and the debug extension:

BEGIN;
DROP INDEX "index2", "index3"; -- will fail indexes doesn't exist
ROLLBACK;


the callback function is also called at error level, not on the ROLLBACK:

---------------------------------------------------------------------------------BEGIN
psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO:  00000: my_utility_hook() utility transaction kind 0 (is not rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
BEGIN
---------------------------------------------------------------------------------DROP INDEX + error
psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO:  00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <---------------
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR:  42704: index "concur_index2" does not exist
LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1209
---------------------------------------------------------------------------------ROLLBACK
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000: my_utility_hook() utility transaction kind 3 (is rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
ROLLBACK


So this is probably the expected behavior , however the proposed patch will throw the callback function two times which might not be what we want.


---------------------------------------------------------------------------------BEGIN
psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO:  00000: my_utility_hook() utility transaction kind 0 (is not rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
BEGIN
---------------------------------------------------------------------------------DROP INDEX + error
psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO:  00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <-------------- first call
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR:  42704: index "concur_index2" does not exist
LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1209
---------------------------------------------------------------------------------ROLLBACK
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000: my_utility_hook() utility transaction kind 3 (is rollback) <-------------- second call
LOCATION:  my_utility_hook, eat_rollback.c:28
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000: myTransactionCallback() XactEvent 2 (is abort) level 1
LOCATION:  myTransactionCallback, eat_rollback.c:52
ROLLBACK


But again I may be wrong.


-- 
Gilles Darold
http://www.darold.net/

pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Columns correlation and adaptive query optimization
Next
From: Stephen Frost
Date:
Subject: Re: backup manifests