I experimented with the attached patch. It seems to work to stop the crash Michael exhibited (I've not tried to duplicate Jim's original example, though). However, it also causes a regression test failure, because transactions.sql does this:
Neat patch, it would have taken me longer to figure out that... I have tried with Jim's test case and the patch is working.
which is exactly the case we're trying to reject now. So that fills me with fear that this approach would break existing applications. On the other hand, I do not really see a good alternative :-(.
This behavior is present since 2004 with fe548629, so that's a real concern to me, especially if there are drivers around relying on this behavior. There are for example some code patterns around Postgres ODBC that could be impacted, not sure which ones but I guess that some internal error handling is not going to like that.
I thought about trying to detect whether the Portal actually had any references to "new in subtransaction" objects to decide whether to kill it or not, but that seems awfully fragile.
Ideas?
Yes. This diff on top of your patch: @@ -922,8 +922,7 @@ AtSubAbort_Portals(SubTransactionId mySubid, * must be forced into FAILED state, for the same reasons * discussed below. */ - if (portal->status == PORTAL_READY || - portal->status == PORTAL_ACTIVE) + if (portal->status == PORTAL_ACTIVE) MarkPortalFailed(portal);
This way only the active portals are marked as failed. The regression tests that are failing with your patch applied visibly do not activate the portal they use, just marking it as ready to be used. This seems to be the safest approach to me on stable branches, as well as on master, this way we are sure that resources on the failed subxacts are cleaned up correctly, and that existing applications are not impacted.
I am attaching a new patch with what I changed and a test case based on my previous one.