Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> On Tue, Feb 7, 2012 at 3:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm. It works fine if you issue an actual ROLLBACK command there,
>> so my gut reaction is that AbortOutOfAnyTransaction isn't sufficiently
>> duplicating the full-fledged ROLLBACK code path. No time to dig further
>> right now though.
> It works OK for a ROLLBACK command because we create a new unnamed
> portal for the ROLLBACK command, silently dropping the old one if it
> already exists. Since the ROLLBACK command then runs successfully, we
> don't see the same assertion. Would it be safe to drop FAILED unnamed
> portals during AbortOutAnyTransaction ? May be it is if we can do that
> before creating a new portal for ROLLBACK command itself.
I poked at this some more, and noticed another dangerous-seeming issue:
at the time PortalCleanup is called, if it does get called, the portal's
stmts and sourceText are already pointing at garbage. The reason is
that exec_simple_query blithely does this:
/* * We don't have to copy anything into the portal, because everything * we are passing here is
inMessageContext, which will outlive the * portal anyway. */ PortalDefineQuery(portal,
NULL, query_string, commandTag,
plantree_list, NULL);
That comment is a true statement for the normal successful path of
control, wherein we reach the PortalDrop a few dozen lines below.
But it's not true if the command suffers an error. We will mark
the portal PORTAL_FAILED right away (in one of various PG_CATCH
blocks) but we don't clean it up until another command arrives,
so the current contents of MessageContext are long gone when portal
cleanup happens.
It seems to me that the most reliable fix for these issues is to
institute the same policy for transitioning a portal to FAILED state
as we already have for transitions to DONE state, ie, we should run the
cleanup hook immediately. IOW it would be a good idea to create a
function MarkPortalFailed that is just like MarkPortalDone except for
the specific portal state transition, and use that instead of merely
setting portal->status = PORTAL_FAILED in error cleanup situations.
This would ensure that the cleanup hook gets to run before we possibly
cut the portal's memory out from under it. It's more than is probably
needed to resolve the specific crash you're reporting, but it seems
likely to forestall future issues.
I haven't actually tested such a fix yet, but will go do that now.
regards, tom lane