Home > mailing lists

Re: Assertion failure in AtCleanup_Portals - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Assertion failure in AtCleanup_Portals
Date	February 15, 2012 15:15:40
Msg-id	555.1329333320@sss.pgh.pa.us Whole thread Raw
In response to	Re: Assertion failure in AtCleanup_Portals (Pavan Deolasee <pavan.deolasee@gmail.com>)
List	pgsql-hackers

Tree view

Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> On Tue, Feb 7, 2012 at 3:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm.  It works fine if you issue an actual ROLLBACK command there,
>> so my gut reaction is that AbortOutOfAnyTransaction isn't sufficiently
>> duplicating the full-fledged ROLLBACK code path.  No time to dig further
>> right now though.

> It works OK for a ROLLBACK command because we create a new unnamed
> portal for the ROLLBACK command, silently dropping the old one if it
> already exists. Since the ROLLBACK command then runs successfully, we
> don't see the same assertion. Would it be safe to drop FAILED unnamed
> portals during AbortOutAnyTransaction ? May be it is if we can do that
> before creating a new portal for ROLLBACK command itself.

I poked at this some more, and noticed another dangerous-seeming issue:
at the time PortalCleanup is called, if it does get called, the portal's
stmts and sourceText are already pointing at garbage.  The reason is
that exec_simple_query blithely does this:
       /*        * We don't have to copy anything into the portal, because everything        * we are passing here is
inMessageContext, which will outlive the        * portal anyway.        */       PortalDefineQuery(portal,
          NULL,                         query_string,                         commandTag,
plantree_list,                        NULL);

That comment is a true statement for the normal successful path of
control, wherein we reach the PortalDrop a few dozen lines below.
But it's not true if the command suffers an error.  We will mark
the portal PORTAL_FAILED right away (in one of various PG_CATCH
blocks) but we don't clean it up until another command arrives,
so the current contents of MessageContext are long gone when portal
cleanup happens.

It seems to me that the most reliable fix for these issues is to
institute the same policy for transitioning a portal to FAILED state
as we already have for transitions to DONE state, ie, we should run the
cleanup hook immediately.  IOW it would be a good idea to create a
function MarkPortalFailed that is just like MarkPortalDone except for
the specific portal state transition, and use that instead of merely
setting portal->status = PORTAL_FAILED in error cleanup situations.
This would ensure that the cleanup hook gets to run before we possibly
cut the portal's memory out from under it.  It's more than is probably
needed to resolve the specific crash you're reporting, but it seems
likely to forestall future issues.

I haven't actually tested such a fix yet, but will go do that now.
        regards, tom lane

pgsql-hackers by date:

From: Robert Haas
Date: 15 February 2012, 15:06:57
Subject: Re: [trivial patch] typo in doc/src/sgml/sepgsql.sgml

From: Bruce Momjian
Date: 15 February 2012, 15:22:35
Subject: Re: pg_upgrade message

Re: Assertion failure in AtCleanup_Portals - Mailing list pgsql-hackers

Previous

Next