Thread: AtAbort_Portsl problem

AtAbort_Portsl problem

From
Tatsuo Ishii
Date:
Hi,

While ago, I reported a problem regarding exec_execute_message crash
in transaction abort state if sync message issued right after parse,
bind and execute message (which is normal if used with pgpool). After
further investigation, I concluded that there's a serious problem with
unnamed portal handling.

The previous unnamed portal remains until next exec_bind_message
call(it calls CreatePortal which drops unnamed portal to replace it
with new one). If the next call to exec_parse_message fails by some
reasons (for example, parse error), the previous unnamed portal
remains with some data trashed by AtAbort_Portals. The reason why I
see exec_execute_message crash is, it looks into the broken portal,
especially portal->stmts:
is_xact_command = IsTransactionStmtList(portal->stmts);

unfortunately which was already freed by PortalReleaseCachedPlan call
in AtAbort_Portals because portal->cplan and portal->stmts belong to
the same memory context in some cases.

The reason why we don't see the problem until now is probably a) just
lucky b) libpq/drivers do not issue execute if parse fails. But bug is
bug anyway, I believe we need to fix this.

One of fixes I can think of is, set NULL to portal->stmts in
AtAbort_Portsl(see attached patches against CVS Head). This will not
result in memory leak since either portal->stmts belongs to the same
memory context of portal->cplan (in this case the memory context will
be dropped by PortalReleaseCachedPlan) or belongs to the same one of
portal heap meory, which will be dropped at the same time when the
portal itself dropped.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Index: src/backend/utils/mmgr/portalmem.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/mmgr/portalmem.c,v
retrieving revision 1.115
diff -c -r1.115 portalmem.c
*** src/backend/utils/mmgr/portalmem.c    2 Jan 2010 16:57:58 -0000    1.115
--- src/backend/utils/mmgr/portalmem.c    17 Jan 2010 13:19:07 -0000
***************
*** 684,689 ****
--- 684,700 ----             PortalReleaseCachedPlan(portal);          /*
+          * Remove the reference to PlannedStmts and/or utility
+          * statements.  This is neccessary since
+          * PortalReleaseCachedPlan might release memory context which
+          * stmts belogs to and subsequent call to exec_execute_message
+          * looks into stmts to check if it's a transaction
+          * command(which is allow to call even if in an aborted
+          * transaction state), which might refer to freed memory.
+          */
+         portal->stmts = NULL;
+ 
+         /*          * Any resources belonging to the portal will be released in the          * upcoming
transaction-widecleanup; they will be gone before we run          * PortalDrop. 

Re: AtAbort_Portsl problem

From
Tom Lane
Date:
Tatsuo Ishii <ishii@postgresql.org> writes:
> While ago, I reported a problem regarding exec_execute_message crash
> in transaction abort state if sync message issued right after parse,
> bind and execute message (which is normal if used with pgpool). After
> further investigation, I concluded that there's a serious problem with
> unnamed portal handling.

I was never able to get your test case to work (or, rather, crash).
Does this patch address the original issue, or is this something
different?
        regards, tom lane


Re: AtAbort_Portsl problem

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <ishii@postgresql.org> writes:
> > While ago, I reported a problem regarding exec_execute_message crash
> > in transaction abort state if sync message issued right after parse,
> > bind and execute message (which is normal if used with pgpool). After
> > further investigation, I concluded that there's a serious problem with
> > unnamed portal handling.
> 
> I was never able to get your test case to work (or, rather, crash).
> Does this patch address the original issue, or is this something
> different?

The patch addresses the original issue.  The reason why you didn't see
crash was just you were lucky, I believe. I'm sure that your
exec_execute_message looks into already-freed-memory.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: AtAbort_Portsl problem

From
Tom Lane
Date:
Tatsuo Ishii <ishii@postgresql.org> writes:
> The patch addresses the original issue.  The reason why you didn't see
> crash was just you were lucky, I believe. I'm sure that your
> exec_execute_message looks into already-freed-memory.

[ shrug... ]  If it did, it would have crashed, because I invariably
build with --cassert-enabled -> CLOBBER_FREED_MEMORY.

I do now see the risk path you are talking about, I think:

1. bind to some fully_planned prepared statement, causing the Portal  to link directly to a CachedPlan's statement
list;
2. invalidate the prepared statement, so that the CachedPlanSource  drops its reference to the CachedPlan;
3. AbortTransaction, so that AtAbort_Portals runs  PortalReleaseCachedPlan.  This makes the CachedPlan's reference
countgo to zero, so it drops its memory.  Now the Portal's  stmts pointer is dangling;
 
4. try to execute the Portal.

I do not believe it's possible to make this happen through libpq
alone, at least not without a great deal more hacking than you
did on it.  libpq doesn't ever put very much in between binding
a portal and executing it.  But a non-libpq-based client could
easily do it if it intermixed binding and executing a few different
portals.  Also step 2 might happen unluckily through a SI reset
triggered by some concurrent session.

I don't like the proposed patch though since it closes only one of
the paths by which a Portal might drop its reference to a CachedPlan.
I think the right place to clear portal->stmts is in
PortalReleaseCachedPlan.
        regards, tom lane