Re: BUG #15703: Segfault in cancelled CALL-Statements - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15703: Segfault in cancelled CALL-Statements
Date
Msg-id 8678.1553097373@sss.pgh.pa.us
Whole thread Raw
In response to BUG #15703: Segfault in cancelled CALL-Statements  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15703: Segfault in cancelled CALL-Statements  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #15703: Segfault in cancelled CALL-Statements  ("Julian Schauder" <julian.schauder@gmx.de>)
Re: BUG #15703: Segfault in cancelled CALL-Statements  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> there seems to be a SEGFAULT issue with CALL-Procedures once they get
> SIGINT'ed.

What I'm seeing in an assert-enabled build is an assertion failure:

TRAP: FailedAssertion("!(portal->resowner == ((void *)0))", File: "portalmem.c"\
, Line: 874)

#0  0x00007fbe9cd6beab in raise () from /lib64/libc.so.6
#1  0x00007fbe9cd565b9 in abort () from /lib64/libc.so.6
#2  0x000000000089d9cf in ExceptionalCondition (conditionName=conditionName@entry=0xac34b0 "!(portal->resowner ==
((void*)0))",  
    errorType=errorType@entry=0x8ec7c9 "FailedAssertion", fileName=fileName@entry=0xac3163 "portalmem.c",
lineNumber=lineNumber@entry=874)
    at assert.c:54
#3  0x00000000008ca924 in AtCleanup_Portals () at portalmem.c:874
#4  0x00000000005145c7 in CleanupTransaction () at xact.c:2676
#5  0x00000000005186ca in AbortCurrentTransaction () at xact.c:3133
#6  0x0000000000789004 in PostgresMain (argc=1, argv=argv@entry=0x2bc3ad0, dbname=<optimized out>, username=0x2b96518
"tgl")at postgres.c:4033 
#7  0x000000000070d91f in BackendRun (port=0x2bbce20, port=0x2bbce20) at postmaster.c:4399
#8  BackendStartup (port=0x2bbce20) at postmaster.c:4090
#9  ServerLoop () at postmaster.c:1703
#10 0x000000000070e81e in PostmasterMain (argc=argc@entry=2, argv=argv@entry=0x2b944c0) at postmaster.c:1376
#11 0x000000000047acae in main (argc=2, argv=0x2b944c0) at main.c:228

(gdb) f 3
#3  0x00000000008ca924 in AtCleanup_Portals () at portalmem.c:874
874                             Assert(portal->resowner == NULL);
(gdb) p *portal
$1 = {name = 0x2bff4b8 "<unnamed portal 1>", prepStmtName = 0x0, portalContext = 0x2c728a0, resowner = 0x2c141d0,
cleanup= 0x0, createSubid = 1,  
  activeSubid = 1, sourceText = 0x2c729b8 "SELECT GENERATE_SERIES(1,100000)", commandTag = 0xa10eb3 "SELECT", stmts =
0x2c74e30, 
  cplan = 0x2c75520, portalParams = 0x0, queryEnv = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4, run_once =
false, 
  status = PORTAL_FAILED, portalPinned = true, autoHeld = true, queryDesc = 0x0, tupDesc = 0x2c84118, formats = 0x0,
holdStore= 0x2c83ee8,  
  holdContext = 0x2c83dd0, holdSnapshot = 0x0, atStart = false, atEnd = false, portalPos = 10, creation_time =
606411039547087,visible = true} 


Poking around a bit, it seems like this is a thinko in HoldPinnedPortals:
it's not considering the possibility of a failure during HoldPortal().
In your test the failure is triggered by query cancel, but an ordinary
execution error while finishing out a cursor's calculations would be
a problem too.

I think it should not mark the portal as autoHeld until it's been
successfully held.  The attached patch seems to make the problem
go away for me --- can you confirm it fixes your original issue?

I find HoldPinnedPortals suspicious in another way: is it really
OK for it to mark *all* pinned portals as auto-held?  What if we're
in a nest of procedures run by different PLs (of which only the
innermost will have any idea we're committing)?  Will the right
things happen if we auto-hold a portal or two and then get a failure
on another one?

            regards, tom lane

diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index a92b4541bd..697c980282 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1262,8 +1262,8 @@ HoldPinnedPortals(void)
                         (errcode(ERRCODE_INVALID_TRANSACTION_TERMINATION),
                          errmsg("cannot perform transaction commands inside a cursor loop that is not read-only")));

-            portal->autoHeld = true;
             HoldPortal(portal);
+            portal->autoHeld = true;
         }
     }
 }

pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #15668: Server crash in transformPartitionRangeBounds
Next
From: Tom Lane
Date:
Subject: Re: BUG #15703: Segfault in cancelled CALL-Statements