Re: BUG #16811: Severe reproducible server backend crash - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16811: Severe reproducible server backend crash
Msg-id 208316.1610040164@sss.pgh.pa.us
In response to Re: BUG #16811: Severe reproducible server backend crash  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
Thomas Munro <thomas.munro@gmail.com> writes:
> Thanks for the report.  I happened to have DBeaver here and could
> reproduce this, and got the following core:

I can reproduce it without anything extra.  What's needed is to run
the problematic statement in extended query mode, which you can
do like this:

$ cat foo.sql
do $$ begin rollback; end $$;

$ pgbench -n -f foo.sql -M prepared
pgbench: error: client 0 aborted in command 0 (SQL) of script 0; perhaps the backend died while processing

That lnext() should certainly not find portal->stmts to be NIL,
seeing that we are inside a loop over that list.  Ergo, something
is clobbering this active portal.  A bit of gdb'ing says the clobber
happens here:

#0  AtAbort_Portals () at portalmem.c:833
    (this appears to be inlined code from PortalReleaseCachedPlan)
#1  0x00000000005a4ce2 in AbortTransaction () at xact.c:2711
#2  0x00000000005a55d5 in AbortCurrentTransaction () at xact.c:3322
#3  0x00000000006d1557 in _SPI_rollback (chain=<optimized out>) at spi.c:326
#4  0x00007feef9e851c5 in exec_stmt_rollback (stmt=0x2babca8,
    estate=0x7fff35e55ee0) at pl_exec.c:4961
#5  exec_stmts (estate=0x7fff35e55ee0, stmts=0x2babd80) at pl_exec.c:2081
#6  0x00007feef9e863cb in exec_stmt_block (estate=0x7fff35e55ee0,
    block=0x2babdd8) at pl_exec.c:1904
#7  0x00007feef9e864bb in exec_toplevel_block (
    estate=estate@entry=0x7fff35e55ee0, block=0x2babdd8) at pl_exec.c:1602
#8  0x00007feef9e86ced in plpgsql_exec_function (func=func@entry=0x2ba7c60,
    fcinfo=fcinfo@entry=0x7fff35e56060,
    simple_eval_estate=simple_eval_estate@entry=0x2bad6b0,
    simple_eval_resowner=simple_eval_resowner@entry=0x2b12e40,
    atomic=<optimized out>) at pl_exec.c:605
#9  0x00007feef9e8fd58 in plpgsql_inline_handler (fcinfo=<optimized out>)
    at pl_handler.c:344
#10 0x000000000091a540 in FunctionCall1Coll (flinfo=0x7fff35e561f0,
    collation=<optimized out>, arg1=<optimized out>) at fmgr.c:1141
#11 0x000000000091aaa9 in OidFunctionCall1Coll (functionId=<optimized out>,
    collation=collation@entry=0, arg1=45120272) at fmgr.c:1419
#12 0x000000000064df7e in ExecuteDoStmt (stmt=stmt@entry=0x2b07ed8,
    atomic=atomic@entry=false) at functioncmds.c:2027
#13 0x000000000080fa14 in standard_ProcessUtility (pstmt=0x2b07e40,
    queryString=0x2b079a0 "do $$ begin rollback; end $$;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0xa90540 <donothingDR>, qc=0x7fff35e56630) at utility.c:696
#14 0x000000000080d044 in PortalRunUtility (portal=0x2b47240, pstmt=0x2b07e40,
    isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>,
    dest=0xa90540 <donothingDR>, qc=0x7fff35e56630) at pquery.c:1159
#15 0x000000000080db24 in PortalRunMulti (portal=portal@entry=0x2b47240,
    isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=0xa90540 <donothingDR>,
    dest@entry=0x2adfa88, altdest=0xa90540 <donothingDR>,
    altdest@entry=0x2adfa88, qc=qc@entry=0x7fff35e56630) at pquery.c:1311
#16 0x000000000080e937 in PortalRun (portal=portal@entry=0x2b47240,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
    run_once=run_once@entry=true, dest=dest@entry=0x2adfa88,
    altdest=altdest@entry=0x2adfa88, qc=0x7fff35e56630) at pquery.c:779
#17 0x000000000080c77b in exec_execute_message (max_rows=9223372036854775807,
    portal_name=0x2adf670 "") at postgres.c:2196
#18 PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fff35e569c0,
    dbname=<optimized out>, username=<optimized out>) at postgres.c:4452

So I would say that the conditions under which AtAbort_Portals
decides that it can destroy a portal rather than just mark it failed
need to be reconsidered.  It's not clear to me exactly how that
should change though.  Maybe Peter has more insight.
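
To make the hazard concrete, here is a toy model in C. It is only a
sketch: every toy_* / Toy* name is hypothetical and none of this is the
actual portalmem.c or pquery.c code. The spare_active flag merely
contrasts the two behaviors (destroy vs. mark failed); it is not a
proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; NOT the real PostgreSQL structs. */
typedef enum
{
    TOY_PORTAL_READY,
    TOY_PORTAL_ACTIVE,
    TOY_PORTAL_FAILED
} ToyPortalStatus;

typedef struct ToyPortal
{
    ToyPortalStatus status;
    const char **stmts;         /* NULL plays the role of NIL */
    int          nstmts;
} ToyPortal;

/*
 * Hypothetical abort-time cleanup.  With spare_active = false it drops
 * the statement list even of a portal that is still running, which
 * models the clobber.  With spare_active = true it only marks a running
 * portal failed and leaves its list alone.
 */
static void
toy_at_abort(ToyPortal *portal, bool spare_active)
{
    assert(portal != NULL);

    if (portal->status == TOY_PORTAL_ACTIVE && spare_active)
    {
        portal->status = TOY_PORTAL_FAILED;
        return;
    }

    portal->status = TOY_PORTAL_FAILED;
    portal->stmts = NULL;
    portal->nstmts = 0;
}

/*
 * Toy run loop.  Each "statement" performs a rollback, so the abort
 * cleanup runs while we are still iterating over portal->stmts.
 * Returns true if the list was still there after every statement;
 * false marks the point where real code would touch a dead list.
 */
static bool
toy_run_portal(ToyPortal *portal, bool spare_active)
{
    int         n = portal->nstmts;

    portal->status = TOY_PORTAL_ACTIVE;
    for (int i = 0; i < n; i++)
    {
        toy_at_abort(portal, spare_active);     /* the DO block's rollback */

        /* the loop now wants to ask "is there a next statement?" */
        if (portal->stmts == NULL)
            return false;
    }
    return true;
}
```

Running toy_run_portal() on a one-statement portal returns false with
spare_active = false and true with spare_active = true, which is the
distinction the real abort path would have to draw for the portal that
is currently executing.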

            regards, tom lane


