Thread: BUG #13286: Core dumped during pg_terminate_backend call.

BUG #13286: Core dumped during pg_terminate_backend call.

From
maxim.boguk@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      13286
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.4.1
Operating system:   Centos Linux
Description:

Once per few days database getting core dumped during cron run of the
following query:

/usr/pgsql-9.4/bin/psql --no-psqlrc --single-transaction -d postgres -tx -c
"SELECT pg_terminate_backend(pid),now(),now()-xact_start as duration,* from
pg_stat_activity where (now() - pg_stat_activity.xact_start) > '1200
second'::interval and usename NOT IN ('postgres') and state<>'idle'"

I enabled collecting core and installed debug files for postgres.

There are gdb backtrace of the latest core:

Core was generated by `postgres: postgres postgres [local] SELECT
  '.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f47769a5538, xmin=0x7f47769a553c) at sinvaladt.c:426
426                     xact =
&ProcGlobal->allPgXact[stateP->proc->pgprocno];

(gdb) bt
#0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f47769a5538, xmin=0x7f47769a553c) at sinvaladt.c:426
#1  0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2  0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3  0x00000000006fb132 in pg_stat_get_activity (fcinfo=0x7fffaccf0ee0) at
pgstatfuncs.c:579
#4  0x000000000059dbc7 in ExecMakeTableFunctionResult (funcexpr=0x2899ae0,
econtext=0x2892bf0, argContext=<value optimized out>,
expectedDesc=0x289b0a8, randomAccess=0 '\000') at execQual.c:2193
#5  0x00000000005af622 in FunctionNext (node=0x2896cf0) at
nodeFunctionscan.c:95
#6  0x000000000059e69e in ExecScanFetch (node=0x2896cf0, accessMtd=0x5af370
<FunctionNext>, recheckMtd=0x5aeca0 <FunctionRecheck>) at execScan.c:82
#7  ExecScan (node=0x2896cf0, accessMtd=0x5af370 <FunctionNext>,
recheckMtd=0x5aeca0 <FunctionRecheck>) at execScan.c:167
#8  0x00000000005976e8 in ExecProcNode (node=0x2896cf0) at
execProcnode.c:426
#9  0x00000000005a87d2 in ExecHashJoin (node=0x2894790) at
nodeHashjoin.c:154
#10 0x0000000000597678 in ExecProcNode (node=0x2894790) at
execProcnode.c:457
#11 0x00000000005aea60 in ExecNestLoop (node=0x2892cb0) at
nodeNestloop.c:123
#12 0x0000000000597698 in ExecProcNode (node=0x2892cb0) at
execProcnode.c:449
#13 0x0000000000596252 in ExecutePlan (queryDesc=0x27e8670, direction=<value
optimized out>, count=0) at execMain.c:1486
#14 standard_ExecutorRun (queryDesc=0x27e8670, direction=<value optimized
out>, count=0) at execMain.c:319
#15 0x00007f4da708017b in pgss_ExecutorRun (queryDesc=0x27e8670,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:875
#16 0x0000000000686797 in PortalRunSelect (portal=0x2890930, forward=<value
optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#17 0x00000000006879c1 in PortalRun (portal=0x2890930,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x7f4db1a68308,
altdest=0x7f4db1a68308, completionTag=0x7fffaccf1840 "") at pquery.c:790
#18 0x000000000068404e in exec_simple_query (query_string=0x2839090 "SELECT
pg_terminate_backend(pid),now(),now()-xact_start as duration,* from
pg_stat_activity where (now() - pg_stat_activity.xact_start) > '1200
second'::interval and usename NOT IN ('postgres') and st"...) at
postgres.c:1072
#19 0x00000000006856c8 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x27a83f0 "postgres", username=<value
optimized out>) at postgres.c:4074
#20 0x0000000000632d7d in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4155
#21 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3829
#22 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1597
#23 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1244
#24 0x00000000005cadb8 in main (argc=3, argv=0x27a7490) at main.c:228

Error isn't repeatable but happens once week or so in random intervals.

Re: BUG #13286: Core dumped during pg_terminate_backend call.

From
David Gould
Date:
On Thu, 14 May 2015 04:24:57 +0000
maxim.boguk@gmail.com wrote:

> Bug reference:      13286
> Logged by:          Maxim Boguk
> Email address:      maxim.boguk@gmail.com
> PostgreSQL version: 9.4.1
> Operating system:   Centos Linux
> Description:
>
> Once per few days database getting core dumped during cron run of the
> following query:
...
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
> optimized out>, xid=0x7f47769a5538, xmin=0x7f47769a553c) at sinvaladt.c:426
> 426                     xact =
> &ProcGlobal->allPgXact[stateP->proc->pgprocno];
>
> (gdb) bt
> #0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
> optimized out>, xid=0x7f47769a5538, xmin=0x7f47769a553c) at sinvaladt.c:426
> #1  0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
> #2  0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
> #3  0x00000000006fb132 in pg_stat_get_activity (fcinfo=0x7fffaccf0ee0) at
> pgstatfuncs.c:579

This looks like a duplicate of bug #12918 which is a segfault in
BackendIdGetTransactionIds that affects pg_stat_activity and similar things.
One of my clients also hit that but it is resolved by Tom Lanes patch
pasted below.

-dg
---

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Vladimir Borodin <root@simply.name>
cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #12918: Segfault in BackendIdGetTransactionIds
Date: Mon, 30 Mar 2015 13:00:01 -0400
Sender: pgsql-bugs-owner@postgresql.org

diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 81b85c0..a2fde89 100644
*** a/src/backend/storage/ipc/sinvaladt.c
--- b/src/backend/storage/ipc/sinvaladt.c
*************** BackendIdGetProc(int backendID)
*** 403,411 ****
  void
  BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
  {
-     ProcState  *stateP;
      SISeg       *segP = shmInvalBuffer;
-     PGXACT       *xact;

      *xid = InvalidTransactionId;
      *xmin = InvalidTransactionId;
--- 403,409 ----
*************** BackendIdGetTransactionIds(int backendID
*** 415,425 ****

      if (backendID > 0 && backendID <= segP->lastBackend)
      {
!         stateP = &segP->procState[backendID - 1];
!         xact = &ProcGlobal->allPgXact[stateP->proc->pgprocno];

!         *xid = xact->xid;
!         *xmin = xact->xmin;
      }

      LWLockRelease(SInvalWriteLock);
--- 413,428 ----

      if (backendID > 0 && backendID <= segP->lastBackend)
      {
!         ProcState  *stateP = &segP->procState[backendID - 1];
!         PGPROC       *proc = stateP->proc;

!         if (proc != NULL)
!         {
!             PGXACT       *xact = &ProcGlobal->allPgXact[proc->pgprocno];
!
!             *xid = xact->xid;
!             *xmin = xact->xmin;
!         }
      }

      LWLockRelease(SInvalWriteLock);



--
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.