BUG #5905: segfault on pg_terminate_backend - Mailing list pgsql-bugs

From Matt Harrington
Subject BUG #5905: segfault on pg_terminate_backend
Date
Msg-id 201103011850.p21IoAW2090500@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5905: segfault on pg_terminate_backend
Re: BUG #5905: segfault on pg_terminate_backend
Re: BUG #5905: segfault on pg_terminate_backend
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5905
Logged by:          Matt Harrington
Email address:      matt.harrington@rentrak.com
PostgreSQL version: 8.4.3
Operating system:   CentOS 4.7
Description:        segfault on pg_terminate_backend
Details:

I have occasionally triggered a segfault while executing
pg_terminate_backend (or kill on child postgres processes.)  I still have
the core file and associated logs.

$ gdb -c core.5746 /usr/local/stow/postgresql-8.4.3/bin/postmaster
GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
libthread_db library "/lib64/tls/libthread_db.so.1".

Core was generated by `postgres: postgres att_dev [local] SELECT
                        '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from
/data_storage/local/stow/postgresql-8.4.3/lib/auto_explain.so...done.
Loaded symbols for
/data_storage/local/stow/postgresql-8.4.3/lib/auto_explain.so
Reading symbols from
/data_storage/local/stow/postgresql-8.4.3/lib/pg_stat_statements.so...done.
Loaded symbols for
/data_storage/local/stow/postgresql-8.4.3/lib/pg_stat_statements.so
Reading symbols from
/data_storage/local/stow/postgresql-8.4.3/lib/plpgsql.so...done.
Loaded symbols for /data_storage/local/stow/postgresql-8.4.3/lib/plpgsql.so
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:601
601        if (owner->ncatrefs < owner->maxcatrefs)
(gdb) bt
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:601
#1  0x000000000066586e in SearchCatCache (cache=0xa26ff0, v1=Variable "v1"
is not available.
) at catcache.c:1182
#2  0x0000000000536095 in sql_exec_error_callback (arg=Variable "arg" is not
available.
) at functions.c:883
#3  0x000000000067355c in errfinish (dummy=Variable "dummy" is not
available.
) at elog.c:386
#4  0x00000000005cf3e5 in log_disconnections (code=Variable "code" is not
available.
) at postgres.c:4019
#5  0x00000000005bd020 in proc_exit_prepare (code=1) at ipc.c:182
#6  0x00000000005bd048 in proc_exit (code=0) at ipc.c:95
#7  0x00000000006736d6 in errfinish (dummy=Variable "dummy" is not
available.
) at elog.c:488
#8  0x00000000005cb4c1 in ProcessInterrupts () at postgres.c:2653
#9  0x00000000005342fa in ExecScan (node=0xaccb60, accessMtd=0x540000
<SeqNext>) at execScan.c:101
#10 0x000000000052d86d in ExecProcNode (node=0xaccb60) at
execProcnode.c:363
#11 0x000000000053c83c in ExecHashJoin (node=0xb49920) at
nodeHashjoin.c:131
#12 0x000000000052d7bd in ExecProcNode (node=0xb49920) at
execProcnode.c:412
#13 0x0000000000540b1a in ExecSort (node=0xb49810) at nodeSort.c:102
#14 0x000000000052d79a in ExecProcNode (node=0xb49810) at
execProcnode.c:423
#15 0x000000000052b5ca in standard_ExecutorRun (queryDesc=0xb153a0,
direction=ForwardScanDirection, count=10) at execMain.c:1504
#16 0x0000002a98417197 in explain_ExecutorRun (queryDesc=0xb153a0,
direction=ForwardScanDirection, count=10) at auto_explain.c:168
#17 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=0xb153a0,
direction=ForwardScanDirection, count=10) at pg_stat_statements.c:514
#18 0x00000000005cff49 in PortalRunSelect (portal=0xaa2db0, forward=Variable
"forward" is not available.
) at pquery.c:953
#19 0x00000000005d0e80 in PortalRunFetch (portal=0xaa2db0,
fdirection=FETCH_FORWARD, count=10, dest=0x92db20) at pquery.c:1402
#20 0x0000000000548a81 in _SPI_cursor_operation (portal=0xaa2db0,
direction=FETCH_FORWARD, count=10, dest=0x92db20) at spi.c:2088
#21 0x0000002ae7a195f4 in exec_for_query (estate=0x7fbfffcd50,
stmt=0xaec558, portal=0xaa2db0, prefetch_ok=1 '\001') at pl_exec.c:4211
#22 0x0000002ae7a197f6 in exec_stmt_fors (estate=0x7fbfffcd50,
stmt=0xaec558) at pl_exec.c:1852
#23 0x0000002ae7a16699 in exec_stmts (estate=0x7fbfffcd50, stmts=Variable
"stmts" is not available.
) at pl_exec.c:1269
#24 0x0000002ae7a17ff1 in exec_stmt_block (estate=0x0, block=0xaec650) at
pl_exec.c:1137
#25 0x0000002ae7a18b69 in plpgsql_exec_function (func=0xa88458,
fcinfo=0x7fbfffd020) at pl_exec.c:315
#26 0x0000002ae7a0f940 in plpgsql_call_handler (fcinfo=0x7fbfffd020) at
pl_handler.c:95
#27 0x000000000052fd64 in ExecMakeTableFunctionResult (funcexpr=0xabe000,
econtext=0xabdd60, expectedDesc=0xabdec0, randomAccess=0 '\0')
    at execQual.c:2016
#28 0x000000000053f561 in FunctionNext (node=0xabdc50) at
nodeFunctionscan.c:64
#29 0x0000000000534331 in ExecScan (node=0x0, accessMtd=0x53f4f0
<FunctionNext>) at execScan.c:68
#30 0x000000000052d81d in ExecProcNode (node=0xabdc50) at
execProcnode.c:385
#31 0x0000000000537acc in ExecAppend (node=0xaa5dd0) at nodeAppend.c:269
#32 0x000000000052d88d in ExecProcNode (node=0xaa5dd0) at
execProcnode.c:348
#33 0x000000000052b5ca in standard_ExecutorRun (queryDesc=0xb5dbc0,
direction=ForwardScanDirection, count=0) at execMain.c:1504
#34 0x0000002a98417197 in explain_ExecutorRun (queryDesc=0xb5dbc0,
direction=ForwardScanDirection, count=0) at auto_explain.c:168
#35 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=0xb5dbc0,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:514
#36 0x0000000000536da4 in fmgr_sql (fcinfo=0x7fbfffd9d0) at functions.c:472
#37 0x000000000052fd64 in ExecMakeTableFunctionResult (funcexpr=0xaf14a0,
econtext=0xaf11d0, expectedDesc=0xaf1360, randomAccess=0 '\0')
    at execQual.c:2016
#38 0x000000000053f561 in FunctionNext (node=0xaf10c0) at
nodeFunctionscan.c:64
#39 0x0000000000534331 in ExecScan (node=0x0, accessMtd=0x53f4f0
<FunctionNext>) at execScan.c:68
#40 0x000000000052d81d in ExecProcNode (node=0xaf10c0) at
execProcnode.c:385
#41 0x000000000052b5ca in standard_ExecutorRun (queryDesc=0xa98450,
direction=ForwardScanDirection, count=0) at execMain.c:1504
---Type <return> to continue, or q <return> to quit---
#42 0x0000002a98417197 in explain_ExecutorRun (queryDesc=0xa98450,
direction=ForwardScanDirection, count=0) at auto_explain.c:168
#43 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=0xa98450,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:514
#44 0x00000000005cff49 in PortalRunSelect (portal=0xaa2ca0, forward=Variable
"forward" is not available.
) at pquery.c:953
#45 0x00000000005d0760 in PortalRun (portal=0xaa2ca0,
count=9223372036854775807, isTopLevel=1 '\001', dest=0xa55c98,
altdest=0xa55c98,
    completionTag=0x7fbfffe360 "") at pquery.c:779
#46 0x00000000005cc2d6 in exec_simple_query (
    query_string=0xa53da0 "select * from
rtk_role_maint.sql_to_refresh_grants_for_dml       ('att', 'att_dml') ;") at
postgres.c:991
#47 0x00000000005cdcdf in PostgresMain (argc=4, argv=Variable "argv" is not
available.
) at postgres.c:3614
#48 0x00000000005a4ee0 in ServerLoop () at postmaster.c:3447
#49 0x00000000005a686d in PostmasterMain (argc=3, argv=0x9b04e0) at
postmaster.c:1040
#50 0x00000000005530be in main (argc=3, argv=0x9b04e0) at main.c:188


$ grep -A70 'Segmentation fault' postgresql-2011-03-01_000000.log
<6719, 2011-03-01 06:00:15 PST> LOG:  server process (PID 5746) was
terminated by signal 11: Segmentation fault
<6719, 2011-03-01 06:00:15 PST> LOG:  terminating any other active server
processes
<5940, 2011-03-01 06:00:15 PST> WARNING:  terminating connection because of
crash of another server process
<5940, 2011-03-01 06:00:15 PST> DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
<5940, 2011-03-01 06:00:15 PST> HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
...
<6040, 2011-03-01 06:00:17 PST> FATAL:  the database system is in recovery
mode
<6039, 2011-03-01 06:00:17 PST> LOG:  database system was interrupted; last
known up at 2011-03-01 05:51:18 PST
<6039, 2011-03-01 06:00:17 PST> LOG:  database system was not properly shut
down; automatic recovery in progress
<6039, 2011-03-01 06:00:17 PST> LOG:  redo starts at 7D0/6FF7E490
<6039, 2011-03-01 06:00:17 PST> LOG:  unexpected pageaddr 7D0/20F9A000 in
log file 2000, segment 111, offset 16359424
<6039, 2011-03-01 06:00:17 PST> LOG:  redo done at 7D0/6FF97728
<6039, 2011-03-01 06:00:17 PST> LOG:  last completed transaction was at log
time 2011-03-01 06:00:04.083136-08
<6039, 2011-03-01 06:00:17 PST> LOG:  checkpoint starting: end-of-recovery
immediate
<6039, 2011-03-01 06:00:17 PST> LOG:  checkpoint complete: wrote 24 buffers
(0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.009
s, sync=0.002 s, total=0.011 s
<6044, 2011-03-01 06:00:18 PST> LOG:  autovacuum launcher started
<6719, 2011-03-01 06:00:18 PST> LOG:  database system is ready to accept
connections

Unfortunately, I have not been successful in generating a reproducible case;
it may be a race condition edge case.

pgsql-bugs by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: BUG #5902: pl/pgsql plans are not invalidated on discard all
Next
From: Merlin Moncure
Date:
Subject: Re: BUG #5905: segfault on pg_terminate_backend