Re: BUG #5905: segfault on pg_terminate_backend - Mailing list pgsql-bugs

From Merlin Moncure
Subject Re: BUG #5905: segfault on pg_terminate_backend
Date
Msg-id AANLkTik_NFbQ5eXizWJrK_cRjBYtxUJKVSMhtz2GT3jT@mail.gmail.com
Whole thread Raw
In response to BUG #5905: segfault on pg_terminate_backend  ("Matt Harrington" <matt.harrington@rentrak.com>)
List pgsql-bugs
On Tue, Mar 1, 2011 at 12:50 PM, Matt Harrington
<matt.harrington@rentrak.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A05905
> Logged by: =A0 =A0 =A0 =A0 =A0Matt Harrington
> Email address: =A0 =A0 =A0matt.harrington@rentrak.com
> PostgreSQL version: 8.4.3
> Operating system: =A0 CentOS 4.7
> Description: =A0 =A0 =A0 =A0segfault on pg_terminate_backend
> Details:
>
> I have occasionally triggered a segfault while executing
> pg_terminate_backend (or kill on child postgres processes.) =A0I still ha=
ve
> the core file and associated logs.
>
> $ gdb -c core.5746 /usr/local/stow/postgresql-8.4.3/bin/postmaster
> GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. =A0Type "show warranty" for deta=
ils.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> Core was generated by `postgres: postgres att_dev [local] SELECT
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib64/libcrypt.so.1...done.
> Loaded symbols for /lib64/libcrypt.so.1
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/tls/libm.so.6...done.
> Loaded symbols for /lib64/tls/libm.so.6
> Reading symbols from /lib64/tls/libc.so.6...done.
> Loaded symbols for /lib64/tls/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from
> /data_storage/local/stow/postgresql-8.4.3/lib/auto_explain.so...done.
> Loaded symbols for
> /data_storage/local/stow/postgresql-8.4.3/lib/auto_explain.so
> Reading symbols from
> /data_storage/local/stow/postgresql-8.4.3/lib/pg_stat_statements.so...don=
e.
> Loaded symbols for
> /data_storage/local/stow/postgresql-8.4.3/lib/pg_stat_statements.so
> Reading symbols from
> /data_storage/local/stow/postgresql-8.4.3/lib/plpgsql.so...done.
> Loaded symbols for /data_storage/local/stow/postgresql-8.4.3/lib/plpgsql.=
so
> #0 =A0ResourceOwnerEnlargeCatCacheRefs (owner=3D0x0) at resowner.c:601
> 601 =A0 =A0 =A0 =A0 =A0 =A0 if (owner->ncatrefs < owner->maxcatrefs)
> (gdb) bt
> #0 =A0ResourceOwnerEnlargeCatCacheRefs (owner=3D0x0) at resowner.c:601
> #1 =A00x000000000066586e in SearchCatCache (cache=3D0xa26ff0, v1=3DVariab=
le "v1"
> is not available.
> ) at catcache.c:1182
> #2 =A00x0000000000536095 in sql_exec_error_callback (arg=3DVariable "arg"=
 is not
> available.
> ) at functions.c:883
> #3 =A00x000000000067355c in errfinish (dummy=3DVariable "dummy" is not
> available.
> ) at elog.c:386
> #4 =A00x00000000005cf3e5 in log_disconnections (code=3DVariable "code" is=
 not
> available.
> ) at postgres.c:4019
> #5 =A00x00000000005bd020 in proc_exit_prepare (code=3D1) at ipc.c:182
> #6 =A00x00000000005bd048 in proc_exit (code=3D0) at ipc.c:95
> #7 =A00x00000000006736d6 in errfinish (dummy=3DVariable "dummy" is not
> available.
> ) at elog.c:488
> #8 =A00x00000000005cb4c1 in ProcessInterrupts () at postgres.c:2653
> #9 =A00x00000000005342fa in ExecScan (node=3D0xaccb60, accessMtd=3D0x5400=
00
> <SeqNext>) at execScan.c:101
> #10 0x000000000052d86d in ExecProcNode (node=3D0xaccb60) at
> execProcnode.c:363
> #11 0x000000000053c83c in ExecHashJoin (node=3D0xb49920) at
> nodeHashjoin.c:131
> #12 0x000000000052d7bd in ExecProcNode (node=3D0xb49920) at
> execProcnode.c:412
> #13 0x0000000000540b1a in ExecSort (node=3D0xb49810) at nodeSort.c:102
> #14 0x000000000052d79a in ExecProcNode (node=3D0xb49810) at
> execProcnode.c:423
> #15 0x000000000052b5ca in standard_ExecutorRun (queryDesc=3D0xb153a0,
> direction=3DForwardScanDirection, count=3D10) at execMain.c:1504
> #16 0x0000002a98417197 in explain_ExecutorRun (queryDesc=3D0xb153a0,
> direction=3DForwardScanDirection, count=3D10) at auto_explain.c:168
> #17 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=3D0xb153a0,
> direction=3DForwardScanDirection, count=3D10) at pg_stat_statements.c:514
> #18 0x00000000005cff49 in PortalRunSelect (portal=3D0xaa2db0, forward=3DV=
ariable
> "forward" is not available.
> ) at pquery.c:953
> #19 0x00000000005d0e80 in PortalRunFetch (portal=3D0xaa2db0,
> fdirection=3DFETCH_FORWARD, count=3D10, dest=3D0x92db20) at pquery.c:1402
> #20 0x0000000000548a81 in _SPI_cursor_operation (portal=3D0xaa2db0,
> direction=3DFETCH_FORWARD, count=3D10, dest=3D0x92db20) at spi.c:2088
> #21 0x0000002ae7a195f4 in exec_for_query (estate=3D0x7fbfffcd50,
> stmt=3D0xaec558, portal=3D0xaa2db0, prefetch_ok=3D1 '\001') at pl_exec.c:=
4211
> #22 0x0000002ae7a197f6 in exec_stmt_fors (estate=3D0x7fbfffcd50,
> stmt=3D0xaec558) at pl_exec.c:1852
> #23 0x0000002ae7a16699 in exec_stmts (estate=3D0x7fbfffcd50, stmts=3DVari=
able
> "stmts" is not available.
> ) at pl_exec.c:1269
> #24 0x0000002ae7a17ff1 in exec_stmt_block (estate=3D0x0, block=3D0xaec650=
) at
> pl_exec.c:1137
> #25 0x0000002ae7a18b69 in plpgsql_exec_function (func=3D0xa88458,
> fcinfo=3D0x7fbfffd020) at pl_exec.c:315
> #26 0x0000002ae7a0f940 in plpgsql_call_handler (fcinfo=3D0x7fbfffd020) at
> pl_handler.c:95
> #27 0x000000000052fd64 in ExecMakeTableFunctionResult (funcexpr=3D0xabe00=
0,
> econtext=3D0xabdd60, expectedDesc=3D0xabdec0, randomAccess=3D0 '\0')
> =A0 =A0at execQual.c:2016
> #28 0x000000000053f561 in FunctionNext (node=3D0xabdc50) at
> nodeFunctionscan.c:64
> #29 0x0000000000534331 in ExecScan (node=3D0x0, accessMtd=3D0x53f4f0
> <FunctionNext>) at execScan.c:68
> #30 0x000000000052d81d in ExecProcNode (node=3D0xabdc50) at
> execProcnode.c:385
> #31 0x0000000000537acc in ExecAppend (node=3D0xaa5dd0) at nodeAppend.c:269
> #32 0x000000000052d88d in ExecProcNode (node=3D0xaa5dd0) at
> execProcnode.c:348
> #33 0x000000000052b5ca in standard_ExecutorRun (queryDesc=3D0xb5dbc0,
> direction=3DForwardScanDirection, count=3D0) at execMain.c:1504
> #34 0x0000002a98417197 in explain_ExecutorRun (queryDesc=3D0xb5dbc0,
> direction=3DForwardScanDirection, count=3D0) at auto_explain.c:168
> #35 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=3D0xb5dbc0,
> direction=3DForwardScanDirection, count=3D0) at pg_stat_statements.c:514
> #36 0x0000000000536da4 in fmgr_sql (fcinfo=3D0x7fbfffd9d0) at functions.c=
:472
> #37 0x000000000052fd64 in ExecMakeTableFunctionResult (funcexpr=3D0xaf14a=
0,
> econtext=3D0xaf11d0, expectedDesc=3D0xaf1360, randomAccess=3D0 '\0')
> =A0 =A0at execQual.c:2016
> #38 0x000000000053f561 in FunctionNext (node=3D0xaf10c0) at
> nodeFunctionscan.c:64
> #39 0x0000000000534331 in ExecScan (node=3D0x0, accessMtd=3D0x53f4f0
> <FunctionNext>) at execScan.c:68
> #40 0x000000000052d81d in ExecProcNode (node=3D0xaf10c0) at
> execProcnode.c:385
> #41 0x000000000052b5ca in standard_ExecutorRun (queryDesc=3D0xa98450,
> direction=3DForwardScanDirection, count=3D0) at execMain.c:1504
> ---Type <return> to continue, or q <return> to quit---
> #42 0x0000002a98417197 in explain_ExecutorRun (queryDesc=3D0xa98450,
> direction=3DForwardScanDirection, count=3D0) at auto_explain.c:168
> #43 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=3D0xa98450,
> direction=3DForwardScanDirection, count=3D0) at pg_stat_statements.c:514
> #44 0x00000000005cff49 in PortalRunSelect (portal=3D0xaa2ca0, forward=3DV=
ariable
> "forward" is not available.
> ) at pquery.c:953
> #45 0x00000000005d0760 in PortalRun (portal=3D0xaa2ca0,
> count=3D9223372036854775807, isTopLevel=3D1 '\001', dest=3D0xa55c98,
> altdest=3D0xa55c98,
> =A0 =A0completionTag=3D0x7fbfffe360 "") at pquery.c:779
> #46 0x00000000005cc2d6 in exec_simple_query (
> =A0 =A0query_string=3D0xa53da0 "select * from
> rtk_role_maint.sql_to_refresh_grants_for_dml =A0 =A0 =A0 ('att', 'att_dml=
') ;") at
> postgres.c:991
> #47 0x00000000005cdcdf in PostgresMain (argc=3D4, argv=3DVariable "argv" =
is not
> available.
> ) at postgres.c:3614
> #48 0x00000000005a4ee0 in ServerLoop () at postmaster.c:3447
> #49 0x00000000005a686d in PostmasterMain (argc=3D3, argv=3D0x9b04e0) at
> postmaster.c:1040
> #50 0x00000000005530be in main (argc=3D3, argv=3D0x9b04e0) at main.c:188
>
>
> $ grep -A70 'Segmentation fault' postgresql-2011-03-01_000000.log
> <6719, 2011-03-01 06:00:15 PST> LOG: =A0server process (PID 5746) was
> terminated by signal 11: Segmentation fault
> <6719, 2011-03-01 06:00:15 PST> LOG: =A0terminating any other active serv=
er
> processes
> <5940, 2011-03-01 06:00:15 PST> WARNING: =A0terminating connection becaus=
e of
> crash of another server process
> <5940, 2011-03-01 06:00:15 PST> DETAIL: =A0The postmaster has commanded t=
his
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> <5940, 2011-03-01 06:00:15 PST> HINT: =A0In a moment you should be able to
> reconnect to the database and repeat your command.
> ...
> <6040, 2011-03-01 06:00:17 PST> FATAL: =A0the database system is in recov=
ery
> mode
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0database system was interrupted; =
last
> known up at 2011-03-01 05:51:18 PST
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0database system was not properly =
shut
> down; automatic recovery in progress
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0redo starts at 7D0/6FF7E490
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0unexpected pageaddr 7D0/20F9A000 =
in
> log file 2000, segment 111, offset 16359424
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0redo done at 7D0/6FF97728
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0last completed transaction was at=
 log
> time 2011-03-01 06:00:04.083136-08
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0checkpoint starting: end-of-recov=
ery
> immediate
> <6039, 2011-03-01 06:00:17 PST> LOG: =A0checkpoint complete: wrote 24 buf=
fers
> (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3D0=
.009
> s, sync=3D0.002 s, total=3D0.011 s
> <6044, 2011-03-01 06:00:18 PST> LOG: =A0autovacuum launcher started
> <6719, 2011-03-01 06:00:18 PST> LOG: =A0database system is ready to accept
> connections
>
> Unfortunately, I have not been successful in generating a reproducible ca=
se;
> it may be a race condition edge case.

did you read: http://archives.postgresql.org/pgsql-bugs/2010-03/msg00170.ph=
p?

this may have been fixed -- check follow ups in pgsql-commits.  and
upgrade to latest 8.4!!

merlin

pgsql-bugs by date:

Previous
From: "Matt Harrington"
Date:
Subject: BUG #5905: segfault on pg_terminate_backend
Next
From: Heikki Linnakangas
Date:
Subject: Re: BUG #5905: segfault on pg_terminate_backend