Thread: BUG #12918: Segfault in BackendIdGetTransactionIds
The following bug has been logged on the website:

Bug reference:      12918
Logged by:          Vladimir
Email address:      root@simply.name
PostgreSQL version: 9.4.1
Operating system:   RHEL 6.6
Description:

Hello.

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults in different
backends. The backtraces of all coredumps look similar:

(gdb) bt
#0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1  0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2  0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3  0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value optimized out>) at pgstatfuncs.c:1080
#4  0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270, econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at execQual.c:2023
#5  0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5304
#6  ExecProject (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5519
#7  0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0, pergroup=0x1f5e380) at nodeAgg.c:556
#8  0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>) at nodeAgg.c:1223
#9  ExecAgg (node=<value optimized out>) at nodeAgg.c:1115
#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at execProcnode.c:476
#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:1486
#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528, altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790
#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT sum(numbackends) FROM pg_stat_database;") at postgres.c:1072
#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value optimized out>) at postgres.c:4074
#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4155
#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228
(gdb)

Unfortunately, I can't give a clear sequence of steps to reproduce the
problem; the segfaults happen at quite random times and under random
workloads :( So I'm trying to reproduce it on a test stand where PostgreSQL
is built with the --enable-debug flag to give you more information (but
still no luck for the last two weeks). The common conditions are:

1. It happens only on master hosts (never on any of the streaming replicas).
2. It happens on simple queries to pg_catalog or system views, as shown in
   the backtrace above.
3. It happens only with direct connections to PostgreSQL (production queries
   go through pgbouncer, and no coredumps contain production queries).
So far it has happened only with python-psycopg2 (we have tried versions
2.5.3-1.rhel6 with postgresql93-libs, and 2.5.4-1.rhel6 and 2.6-1.rhel6 with
postgresql94-libs). I've asked about it on the psycopg list [0], but it does
not seem to be a client problem.

[0] http://www.postgresql.org/message-id/flat/CA+mi_8a246TK6YBLzf_7c5sc+XuiMaGafG0mhrFbp6Nq+SQt3w@mail.gmail.com#CA+mi_8a246TK6YBLzf_7c5sc+XuiMaGafG0mhrFbp6Nq+SQt3w@mail.gmail.com
root@simply.name writes:
> After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
> yum.postgresql.org) we have started getting segfaults of different backends.
> Backtraces of all coredumps look similar:
> (gdb) bt
> #0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
> optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
> #1  0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
> #2  0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend.  But there can be, in which case
dereferencing stateP->proc crashes.  The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.

			regards, tom lane
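[Context for the diagnosis above: the code path Tom describes looks roughly
like the following sketch. It is reconstructed from the pre-fix side of the
patch posted later in this thread, not copied verbatim from sinvaladt.c.]

	if (backendID > 0 && backendID <= segP->lastBackend)
	{
		ProcState  *stateP = &segP->procState[backendID - 1];

		/*
		 * An inactive sinval slot within 0 .. lastBackend has
		 * stateP->proc == NULL, so this dereference segfaults -- exactly
		 * the window between pgstat_read_current_status() seeing the
		 * backend as active and this lookup of its sinval entry.
		 */
		PGXACT	   *xact = &ProcGlobal->allPgXact[stateP->proc->pgprocno];

		*xid = xact->xid;
		*xmin = xact->xmin;
	}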
> On 30 March 2015, at 19:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
> It supposes that there are no inactive entries in the sinval array
> within the range 0 .. lastBackend.  But there can be, in which case
> dereferencing stateP->proc crashes.  The reason it's hard to reproduce
> is the relatively narrow window between where pgstat_read_current_status
> saw the backend as active and where we're inspecting its sinval entry.

I've also tried to revert dd1a3bcc, where this function appeared, but
couldn't do it :( If you could make a build without this commit (if that is
easier than fixing it properly), I could install it on several production
hosts to test it.

--
May the force be with you…
https://simply.name
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
> It supposes that there are no inactive entries in the sinval array
> within the range 0 .. lastBackend.  But there can be, in which case
> dereferencing stateP->proc crashes.  The reason it's hard to reproduce
> is the relatively narrow window between where pgstat_read_current_status
> saw the backend as active and where we're inspecting its sinval entry.

As an immediate short-term workaround, from what I can tell, avoiding
queries against the pg_stat_activity and pg_stat_database views and the
pg_stat_get_activity, pg_stat_get_backend_idset, and
pg_stat_get_db_numbackends functions should prevent triggering this bug.

These are likely being run by a monitoring system (e.g. check_postgres
from Nagios).

	Thanks!

		Stephen
> On 30 March 2015, at 19:44, Stephen Frost <sfrost@snowman.net> wrote:
>
> As an immediate short-term workaround, from what I can tell, avoiding
> queries against the pg_stat_activity and pg_stat_database views and the
> pg_stat_get_activity, pg_stat_get_backend_idset, and
> pg_stat_get_db_numbackends functions should prevent triggering this bug.

I suppose pg_stat_replication should not be queried either. We have already
done that on the most critical databases, but it is hard to be blind :(

> These are likely being run by a monitoring system (e.g. check_postgres
> from Nagios).

--
May the force be with you…
https://simply.name
* Vladimir Borodin (root@simply.name) wrote:
> I suppose pg_stat_replication should not be queried either. We have already
> done that on the most critical databases, but it is hard to be blind :(

Ah, yes, not sure where I dropped that; it was in my initial list but
didn't make it into the final email.

I would expect a fix to be included in the next point release, hopefully
released in the next couple of months.

	Thanks!

		Stephen
* Vladimir Borodin (root@simply.name) wrote:
>> On 30 March 2015, at 19:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
>> It supposes that there are no inactive entries in the sinval array
>> within the range 0 .. lastBackend.  But there can be, in which case
>> dereferencing stateP->proc crashes.
>
> I've also tried to revert dd1a3bcc, where this function appeared, but
> couldn't do it :( If you could make a build without this commit (if that
> is easier than fixing it properly), I could install it on several
> production hosts to test it.

Hopefully a fix will be forthcoming shortly.  Reverting it won't work
though, no, as it included a catalog bump.

	Thanks,

		Stephen
Vladimir Borodin <root@simply.name> writes:
> I've also tried to revert dd1a3bcc, where this function appeared, but
> couldn't do it :( If you could make a build without this commit (if that
> is easier than fixing it properly), I could install it on several
> production hosts to test it.

Try this.

			regards, tom lane

diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 81b85c0..a2fde89 100644
*** a/src/backend/storage/ipc/sinvaladt.c
--- b/src/backend/storage/ipc/sinvaladt.c
*************** BackendIdGetProc(int backendID)
*** 403,411 ****
  void
  BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
  {
- 	ProcState  *stateP;
  	SISeg	   *segP = shmInvalBuffer;
- 	PGXACT	   *xact;
  
  	*xid = InvalidTransactionId;
  	*xmin = InvalidTransactionId;
--- 403,409 ----
*************** BackendIdGetTransactionIds(int backendID
*** 415,425 ****
  
  	if (backendID > 0 && backendID <= segP->lastBackend)
  	{
! 		stateP = &segP->procState[backendID - 1];
! 		xact = &ProcGlobal->allPgXact[stateP->proc->pgprocno];
  
! 		*xid = xact->xid;
! 		*xmin = xact->xmin;
  	}
  
  	LWLockRelease(SInvalWriteLock);
--- 413,428 ----
  
  	if (backendID > 0 && backendID <= segP->lastBackend)
  	{
! 		ProcState  *stateP = &segP->procState[backendID - 1];
! 		PGPROC	   *proc = stateP->proc;
  
! 		if (proc != NULL)
! 		{
! 			PGXACT	   *xact = &ProcGlobal->allPgXact[proc->pgprocno];
  
! 			*xid = xact->xid;
! 			*xmin = xact->xmin;
! 		}
  	}
  
  	LWLockRelease(SInvalWriteLock);
> On 30 March 2015, at 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Vladimir Borodin <root@simply.name> writes:
>> I've also tried to revert dd1a3bcc, where this function appeared, but
>> couldn't do it :( If you could make a build without this commit (if that
>> is easier than fixing it properly), I could install it on several
>> production hosts to test it.
>
> Try this.

38 minutes from a bug report to a patch with a fix! You are fantastic. Thanks.

It compiles and passes 'make check' and 'make check-world' (I think you have
checked that, but just in case...). I've built a package and installed it on
one host. If everything is OK, tomorrow I will install it on several hosts
and slowly roll it out further. The problem reproduces across our hosts
approximately once a week, so if it disappears I will let you know in a
couple of weeks.

Thanks again.

--
May the force be with you…
https://simply.name
On Mon, 30 Mar 2015 13:00:01 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vladimir Borodin <root@simply.name> writes:
>> I've also tried to revert dd1a3bcc, where this function appeared, but
>> couldn't do it :( If you could make a build without this commit (if that
>> is easier than fixing it properly), I could install it on several
>> production hosts to test it.
>
> Try this.

Nice to see a patch, in advance of need ;-)  Thanks!

We have had a couple of segfaults recently, but once we enabled core files
it stopped happening. Until just now. I can build with the patch, but if a
9.4.2 is imminent it would be nice to know before scheduling an extra round
of downtimes.

This is apparently from a Python trigger calling get_app_name(). I can
provide the rest of the stack if it would be useful.

Program terminated with signal 11, Segmentation fault.
#0  0x000000000066148b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f5d56ae1598, xmin=0x7f5d56ae159c) at sinvaladt.c:426
426	sinvaladt.c: No such file or directory.
	in sinvaladt.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x000000000066148b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f5d56ae1598, xmin=0x7f5d56ae159c) at sinvaladt.c:426
#1  0x000000000061f064 in pgstat_read_current_status () at pgstat.c:2871
#2  0x000000000061f0e9 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3  0x00000000006ef373 in pg_stat_get_activity (fcinfo=0x7fffd2e78f50) at pgstatfuncs.c:591
#4  0x00000000005977ec in ExecMakeTableFunctionResult (funcexpr=0x17fdae0, econtext=0x17fd770, argContext=<value optimized out>, expectedDesc=0x17ffd70, randomAccess=0 '\000') at execQual.c:2193
#5  0x00000000005a91f2 in FunctionNext (node=0x17fd660) at nodeFunctionscan.c:95
#6  0x00000000005982ce in ExecScanFetch (node=0x17fd660, accessMtd=0x5a8f40 <FunctionNext>, recheckMtd=0x5a8870 <FunctionRecheck>) at execScan.c:82
#7  ExecScan (node=0x17fd660, accessMtd=0x5a8f40 <FunctionNext>, recheckMtd=0x5a8870 <FunctionRecheck>) at execScan.c:167
#8  0x00000000005913c8 in ExecProcNode (node=0x17fd660) at execProcnode.c:426
#9  0x000000000058ff32 in ExecutePlan (queryDesc=0x17f81f0, direction=<value optimized out>, count=1) at execMain.c:1486
#10 standard_ExecutorRun (queryDesc=0x17f81f0, direction=<value optimized out>, count=1) at execMain.c:319
#11 0x00007f69a7d3867b in explain_ExecutorRun (queryDesc=0x17f81f0, direction=ForwardScanDirection, count=1) at auto_explain.c:243
#12 0x00007f69a7b33965 in pgss_ExecutorRun (queryDesc=0x17f81f0, direction=ForwardScanDirection, count=1) at pg_stat_statements.c:873
#13 0x000000000059bd6c in postquel_getnext (fcinfo=<value optimized out>) at functions.c:853
#14 fmgr_sql (fcinfo=<value optimized out>) at functions.c:1148
#15 0x0000000000595f85 in ExecMakeFunctionResultNoSets (fcache=0x17ed920, econtext=0x17ed730, isNull=0x17ee2a8 " ", isDone=<value optimized out>) at execQual.c:2023
#16 0x0000000000591e53 in ExecTargetList (projInfo=<value optimized out>, isDone=0x7fffd2e798fc) at execQual.c:5304
#17 ExecProject (projInfo=<value optimized out>, isDone=0x7fffd2e798fc) at execQual.c:5519
#18 0x00000000005a98fb in ExecResult (node=0x17ed620) at nodeResult.c:155
#19 0x0000000000591478 in ExecProcNode (node=0x17ed620) at execProcnode.c:373
#20 0x000000000058ff32 in ExecutePlan (queryDesc=0x166c610, direction=<value optimized out>, count=0) at execMain.c:1486
#21 standard_ExecutorRun (queryDesc=0x166c610, direction=<value optimized out>, count=0) at execMain.c:319
#22 0x00007f69a7d3867b in explain_ExecutorRun (queryDesc=0x166c610, direction=ForwardScanDirection, count=0) at auto_explain.c:243
#23 0x00007f69a7b33965 in pgss_ExecutorRun (queryDesc=0x166c610, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:873
#24 0x00000000005b39d0 in _SPI_pquery (plan=0x7fffd2e79d10, paramLI=0x0, snapshot=<value optimized out>, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2372
#25 _SPI_execute_plan (plan=0x7fffd2e79d10, paramLI=0x0, snapshot=<value optimized out>, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2160
#26 0x00000000005b4076 in SPI_execute (src=0x15f6054 "SELECT get_app_name() AS a", read_only=0 '\000', tcount=0) at spi.c:386
#27 0x00007f5d5672f702 in PLy_spi_execute_query (query=0x15f6054 "SELECT get_app_name() AS a", limit=0) at plpy_spi.c:357

-dg

--
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould <daveg@sonic.net> writes:
> We have had a couple of segfaults recently, but once we enabled core files
> it stopped happening. Until just now. I can build with the patch, but if a
> 9.4.2 is imminent it would be nice to know before scheduling an extra round
> of downtimes.

No plans for an imminent 9.4.2.  There's been some discussion about a set
of releases in May; the only way something happens sooner than that is if
we find a staggeringly bad bug.

			regards, tom lane
> On 30 March 2015, at 20:54, Vladimir Borodin <root@simply.name> wrote:
>
>> On 30 March 2015, at 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Try this.
>
> 38 minutes from a bug report to a patch with a fix! You are fantastic. Thanks.
>
> It compiles and passes 'make check' and 'make check-world' (I think you have
> checked that, but just in case...). I've built a package and installed it on
> one host. If everything is OK, tomorrow I will install it on several hosts
> and slowly roll it out further. The problem reproduces across our hosts
> approximately once a week, so if it disappears I will let you know in a
> couple of weeks.

No segfaults for more than a week since I upgraded all hosts. It seems the
patch is good. Thank you very much.

--
May the force be with you…
https://simply.name