Please help me debug regular segfaults on 8.3.10 - Mailing list pgsql-general

From pgsql
Subject Please help me debug regular segfaults on 8.3.10
Date
Msg-id hrq276$1lu$1@news.hub.org
Responses Re: Please help me debug regular segfaults on 8.3.10  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Please help me debug regular segfaults on 8.3.10  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Please help me debug regular segfaults on 8.3.10  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi,

One of our PostgreSQL instances recently started to segfault several times a
week. I tried a couple of things to pin the crashes down to a particular query
or job, but could not find any pattern. All I can offer is some notes
and a set of similar-looking backtraces.

Thanks in advance.



Machine details
---------------
* CentOS release 5.4 (Final)
* Linux 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
* 4x Quad-Core AMD Opteron 8354
* 64GB RAM (ECC)



PostgreSQL packages
-------------------
* postgresql-8.3.10-2PGDG.el5
* postgresql-contrib-8.3.10-2PGDG.el5
* postgresql-devel-8.3.10-2PGDG.el5
* postgresql-libs-8.3.10-2PGDG.el5
* postgresql-plperl-8.3.10-2PGDG.el5
* postgresql-plpython-8.3.10-2PGDG.el5
* postgresql-pltcl-8.3.10-2PGDG.el5
* postgresql-server-8.3.10-2PGDG.el5



Environment
-----------
* Multiple databases, 1 TB in total
* So far the backtraces involve three different databases
* Some large hash indexes exist (they require a REINDEX after each crash)
* The only procedural language in use is PL/pgSQL
* The system handles a constant load of around 3000 TPS
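In 8.3, hash index changes are not WAL-logged, which is why those indexes have to be rebuilt after every crash. A minimal Python sketch of generating the REINDEX commands, assuming the hash indexes are listed with the catalog query below (run through psql or any driver); `HASH_INDEX_QUERY`, `quote_identifier` and `reindex_statements` are hypothetical helper names, not part of PostgreSQL:

```python
# Catalog query listing every hash index in the current database.
# pg_class.relam points at the index access method row in pg_am.
HASH_INDEX_QUERY = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_am a ON a.oid = c.relam
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'i' AND a.amname = 'hash'
ORDER BY 1, 2;
"""


def quote_identifier(name: str) -> str:
    """Always double-quote an identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(rows):
    """Turn (schema, index) rows returned by HASH_INDEX_QUERY into REINDEX commands."""
    return [
        f"REINDEX INDEX {quote_identifier(schema)}.{quote_identifier(index)};"
        for schema, index in rows
    ]


if __name__ == "__main__":
    # Hypothetical example rows; real ones come from running HASH_INDEX_QUERY.
    rows = [("public", "orders_hash_idx"), ("Sales", 'odd"name')]
    for stmt in reindex_statements(rows):
        print(stmt)
```

Always quoting is more verbose than PostgreSQL's own `quote_ident()`, which quotes only when needed, but because the names come straight from the catalog it never changes which index is addressed.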



Things that made no difference
------------------------------
* Updated from 8.3.7 to 8.3.10
* Updated OS kernel



2010-05-04 | core.21207
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 21207]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002ad2863f0148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002ad2863f1a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002ad2863f3372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002ad2863f3ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002ad2863ea7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()


2010-04-29 | core.20832
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 20832]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b41879e1148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b41879e2a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b41879e4372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b41879e4ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b41879db7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()


2010-04-27 | core.25421
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 25421]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b41879e1148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b41879e2a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b41879e4372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b41879e4ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b41879db7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()


2010-04-24 | core.23631
-----------------------
Core was generated by `postgres: <user> <database_2> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 23631]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b41879a0148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b41879a1a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b41879a3372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b41879a3ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b418799a7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()


2010-04-23 | core.9419
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 9419]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b3acaef4148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b3acaef5a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b3acaef7372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b3acaef7ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b3acaeee7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()


2010-04-22 | core.16801
-----------------------
Core was generated by `postgres: <user> <database_2> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 16801]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b3acaeb3148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b3acaeb4a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b3acaeb6372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b3acaeb6ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b3acaead7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()


2010-04-15 | core.32242
-----------------------
Core was generated by `postgres: <user> <database_3> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 32242]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x0000000000525c25 in fmgr_sql ()
#9  0x000000000052023e in ExecMakeFunctionResult ()
#10 0x000000000051d1f3 in ExecProject ()
#11 0x000000000052df13 in ExecResult ()
#12 0x000000000051cc66 in ExecProcNode ()
#13 0x000000000051bedf in ExecutorRun ()
#14 0x00000000005b1481 in ?? ()
#15 0x00000000005b2689 in PortalRun ()
#16 0x00000000005ae3b0 in ?? ()
#17 0x00000000005af038 in PostgresMain ()
#18 0x00000000005856a7 in ?? ()
#19 0x000000000058632b in PostmasterMain ()
#20 0x000000000053eece in main ()


2010-04-14 | core.10776
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 10776]
#0  0x000000000066acae in pfree ()
(gdb) bt
#0  0x000000000066acae in pfree ()
#1  0x0000000000648c6e in ?? ()
#2  0x0000000000648f34 in ?? ()
#3  0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4  0x0000000000644fcd in ?? ()
#5  0x0000000000644882 in ?? ()
#6  0x00000000006448be in CommandEndInvalidationMessages ()
#7  0x0000000000472993 in CommandCounterIncrement ()
#8  0x00000000005342ea in ?? ()
#9  0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b3acaeb3148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b3acaeb4a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b3acaeb6372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b3acaeb6ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b3acaead7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()
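Frames inside the postgres binary have identical addresses in every core, while the `plpgsql.so` frames move with the library's per-process load address. One way to make "similar-looking" precise is to normalize each frame and compute the frames all traces share. A minimal Python sketch, assuming the gdb `bt` output is saved as text and that shared libraries are loaded at 4 KiB page-aligned addresses, so only the low 12 bits of a library address are stable; `normalize` and `common_prefix` are hypothetical helper names:

```python
import re

# Matches gdb frame lines such as:
#   #0  0x000000000066acae in pfree ()
#   #13 0x00002ad2863f3ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
FRAME_RE = re.compile(r"^#(\d+)\s+0x([0-9a-f]+) in (\S+) \(\)(?: from (\S+))?")

PAGE_MASK = 0xFFF  # assumption: library load bases are 4 KiB page-aligned


def normalize(bt_text):
    """Return one comparable key per frame, ordered from #0 outward."""
    # Re-join frames that mail wrapping split after "from".
    text = re.sub(r" from[ \t]*\r?\n[ \t]*", " from ", bt_text)
    frames = {}
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # skip "(gdb) bt", "Program terminated ...", etc.
        num, addr_hex, func, lib = m.groups()
        addr = int(addr_hex, 16)
        if lib:
            # Shared library: the load base varies, keep only the page offset.
            key = (func, lib, addr & PAGE_MASK)
        elif func == "??":
            # Unnamed frame in the main binary: the address is its only identity.
            key = (func, None, addr)
        else:
            key = (func, None, None)
        # Keyed by frame number, so the repeated "#0" line before "bt" is harmless.
        frames[int(num)] = key
    return [frames[n] for n in sorted(frames)]


def common_prefix(traces):
    """Normalized frames, starting at #0, that every trace has in common."""
    prefix = []
    for keys in zip(*(normalize(t) for t in traces)):
        if any(k != keys[0] for k in keys[1:]):
            break
        prefix.append(keys[0])
    return prefix
```

For the eight traces above, the shared prefix runs from `pfree` (#0) through `CommandCounterIncrement` (#7). Below that, core.32242 reaches `CommandCounterIncrement` through `fmgr_sql` (an SQL-language function) instead of `SPI_execute_plan` called from PL/pgSQL, and the other seven traces normalize to identical frame lists.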
