Thread: segmentation fault postgres 9.3.5 core dump perlu related ?
Update/Information sharing on my pursuit of segmentation faults
FreeBSD 10.0-RELEASE-p12 amd64
Postgres version 9.3.5
Below are three postgres core files generated on two different machines (Georgia and Alabama) on Feb 11.
These cores could not have been caused by the environment update issue that I last suspected of causing the segfaults,
so I am back to square one in terms of what is occurring.
? I am not sure that I understand the log entries around each crash time in the postgres log output. Are they just whatever happened to be running in the other forked postgres processes when the crashed process was detected?
If so, then I have probably been reading too much into the content of the postgres log at the time of each core.
Those entries would then just represent collateral damage: routine transactions that were running in other forked processes at the moment one of the processes dumped core.
Therefore, all I can assert now is that postgres has a sporadic segmentation fault, with no known way to reproduce it reliably,
and I am uncertain how to proceed to resolve it.
Georgia: 8:38
Georgia: 17:55
Alabama: 15:30
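A rough way to answer the question above from the log alone: the crash itself is reported by the postmaster (postgres[1018] on Georgia) as "server process (PID N) was terminated by signal 11", followed by a DETAIL line naming the statement that process was running. Every other backend that then logs "terminating connection because of crash of another server process" was merely told to exit by the postmaster, so its earlier log lines are most likely collateral rather than cause. The sketch below assumes one syslog entry per line in the layout shown in this thread (the mail client wrapped some of them); summarize_crash, SAMPLE_WINDOW and the regular expressions are hypothetical helpers, not PostgreSQL tooling.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed layout (one entry per line), as in this thread:
# "<time> <host> local0 <prio> postgres[<pid>]: [<seq>] user=..., audit=..., LOG: <message>"
LINE_RE = re.compile(r"postgres\[(?P<pid>\d+)\]: \[[\d-]+\] .*?, (?P<level>[A-Z]+):\s+(?P<msg>.*)$")
CRASH_RE = re.compile(r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)")
FAILED_PREFIX = "Failed process was running: "
COLLATERAL_PREFIX = "terminating connection because of crash of another server process"

# Lines copied from the Georgia 17:55 excerpt in this thread, unwrapped.
SAMPLE_WINDOW = [
    "2015-02-11T17:55:13.732147-05:00 georgia local0 info postgres[38321]: [7236-1] user=ace_db_client, "
    "db=ace_db, proc=38321, audit=dbm_client9, LOG: duration: 4.384 ms statement: COMMIT",
    "2015-02-11T17:55:13.833624-05:00 georgia local0 info postgres[1018]: [11-1] user=, db=, proc=1018, "
    "audit=, LOG: server process (PID 38319) was terminated by signal 11: Segmentation fault",
    "2015-02-11T17:55:13.833669-05:00 georgia local0 info postgres[1018]: [11-2] user=, db=, proc=1018, "
    "audit=, DETAIL: Failed process was running: "
    "SELECT * FROM cc.register_port_sip_user($1, $2, $3, $4, $5, $6, $7, $8, $9, $10 )",
    "2015-02-11T17:55:13.833896-05:00 georgia local0 notice postgres[38321]: [7237-1] user=ace_db_client, "
    "db=ace_db, proc=38321, audit=dbm_client9, WARNING: terminating connection because of crash of another "
    "server process",
]


def summarize_crash(lines: Iterable[str]) -> Tuple[Optional[int], Optional[str], List[int]]:
    """Split one crash window into (crashed PID, its statement, collateral PIDs)."""
    crashed_pid: Optional[int] = None
    failed_stmt: Optional[str] = None
    collateral = set()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        msg = m.group("msg")
        crash = CRASH_RE.search(msg)
        if crash:
            # Logged by the postmaster; the PID in the text is the backend that died.
            crashed_pid = int(crash.group("pid"))
        elif msg.startswith(FAILED_PREFIX):
            failed_stmt = msg[len(FAILED_PREFIX):]
        elif msg.startswith(COLLATERAL_PREFIX):
            # A healthy backend told to exit by the postmaster after the crash.
            collateral.add(int(m.group("pid")))
    return crashed_pid, failed_stmt, sorted(collateral)


if __name__ == "__main__":
    pid, stmt, others = summarize_crash(SAMPLE_WINDOW)
    print("crashed:", pid, stmt)
    print("collateral:", others)
```

Run against the Georgia 17:55 lines, it attributes the crash to PID 38319 and its cc.register_port_sip_user call, and classifies PID 38321 (which had just logged a COMMIT) as collateral.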
--
If anyone sees something in these core file backtraces that suggests a direction to pursue, it would be much appreciated.
Thanks
Dave
Georgia - Core 17:55 - Feb 11
(gdb) bt
#0 0x00000000006f8670 in SearchCatCache ()
#1 0x0000000000672537 in enum_in ()
#2 0x000000000071375b in InputFunctionCall ()
#3 0x0000000000713b7e in OidInputFunctionCall ()
#4 0x0000000000509a3d in coerce_type ()
#5 0x0000000000511af3 in make_fn_arguments ()
#6 0x0000000000513fed in make_op ()
#7 0x000000000050f53b in ?? ()
#8 0x000000000050d706 in transformExpr ()
#9 0x0000000000518333 in transformTargetList ()
#10 0x00000000004f02bc in transformStmt ()
#11 0x000000000064109d in pg_analyze_and_rewrite_params ()
#12 0x00000000006fbc6b in ?? ()
#13 0x00000000006fb6f5 in GetCachedPlan ()
#14 0x000000000059597a in SPI_plan_get_cached_plan ()
#15 0x00000008024ed34d in ?? () from /usr/local/lib/postgresql/plpgsql.so
#16 0x00000008024f2590 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#17 0x00000008024ee0d0 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#18 0x00000008024eaf3b in ?? () from /usr/local/lib/postgresql/plpgsql.so
#19 0x00000008024ea243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#20 0x00000008024e6551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#21 0x000000000057611f in ExecMakeTableFunctionResult ()
#22 0x000000000058b6c7 in ?? ()
#23 0x000000000057bab2 in ExecScan ()
#24 0x00000000005756b8 in ExecProcNode ()
#25 0x0000000000573630 in standard_ExecutorRun ()
#26 0x0000000000645b0a in ?? ()
#27 0x0000000000645719 in PortalRun ()
#28 0x00000000006438ea in PostgresMain ()
#29 0x00000000005ff267 in PostmasterMain ()
#30 0x00000000005a31ba in main ()
(gdb) info threads
Id Target Id Frame
* 2 Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()
* 1 Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()
? The gdb "info threads" response is still a puzzling piece of information. Attaching gdb to a healthy running postmaster reports the same thread count (2) as the core file.
However, other system tools that show a process's thread count (top, ps) report only one thread for the healthy process, so I think this is a debugger bug.
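The two "info threads" rows can be compared mechanically. Both report the same thread handle (802c06400) and the same LWP id (100070), which is consistent with gdb listing a single kernel thread twice rather than the backend having a second thread. A small sketch, assuming gdb's "info threads" layout as pasted here; threads_by_lwp and SAMPLE are hypothetical names, not part of gdb or PostgreSQL.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches gdb "info threads" rows such as:
# "* 2 Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()"
THREAD_RE = re.compile(r"^\*?\s*(?P<id>\d+)\s+Thread\s+(?P<addr>[0-9a-fx]+)\s+\(LWP (?P<lwp>\d+)\)")

# Output copied from the Georgia 17:55 core in this thread.
SAMPLE = """Id Target Id Frame
* 2 Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()
* 1 Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()"""


def threads_by_lwp(info_threads_output: str) -> Dict[int, List[int]]:
    """Group gdb thread ids by the kernel LWP id they report."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for line in info_threads_output.splitlines():
        m = THREAD_RE.match(line.strip())
        if m:  # the header row does not match and is skipped
            groups[int(m.group("lwp"))].append(int(m.group("id")))
    return dict(groups)


if __name__ == "__main__":
    groups = threads_by_lwp(SAMPLE)
    # A single key means every gdb "thread" maps to the same kernel thread.
    print(groups, "distinct LWPs:", len(groups))
```

The same check applies to the other two cores, whose rows likewise repeat one LWP (101032 and 100574).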
2015-02-11T17:55:13.732147-05:00 georgia local0 info postgres[38321]: [7236-1] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, LOG: duration: 4.384 ms statement: COMMIT
2015-02-11T17:55:13.743399-05:00 georgia local0 info postgres[86738]: [12-1] user=redcom, db=ace_db, proc=86738, audit=[unknown], LOG: duration: 14.581 ms statement: SELECT database, COALESCE(max(extract(epoch FROM CURRENT_TIMESTAMP-prepared)),0) FROM pg_prepared_xacts JOIN pg_database ON datname=database WHERE datname='ace_db' GROUP BY database ORDER BY 1
2015-02-11T17:55:13.833624-05:00 georgia local0 info postgres[1018]: [11-1] user=, db=, proc=1018, audit=, LOG: server process (PID 38319) was terminated by signal 11: Segmentation fault
2015-02-11T17:55:13.833669-05:00 georgia local0 info postgres[1018]: [11-2] user=, db=, proc=1018, audit=, DETAIL: Failed process was running: SELECT * FROM cc.register_port_sip_user($1, $2, $3, $4, $5, $6, $7, $8, $9, $10 )
2015-02-11T17:55:13.833701-05:00 georgia local0 info postgres[1018]: [12-1] user=, db=, proc=1018, audit=, LOG: terminating any other active server processes
2015-02-11T17:55:13.833896-05:00 georgia local0 notice postgres[38321]: [7237-1] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, WARNING: terminating connection because of crash of another server process
2015-02-11T17:55:13.833923-05:00 georgia local0 notice postgres[38321]: [7237-2] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Georgia - Core 8:38 - Feb 11
[New process 101032]
[New Thread 802c06400 (LWP 101032)]
Core was generated by `postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
(gdb) bt
#0 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#1 0x000000080c4cab49 in Perl_sv_clear () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#2 0x000000080c4cb13a in Perl_sv_free2 () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#3 0x000000080c4e5102 in Perl_free_tmps () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#4 0x000000080bcfedea in plperl_destroy_interp () from /usr/local/lib/postgresql/plperl.so
#5 0x000000080bcfec05 in plperl_fini () from /usr/local/lib/postgresql/plperl.so
#6 0x00000000006292c6 in ?? ()
#7 0x000000000062918d in proc_exit ()
#8 0x00000000006443f3 in PostgresMain ()
#9 0x00000000005ff267 in PostmasterMain ()
#10 0x00000000005a31ba in main ()
(gdb) info threads
Id Target Id Frame
* 2 Thread 802c06400 (LWP 101032) 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
* 1 Thread 802c06400 (LWP 101032) 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
Postgres log content:
ation: 0.087 ms statement: UNLISTEN "tbl_changed"
2015-02-11T08:38:48.227368-05:00 georgia local0 info postgres[27177]: [1276-1] user=ace_db_client, db=ace_db, proc=27177, audit=dbm_client6, LOG: duration: 0.152 ms statement: UNLISTEN "tbl_changed"
2015-02-11T08:38:48.246438-05:00 georgia local0 info postgres[27176]: [1262-1] user=ace_db_client, db=ace_db, proc=27176, audit=dbm_client8, LOG: duration: 0.155 ms statement: UNLISTEN "tbl_changed"
2015-02-11T08:38:48.576282-05:00 georgia local0 info postgres[27174]: [388-1] user=ace_db_client, db=ace_db, proc=27174, audit=dbm_client2, LOG: duration: 0.094 ms statement: UNLISTEN "tbl_changed"
2015-02-11T08:38:49.754208-05:00 georgia local0 info postgres[1018]: [7-1] user=, db=, proc=1018, audit=, LOG: server process (PID 27172) was terminated by signal 11: Segmentation fault
2015-02-11T08:38:49.754236-05:00 georgia local0 info postgres[1018]: [8-1] user=, db=, proc=1018, audit=, LOG: terminating any other active server processes
2015-02-11T08:38:49.763667-05:00 georgia local0 notice postgres[19938]: [7-1] user=, db=, proc=19938, audit=, WARNING: terminating connection because of crash of another server process
2015-02-11T08:38:49.763693-05:00 georgia local0 notice postgres[19938]: [7-2] user=, db=, proc=19938, audit=, DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-02-11T08:38:49.763711-05:00 georgia local0 notice postgres[19938]: [7-3] user=, db=, proc=19938, audit=, HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-02-11T08:38:49.769432-05:00 georgia local0 notice postgres[20073]: [9-1] user=redcom, db=ace_db, proc=20073, audit=[unknown], WARNING: terminating connection because of crash of another server process
2015-02-11T08:38:49.769657-05:00 georgia local0 notice postgres[20073]: [9-2] user=redcom, db=ace_db, proc=20073, audit=[unknown], DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memo
Alabama - Core 15:30 - Feb 11
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000801da7883 in ?? () from /lib/libc.so.7
(gdb) bt
#0 0x0000000801da7883 in ?? () from /lib/libc.so.7
#1 0x0000000801da943b in ?? () from /lib/libc.so.7
#2 0x0000000801db457c in free () from /lib/libc.so.7
#3 0x000000000072b739 in ?? ()
#4 0x000000000072b9bd in MemoryContextDelete ()
#5 0x00000000006fbc17 in ?? ()
#6 0x00000000006fb6f5 in GetCachedPlan ()
#7 0x0000000000594eec in ?? ()
#8 0x00000008024ee8c5 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#9 0x00000008024ef3e5 in ?? () from /usr/local/lib/postgresql/plpgsql.so
#10 0x00000008024ebf3b in ?? () from /usr/local/lib/postgresql/plpgsql.so
#11 0x00000008024eb243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#12 0x00000008024e7551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#13 0x000000000057611f in ExecMakeTableFunctionResult ()
#14 0x000000000058b6c7 in ?? ()
#15 0x000000000057bab2 in ExecScan ()
#16 0x00000000005756b8 in ExecProcNode ()
#17 0x0000000000573630 in standard_ExecutorRun ()
#18 0x0000000000645b0a in ?? ()
#19 0x0000000000645719 in PortalRun ()
#20 0x00000000006438ea in PostgresMain ()
#21 0x00000000005ff267 in PostmasterMain ()
#22 0x00000000005a31ba in main ()
(gdb) info threads
Id Target Id Frame
* 2 Thread 802c06400 (LWP 100574) 0x0000000801da7883 in ?? () from /lib/libc.so.7
* 1 Thread 802c06400 (LWP 100574) 0x0000000801da7883 in ?? () from /lib/libc.so.7
2015-02-11T15:16:19.029980-05:00 alabama local0 warning postgres[1980]: [7-6] #011
2015-02-11T15:16:19.029989-05:00 alabama local0 warning postgres[1980]: [7-7] #011
2015-02-11T15:16:19.030000-05:00 alabama local0 warning postgres[1980]: [7-8] #011
2015-02-11T15:30:44.991096-05:00 alabama local0 info postgres[54202]: [3-1] user=, db=, proc=54202, audit=, LOG: server process (PID 87242) was terminated by signal 11: Segmentation fault
2015-02-11T15:30:44.991122-05:00 alabama local0 info postgres[54202]: [3-2] user=, db=, proc=54202, audit=, DETAIL: Failed process was running: SELECT * FROM cc.get_port_and_registration_data($1, $2, $3, $4, $5)
2015-02-11T15:30:44.991175-05:00 alabama local0 info postgres[54202]: [4-1] user=, db=, proc=54202, audit=, LOG: terminating any other active server processes
2015-02-11T15:30:45.004506-05:00 alabama local0 notice postgres[87241]: [3-1] user=ace_db_client, db=ace_db, proc=87241, audit=dbm_client5, WARNING: terminating connection because of crash of another server process
2015-02-11T15:30:45.004567-05:00 alabama local0 notice postgres[87241]: [3-2] user=ace_db_client, db=ace_db, proc=87241, audit=dbm_client5, DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-02-11T15:30:45.129123-05:00 alabama local0 notice postgres[87241]: [3-3] user=ace_db_client, db=ace_db, proc=87241, audit=dbm_client5, HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-02-11T15:30:45.129437-05:00 alabama local0 notice postgres[87238]: [3-1] user=ace_db_client, db=ace_db, proc=87238, audit=dbm_client2, WARNING: terminating connection because of crash of another server process
Guy,
No, I had not seen that bug report before. ( https://rt.perl.org/Public/Bug/Display.html?id=122199 )
We did migrate from FreeBSD 9.x (9.2?), and I think it is true
that we were not experiencing the problem at that time.
So it may be a good fit/explanation for our current experience.
There were a couple of suggestions to follow up on.
I’ll keep the thread updated.
Thanks, a good start to my Friday the 13th.
Regards
Dave Day
From: Guy Helmer [mailto:ghelmer@palisadesystems.com]
Sent: Thursday, February 12, 2015 6:19 PM
To: Day, David
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?
On Feb 12, 2015, at 3:21 PM, Day, David <dday@redcom.com> wrote:
. . .
Given that two of the core dumps are down in libperl and this is FreeBSD 10.0 amd64, have you seen this?
Michael Moll suggested setting vm.pmap.pcid_enabled to 0, but I don’t recall seeing whether that helped.
Guy
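Guy's reading of where the cores died can be checked per backtrace by tallying which shared object each frame comes from; gdb prints no "from" clause for frames in the postgres executable itself. A sketch assuming gdb's bt format as pasted in this thread (feed it only the lines after "(gdb) bt", since frame #0 is also echoed above that prompt); frames_by_object and SAMPLE_BT are hypothetical names.

```python
import re
from collections import Counter

# Matches gdb backtrace frames such as
# "#4 0x000000080bcfedea in plperl_destroy_interp () from /usr/local/lib/postgresql/plperl.so"
FRAME_RE = re.compile(r"^#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) \(\)(?: from (?P<lib>\S+))?")

# The Georgia 8:38 backtrace from this thread.
SAMPLE_BT = """\
#0 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#1 0x000000080c4cab49 in Perl_sv_clear () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#2 0x000000080c4cb13a in Perl_sv_free2 () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#3 0x000000080c4e5102 in Perl_free_tmps () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18
#4 0x000000080bcfedea in plperl_destroy_interp () from /usr/local/lib/postgresql/plperl.so
#5 0x000000080bcfec05 in plperl_fini () from /usr/local/lib/postgresql/plperl.so
#6 0x00000000006292c6 in ?? ()
#7 0x000000000062918d in proc_exit ()
#8 0x00000000006443f3 in PostgresMain ()
#9 0x00000000005ff267 in PostmasterMain ()
#10 0x00000000005a31ba in main ()"""


def frames_by_object(bt_text: str) -> Counter:
    """Count backtrace frames per shared object (basename); no 'from' means the main executable."""
    counts: Counter = Counter()
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            lib = m.group("lib") or "postgres (main executable)"
            counts[lib.rsplit("/", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    print(frames_by_object(SAMPLE_BT).most_common())
```

For the Georgia 8:38 backtrace it reports four libperl.so.5.18 frames, two plperl.so frames and five frames in the postgres executable; the frame order shows the fault was reached from proc_exit via plperl_fini.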
Update/Information sharing: (FreeBSD 10.0 (amd64), Postgres 9.3.5, Perl 5.18)
I have converted our Postgres plperlu functions to plpython2u to see if the postgres segmentation faults disappear.
Lacking a known way to reproduce the error on demand, I will have to wait a few weeks without the symptom before I can conclude that the bug Guy Helmer pointed me to was the issue. Migrating/upgrading to FreeBSD 10.1 was not an immediate option.
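Because success here can only be judged by the absence of the symptom, it may help to tally the postmaster's "terminated by signal 11" reports per host and day from syslog, so a crash-free stretch can be stated concretely. A sketch assuming the syslog layout shown earlier in the thread (ISO-8601 timestamp, then host name, one entry per line); segfaults_per_host_day and SAMPLE_LOG are hypothetical names, and the function counts only what the postmaster logged, not core files.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes lines like
# "2015-02-11T15:30:44.991096-05:00 alabama local0 info postgres[54202]: ... LOG: server process (PID 87242) ..."
SEGV_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})T\S+\s+(?P<host>\S+)\s.*postgres\[\d+\]: .*"
    r"server process \(PID \d+\) was terminated by signal 11\b"
)

# Crash reports from this thread, plus one collateral WARNING that must not be counted.
SAMPLE_LOG = [
    "2015-02-11T08:38:49.754208-05:00 georgia local0 info postgres[1018]: [7-1] user=, db=, proc=1018, "
    "audit=, LOG: server process (PID 27172) was terminated by signal 11: Segmentation fault",
    "2015-02-11T17:55:13.833624-05:00 georgia local0 info postgres[1018]: [11-1] user=, db=, proc=1018, "
    "audit=, LOG: server process (PID 38319) was terminated by signal 11: Segmentation fault",
    "2015-02-11T15:30:44.991096-05:00 alabama local0 info postgres[54202]: [3-1] user=, db=, proc=54202, "
    "audit=, LOG: server process (PID 87242) was terminated by signal 11: Segmentation fault",
    "2015-02-11T15:30:45.004506-05:00 alabama local0 notice postgres[87241]: [3-1] user=ace_db_client, "
    "db=ace_db, proc=87241, audit=dbm_client5, WARNING: terminating connection because of crash of "
    "another server process",
]


def segfaults_per_host_day(lines: Iterable[str]) -> Counter:
    """Count postmaster 'terminated by signal 11' reports per (host, date)."""
    counts: Counter = Counter()
    for line in lines:
        m = SEGV_RE.match(line)
        if m:
            counts[(m.group("host"), m.group("date"))] += 1
    return counts


if __name__ == "__main__":
    for (host, day), n in sorted(segfaults_per_host_day(SAMPLE_LOG).items()):
        print(host, day, n)
```

Days that never appear in the result are days with no logged segfault on that host, which is the evidence the conversion needs.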
Regards
Dave
Hi,
Update: a story with a happy ending.
I have not seen this segmentation fault since converting the plperlu functions to Python within the
FreeBSD 9.x environment, so I believe Guy Helmer’s suggested cause was likely spot on.
Because the issue cannot be reproduced on demand, there is a small chance this is not the root cause, but I’ll let the current empirical health of the system speak loudest on this matter.
We are in the process of migrating development efforts to 10.x, so the choice of Perl over Python should become a non-issue.
Thanks to all who helped me figure this out.
Best Regards
Dave
From: Day, David
Sent: Wednesday, February 18, 2015 8:07 AM
To: 'Guy Helmer'
Cc: 'pgsql-general@postgresql.org'
Subject: RE: [GENERAL] segmentation fault postgres 9.3.5 core dump perlu related ?
. . .