Segmentation fault with core dump - Mailing list pgsql-general

From Joshua Berry
Subject Segmentation fault with core dump
Date
Msg-id CAPmZXM03MEDEn6nqqf_Phs3M1DK-EaXP5_K-LmirneOJMAQ-Hg@mail.gmail.com
List pgsql-general
Hi Group,

I'm using PG 9.1.9 with a client application that uses various versions of the psqlODBC driver on Windows. Cursors are used heavily, as are some fairly heavy trigger queries on database writes, which update several materialized views.

The server has 48GB RAM installed. PG is configured with 12GB shared_buffers, 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the other settings are defaults.

The server seg faults at intervals ranging from every few days to two weeks. Each time, one of the postgres server processes is terminated by signal 11, and the server restarts in recovery for up to 30 seconds, after which it accepts connections as if nothing had happened. Unfortunately, all open cursors and connections are lost, so the client apps are left in a bad state.

Seg faults also occurred with PG 8.4. However, that server's Dell OMSA (hardware health monitoring system) began to report RAM parity errors, so I assumed the seg faults were due to hardware issues and did not configure the system to save core files for debugging. I migrated the database to a server running PG 9.1 in the hope that the problem would disappear, but it has not. So now I'm starting to debug.

Below are the relevant details. I'm not terribly savvy with gdb, so please let me know what else I could/should examine from the core dump, as well as anything else about the system/configuration.

Kind Regards,
-Joshua

#NB: some info in square brackets has been [redacted]
# grep postmaster /var/log/messages
Apr 10 13:18:32 [hostname] kernel: postmaster[17356]: segfault at 40 ip 0000000000710e2e sp 00007fffd193ca70 error 4 in postgres[400000+4ea000]
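For readers unfamiliar with the kernel's segfault line, here is a minimal sketch of how its fields can be decoded. It assumes the standard x86 page-fault error-code bits; the function names `decode_pf_error` and `ip_offset` are made up for illustration. The fault address of 0x40 is consistent with reading a struct field through a NULL pointer, though that is an inference rather than something the log states.

```shell
#!/bin/sh
# Decode the x86 page-fault error code printed in a kernel "segfault" line.
# Bit 0: 0 = page not present, 1 = protection violation
# Bit 1: 0 = read access,      1 = write access
# Bit 2: 0 = kernel mode,      1 = user mode
# (decode_pf_error and ip_offset are hypothetical helper names.)
decode_pf_error() {
    code=$1
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    echo "$mode $access, $cause"
}

# Offset of the faulting instruction from the start of the mapping,
# i.e. "ip" minus the base shown in "postgres[base+size]".
ip_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

decode_pf_error 4              # prints: user-mode read, page not present
ip_offset 0x710e2e 0x400000    # prints: 0x310e2e
```

In other words, "error 4" describes a user-mode read of an unmapped page, and the instruction pointer lies 0x310e2e bytes into the postgres mapping.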

gdb /usr/pgsql-9.1/bin/postmaster -c core.17356
[...loading/reading symbols...]
Core was generated by `postgres: [username] [databasename] [client_ipaddress](1500) SELECT              '.
Program terminated with signal 11, Segmentation fault.
#0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:605
605             if (owner->ncatrefs < owner->maxcatrefs)
(gdb) q
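The session above stops at frame #0. A fuller picture of the call chain can usually be captured non-interactively; the following is a rough sketch only, using gdb's `-batch` and `-ex` options. The function name `dump_backtrace` and the default output file `backtrace.txt` are assumptions for illustration, and the output is only useful if debug symbols are available.

```shell
#!/bin/sh
# Dump a full backtrace from a core file without an interactive gdb session.
# Usage: dump_backtrace EXECUTABLE COREFILE [OUTPUT]
# (dump_backtrace is a hypothetical helper name.)
dump_backtrace() {
    exe=$1
    core=$2
    out=${3:-backtrace.txt}   # hypothetical default output file

    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "cannot read executable or core file" >&2
        return 1
    fi

    # -batch makes gdb exit after running the -ex commands in order.
    gdb -batch \
        -ex "bt full" \
        -ex "info registers" \
        -ex "info sharedlibrary" \
        "$exe" "$core" > "$out" 2>&1
}

# Example with the paths from the session above:
# dump_backtrace /usr/pgsql-9.1/bin/postmaster core.17356
```

The resulting text file can be attached to a follow-up message, which is easier to read than a pasted interactive session.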

# uname -a
Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Scientific Linux release 6.3 (Carbon)

# psql -U jberry
psql (9.1.9)
Type "help" for help.

jberry=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)
