Elusive segfault with 9.3.5 & query cancel - Mailing list pgsql-hackers

From Josh Berkus
Subject Elusive segfault with 9.3.5 & query cancel
Date
Msg-id 54821B9A.4060403@agliodbs.com
List pgsql-hackers
Hackers,

This report is not complete enough for a diagnosis.  I'm posting it
here in case someone else sees something similar, since an additional
report may help figure out the underlying issue.

* 700GB database with around 5,000 writes per second
* 8 replicas handling around 10,000 read queries per second each
* replicas are slammed (40-70% utilization)
* replication produces lots of query cancels on the replicas

In this scenario, a specific query against some of the less busy and
fairly small tables would produce a segfault (signal 11) once every 1-4
days, at random.  The query could have hundreds of successful runs for
every segfault.  It was not reproducible manually, and the segfaults
never happened on the master.  Nor did we ever see a segfault from any
other query, including queries against the tables which were generally
the source of the query cancels.

In case it's relevant, the query included use of regexp_split_to_array()
and ORDER BY random(), neither of which is generally used in the user's
other queries.

We made some changes which reduced the number of query cancels
(optimizing queries, turning on hot_standby_feedback) and we haven't
seen a segfault since then.  As far as the user is concerned, this
solves the problem, so I'm never going to get a trace or a core dump
file.
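
For anyone who wants to try the same mitigation, the settings change
would look roughly like the fragment below on each replica.  This is
only a sketch, not the user's actual configuration: hot_standby_feedback
is the only setting named above, and the trade-off is that it can delay
vacuum cleanup on the master.

```
# postgresql.conf on each hot-standby replica (illustrative sketch)

# Report the replica's oldest snapshot back to the master, so that
# vacuum does not remove rows still needed by queries on the replica.
# This reduces cancels caused by snapshot conflicts.
hot_standby_feedback = on
```

The setting takes effect on a configuration reload.  Per-database
counts of cancelled queries on a replica can be checked in the
pg_stat_database_conflicts view.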

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


