Elusive segfault with 9.3.5 & query cancel

From: Josh Berkus

Hackers,

This is not a complete enough report for a diagnosis.  I'm posting it
here just in case someone else sees something like it, and having an
additional report will help figure out the underlying issue.

* 700GB database with around 5,000 writes per second
* 8 replicas handling around 10,000 read queries per second each
* replicas are slammed (40-70% utilization)
* replication conflicts produce lots of query cancels

In this scenario, a specific query against some of the less busy and
fairly small tables would produce a segfault (signal 11) once every 1-4
days, seemingly at random.  This query could have hundreds of successful
runs for every segfault.  It was not reproducible manually, and the
segfaults never happened on the master.  Nor did we ever see a segfault
from any other query, including queries against the tables which were
generally the source of the query cancels.

In case it's relevant, the query included use of regexp_split_to_array()
and ORDER BY random(), neither of which are generally used in the user's
other queries.

We made some changes which decreased query cancels (optimizing queries,
turning on hot_standby_feedback) and we haven't seen a segfault since
then.  As far as the user is concerned, this solves the problem, so I'm
never going to get a trace or a core dump file.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Elusive segfault with 9.3.5 & query cancel

From: Josh Berkus

On 12/05/2014 12:54 PM, Josh Berkus wrote:
> Hackers,
> 
> This is not a complete enough report for a diagnosis.  I'm posting it
> here just in case someone else sees something like it, and having an
> additional report will help figure out the underlying issue.
> 
> * 700GB database with around 5,000 writes per second
> * 8 replicas handling around 10,000 read queries per second each
> * replicas are slammed (40-70% utilization)
> * replication conflicts produce lots of query cancels
> 
> In this scenario, a specific query against some of the less busy and
> fairly small tables would produce a segfault (signal 11) once every 1-4
> days, seemingly at random.  This query could have hundreds of successful
> runs for every segfault.  It was not reproducible manually, and the
> segfaults never happened on the master.  Nor did we ever see a segfault
> from any other query, including queries against the tables which were
> generally the source of the query cancels.
> 
> In case it's relevant, the query included use of regexp_split_to_array()
> and ORDER BY random(), neither of which are generally used in the user's
> other queries.
> 
> We made some changes which decreased query cancels (optimizing queries,
> turning on hot_standby_feedback) and we haven't seen a segfault since
> then.  As far as the user is concerned, this solves the problem, so I'm
> never going to get a trace or a core dump file.

Forgot a major piece of evidence as to why I think this is related to
query cancel:  in each case, the segfault was preceded by a
multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
the backend running the query which segfaulted was the only backend
*not* cancelled by the concurrent query conflict.  Contradicting this,
there are other multi-backend query cancels in the logs which did NOT
produce a segfault.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Elusive segfault with 9.3.5 & query cancel

From: Peter Geoghegan

On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> We made some changes which decreased query cancels (optimizing queries,
>> turning on hot_standby_feedback) and we haven't seen a segfault since
>> then.  As far as the user is concerned, this solves the problem, so I'm
>> never going to get a trace or a core dump file.
>
> Forgot a major piece of evidence as to why I think this is related to
> query cancel:  in each case, the segfault was preceded by a
> multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
> the backend running the query which segfaulted was the only backend
> *not* cancelled by the concurrent query conflict.  Contradicting this,
> there are other multi-backend query cancels in the logs which did NOT
> produce a segfault.

I wonder if it would be useful to add additional instrumentation so
that even without a core dump, there would be some cursory information
about the nature of a segfault.

Yes, doing something with a SIGSEGV handler is very scary, and there
are major portability concerns (e.g.
https://bugs.ruby-lang.org/issues/9654), but I believe it can be made
robust on Linux. For what it's worth, this open source project offers
that kind of functionality in the form of a library:
https://github.com/vmarkovtsev/DeathHandler
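
Roughly the shape I have in mind - a minimal standalone sketch, not
PostgreSQL code; it's glibc-specific, and backtrace_symbols_fd() isn't
strictly async-signal-safe, so this is best-effort by definition:

    #include <execinfo.h>
    #include <signal.h>
    #include <unistd.h>

    static void
    segv_handler(int signo)
    {
        void   *frames[64];
        int     nframes;

        /* Collect raw return addresses; no malloc() inside the handler. */
        nframes = backtrace(frames, 64);
        backtrace_symbols_fd(frames, nframes, STDERR_FILENO);

        /* Re-raise with the default action so the kernel still sees the crash. */
        signal(signo, SIG_DFL);
        raise(signo);
    }

    int
    main(void)
    {
        signal(SIGSEGV, segv_handler);
        *(volatile int *) 0 = 0;    /* deliberate fault to exercise the handler */
        return 0;
    }

In a backend, that output would normally land in the server log, which
is exactly the kind of cursory information I mean.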

-- 
Peter Geoghegan



Re: Elusive segfault with 9.3.5 & query cancel

From: Jim Nasby

On 12/5/14, 4:11 PM, Peter Geoghegan wrote:
> On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> We made some changes which decreased query cancels (optimizing queries,
>>> turning on hot_standby_feedback) and we haven't seen a segfault since
>>> then.  As far as the user is concerned, this solves the problem, so I'm
>>> never going to get a trace or a core dump file.
>>
>> Forgot a major piece of evidence as to why I think this is related to
>> query cancel:  in each case, the segfault was preceded by a
>> multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
>> the backend running the query which segfaulted was the only backend
>> *not* cancelled by the concurrent query conflict.  Contradicting this,
>> there are other multi-backend query cancels in the logs which did NOT
>> produce a segfault.
>
> I wonder if it would be useful to add additional instrumentation so
> that even without a core dump, there would be some cursory information
> about the nature of a segfault.
>
> Yes, doing something with a SIGSEGV handler is very scary, and there
> are major portability concerns (e.g.
> https://bugs.ruby-lang.org/issues/9654), but I believe it can be made
> robust on Linux. For what it's worth, this open source project offers
> that kind of functionality in the form of a library:
> https://github.com/vmarkovtsev/DeathHandler

Perhaps we should also officially recommend production servers be set up to create core files.  AFAIK the only downside
is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid
writing those?  (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely
enough that the tradeoff is worth it...)
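
FWIW, the server could also raise its own core-file limit at startup rather than relying entirely on the init scripts.
A minimal sketch of the idea (the hard limit and the kernel's core_pattern policy still apply, so this is only half the
battle):

    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rlimit rl;

        /* Raise the soft core-file limit as far as the hard limit allows. */
        if (getrlimit(RLIMIT_CORE, &rl) == 0)
        {
            rl.rlim_cur = rl.rlim_max;
            if (setrlimit(RLIMIT_CORE, &rl) != 0)
                perror("setrlimit");
        }
        return 0;
    }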
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Elusive segfault with 9.3.5 & query cancel

From: Peter Geoghegan

On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Perhaps we should also officially recommend production servers be set up to
> create core files. AFAIK the only downside is the time it would take to
> write a core that's huge because of shared buffers

I don't think that's ever going to be practical.

-- 
Peter Geoghegan



Re: Elusive segfault with 9.3.5 & query cancel

From: Tom Lane

Peter Geoghegan <pg@heroku.com> writes:
> On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Perhaps we should also officially recommend production servers be set up to
>> create core files. AFAIK the only downside is the time it would take to
>> write a core that's huge because of shared buffers

> I don't think that's ever going to be practical.

I'm fairly sure that on some distros (Red Hat, at least) there is distro
policy against having daemons produce core dumps by default, for multiple
reasons including possible disk space consumption and leakage of sensitive
information.  So even if we recommended this, the recommendation would be
overridden by some/many packagers.

There is much to be said though for trying to emit at least a minimal
stack trace into the postmaster log file.  I'm pretty sure glibc has a
function for that; dunno if it's going to be practical on other platforms.
        regards, tom lane



Re: Elusive segfault with 9.3.5 & query cancel

From: Josh Berkus

On 12/05/2014 02:41 PM, Jim Nasby wrote:
> Perhaps we should also officially recommend production servers be set up
> to create core files. AFAIK the only downside is the time it would take
> to write a core that's huge because of shared buffers, but perhaps
> there's some way to avoid writing those? (That means the core won't help
> if the bug is due to something in a buffer, but that seems unlikely
> enough that the tradeoff is worth it...)

Not practical in a lot of cases.  For example, this user was unwilling
to enable core dumps on the production replicas because writing out the
16GB of shared buffers they had took over 10 minutes in a test.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Elusive segfault with 9.3.5 & query cancel

From: Peter Geoghegan

On Fri, Dec 5, 2014 at 3:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
> to enable core dumps on the production replicas because writing out the
> 16GB of shared buffers they had took over 10 minutes in a test.

No one ever thinks it'll happen to them anyway - recommending enabling
core dumps seems like a waste of time, since as Tom mentioned packagers
shouldn't be expected to get on board with that plan.  I think a
zero-overhead backtrace feature from within a SIGSEGV handler (with
appropriate precautions around corrupt/exhausted call stacks) using
glibc is the right thing here.

Indeed, glibc does have infrastructure that can be used to get a
backtrace [1], which is probably what we'd end up using, but even
POSIX has infrastructure like sigaltstack(). It can be done.

[1] https://www.gnu.org/software/libc/manual/html_node/Backtraces.html
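
To make the sigaltstack() point concrete, a rough standalone sketch (not
a patch); the SA_ONSTACK part is what keeps the handler usable even when
the faulting backend has exhausted its own stack:

    #include <execinfo.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void
    segv_handler(int signo, siginfo_t *info, void *context)
    {
        void   *frames[64];
        int     nframes;

        (void) info;
        (void) context;

        /* Same best-effort backtrace as before, written straight to stderr. */
        nframes = backtrace(frames, 64);
        backtrace_symbols_fd(frames, nframes, STDERR_FILENO);

        signal(signo, SIG_DFL);
        raise(signo);
    }

    int
    main(void)
    {
        stack_t          ss;
        struct sigaction sa;

        /* Dedicated stack for the handler, so stack exhaustion isn't fatal to it. */
        ss.ss_sp = malloc(SIGSTKSZ);
        ss.ss_size = SIGSTKSZ;
        ss.ss_flags = 0;
        sigaltstack(&ss, NULL);

        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        *(volatile int *) 0 = 0;    /* deliberate fault */
        return 0;
    }
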
-- 
Peter Geoghegan



Re: Elusive segfault with 9.3.5 & query cancel

From: Jim Nasby

On 12/5/14, 5:49 PM, Josh Berkus wrote:
> On 12/05/2014 02:41 PM, Jim Nasby wrote:
>> Perhaps we should also officially recommend production servers be set up
>> to create core files. AFAIK the only downside is the time it would take
>> to write a core that's huge because of shared buffers, but perhaps
>> there's some way to avoid writing those? (That means the core won't help
>> if the bug is due to something in a buffer, but that seems unlikely
>> enough that the tradeoff is worth it...)
>
> Not practical in a lot of cases.  For example, this user was unwilling
> to enable core dumps on the production replicas because writing out the
> 16GB of shared buffers they had took over 10 minutes in a test.

Which is why I wondered if there's a way to avoid writing out shared buffers...

But at least getting a stack trace would be a big start.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Elusive segfault with 9.3.5 & query cancel

From: Richard Frith-Macdonald

On 5 Dec 2014, at 22:41, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
>
> Perhaps we should also officially recommend production servers be set up to create core files.  AFAIK the only downside
> is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid
> writing those?  (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely
> enough that the tradeoff is worth it...)

Good idea.  It seems the madvise() system call (with MADV_DONTDUMP) is exactly what's needed to avoid dumping shared
buffers.
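
Something along these lines against the shared memory region ought to do
it - a rough standalone sketch only, with an anonymous mapping standing in
for the real shared-buffers segment (MADV_DONTDUMP is Linux-specific,
kernel 3.4 or later):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int
    main(void)
    {
        size_t  len = 16UL * 1024 * 1024;   /* stand-in for shared_buffers */
        void   *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
        {
            perror("mmap");
            return 1;
        }

        /* Exclude this mapping from any core dump the process produces. */
        if (madvise(buf, len, MADV_DONTDUMP) != 0)
            perror("madvise");

        return 0;
    }

The mapping stays fully usable by the process; the kernel just skips it
when writing the core.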