Re: [BUGS] BUG #14781: server process was terminated by signal 11:Segmentation fault - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: [BUGS] BUG #14781: server process was terminated by signal 11:Segmentation fault
Date
Msg-id 20170816164737.mvd3dl4xgk2ofoia@alvherre.pgsql
In response to Re: [BUGS] BUG #14781: server process was terminated by signal 11: Segmentation fault  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUGS] BUG #14781: server process was terminated by signal 11: Segmentation fault  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom Lane wrote:
> Maksim Karaba <Maksim_Karaba@epam.com> writes:
> > Unfortunately we cannot reproduce this issue on other servers, only on production system.
> > And we cannot provide internal database info, schema structure and tables info.
> 
> [ shrug... ]  We may just have to wait for somebody to be more
> forthcoming.
> 
> FWIW, the stack trace seems to indicate that an incorrect plan has been
> generated, ie one that has a remote join node without an EPQ recheck
> subplan.  That mistake in itself is probably pretty deterministic.  The
> reason you can't reproduce the crash easily is that the lack of a subplan
> only manifests as a crash if we enter the EPQ recheck code, and that only
> happens if the query tries to update a row that's just been updated by
> some concurrent query.  So it's not going to crash except under concurrent
> load, which probably also explains why the bug wasn't found long ago.

One way to figure out the exact bug is to explore the sequence of WAL
records that leads to the tuple causing the crash; it should be possible
to create a reproducer by writing an isolationtester script that
produces the same WAL sequence.  That's how we found the bug fixed in
https://git.postgresql.org/pg/commitdiff/459c64d3227f8 for example.

> If you want to push this forward rather than wait for somebody else
> to hit the problem, you could try adding something like
> 
>     if (fsplan->scan.scanrelid == 0 && outerPlanState(node) == NULL &&
>         (estate->es_plannedstmt->commandType != CMD_SELECT ||
>          estate->es_rowMarks))
>         elog(WARNING, "foreign join plan lacks EPQ support");
> 
> near the beginning of postgresBeginForeignScan and then running your app
> on a test server.

Hmm, is there a reason this cannot always be included as a sanity check?
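
To make the condition in the quoted snippet concrete, here is a minimal
standalone sketch in C.  It is only a model: ForeignScanModel, its fields,
and both function names are hypothetical stand-ins for the executor state
that the real check reads (fsplan->scan.scanrelid, outerPlanState(node),
estate->es_plannedstmt->commandType, estate->es_rowMarks), and fprintf
stands in for elog(WARNING, ...).  It does not use any PostgreSQL headers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins, NOT the real PostgreSQL definitions. */
typedef enum { CMD_SELECT, CMD_UPDATE, CMD_DELETE } CmdType;

typedef struct
{
    unsigned scanrelid;      /* models fsplan->scan.scanrelid; 0 = remote join */
    bool     has_outer_plan; /* models outerPlanState(node) != NULL */
    CmdType  command_type;   /* models estate->es_plannedstmt->commandType */
    bool     has_row_marks;  /* models estate->es_rowMarks being non-empty */
} ForeignScanModel;

/*
 * True when the modeled plan is a remote join without an EPQ recheck
 * subplan, in a query that could enter the EPQ recheck code (anything
 * other than a plain SELECT, or a SELECT with row marks).
 */
static bool
foreign_join_lacks_epq_support(const ForeignScanModel *fs)
{
    return fs->scanrelid == 0 &&
           !fs->has_outer_plan &&
           (fs->command_type != CMD_SELECT || fs->has_row_marks);
}

/* Emit a warning for a suspicious plan; returns true if one was emitted. */
static bool
warn_if_lacks_epq_support(const ForeignScanModel *fs)
{
    if (!foreign_join_lacks_epq_support(fs))
        return false;
    fprintf(stderr, "WARNING: foreign join plan lacks EPQ support\n");
    return true;
}
```

In this model, an UPDATE over a remote join with no outer plan trips the
warning, while a plain SELECT without row marks does not, matching the
reasoning that only queries able to reach the EPQ recheck are at risk.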

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

