Re: [HACKERS] emergency outage requiring database restart - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: [HACKERS] emergency outage requiring database restart
Date
Msg-id CAHyXU0xY8EUDRnXmeZ9OXD5EpM+vXfAuvOwJTUHNHpA-AV=L_Q@mail.gmail.com
In response to Re: [HACKERS] emergency outage requiring database restart  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On Tue, Jan 3, 2017 at 1:05 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
> > Regardless, it seems like you might be on to something, and I'm
> > inclined to patch your change, test it, and roll it out to production.
> > If it helps or at least narrows the problem down, we ought to give it
> > consideration for inclusion (unless someone else can think of a good
> > reason not to do that, heh!).
>
> Any results yet?

Not yet, but I do have some interesting findings.  At this point I
do not think the problem is within pl/sh itself; rather, when a
process invoked from pl/sh misbehaves, that misbehavior can
penetrate into the database processes.  I also believe that this
problem is fd related, so the 'close on exec' flag might reasonably
fix it.  All cases of database damage I have observed remain
completely mitigated by enabling database checksums.
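
For illustration only, here is a minimal sketch of what the close on
exec flag does.  It is written in Python for brevity rather than the C
that PostgreSQL and pl/sh use, it is POSIX-only, and it is not the
proposed patch.  The helper names (set_cloexec, fd_survives_exec, demo)
are hypothetical.  Python opens descriptors non-inheritable by default,
so the sketch first makes the descriptor inheritable to mimic a C
descriptor opened without O_CLOEXEC.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def set_cloexec(fd):
    """Set FD_CLOEXEC on fd while preserving its other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def fd_survives_exec(fd):
    """Exec a child process and report whether fd still refers to the
    same file inside it (compared by device and inode number)."""
    st = os.fstat(fd)
    probe = (
        "import os, sys\n"
        "fd, dev, ino = map(int, sys.argv[1:4])\n"
        "try:\n"
        "    st = os.fstat(fd)\n"
        "except OSError:\n"
        "    sys.exit(1)\n"
        "sys.exit(0 if (st.st_dev, st.st_ino) == (dev, ino) else 1)\n"
    )
    # close_fds=False so that only the close-on-exec flag decides
    # whether the child inherits the descriptor.
    result = subprocess.run(
        [sys.executable, "-c", probe, str(fd), str(st.st_dev), str(st.st_ino)],
        close_fds=False,
    )
    return result.returncode == 0


def demo():
    """Return (leaked_without_flag, leaked_with_flag) for a scratch file."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        # Assumption: mimic a descriptor opened in C without O_CLOEXEC.
        os.set_inheritable(fd, True)
        leaked_without_flag = fd_survives_exec(fd)
        set_cloexec(fd)
        leaked_with_flag = fd_survives_exec(fd)
    return leaked_without_flag, leaked_with_flag
```

Without the flag the child ends up holding the parent's descriptor and
can read or write through it; with the flag set, the kernel closes the
descriptor at exec time, so a misbehaving child has nothing to scribble
on.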

Recently, a sqsh process kicked off via pl/sh crashed with signal 11,
but the database process was otherwise intact and fine.  This is
strong supporting evidence for my points above, I think.  I've also
turned up a fairly reliable reproduction case from some unrelated
application changes.  If I can demonstrate that the close on exec flag
works and prevents these occurrences, we can close the book on this.

merlin


