On Tue, Jan 3, 2017 at 1:05 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
> > Regardless, it seems like you might be on to something, and I'm
> > inclined to patch your change, test it, and roll it out to production.
> > If it helps or at least narrows the problem down, we ought to give it
> > consideration for inclusion (unless someone else can think of a good
> > reason not to do that, heh!).
>
> Any results yet?

Not yet, but I do have some interesting findings. At this point I
do not think the problem is within pl/sh itself; rather, when a
process invoked from pl/sh misbehaves, that misbehavior can
penetrate into the database processes. I also believe that this
problem is fd related, so that setting 'close on exec' might
reasonably fix it. All cases of database damage I have observed
remain completely mitigated by enabling database checksums.

Recently, a sqsh process kicked off via pl/sh crashed with signal 11,
but the database process was otherwise intact and fine. This is
strong supporting evidence for my points above, I think. I've also
turned up a fairly reliable reproduction case from some unrelated
application changes. If I can demonstrate that the 'close on exec'
flag works and prevents these occurrences, we can close the book on
this.
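
For anyone following along, here is a minimal sketch of what 'close on
exec' means at the fd level, assuming a POSIX system. set_cloexec() is
a hypothetical helper name for illustration only; it is not taken from
the pl/sh patch under discussion.

```c
#include <fcntl.h>

/*
 * Hypothetical helper (not from pl/sh): mark fd so that the kernel
 * closes it automatically when this process calls exec*().
 * Returns -1 on error, as fcntl() does.
 */
int set_cloexec(int fd)
{
    /* Read the current fd flags so other flags are preserved. */
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;

    /* Add FD_CLOEXEC and write the flags back. */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

A descriptor flagged this way is not inherited by a program started
via exec, so a misbehaving child cannot read or write through it.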

merlin