Re: emergency outage requiring database restart - Mailing list pgsql-hackers

From Tom Lane
Subject Re: emergency outage requiring database restart
Msg-id 17649.1478008605@sss.pgh.pa.us
In response to Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: emergency outage requiring database restart  (Andres Freund <andres@anarazel.de>)
Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
Merlin Moncure <mmoncure@gmail.com> writes:
> On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa <os@ohmu.fi> wrote:
>> Your production system's postgres backends probably have a lot more open
>> files associated with them than the simple test case does.  Since Postgres
>> likes to keep files open as long as possible and only closes them when you
>> need to free up fds to open new files, it's possible that your production
>> backends have almost all allowed fds used when you execute your pl/sh
>> function.
>>
>> If that's the case, the sqsh process that's executed may not have enough fds
>> to do what it wanted to do and because of busted error handling could end up
>> writing to fds that were opened by Postgres and point to $PGDATA files.

> Does that apply?  The mechanics are a sqsh function that basically does:
> cat foo.sql  | sqsh <args>
> Pipe redirection opens a new process, right?

Yeah, but I doubt that either level of the shell would attempt to close
inherited file handles.
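The inherited-handles point can be checked directly. A descriptor that is not marked close-on-exec (FD_CLOEXEC) stays open across fork() and exec(), so a shell, and anything that shell runs, starts out with the parent's files still open. The sketch below assumes a POSIX system with /bin/sh. Descriptor 9 and the name fd_survives_exec are arbitrary choices made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arbitrary descriptor number for the demo; must match "<&9" below. */
#define DEMO_FD 9

/*
 * Opens /dev/null as DEMO_FD (optionally marked FD_CLOEXEC), then
 * fork()s and exec()s /bin/sh, which tries to redirect from fd 9.
 * Returns 1 if the exec'd shell still had the descriptor open,
 * 0 if it did not, and -1 on any setup error.
 */
static int fd_survives_exec(int set_cloexec)
{
    int fd, status;
    pid_t pid;

    /* Refuse to clobber a descriptor this process is already using. */
    if (fcntl(DEMO_FD, F_GETFD) != -1 || errno != EBADF)
        return -1;

    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    if (fd != DEMO_FD) {
        if (dup2(fd, DEMO_FD) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    if (set_cloexec && fcntl(DEMO_FD, F_SETFD, FD_CLOEXEC) < 0) {
        close(DEMO_FD);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(DEMO_FD);
        return -1;
    }
    if (pid == 0) {
        /* Child: "<&9" fails (non-zero exit) unless fd 9 survived exec. */
        execl("/bin/sh", "sh", "-c", "exec 2>/dev/null; true <&9",
              (char *) NULL);
        _exit(127);
    }

    close(DEMO_FD);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Here fd_survives_exec(0) should return 1 and fd_survives_exec(1) should return 0. Nothing in a plain `cat foo.sql | sqsh` pipeline sets the flag on descriptors the parent already holds.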

The real problem with Oskari's theory is that it requires not merely
busted, but positively brain-dead error handling in the shell and/or
sqsh, i.e., ignoring open() failures altogether.  That seems kind of
unlikely.  Still, I suspect he might be onto something --- there must
be some reason you can reproduce the issue in production and not in
your test bed, and number-of-open-files is as good a theory as I've
heard.
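The number-of-open-files theory can be made concrete. When a process's descriptor table is full, open() returns -1 and sets errno to EMFILE. The sketch below simulates a nearly full table by lowering its own soft RLIMIT_NOFILE; it does not inspect a real backend. It also shows the simplest form of "ignoring open() failures": the -1 is used as a descriptor. In that case write() fails with EBADF and does not land in some other open file. The function names and the limit of 64 are hypothetical, and POSIX getrlimit()/setrlimit() are assumed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_DEMO_FDS 1024

/*
 * Lowers the soft RLIMIT_NOFILE to `limit`, then opens /dev/null until
 * open() fails. Returns the errno of the failed open() (EMFILE once the
 * per-process limit is reached) and stores the number of successful
 * opens in *opened. Everything opened here is closed again and the
 * original limit is restored before returning.
 */
static int open_until_failure(rlim_t limit, int *opened)
{
    struct rlimit orig, lowered;
    int fds[MAX_DEMO_FDS];
    int n = 0, err = 0;

    *opened = 0;
    if (limit > MAX_DEMO_FDS)
        return EINVAL;
    if (getrlimit(RLIMIT_NOFILE, &orig) != 0)
        return errno;
    lowered = orig;
    lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return errno;

    while (n < MAX_DEMO_FDS) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;        /* a careful caller must check for this */
            break;
        }
        fds[n++] = fd;
    }

    for (int i = 0; i < n; i++)
        close(fds[i]);
    /* Raising the soft limit back up to its old value is permitted. */
    (void) setrlimit(RLIMIT_NOFILE, &orig);
    *opened = n;
    return err;
}

/*
 * Simplest form of an ignored open() failure: the -1 is used as if it
 * were a descriptor. write() rejects it with EBADF.
 */
static int write_to_failed_fd(const char *text)
{
    int fd = -1;                /* stands in for an unchecked, failed open() */
    if (write(fd, text, strlen(text)) < 0)
        return errno;
    return 0;
}
```

For example, open_until_failure(64, &n) should return EMFILE with n below 64, and write_to_failed_fd("x") should return EBADF. At least in this simplest form, an ignored error does not by itself redirect output into another file.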

Maybe the issue is not with open() failures, but with the resulting
FD numbers being much larger than sqsh is expecting.  It would be
weird to try to store an FD in something narrower than int, but
I could see a use of select() being unprepared for large FDs.
Still, it's hard to translate that idea into scribbling on the
wrong file...
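The select() concern comes from its fixed-size fd_set bitmaps. These hold FD_SETSIZE bits (1024 with glibc), and calling FD_SET() on a descriptor at or above that limit is undefined behavior. Without fortification it sets a bit past the end of the set and corrupts adjacent memory, although some builds (e.g. glibc with _FORTIFY_SOURCE) may abort instead. Because open() returns the lowest unused descriptor, a process that already holds nearly FD_SETSIZE descriptors can be handed one above the limit. The sketch below shows a guard and a poll()-based alternative, which takes descriptors in an array and has no such ceiling. The helper names are hypothetical, and nothing here is claimed about how sqsh actually waits on its descriptors.

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* True only for descriptors that FD_SET()/FD_ISSET() can safely index. */
static int fd_usable_with_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Waits up to timeout_ms for fd to become readable using poll(), which
 * works for any valid descriptor number.
 * Returns 1 if readable, 0 on timeout or no POLLIN, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A caller would test fd_usable_with_select() before any FD_SET(), or skip select() entirely and use wait_readable().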
        regards, tom lane


