Vick Khera wrote:
> On Tue, Aug 25, 2009 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I've always thought that the fd.c layer is more about not having to
> > configure the code explicitly for max-files-per-process limits. Once
> > you get into ENFILE conditions, even if Postgres manages to stay up,
> > everything else on the box is going to start falling over. So the
> > sysadmin is likely to have to resort to a reboot anyway.
>
> In my case, all sorts of processes were complaining about being unable
> to open files. Once Pg panicked and closed all its files, everything
> came back to normal. I didn't have to reboot because most everything
> was written to retry and/or restart itself, and nothing critical like
> sshd croaked.
Hmm. How many DB connections were there at the time? Are they normally
long-lived?
I'm wondering if the problem could be caused by too many backends each
holding its maximum of open files. On my system,
/proc/sys/fs/file-max says ~200k and the per-process limit is 1024, so it
would take about 200 backends with all FDs in use to bring the system to
a near collapse that won't be resolved until Postgres is restarted. This
doesn't sound so far-fetched if the connections are long-lived, perhaps
coming from a pooler.
Maybe we should have another inter-backend signal: when a process gets
ENFILE, signal all other backends and they close a bunch of files each.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support