From: Tom Lane
Subject: Too many open files (was Re: spinlock problems reported earlier)
Msg-id: 25215.967405372@sss.pgh.pa.us
In response to: [7.0.2] spinlock problems reported earlier ... (The Hermit Hacker <scrappy@hub.org>)
List: pgsql-hackers

The Hermit Hacker <scrappy@hub.org> writes:
> I've been monitoring 'open files' on that machine, and after raising them
> to 8192, saw it hit "Open Files Peak: 8179" this morning and once more
> have a dead database ...

> Tom, you stated "That sure looks like you'd better tweak your kernel
> settings ... but offhand I don't see how it could lead to "stuck spinlock"
> errors.", so I'm wondering if maybe there is a bug, in that it should be
> handling running out of FDs better?

Ah-hah, now that I get to see the log file before it vanished, I have
a theory about how no FDs leads to stuck spinlock.  The postmaster's own
log has

postmaster: StreamConnection: accept: Too many open files in system
postmaster: StreamConnection: accept: Too many open files in system
FATAL 1:  ReleaseLruFile: No open files available to be closed

FATAL: s_lock(20048065) at spin.c:127, stuck spinlock. Aborting.

FATAL: s_lock(20048065) at spin.c:127, stuck spinlock. Aborting.

(more of same)

while the backend log has a bunch of

IpcSemaphoreLock: semop failed (Identifier removed) id=524288
IpcSemaphoreLock: semop failed (Identifier removed) id=524288
IpcSemaphoreLock: semop failed (Identifier removed) id=524288
IpcSemaphoreLock: semop failed (Identifier removed) id=524288

*followed by* the spinlock gripes.

Here's my theory:

1. Postmaster gets a connection, tries to read pg_hba.conf, which it
does via AllocateFile().  On EMFILE failure that calls ReleaseLruFile,
which elog()'s because in the postmaster environment there are not
going to be any open virtual FDs to close.

2. elog() inside the postmaster causes the postmaster to shut down.
Which it does faithfully, including cleaning up after itself, which
includes removing the semaphores it owns.

3. Backends start falling over with semaphore-operation failures.
This is treated as a system-restart event (backend does proc_exit(255))
but there's no postmaster to kill the other backends and start a new
cycle of life.

4. At least one dying backend leaves the lock manager's spinlock locked
(which it should not), so by and by we start to see stuck-spinlock
gripes from backends that haven't yet tried to do a semop.  But that's
pretty far down the cause-and-effect chain.

It looks to me like we have several things we want to do here.

1. ReleaseLruFile() should not immediately elog() but should return
a failure code instead, allowing AllocateFile() to return NULL, which
the postmaster can handle more gracefully than it does an elog().
(A rough sketch of what I mean follows this list.)

2. ProcReleaseSpins() ought to be done by proc_exit().  Someone was lazy
and hard-coded it into elog() instead.

3. I think the real problem here is that the backends are opening too
damn many files.  IIRC, FreeBSD is one of the platforms where
sysconf(_SC_OPEN_MAX) will return a large number, which means that fd.c
will have no useful limit on the number of open files it eats up.
Increasing your kernel NFILES setting will just allow Postgres to eat
up more FDs, and eventually (if you allow enough backends to run)
you'll be up against it again.  Even if we manage to make Postgres
itself fairly bulletproof against EMFILE failures, much of the rest
of your system will be kayoed when PG is eating up every available
kernel FD, so that is not the path to true happiness.
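
To make point 1 concrete, here is roughly the shape I have in mind for
fd.c.  This is a from-memory sketch, simplified and untested --- not
actual patch code --- so don't trust the details:

static bool
ReleaseLruFile(void)
{
    if (nfile <= 0)
        return false;       /* no virtual FDs open; nothing we can close */
    LruDelete(VfdCache[0].lruMoreRecently);     /* close LRU entry, as now */
    return true;
}

FILE *
AllocateFile(char *name, char *mode)
{
    FILE       *file;

TryAgain:
    if ((file = fopen(name, mode)) == NULL)
    {
        if (errno == EMFILE || errno == ENFILE)
        {
            errno = 0;
            if (!ReleaseLruFile())
                return NULL;        /* out of FDs, nothing left to close */
            goto TryAgain;
        }
        return NULL;                /* some other open failure */
    }
    allocatedFiles[numAllocatedFiles++] = file;     /* track for FreeFile() */
    return file;
}

The point is just that running completely out of FDs becomes an
ordinary NULL return from AllocateFile(), which the postmaster can
report and survive, rather than an elog() that takes the postmaster
(and its semaphores) down with it.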

(You might care to use lsof or some such to see just how many open
files you have per backend.  I bet it's a lot.)

Hmm, this is interesting: on HPUX, man sysconf(2) says that
sysconf(_SC_OPEN_MAX) returns the max number of open files per process
--- which is what fd.c assumes it means.  But I see that on your FreeBSD
box, the sysconf man page defines it as
    _SC_OPEN_MAX            The maximum number of open files per user id.

which suggests that *on that platform* we need to divide by MAXBACKENDS.
Does anyone know of a more portable way to determine the appropriate
number of open files per backend?

Otherwise, we'll have to put some kind of a-priori sanity check on
what we will believe from sysconf().  I don't much care for the idea of
putting a hard-wired limit on max files per backend, but that might be
the quick-and-dirty answer.

Another possibility is to add a postmaster parameter "max open files
for whole installation", which we'd then divide by MAXBACKENDS to
determine max files per backend, rather than trying to discover a
safe value on-the-fly.

In any case, I think we want something quick and dirty for a 7.0.*
back-patch.  Maybe just limiting what we believe from sysconf() to
100 or so would be OK for a patch.
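
For instance (again just a sketch from memory, not tested, and
MAX_FILES_PER_PROCESS is a name I'm making up here), the spot where
fd.c currently believes sysconf() verbatim could become something like

#define MAX_FILES_PER_PROCESS   100     /* made-up per-backend sanity cap */

static long
pg_nofile(void)
{
    long        no_files;

#ifdef HAVE_SYSCONF
    no_files = sysconf(_SC_OPEN_MAX);
    if (no_files == -1)
        no_files = (long) NOFILE;       /* fall back to compiled-in default */
#else
    no_files = (long) NOFILE;
#endif

    /*
     * On platforms where _SC_OPEN_MAX is really a per-user or kernel-wide
     * number, this is far too many for one backend; don't believe it.
     */
    if (no_files > MAX_FILES_PER_PROCESS)
        no_files = MAX_FILES_PER_PROCESS;

    return no_files;
}
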
        regards, tom lane

