Thread: Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed

Hello Tom,

I was hoping you might have some insight on a problem we've encountered
with PostgreSQL 6.5.0 (RedHat 5.2) this morning, since you are the
"file descriptor" king, as it were :-) . The database is backing a website
used by a network of hospitals for materials management and this morning,
the postmaster died with the following appearing in the system log:

Nov 14 11:50:14 emptoris logger: 
FATAL 1:  ReleaseLruFile: No opened files - no one can be closed

This is the first time this has ever happened. I've had such good luck
with PostgreSQL that I didn't have the postmaster started by inittab.
The number of backends should have been very light today (Sunday) --
only a few ODBC users and an occasional HTTP user, so after the 
postmaster exited, the log (I assume these are forked backend complaints)
shows:

Nov 14 11:55:03 emptoris logger: 
pq_recvbuf: unexpected EOF on client connection
Nov 14 11:55:03 emptoris logger: 
pq_recvbuf: unexpected EOF on client connection
Nov 14 11:55:04 emptoris logger: 
pq_flush: send() failed: Broken pipe
Nov 14 11:55:04 emptoris logger: 
FATAL: pq_endmessage failed: errno=32     

From previous posts, I know you've done a cleanup with respect to 
file descriptors, but all I see in the log after 6.5.0 is a 6.5.1 entry:

ACL file descriptor leak fix(Atsushi Ogawa)

Is this a rare occurrence or something that might have been fixed between
6.5.0 and 6.5.3? Like I said, this is the first time this has happened, and
otherwise it has been very robust under much heavier loads -- so much so
that I didn't put the postmaster into inittab for respawning.  It's
been working pretty much flawlessly in production for about a year. 

Anyways, after starting the postmaster again, I vacuum analyzed the 
database, accessed the HTTP application, etc. without problems.

Any info would be greatly appreciated, 

Mike Mascari
(mascarim@yahoo.com)








Mike Mascari <mascarim@yahoo.com> writes:
> FATAL 1:  ReleaseLruFile: No opened files - no one can be closed

> This is the first time this has ever happened.

I've never seen that either.  Offhand I do not recall any post-6.5
changes that would affect it, so the problem (whatever it is) is
probably still there.

After eyeballing the code, it seems there are only two ways this
could happen:

1. the number of "allocated" (non-virtual) file descriptors grew to
exceed the number of files Postgres thinks it can have open;

2. something else was temporarily exhausting your kernel's file table
space, so that ENFILE was returned for many successive attempts to
open a file.  (After each one, fd.c will close another file and try
again.)

#2 seems improbable on an unloaded system, and isn't real probable even
on a loaded one, since you'd have to assume that some other process
managed to suck up each filetable slot that fd.c released before fd.c
could re-acquire it.  Once, yes, but several dozen times in a row?

So I'm guessing a leak of allocated file descriptors.
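
To make case 1 concrete, here is a toy model of a descriptor budget shared
between closable "virtual" files and pinned "allocated" files.  It is only a
sketch and not the real fd.c: FdBudget, MAX_OPEN, and every function name
below are hypothetical, and the budget of 8 is an arbitrary assumption.  It
shows that once leaked allocations fill the whole budget, the LRU release has
nothing left to close, which is the condition the FATAL message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not the real fd.c: a fixed descriptor budget shared by
 * "virtual" files (closable on demand) and "allocated" files (pinned). */
enum { MAX_OPEN = 8 };   /* assumed budget; the real limit is system-dependent */

typedef struct {
    int virtual_open;    /* descriptors the cache may close to make room */
    int allocated_open;  /* descriptors held until the caller frees them */
} FdBudget;

/* Close one virtual file if any is open; false models the FATAL case. */
bool release_lru(FdBudget *b)
{
    if (b->virtual_open == 0)
        return false;    /* "No opened files - no one can be closed" */
    b->virtual_open--;
    return true;
}

/* Take one pinned descriptor, evicting virtual files while at the budget. */
bool allocate_file(FdBudget *b)
{
    while (b->virtual_open + b->allocated_open >= MAX_OPEN)
        if (!release_lru(b))
            return false;
    b->allocated_open++;
    return true;
}

/* Caller-side release; a code path that skips this call is the leak. */
void free_file(FdBudget *b)
{
    assert(b->allocated_open > 0);
    b->allocated_open--;
}

/* Count how many leaked allocations succeed before nothing can be closed. */
int leaks_until_failure(int virtual_open_at_start)
{
    FdBudget b = { virtual_open_at_start, 0 };
    int ok = 0;
    while (allocate_file(&b))
        ok++;            /* free_file is never called: simulated leak */
    return ok;
}

/* Well-behaved caller: every allocation is paired with a free. */
bool balanced_cycles(int n)
{
    FdBudget b = { MAX_OPEN, 0 };   /* cache starts full of virtual files */
    for (int i = 0; i < n; i++) {
        if (!allocate_file(&b))
            return false;
        free_file(&b);
    }
    return true;
}
```

In this model a balanced caller can cycle indefinitely, while a leaking one
fails after exactly MAX_OPEN allocations no matter how many virtual files
were open at the start.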

After grovelling through the calls to AllocateFile, I only see one
prospect for a leak: it looks to me like verify_password() neglects
to close the password file if an invalid user name is given.  Do you
use a plain (non-encrypted) password file?  If so, I'll bet you can
reproduce the crash by trying repeatedly to connect with a username
that's not in the password file.  If that pans out, it's a simple fix:
add "FreeFile(pw_file);" near the bottom of verify_password() in
src/backend/libpq/password.c.  Let me know if this guess is right...
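
For illustration, the shape of the fix is sketched below with plain stdio
rather than the backend's AllocateFile/FreeFile wrappers.  This is not the
real verify_password(): user_listed() and the "user:password" line format are
assumptions made up for the example.  The point is only that the close runs on
the not-found path as well as the found path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a password-file lookup.
 * Returns 1 if a line starting with "user:" exists in the file at path,
 * 0 if no such line exists, and -1 if the file cannot be opened. */
int user_listed(const char *path, const char *user)
{
    char line[256];
    int found = 0;
    size_t ulen;
    FILE *fp;

    assert(path != NULL && user != NULL);
    ulen = strlen(user);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, user, ulen) == 0 && line[ulen] == ':') {
            found = 1;
            break;
        }
    }

    /* Single exit point: an unknown user name falls through to here too,
     * so the descriptor is released whether or not a match was found. */
    fclose(fp);
    return found;
}
```

With the close in place, repeated lookups of an unknown name should no longer
consume a descriptor each time.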
        regards, tom lane