Thread: Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed
Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed
From
Mike Mascari
Date:
Hello Tom,

I was hoping you might have some insight on a problem we encountered with PostgreSQL 6.5.0 (RedHat 5.2) this morning, since you are the "file descriptor" king, as it were :-). The database backs a website used by a network of hospitals for materials management, and this morning the postmaster died with the following appearing in the system log:

Nov 14 11:50:14 emptoris logger: FATAL 1: ReleaseLruFile: No opened files - no one can be closed

This is the first time this has ever happened. I've had such good luck with PostgreSQL that I didn't have the postmaster started by inittab. The number of backends should have been very light today (Sunday) -- only a few ODBC users and an occasional HTTP user. After the postmaster exited, the log (I assume these are complaints from the forked backends) shows:

Nov 14 11:55:03 emptoris logger: pq_recvbuf: unexpected EOF on client connection
Nov 14 11:55:03 emptoris logger: pq_recvbuf: unexpected EOF on client connection
Nov 14 11:55:04 emptoris logger: pq_flush: send() failed: Broken pipe
Nov 14 11:55:04 emptoris logger: FATAL: pq_endmessage failed: errno=32

From previous posts, I know you've done a cleanup with respect to file descriptors, but all I see in the log after 6.5.0 is a 6.5.1 entry:

ACL file descriptor leak fix (Atsushi Ogawa)

Is this a rare occurrence, or something that might have been fixed between 6.5.0 and 6.5.3? Like I said, this is the first time this has happened, and otherwise PostgreSQL has been very robust under much heavier loads -- so much so that I didn't put the postmaster into inittab for respawning. It's been working pretty much flawlessly in production for about a year. Anyway, after starting the postmaster again, I vacuum analyzed the database, accessed the HTTP application, etc. without problems.

Any info would be greatly appreciated,

Mike Mascari (mascarim@yahoo.com)
Re: [HACKERS] Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed
From
Tom Lane
Date:
Mike Mascari <mascarim@yahoo.com> writes:
> FATAL 1: ReleaseLruFile: No opened files - no one can be closed
> This is the first time this has ever happened.

I've never seen that either. Offhand I do not recall any post-6.5 changes that would affect it, so the problem (whatever it is) is probably still there.

After eyeballing the code, it seems there are only two ways this could happen:

1. the number of "allocated" (non-virtual) file descriptors grew to exceed the number of files Postgres thinks it can have open;

2. something else was temporarily exhausting your kernel's file table space, so that ENFILE was returned for many successive attempts to open a file. (After each one, fd.c will close another file and try again.)

#2 seems improbable on an unloaded system, and isn't real probable even on a loaded one, since you'd have to assume that some other process managed to suck up each filetable slot that fd.c released before fd.c could re-acquire it. Once, yes, but several dozen times in a row?

So I'm guessing a leak of allocated file descriptors. After grovelling through the calls to AllocateFile, I only see one prospect for a leak: it looks to me like verify_password() neglects to close the password file if an invalid user name is given.

Do you use a plain (non-encrypted) password file? If so, I'll bet you can reproduce the crash by trying repeatedly to connect with a username that's not in the password file.

If that pans out, it's a simple fix: add "FreeFile(pw_file);" near the bottom of verify_password() in src/backend/libpq/password.c.

Let me know if this guess is right...

regards, tom lane
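
The suspected leak can be sketched as a small, standalone C model. This is not the PostgreSQL source: alloc_file(), free_file(), verify_leaky(), verify_fixed() and the counter allocated_files are hypothetical stand-ins for AllocateFile(), FreeFile(), verify_password() and fd.c's internal bookkeeping, and the "name:password" line format is assumed only for illustration. It shows how an early return on the "user not found" path could leave one allocated file behind per failed lookup, and how a single release call near the bottom of the function would close the gap.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for fd.c's count of "allocated" files. */
static int allocated_files = 0;

/* Hypothetical stand-in for AllocateFile(): open a file and count it. */
static FILE *alloc_file(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f != NULL)
        allocated_files++;
    return f;
}

/* Hypothetical stand-in for FreeFile(): close a file and uncount it. */
static void free_file(FILE *f)
{
    if (f != NULL) {
        fclose(f);
        allocated_files--;
    }
}

/* Leaky lookup: returns 1 if user is listed, 0 if not, -1 on open failure.
 * The file is released only on the "found" path. */
static int verify_leaky(const char *path, const char *user)
{
    char line[256];
    FILE *f = alloc_file(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, ":\n")] = '\0';   /* keep only the user name */
        if (strcmp(line, user) == 0) {
            free_file(f);
            return 1;
        }
    }
    return 0;   /* BUG: f is still allocated when the user is unknown */
}

/* Fixed lookup: same logic, but the file is released on every path. */
static int verify_fixed(const char *path, const char *user)
{
    char line[256];
    int found = 0;
    FILE *f = alloc_file(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, ":\n")] = '\0';
        if (strcmp(line, user) == 0) {
            found = 1;
            break;
        }
    }
    free_file(f);   /* the analogue of adding FreeFile(pw_file) */
    return found;
}
```

In this model, every call to verify_leaky() with an unknown user name raises allocated_files by one, while verify_fixed() always leaves it unchanged. Repeated failed logins would then push the count of allocated files toward the process limit, which is consistent with the first failure mode listed above.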