Thread: Re: [HACKERS] Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mike Mascari <mascarim@yahoo.com> writes:
> > FATAL 1:  ReleaseLruFile: No opened files - no one can be closed
> 
> > This is the first time this has ever happened.
> 
> I've never seen that either.  Offhand I do not recall any post-6.5
> changes that would affect it, so the problem (whatever it is) is
> probably still there.
> 
> After eyeballing the code, it seems there are only two ways this
> could happen:
> 
> 1. the number of "allocated" (non-virtual) file descriptors grew to
> exceed the number of files Postgres thinks it can have open;
> 
> 2. something else was temporarily exhausting your kernel's file table
> space, so that ENFILE was returned for many successive attempts to
> open a file.  (After each one, fd.c will close another file and try
> again.)
> 
> #2 seems improbable on an unloaded system, and isn't real probable even
> on a loaded one, since you'd have to assume that some other process
> managed to suck up each filetable slot that fd.c released before fd.c
> could re-acquire it.  Once, yes, but several dozen times in a row?
> 

Thanks for the response, Tom. When looking at the system log, 
the kernel was logging messages regarding IPX network name collisions
which apparently can happen when there are autoconfigured Win95 boxes
on the same subnet. These messages were flooding the log at a rate of
one every second or two...Even though #2 seems improbable, and just
glancing at the IPX kernel code didn't point to how that may have
caused a continual consumption of file descriptors, I'm willing to 
blame the kernel on this (and me for using autoprimary and autointerface
options).

Thanks again, 

Mike Mascari
(mascarim@yahoo.com)


Mike Mascari <mascarim@yahoo.com> writes:
> Thanks for the response, Tom. When looking at the system log, 
> the kernel was logging messages regarding IPX network name collisions
> which apparently can happen when there are autoconfigured Win95 boxes
> on the same subnet. These messages were flooding the log at a rate of
> one every second or two...Even though #2 seems improbable, and just
> glancing at the IPX kernel code didn't point to how that may have
> caused a continual consumption of file descriptors, I'm willing to 
> blame the kernel on this (and me for using autoprimary and autointerface
> options).

That doesn't strike me as a bulletproof explanation.  fd.c has a tight
loop that close()s an FD and then tries to open() the file it wants,
repeat until success or an error other than ENFILE/EMFILE.  If the
scenario really is that it got ENFILE every time until it was down to
zero FDs, there'd have to be something sucking up each freed FD within
microseconds of its being freed.  Repeatedly.  Forty or fifty (or more)
times in a row.  I don't think a once-a-second Win95 lossage will do
that.  And if you were down to zero free FDs system-wide, Postgres
wouldn't be the only thing having troubles!
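
For readers following along, here is a minimal sketch of the retry loop
described above. It is not the actual fd.c code: open_with_release,
try_open_fn, flaky_open and the OPEN_* result codes are hypothetical names
made up for illustration, and closing the least recently used file is
modeled by simply decrementing a counter.

```c
#include <errno.h>

/* Hypothetical stand-in for one open() attempt: returns 0 on success,
 * or -1 with errno set. Not a PostgreSQL function. */
typedef int (*try_open_fn)(void *ctx);

/* Result codes for this sketch only (assumed, not taken from fd.c). */
enum { OPEN_OK = 0, OPEN_FAILED = -1, OPEN_NO_FILES_LEFT = -2 };

/*
 * Keep trying to open the wanted file. Each time the attempt fails with
 * ENFILE or EMFILE, give up one of our own open files and try again.
 * *n_open counts the files this process still has open and could close.
 */
int open_with_release(try_open_fn try_open, void *ctx, int *n_open)
{
    for (;;) {
        if (try_open(ctx) == 0) {
            (*n_open)++;                /* the new file is now open */
            return OPEN_OK;
        }
        if (errno != ENFILE && errno != EMFILE)
            return OPEN_FAILED;         /* some other error: stop retrying */
        if (*n_open == 0)
            return OPEN_NO_FILES_LEFT;  /* nothing left to close */
        (*n_open)--;                    /* models closing the LRU file */
    }
}

/* Simulated opener: fails with ENFILE for the first `failures` calls,
 * then succeeds. Purely illustrative. */
struct flaky { int failures; };

int flaky_open(void *ctx)
{
    struct flaky *f = ctx;
    if (f->failures > 0) {
        f->failures--;
        errno = ENFILE;
        return -1;
    }
    return 0;
}
```

With 40 files open, a single ENFILE costs one closed file and the retry
succeeds. To reach OPEN_NO_FILES_LEFT, the simulated open has to fail 41
times in a row, which is the "dozens of times in a row" scenario being
questioned here.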

I take it you don't use Postgres password authentication at all?  If you
do, the other theory looks a lot more viable to me... I haven't had time
to try to reproduce a crash yet, but I'm pretty sure there's one there.

			regards, tom lane