Thread: Re: [HACKERS] Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed
Re: [HACKERS] Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed
From
Mike Mascari
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mike Mascari <mascarim@yahoo.com> writes:
> > FATAL 1: ReleaseLruFile: No opened files - no one can be closed
>
> > This is the first time this has ever happened.
>
> I've never seen that either. Offhand I do not recall any post-6.5
> changes that would affect it, so the problem (whatever it is) is
> probably still there.
>
> After eyeballing the code, it seems there are only two ways this
> could happen:
>
> 1. the number of "allocated" (non-virtual) file descriptors grew to
> exceed the number of files Postgres thinks it can have open;
>
> 2. something else was temporarily exhausting your kernel's file table
> space, so that ENFILE was returned for many successive attempts to
> open a file. (After each one, fd.c will close another file and try
> again.)
>
> #2 seems improbable on an unloaded system, and isn't real probable even
> on a loaded one, since you'd have to assume that some other process
> managed to suck up each filetable slot that fd.c released before fd.c
> could re-acquire it. Once, yes, but several dozen times in a row?
>

Thanks for the response, Tom. When I looked at the system log, I found the kernel logging messages about IPX network name collisions, which apparently can happen when there are autoconfigured Win95 boxes on the same subnet. These messages were flooding the log at a rate of one every second or two. Even though #2 seems improbable, and a quick glance at the IPX kernel code didn't show how it could have caused a continual consumption of file descriptors, I'm willing to blame this on the kernel (and on myself for using the autoprimary and autointerface options).

Thanks again,

Mike Mascari
(mascarim@yahoo.com)
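Tom's theory #1 can be illustrated with a toy model. This is not the real fd.c code, and every name below (`FdPool`, `release_lru`, `make_room`) is hypothetical. The model treats "virtual" files as closable on demand and "allocated" (non-virtual) files as never closable by the LRU logic. Once allocated descriptors alone reach the limit, there is nothing left to close, and that is exactly the condition the FATAL message reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the descriptor pool, NOT the real fd.c. */
typedef struct {
    int max_files;    /* how many FDs the backend thinks it may hold */
    int open_virtual; /* open virtual files: can be closed and reopened later */
    int allocated;    /* allocated (non-virtual) files: never closed by LRU */
} FdPool;

/* Close one virtual file to free a slot. Returns false when no virtual
 * file is open, i.e. "No opened files - no one can be closed". */
static bool release_lru(FdPool *p)
{
    if (p->open_virtual == 0)
        return false;
    p->open_virtual--;
    return true;
}

/* Make room for one more descriptor, closing virtual files as needed.
 * Returns false if the limit cannot be respected (theory #1). */
static bool make_room(FdPool *p)
{
    while (p->open_virtual + p->allocated >= p->max_files) {
        if (!release_lru(p))
            return false; /* only allocated files remain: would be FATAL */
    }
    /* postcondition: at least one slot is now free under the limit */
    assert(p->open_virtual + p->allocated < p->max_files);
    return true;
}
```

With `{10, 6, 4}` one virtual file is closed and the call succeeds. With `{10, 3, 12}` all three virtual files get closed, but the twelve allocated files still exceed the limit, so the call fails.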
Re: [HACKERS] Postmaster dies with FATAL 1: ReleaseLruFile: No opened files - no one can be closed
From
Tom Lane
Date:
Mike Mascari <mascarim@yahoo.com> writes:
> Thanks for the response, Tom. When looking at the system log,
> the kernel was logging messages regarding IPX network name collisions
> which apparently can happen when there are autoconfigured Win95 boxes
> on the same subnet. These messages were flooding the log at a rate of
> one every second or two...Even though #2 seems improbable, and just
> glancing at the IPX kernel code didn't point to how that may have
> caused a continual consumption of file descriptors, I'm willing to
> blame the kernel on this (and me for using autoprimary and autointerface
> options).

That doesn't strike me as a bulletproof explanation. fd.c has a tight loop that close()s an FD and then tries to open() the file it wants, repeating until it succeeds or gets an error other than ENFILE/EMFILE. If the scenario really is that it got ENFILE every time until it was down to zero FDs, something would have to be sucking up each freed FD within microseconds of its being freed. Repeatedly. Forty or fifty (or more) times in a row. I don't think a once-a-second Win95 lossage will do that. And if you were down to zero free FDs system-wide, Postgres wouldn't be the only thing having trouble!

I take it you don't use Postgres password authentication at all? If you do, the other theory (#1) looks a lot more viable to me... I haven't had time to try to reproduce a crash yet, but I'm pretty sure there's one there.

			regards, tom lane
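The retry loop Tom describes, and his argument against theory #2, can be sketched as a small simulation. This is not the actual fd.c or kernel code. `Sim`, `sim_open`, `sim_release_lru` and `open_with_retry` are hypothetical names, and the system-wide file table is reduced to a single counter. The `thief` flag models "another process grabs every slot we free".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulation, NOT real fd.c or kernel code. */
typedef struct {
    int free_slots;  /* system-wide file table slots still free */
    bool thief;      /* another process instantly takes every slot we free */
    int our_virtual; /* our closable (virtual) files */
    int closes;      /* files we closed while retrying */
} Sim;

/* Stand-in for open(): fails with ENFILE when the table is full. */
static int sim_open(Sim *s)
{
    assert(s->free_slots >= 0);
    if (s->free_slots == 0) {
        errno = ENFILE;
        return -1;
    }
    s->free_slots--;
    return 3; /* any valid descriptor number */
}

/* Stand-in for closing our least recently used virtual file. */
static bool sim_release_lru(Sim *s)
{
    if (s->our_virtual == 0)
        return false;
    s->our_virtual--;
    s->closes++;
    s->free_slots++;     /* our close() frees a slot ... */
    if (s->thief)
        s->free_slots--; /* ... which someone else grabs immediately */
    return true;
}

/* Close an FD and retry open() until success or an error other than
 * ENFILE/EMFILE. Returns -1 on a hard error or when nothing is left to
 * close (the "no one can be closed" case). */
static int open_with_retry(Sim *s)
{
    for (;;) {
        int fd = sim_open(s);
        if (fd >= 0)
            return fd;
        if (errno != ENFILE && errno != EMFILE)
            return -1;
        if (!sim_release_lru(s))
            return -1;
    }
}
```

Without a thief, one close is enough: `{0, false, 40, 0}` succeeds after one close. With a thief, `{0, true, 40, 0}` fails only after all 40 files have been closed. The loop reaches the FATAL path only if every single freed slot is stolen before the retry, which is why Tom finds a once-a-second log flood an unconvincing cause.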