Re: [GENERAL] Release LRU file - Mailing list pgsql-general
From | Mike Mascari |
---|---|
Subject | Re: [GENERAL] Release LRU file |
Date | |
Msg-id | 385FA7E0.C9C26701@mascari.com |
In response to | Release LRU file (Kimi <kimi@cricket.org>) |
List | pgsql-general |
Kimi wrote:
>
> Hi,
>
> This is in continuation of mails I sent last week about postgres
> crashing.
> We are running pg 6.5.1, on Red Hat 5.1 with DBI 0.92 and DBD 1.13 on a
> 512 MB RAM and SCSI machine.
>
> Our application consists of requests going up to 150 per second on this
> database with an expected uptime of 24 by 7.
> Earlier we were getting spinlock messages which we have hoped to sort
> out by raising number of open files per process to 1024 from the
> earlier 256.
>
> Postgres crashes giving an error message : FATAL 1: Release LRU file :
> No opened files / no one can be closed.
>
> Now can anybody help on how to solve this.
>
> Please help
>
> Bye,
>
> Murali
> Differentiated Software Solutions

We have been running a production server under a somewhat lighter load,
and encountered this once. The following conversation took place on the
mailing list about a month ago:

http://www.PostgreSQL.ORG/mhonarc/pgsql-hackers/1999-11/msg00454.html

------------------------------------------------------------

Mike Mascari <mascarim@yahoo.com> writes:
> FATAL 1: ReleaseLruFile: No opened files - no one can be closed
> This is the first time this has ever happened.

I've never seen that either. Offhand I do not recall any post-6.5
changes that would affect it, so the problem (whatever it is) is
probably still there.

After eyeballing the code, it seems there are only two ways this
could happen:
  1. the number of "allocated" (non-virtual) file descriptors grew to
     exceed the number of files Postgres thinks it can have open;
  2. something else was temporarily exhausting your kernel's file table
     space, so that ENFILE was returned for many successive attempts to
     open a file. (After each one, fd.c will close another file and try
     again.)

#2 seems improbable on an unloaded system, and isn't real probable even
on a loaded one, since you'd have to assume that some other process
managed to suck up each filetable slot that fd.c released before fd.c
could re-acquire it. Once, yes, but several dozen times in a row?

So I'm guessing a leak of allocated file descriptors. After grovelling
through the calls to AllocateFile, I only see one prospect for a leak:
it looks to me like verify_password() neglects to close the password
file if an invalid user name is given. Do you use a plain
(non-encrypted) password file? If so, I'll bet you can reproduce the
crash by trying repeatedly to connect with a username that's not in the
password file.

If that pans out, it's a simple fix: add "FreeFile(pw_file);" near the
bottom of verify_password() in src/backend/libpq/password.c.

Let me know if this guess is right...

			regards, tom lane

------------------------------------------------------------

Hope that helps,

Mike Mascari
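
For readers who want to see the suspected leak in isolation, here is a
minimal sketch in C. It is not the PostgreSQL source: allocate_file(),
free_file(), verify_user(), the MAX_ALLOCATED cap and the apply_fix flag
are all hypothetical stand-ins, chosen only to mimic the pattern described
above (an "allocated" file that is never released when the user name is
not found).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cap on "allocated" files; the real limit lives in fd.c. */
#define MAX_ALLOCATED 4

int allocated = 0;      /* files handed out and not yet freed */

/* Stand-in for AllocateFile(): fopen() plus bookkeeping. */
FILE *allocate_file(const char *path, const char *mode)
{
    FILE *fp;

    if (allocated >= MAX_ALLOCATED)
        return NULL;    /* the real backend reports a FATAL error instead */
    fp = fopen(path, mode);
    if (fp != NULL)
        allocated++;
    return fp;
}

/* Stand-in for FreeFile(): fclose() plus bookkeeping. */
void free_file(FILE *fp)
{
    assert(fp != NULL);
    fclose(fp);
    allocated--;
}

/*
 * Look up `user` in a "name:password" file.
 * Returns 1 if found, 0 if not found, -1 if no file could be allocated.
 * With apply_fix == 0 the "not found" path forgets to free the file,
 * which is the kind of leak suspected in verify_password().
 */
int verify_user(const char *path, const char *user, int apply_fix)
{
    char line[256];
    FILE *fp = allocate_file(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, ":\n")] = '\0';  /* keep only the name field */
        if (strcmp(line, user) == 0) {
            free_file(fp);
            return 1;
        }
    }
    if (apply_fix)
        free_file(fp);  /* the call missing on the "unknown user" path */
    return 0;
}
```

In this sketch, calling verify_user() with apply_fix set to 0 and a name
that is not in the file leaves one slot in use each time, so after
MAX_ALLOCATED such calls even a valid lookup returns -1. With apply_fix
set to 1 the count drops back to zero after every call, which is the
effect the proposed FreeFile(pw_file) line is meant to have.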