Thread: Release LRU file

Release LRU file

From: Kimi
Date:
Hi,

This is a follow-up to the mails I sent last week about Postgres
crashing. We are running pg 6.5.1 on Red Hat 5.1 with DBI 0.92 and
DBD 1.13, on a SCSI machine with 512 MB of RAM.

Our application sends up to 150 requests per second to this database,
with an expected uptime of 24x7.
Earlier we were getting spinlock messages, which we hoped to sort out
by raising the number of open files per process from 256 to 1024.
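
One way to confirm that a raised limit is actually in effect for a process is to ask the kernel directly. The following is a minimal sketch, assuming a POSIX system with getrlimit(); the helper names read_nofile_limit and print_nofile_limit are made up for illustration, and whether the postmaster on this particular setup sees the same value is an assumption that would need checking from the environment that launches it.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Read the per-process open-file limits.
 * Returns 0 on success, -1 if getrlimit() fails. */
int read_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;    /* limit currently enforced */
    *hard = rl.rlim_max;    /* ceiling the soft limit may be raised to */
    return 0;
}

/* Print both limits; returns 0 on success, -1 on failure. */
int print_nofile_limit(void)
{
    rlim_t soft, hard;

    if (read_nofile_limit(&soft, &hard) != 0) {
        perror("getrlimit");
        return -1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long) soft, (unsigned long long) hard);
    return 0;
}
```

Calling print_nofile_limit() from a small test program started the same way as the server shows the limits that a child process inherits.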

Postgres crashes with the error message:
FATAL 1:  ReleaseLruFile: No opened files - no one can be closed

Can anybody help us solve this?

Please help

Bye,

Murali
Differentiated Software Solutions



Re: [GENERAL] Release LRU file

From: Mike Mascari
Date:
Kimi wrote:
>
> Hi,
>
> This is a follow-up to the mails I sent last week about Postgres
> crashing. We are running pg 6.5.1 on Red Hat 5.1 with DBI 0.92 and
> DBD 1.13, on a SCSI machine with 512 MB of RAM.
>
> Our application sends up to 150 requests per second to this database,
> with an expected uptime of 24x7.
> Earlier we were getting spinlock messages, which we hoped to sort out
> by raising the number of open files per process from 256 to 1024.
>
> Postgres crashes with the error message:
> FATAL 1:  ReleaseLruFile: No opened files - no one can be closed
>
> Can anybody help us solve this?
>
> Please help
>
> Bye,
>
> Murali
> Differentiated Software Solutions


We have been running a production server under a somewhat
lighter load, and encountered this once. The following
conversation took place on the mailing list about a month
ago:

http://www.PostgreSQL.ORG/mhonarc/pgsql-hackers/1999-11/msg00454.html
------------------------------------------------------------
Mike Mascari <mascarim@yahoo.com> writes:
> FATAL 1:  ReleaseLruFile: No opened files - no one can be closed

> This is the first time this has ever happened.

I've never seen that either.  Offhand I do not recall any post-6.5
changes that would affect it, so the problem (whatever it is) is
probably still there.

After eyeballing the code, it seems there are only two ways this
could happen:

1. the number of "allocated" (non-virtual) file descriptors grew to
exceed the number of files Postgres thinks it can have open;

2. something else was temporarily exhausting your kernel's file table
space, so that ENFILE was returned for many successive attempts to
open a file.  (After each one, fd.c will close another file and try
again.)
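
The close-and-retry behaviour described in point 2 can be sketched as a toy model. This is not the code in fd.c: the kernel file table is simulated by a counter, and every name here (sim_reset, sim_open, release_lru, open_with_retry, open_count, MAX_OPEN) is hypothetical. It only illustrates why the loop ends in a fatal error once the process has nothing left of its own to close.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_OPEN 8              /* hypothetical cap on files we keep open */

static int slots_free;          /* toy kernel file table: opens that can still succeed */
static int lru[MAX_OPEN];       /* descriptors we hold; lru[0] is least recently used */
static int lru_count;

/* Reset the model: 'slots' free kernel slots, nothing held open. */
void sim_reset(int slots)
{
    slots_free = slots;
    lru_count = 0;
}

int open_count(void)
{
    return lru_count;
}

/* Simulated open(): fails with ENFILE when the toy file table is full. */
static int sim_open(void)
{
    static int next_fd = 3;

    if (slots_free <= 0) {
        errno = ENFILE;
        return -1;
    }
    slots_free--;
    return next_fd++;
}

/* Close our least recently used file; false if we hold nothing. */
static bool release_lru(void)
{
    if (lru_count == 0)
        return false;
    for (int i = 1; i < lru_count; i++)
        lru[i - 1] = lru[i];
    lru_count--;
    slots_free++;               /* the slot goes back to the toy kernel */
    return true;
}

/* Open a file, giving up one of our own files each time the table is
 * full.  Returns the descriptor, or -1 in the "no one can be closed"
 * case. */
int open_with_retry(void)
{
    if (lru_count == MAX_OPEN)  /* stay within our own cap first */
        release_lru();

    for (;;) {
        int fd = sim_open();

        if (fd >= 0) {
            lru[lru_count++] = fd;
            return fd;
        }
        if (errno != ENFILE && errno != EMFILE)
            return -1;          /* unrelated failure */
        if (!release_lru())
            return -1;          /* nothing left to close: the fatal case */
    }
}
```

With two free slots, a third call to open_with_retry() succeeds by closing the oldest file first; with no free slots and nothing held open, it fails immediately.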

#2 seems improbable on an unloaded system, and isn't real probable
even on a loaded one, since you'd have to assume that some other
process managed to suck up each filetable slot that fd.c released
before fd.c could re-acquire it.  Once, yes, but several dozen times
in a row?

So I'm guessing a leak of allocated file descriptors.

After grovelling through the calls to AllocateFile, I only see one
prospect for a leak: it looks to me like verify_password() neglects
to close the password file if an invalid user name is given.  Do you
use a plain (non-encrypted) password file?  If so, I'll bet you can
reproduce the crash by trying repeatedly to connect with a username
that's not in the password file.  If that pans out, it's a simple fix:
add "FreeFile(pw_file);" near the bottom of verify_password() in
src/backend/libpq/password.c.  Let me know if this guess is right...

                        regards, tom lane
------------------------------------------------------------
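
The kind of leak suspected in the quoted message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it does not use the real AllocateFile/FreeFile or verify_password(), the cap of 32 is arbitrary, and toy_allocate_file, toy_free_file, check_user and the user name "alice" are all invented for the example. The release_on_miss flag switches between the leaking path and the path with the missing release added.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ALLOCATED 32        /* arbitrary cap, for illustration only */

static int allocated;           /* descriptors currently handed out */

void reset_allocated(void)
{
    allocated = 0;
}

int allocated_count(void)
{
    return allocated;
}

/* Toy allocator: fails once the cap is reached. */
static bool toy_allocate_file(void)
{
    if (allocated >= MAX_ALLOCATED)
        return false;
    allocated++;
    return true;
}

static void toy_free_file(void)
{
    if (allocated > 0)
        allocated--;
}

/* Look up 'user' in a one-entry stand-in for a password file.
 * Returns 1 if found, 0 if not found, -1 if no descriptor could be
 * allocated.  With release_on_miss == false the "not found" path
 * returns without releasing the descriptor, which is the leak. */
int check_user(const char *user, bool release_on_miss)
{
    if (!toy_allocate_file())
        return -1;

    if (strcmp(user, "alice") == 0) {
        toy_free_file();        /* success path always releases */
        return 1;
    }

    if (release_on_miss)
        toy_free_file();        /* the one-line fix */
    return 0;
}
```

In this model, 32 failed lookups with release_on_miss set to false use up every descriptor, after which even a valid user is refused; with it set to true, any number of failed lookups leaves the count at zero.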

Hope that helps,

Mike Mascari