Re: [GENERAL] Release LRU file - Mailing list pgsql-general

From Mike Mascari
Subject Re: [GENERAL] Release LRU file
Msg-id 385FA7E0.C9C26701@mascari.com
In response to Release LRU file  (Kimi <kimi@cricket.org>)
List pgsql-general
Kimi wrote:
>
> Hi,
>
> This is a continuation of the mails I sent last week about postgres
> crashing.
> We are running pg 6.5.1 on Red Hat 5.1 with DBI 0.92 and DBD 1.13, on a
> SCSI machine with 512 MB RAM.
>
> Our application sends up to 150 requests per second to this database,
> with an expected uptime of 24 by 7.
> Earlier we were getting spinlock messages, which we hope we have sorted
> out by raising the number of open files per process from 256 to 1024.
>
> Postgres crashes with the error message: FATAL 1: Release LRU file :
> No opened files / no one can be closed.
>
> Can anybody help with how to solve this?
>
> Please help
>
> Bye,
>
> Murali
> Differentiated Software Solutions


We have been running a production server under a somewhat
lighter load, and encountered this once. The following
conversation took place on the mailing list about a month
ago:

http://www.PostgreSQL.ORG/mhonarc/pgsql-hackers/1999-11/msg00454.html
------------------------------------------------------------
Mike Mascari <mascarim@yahoo.com> writes:
> FATAL 1:  ReleaseLruFile: No opened files - no one can be closed

> This is the first time this has ever happened.

I've never seen that either.  Offhand I do not recall any post-6.5
changes that would affect it, so the problem (whatever it is) is
probably still there.

After eyeballing the code, it seems there are only two ways this
could happen:

1. the number of "allocated" (non-virtual) file descriptors grew to
exceed the number of files Postgres thinks it can have open;

2. something else was temporarily exhausting your kernel's file table
space, so that ENFILE was returned for many successive attempts to
open a file.  (After each one, fd.c will close another file and try
again.)

#2 seems improbable on an unloaded system, and isn't real probable even
on a loaded one, since you'd have to assume that some other process
managed to suck up each filetable slot that fd.c released before fd.c
could re-acquire it.  Once, yes, but several dozen times in a row?

So I'm guessing a leak of allocated file descriptors.

After grovelling through the calls to AllocateFile, I only see one
prospect for a leak: it looks to me like verify_password() neglects
to close the password file if an invalid user name is given.  Do you
use a plain (non-encrypted) password file?  If so, I'll bet you can
reproduce the crash by trying repeatedly to connect with a username
that's not in the password file.  If that pans out, it's a simple fix:
add "FreeFile(pw_file);" near the bottom of verify_password() in
src/backend/libpq/password.c.  Let me know if this guess is right...

                        regards, tom lane
------------------------------------------------------------
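
To make the failure mode Tom describes a little more concrete, here is a
small toy model in C. It is only a sketch under stated assumptions, not
the actual fd.c or password.c code: MAX_SLOTS, the toy_* functions,
lookup_leaky(), lookup_fixed() and the user name "alice" are all made up
for illustration. The model keeps two counters, one for "allocated"
descriptors that are held until explicitly freed and one for cached
files that may be closed to make room. A lookup that returns early
without freeing its descriptor slowly squeezes out the closable files
until nothing is left to release.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model; names and limits are NOT taken from PostgreSQL. */
#define MAX_SLOTS 8               /* assumed per-process descriptor budget */

static int allocated_count = 0;   /* held until explicitly freed */
static int virtual_open = 0;      /* cached files that may be closed on demand */

/* Close one cached file to free a slot; fails when none is left. */
static bool release_lru_slot(void)
{
    if (virtual_open == 0)
        return false;             /* the "no one can be closed" situation */
    virtual_open--;
    return true;
}

/* Make sure one slot is free, closing cached files as needed. */
static bool reserve_slot(void)
{
    while (allocated_count + virtual_open >= MAX_SLOTS)
        if (!release_lru_slot())
            return false;
    return true;
}

static bool toy_open_virtual(void)
{
    if (!reserve_slot())
        return false;
    virtual_open++;
    return true;
}

static bool toy_allocate_file(void)
{
    if (!reserve_slot())
        return false;
    allocated_count++;
    return true;
}

static void toy_free_file(void)
{
    assert(allocated_count > 0);
    allocated_count--;
}

/* Leaky lookup: the early return for an unknown user skips the free. */
static int lookup_leaky(const char *user)
{
    if (!toy_allocate_file())
        return -1;                /* out of slots */
    if (strcmp(user, "alice") != 0)
        return 0;                 /* BUG: descriptor is never released */
    toy_free_file();
    return 1;
}

/* Fixed lookup: the descriptor is released on every path. */
static int lookup_fixed(const char *user)
{
    int found;

    if (!toy_allocate_file())
        return -1;
    found = (strcmp(user, "alice") == 0);
    toy_free_file();
    return found;
}

/* Count how many unknown-user lookups succeed before slots run out. */
static int attempts_until_exhausted(int (*lookup)(const char *), int max_attempts)
{
    int ok = 0;

    allocated_count = 0;
    virtual_open = 0;
    for (int i = 0; i < 3; i++)   /* start with a few cached files open */
        toy_open_virtual();
    while (ok < max_attempts && lookup("mallory") >= 0)
        ok++;
    return ok;
}
```

In this model, attempts_until_exhausted(lookup_leaky, 1000) stops after
8 lookups, once every slot is pinned by a leaked descriptor, while
attempts_until_exhausted(lookup_fixed, 1000) runs all 1000 without
trouble. That is the same shape as the suggested fix: release the file
on the "user not found" path as well as on the normal one.
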

Hope that helps,

Mike Mascari
