Thread: Re: [HACKERS] File descriptor leakage?

Re: [HACKERS] File descriptor leakage?

From: Cyrus Rahman
> I think we ought to hold up 6.5.2 long enough to cram this patch in, but
> I'm hesitant to stick it in the stable branch without some more testing.
> Cyrus, can you try it and see if it fixes your problem?

Ok, I can't actually try the patch for another week or so, since my
development machine has temporarily become a production machine, but thanks to
Hiroshi Inoue's patch I was able to figure out how to demonstrate the problem
in an easily reproducible manner that anyone can test.

As you can see, a connection open through a vacuum does end up duplicating
its open file descriptors.  Here's a psql session demonstrating the problem:

cr@photox% psql -d template1
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.1 on i386-unknown-freebsd3.2, compiled by cc ]

   type \? for help on slash commands
   type \q to quit
   type \g or terminate with semicolon to execute query
 You are currently connected to the database: template1

template1=> select * from pg_user;
usename|usesysid|usecreatedb|usetrace|usesuper|usecatupd|passwd  |valuntil                    
-------+--------+-----------+--------+--------+---------+--------+----------------------------
pgsql  |      70|t          |t       |t       |t        |********|Sat Jan 31 01:00:00 2037 EST
cr     |      71|t          |t       |t       |t        |********|                            
paxis  |      72|f          |t       |t       |t        |********|                            
(3 rows)

template1=> 
Suspended

cr@photox% ps ax|grep postgres
  425  ??  Ss     2:37.25 /usr/local/pgsql/bin/postmaster -i -S -o -F (postgres
90608  ??  S      0:00.06 /usr/local/pgsql/bin/postgres cr localhost template1 

cr@photox% fstat -p 90608
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
pgsql    postgres   90608 root /             2 drwxr-xr-x     512  r
pgsql    postgres   90608   wd /usr     366233 drwx------    1536  r
pgsql    postgres   90608 text /usr     334856 -r-xr-xr-x  1050936  r
pgsql    postgres   90608    0 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    1 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    2 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    3 /usr     365266 -rw-------    1712  r
pgsql    postgres   90608    4 /usr     366283 -rw-------  262144 rw
pgsql    postgres   90608    5* local stream ca3f3b80 <-> ca3f3cc0
pgsql    postgres   90608    6 /usr     366236 -rw-------    8192 rw
pgsql    postgres   90608    7 /usr     366239 -rw-------    8192 rw
pgsql    postgres   90608    8 /usr     366269 -rw-------   16384 rw
pgsql    postgres   90608    9 /usr     366238 -rw-------   49152 rw
pgsql    postgres   90608   10 /usr     366259 -rw-------   32768 rw
pgsql    postgres   90608   11 /usr     366281 -rw-------    8192 rw
pgsql    postgres   90608   12 /usr     366235 -rw-------  172032 rw
pgsql    postgres   90608   13 /usr     366246 -rw-------    8192 rw
pgsql    postgres   90608   14 /usr     366242 -rw-------    8192 rw
pgsql    postgres   90608   15 /usr     366249 -rw-------    8192 rw
pgsql    postgres   90608   16 /usr     366247 -rw-------   16384 rw
pgsql    postgres   90608   17 /usr     366244 -rw-------   65536 rw
pgsql    postgres   90608   18 /usr     366262 -rw-------  139264 rw
pgsql    postgres   90608   19 /usr     366237 -rw-------   16384 rw
pgsql    postgres   90608   20 /usr     366265 -rw-------   16384 rw
pgsql    postgres   90608   21 /usr     366261 -rw-------   40960 rw
pgsql    postgres   90608   22 /usr     366254 -rw-------   24576 rw
pgsql    postgres   90608   23 /usr     366292 -rw-------       0 rw
pgsql    postgres   90608   24 /usr     366258 -rw-------   65536 rw
pgsql    postgres   90608   25 /usr     366267 -rw-------   16384 rw

cr@photox% psql -d template1 -c vacuum
VACUUM

cr@photox% !fstat
fstat -p 90608
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
pgsql    postgres   90608 root /             2 drwxr-xr-x     512  r
pgsql    postgres   90608   wd /usr     366233 drwx------    1536  r
pgsql    postgres   90608 text /usr     334856 -r-xr-xr-x  1050936  r
pgsql    postgres   90608    0 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    1 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    2 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    3 /usr     365266 -rw-------    1712  r
pgsql    postgres   90608    4 /usr     366283 -rw-------  262144 rw
pgsql    postgres   90608    5* local stream ca3f3b80 <-> ca3f3cc0
pgsql    postgres   90608    6 /usr     366236 -rw-------    8192 rw
pgsql    postgres   90608    7 /usr     366239 -rw-------    8192 rw
pgsql    postgres   90608    8 /usr     366269 -rw-------   16384 rw
pgsql    postgres   90608    9 /usr     366238 -rw-------   49152 rw
pgsql    postgres   90608   10 /usr     366259 -rw-------   32768 rw
pgsql    postgres   90608   11 /usr     366281 -rw-------    8192 rw
pgsql    postgres   90608   12 /usr     366235 -rw-------  172032 rw
pgsql    postgres   90608   13 /usr     366246 -rw-------    8192 rw
pgsql    postgres   90608   14 /usr     366242 -rw-------    8192 rw
pgsql    postgres   90608   15 /usr     366249 -rw-------    8192 rw
pgsql    postgres   90608   16 /usr     366247 -rw-------   16384 rw
pgsql    postgres   90608   17 /usr     366244 -rw-------   65536 rw
pgsql    postgres   90608   18 /usr     366262 -rw-------  139264 rw
pgsql    postgres   90608   19 /usr     366237 -rw-------   16384 rw
pgsql    postgres   90608   20 /usr     366265 -rw-------   16384 rw
pgsql    postgres   90608   21 /usr     366261 -rw-------   40960 rw
pgsql    postgres   90608   22 /usr     366254 -rw-------   24576 rw
pgsql    postgres   90608   23 /usr     366292 -rw-------       0 rw
pgsql    postgres   90608   24 /usr     366258 -rw-------   65536 rw
pgsql    postgres   90608   25 /usr     366267 -rw-------   16384 rw

cr@photox% fg
psql -d template1
select * from pg_user;
usename|usesysid|usecreatedb|usetrace|usesuper|usecatupd|passwd  |valuntil                    
-------+--------+-----------+--------+--------+---------+--------+----------------------------
pgsql  |      70|t          |t       |t       |t        |********|Sat Jan 31 01:00:00 2037 EST
cr     |      71|t          |t       |t       |t        |********|                            
paxis  |      72|f          |t       |t       |t        |********|                            
(3 rows)

template1=> 
Suspended

cr@photox% !fstat
fstat -p 90608
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
pgsql    postgres   90608 root /             2 drwxr-xr-x     512  r
pgsql    postgres   90608   wd /usr     366233 drwx------    1536  r
pgsql    postgres   90608 text /usr     334856 -r-xr-xr-x  1050936  r
pgsql    postgres   90608    0 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    1 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    2 /           967 crw-rw-rw-    null rw
pgsql    postgres   90608    3 /usr     365266 -rw-------    1712  r
pgsql    postgres   90608    4 /usr     366283 -rw-------  262144 rw
pgsql    postgres   90608    5* local stream ca3f3b80 <-> ca3f3cc0
pgsql    postgres   90608    6 /usr     366236 -rw-------    8192 rw
pgsql    postgres   90608    7 /usr     366239 -rw-------    8192 rw
pgsql    postgres   90608    8 /usr     366269 -rw-------   16384 rw
pgsql    postgres   90608    9 /usr     366238 -rw-------   49152 rw
pgsql    postgres   90608   10 /usr     366259 -rw-------   32768 rw
pgsql    postgres   90608   11 /usr     366281 -rw-------    8192 rw
pgsql    postgres   90608   12 /usr     366235 -rw-------  172032 rw
pgsql    postgres   90608   13 /usr     366246 -rw-------    8192 rw
pgsql    postgres   90608   14 /usr     366242 -rw-------    8192 rw
pgsql    postgres   90608   15 /usr     366249 -rw-------    8192 rw
pgsql    postgres   90608   16 /usr     366247 -rw-------   16384 rw
pgsql    postgres   90608   17 /usr     366244 -rw-------   65536 rw
pgsql    postgres   90608   18 /usr     366262 -rw-------  139264 rw
pgsql    postgres   90608   19 /usr     366237 -rw-------   16384 rw
pgsql    postgres   90608   20 /usr     366265 -rw-------   16384 rw
pgsql    postgres   90608   21 /usr     366261 -rw-------   40960 rw
pgsql    postgres   90608   22 /usr     366254 -rw-------   24576 rw
pgsql    postgres   90608   23 /usr     366292 -rw-------       0 rw
pgsql    postgres   90608   24 /usr     366258 -rw-------   65536 rw
pgsql    postgres   90608   25 /usr     366267 -rw-------   16384 rw
pgsql    postgres   90608   26 /usr     366254 -rw-------   24576 rw
pgsql    postgres   90608   27 /usr     366246 -rw-------    8192 rw
pgsql    postgres   90608   28 /usr     366242 -rw-------    8192 rw
pgsql    postgres   90608   29 /usr     366249 -rw-------    8192 rw
pgsql    postgres   90608   30 /usr     366247 -rw-------   16384 rw
pgsql    postgres   90608   31 /usr     366244 -rw-------   65536 rw
pgsql    postgres   90608   32 /usr     366265 -rw-------   16384 rw
pgsql    postgres   90608   33 /usr     366292 -rw-------       0 rw
pgsql    postgres   90608   34 /usr     366258 -rw-------   65536 rw
pgsql    postgres   90608   35 /usr     366281 -rw-------    8192 rw

cr@photox% ls -iR /usr/local/pgsql/data/ | egrep '292|258|281'
366281 pg_shadow
366258 pg_attribute_relid_attnam_index
366292 pg_user
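
In the last fstat listing the duplication shows up as several descriptors sharing one INUM value (descriptors 23 and 33 both point at inode 366292, which is pg_user). The same check can be sketched in C with the fstat(2) system call. This is only an illustration: it inspects the calling process rather than another backend, and the names count_duplicate_fds and SCAN_LIMIT are made up for this example, not taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define SCAN_LIMIT 1024     /* hypothetical bound on descriptors examined */

/*
 * Count open descriptors of the calling process that refer to the same
 * file (same device and inode) as a lower-numbered open descriptor.
 */
int count_duplicate_fds(void)
{
    static struct { dev_t dev; ino_t ino; } seen[SCAN_LIMIT];
    int nseen = 0;
    int duplicates = 0;
    long max_fd = sysconf(_SC_OPEN_MAX);

    /* Fall back to SCAN_LIMIT if the limit is unknown or very large. */
    if (max_fd < 0 || max_fd > SCAN_LIMIT)
        max_fd = SCAN_LIMIT;

    for (int fd = 0; fd < max_fd; fd++) {
        struct stat st;
        int already_seen = 0;

        if (fstat(fd, &st) != 0)
            continue;               /* descriptor is not open */

        for (int i = 0; i < nseen; i++) {
            if (seen[i].dev == st.st_dev && seen[i].ino == st.st_ino) {
                already_seen = 1;
                break;
            }
        }

        if (already_seen) {
            duplicates++;
        } else {
            assert(nseen < SCAN_LIMIT);
            seen[nseen].dev = st.st_dev;
            seen[nseen].ino = st.st_ino;
            nseen++;
        }
    }
    return duplicates;
}
```

A process that calls this before and after some operation can compare the two counts; a count that keeps growing is the pattern visible in the listings above. Note that some kernels report the same inode for unrelated descriptors (both ends of a pipe, for instance), so only the change in the count is meaningful.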


Re: [HACKERS] File descriptor leakage?

From: Tom Lane
Cyrus Rahman <cr@photox.jcmax.com> writes:
> As you can see, a connection open through a vacuum does end up duplicating
> its open file descriptors.

Indeed, phrased in that fashion it's easy to duplicate the problem.

Interestingly, this isn't a big problem on platforms where there is
a relatively low limit on number of open files per process.  A backend
will run its open file count up to the limit and then stay there
(wasting a few more virtual-file-descriptor array slots per vacuum
cycle, but this is such a small memory leak you'd likely never notice).
But on systems that let a process have thousands of kernel file
descriptors, there will be no recycling of kernel descriptors as the
number of virtual descriptors increases.
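
The behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the code in fd.c: it assumes a virtual-descriptor table in which each slot may or may not be backed by a kernel descriptor, and in which opening a file at the kernel limit first releases the least recently used kernel descriptor. All names (ToyVfdCache, toy_open, and so on) are hypothetical, and no real files are opened.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_VFDS 64     /* size of the toy virtual-descriptor array */

typedef struct {
    int in_use;             /* slot is allocated to a virtual descriptor */
    int has_kernel_fd;      /* slot is currently backed by a kernel fd */
    unsigned long last_used;
} ToyVfd;

typedef struct {
    ToyVfd slots[TOY_MAX_VFDS];
    int kernel_limit;       /* simulated per-process open-file limit */
    int kernel_open;        /* simulated kernel fds currently held */
    unsigned long tick;
} ToyVfdCache;

void toy_init(ToyVfdCache *c, int kernel_limit)
{
    assert(kernel_limit >= 1);
    memset(c, 0, sizeof(*c));
    c->kernel_limit = kernel_limit;
}

/* Give up the kernel fd of the least recently used backed slot. */
static void toy_release_lru(ToyVfdCache *c)
{
    int victim = -1;

    for (int i = 0; i < TOY_MAX_VFDS; i++) {
        if (c->slots[i].in_use && c->slots[i].has_kernel_fd &&
            (victim < 0 ||
             c->slots[i].last_used < c->slots[victim].last_used))
            victim = i;
    }
    if (victim >= 0) {
        c->slots[victim].has_kernel_fd = 0;
        c->kernel_open--;
    }
}

/* Allocate a virtual descriptor; returns its slot, or -1 if the array is full. */
int toy_open(ToyVfdCache *c)
{
    for (int i = 0; i < TOY_MAX_VFDS; i++) {
        if (!c->slots[i].in_use) {
            if (c->kernel_open >= c->kernel_limit)
                toy_release_lru(c);
            c->slots[i].in_use = 1;
            c->slots[i].has_kernel_fd = 1;
            c->slots[i].last_used = ++c->tick;
            c->kernel_open++;
            return i;
        }
    }
    return -1;
}

/* Number of virtual-descriptor slots that were allocated and never freed. */
int toy_slots_in_use(const ToyVfdCache *c)
{
    int n = 0;

    for (int i = 0; i < TOY_MAX_VFDS; i++)
        n += c->slots[i].in_use;
    return n;
}
```

Opening ten descriptors without ever closing them models the leak. With a limit of 4, the simulated kernel descriptor count stops at 4 while ten array slots stay occupied; with a limit of 1000, all ten leaked descriptors keep a kernel descriptor.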

What's the consensus, hackers?  Do we risk sticking Hiroshi's patch into
6.5.2, or not?  It should definitely go into current, but I'm worried
about putting it into the stable branch right before a release...
Vadim, does it look right to you?
        regards, tom lane


Re: [HACKERS] File descriptor leakage?

From: Vadim Mikheev
Tom Lane wrote:
> 
> Interestingly, this isn't a big problem on platforms where there is
                 ^^^^^^^^^^^^^^^^^^^^^^^^
> a relatively low limit on number of open files per process.  A backend
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> will run its open file count up to the limit and then stay there
> (wasting a few more virtual-file-descriptor array slots per vacuum
> cycle, but this is such a small memory leak you'd likely never notice).
> But on systems that let a process have thousands of kernel file
> descriptors, there will be no recycling of kernel descriptors as the
> number of virtual descriptors increases.
> 
> What's the consensus, hackers?  Do we risk sticking Hiroshi's patch into
> 6.5.2, or not?  It should definitely go into current, but I'm worried
> about putting it into the stable branch right before a release...
> Vadim, does it look right to you?

Sorry, I have no time to look in it. But there is another solution:

> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
> Sent: Monday, June 07, 1999 7:49 PM
> To: Hiroshi Inoue
> Cc: The Hermit Hacker; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] postgresql-v6.5beta2.tar.gz ...
>

[snip] 

> 2. fd.c:pg_nofile()->sysconf(_SC_OPEN_MAX) returns in FreeBSD 
>    near total number of files that can be opened in system
>    (by _all_ users/procs). With total number of opened files
>    ~ 2000 I can run your test with 10-20 simultaneous
>    xactions for very short time, -:)
> 
>    Should we limit fd.c:no_files to ~ 256?
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    This is port-specific, of course...

No risk at all...

Vadim
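
A minimal sketch of the cap suggested above, assuming the limit is obtained with sysconf(_SC_OPEN_MAX) as the quoted message says pg_nofile() does. The function names and the TOY_FD_CAP constant are hypothetical; the value 256 is simply the figure floated in the message, and this is not the actual fd.c code.

```c
#include <assert.h>
#include <unistd.h>

#define TOY_FD_CAP 256      /* the "~ 256" figure from the message above */

/*
 * Clamp a reported open-file limit to cap. A negative value (sysconf
 * returns -1 on error or when the limit is indeterminate) also yields cap.
 */
long capped_nofile_from(long reported, long cap)
{
    assert(cap > 0);
    if (reported < 0 || reported > cap)
        return cap;
    return reported;
}

/* Limit for this process: whatever the system reports, but never above the cap. */
long capped_nofile(void)
{
    return capped_nofile_from(sysconf(_SC_OPEN_MAX), TOY_FD_CAP);
}
```

On a system where sysconf reports something close to the system-wide total, as described for FreeBSD above, capped_nofile() would return 256; on a system with a lower per-process limit it would return that limit unchanged.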


RE: [HACKERS] File descriptor leakage?

From: "Hiroshi Inoue"
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> Sent: Wednesday, September 01, 1999 1:18 AM
> To: Tom Lane
> Cc: Cyrus Rahman; Inoue@tpf.co.jp; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] File descriptor leakage?
>
> 
> Tom Lane wrote:
> > 
> > Interestingly, this isn't a big problem on platforms where there is
>                  ^^^^^^^^^^^^^^^^^^^^^^^^
> > a relatively low limit on number of open files per process.  A backend
>                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > will run its open file count up to the limit and then stay there

It's not a small problem on platforms such as cygwin, OS2 where we
couldn't unlink open files.  We have to close useless file descriptors
ASAP there.

6.5.2-release should be stable as possible.
So I don't object to the riskless way as Vadim mentioned. 

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] File descriptor leakage?

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>>> Interestingly, this isn't a big problem on platforms where there is
>>>> a relatively low limit on number of open files per process.

> It's not a small problem on platforms such as cygwin, OS2 where we
> couldn't unlink open files.

Ah, right, good ol' microsoft strikes again...

> We have to close useless file descriptors ASAP there.
> 6.5.2-release should be stable as possible.
> So I don't object to the riskless way as Vadim mentioned. 

Well, Vadim's "riskless solution" does NOT solve the problem you mention
above, AFAICT.  Reducing the number of kernel file descriptors won't
magically cause forgotten descriptors for a table you want to delete
to not be there --- it just shortens the interval where you'll have a
problem, by shortening the interval before the descriptors get recycled.
If you reduce the number of descriptors enough to make the problem
unlikely to occur, you'll be taking a big performance hit.

So we need a proper fix to ensure the relation code doesn't forget about
open descriptors.

I will try to take a look at Hiroshi's patch this evening, and will
commit it to both branches if I can't find anything wrong with it...
        regards, tom lane