Thread: Re: [HACKERS] File descriptor leakage?
> I think we ought to hold up 6.5.2 long enough to cram this patch in, but
> I'm hesitant to stick it in the stable branch without some more testing.
> Cyrus, can you try it and see if it fixes your problem?

Ok, I can't actually try the patch for another week or so, since my
development machine has temporarily become a production machine, but thanks to
Hiroshi Inoue's patch I was able to figure out how to demonstrate the problem
in an easily reproducible manner that anyone can test.

As you can see, a connection open through a vacuum does end up duplicating
its open file descriptors.

Here's a psql session demonstrating the problem:

cr@photox% psql -d template1
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.1 on i386-unknown-freebsd3.2, compiled by cc ]

   type \? for help on slash commands
   type \q to quit
   type \g or terminate with semicolon to execute query
 You are currently connected to the database: template1

template1=> select * from pg_user;
usename|usesysid|usecreatedb|usetrace|usesuper|usecatupd|passwd  |valuntil
-------+--------+-----------+--------+--------+---------+--------+----------------------------
pgsql  |      70|t          |t       |t       |t        |********|Sat Jan 31 01:00:00 2037 EST
cr     |      71|t          |t       |t       |t        |********|
paxis  |      72|f          |t       |t       |t        |********|
(3 rows)

template1=>
Suspended
cr@photox% ps ax|grep postgres
  425  ??  Ss     2:37.25 /usr/local/pgsql/bin/postmaster -i -S -o -F (postgres
90608  ??  S      0:00.06 /usr/local/pgsql/bin/postgres cr localhost template1
cr@photox% fstat -p 90608
USER     CMD          PID   FD MOUNT   INUM MODE         SZ|DV R/W
pgsql    postgres   90608 root /          2 drwxr-xr-x     512  r
pgsql    postgres   90608   wd /usr  366233 drwx------    1536  r
pgsql    postgres   90608 text /usr  334856 -r-xr-xr-x 1050936  r
pgsql    postgres   90608    0 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    1 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    2 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    3 /usr  365266 -rw-------    1712  r
pgsql    postgres   90608    4 /usr  366283 -rw-------  262144 rw
pgsql    postgres   90608    5* local stream ca3f3b80 <-> ca3f3cc0
pgsql    postgres   90608    6 /usr  366236 -rw-------    8192 rw
pgsql    postgres   90608    7 /usr  366239 -rw-------    8192 rw
pgsql    postgres   90608    8 /usr  366269 -rw-------   16384 rw
pgsql    postgres   90608    9 /usr  366238 -rw-------   49152 rw
pgsql    postgres   90608   10 /usr  366259 -rw-------   32768 rw
pgsql    postgres   90608   11 /usr  366281 -rw-------    8192 rw
pgsql    postgres   90608   12 /usr  366235 -rw-------  172032 rw
pgsql    postgres   90608   13 /usr  366246 -rw-------    8192 rw
pgsql    postgres   90608   14 /usr  366242 -rw-------    8192 rw
pgsql    postgres   90608   15 /usr  366249 -rw-------    8192 rw
pgsql    postgres   90608   16 /usr  366247 -rw-------   16384 rw
pgsql    postgres   90608   17 /usr  366244 -rw-------   65536 rw
pgsql    postgres   90608   18 /usr  366262 -rw-------  139264 rw
pgsql    postgres   90608   19 /usr  366237 -rw-------   16384 rw
pgsql    postgres   90608   20 /usr  366265 -rw-------   16384 rw
pgsql    postgres   90608   21 /usr  366261 -rw-------   40960 rw
pgsql    postgres   90608   22 /usr  366254 -rw-------   24576 rw
pgsql    postgres   90608   23 /usr  366292 -rw-------       0 rw
pgsql    postgres   90608   24 /usr  366258 -rw-------   65536 rw
pgsql    postgres   90608   25 /usr  366267 -rw-------   16384 rw
cr@photox% psql -d template1 -c vacuum
VACUUM
cr@photox% !fstat
fstat -p 90608
USER     CMD          PID   FD MOUNT   INUM MODE         SZ|DV R/W
pgsql    postgres   90608 root /          2 drwxr-xr-x     512  r
pgsql    postgres   90608   wd /usr  366233 drwx------    1536  r
pgsql    postgres   90608 text /usr  334856 -r-xr-xr-x 1050936  r
pgsql    postgres   90608    0 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    1 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    2 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    3 /usr  365266 -rw-------    1712  r
pgsql    postgres   90608    4 /usr  366283 -rw-------  262144 rw
pgsql    postgres   90608    5* local stream ca3f3b80 <-> ca3f3cc0
pgsql    postgres   90608    6 /usr  366236 -rw-------    8192 rw
pgsql    postgres   90608    7 /usr  366239 -rw-------    8192 rw
pgsql    postgres   90608    8 /usr  366269 -rw-------   16384 rw
pgsql    postgres   90608    9 /usr  366238 -rw-------   49152 rw
pgsql    postgres   90608   10 /usr  366259 -rw-------   32768 rw
pgsql    postgres   90608   11 /usr  366281 -rw-------    8192 rw
pgsql    postgres   90608   12 /usr  366235 -rw-------  172032 rw
pgsql    postgres   90608   13 /usr  366246 -rw-------    8192 rw
pgsql    postgres   90608   14 /usr  366242 -rw-------    8192 rw
pgsql    postgres   90608   15 /usr  366249 -rw-------    8192 rw
pgsql    postgres   90608   16 /usr  366247 -rw-------   16384 rw
pgsql    postgres   90608   17 /usr  366244 -rw-------   65536 rw
pgsql    postgres   90608   18 /usr  366262 -rw-------  139264 rw
pgsql    postgres   90608   19 /usr  366237 -rw-------   16384 rw
pgsql    postgres   90608   20 /usr  366265 -rw-------   16384 rw
pgsql    postgres   90608   21 /usr  366261 -rw-------   40960 rw
pgsql    postgres   90608   22 /usr  366254 -rw-------   24576 rw
pgsql    postgres   90608   23 /usr  366292 -rw-------       0 rw
pgsql    postgres   90608   24 /usr  366258 -rw-------   65536 rw
pgsql    postgres   90608   25 /usr  366267 -rw-------   16384 rw
cr@photox% fg
psql -d template1
select * from pg_user;
usename|usesysid|usecreatedb|usetrace|usesuper|usecatupd|passwd  |valuntil
-------+--------+-----------+--------+--------+---------+--------+----------------------------
pgsql  |      70|t          |t       |t       |t        |********|Sat Jan 31 01:00:00 2037 EST
cr     |      71|t          |t       |t       |t        |********|
paxis  |      72|f          |t       |t       |t        |********|
(3 rows)

template1=>
Suspended
cr@photox% !fstat
fstat -p 90608
USER     CMD          PID   FD MOUNT   INUM MODE         SZ|DV R/W
pgsql    postgres   90608 root /          2 drwxr-xr-x     512  r
pgsql    postgres   90608   wd /usr  366233 drwx------    1536  r
pgsql    postgres   90608 text /usr  334856 -r-xr-xr-x 1050936  r
pgsql    postgres   90608    0 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    1 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    2 /        967 crw-rw-rw-    null rw
pgsql    postgres   90608    3 /usr  365266 -rw-------    1712  r
pgsql    postgres   90608    4 /usr  366283 -rw-------  262144 rw
pgsql    postgres   90608    5* local stream ca3f3b80 <-> ca3f3cc0
pgsql    postgres   90608    6 /usr  366236 -rw-------    8192 rw
pgsql    postgres   90608    7 /usr  366239 -rw-------    8192 rw
pgsql    postgres   90608    8 /usr  366269 -rw-------   16384 rw
pgsql    postgres   90608    9 /usr  366238 -rw-------   49152 rw
pgsql    postgres   90608   10 /usr  366259 -rw-------   32768 rw
pgsql    postgres   90608   11 /usr  366281 -rw-------    8192 rw
pgsql    postgres   90608   12 /usr  366235 -rw-------  172032 rw
pgsql    postgres   90608   13 /usr  366246 -rw-------    8192 rw
pgsql    postgres   90608   14 /usr  366242 -rw-------    8192 rw
pgsql    postgres   90608   15 /usr  366249 -rw-------    8192 rw
pgsql    postgres   90608   16 /usr  366247 -rw-------   16384 rw
pgsql    postgres   90608   17 /usr  366244 -rw-------   65536 rw
pgsql    postgres   90608   18 /usr  366262 -rw-------  139264 rw
pgsql    postgres   90608   19 /usr  366237 -rw-------   16384 rw
pgsql    postgres   90608   20 /usr  366265 -rw-------   16384 rw
pgsql    postgres   90608   21 /usr  366261 -rw-------   40960 rw
pgsql    postgres   90608   22 /usr  366254 -rw-------   24576 rw
pgsql    postgres   90608   23 /usr  366292 -rw-------       0 rw
pgsql    postgres   90608   24 /usr  366258 -rw-------   65536 rw
pgsql    postgres   90608   25 /usr  366267 -rw-------   16384 rw
pgsql    postgres   90608   26 /usr  366254 -rw-------   24576 rw
pgsql    postgres   90608   27 /usr  366246 -rw-------    8192 rw
pgsql    postgres   90608   28 /usr  366242 -rw-------    8192 rw
pgsql    postgres   90608   29 /usr  366249 -rw-------    8192 rw
pgsql    postgres   90608   30 /usr  366247 -rw-------   16384 rw
pgsql    postgres   90608   31 /usr  366244 -rw-------   65536 rw
pgsql    postgres   90608   32 /usr  366265 -rw-------   16384 rw
pgsql    postgres   90608   33 /usr  366292 -rw-------       0 rw
pgsql    postgres   90608   34 /usr  366258 -rw-------   65536 rw
pgsql    postgres   90608   35 /usr  366281 -rw-------    8192 rw
cr@photox% ls -iR /usr/local/pgsql/data/ | egrep '292|258|281'
366281 pg_shadow
366258 pg_attribute_relid_attnam_index
366292 pg_user
Cyrus Rahman <cr@photox.jcmax.com> writes:
> As you can see, a connection open through a vacuum does end up duplicating
> its open file descriptors.

Indeed, phrased in that fashion it's easy to duplicate the problem.

Interestingly, this isn't a big problem on platforms where there is
a relatively low limit on number of open files per process.  A backend
will run its open file count up to the limit and then stay there
(wasting a few more virtual-file-descriptor array slots per vacuum
cycle, but this is such a small memory leak you'd likely never notice).
But on systems that let a process have thousands of kernel file
descriptors, there will be no recycling of kernel descriptors as the
number of virtual descriptors increases.

What's the consensus, hackers?  Do we risk sticking Hiroshi's patch into
6.5.2, or not?  It should definitely go into current, but I'm worried
about putting it into the stable branch right before a release...
Vadim, does it look right to you?

			regards, tom lane
Tom Lane wrote:
>
> Interestingly, this isn't a big problem on platforms where there is
                 ^^^^^^^^^^^^^^^^^^^^^^^^
> a relatively low limit on number of open files per process.  A backend
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> will run its open file count up to the limit and then stay there
> (wasting a few more virtual-file-descriptor array slots per vacuum
> cycle, but this is such a small memory leak you'd likely never notice).
> But on systems that let a process have thousands of kernel file
> descriptors, there will be no recycling of kernel descriptors as the
> number of virtual descriptors increases.
>
> What's the consensus, hackers?  Do we risk sticking Hiroshi's patch into
> 6.5.2, or not?  It should definitely go into current, but I'm worried
> about putting it into the stable branch right before a release...
> Vadim, does it look right to you?

Sorry, I have no time to look in it. But there is another solution:

> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
> Sent: Monday, June 07, 1999 7:49 PM
> To: Hiroshi Inoue
> Cc: The Hermit Hacker; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] postgresql-v6.5beta2.tar.gz ...
>
[snip]
> 2. fd.c:pg_nofile()->sysconf(_SC_OPEN_MAX) returns in FreeBSD
>    near total number of files that can be opened in system
>    (by _all_ users/procs). With total number of opened files
>    ~ 2000 I can run your test with 10-20 simultaneous
>    xactions for very short time, -:)
>
>    Should we limit fd.c:no_files to ~ 256?
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    This is port-specific, of course...

No risk at all...

Vadim
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> Sent: Wednesday, September 01, 1999 1:18 AM
> To: Tom Lane
> Cc: Cyrus Rahman; Inoue@tpf.co.jp; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] File descriptor leakage?
>
>
> Tom Lane wrote:
> >
> > Interestingly, this isn't a big problem on platforms where there is
>                  ^^^^^^^^^^^^^^^^^^^^^^^^
> > a relatively low limit on number of open files per process.  A backend
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > will run its open file count up to the limit and then stay there

It's not a small problem on platforms such as cygwin, OS2 where we
couldn't unlink open files.
We have to close useless file descriptors ASAP there.

6.5.2-release should be stable as possible.
So I don't object to the riskless way as Vadim mentioned.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>>> Interestingly, this isn't a big problem on platforms where there is
>>>> a relatively low limit on number of open files per process.

> It's not a small problem on platforms such as cygwin, OS2 where we
> couldn't unlink open files.

Ah, right, good ol' microsoft strikes again...

> We have to close useless file descriptors ASAP there.
> 6.5.2-release should be stable as possible.
> So I don't object to the riskless way as Vadim mentioned.

Well, Vadim's "riskless solution" does NOT solve the problem you mention
above, AFAICT.  Reducing the number of kernel file descriptors won't
magically cause forgotten descriptors for a table you want to delete
to not be there --- it just shortens the interval where you'll have
a problem, by shortening the interval before the descriptors get recycled.
If you reduce the number of descriptors enough to make the problem
unlikely to occur, you'll be taking a big performance hit.

So we need a proper fix to ensure the relation code doesn't forget about
open descriptors.  I will try to take a look at Hiroshi's patch this
evening, and will commit it to both branches if I can't find anything
wrong with it...

			regards, tom lane