Thread: Deleted WAL files held open by backends in Linux
I ran across the subject issue and spent some time puzzling over it. It seemed strange that the only backends which were holding open deleted WAL files were ones where the connection was established with a login which has no write permissions. Eventually, I vaguely recalled there was such an issue discussed in recent months and found it in the archives. It was this:

http://archives.postgresql.org/pgsql-bugs/2009-09/msg00144.php

As a DBA I feel that it's irritating and a potential time waster. I guess it is a stretch to imagine that a database would have enough read-only connections to exhaust resources by holding open one deleted WAL file each; unless they have, say, 200 such connections and they're cutting things so close that a wasted 3.2GB of disk space at the WAL file location will run them out.

I'm not sure whether Tom's comment that "There is zero hope of making that work." referred to the idea that we could close deleted WAL files or to something else. Is a fix feasible?

This was observed on:

PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 2

Linux version 2.6.16.60-0.39.3-smp (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Mon May 11 11:46:34 UTC 2009

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > I guess it is a stretch to imagine that a database would have > enough read-only connections to exhaust resources by holding open > one deleted WAL file each; unless they have, say, 200 such > connections and they're cutting things so close that a wasted 3.2GB > of disk space at the WAL file location will run them out. AFAIK, we rename and reuse old WAL segment files. So, we don't waste disk space unless checkpoint_segments are too small. Also, if you are worried about disk space, how about adding ftruncate(to 0 byte) before unlink() ? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I ran across the subject issue and spent some time puzzling over it. > ... > I'm not sure whether Tom's comment that "There is zero hope of > making that work." referred to the idea that we could close deleted > WAL files or to something else. Is a fix feasible? The reason I was dismissive of that is this scenario: a backend is told to execute a data-modifying transaction (so it has to write and commit some WAL records) and then the client promptly goes to sleep and stays that way for awhile. The backend has nothing to do either, so it's just sitting waiting on the client socket. There is noplace reasonable here to make it decide to close its open WAL file. We could possibly do a pre-emptive close() after transaction commit, but that would result in a useless distributed overhead of opening and closing the current WAL file when a backend is engaged in a series of transactions. On Linux this simply isn't an issue, anyway --- if you can't afford one open file descriptor per backend for WAL, you're going to have a far bigger problem with the backend's habit of caching descriptors for data files. On Windows it might be a problem, but I think we have got workarounds for all the important cases. The important point is that we want any overhead for this to be pushed into non-performance-critical code paths. > It seemed strange that the only backends which were holding open > deleted WAL files were ones where the connection was established > with a login which has no write permissions. A backend would never open a WAL file unless it had to write a WAL record, so I'm having a hard time believing that these were totally read-only transactions. Can you give specifics? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> It seemed strange that the only backends which were holding open
>> deleted WAL files were ones where the connection was established
>> with a login which has no write permissions.
>
> A backend would never open a WAL file unless it had to write a WAL
> record, so I'm having a hard time believing that these were
> totally read-only transactions. Can you give specifics?

Here's some information captured before the original post:

MILWAUKEE-PG:~ # lsof | grep deleted
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
postgres 13105 ccsa 10u REG 8,4 16777216 268442394 /var/pgsql/data/cc/pg_xlog/00000001000000E0000000BB (deleted)
postgres 30500 ccsa 4u REG 8,4 16777216 268442626 /var/pgsql/data/cc/pg_xlog/00000001000000E20000003F (deleted)
postgres 30501 ccsa 5u REG 8,4 16777216 283509014 /var/pgsql/data/cc/pg_xlog/00000001000000E200000016 (deleted)

MILWAUKEE-PG:~ # ps auxf|grep ^ccsa
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ccsa 30490 0.0 0.0 290988 8608 ? S Oct24 0:13 /usr/local/pgsql-8.3.7/bin/postgres -D /var/pgsql/data/cc
ccsa 30491 0.0 0.0 14524 1056 ? Ss Oct24 0:10 \_ postgres: logger process
ccsa 30493 0.0 1.0 291876 268760 ? Ss Oct24 1:36 \_ postgres: writer process
ccsa 30494 0.0 0.0 291120 1452 ? Ss Oct24 0:01 \_ postgres: wal writer process
ccsa 30495 0.0 0.0 291552 1564 ? Ss Oct24 0:00 \_ postgres: autovacuum launcher process
ccsa 30496 0.0 0.0 14612 952 ? Ss Oct24 0:01 \_ postgres: archiver process last was 00000001000000E20000005B
ccsa 30497 0.0 0.0 15280 1576 ? Ss Oct24 6:37 \_ postgres: stats collector process
ccsa 30500 0.0 0.9 292448 231844 ? Ss Oct24 10:42 \_ postgres: viewer cc 127.0.0.1(36846) idle
ccsa 30501 0.0 0.8 292496 209428 ? Ss Oct24 10:28 \_ postgres: viewer cc 165.219.78.11(53940) idle
ccsa 3107 0.0 0.0 292508 5836 ? Ss Nov09 0:32 \_ postgres: cc cc 127.0.0.1(40442) idle
ccsa 3113 0.0 1.1 305304 286280 ? Ss Nov09 6:51 \_ postgres: cc cc 127.0.0.1(49969) idle
ccsa 3119 0.0 0.0 292508 5836 ? Ss Nov09 0:29 \_ postgres: cc cc 127.0.0.1(54127) idle
ccsa 13105 0.0 0.0 293396 23852 ? Ss Nov23 0:08 \_ postgres: viewer cc 165.219.90.178(51481) idle
ccsa 1485 0.8 1.1 312400 291508 ? Ss Nov24 14:18 \_ postgres: cc cc 127.0.0.1(42692) idle
ccsa 10752 0.3 1.1 312712 290856 ? Ss Nov24 4:54 \_ postgres: cc cc 127.0.0.1(45119) idle
ccsa 10908 1.3 1.1 313120 292836 ? Ss Nov24 20:03 \_ postgres: cc cc 127.0.0.1(42065) idle
ccsa 25099 0.7 1.1 312864 285048 ? Ss 14:02 1:03 \_ postgres: cc cc 127.0.0.1(50361) idle
ccsa 26571 0.0 0.3 298912 89528 ? Ss 14:29 0:06 \_ postgres: cc cc 127.0.0.1(52213) idle
ccsa 26809 0.1 0.3 298940 88816 ? Ss 14:33 0:06 \_ postgres: cc cc 127.0.0.1(59145) idle
ccsa 26812 0.6 1.1 310060 274192 ? Ss 14:33 0:44 \_ postgres: cc cc 127.0.0.1(59962) idle

MILWAUKEE-PG:~ # ll /var/pgsql/data/cc/pg_xlog/
total 1671172
-rw------- 1 ccsa users 253 2009-11-20 21:29 00000001000000E00000000A.00001810.backup
-rw------- 1 ccsa users 16777216 2009-11-25 16:10 00000001000000E20000005B
-rw------- 1 ccsa users 16777216 2009-11-25 16:22 00000001000000E20000005C
-rw------- 1 ccsa users 16777216 2009-11-25 04:15 00000001000000E20000005D
-rw------- 1 ccsa users 16777216 2009-11-25 05:15 00000001000000E20000005E
-rw------- 1 ccsa users 16777216 2009-11-25 06:15 00000001000000E20000005F
-rw------- 1 ccsa users 16777216 2009-11-25 07:15 00000001000000E200000060
-rw------- 1 ccsa users 16777216 2009-11-25 07:42 00000001000000E200000061
[...]
-rw------- 1 ccsa users 16777216 2009-11-25 15:34 00000001000000E2000000BB
-rw------- 1 ccsa users 16777216 2009-11-25 15:37 00000001000000E2000000BC
-rw------- 1 ccsa users 16777216 2009-11-25 15:44 00000001000000E2000000BD
-rw------- 1 ccsa users 16777216 2009-11-25 15:50 00000001000000E2000000BE
-rw------- 1 ccsa users 16777216 2009-11-25 15:57 00000001000000E2000000BF
-rw------- 1 ccsa users 16777216 2009-11-25 16:04 00000001000000E2000000C0
drwx------ 2 ccsa users 94 2009-11-25 16:12 archive_status

You will note that the connections logged in as "viewer" (and only those) are holding open a deleted WAL file. This user has not been granted anything except SELECT permissions to any tables. In addition, immediately after creating the database, we ran:

REVOKE CREATE ON DATABASE cc FROM public;
REVOKE CREATE ON SCHEMA public FROM public;

Two of these connections are from software publishers, which only issue SELECT statements against three tables (to which they only have SELECT permission). The other is from monitoring software, which only runs SELECT against C functions from our fsutil software (which can be seen on pgfoundry). All are JDBC connections. I suspect that autoCommit is on; if you think it matters, I can go review the code to confirm.

On one of our development machines, with a number of PostgreSQL databases, I found about 60 such WAL files deleted and held open by backends logged in under the read-only user. It wasted my time primarily because I was looking for situations where we might have deleted a directory tree without first stopping the process using it; while 60 file handles and 1GB of disk space is no big deal for us, it caused me to investigate the issue to confirm there weren't zombie PostgreSQL processes out there.

At a minimum, we should add the extra 16MB per connection that might be taken on the WAL file system to the calculations people should do when sizing that, just in case someone is trying to cut that thin while planning on using a lot of connections.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A backend would never open a WAL file unless it had to write a WAL >> record, so I'm having a hard time believing that these were >> totally read-only transactions. Can you give specifics? > You will note that the connections logged in as "viewer" (and only > those) are holding open a deleted WAL file. This user has not been > granted anything except SELECT permissions to any tables. You sure it's not creating any temp tables? You didn't mention revoking TEMP privilege. I can think of one code path that could result in a genuinely read-only session having to write WAL: if it's forced to flush dirty buffers in order to read in other pages, and such a buffer was dirtied by as-yet-uncommitted transactions, it might have to flush WAL to be allowed to write the dirty buffer. But I think you'd have had to dial back the bgwriter to the point of uselessness before this would be a common occurrence. > At a minimum, we should add the extra 16MB per connection that might > be taken on the WAL file system to the calculations people should do > when sizing that, just in case someone is trying to cut that thin > while planning on using a lot of connections. In the first place, this is a complete non-issue except on Windows --- on other platforms we can rename and recycle the files even if they're being held open. I rather doubt anyone would think they could predict a Windows machine's disk usage that accurately anyway. In the second place, for each backend to be holding open a different dead WAL file strains the limits of credulity. Telling people to assume 16MB * max_connections would be a gross overestimate. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> You sure it's not creating any temp tables? You didn't mention
> revoking TEMP privilege.

They have not been revoked, but I am sure the software publisher doesn't explicitly create any, and I'd be very surprised if the monitoring software did. The tables are small enough that it's hard to believe that the 50MB work_mem would spill to disk, either (if that matters).

> I can think of one code path that could result in a genuinely
> read-only session having to write WAL: if it's forced to flush
> dirty buffers in order to read in other pages, and such a buffer
> was dirtied by as-yet-uncommitted transactions, it might have to
> flush WAL to be allowed to write the dirty buffer. But I think
> you'd have had to dial back the bgwriter to the point of
> uselessness before this would be a common occurrence.

#bgwriter_delay = 200ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0

>> At a minimum, we should add the extra 16MB per connection that
>> might be taken on the WAL file system to the calculations people
>> should do when sizing that, just in case someone is trying to
>> cut that thin while planning on using a lot of connections.
>
> In the first place, this is a complete non-issue except on Windows
> --- on other platforms we can rename and recycle the files even
> if they're being held open. I rather doubt anyone would think
> they could predict a Windows machine's disk usage that accurately
> anyway. In the second place, for each backend to be holding open
> a different dead WAL file strains the limits of credulity.
> Telling people to assume 16MB * max_connections would be a gross
> overestimate.

Pretty much every read only JDBC connection seems to be holding open a deleted WAL file on my Linux box, but it would take pretty pessimal timing for each connection to be holding open a different one -- I see that many connections share a deleted WAL file.

project-db:~ # cat /proc/version
Linux version 2.6.16.60-0.39.3-smp (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Mon May 11 11:46:34 UTC 2009

project-db:~ # lsof | grep deleted
postgres 2189 ccefcirsa 20u REG 8,3 16777216 1610613340 /var/pgsql/data/ccefcir/pg_xlog/000000010000000A00000010 (deleted)
postgres 2195 ccefcirsa 43u REG 8,3 16777216 1610613340 /var/pgsql/data/ccefcir/pg_xlog/000000010000000A00000010 (deleted)
postgres 2511 jdashcirsa 1u CHR 136,5 7 /dev/pts/5 (deleted)
postgres 2511 jdashcirsa 2u CHR 136,5 7 /dev/pts/5 (deleted)
postgres 2514 jdashcirsa 3u REG 8,3 16777216 1610631538 /var/pgsql/data/jdashcir/pg_xlog/000000010000001100000076 (deleted)
postgres 2812 ccsa 3u REG 8,3 16777216 3763808807 /var/pgsql/data/epayment/pg_xlog/000000020000009B00000094 (deleted)
postgres 3647 ccefsa 1u CHR 136,5 7 /dev/pts/5 (deleted)
postgres 3647 ccefsa 2u CHR 136,5 7 /dev/pts/5 (deleted)
postgres 3650 ccefsa 3u REG 8,3 16777216 669441097 /var/pgsql/data/ccef/pg_xlog/000000020000004B000000C5 (deleted)
postgres 4266 milwaukeeifsa 3u REG 8,3 16777216 2708846433 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006B00000014 (deleted)
postgres 4270 juryscansa 3u REG 8,3 16777216 1073742165 /var/pgsql/data/juryscan/pg_xlog/000000020000001A000000DB (deleted)
postgres 6100 juryscansa 41u REG 8,3 16777216 1073742165 /var/pgsql/data/juryscan/pg_xlog/000000020000001A000000DB (deleted)
postgres 6105 juryscansa 35u REG 8,3 16777216 1073742167 /var/pgsql/data/juryscan/pg_xlog/000000020000001A0000007E (deleted)
postgres 6113 juryscansa 43u REG 8,3 16777216 1073742166 /var/pgsql/data/juryscan/pg_xlog/000000020000001A0000001D (deleted)
postgres 6119 juryscansa 59u REG 8,3 16777216 1073742165 /var/pgsql/data/juryscan/pg_xlog/000000020000001A000000DB (deleted)
postgres 6121 juryscansa 62u REG 8,3 16777216 1073742165 /var/pgsql/data/juryscan/pg_xlog/000000020000001A000000DB (deleted)
postgres 6126 juryscansa 33u REG 8,3 16777216 1073742167 /var/pgsql/data/juryscan/pg_xlog/000000020000001A0000007E (deleted)
postgres 7997 washrptcirsa 13u REG 8,3 16777216 2688967689 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B000000E4 (deleted)
postgres 8170 washrptcirsa 23u REG 8,3 16777216 2688967689 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B000000E4 (deleted)
postgres 8687 washrptcirsa 11u REG 8,3 16777216 2688967689 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B000000E4 (deleted)
postgres 8720 washrptcirsa 3u REG 8,3 16777216 2688967689 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B000000E4 (deleted)
postgres 8769 washrptcirsa 6u REG 8,3 16777216 2688967689 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B000000E4 (deleted)
postgres 10093 washrptcirsa 18u REG 8,3 16777216 2688967689 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B000000E4 (deleted)
postgres 10313 ccsagsa 3u REG 8,3 16777216 3758753714 /var/pgsql/data/ccsag/pg_xlog/0000000300000035000000EE (deleted)
postgres 10660 datasourcesa 50u REG 8,3 16777216 3250939657 /var/pgsql/data/datasource/pg_xlog/0000000100000004000000C9 (deleted)
postgres 10661 datasourcesa 23u REG 8,3 16777216 3250939657 /var/pgsql/data/datasource/pg_xlog/0000000100000004000000C9 (deleted)
postgres 10666 datasourcesa 20u REG 8,3 16777216 3250939679 /var/pgsql/data/datasource/pg_xlog/0000000100000004000000B3 (deleted)
postgres 10667 datasourcesa 20u REG 8,3 16777216 3250939679 /var/pgsql/data/datasource/pg_xlog/0000000100000004000000B3 (deleted)
postgres 11049 formssa 3u REG 8,3 16777216 1879057748 /var/pgsql/data/forms/pg_xlog/000000010000000B000000C1 (deleted)
postgres 11209 milwaukeeifsa 19u REG 8,3 16777216 2690371279 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006900000018 (deleted)
postgres 11217 milwaukeeifsa 131u REG 8,3 16777216 2690371279 /var/pgsql/data/milwaukeeif/pg_xlog/0000000200000069000000DE (deleted)
postgres 11221 milwaukeeifsa 34u REG 8,3 16777216 2690371282 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006A00000051 (deleted)
postgres 11223 milwaukeeifsa 16u REG 8,3 16777216 2690371282 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006A00000051 (deleted)
postgres 11659 jdashccsa 6u REG 8,3 16777216 401111458 /var/pgsql/data/jdashcc/pg_xlog/000000020000006400000080 (deleted)
postgres 12949 jdashccsa 22u REG 8,3 16777216 268437280 /var/pgsql/data/jdashcc/pg_xlog/0000000200000065000000BE (deleted)
postgres 12976 formssa 31u REG 8,3 16777216 1879057738 /var/pgsql/data/forms/pg_xlog/000000010000000F00000082 (deleted)
postgres 12978 formssa 23u REG 8,3 16777216 1879057738 /var/pgsql/data/forms/pg_xlog/000000010000000F00000082 (deleted)
postgres 12980 formssa 34u REG 8,3 16777216 1879057738 /var/pgsql/data/forms/pg_xlog/000000010000000F00000082 (deleted)
postgres 12992 formssa 35u REG 8,3 16777216 1879057738 /var/pgsql/data/forms/pg_xlog/000000010000000F00000082 (deleted)
postgres 12993 formssa 25u REG 8,3 16777216 1879057748 /var/pgsql/data/forms/pg_xlog/000000010000000E000000F4 (deleted)
postgres 12994 formssa 24u REG 8,3 16777216 1879057738 /var/pgsql/data/forms/pg_xlog/000000010000000F00000082 (deleted)
postgres 14359 jdashccsa 22u REG 8,3 16777216 401111459 /var/pgsql/data/jdashcc/pg_xlog/0000000200000065000000BD (deleted)
postgres 15397 washrptcirsa 26u REG 8,3 16777216 2688967664 /var/pgsql/data/washrptcir/pg_xlog/000000010000002900000079 (deleted)
postgres 16422 ccsagsa 11u REG 8,3 16777216 3758531497 /var/pgsql/data/ccsag/pg_xlog/0000000300000030000000ED (deleted)
postgres 16490 q409sa 3u REG 8,3 16777216 3490994896 /var/pgsql/data/codesq409/pg_xlog/000000010000000B00000056 (deleted)
postgres 16533 q309sa 3u REG 8,3 16777216 1073749053 /var/pgsql/data/codesq309/pg_xlog/0000000100000007000000A4 (deleted)
postgres 16721 ccsa 27u REG 8,3 16777216 3763808825 /var/pgsql/data/epayment/pg_xlog/000000020000009B00000068 (deleted)
postgres 16725 ccsa 25u REG 8,3 16777216 3763808807 /var/pgsql/data/epayment/pg_xlog/000000020000009B00000094 (deleted)
postgres 16729 ccsa 17u REG 8,3 16777216 3763808824 /var/pgsql/data/epayment/pg_xlog/000000020000009A000000F0 (deleted)
postgres 16734 ccsa 34u REG 8,3 16777216 3763808807 /var/pgsql/data/epayment/pg_xlog/000000020000009B000000FA (deleted)
postgres 16741 ccsa 18u REG 8,3 16777216 3763808825 /var/pgsql/data/epayment/pg_xlog/000000020000009B00000068 (deleted)
postgres 16747 ccsa 38u REG 8,3 16777216 3763808807 /var/pgsql/data/epayment/pg_xlog/000000020000009B000000FA (deleted)
postgres 18316 ccefsa 13u REG 8,3 16777216 669441097 /var/pgsql/data/ccef/pg_xlog/000000020000004B000000C5 (deleted)
postgres 20568 washrptsa 3u REG 8,3 16777216 1463 /var/pgsql/data/washrpt/pg_xlog/000000020000004700000011 (deleted)
postgres 23040 milwaukeeifsa 47u REG 8,3 16777216 2690371282 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006A00000051 (deleted)
nscd 23041 root 7u REG 8,2 217016 1459569 /var/run/nscd/dbnP5QN9 (deleted)
nscd 23041 root 8r REG 8,2 217016 1459569 /var/run/nscd/dbnP5QN9 (deleted)
postgres 23322 ccefcirsa 1u CHR 136,0 2 /dev/pts/0 (deleted)
postgres 23322 ccefcirsa 2u CHR 136,0 2 /dev/pts/0 (deleted)
postgres 23339 milwaukeeifsa 136u REG 8,3 16777216 2708846433 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006B00000014 (deleted)
postgres 23603 jawebsa 1u CHR 136,5 7 /dev/pts/5 (deleted)
postgres 23603 jawebsa 2u CHR 136,5 7 /dev/pts/5 (deleted)
postgres 24367 mssa 34u REG 8,3 16777216 805314300 /var/pgsql/data/ms/pg_xlog/000000010000000A00000058 (deleted)
postgres 24368 mssa 37u REG 8,3 16777216 805314303 /var/pgsql/data/ms/pg_xlog/000000010000000A00000059 (deleted)
postgres 24923 encryptsa 335u REG 8,3 16777216 805310179 /var/pgsql/data/encrypt/pg_xlog/000000020000003E00000023 (deleted)
postgres 24925 encryptsa 27u REG 8,3 16777216 805310179 /var/pgsql/data/encrypt/pg_xlog/000000020000003E00000023 (deleted)
postgres 24931 encryptsa 16u REG 8,3 16777216 805310179 /var/pgsql/data/encrypt/pg_xlog/000000020000003E00000023 (deleted)
postgres 24933 encryptsa 26u REG 8,3 16777216 805310179 /var/pgsql/data/encrypt/pg_xlog/000000020000003E00000023 (deleted)
postgres 24942 encryptsa 16u REG 8,3 16777216 805310179 /var/pgsql/data/encrypt/pg_xlog/000000020000003E00000023 (deleted)
postgres 26267 washrptcirsa 1u CHR 136,3 5 /dev/pts/3 (deleted)
postgres 26267 washrptcirsa 2u CHR 136,3 5 /dev/pts/3 (deleted)
postgres 26270 washrptcirsa 3u REG 8,3 16777216 2688967703 /var/pgsql/data/washrptcir/pg_xlog/000000010000002B00000070 (deleted)
postgres 27403 milwaukeeifsa 30u REG 8,3 16777216 2708846433 /var/pgsql/data/milwaukeeif/pg_xlog/000000020000006B00000014 (deleted)
postgres 28916 encryptsa 3u REG 8,3 16777216 805314462 /var/pgsql/data/encrypt/pg_xlog/000000020000003C000000E8 (deleted)
postgres 29217 mssa 3u REG 8,3 16777216 805314300 /var/pgsql/data/ms/pg_xlog/000000010000000A00000010 (deleted)
postgres 30151 datasourcesa 3u REG 8,3 16777216 3250939659 /var/pgsql/data/datasource/pg_xlog/000000010000000400000094 (deleted)

I'll test a bit to see what it takes for a JDBC connection to get its backend to open a WAL file.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You sure it's not creating any temp tables? You didn't mention >> revoking TEMP privilege. > They have not been revoked, but I am sure the software publisher > doesn't explicitly create any, and I'd be very surprised if the > monitoring software did. The tables are small enough that it's hard > to believe that the 50MB work_mem would spill to disk, either (if > that matters). It's not about the size of a temp table, because writes to the temp table itself aren't WAL-logged. However, the system catalog entries for a temp table *are* WAL-logged. > Pretty much every read only JDBC connection seems to be holding open > a deleted WAL file on my Linux box, but it would take pretty > pessimal timing for each connection to be holding open a different > one -- I see that many connections share a deleted WAL file. This still seems a bit improbable to me. There has to be something causing those sessions to touch WAL, and the dirty-buffer scenario doesn't seem reliable enough. [ thinks... ] How about SELECT FOR UPDATE or SELECT FOR SHARE? Those cause WAL writes. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It's not about the size of a temp table, because writes to the
> temp table itself aren't WAL-logged. However, the system catalog
> entries for a temp table *are* WAL-logged.

Definitely not issuing any CREATE TEMP statements of any kind, unless the JDBC driver is doing that under the covers.

>> Pretty much every read only JDBC connection seems to be holding
>> open a deleted WAL file on my Linux box, but it would take pretty
>> pessimal timing for each connection to be holding open a
>> different one -- I see that many connections share a deleted WAL
>> file.
>
> This still seems a bit improbable to me. There has to be
> something causing those sessions to touch WAL, and the
> dirty-buffer scenario doesn't seem reliable enough.
>
> [ thinks... ] How about SELECT FOR UPDATE or SELECT FOR SHARE?
> Those cause WAL writes.

Definitely not. Probably best not to worry about it until I can play around with some Java test code to see what it takes to cause the connection to open the WAL. I'll post when I've had a chance to try that.

-Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> There has to be something causing those sessions to touch WAL, and
> the dirty-buffer scenario doesn't seem reliable enough.

This is seeming fairly likely to be the cause, though. It may be a combination of the nightly VACUUM FREEZE ANALYZE we typically do on every database and the monitoring software, which would directly run a couple queries each five seconds or so, and would also exercise the software publishers on a similar schedule. My "smoking gun" is that the monitoring software was restarted last night after the time that the nightly vacuums would have completed, and its connections are not showing as having a deleted WAL file open today.

I was unable to directly cause a backend to open a WAL file by connecting through JDBC and running the same kind of queries used by the software publisher, which does tend to point to some indirect mechanism, like the dirty buffer flushing.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There has to be something causing those sessions to touch WAL, and >> the dirty-buffer scenario doesn't seem reliable enough. > This is seeming fairly likely to be the cause, though. It may be a > combination of the nightly VACUUM FREEZE ANALYZE we typically do on > every database and the monitoring software, which would directly run > a couple queries each five seconds or so, and would also exercise > the software publishers on a similar schedule. My "smoking gun" is > that the monitoring software was restarted last night after the time > that the nightly vacuums would have completed, and its connections > are not showing as having a deleted WAL file open today. Hmm. If the read-only sessions are long-lived then you could expect that the probability of having flushed a dirty block (and hence had to write some WAL) increases over time and eventually approaches 1. How old were the sessions you were looking at? If we think this is worth doing something about (I'm not convinced yet) then the answer would be to forcibly close a backend's open WAL file in some reasonably seldom-used code path. One possibility that comes to mind is to do it in ProcessCatchupEvent(), which will be invoked in approximately the right circumstances: a backend that is sitting idle for a long time within an otherwise busy system. That wouldn't be a 100% solution, because if the backend is handling a steady stream of queries it will likely never run ProcessCatchupEvent(). But the places that would be 100% (like transaction commit) would probably entail too much of a performance hit from repeatedly opening and closing WAL files. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> How old were the sessions you were looking at?

Days to months old.

> If we think this is worth doing something about
> (I'm not convinced yet)

Once one knows about the issue, it's only a minor annoyance, and an infrequent one at that, so it's not worth anything that would take significant effort or would cause any measurable performance hit.

> then the answer would be to forcibly close a backend's open WAL
> file in some reasonably seldom-used code path. One possibility
> that comes to mind is to do it in ProcessCatchupEvent(), which
> will be invoked in approximately the right circumstances: a
> backend that is sitting idle for a long time within an otherwise
> busy system. That wouldn't be a 100% solution, because if the
> backend is handling a steady stream of queries it will likely
> never run ProcessCatchupEvent(). But the places that would be
> 100% (like transaction commit) would probably entail too much of a
> performance hit from repeatedly opening and closing WAL files.

In our case there would tend to be bursts of activity on some of these connections, and they would usually have a dribble of activity from the monitoring system. This dribble would consist of a couple sub-millisecond SELECT statements once every five or ten seconds. (I have no clue whether that's frequent enough to suppress the ProcessCatchupEvent() call.)

Is there a reasonably cheap way to check whether the backend has a WAL file open and whether that one is the current append target? As you point out, the need to actually close such a file would be infrequent, and it seems as though when the need exists, it wouldn't be a matter of *whether* it would need to be closed, just *when*. Currently it sounds like we would often wind up doing it the next time we try to run a query which has to flush dirty buffers, or when the connection is being closed. It seems like those might not be the ideal times to add the overhead of the close, so we would just need to find someplace which isn't worse.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Is there a reasonably cheap way to check whether the backend has a > WAL file open and whether that one is the current append target? Detecting whether we have a WAL file open is trivial (just look at the static variable holding the file descriptor). Determining whether it's still the current append target is not so cheap though; it would require examining shared-memory status which means taking a lock on that status (and it's a high-traffic lock already). We could have the open WAL file dropped if stale as a side-effect anytime we have occasion to examine that shared state anyway. But in a nearly-read-only session such as your example I'm not sure that would happen often enough to fix the problem. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Determining whether it's still the current append target is not so
> cheap though; it would require examining shared-memory status
> which means taking a lock on that status (and it's a high-traffic
> lock already).

I haven't reviewed the internal locking techniques, so this may well be a dumb question, but... Since we only care whether the value is equal, and an occasional false report of equality wouldn't hurt anything, couldn't we bypass the lock in this particular case?

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Determining whether it's still the current append target is not so >> cheap though; it would require examining shared-memory status >> which means taking a lock on that status (and it's a high-traffic >> lock already). > I haven't reviewed the internal locking techniques, so this may well > be a dumb question, but... Since we only care whether the value is > equal, and an occasional false report of equality wouldn't hurt > anything, couldn't we bypass the lock in this particular case? Perhaps, if you didn't mind sometimes getting a wrong answer. I guess the cost of that would be pretty small in this particular usage. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Perhaps, if you didn't mind sometimes getting a wrong answer.

Well, it would be a heuristic which would close the deleted file *almost* all the time. When it didn't, the next check would probably catch it. Assuming that you would never get an indication that it was different when it hadn't changed, the worst that could happen is that you infrequently get the current behavior. That all *seems* sane to me, if low priority.

Any objections to my putting it on the TODO list?

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Any objections to my putting it on the TODO list? Hearing none, added. (Apologies for missing the box where I should have commented on the what the change did.) -Kevin