Thread: Is IDLE session really idle?
Hello everybody!
I have many app servers using connection pooling. At any time, there are about 1000 total connection to the database from all the app servers; however, only few random connections are active.
The application often executes relatively big sorts. The work_mem size is 32MB, and eventually many sessions have a chance to run a sort and allocate a big sort area. I see hundreds of postgres processes with DATA segment > 15MB.
Eventually, it consumes all the available memory. Most of this memory is allocated to the sessions that are idle. I cannot change the connection pooling on the application side, and the big sorts cannot be eliminated. I need a solution on the DB side.
I see two ways to solve the problem.:
The question is, how safe it is to kill an idle session? If a session just became idle in pg_stats_activity, is it possible that it is still returning data to the client, or doing some other useful work?
In case it makes a difference: The Postgres version is 8.2.4 on Linux Suse 10.
Thank you.
I have many app servers using connection pooling. At any time, there are about 1000 total connection to the database from all the app servers; however, only few random connections are active.
The application often executes relatively big sorts. The work_mem size is 32MB, and eventually many sessions have a chance to run a sort and allocate a big sort area. I see hundreds of postgres processes with DATA segment > 15MB.
Eventually, it consumes all the available memory. Most of this memory is allocated to the sessions that are idle. I cannot change the connection pooling on the application side, and the big sorts cannot be eliminated. I need a solution on the DB side.
I see two ways to solve the problem.:
- Reducing the work_mem to something like 10MB will cap the total memory consumption. However, the memory will still be allocated to the mostly idle sessions. Also, reduced work_mem will cause increased disk I/O, which is already high.
- We can kill the idle sessions periodically. This will free up a big chunk of memory already allocated to the sessions. The application will gradually reestablish the connections, and the new sessions will start with small memory foot-print.
The question is, how safe it is to kill an idle session? If a session just became idle in pg_stats_activity, is it possible that it is still returning data to the client, or doing some other useful work?
In case it makes a difference: The Postgres version is 8.2.4 on Linux Suse 10.
Thank you.
Igor Polishchuk <ipolishchuk@hi5.com> writes: > The application often executes relatively big sorts. The work_mem size is > 32MB, and eventually many sessions have a chance to run a sort and allocate > a big sort area. I see hundreds of postgres processes with DATA segment > > 15MB. > Eventually, it consumes all the available memory. Most of this memory is > allocated to the sessions that are idle. I cannot change the connection > pooling on the application side, and the big sorts cannot be eliminated. I > need a solution on the DB side. Find out why the workspace isn't being released back to the OS. On some old versions of Unix, not releasing data space is the norm, but that is not true on any remotely modern version of Linux. > In case it makes a difference: The Postgres version is 8.2.4 on Linux Suse > 10. ... which apparently is what you are running, which makes me question whether you are really seeing any such effect at all. Are you sure what you are measuring isn't just top's inclination to report only a portion of shared memory as having been used by the process? I strongly suspect that what you are seeing is an illusion caused by your measurement tools. regards, tom lane
Hi Igor, On Mon, Jun 15, 2009 at 01:15:30PM -0700, Igor Polishchuk wrote: > 2. We can kill the idle sessions periodically. This will free up a big chunk > of memory already allocated to the sessions. The application will gradually > reestablish the connections, and the new sessions will start with small > memory foot-print. Don't do that. There is a race condition - if the application just starts using the connection a millisecond after you thought it was idle and issued the kill command, bad things might happen, depending on the robustness of the application. It might have already validated the connection and started the first query. > The question is, how safe it is to kill an idle session? If a session just > became idle in pg_stats_activity, is it possible that it is still returning > data to the client, or doing some other useful work? I recently figured out (on PostgreSQL 8.2.x) that an IDLE session might still be returning query results. I wrote an application to dump whole tables to flat files. It does a SELECT * FROM table, then streams to the file. Access is via JDBC, results are retrieved row by row via ResultSet.next(). I noticed that in pg_stat_activity a session switched between IDLE and "SELECT * FROM table" command while data was being retrieved. HTH, Tino. -- "What we nourish flourishes." - "Was wir nähren erblüht." www.lichtkreis-chemnitz.de www.craniosacralzentrum.de
On Montag 15 Juni 2009 Igor Polishchuk wrote: > I see two ways to solve the problem.: Why not just increase vm.swappiness = 100 in /etc/sysctl.conf, and/or use ramzswap? That will swap out unneeded memory very fast, and free it for use by others. There was an article on lwn.net recently: http://lwn.net/SubscriberLink/334649/76bbe60a81cfc7f5/ ( Compcache: in-memory compressed swapping ) If installed it on several servers, and my desktop. quite nice: Compresses 2GB to 512MB on my desktop, or 494MB to 186MB on a server. Even if staying with "normal" swap, increasing the swappiness can quickly free memory, but swap I/O might slow things down, depending on your setup. mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
On Mon, Jun 15, 2009 at 2:15 PM, Igor Polishchuk<ipolishchuk@hi5.com> wrote: > Hello everybody! > > I have many app servers using connection pooling. At any time, there are > about 1000 total connection to the database from all the app servers; > however, only few random connections are active. > The application often executes relatively big sorts. The work_mem size is > 32MB, and eventually many sessions have a chance to run a sort and allocate > a big sort area. I see hundreds of postgres processes with DATA segment > > 15MB. > Eventually, it consumes all the available memory. Most of this memory is > allocated to the sessions that are idle. I cannot change the connection > pooling on the application side, and the big sorts cannot be eliminated. I > need a solution on the DB side. Could you post the commands and output you used to determine this? I agree with Tom that you might be mis-measuring memory usage. You do not mention your shared_memory size or total memory size. If you've got say 16G of ram and 12 Gig of shared_memory, then it's quite possible having touched all of shared_memory a pgsql backend will show 10 or 12 Gigs memory used. It's not actually independently using that much, but some people freak when they see it the first time.
Thank you guys for your responses. I'm using top to look for the memory consumption by the postgres processes. Inside the top, I'm using the interactive sorting command F and choose the sort by Data segment size. The exact sequence is: 1. launch top 2. Press 'F', the list of available for sorting fields appears on the screen, including "s: DATA = Data+Stack size (kb)" 3. press "s" 4. Top now shows the list of processes sorted by the field DATA Here how the sorted top screen looks in the end: top - 15:14:37 up 1 day, 23:20, 3 users, load average: 5.64, 5.18, 5.10 Tasks: 819 total, 2 running, 817 sleeping, 0 stopped, 0 zombie Cpu(s): 1.4%us, 0.8%sy, 0.0%ni, 75.4%id, 22.1%wa, 0.0%hi, 0.3%si, 0.0%st Mem: 32876676k total, 18455704k used, 14420972k free, 129856k buffers Swap: 2104504k total, 2720k used, 2101784k free, 15038240k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ DATA COMMAND 4392 hyperic 16 0 410m 73m 9672 S 0 0.2 136:17.29 363m java 29487 postgres 16 0 108m 83m 2680 S 0 0.3 0:01.10 81m view 803 postgres 16 0 2300m 180m 163m S 0 0.6 0:01.04 23m postgres 1808 postgres 16 0 2300m 164m 147m S 0 0.5 0:01.03 23m postgres 577 postgres 16 0 2298m 166m 150m S 0 0.5 0:00.87 22m postgres 568 postgres 16 0 2298m 141m 126m S 0 0.4 0:00.63 22m postgres 1506 postgres 16 0 2298m 139m 124m S 0 0.4 0:00.81 22m postgres 362 postgres 16 0 2292m 128m 115m S 0 0.4 0:00.66 16m postgres 7674 postgres 15 0 2288m 29m 20m S 0 0.1 0:00.10 13m postgres 7238 postgres 16 0 2289m 61m 52m S 0 0.2 0:00.23 12m postgres 7440 postgres 16 0 2288m 51m 42m S 0 0.2 0:00.18 12m postgres 7248 postgres 16 0 2288m 52m 44m S 0 0.2 0:00.17 12m postgres 7336 postgres 16 0 2288m 59m 50m S 0 0.2 0:00.20 12m postgres 7246 postgres 16 0 2288m 52m 44m S 0 0.2 0:00.12 12m postgres 6913 postgres 16 0 2288m 59m 51m S 0 0.2 0:00.22 12m postgres 7013 postgres 16 0 2288m 51m 43m S 0 0.2 0:00.10 12m postgres 7288 postgres 16 0 2288m 48m 39m S 0 0.2 0:00.16 12m postgres 7327 postgres 16 0 2288m 53m 44m S 0 0.2 0:00.16 12m postgres 7070 postgres 16 0 2288m 50m 42m S 0 0.2 0:00.16 12m postgres 7543 postgres 15 0 2288m 47m 39m S 0 0.1 0:00.13 11m postgres ........ Also, in vmstat, I see the gradual reduction in size of the cache memory. Apparently, the Linux cache gets gradually dismissed by the postgres processes memory areas. Eventually, the database just hangs and the host becomes unresponsive for about 15 minutes till the sessions die out. So, I believe it is not just my misinterpretation of the metrics. Isn't it true, that work memory once allocated for a session does not get deallocated till the sessions is closed? It was my impression, anyway. So, eventually enough sessions get big work memory allocated to starve the Linux out of memory. My physical memory size is 32GB, Shared_buffers = 2GB On 6/15/09 2:13 PM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote: > On Mon, Jun 15, 2009 at 2:15 PM, Igor Polishchuk<ipolishchuk@hi5.com> wrote: >> Hello everybody! >> >> I have many app servers using connection pooling. At any time, there are >> about 1000 total connection to the database from all the app servers; >> however, only few random connections are active. >> The application often executes relatively big sorts. The work_mem size is >> 32MB, and eventually many sessions have a chance to run a sort and allocate >> a big sort area. I see hundreds of postgres processes with DATA segment > >> 15MB. >> Eventually, it consumes all the available memory. Most of this memory is >> allocated to the sessions that are idle. I cannot change the connection >> pooling on the application side, and the big sorts cannot be eliminated. I >> need a solution on the DB side. > > Could you post the commands and output you used to determine this? I > agree with Tom that you might be mis-measuring memory usage. > > You do not mention your shared_memory size or total memory size. If > you've got say 16G of ram and 12 Gig of shared_memory, then it's quite > possible having touched all of shared_memory a pgsql backend will show > 10 or 12 Gigs memory used. It's not actually independently using that > much, but some people freak when they see it the first time.
On Mon, Jun 15, 2009 at 4:25 PM, Igor Polishchuk<ipolishchuk@hi5.com> wrote: > Thank you guys for your responses. > > I'm using top to look for the memory consumption by the postgres processes. > Inside the top, I'm using the interactive sorting command F and choose the > sort by Data segment size. The exact sequence is: > 1. launch top > 2. Press 'F', the list of available for sorting fields appears on the > screen, including "s: DATA = Data+Stack size (kb)" > 3. press "s" > 4. Top now shows the list of processes sorted by the field DATA > > Here how the sorted top screen looks in the end: > > top - 15:14:37 up 1 day, 23:20, 3 users, load average: 5.64, 5.18, 5.10 > Tasks: 819 total, 2 running, 817 sleeping, 0 stopped, 0 zombie > Cpu(s): 1.4%us, 0.8%sy, 0.0%ni, 75.4%id, 22.1%wa, 0.0%hi, 0.3%si, > 0.0%st > Mem: 32876676k total, 18455704k used, 14420972k free, 129856k buffers > Swap: 2104504k total, 2720k used, 2101784k free, 15038240k cached Out of 32G, you've got 14G free, 18G used of which 15G is cache. Looks good. > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ DATA COMMAND > 4392 hyperic 16 0 410m 73m 9672 S 0 0.2 136:17.29 363m java > 29487 postgres 16 0 108m 83m 2680 S 0 0.3 0:01.10 81m view > 803 postgres 16 0 2300m 180m 163m S 0 0.6 0:01.04 23m postgres > 1808 postgres 16 0 2300m 164m 147m S 0 0.5 0:01.03 23m postgres > 577 postgres 16 0 2298m 166m 150m S 0 0.5 0:00.87 22m postgres > 568 postgres 16 0 2298m 141m 126m S 0 0.4 0:00.63 22m postgres > 1506 postgres 16 0 2298m 139m 124m S 0 0.4 0:00.81 22m postgres > 362 postgres 16 0 2292m 128m 115m S 0 0.4 0:00.66 16m postgres > 7674 postgres 15 0 2288m 29m 20m S 0 0.1 0:00.10 13m postgres > 7238 postgres 16 0 2289m 61m 52m S 0 0.2 0:00.23 12m postgres > 7440 postgres 16 0 2288m 51m 42m S 0 0.2 0:00.18 12m postgres > 7248 postgres 16 0 2288m 52m 44m S 0 0.2 0:00.17 12m postgres > 7336 postgres 16 0 2288m 59m 50m S 0 0.2 0:00.20 12m postgres > 7246 postgres 16 0 2288m 52m 44m S 0 0.2 0:00.12 12m postgres > 6913 postgres 16 0 2288m 59m 51m S 0 0.2 0:00.22 12m postgres > 7013 postgres 16 0 2288m 51m 43m S 0 0.2 0:00.10 12m postgres > 7288 postgres 16 0 2288m 48m 39m S 0 0.2 0:00.16 12m postgres > 7327 postgres 16 0 2288m 53m 44m S 0 0.2 0:00.16 12m postgres > 7070 postgres 16 0 2288m 50m 42m S 0 0.2 0:00.16 12m postgres > 7543 postgres 15 0 2288m 47m 39m S 0 0.1 0:00.13 11m postgres VIRT of 2.2G is pretty normal, considering it likely includes all of shared_buffers ever touched. Is this machine now at idle? > Also, in vmstat, I see the gradual reduction in size of the cache memory. > Apparently, the Linux cache gets gradually dismissed by the postgres > processes memory areas. Well, this top doesn't show that. As long as you've got 14G or so free the linux kernel won't be discarding cache. > Eventually, the database just hangs and the host > becomes unresponsive for about 15 minutes till the sessions die out. > So, I believe it is not just my misinterpretation of the metrics. Do you have output of top and vmstat when this is happening? > Isn't it true, that work memory once allocated for a session does not get > deallocated till the sessions is closed? By session do you mean transaction, or the life of the connection? > It was my impression, anyway. > So, eventually enough sessions get big work memory allocated to starve the > Linux out of memory. But we've seen no evidence of this happening in vmstat or top. It would be really handy to see what they're saying when this is happening. > My physical memory size is 32GB, > Shared_buffers = 2GB
"Scott Whitney" <swhitney@journyx.com> writes: > Sorry to bother you. Can you route me towards any information on this > statement? > " top's inclination to report only a portion > of shared memory as having been used by the process?" On many platforms, top shows each backend process as having "used" only as many pages of PG's shared memory segment as that specific process has actually touched in its lifespan. So the normal behavior is that the reported memory size gradually rises from just-private-space to private-space-plus-all-of-shared-memory as the process randomly happens to have need to touch each of the shared buffer slots. This is on top of any actual increase in the process's own (private) memory space. I suspect the other guy was seeing this effect more than any real long-term increase in private space. But of course there's not enough evidence in what he posted to prove it one way or the other. Some versions of "top" distinguish shared and private memory usage, which really helps. But a lot don't, or people may not know which column to pay attention to. > I _thought_ I had seen such behavior in the past, but I've never found (not > particularly looked) for such information. Top seems to lie to me on a > fairly-regular basis, and not just via PG... This particular effect only applies to applications that use large chunks of shared memory. There might be some other misleading things :-( regards, tom lane
Igor Polishchuk <ipolishchuk@hi5.com> writes: > ... Eventually, the database just hangs and the host > becomes unresponsive for about 15 minutes till the sessions die out. Well, that's a problem all right, but I'm entirely unconvinced by your hypothesis about what's causing it. Have you tried watching "vmstat" (particularly the swap rate) while things go downhill? Do any signs of distress start to appear in the postmaster log or other system logs? Do other services on the box start to fail, and if so what are they complaining about? > Isn't it true, that work memory once allocated for a session does not get > deallocated till the sessions is closed? No. regards, tom lane
Hi, Tom Lane <tgl@sss.pgh.pa.us> writes: > "Scott Whitney" <swhitney@journyx.com> writes: >> I _thought_ I had seen such behavior in the past, but I've never found (not >> particularly looked) for such information. Top seems to lie to me on a >> fairly-regular basis, and not just via PG... > > This particular effect only applies to applications that use large > chunks of shared memory. There might be some other misleading things > :-( You might appreciate this reading (which talks about using exmap to get reliable figures, but it seems to relate to shared objects and it's unclear whether exmap will account correctly for IPC shared memory): http://ktown.kde.org/~seli/memory/desktop_benchmark.html The tool used to measure memory usage was Exmap - the only tool for measuring memory usage that I've ever found to be actually useful (I think I've already blogged about it ;) ). Its so-called effective memory usage numbers try to account for things like dividing shared libraries among all the processes using them, unlike tools like top that just report the numbers they find in /proc and nobody really knows how to interpret them. In other words, if you use things like top or free for precise measuring of memory usage, you're crazy. Nevertheless, for the crazy ones, I used also free alongside with Exmap, just for the fun of it, numbers from free will follow in parentheses. They should not be considered to be useful though. Regards, -- dim