Thread: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
A scary phenomenon is being exhibited by the server: it is suddenly slurping all the swap. Some of the relevant sar -r output:

10:30:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
10:40:01 AM   979068 31892208 97.02 10588 28194876 1781568  314872  15.02   66500
10:50:01 AM  1791536 31079740 94.55 10480 27426512 1782848  313592  14.96   43880
11:00:01 AM  4678768 28192508 85.77  9692 27213312 1784888  311552  14.86   33296
11:10:01 AM   179208 32692068 99.45  3180 27569008 1725136  371304  17.71   65444
11:20:01 AM   225604 32645672 99.31  2604 29817192 1693672  402768  19.21   78312 <-------
11:30:01 AM   520224 32351052 98.42  1780 26863576       0 2096440 100.00 1585772 <------ within 10 mins
11:40:02 AM   483532 32387744 98.53  2672 27220404       0 2096440 100.00   43876
11:50:01 AM   162700 32708576 99.51  3316 27792540       0 2096440 100.00   43708
12:00:01 PM   420176 32451100 98.72  3772 28181316       0 2096440 100.00   43708
12:10:01 PM   331624 32539652 98.99  3236 27857760       0 2096440 100.00       0
12:20:01 PM  1023428 31847848 96.89  4632 27450504       0 2096440 100.00       0
12:30:01 PM   763296 32107980 97.68  4988 28270704       0 2096440 100.00       0
12:40:01 PM   770280 32100996 97.66  5260 28423292       0 2096440 100.00       0

Then I added more swap, making it 4GB from 2GB:

02:10:05 PM  8734144 24137132 73.43  5532 21219972 2096788 2096124  49.99      52
02:12:01 PM  5989044 26882232 81.78  6108 23606680 2096788 2096124  49.99      52
02:14:01 PM  1517724 31353552 95.38  6320 26988280 2096788 2096124  49.99      52
02:16:01 PM   316692 32554584 99.04  6516 28840264 1844856 2348056  56.00  251984
02:18:01 PM   450672 32420604 98.63  7748 27238712       0 4192912 100.00 2096840 <---- all swap gone.
02:20:01 PM   164388 32706888 99.50  7556 27118104       0 4192912 100.00 2096840
02:22:01 PM   848544 32022732 97.42  6212 26718712       0 4192912 100.00 2096840
02:24:01 PM   231332 32639944 99.30  6136 27276720       0 4192912 100.00 2096840
02:26:01 PM   639560 32231716 98.05  5608 27029372       0 4192912 100.00 2096840
02:28:01 PM   868824 32002452 97.36  4648 26253996       0 4192912 100.00 2096840
.......
03:04:01 PM   854408 32016868 97.40  4976 27182140       0 4192912 100.00       0
03:06:01 PM  1571904 31299372 95.22  5184 27513232       0 4192912 100.00       0
03:08:02 PM   304600 32566676 99.07  5420 27850780       0 4192912 100.00       0
03:10:01 PM   915352 31955924 97.22  5632 28076320       0 4192912 100.00       0
03:12:01 PM   705132 32166144 97.85  5680 28057444       0 4192912 100.00       0
03:14:01 PM   369516 32501760 98.88  6136 27684364       0 4192912 100.00       0

In vmstat the system does not seem to be swapping:

vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b    swpd    free  buff    cache si so   bi   bo    in    cs us sy id wa st
24  2 4192912  947796  6036 27785324  1  0  451  208     0     0 50  6 39  5  0
22  3 4192912 1028956  6044 27795728  0  0 1730  555 13445 14736 67 12 17  4  0
24  0 4192912  877508  6052 27806172  0  0 1595 2292 13334 15666 67  9 19  5  0
14  8 4192912  820432  6068 27819756  0  0 2331 1351 13208 16192 66  9 14 11  0
23  1 4192912  925960  6076 27831644  0  0 1932 1584 13144 16291 71  9 14  5  0
 2  3 4192912  895288  6084 27846432  0  0 2496  991 13450 16303 70  9 13  8  0
17  0 4192912  936252  6092 27859868  0  0 2122  826 13438 16233 69  9 17  5  0
 8  1 4192912  906164  6100 27873640  0  0 2277  858 13440 16235 63  8 19 10  0

I reduced work_mem from 4GB to 2GB to 512MB (now). I clearly remember that this abnormal consumption of swap was NOT there even when work_mem was 4GB; e.g. during happier times swap utilisation was: http://pastebin.com/bnE1pFZ9

The question is: what's making postgres slurp the swap? I am posting my current postgresql.conf once again.
# cat postgresql.conf | grep -v "^\s*#" | grep -v "^\s*$"
listen_addresses = '*'          # what IP address(es) to listen on;
port = 5432                     # (change requires restart)
max_connections = 300           # (change requires restart)
shared_buffers = 10GB           # min 128kB
work_mem = 512MB                # min 64kB
fsync = on                      # turns forced synchronization on or off
synchronous_commit = on         # immediate fsync at commit
checkpoint_segments = 30        # in logfile segments, min 1, 16MB each
archive_mode = on               # allows archiving to be done
archive_command = '/opt/scripts/archive_wal.sh %p %f '
archive_timeout = 600           # force a logfile segment switch after this
effective_cache_size = 18GB
constraint_exclusion = on       # on, off, or partition
logging_collector = on          # Enable capturing of stderr and csvlog
log_directory = '/var/log/postgresql'  # directory where log files are written,
log_filename = 'postgresql.log' # log file name pattern,
log_truncate_on_rotation = on   # If on, an existing log file of the
log_rotation_age = 1d           # Automatic rotation of logfiles will
log_error_verbosity = verbose   # terse, default, or verbose messages
log_min_duration_statement = 5000  # -1 is disabled, 0 logs all statements
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'     # locale for system error message
lc_monetary = 'en_US.UTF-8'     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'      # locale for number formatting
lc_time = 'en_US.UTF-8'         # locale for time formatting
default_text_search_config = 'pg_catalog.english'
add_missing_from = on
custom_variable_classes = 'general'  # list of custom variable class names
general.report_level = ''
general.disable_audittrail2 = ''
general.employee = ''
On Fri, 2010-06-25 at 15:25 +0530, Rajesh Kumar Mallah wrote:
> shared_buffers = 10GB           # min 128kB
> work_mem = 512MB                # min 64kB

These are still pretty high IMHO. How many *concurrent* connections do you have?

--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
Rajesh Kumar Mallah wrote:
> A scary phenomenon is being exhibited by the server: it is suddenly
> slurping all the swap
>
>  8  1 4192912  906164  6100 27873640  0  0 2277  858 13440 16235 63  8 19 10  0
>
> I reduced work_mem from 4GB to 2GB to 512MB (now). I clearly remember that this
> abnormal consumption of swap was NOT there even when work_mem was 4GB.
> eg during happier times swap utilisation was: http://pastebin.com/bnE1pFZ9
>
> the question is whats making postgres slurp the swap? i am posting my
> current postgresql.conf once again.
>
> # cat postgresql.conf | grep -v "^\s*#" | grep -v "^\s*$"
> listen_addresses = '*'          # what IP address(es) to listen on;
> port = 5432                     # (change requires restart)
> max_connections = 300           # (change requires restart)

Hello Rajesh,

In contrast with e.g. shared_buffers and effective_cache_size, work_mem is the amount of memory per 'thing' (e.g. order/group by) that wants some working memory, so even a single backend can use several pieces of work_mem memory.

Looking at your postgresql.conf, other memory values seem a bit too high as well for a 32GB RAM server. It is probably a good idea to use pgtune (on pgfoundry) to get some reasonable ballpark settings for your hardware.

regards,
Yeb Havinga
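Yeb's per-operation point is easy to underestimate, so here is the back-of-the-envelope worst case for the settings quoted in this thread (the "one sort per backend" multiplier is an illustrative assumption; a real backend may hold several work_mem allocations, or none):

```shell
# Worst case if each of the 300 allowed backends performs a single
# work_mem-sized sort or hash at the same time:
max_connections=300
work_mem_mb=512
echo "worst case: $(( max_connections * work_mem_mb / 1024 )) GB"
# prints "worst case: 150 GB" -- several times the machine's 32GB of RAM
```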
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
Dear List,

Hmmm, lemme test the efficacy of pgtune. I would reduce shared_buffers also.

Regarding concurrent queries: it's now non-business hours, and

SELECT procpid, current_query from pg_stat_activity where current_query not ilike '%idle%';

returns just 5-10 rows; I am yet to measure it during business hours.

Warm Regds
Rajesh Kumar Mallah.

On Fri, Jun 25, 2010 at 4:58 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Rajesh Kumar Mallah wrote:
>> A scary phenomenon is being exhibited by the server: it is suddenly
>> slurping all the swap
>> [...]
>> the question is whats making postgres slurp the swap?
>
> Hello Rajesh,
>
> In contrast with e.g. shared_buffers and effective_cache_size, work_mem
> is the amount of memory per 'thing' (e.g. order/group by) that wants some
> working memory, so even a single backend can use several pieces of
> work_mem memory.
>
> Looking at your postgresql.conf, other memory values seem a bit too high
> as well for a 32GB RAM server. It is probably a good idea to use pgtune
> (on pgfoundry) to get some reasonable ballpark settings for your hardware.
>
> regards,
> Yeb Havinga
Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:
> its now non business hours and
> SELECT procpid,current_query from pg_stat_activity where
> current_query not ilike '%idle%' ;
> is just 5-10, i am yet to measure it during business hours.

Be careful about '<IDLE> in transaction' status. Those are a problem if the transaction remains active for very long, because vacuum (autovacuum or otherwise) can't free space for dead rows which could still be visible to the '<IDLE> in transaction' connection. It's normal to see this status briefly between statements in a transaction, but it's a problem if a connection just sits there in this status.

-Kevin
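To spot the sessions Kevin warns about, a query along these lines can be run periodically (a sketch against the pre-9.0 pg_stat_activity columns already used in this thread; the column names changed in later releases):

```sql
-- backends sitting in an open transaction, longest-idle first
SELECT procpid, now() - query_start AS idle_for
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
 ORDER BY idle_for DESC;
```

Anything that shows up here with an idle_for of minutes or more is holding back vacuum.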
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
I changed shared_buffers from 10G to 4G; swap usage has almost become nil.

# free
             total       used       free     shared    buffers     cached
Mem:      32871276   24575824    8295452          0      11064   22167324
-/+ buffers/cache:    2397436   30473840
Swap:      4192912        352    4192560

I also observed that there was a huge IO-wait and load spike initially, which gradually reduced to normal levels. Now things seem to be fine, but the real test shall be during business hours.

vmstat output: http://pastebin.com/ygu8gUhS

The iowait now is very respectable (< 10%) and the CPU is idling most of the time.

# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd    free   buff    cache si so  bi  bo   in    cs us sy id wa st
 2  1  352 8482444 11336 22299100  1  0 450 208    0     0 50  6 39  5  0
 4  0  352 8393840 11352 22304484  0  0 480 163 9260 12717 32  4 62  3  0
 5  1  352 8474788 11360 22308980  0  0 304 445 8295 12358 28  4 67  2  0
 3  0  352 8370672 11376 22316676  0  0 648 158 8760 13214 38  4 55  3  0
11  0  352 8193824 11392 22323572  0  0 621 577 8800 13163 37  4 56  3  0
 2  0  352 8229012 11408 22326664  0  0 169 405 9588 13696 34  4 61  1  0
 6  1  352 8319176 11424 22333144  0  0 559 170 8830 12929 32  4 61  3  0

I shall also try pgtune in a while.
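As a sanity check, the "-/+ buffers/cache" row that free prints is plain arithmetic on the first row, so the true application footprint after the change can be recomputed by hand from the numbers above:

```shell
# figures (kB) from the "free" output above
used=24575824
buffers=11064
cached=22167324
echo "really used: $(( used - buffers - cached )) kB"
# prints "really used: 2397436 kB", matching the -/+ buffers/cache line
```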
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
Dear List,
pgtune suggests the following:
(Current values are shown in parentheses with their origin; (*) indicates a significant difference from the current value.)
default_statistics_target = 50 # pgtune wizard 2010-06-25 (current 100 via default)
(*) maintenance_work_mem = 1GB # pgtune wizard 2010-06-25 (16MB via default)
checkpoint_completion_target = 0.9 # pgtune wizard 2010-06-25 (0.5 via default)
(*) effective_cache_size = 22GB # pgtune wizard 2010-06-25 (18GB , specified)
work_mem = 192MB # pgtune wizard 2010-06-25 (256MB , specified)
(*) wal_buffers = 8MB # pgtune wizard 2010-06-25 ( 64kb , via default)
checkpoint_segments = 16 # pgtune wizard 2010-06-25 (30 , specified)
shared_buffers = 7680MB # pgtune wizard 2010-06-25 ( 4096 MB , specified)
(*) max_connections = 80 # pgtune wizard 2010-06-25 ( 300 , ;-) specified )
When I reduce max_connections I start getting errors; I will check concurrent connections again during business hours. A lot of our connections are in '<IDLE> in transaction' state during business hours. This peculiar behavior of mod_perl servers has been discussed in the past, I think; I don't remember if there was any resolution.
On 25/06/10 16:59, Rajesh Kumar Mallah wrote:
> when i reduce max_connections i start getting errors, i will see again
> concurrent connections during business hours. lot of our connections
> are in <IDLE in transaction state> during business hours. this peculiar
> behavior of mod_perl servers has been discussed in the past i think.
> don't remember if there was any resolution.

If connections spend any significant amount of time in <IDLE in transaction> state, that might indicate you're not committing/rolling back after running queries - can you show an example of the code you're using?

e.g. something like:

my $dbh = DBI->connect(..., { AutoCommit => 0 });
my $sth = $dbh->prepare(q{SELECT ...});
$sth->execute;
my $rows = $sth->fetchall_arrayref;
$dbh->rollback;   # finish the transaction (commit instead if data was changed)

Tom
Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:
> pgtune suggests the following:
> (current value are in braces via reason) , (*) indicates
> significant difference from current value.

Different people have come to different conclusions on some of these settings. I believe that's probably because differences in hardware and workloads actually make the best choice different in different environments, and it's not always clear how to characterize that to make the best choice. If you get conflicting advice on particular settings, I would strongly recommend testing to establish what works best for your actual workload on your hardware and OS. That said, my experience suggests...

> default_statistics_target = 50 # pgtune wizard 2010-06-25
> (current 100 via default)

Higher values add a little bit to the planning time of complex queries, but reduce the risk of choosing a bad plan. I would recommend leaving this at 100 unless you notice problems with long plan times.

> (*) maintenance_work_mem = 1GB # pgtune wizard 2010-06-25
> (16MB via default)

Yeah, I'd boost this to 1GB.

> checkpoint_completion_target = 0.9 # pgtune wizard 2010-06-25
> (0.5 via default)

I'd change this one by itself, and probably after some of the other tuning is done, so you can get a good sense of "before" and "after". I'm guessing that 0.9 would be better, but I would test it.

> (*) effective_cache_size = 22GB # pgtune wizard 2010-06-25
> (18GB , specified)

Unless you're running other processes on the box which consume a lot of RAM, 18GB is probably lower than ideal, although this setting isn't too critical -- it doesn't affect actual RAM allocation; it just gives the optimizer a hint about how much might get cached. A higher setting encourages index use; a lower setting encourages table scans.

> work_mem = 192MB # pgtune wizard 2010-06-25
> (256MB , specified)

With 300 connections, I think that either of these could lead you to experience intermittent bursts of extreme swapping. I'd drop it to somewhere in the 16MB to 32MB range until I had a connection pool configured such that it was actually keeping the number of active connections much lower.

> (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
> (64kB , via default)

Sure, I'd boost this.

> checkpoint_segments = 16 # pgtune wizard 2010-06-25
> (30 , specified)

If you have the disk space for the 30 segments, I wouldn't reduce it.

> shared_buffers = 7680MB # pgtune wizard 2010-06-25
> (4096 MB , specified)

This one is perhaps the most sensitive to workload. Anywhere between 1GB and 8GB might be best for you. Greg Smith has some great advice on how to tune this for your workload.

> (*) max_connections = 80 # pgtune wizard 2010-06-25
> (300 , ;-) specified)
>
> when i reduce max_connections i start getting errors, i will see
> again concurrent connections during business hours.

That's probably a good number to get to, but you have to reduce the number of actual connections before you set the limit that low.

> lot of our connections are in <IDLE> in transaction state

If any of these stay in that state for more than a minute or two, you need to address that if you want to get your connection count under control. If any of them persist for hours or days, you need to fix it to avoid bloat which can kill performance.

-Kevin
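For the connection pool Kevin recommends, pgbouncer is one common choice; a minimal pgbouncer.ini sketch follows (the database name, file paths and pool sizes here are hypothetical placeholders, not values from this thread):

```ini
[databases]
; route the application's database through the pooler on port 6432
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling hands the server connection back at COMMIT/ROLLBACK,
; so 300 client connections can share a few dozen real backends
pool_mode = transaction
default_pool_size = 40
max_client_conn = 300
```

Note that transaction-level pooling only pays off once the application stops holding transactions open between requests, as discussed elsewhere in this thread; until then, session pooling (pool_mode = session) is the safer mode.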
Rajesh Kumar Mallah wrote:
> default_statistics_target = 50 # pgtune wizard 2010-06-25
> (current 100 via default)
> (*) effective_cache_size = 22GB # pgtune wizard 2010-06-25 (18GB, specified)
> checkpoint_segments = 16 # pgtune wizard 2010-06-25 (30, specified)

You probably want to keep your existing values for all of these. Your effective_cache_size setting may be a little low, but I wouldn't worry about changing that right now--you have bigger problems right now.

> (*) maintenance_work_mem = 1GB # pgtune wizard 2010-06-25 (16MB via default)
> (*) wal_buffers = 8MB # pgtune wizard 2010-06-25 (64kB, via default)
> checkpoint_completion_target = 0.9 # pgtune wizard 2010-06-25 (0.5 via default)
> shared_buffers = 7680MB # pgtune wizard 2010-06-25 (4096 MB, specified)

These are all potentially better for your system, but you'll use more RAM if you make these changes. For example, if you're having swap trouble, you definitely don't want to increase maintenance_work_mem. I suspect that 8GB of shared_buffers is probably the most you want to use. Most systems stop gaining any more benefit from that somewhere between 8GB and 10GB, and instead performance gets worse; it's better to be on the low side of that drop. You can probably support 8GB just fine if you sort out the work_mem issues.

> (*) max_connections = 80 # pgtune wizard 2010-06-25 (300, ;-) specified)
> work_mem = 192MB # pgtune wizard 2010-06-25 (256MB, specified)

pgtune makes a guess at how many connections you'll have based on the specified workload. If you know you have more connections than that, you should specify that on the command line:

pgtune -c 300 ...

It will then re-compute the work_mem figure more accurately using that higher connection count. Right now, it's guessing 192MB based on 80 connections, which is on the high side of reasonable. 192MB with *300* connections is way oversized. My rough computation says that if you tell it the number of connections correctly, pgtune will suggest around 50MB for work_mem.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
Dear Tom, we have AutoCommit off in DBI. Any commit or rollback from the persistent mod_perl process immediately issues "begin work"; while the mod_perl process is waiting for a request, the database backend remains in idle-in-transaction state. Unless we modify data in an http request, we neither issue a commit nor a rollback.

On 6/25/10, Tom Molesworth <tom@audioboundary.com> wrote:
> On 25/06/10 16:59, Rajesh Kumar Mallah wrote:
>> when i reduce max_connections i start getting errors, i will see again
>> concurrent connections during business hours. lot of our connections
>> are in <IDLE in transaction state> during business hours.
>
> If connections spend any significant amount of time in <IDLE in
> transaction> state, that might indicate you're not committing/rolling
> back after running queries - can you show an example of the code you're
> using?
>
> Tom
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
Sent from Gmail for mobile | mobile.google.com
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Tom Molesworth
On 25/06/10 20:00, Rajesh Kumar Mallah wrote:
> Dear Tom, we have AutoCommit off in DBI. Any commit or rollback from
> the persistent mod_perl process immediately issues begin work; while the
> mod_perl process is waiting for a request the database backend remains
> in idle-in-transaction state. Unless we modify data in an http request
> we neither issue a commit nor a rollback.

The backend shouldn't go to 'idle in transaction' state until there is some activity within the transaction. I've attached an example script to demonstrate this - note that even SELECT queries will leave the handle as 'IDLE in transaction' unless you've changed the transaction isolation level from the default.

Any queries that are idle in transaction will block connection pooling and cause old versions of table rows to hang around, as described in other replies. Note that this is nothing to do with mod_perl; it's purely due to the way transactions are handled - a one-off script would also have this issue, but on exit it issues an implicit rollback and disconnects. Typically your database wrapper would handle this (I think DBIx::Class should take care of this automatically, although I haven't used it myself).

Tom
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
Dear Greg/Kevin/List ,
Many thanks for the comments regarding the params. I am, however, only able to change and experiment on production in a certain time window; when that arrives I shall post my observations.
Rajesh Kumar Mallah.
Tradeindia.com - India's Largest B2B eMarketPlace.
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
Dear List,
Today has been good since morning. Although it is a lean day for us, the indications are nice. I thank everyone who shared the concern. I think the most significant change has been reducing shared_buffers from 10G to 4G; this has led to reduced memory usage and some breathing space for the OS.
Although I am yet to incorporate the suggestions from pgtune, I think the issue of max_connections needs to be addressed first. I am investigating application issues and the mechanism that puts many backends into '<IDLE> in transaction' mode for significant times. I thank Tom for the script he sent. Once that is resolved I shall check pooling as suggested by Kevin, and then eventually max_connections can be reduced. I shall also check whether pgpool and pgbouncer are helpful in this regard.
I observed that the number of simultaneous connections today (a lean day) hovers between 1 and 10, occasionally shooting to 15, but never more than 20 I would say.
I am happy that I/O waits are negligible and the CPU is even idling for a while.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
22 0 18468 954120 13460 28491772 0 0 568 1558 13645 18355 62 10 27 2 0
16 0 18468 208100 13476 28469084 0 0 580 671 14039 17055 67 13 19 1 0
10 2 18812 329032 13400 28356972 0 46 301 1768 13848 17884 68 10 20 1 0
16 2 18812 366596 13416 28361620 0 0 325 535 13957 16649 72 11 16 1 0
50 1 18812 657048 13432 28366548 0 0 416 937 13823 16667 62 9 28 1 0
6 1 18812 361040 13452 28371908 0 0 323 522 14352 16789 74 12 14 0 0
33 0 18812 162760 12604 28210152 0 0 664 1544 14701 16315 66 11 22 2 0
5 0 18812 212028 10764 27921800 0 0 552 648 14567 17737 67 10 21 1 0
6 0 18796 279920 10548 27890388 3 0 359 562 12635 15976 60 9 30 1 0
8 0 18796 438820 10564 27894440 0 0 289 2144 12234 15770 57 8 34 1 0
5 0 18796 531800 10580 27901700 0 0 514 394 12169 16005 59 8 32 1 0
17 0 18796 645868 10596 27890704 0 0 423 948 13369 16554 67 10 23 1 0
9 1 18796 1076540 10612 27898604 0 0 598 403 12703 17363 71 10 18 1 0
8 0 18796 1666508 10628 27904748 0 0 430 1123 13314 17421 57 9 32 1 0
9 1 18776 1541444 10644 27913092 1 0 653 954 13194 16822 75 11 12 1 0
8 0 18776 1526728 10660 27921380 0 0 692 788 13073 16987 74 9 15 1 0
8 0 18776 1482304 10676 27933176 0 0 966 2029 13017 16651 76 12 11 1 0
21 0 18776 1683260 10700 27937492 0 0 298 663 13110 15796 67 10 23 1 0
18 0 18776 2087664 10716 27943512 0 0 406 622 12399 17072 62 9 28 1 0
Re: sudden spurt in swap utilization (was: cpu bound postgresql setup.)
From: Rajesh Kumar Mallah
Dear List ,
A simple Perl script was made to 'watch' the state transitions of backends. On startup it captures a set of pids to watch, and it displays a visual representation of their states for the next 30 intervals of 1 second each. The X axis is the interval count, the Y axis is the pid, and the origin is at the top left.
The state value can be an active query (*), <IDLE> (indicated by '.'), or '<IDLE> in transaction' (indicated by '?'). For my server, below is a random output (during lean hours and on a lean day).
----------------------------------------------------------------------------------------------------
PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
----------------------------------------------------------------------------------------------------
4334 ? ? ? ? * ? ? ? ? ? * ? ? ? ? ? ? ? ?
6904 ? ? . . . * ? . . . . . . ? ? .
6951 ? ? ? . . . . ? ? ? ? ? . . . ? ? ? . . . ? . . . . . ? ? .
7009 ? * ? ? . . . . . . . . . * * . * ? ? . . . * ? ? ? . . . ?
7077 ? . ? . . . * . ? . . . . ? . . . . . . . ? . . ? . . . ? ?
7088 ? . . ? . ? ? ? . . . . . . ? . . ? ? * . . . . . ? . ? . *
7091 ? . . * ? ? ? ? ? ? ? * ? . . ? * . * . . . . . . . . . . .
7093 ? ? . ? . . . . ? . ? * . . . . . . . . . ? ? ? . ? ? ? . .
7112 * * . . . ? ? ? . . . . . . . . ? ? . ? . ? . . ? . . . . .
7135 ? . . * . ? ? ? . ? ? . . . ? . . . . . . . ? . . . ? ? . .
7142 ? . ? . . . . . . * . . . ? . . . . . . . . . . . . . .
7166 ? . ? ? ? * * . ? * . ? . . . ? . ? ? . . . * . . . ? . . .
8202 ? ? . . . * . ? . . . . . . . * ? . . . ? ? . . . . ? ? ? .
8223 ? . . . . . . ?
8237 ? ? ? . ? ? ? ? . . . . . . . ? . . . . . ? . . * ? . . . .
8251 ? . ? . . . . . ? ? . . . * ? . . . ? . . . . . . . . . . .
8278 ? ? . . . . ? . . . . . . . ? . . . . . . ? ? . . * . . . .
8290 ? . .
8294 ? ? . . . . . . . . . . . . ? . . . ? ? . . . . . . . . * *
8303 ? * ? . ? ? ? . ? ? ? . . . . * . . . . . . . . . . . . . .
8306 ? ? . . . ? . . . ? . . . . . . * . . .
8309 * ? ? ? ? . . . ? . . .
8329 ? . * * . . . . . . . * . ? . * . ? . * . * ? . . .
----------------------------------------------------------------------------------------------------
(*) Active Query , (.) Idle , (?) Idle in transaction,<blank> backend over.
----------------------------------------------------------------------------------------------------
Looks like most of the graph space is filled with (.) or (?), with very few active queries (long-running queries > 1s). On a busy day and during business hours I shall check and post again. The script depends only on perl, DBI and DBD::Pg.
script pasted here:
http://pastebin.com/mrjSZfLB
Regds
mallah.
A simple Perl script was written to 'watch' the state transitions of
backends. On startup it captures a set of PIDs to watch and displays
a visual representation of their states over the next 30 intervals
of 1 second each. The X axis is the interval count, the Y axis is the
PID, and the origin is at the top left.
The state value can be an active query (*), <IDLE> (indicated by '.'), or
'<IDLE> in transaction' (indicated by '?'). For my server, below is a random
output (during lean hours and on a lean day).
----------------------------------------------------------------------------------------------------
PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
----------------------------------------------------------------------------------------------------
4334 ? ? ? ? * ? ? ? ? ? * ? ? ? ? ? ? ? ?
6904 ? ? . . . * ? . . . . . . ? ? .
6951 ? ? ? . . . . ? ? ? ? ? . . . ? ? ? . . . ? . . . . . ? ? .
7009 ? * ? ? . . . . . . . . . * * . * ? ? . . . * ? ? ? . . . ?
7077 ? . ? . . . * . ? . . . . ? . . . . . . . ? . . ? . . . ? ?
7088 ? . . ? . ? ? ? . . . . . . ? . . ? ? * . . . . . ? . ? . *
7091 ? . . * ? ? ? ? ? ? ? * ? . . ? * . * . . . . . . . . . . .
7093 ? ? . ? . . . . ? . ? * . . . . . . . . . ? ? ? . ? ? ? . .
7112 * * . . . ? ? ? . . . . . . . . ? ? . ? . ? . . ? . . . . .
7135 ? . . * . ? ? ? . ? ? . . . ? . . . . . . . ? . . . ? ? . .
7142 ? . ? . . . . . . * . . . ? . . . . . . . . . . . . . .
7166 ? . ? ? ? * * . ? * . ? . . . ? . ? ? . . . * . . . ? . . .
8202 ? ? . . . * . ? . . . . . . . * ? . . . ? ? . . . . ? ? ? .
8223 ? . . . . . . ?
8237 ? ? ? . ? ? ? ? . . . . . . . ? . . . . . ? . . * ? . . . .
8251 ? . ? . . . . . ? ? . . . * ? . . . ? . . . . . . . . . . .
8278 ? ? . . . . ? . . . . . . . ? . . . . . . ? ? . . * . . . .
8290 ? . .
8294 ? ? . . . . . . . . . . . . ? . . . ? ? . . . . . . . . * *
8303 ? * ? . ? ? ? . ? ? ? . . . . * . . . . . . . . . . . . . .
8306 ? ? . . . ? . . . ? . . . . . . * . . .
8309 * ? ? ? ? . . . ? . . .
8329 ? . * * . . . . . . . * . ? . * . ? . * . * ? . . .
----------------------------------------------------------------------------------------------------
(*) Active Query, (.) Idle, (?) Idle in transaction, <blank> backend over.
----------------------------------------------------------------------------------------------------
Most of the graph space is filled with (.) or (?), with very few
active queries (long-running queries > 1s). On a busy day and during
business hours I shall check again and post the results. The script
depends only on Perl, DBI, and DBD::Pg.
script pasted here:
http://pastebin.com/mrjSZfLB
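The full script is at the pastebin link above. As a rough illustration of just the rendering step, here is a Python re-sketch (not the original Perl: the `render` function, the `SYMBOL` map, and the state names are my own inventions for the example; the real script reads backend states from pg_stat_activity via DBI):

```python
# Sketch of the rendering step only: given sampled backend states,
# emit one row per PID using the same symbols as the graph above:
# '*' active, '.' idle, '?' idle in transaction, blank once the
# backend has exited.

SYMBOL = {"active": "*", "idle": ".", "idle_in_txn": "?"}

def render(samples, intervals=30):
    """samples maps pid -> list of states, one per 1-second interval."""
    lines = ["PID " + " ".join(f"{i:02d}" for i in range(1, intervals + 1))]
    for pid in sorted(samples):
        # a backend that exited simply has fewer samples than intervals
        cells = [SYMBOL.get(s, " ") for s in samples[pid][:intervals]]
        lines.append(f"{pid} " + " ".join(cells))
    return "\n".join(lines)

print(render({4334: ["idle_in_txn", "active", "idle"],
              7009: ["active", "idle", "idle_in_txn"]}))
```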
Regds
mallah.
On Sat, Jun 26, 2010 at 3:23 PM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:
Dear List,
Today has been good since morning. Although it is a lean day
for us, the indications are nice. I thank everyone who shared
the concern. I think the most significant change has been reducing
shared_buffers from 10GB to 4GB; this has led to reduced memory
usage and some breathing space for the OS.
Although I am yet to incorporate the suggestions from pgtune,
I think the issue of max_connections needs to be addressed first.
I am investigating application issues and the mechanism that
puts many backends into '<IDLE> in transaction' mode for significant
periods. I thank Tom for the script he sent. Once that is resolved I
shall look at pooling as suggested by Kevin, and then max_connections
can eventually be reduced. I shall also check whether pgpool and
pgbouncer are helpful in this regard.
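For reference, a transaction-pooling pgbouncer setup might look roughly like this (a minimal sketch only; the database name, paths, and pool sizes are placeholders, not values from this server):

```ini
; Minimal pgbouncer sketch: ~300 client connections share a small
; number of real backends, which is what lets max_connections come
; down safely. "mydb" and the paths are hypothetical.
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 300
default_pool_size = 20
```

Note that transaction pooling only pays off once the '<IDLE> in transaction' problem is fixed, since a transaction held open pins a server connection for its whole duration.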
I observed that the number of simultaneous connections today (a lean
day) hovers between 1 and 10, occasionally shooting to 15, but never
more than 20 I would say.
I am happy that i/o waits are negligible and the cpu is idling for a while too.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
22 0 18468 954120 13460 28491772 0 0 568 1558 13645 18355 62 10 27 2 0
16 0 18468 208100 13476 28469084 0 0 580 671 14039 17055 67 13 19 1 0
10 2 18812 329032 13400 28356972 0 46 301 1768 13848 17884 68 10 20 1 0
16 2 18812 366596 13416 28361620 0 0 325 535 13957 16649 72 11 16 1 0
50 1 18812 657048 13432 28366548 0 0 416 937 13823 16667 62 9 28 1 0
6 1 18812 361040 13452 28371908 0 0 323 522 14352 16789 74 12 14 0 0
33 0 18812 162760 12604 28210152 0 0 664 1544 14701 16315 66 11 22 2 0
5 0 18812 212028 10764 27921800 0 0 552 648 14567 17737 67 10 21 1 0
6 0 18796 279920 10548 27890388 3 0 359 562 12635 15976 60 9 30 1 0
8 0 18796 438820 10564 27894440 0 0 289 2144 12234 15770 57 8 34 1 0
5 0 18796 531800 10580 27901700 0 0 514 394 12169 16005 59 8 32 1 0
17 0 18796 645868 10596 27890704 0 0 423 948 13369 16554 67 10 23 1 0
9 1 18796 1076540 10612 27898604 0 0 598 403 12703 17363 71 10 18 1 0
8 0 18796 1666508 10628 27904748 0 0 430 1123 13314 17421 57 9 32 1 0
9 1 18776 1541444 10644 27913092 1 0 653 954 13194 16822 75 11 12 1 0
8 0 18776 1526728 10660 27921380 0 0 692 788 13073 16987 74 9 15 1 0
8 0 18776 1482304 10676 27933176 0 0 966 2029 13017 16651 76 12 11 1 0
21 0 18776 1683260 10700 27937492 0 0 298 663 13110 15796 67 10 23 1 0
18 0 18776 2087664 10716 27943512 0 0 406 622 12399 17072 62 9 28 1 0

With 300 connections, I think that either of these could lead you to
experience intermittent bursts of extreme swapping. I'd drop it to
somewhere in the 16MB to 32MB range until I had a connection pool
configured such that it was actually keeping the number of active
connections much lower.

> (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
> (64kb , via default)

Sure, I'd boost this.

> checkpoint_segments = 16 # pgtune wizard 2010-06-25
> (30 , specified)

If you have the disk space for the 30 segments, I wouldn't reduce it.

> shared_buffers = 7680MB # pgtune wizard 2010-06-25
> (4096 MB , specified)

This one is perhaps the most sensitive to workload. Anywhere
between 1GB and 8GB might be best for you. Greg Smith has some
great advice on how to tune this for your workload.

> (*) max_connections = 80 # pgtune wizard 2010-06-25
> (300 , ;-) specified)
>
> when i reduce max_connections i start getting errors, i will see
> again concurrent connections during business hours.

That's probably a good number to get to, but you have to reduce the
number of actual connections before you set the limit that low.

> lot of our connections are in <IDLE> in transaction state

If any of these stay in that state for more than a minute or two,
you need to address that if you want to get your connection count
under control. If any of them persist for hours or days, you need
to fix it to avoid bloat which can kill performance.
-Kevin
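Kevin's work_mem warning can be made concrete with back-of-the-envelope arithmetic (a sketch using figures from this thread: 300 connections, ~32GB RAM; since work_mem is a per-sort/per-hash limit and one query can use several such allocations, assuming one allocation per backend is if anything optimistic):

```python
# Worst-case memory ceiling implied by work_mem x connections.
# Assumes just one work_mem-sized allocation per backend, which is
# generous: a single complex query can use several.

ram_gb = 32          # from the sar output: ~32GB physical RAM
connections = 300    # max_connections reported in the thread

for work_mem_mb in (4096, 2048, 512, 32):
    worst_case_gb = connections * work_mem_mb / 1024
    print(f"work_mem={work_mem_mb}MB -> ceiling {worst_case_gb:.0f}GB "
          f"({worst_case_gb / ram_gb:.1f}x RAM)")
```

Even at the reduced 512MB the ceiling is roughly 150GB, nearly 5x physical RAM, which is consistent with the sudden swap exhaustion; only in the 16MB to 32MB range does the ceiling fit comfortably under RAM.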