Thread: high user cpu, massive SELECTs, no io waiting problem
Hi list,
first time for me here, hope you’re not dealing too severely with me regarding guidelines. Giving my best.
We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer 8026B-6RF.
This version is downloaded from postgresql.org and selfcompiled, running for over a year now. The Server has 128 GB RAM and Four Intel® Xeon® X7550 with 64 logical cores.
Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec 13 17:10:39 UTC 2010 x86_64 GNU/Linux”.
The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and xlog is logged in over iscsi HBA too. We tried en and disabling jumbo frames. Makes no difference.
We are using a DELL Equallogic SAN Backend with SAS drives.
Postgres is used as backend for a high performance website. We are using nginx with php-fastcgi and memcached.
Since a few weeks we have really strange peaks on this system. User CPU is increasing up to 100% and we have lots of SELECTs running.
There is no iowait at this time, only high user cpu and we don’t know where this is coming from. It seems like this is only happening under certain circumstances.
We can solve this problem by simply removing the load from the website by delivering an offline page. We let database calm down for a while and then slowly throttling users.
See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg
Has someone made similar experiences? Perhaps there is some issue between Postgres 8.4.4 and kernel 2.6.32?
Thank in advance
Thomas
--
Turtle Entertainment GmbH
Thomas Pöhler, Manager IT Operations
Siegburger Str. 189
50679 Cologne
Germany
fon. +49 221 880449-331
fax. +49 221 880449-399
http://www.turtle-entertainment.com/
Managing Director: Ralf Reichert
Register Court: Local Court Cologne, HRB 36678
On Tue, Feb 15, 2011 at 10:19 AM, Thomas Pöhler <tp@turtle-entertainment.de> wrote: > Since a few weeks we have really strange peaks on this system. User CPU is > increasing up to 100% and we have lots of SELECTs running. Are you using pooling of some kind, or do you have LOTS of connections? > There is no iowait at this time, only high user cpu and we don’t know where > this is coming from. It seems like this is only happening under certain > circumstances. run htop and look for red. if youi've got lots of red bar on each CPU but no io wait then it's waiting for memory access. Most of these multi-core machines will be memory read / write speed bound. Pooling will help relieve some of that memory bandwidth load, but might not be enough to eliminate it.
Thomas Pöhler<tp@turtle-entertainment.de> wrote: > we have lots of SELECTs running. How many? Could you show your postgresql.conf file, with all comments removed? What does vmstat 1 (or similar) show at baseline and during your problem episodes? -Kevin
You have also run analyze verbose, and checked to make sure you don’t have a ton of bloated indexes?
- check the process with strace –p PID
- check the diskIO with iostat, not vmstat
- run analyze verbose, and possible reindex the database, or cluster the larger tables.
- dump from pg_stat_activity, and check what the largest objects are based on relpages from pg_class.
- check index scans/table scans from pg_statio tables if you have track_activities on in the .conf file.
- John
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Thomas Pöhler
Sent: 15 February 2011 17:19
To: pgsql-performance@postgresql.org
Cc: Felix Feinhals; Verteiler_A-Team; Björn Metzdorf
Subject: [PERFORM] high user cpu, massive SELECTs, no io waiting problem
Hi list,
first time for me here, hope you’re not dealing too severely with me regarding guidelines. Giving my best.
We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer 8026B-6RF.
This version is downloaded from postgresql.org and selfcompiled, running for over a year now. The Server has 128 GB RAM and Four Intel® Xeon® X7550 with 64 logical cores.
Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec 13 17:10:39 UTC 2010 x86_64 GNU/Linux”.
The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and xlog is logged in over iscsi HBA too. We tried en and disabling jumbo frames. Makes no difference.
We are using a DELL Equallogic SAN Backend with SAS drives.
Postgres is used as backend for a high performance website. We are using nginx with php-fastcgi and memcached.
Since a few weeks we have really strange peaks on this system. User CPU is increasing up to 100% and we have lots of SELECTs running.
There is no iowait at this time, only high user cpu and we don’t know where this is coming from. It seems like this is only happening under certain circumstances.
We can solve this problem by simply removing the load from the website by delivering an offline page. We let database calm down for a while and then slowly throttling users.
See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg
Has someone made similar experiences? Perhaps there is some issue between Postgres 8.4.4 and kernel 2.6.32?
Thank in advance
Thomas
--
Turtle Entertainment GmbH
Thomas Pöhler, Manager IT Operations
Siegburger Str. 189
50679 Cologne
Germany
fon. +49 221 880449-331
fax. +49 221 880449-399
http://www.turtle-entertainment.com/
Managing Director: Ralf Reichert
Register Court: Local Court Cologne, HRB 36678
This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities.
On Tue, Feb 15, 2011 at 6:19 PM, Thomas Pöhler <tp@turtle-entertainment.de> wrote: > Hi list, > > See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg > What is the bottom graph? queries/minute? Looks like Your database is just getting hammered. Maybe there is a really badly coded page somewhere (a query for each user or something similar)? Greetings Marcin Mańk
On 15/02/2011 18:19, Thomas Pöhler wrote: > Hi list, > > first time for me here, hope you’re not dealing too severely with me > regarding guidelines. Giving my best. > > We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by > GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer > 8026B-6RF. > > This version is downloaded from postgresql.org and selfcompiled, running > for over a year now. The Server has 128 GB RAM and Four Intel® Xeon® > X7550 with 64 logical cores. So, 64 logical cores total. > Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec > 13 17:10:39 UTC 2010 x86_64 GNU/Linux”. > > The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and > xlog is logged in over iscsi HBA too. We tried en and disabling jumbo > frames. Makes no difference. Are you using 10 Gbit/s Ethernet for iSCSI? Regular 1 Gbit/s Ethernet might be too slow for you. > Since a few weeks we have really strange peaks on this system. User CPU > is increasing up to 100% and we have lots of SELECTs running. > See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg > > Has someone made similar experiences? Perhaps there is some issue > between Postgres 8.4.4 and kernel 2.6.32? From your graph it looks like the number of active processes (I'm assuming they are PostgreSQL processes) is going out of control. There is an old problem (which I've encountered so I'm replying but it may or may not be in your case) in which PostgreSQL starts behaving badly even for SELECT queries if the number of simultaneous queries exceeds the number of logical CPUs. To test this, I'd recommend setting up a utility like pgpool-II (http://pgpool.projects.postgresql.org/) in front of the database to try and limit the number of active connections to nearly 64 (maybe you can have good results with 80 or 100). You might also experiment with pgsql.max_links setting of PHP but IIRC PHP will just refuse more connections than that instead of waiting for them (but maybe your application can spin-wait for them, possibly while also using usleep()).
On Tue, Feb 15, 2011 at 6:00 PM, Ivan Voras <ivoras@freebsd.org> wrote: > There is an old problem (which I've encountered so I'm replying but it may > or may not be in your case) in which PostgreSQL starts behaving badly even > for SELECT queries if the number of simultaneous queries exceeds the number > of logical CPUs. Note that this is a problem for most RDBMS engines, not just postgresql. The performance drop off isn't too bad, but the total number of connections times even a doubling of response time results in a slow server. > To test this, I'd recommend setting up a utility like > pgpool-II (http://pgpool.projects.postgresql.org/) in front of the database > to try and limit the number of active connections to nearly 64 (maybe you > can have good results with 80 or 100). pgpool IS the answer for most of these issues. > You might also experiment with pgsql.max_links setting of PHP but IIRC PHP > will just refuse more connections than that instead of waiting for them (but > maybe your application can spin-wait for them, possibly while also using > usleep()). That setting is PER PROCESS so it might not help that much. http://www.php.net/manual/en/pgsql.configuration.php#ini.pgsql.max-links
Kevin Grittner wrote: > Could you show your postgresql.conf file, with all comments removed I just added a sample query to provide the data we always want here without people having to edit their config files, by querying pg_settings for it, to http://wiki.postgresql.org/wiki/Server_Configuration I already updated http://wiki.postgresql.org/wiki/SlowQueryQuestions and http://wiki.postgresql.org/wiki/Guide_to_reporting_problems to mention this too. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote: > run htop and look for red. if youi've got lots of red bar on each CPU > but no io wait then it's waiting for memory access. I don't think this is true. AFAICT the red bar refers to "system time", time that's spent in the kernel -- either in syscalls or kernel background threads. Operating systems don't generally account memory accesses (cache misses) for processes, if you don't specially ask for it. The closest thing I know of is using Linux perf tools, e.g. "perf top -e cache-misses". OProfile, DTrace and SystemTap can probably do something similar. Regards, Marti
Greg Smith <greg@2ndquadrant.com> wrote: > I just added a sample query to provide the data we always want > here without people having to edit their config files, by > querying pg_settings for it, to > http://wiki.postgresql.org/wiki/Server_Configuration Nice! Thanks! A few very nice things about this: (1) You don't need rights to the postgresql.conf file; any user can run this. (2) You don't need to know how to strip the comments with sed or perl or something, or go through the file with tedious manual editing. (3) It shows some things which aren't coming from the postgresql.conf file which might be of interest. In fact, I wonder whether we shouldn't leave a couple items you've excluded, since they are sometimes germane to problems posted, like lc_collate and TimeZone. -Kevin
Kevin Grittner wrote: > In fact, I wonder whether we shouldn't leave a couple items you've > excluded, since they are sometimes germane to problems posted, like > lc_collate and TimeZone. I pulled some of them out only because they're not really postgresql.conf settings; lc_collate and lc_ctype for example are set at initdb time. Feel free to hack on that example if you feel it could be improved, just be aware which of those things are not really in the main config file when pondering if they should be included. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Wed, Feb 16, 2011 at 6:44 AM, Marti Raudsepp <marti@juffo.org> wrote: > On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> run htop and look for red. if youi've got lots of red bar on each CPU >> but no io wait then it's waiting for memory access. > > I don't think this is true. AFAICT the red bar refers to "system > time", time that's spent in the kernel -- either in syscalls or kernel > background threads. My point being that if you've got a lot of RED it'll be the OS waiting for memory access. Trust me, when we start to hit our memory bandwidth (in the 70 to 80 GB/s range) we start to get more and more red and more and more kernel wait time.
Yeah, at max load we are. We're running quad 12 core AMD Magny Cours. Under max load all of our cores go about 20 to 30% red (i.e. kernel) and we wind up waiting on the kernel much more. Could be a mix of context switching and waiting on memory, so it's just a guesstimate I'm making based on previous testing with Greg Smith's memory streaming test and familiarity with this system. On Wed, Feb 16, 2011 at 8:53 AM, Strange, John W <john.w.strange@jpmchase.com> wrote: > Scott, are you really moving that much data through memory, 70-80GB/sec is the limit of the new intel 7500 series in abest case scenario. > > - John > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Scott Marlowe > Sent: 16 February 2011 15:43 > To: Marti Raudsepp > Cc: Thomas Pöhler; pgsql-performance@postgresql.org; Felix Feinhals; Verteiler_A-Team; Björn Metzdorf > Subject: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem > > On Wed, Feb 16, 2011 at 6:44 AM, Marti Raudsepp <marti@juffo.org> wrote: >> On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> run htop and look for red. if youi've got lots of red bar on each CPU >>> but no io wait then it's waiting for memory access. >> >> I don't think this is true. AFAICT the red bar refers to "system >> time", time that's spent in the kernel -- either in syscalls or kernel >> background threads. > > My point being that if you've got a lot of RED it'll be the OS waiting > for memory access. Trust me, when we start to hit our memory > bandwidth (in the 70 to 80 GB/s range) we start to get more and more > red and more and more kernel wait time. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > This communication is for informational purposes only. It is not > intended as an offer or solicitation for the purchase or sale of > any financial instrument or as an official confirmation of any > transaction. All market prices, data and other information are not > warranted as to completeness or accuracy and are subject to change > without notice. Any comments or statements made herein do not > necessarily reflect those of JPMorgan Chase & Co., its subsidiaries > and affiliates. > > This transmission may contain information that is privileged, > confidential, legally privileged, and/or exempt from disclosure > under applicable law. If you are not the intended recipient, you > are hereby notified that any disclosure, copying, distribution, or > use of the information contained herein (including any reliance > thereon) is STRICTLY PROHIBITED. Although this transmission and any > attachments are believed to be free of any virus or other defect > that might affect any computer system into which it is received and > opened, it is the responsibility of the recipient to ensure that it > is virus free and no responsibility is accepted by JPMorgan Chase & > Co., its subsidiaries and affiliates, as applicable, for any loss > or damage arising in any way from its use. If you received this > transmission in error, please immediately contact the sender and > destroy the material in its entirety, whether in electronic or hard > copy format. Thank you. > > Please refer to http://www.jpmorgan.com/pages/disclosures for > disclosures relating to European legal entities. > -- To understand recursion, one must first understand recursion.
Justin Pitts <justinpitts@gmail.com> wrote: > I think adding > > UNION ALL SELECT 'postgres version', version(); > > might be a good thing. Good point. Added. > Greg Smith <greg@2ndquadrant.com> wrote: >> Kevin Grittner wrote: >>> >>> In fact, I wonder whether we shouldn't leave a couple items >>> you've excluded, since they are sometimes germane to problems >>> posted, like lc_collate and TimeZone. >> >> I pulled some of them out only because they're not really >> postgresql.conf settings; lc_collate and lc_ctype for example are >> set at initdb time. Feel free to hack on that example if you >> feel it could be improved, just be aware which of those things >> are not really in the main config file when pondering if they >> should be included. Basically, the ones I could remember us needing to ask about on multiple occasions, I put back -- provisionally. If someone thinks they're pointless, I won't worry about them being dropped again: time zone, character encoding scheme, character set, and collation. I'm pretty sure I've seen us ask about all of those in trying to sort out a problem. I also tried the query on a newly installed HEAD build which had no manual changes to the postgresql.conf file and found a few others which seemed to me to be worth suppressing. I took my shot -- anyone else is welcome to do so.... :-) -Kevin
Hi, we are using two instances of pgbouncer v1.4 for connection pooling. One for prepared statements (pool_mode session) and one without (pool_mode transaction). Pgbouncer.ini: [pgbouncer] pool_mode = transaction/session server_reset_query = DISCARD ALL; server_check_query = select 1 server_check_delay = 10 max_client_conn = 10000 default_pool_size = 450 log_connections = 0 log_disconnections = 0 log_pooler_errors = 1 client_login_timeout = 0 I will examine htop next time during a peak. If I remember correctly vmstat showed lots of context switches during a peak above 50k. We are running a biweekly downtime where we do a complete reindex and vaccum full. We cannot identify certain queries causingthis. The last graph in ganglia (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the avg_queries from pgbouncers stats.I think this is a symptom of many waiting queries which accumulate. Our iscsi is connected with 3Gibt/s. But that's more than enough. We don't have high traffic throughput. This is the result of the query you gave me: version PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit checkpoint_segments 40 custom_variable_classes pg_stat_statements effective_cache_size 48335MB escape_string_warning off fsync on lc_collate C lc_ctype C listen_addresses * log_destination stderr log_line_prefix %t %p %d %u %r log_lock_waits on log_min_duration_statement 1s log_min_messages notice log_rotation_size 10MB log_temp_files 50MB logging_collector on maintenance_work_mem 1GB max_connections 1000 max_prepared_transactions 5 max_stack_depth 2MB pg_stat_statements.max 10000 pg_stat_statements.track all port 5433 server_encoding UTF8 shared_buffers 16GB TimeZone Europe/Berlin update_process_title on wal_buffers 1MB work_mem 32MB Seems like connection limit 10000 is way too much on pgbouncer? Our queries overall are not that CPU intensive. If they areslow, they are mostly waiting for disk io. When having a look at the traffic of this database server we see 2/3 of thetraffic is going to san/disk and only 1/3 going to the server. In other words from the traffic view, 2/3 of our operationsare writes and 1/3 are reads. The database is fitting completely into ram, so reads should not be a problem. Appreciate your help! Thomas -----Ursprüngliche Nachricht----- Von: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] Gesendet: Mittwoch, 16. Februar 2011 17:09 An: Greg Smith; Justin Pitts Cc: pgsql-performance@postgresql.org; Verteiler_A-Team; Björn Metzdorf; Felix Feinhals; Thomas Pöhler Betreff: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem Justin Pitts <justinpitts@gmail.com> wrote: > I think adding > > UNION ALL SELECT 'postgres version', version(); > > might be a good thing. Good point. Added. > Greg Smith <greg@2ndquadrant.com> wrote: >> Kevin Grittner wrote: >>> >>> In fact, I wonder whether we shouldn't leave a couple items >>> you've excluded, since they are sometimes germane to problems >>> posted, like lc_collate and TimeZone. >> >> I pulled some of them out only because they're not really >> postgresql.conf settings; lc_collate and lc_ctype for example are >> set at initdb time. Feel free to hack on that example if you >> feel it could be improved, just be aware which of those things >> are not really in the main config file when pondering if they >> should be included. Basically, the ones I could remember us needing to ask about on multiple occasions, I put back -- provisionally. If someone thinks they're pointless, I won't worry about them being dropped again: time zone, character encoding scheme, character set, and collation. I'm pretty sure I've seen us ask about all of those in trying to sort out a problem. I also tried the query on a newly installed HEAD build which had no manual changes to the postgresql.conf file and found a few others which seemed to me to be worth suppressing. I took my shot -- anyone else is welcome to do so.... :-) -Kevin
2011/2/16 Thomas Pöhler <tp@turtle-entertainment.de>: > Hi, > > we are using two instances of pgbouncer v1.4 for connection pooling. > One for prepared statements (pool_mode session) and one without (pool_mode transaction). > > Pgbouncer.ini: > [pgbouncer] > pool_mode = transaction/session > server_reset_query = DISCARD ALL; > server_check_query = select 1 > server_check_delay = 10 > max_client_conn = 10000 > default_pool_size = 450 > log_connections = 0 > log_disconnections = 0 > log_pooler_errors = 1 > client_login_timeout = 0 > > > I will examine htop next time during a peak. > > If I remember correctly vmstat showed lots of context switches during a peak above 50k. > > We are running a biweekly downtime where we do a complete reindex and vaccum full. We cannot identify certain queries causingthis. > > The last graph in ganglia (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the avg_queries from pgbouncers stats.I think this is a symptom of many waiting queries which accumulate. > > Our iscsi is connected with 3Gibt/s. But that's more than enough. We don't have high traffic throughput. > > This is the result of the query you gave me: > > version PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit > checkpoint_segments 40 > custom_variable_classes pg_stat_statements > effective_cache_size 48335MB > escape_string_warning off > fsync on > lc_collate C > lc_ctype C > listen_addresses * > log_destination stderr > log_line_prefix %t %p %d %u %r > log_lock_waits on > log_min_duration_statement 1s > log_min_messages notice > log_rotation_size 10MB > log_temp_files 50MB > logging_collector on > maintenance_work_mem 1GB > max_connections 1000 > max_prepared_transactions 5 > max_stack_depth 2MB > pg_stat_statements.max 10000 > pg_stat_statements.track all > port 5433 > server_encoding UTF8 > shared_buffers 16GB > TimeZone Europe/Berlin > update_process_title on > wal_buffers 1MB > work_mem 32MB > > > Seems like connection limit 10000 is way too much on pgbouncer? Our queries overall are not that CPU intensive. If theyare slow, they are mostly waiting for disk io. When having a look at the traffic of this database server we see 2/3 ofthe traffic is going to san/disk and only 1/3 going to the server. In other words from the traffic view, 2/3 of our operationsare writes and 1/3 are reads. The database is fitting completely into ram, so reads should not be a problem. I used pgbouncer with way more than that, not an issue on its own *but* can you export the pgbouncers in another box ? I get issues in very high-mem usage (more than IO) and ton's of connection via pgbouncer, then moving the bouncer in a 3rd box salve the situation. > > Appreciate your help! > Thomas > > -----Ursprüngliche Nachricht----- > Von: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] > Gesendet: Mittwoch, 16. Februar 2011 17:09 > An: Greg Smith; Justin Pitts > Cc: pgsql-performance@postgresql.org; Verteiler_A-Team; Björn Metzdorf; Felix Feinhals; Thomas Pöhler > Betreff: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem > > Justin Pitts <justinpitts@gmail.com> wrote: >> I think adding >> >> UNION ALL SELECT 'postgres version', version(); >> >> might be a good thing. > > Good point. Added. > >> Greg Smith <greg@2ndquadrant.com> wrote: >>> Kevin Grittner wrote: >>>> >>>> In fact, I wonder whether we shouldn't leave a couple items >>>> you've excluded, since they are sometimes germane to problems >>>> posted, like lc_collate and TimeZone. >>> >>> I pulled some of them out only because they're not really >>> postgresql.conf settings; lc_collate and lc_ctype for example are >>> set at initdb time. Feel free to hack on that example if you >>> feel it could be improved, just be aware which of those things >>> are not really in the main config file when pondering if they >>> should be included. > > Basically, the ones I could remember us needing to ask about on > multiple occasions, I put back -- provisionally. If someone thinks > they're pointless, I won't worry about them being dropped again: > time zone, character encoding scheme, character set, and collation. > I'm pretty sure I've seen us ask about all of those in trying to > sort out a problem. > > I also tried the query on a newly installed HEAD build which had no > manual changes to the postgresql.conf file and found a few others > which seemed to me to be worth suppressing. > > I took my shot -- anyone else is welcome to do so.... :-) > > -Kevin > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Thomas Pöhler<tp@turtle-entertainment.de> wrote: > we are using two instances of pgbouncer v1.4 for connection > pooling. One for prepared statements (pool_mode session) and one > without (pool_mode transaction). > max_client_conn = 10000 > default_pool_size = 450 Your best defense against the "thundering herd" issues you describe would be to eliminate the session pool (if you can), and drop the default_pool_size for the transaction pool to where at peak the number of backends actually busy is about twice your number of *actual* cores. (Don't count hyperthreading "logical" cores for this purpose.) max_client_conn can be as high as you need; the point is for the connection pool to funnel the requests through a much smaller pool of database connections. > If I remember correctly vmstat showed lots of context switches > during a peak above 50k. Yeah, that's part of the reason throughput tanks when your active connection count gets too high. > We are running a biweekly downtime where we do a complete reindex > and vacuum full. We cannot identify certain queries causing this. If you really get bloat which requires VACUUM FULL, tracking down the reason should be a high priority. You normally shouldn't need to run that. Also, I hope when you run that it is VACUUM FULL followed by REINDEX, not the other way around. In fact, it would probably be faster to CLUSTER (if you have room) or drop the indexes, VACUUM FULL, and then create the indexes again. > The last graph in ganglia > (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the > avg_queries from pgbouncers stats. I think this is a symptom of > many waiting queries which accumulate. While it seems counter-intuitive, you're likely to have fewer queries waiting a long time there if you reduce default_pool_size so that contention doesn't kill performance when the queries *do* get to run. > max_connections 1000 This is what you need to try to reduce. > max_prepared_transactions 5 If you're actually using prepared transactions, make sure none are lingering about for a long time during these incidents. Well, *ever*, really -- but I would definitely check during problem periods. > wal_buffers 1MB You should bump this to 16MB. > The database is fitting completely into ram Then you should probably be adjusting sequential_page_cost and rand_page_cost. You'll probably get plans which run faster, which should help overall. -Kevin
Thomas Pöhler wrote: > We are running a biweekly downtime where we do a complete reindex and vaccum full. We cannot identify certain queries causingthis. If you feel that you need VACUUM FULL, either something terribly wrong has happened, or someone has gotten confused. In both cases it's unlikely you want to keep doing that. See http://wiki.postgresql.org/wiki/VACUUM_FULL for a nice document leading through figuring what to do instead. Note that if you have a database that fits in RAM, but is filled with the sort of index bloat garbage that using VACUUM FULL will leave behind, it will cause excessive CPU use when running queries. If you already have planned downtime, you really should try to use use CLUSTER instead, to remove that from the list of possible causes for your issue. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
> Thomas Pöhler wrote: I remember you said you were using nginx and php-fastcgi, how many web server boxes do you have, and what are the specs ?
I think adding UNION ALL SELECT 'postgres version', version(); might be a good thing. On Wed, Feb 16, 2011 at 9:55 AM, Greg Smith <greg@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> >> In fact, I wonder whether we shouldn't leave a couple items you've >> excluded, since they are sometimes germane to problems posted, like >> lc_collate and TimeZone. > > I pulled some of them out only because they're not really postgresql.conf > settings; lc_collate and lc_ctype for example are set at initdb time. Feel > free to hack on that example if you feel it could be improved, just be aware > which of those things are not really in the main config file when pondering > if they should be included. > > -- > Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD > PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us > "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Scott, are you really moving that much data through memory, 70-80GB/sec is the limit of the new intel 7500 series in a bestcase scenario. - John -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Scott Marlowe Sent: 16 February 2011 15:43 To: Marti Raudsepp Cc: Thomas Pöhler; pgsql-performance@postgresql.org; Felix Feinhals; Verteiler_A-Team; Björn Metzdorf Subject: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem On Wed, Feb 16, 2011 at 6:44 AM, Marti Raudsepp <marti@juffo.org> wrote: > On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> run htop and look for red. if youi've got lots of red bar on each CPU >> but no io wait then it's waiting for memory access. > > I don't think this is true. AFAICT the red bar refers to "system > time", time that's spent in the kernel -- either in syscalls or kernel > background threads. My point being that if you've got a lot of RED it'll be the OS waiting for memory access. Trust me, when we start to hit our memory bandwidth (in the 70 to 80 GB/s range) we start to get more and more red and more and more kernel wait time. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities.