Thread: high user cpu, massive SELECTs, no io waiting problem

high user cpu, massive SELECTs, no io waiting problem

From
Thomas Pöhler
Date:

Hi list,

 

first time for me here, hope you’re not dealing too severely with me regarding guidelines. Giving my best.

 

We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer 8026B-6RF.

This version is downloaded from postgresql.org and selfcompiled, running for over a year now. The Server has 128 GB RAM and Four Intel® Xeon® X7550 with 64 logical cores.

Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec 13 17:10:39 UTC 2010 x86_64 GNU/Linux”.

 

The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and xlog is logged in over iscsi HBA too. We tried en and disabling jumbo frames. Makes no difference.

We are using a DELL Equallogic SAN Backend with SAS drives.

 

Postgres is used as  backend for a high performance website. We are using nginx with php-fastcgi and memcached.

 

Since a few weeks we have really strange peaks on this system. User CPU is increasing up to 100% and we have lots of SELECTs running.

There is no iowait at this time, only high user cpu and we don’t know where this is coming from. It seems like this is only happening under certain circumstances.

 

We can solve this problem by simply removing the load from the website by delivering an offline page. We let database calm down for a while and then slowly throttling users.

 

See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg

 

Has someone made similar experiences? Perhaps there is some issue between Postgres 8.4.4 and kernel 2.6.32?

 

Thank in advance

Thomas

 

 

 

--

Turtle Entertainment GmbH

Thomas Pöhler, Manager IT Operations

Siegburger Str. 189

50679 Cologne

Germany

fon. +49 221 880449-331

fax. +49 221 880449-399

http://www.turtle-entertainment.com/

http://www.esl.eu/

http://www.consoles.net/

Managing Director: Ralf Reichert

Register Court: Local Court Cologne, HRB 36678

 

Re: high user cpu, massive SELECTs, no io waiting problem

From
Scott Marlowe
Date:
On Tue, Feb 15, 2011 at 10:19 AM, Thomas Pöhler
<tp@turtle-entertainment.de> wrote:
> Since a few weeks we have really strange peaks on this system. User CPU is
> increasing up to 100% and we have lots of SELECTs running.

Are you using pooling of some kind, or do you have LOTS of connections?

> There is no iowait at this time, only high user cpu and we don’t know where
> this is coming from. It seems like this is only happening under certain
> circumstances.

run htop and look for red.  if youi've got lots of red bar on each CPU
but no io wait then it's waiting for memory access.  Most of these
multi-core machines will be memory read / write speed bound.  Pooling
will help relieve some of that memory bandwidth load, but might not be
enough to eliminate it.

Re: high user cpu, massive SELECTs, no io waiting problem

From
"Kevin Grittner"
Date:
Thomas Pöhler<tp@turtle-entertainment.de> wrote:

> we have lots of SELECTs running.

How many?

Could you show your postgresql.conf file, with all comments removed?

What does vmstat 1 (or similar) show at baseline and during your
problem episodes?

-Kevin

Re: high user cpu, massive SELECTs, no io waiting problem

From
"Strange, John W"
Date:

You have also run analyze verbose, and checked to make sure you don’t have a ton of bloated indexes?

 

- check the process with strace –p PID

- check the diskIO with iostat, not vmstat

- run analyze verbose, and possible reindex the database, or cluster the larger tables.

- dump from pg_stat_activity, and check what the largest objects are based on relpages from pg_class.

- check index scans/table scans from pg_statio tables if you have track_activities on in the .conf file.

 

- John

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Thomas Pöhler
Sent: 15 February 2011 17:19
To: pgsql-performance@postgresql.org
Cc: Felix Feinhals; Verteiler_A-Team; Björn Metzdorf
Subject: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

 

Hi list,

 

first time for me here, hope you’re not dealing too severely with me regarding guidelines. Giving my best.

 

We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer 8026B-6RF.

This version is downloaded from postgresql.org and selfcompiled, running for over a year now. The Server has 128 GB RAM and Four Intel® Xeon® X7550 with 64 logical cores.

Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec 13 17:10:39 UTC 2010 x86_64 GNU/Linux”.

 

The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and xlog is logged in over iscsi HBA too. We tried en and disabling jumbo frames. Makes no difference.

We are using a DELL Equallogic SAN Backend with SAS drives.

 

Postgres is used as  backend for a high performance website. We are using nginx with php-fastcgi and memcached.

 

Since a few weeks we have really strange peaks on this system. User CPU is increasing up to 100% and we have lots of SELECTs running.

There is no iowait at this time, only high user cpu and we don’t know where this is coming from. It seems like this is only happening under certain circumstances.

 

We can solve this problem by simply removing the load from the website by delivering an offline page. We let database calm down for a while and then slowly throttling users.

 

See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg

 

Has someone made similar experiences? Perhaps there is some issue between Postgres 8.4.4 and kernel 2.6.32?

 

Thank in advance

Thomas

 

 

 

--

Turtle Entertainment GmbH

Thomas Pöhler, Manager IT Operations

Siegburger Str. 189

50679 Cologne

Germany

fon. +49 221 880449-331

fax. +49 221 880449-399

http://www.turtle-entertainment.com/

http://www.esl.eu/

http://www.consoles.net/

Managing Director: Ralf Reichert

Register Court: Local Court Cologne, HRB 36678

 

This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities.

Re: high user cpu, massive SELECTs, no io waiting problem

From
marcin mank
Date:
On Tue, Feb 15, 2011 at 6:19 PM, Thomas Pöhler
<tp@turtle-entertainment.de> wrote:
> Hi list,
>
> See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg
>

What is the bottom graph? queries/minute? Looks like Your database is
just getting hammered.
Maybe there is a really badly coded page somewhere (a query for each
user or something similar)?

Greetings
Marcin Mańk

Re: high user cpu, massive SELECTs, no io waiting problem

From
Ivan Voras
Date:
On 15/02/2011 18:19, Thomas Pöhler wrote:
> Hi list,
>
> first time for me here, hope you’re not dealing too severely with me
> regarding guidelines. Giving my best.
>
> We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by
> GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer
> 8026B-6RF.
>
> This version is downloaded from postgresql.org and selfcompiled, running
> for over a year now. The Server has 128 GB RAM and Four Intel® Xeon®
> X7550 with 64 logical cores.

So, 64 logical cores total.

> Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec
> 13 17:10:39 UTC 2010 x86_64 GNU/Linux”.
>
> The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and
> xlog is logged in over iscsi HBA too. We tried en and disabling jumbo
> frames. Makes no difference.

Are you using 10 Gbit/s Ethernet for iSCSI? Regular 1 Gbit/s Ethernet
might be too slow for you.

> Since a few weeks we have really strange peaks on this system. User CPU
> is increasing up to 100% and we have lots of SELECTs running.

> See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg
>
> Has someone made similar experiences? Perhaps there is some issue
> between Postgres 8.4.4 and kernel 2.6.32?

 From your graph it looks like the number of active processes (I'm
assuming they are PostgreSQL processes) is going out of control.

There is an old problem (which I've encountered so I'm replying but it
may or may not be in your case) in which PostgreSQL starts behaving
badly even for SELECT queries if the number of simultaneous queries
exceeds the number of logical CPUs. To test this, I'd recommend setting
up a utility like pgpool-II (http://pgpool.projects.postgresql.org/) in
front of the database to try and limit the number of active connections
to nearly 64 (maybe you can have good results with 80 or 100).

You might also experiment with pgsql.max_links setting of PHP but IIRC
PHP will just refuse more connections than that instead of waiting for
them (but maybe your application can spin-wait for them, possibly while
also using usleep()).



Re: high user cpu, massive SELECTs, no io waiting problem

From
Scott Marlowe
Date:
On Tue, Feb 15, 2011 at 6:00 PM, Ivan Voras <ivoras@freebsd.org> wrote:
> There is an old problem (which I've encountered so I'm replying but it may
> or may not be in your case) in which PostgreSQL starts behaving badly even
> for SELECT queries if the number of simultaneous queries exceeds the number
> of logical CPUs.

Note that this is a problem for most RDBMS engines, not just
postgresql.  The performance drop off isn't too bad, but the total
number of connections times even a doubling of response time results
in a slow server.

> To test this, I'd recommend setting up a utility like
> pgpool-II (http://pgpool.projects.postgresql.org/) in front of the database
> to try and limit the number of active connections to nearly 64 (maybe you
> can have good results with 80 or 100).

pgpool IS the answer for most of these issues.

> You might also experiment with pgsql.max_links setting of PHP but IIRC PHP
> will just refuse more connections than that instead of waiting for them (but
> maybe your application can spin-wait for them, possibly while also using
> usleep()).

That setting is PER PROCESS so it might not help that much.

http://www.php.net/manual/en/pgsql.configuration.php#ini.pgsql.max-links

Re: high user cpu, massive SELECTs, no io waiting problem

From
Greg Smith
Date:
Kevin Grittner wrote:
> Could you show your postgresql.conf file, with all comments removed


I just added a sample query to provide the data we always want here
without people having to edit their config files, by querying
pg_settings for it, to http://wiki.postgresql.org/wiki/Server_Configuration

I already updated http://wiki.postgresql.org/wiki/SlowQueryQuestions and
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems to mention
this too.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: high user cpu, massive SELECTs, no io waiting problem

From
Marti Raudsepp
Date:
On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> run htop and look for red.  if youi've got lots of red bar on each CPU
> but no io wait then it's waiting for memory access.

I don't think this is true. AFAICT the red bar refers to "system
time", time that's spent in the kernel -- either in syscalls or kernel
background threads.

Operating systems don't generally account memory accesses (cache
misses) for processes, if you don't specially ask for it. The closest
thing I know of is using Linux perf tools, e.g. "perf top -e
cache-misses". OProfile, DTrace and SystemTap can probably do
something similar.

Regards,
Marti

Re: high user cpu, massive SELECTs, no io waiting problem

From
"Kevin Grittner"
Date:
Greg Smith <greg@2ndquadrant.com> wrote:

> I just added a sample query to provide the data we always want
> here without people having to edit their config files, by
> querying pg_settings for it, to
> http://wiki.postgresql.org/wiki/Server_Configuration

Nice!  Thanks!

A few very nice things about this:

(1) You don't need rights to the postgresql.conf file; any user can
run this.

(2) You don't need to know how to strip the comments with sed or
perl or something, or go through the file with tedious manual
editing.

(3) It shows some things which aren't coming from the
postgresql.conf file which might be of interest.

In fact, I wonder whether we shouldn't leave a couple items you've
excluded, since they are sometimes germane to problems posted, like
lc_collate and TimeZone.

-Kevin

Re: high user cpu, massive SELECTs, no io waiting problem

From
Greg Smith
Date:
Kevin Grittner wrote:
> In fact, I wonder whether we shouldn't leave a couple items you've
> excluded, since they are sometimes germane to problems posted, like
> lc_collate and TimeZone.

I pulled some of them out only because they're not really
postgresql.conf settings; lc_collate and lc_ctype for example are set at
initdb time.  Feel free to hack on that example if you feel it could be
improved, just be aware which of those things are not really in the main
config file when pondering if they should be included.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: high user cpu, massive SELECTs, no io waiting problem

From
Scott Marlowe
Date:
On Wed, Feb 16, 2011 at 6:44 AM, Marti Raudsepp <marti@juffo.org> wrote:
> On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> run htop and look for red.  if youi've got lots of red bar on each CPU
>> but no io wait then it's waiting for memory access.
>
> I don't think this is true. AFAICT the red bar refers to "system
> time", time that's spent in the kernel -- either in syscalls or kernel
> background threads.

My point being that if you've got a lot of RED it'll be the OS waiting
for memory access.  Trust me, when we start to hit our memory
bandwidth (in the 70 to 80 GB/s range) we start to get more and more
red and more and more kernel wait time.

Re: high user cpu, massive SELECTs, no io waiting problem

From
Scott Marlowe
Date:
Yeah, at max load we are.  We're running quad 12 core AMD Magny Cours.
 Under max load all of our cores go about 20 to 30% red (i.e. kernel)
and we wind up waiting on the kernel much more.  Could be a mix of
context switching and waiting on memory, so it's just a guesstimate
I'm making based on previous testing with Greg Smith's memory
streaming test and familiarity with this system.

On Wed, Feb 16, 2011 at 8:53 AM, Strange, John W
<john.w.strange@jpmchase.com> wrote:
> Scott, are you really moving that much data through memory, 70-80GB/sec is the limit of the new intel 7500 series in
abest case scenario. 
>
> - John
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Scott
Marlowe
> Sent: 16 February 2011 15:43
> To: Marti Raudsepp
> Cc: Thomas Pöhler; pgsql-performance@postgresql.org; Felix Feinhals; Verteiler_A-Team; Björn Metzdorf
> Subject: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem
>
> On Wed, Feb 16, 2011 at 6:44 AM, Marti Raudsepp <marti@juffo.org> wrote:
>> On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>> run htop and look for red.  if youi've got lots of red bar on each CPU
>>> but no io wait then it's waiting for memory access.
>>
>> I don't think this is true. AFAICT the red bar refers to "system
>> time", time that's spent in the kernel -- either in syscalls or kernel
>> background threads.
>
> My point being that if you've got a lot of RED it'll be the OS waiting
> for memory access.  Trust me, when we start to hit our memory
> bandwidth (in the 70 to 80 GB/s range) we start to get more and more
> red and more and more kernel wait time.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> This communication is for informational purposes only. It is not
> intended as an offer or solicitation for the purchase or sale of
> any financial instrument or as an official confirmation of any
> transaction. All market prices, data and other information are not
> warranted as to completeness or accuracy and are subject to change
> without notice. Any comments or statements made herein do not
> necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
> and affiliates.
>
> This transmission may contain information that is privileged,
> confidential, legally privileged, and/or exempt from disclosure
> under applicable law. If you are not the intended recipient, you
> are hereby notified that any disclosure, copying, distribution, or
> use of the information contained herein (including any reliance
> thereon) is STRICTLY PROHIBITED. Although this transmission and any
> attachments are believed to be free of any virus or other defect
> that might affect any computer system into which it is received and
> opened, it is the responsibility of the recipient to ensure that it
> is virus free and no responsibility is accepted by JPMorgan Chase &
> Co., its subsidiaries and affiliates, as applicable, for any loss
> or damage arising in any way from its use. If you received this
> transmission in error, please immediately contact the sender and
> destroy the material in its entirety, whether in electronic or hard
> copy format. Thank you.
>
> Please refer to http://www.jpmorgan.com/pages/disclosures for
> disclosures relating to European legal entities.
>



--
To understand recursion, one must first understand recursion.

Re: high user cpu, massive SELECTs, no io waiting problem

From
"Kevin Grittner"
Date:
Justin Pitts <justinpitts@gmail.com> wrote:
> I think adding
>
> UNION ALL SELECT 'postgres version', version();
>
> might be a good thing.

Good point.  Added.

> Greg Smith <greg@2ndquadrant.com> wrote:
>> Kevin Grittner wrote:
>>>
>>> In fact, I wonder whether we shouldn't leave a couple items
>>> you've excluded, since they are sometimes germane to problems
>>> posted, like lc_collate and TimeZone.
>>
>> I pulled some of them out only because they're not really
>> postgresql.conf settings; lc_collate and lc_ctype for example are
>> set at initdb time.  Feel free to hack on that example if you
>> feel it could be improved, just be aware which of those things
>> are not really in the main config file when pondering if they
>> should be included.

Basically, the ones I could remember us needing to ask about on
multiple occasions, I put back -- provisionally.  If someone thinks
they're pointless, I won't worry about them being dropped again:
time zone, character encoding scheme, character set, and collation.
I'm pretty sure I've seen us ask about all of those in trying to
sort out a problem.

I also tried the query on a newly installed HEAD build which had no
manual changes to the postgresql.conf file and found a few others
which seemed to me to be worth suppressing.

I took my shot -- anyone else is welcome to do so....  :-)

-Kevin

Re: high user cpu, massive SELECTs, no io waiting problem

From
Thomas Pöhler
Date:
Hi,

we are using two instances of pgbouncer v1.4 for connection pooling.
One for prepared statements (pool_mode session) and one without (pool_mode transaction).

Pgbouncer.ini:
[pgbouncer]
pool_mode = transaction/session
server_reset_query = DISCARD ALL;
server_check_query = select 1
server_check_delay = 10
max_client_conn = 10000
default_pool_size = 450
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
client_login_timeout = 0


I will examine htop next time during a peak.

If I remember correctly vmstat showed lots of context switches during a peak above 50k.

We are running a biweekly downtime where we do a complete reindex and vaccum full. We cannot identify certain queries
causingthis.  

The last graph in ganglia (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the avg_queries from pgbouncers
stats.I think this is a symptom of many waiting queries which accumulate. 

Our iscsi is connected with 3Gibt/s. But that's more than enough. We don't have high traffic throughput.

This is the result of the query you gave me:

version    PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit
checkpoint_segments    40
custom_variable_classes    pg_stat_statements
effective_cache_size    48335MB
escape_string_warning    off
fsync    on
lc_collate    C
lc_ctype    C
listen_addresses    *
log_destination    stderr
log_line_prefix    %t %p %d %u %r
log_lock_waits    on
log_min_duration_statement    1s
log_min_messages    notice
log_rotation_size    10MB
log_temp_files    50MB
logging_collector    on
maintenance_work_mem    1GB
max_connections    1000
max_prepared_transactions    5
max_stack_depth    2MB
pg_stat_statements.max    10000
pg_stat_statements.track    all
port    5433
server_encoding    UTF8
shared_buffers    16GB
TimeZone    Europe/Berlin
update_process_title    on
wal_buffers    1MB
work_mem    32MB


Seems like connection limit 10000 is way too much on pgbouncer? Our queries overall are not that CPU intensive. If they
areslow, they are mostly waiting for disk io. When having a look at the traffic of this database server we see 2/3 of
thetraffic is going to san/disk and only 1/3 going to the server. In other words from the traffic view, 2/3 of our
operationsare writes and 1/3 are reads. The database is fitting completely into ram, so reads should not be a problem. 

Appreciate your help!
Thomas

-----Ursprüngliche Nachricht-----
Von: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Gesendet: Mittwoch, 16. Februar 2011 17:09
An: Greg Smith; Justin Pitts
Cc: pgsql-performance@postgresql.org; Verteiler_A-Team; Björn Metzdorf; Felix Feinhals; Thomas Pöhler
Betreff: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

Justin Pitts <justinpitts@gmail.com> wrote:
> I think adding
>
> UNION ALL SELECT 'postgres version', version();
>
> might be a good thing.

Good point.  Added.

> Greg Smith <greg@2ndquadrant.com> wrote:
>> Kevin Grittner wrote:
>>>
>>> In fact, I wonder whether we shouldn't leave a couple items
>>> you've excluded, since they are sometimes germane to problems
>>> posted, like lc_collate and TimeZone.
>>
>> I pulled some of them out only because they're not really
>> postgresql.conf settings; lc_collate and lc_ctype for example are
>> set at initdb time.  Feel free to hack on that example if you
>> feel it could be improved, just be aware which of those things
>> are not really in the main config file when pondering if they
>> should be included.

Basically, the ones I could remember us needing to ask about on
multiple occasions, I put back -- provisionally.  If someone thinks
they're pointless, I won't worry about them being dropped again:
time zone, character encoding scheme, character set, and collation.
I'm pretty sure I've seen us ask about all of those in trying to
sort out a problem.

I also tried the query on a newly installed HEAD build which had no
manual changes to the postgresql.conf file and found a few others
which seemed to me to be worth suppressing.

I took my shot -- anyone else is welcome to do so....  :-)

-Kevin

Re: high user cpu, massive SELECTs, no io waiting problem

From
Cédric Villemain
Date:
2011/2/16 Thomas Pöhler <tp@turtle-entertainment.de>:
> Hi,
>
> we are using two instances of pgbouncer v1.4 for connection pooling.
> One for prepared statements (pool_mode session) and one without (pool_mode transaction).
>
> Pgbouncer.ini:
> [pgbouncer]
> pool_mode = transaction/session
> server_reset_query = DISCARD ALL;
> server_check_query = select 1
> server_check_delay = 10
> max_client_conn = 10000
> default_pool_size = 450
> log_connections = 0
> log_disconnections = 0
> log_pooler_errors = 1
> client_login_timeout = 0
>
>
> I will examine htop next time during a peak.
>
> If I remember correctly vmstat showed lots of context switches during a peak above 50k.
>
> We are running a biweekly downtime where we do a complete reindex and vaccum full. We cannot identify certain queries
causingthis. 
>
> The last graph in ganglia (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the avg_queries from pgbouncers
stats.I think this is a symptom of many waiting queries which accumulate. 
>
> Our iscsi is connected with 3Gibt/s. But that's more than enough. We don't have high traffic throughput.
>
> This is the result of the query you gave me:
>
> version PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit
> checkpoint_segments     40
> custom_variable_classes pg_stat_statements
> effective_cache_size    48335MB
> escape_string_warning   off
> fsync   on
> lc_collate      C
> lc_ctype        C
> listen_addresses        *
> log_destination stderr
> log_line_prefix %t %p %d %u %r
> log_lock_waits  on
> log_min_duration_statement      1s
> log_min_messages        notice
> log_rotation_size       10MB
> log_temp_files  50MB
> logging_collector       on
> maintenance_work_mem    1GB
> max_connections 1000
> max_prepared_transactions       5
> max_stack_depth 2MB
> pg_stat_statements.max  10000
> pg_stat_statements.track        all
> port    5433
> server_encoding UTF8
> shared_buffers  16GB
> TimeZone        Europe/Berlin
> update_process_title    on
> wal_buffers     1MB
> work_mem        32MB
>
>
> Seems like connection limit 10000 is way too much on pgbouncer? Our queries overall are not that CPU intensive. If
theyare slow, they are mostly waiting for disk io. When having a look at the traffic of this database server we see 2/3
ofthe traffic is going to san/disk and only 1/3 going to the server. In other words from the traffic view, 2/3 of our
operationsare writes and 1/3 are reads. The database is fitting completely into ram, so reads should not be a problem. 

I used pgbouncer with way more than that, not an issue on its own
*but* can you export the pgbouncers in another box ?
I get issues in very high-mem usage (more than IO) and ton's of
connection via pgbouncer, then moving the bouncer in a 3rd box salve
the situation.

>
> Appreciate your help!
> Thomas
>
> -----Ursprüngliche Nachricht-----
> Von: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
> Gesendet: Mittwoch, 16. Februar 2011 17:09
> An: Greg Smith; Justin Pitts
> Cc: pgsql-performance@postgresql.org; Verteiler_A-Team; Björn Metzdorf; Felix Feinhals; Thomas Pöhler
> Betreff: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem
>
> Justin Pitts <justinpitts@gmail.com> wrote:
>> I think adding
>>
>> UNION ALL SELECT 'postgres version', version();
>>
>> might be a good thing.
>
> Good point.  Added.
>
>> Greg Smith <greg@2ndquadrant.com> wrote:
>>> Kevin Grittner wrote:
>>>>
>>>> In fact, I wonder whether we shouldn't leave a couple items
>>>> you've excluded, since they are sometimes germane to problems
>>>> posted, like lc_collate and TimeZone.
>>>
>>> I pulled some of them out only because they're not really
>>> postgresql.conf settings; lc_collate and lc_ctype for example are
>>> set at initdb time.  Feel free to hack on that example if you
>>> feel it could be improved, just be aware which of those things
>>> are not really in the main config file when pondering if they
>>> should be included.
>
> Basically, the ones I could remember us needing to ask about on
> multiple occasions, I put back -- provisionally.  If someone thinks
> they're pointless, I won't worry about them being dropped again:
> time zone, character encoding scheme, character set, and collation.
> I'm pretty sure I've seen us ask about all of those in trying to
> sort out a problem.
>
> I also tried the query on a newly installed HEAD build which had no
> manual changes to the postgresql.conf file and found a few others
> which seemed to me to be worth suppressing.
>
> I took my shot -- anyone else is welcome to do so....  :-)
>
> -Kevin
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: high user cpu, massive SELECTs, no io waiting problem

From
"Kevin Grittner"
Date:
Thomas Pöhler<tp@turtle-entertainment.de> wrote:

> we are using two instances of pgbouncer v1.4 for connection
> pooling.  One for prepared statements (pool_mode session) and one
> without (pool_mode transaction).

> max_client_conn = 10000
> default_pool_size = 450

Your best defense against the "thundering herd" issues you describe
would be to eliminate the session pool (if you can), and drop the
default_pool_size for the transaction pool to where at peak the
number of backends actually busy is about twice your number of
*actual* cores.  (Don't count hyperthreading "logical" cores for
this purpose.)  max_client_conn can be as high as you need; the
point is for the connection pool to funnel the requests through a
much smaller pool of database connections.

> If I remember correctly vmstat showed lots of context switches
> during a peak above 50k.

Yeah, that's part of the reason throughput tanks when your active
connection count gets too high.

> We are running a biweekly downtime where we do a complete reindex
> and vacuum full. We cannot identify certain queries causing this.

If you really get bloat which requires VACUUM FULL, tracking down
the reason should be a high priority.  You normally shouldn't need
to run that.

Also, I hope when you run that it is VACUUM FULL followed by
REINDEX, not the other way around.  In fact, it would probably be
faster to CLUSTER (if you have room) or drop the indexes, VACUUM
FULL, and then create the indexes again.

> The last graph in ganglia
> (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the
> avg_queries from pgbouncers stats. I think this is a symptom of
> many waiting queries which accumulate.

While it seems counter-intuitive, you're likely to have fewer
queries waiting a long time there if you reduce
default_pool_size so that contention doesn't kill performance when
the queries *do* get to run.

> max_connections    1000

This is what you need to try to reduce.

> max_prepared_transactions    5

If you're actually using prepared transactions, make sure none are
lingering about for a long time during these incidents.  Well,
*ever*, really -- but I would definitely check during problem
periods.

> wal_buffers    1MB

You should bump this to 16MB.

> The database is fitting completely into ram

Then you should probably be adjusting sequential_page_cost and
rand_page_cost.  You'll probably get plans which run faster, which
should help overall.

-Kevin

Re: high user cpu, massive SELECTs, no io waiting problem

From
Greg Smith
Date:
Thomas Pöhler wrote:
> We are running a biweekly downtime where we do a complete reindex and vaccum full. We cannot identify certain queries
causingthis. 

If you feel that you need VACUUM FULL, either something terribly wrong
has happened, or someone has gotten confused.  In both cases it's
unlikely you want to keep doing that.  See
http://wiki.postgresql.org/wiki/VACUUM_FULL for a nice document leading
through figuring what to do instead.

Note that if you have a database that fits in RAM, but is filled with
the sort of index bloat garbage that using VACUUM FULL will leave
behind, it will cause excessive CPU use when running queries.  If you
already have planned downtime, you really should try to use use CLUSTER
instead, to remove that from the list of possible causes for your issue.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: high user cpu, massive SELECTs, no io waiting problem

From
"Pierre C"
Date:
> Thomas Pöhler wrote:

I remember you said you were using nginx and php-fastcgi, how many web
server boxes do you have, and what are the specs ?

Re: high user cpu, massive SELECTs, no io waiting problem

From
Justin Pitts
Date:
I think adding

UNION ALL SELECT 'postgres version', version();

might be a good thing.

On Wed, Feb 16, 2011 at 9:55 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>>
>> In fact, I wonder whether we shouldn't leave a couple items you've
>> excluded, since they are sometimes germane to problems posted, like
>> lc_collate and TimeZone.
>
> I pulled some of them out only because they're not really postgresql.conf
> settings; lc_collate and lc_ctype for example are set at initdb time.  Feel
> free to hack on that example if you feel it could be improved, just be aware
> which of those things are not really in the main config file when pondering
> if they should be included.
>
> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
> "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: high user cpu, massive SELECTs, no io waiting problem

From
"Strange, John W"
Date:
Scott, are you really moving that much data through memory, 70-80GB/sec is the limit of the new intel 7500 series in a
bestcase scenario.  
 

- John

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Scott
Marlowe
Sent: 16 February 2011 15:43
To: Marti Raudsepp
Cc: Thomas Pöhler; pgsql-performance@postgresql.org; Felix Feinhals; Verteiler_A-Team; Björn Metzdorf
Subject: Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

On Wed, Feb 16, 2011 at 6:44 AM, Marti Raudsepp <marti@juffo.org> wrote:
> On Tue, Feb 15, 2011 at 20:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> run htop and look for red.  if youi've got lots of red bar on each CPU
>> but no io wait then it's waiting for memory access.
>
> I don't think this is true. AFAICT the red bar refers to "system
> time", time that's spent in the kernel -- either in syscalls or kernel
> background threads.

My point being that if you've got a lot of RED it'll be the OS waiting
for memory access.  Trust me, when we start to hit our memory
bandwidth (in the 70 to 80 GB/s range) we start to get more and more
red and more and more kernel wait time.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.

Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to European legal entities.