Thread: Queries becoming slow under heavy load

From:
"Anne Rosset"
Date:

Hi,

We are running some performance tests. With a lot of concurrent access, queries get very slow. When there is no load, those queries run fast.

We kind of see a trend about these queries: it seems like the ones that become very slow have an ORDER BY or MAX in them.

Here are our config settings:

              name               |         setting          |                                                          description                                                         
---------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------
add_missing_from                | off                      | Automatically adds missing table references to FROM clauses.
allow_system_table_mods         | off                      | Allows modifications of the structure of system tables.
archive_command                 | (disabled)               | Sets the shell command that will be called to archive a WAL file.
archive_mode                    | off                      | Allows archiving of WAL files using archive_command.
archive_timeout                 | 0                        | Forces a switch to the next xlog file if a new file has not been started within N seconds.
array_nulls                     | on                       | Enable input of NULL elements in arrays.
authentication_timeout          | 1min                     | Sets the maximum allowed time to complete client authentication.
autovacuum                      | on                       | Starts the autovacuum subprocess.
autovacuum_analyze_scale_factor | 0.1                      | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
autovacuum_analyze_threshold    | 250                      | Minimum number of tuple inserts, updates or deletes prior to analyze.
autovacuum_freeze_max_age       | 200000000                | Age at which to autovacuum a table to prevent transaction ID wraparound.
autovacuum_max_workers          | 3                        | Sets the maximum number of simultaneously running autovacuum worker processes.
autovacuum_naptime              | 5min                     | Time to sleep between autovacuum runs.
autovacuum_vacuum_cost_delay    | 20ms                     | Vacuum cost delay in milliseconds, for autovacuum.
autovacuum_vacuum_cost_limit    | -1                       | Vacuum cost amount available before napping, for autovacuum.
autovacuum_vacuum_scale_factor  | 0.2                      | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
autovacuum_vacuum_threshold     | 500                      | Minimum number of tuple updates or deletes prior to vacuum.
backslash_quote                 | safe_encoding            | Sets whether "\'" is allowed in string literals.
bgwriter_delay                  | 200ms                    | Background writer sleep time between rounds.
bgwriter_lru_maxpages           | 100                      | Background writer maximum number of LRU pages to flush per round.
bgwriter_lru_multiplier         | 2                        | Background writer multiplier on average buffers to scan per round.
block_size                      | 8192                     | Shows the size of a disk block.
bonjour_name                    |                          | Sets the Bonjour broadcast service name.
check_function_bodies           | on                       | Check function bodies during CREATE FUNCTION.
checkpoint_completion_target    | 0.5                      | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
checkpoint_segments             | 3                        | Sets the maximum distance in log segments between automatic WAL checkpoints.
checkpoint_timeout              | 5min                     | Sets the maximum time between automatic WAL checkpoints.
checkpoint_warning              | 30s                      | Enables warnings if checkpoint segments are filled more frequently than this.
client_encoding                 | UTF8                     | Sets the client's character set encoding.
client_min_messages             | notice                   | Sets the message levels that are sent to the client.
commit_delay                    | 250                      | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
commit_siblings                 | 10                       | Sets the minimum concurrent open transactions before performing commit_delay.
constraint_exclusion            | off                      | Enables the planner to use constraints to optimize queries.
cpu_index_tuple_cost            | 0.005                    | Sets the planner's estimate of the cost of processing each index entry during an index scan.
cpu_operator_cost               | 0.0025                   | Sets the planner's estimate of the cost of processing each operator or function call.
cpu_tuple_cost                  | 0.01                     | Sets the planner's estimate of the cost of processing each tuple (row).
custom_variable_classes         |                          | Sets the list of known custom variable classes.
DateStyle                       | ISO, MDY                 | Sets the display format for date and time values.
db_user_namespace               | off                      | Enables per-database user names.
deadlock_timeout                | 1s                       | Sets the time to wait on a lock before checking for deadlock.
debug_assertions                | off                      | Turns on various assertion checks.
debug_pretty_print              | off                      | Indents parse and plan tree displays.
debug_print_parse               | off                      | Prints the parse tree to the server log.
debug_print_plan                | off                      | Prints the execution plan to server log.
debug_print_rewritten           | off                      | Prints the parse tree after rewriting to server log.
default_statistics_target       | 10                       | Sets the default statistics target.
default_tablespace              |                          | Sets the default tablespace to create tables and indexes in.
default_text_search_config      | pg_catalog.simple        | Sets default text search configuration.


and the box info:

> cat /proc/meminfo

MemTotal:      8177116 kB
MemFree:       2830212 kB
Buffers:         83212 kB
Cached:        2385740 kB
SwapCached:         32 kB
Active:        4037560 kB
Inactive:      1082912 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8177116 kB
LowFree:       2830212 kB
SwapTotal:     2097112 kB
SwapFree:      2096612 kB
Dirty:            4548 kB
Writeback:          72 kB
AnonPages:     2651288 kB
Mapped:         311824 kB
Slab:           173968 kB
PageTables:      20512 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6185668 kB
Committed_AS:  3602784 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    263672 kB
VmallocChunk: 34359474295 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

> cat /proc/meminfo

MemTotal:      8177116 kB
MemFree:       2830212 kB
Buffers:         83212 kB
Cached:        2385740 kB
SwapCached:         32 kB
Active:        4037560 kB
Inactive:      1082912 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8177116 kB
LowFree:       2830212 kB
SwapTotal:     2097112 kB
SwapFree:      2096612 kB
Dirty:            4548 kB
Writeback:          72 kB
AnonPages:     2651288 kB
Mapped:         311824 kB
Slab:           173968 kB
PageTables:      20512 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6185668 kB
Committed_AS:  3602784 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    263672 kB
VmallocChunk: 34359474295 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB


It seems to me that we should try increasing shared_buffers. But do you have any other suggestions? Or do you see anything wrong in our config?


Thanks,

Anne

From:
Andy Colson
Date:

On 1/25/2011 3:37 PM, Anne Rosset wrote:
> Hi,
>
> We are running some performances tests. With a lot of concurrent access,
> queries get very slow. When there is no load, those queries run fast.
>
> We kind of see a trend about these queries: it seems like the ones that
> become very slow have an ORDER BY or MAX in them.
>
> Here are our config settings:
>

<SNIP>

> It seems to me that we should try increasing shared_buffers. But do you
> have any other suggestions? Or do you see anything wrong in our config?
>
> Thanks,
>
> Anne
>

While I applaud your attempt to get us lots of information,
unfortunately the one property you ask about (shared_buffers) is the one
I can't seem to find.

So, maybe you could post a bit more:

1) how many concurrent clients?
2) can we see an explain analyze for a query when it's fast, and then
again when it's slow?
3) Is this box dedicated to PG or are there other services running?
4) Looks like you have 8 Gig of ram, so I assume this is a 64 bit OS,
can you tell us what you have for (example commands just after this list):

shared_buffers
effective_cache_size
work_mem


5) Once many clients start hitting the db, it might not all fit into ram
and will start hitting the HD. Can you tell us what sort of IO you have
(sata, scsi, raid, # of disks, etc)?
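For 2) and 4), something along these lines would do (yourdb, some_table
and some_column are just placeholders; use your real database and one of
the queries that actually gets slow):

  psql -d yourdb -c "SELECT name, setting FROM pg_settings WHERE name IN ('shared_buffers','effective_cache_size','work_mem');"
  psql -d yourdb -c "EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column LIMIT 50;"

Run the EXPLAIN ANALYZE once with the system idle and once during the
load test, and post both plans.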

The stats from /proc/meminfo:
SwapTotal:     2097112 kB
SwapFree:      2096612 kB

Was this run when the system was busy?  Looks like you are not using any
swap, so that's good at least.  Oh, wait, there are two cat
/proc/meminfo's.  Is one from when it's fast and one from when it's slow?

Looks to me, in both cases, you are not using much memory at all.  (If
you happen to have 'free', its output is a little more readable; if you
wouldn't mind posting it, we only really need it for when the box is slow.)

-Andy

From:
"Kevin Grittner"
Date:

"Anne Rosset" <> wrote:

> We are running some performances tests.  With a lot of concurrent
> access, queries get very slow. When there is no load, those
> queries run fast.

What's "a lot"?

> We kind of see a trend about these queries:  it seems like the
> ones that become very slow have an ORDER BY or MAX in them.

Without knowing the PostgreSQL version or more details about the
queries, I would only be guessing at the cause.

> It seems to me that we should try increasing shared_buffers. But
> do you have any other suggestions? Or do you see anything wrong in
> our config?

I don't think you showed us your whole PostgreSQL configuration, and
the format was hard to read -- it's best to show the contents of
your postgresql.conf file, minus comments.

If you read this page and re-post we can probably be more helpful:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

From:
"Ing. Marcos Ortiz Valmaseda"
Date:

When you say that with a lot of concurrent access queries get very slow, how many concurrent connections to your
server have you had?
More than the max_connections value?
If you want to have many concurrent connections, you should consider using a connection pooling system like
pgbouncer or pgpool.

What are the values for (a query to grab them all is sketched below):
- work_mem
- shared_buffers
- maintenance_work_mem
- effective_cache_size
- effective_io_concurrency
- server_version

What is your platform?
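Something like this pulls them all in one go (yourdb is a placeholder;
settings that don't exist in your server version simply won't show up),
plus a quick look at how many connections are open:

  psql -d yourdb -c "SELECT name, setting FROM pg_settings WHERE name IN ('work_mem','shared_buffers','maintenance_work_mem','effective_cache_size','effective_io_concurrency','server_version','max_connections');"
  psql -d yourdb -c "SELECT count(*) AS open_connections FROM pg_stat_activity;"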

Regards
--
Ing. Marcos Luís Ortíz Valmaseda
System Engineer -- Database Administrator

Centro de Tecnologías de Gestión de Datos (DATEC)
Universidad de las Ciencias Informáticas
http://postgresql.uci.cu


From:
Ivan Voras
Date:

On 25/01/2011 22:37, Anne Rosset wrote:
> Hi,
>
> We are running some performances tests.  With a lot of concurrent
> access,  queries get very slow. When there is no load, those queries run
> fast.

As others said, you need to state how many concurrent clients are working
on the database and also how many logical CPUs (CPU cores,
hyperthreading) are present in the machine. As a rule of thumb, if the
number of concurrently active connections (i.e. those running queries,
not idling) exceeds the number of logical CPUs in the machine, you will
experience a sharp decline in performance.
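A rough way to compare the two numbers (the query assumes the pre-9.2
column names, i.e. current_query; yourdb is a placeholder):

  # logical CPUs
  grep -c ^processor /proc/cpuinfo
  # connections actually running a query right now
  psql -d yourdb -c "SELECT count(*) FROM pg_stat_activity WHERE current_query <> '<IDLE>';"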

From:
"Anne Rosset"
Date:

Sorry it seems like the postgres configuration didn't come thru the
first time.

name    |    setting
---------------------------------+--------------------------
add_missing_from    |    off
allow_system_table_mods    |    off
archive_command    |    (disabled)
archive_mode    |    off
archive_timeout    |    0
array_nulls    |    on
authentication_timeout    |    1min
autovacuum    |    on
autovacuum_analyze_scale_factor    |    0.1
autovacuum_analyze_threshold    |    250
autovacuum_freeze_max_age    |    200000000
autovacuum_max_workers    |    3
autovacuum_naptime    |    5min
autovacuum_vacuum_cost_delay    |    20ms
autovacuum_vacuum_cost_limit    |    -1
autovacuum_vacuum_scale_factor    |    0.2
autovacuum_vacuum_threshold    |    500
backslash_quote    |    safe_encoding
bgwriter_delay    |    200ms
bgwriter_lru_maxpages    |    100
bgwriter_lru_multiplier    |    2
block_size    |    8192
bonjour_name    |
check_function_bodies    |    on
checkpoint_completion_target    |    0.5
checkpoint_segments    |    3
checkpoint_timeout    |    5min
checkpoint_warning    |    30s
client_encoding    |    UTF8
client_min_messages    |    notice
commit_delay    |    250
commit_siblings    |    10
constraint_exclusion    |    off
cpu_index_tuple_cost    |    0.005
cpu_operator_cost    |    0.0025
cpu_tuple_cost    |    0.01
custom_variable_classes    |
DateStyle    |    ISO, MDY
db_user_namespace    |    off
deadlock_timeout    |    1s
debug_assertions    |    off
debug_pretty_print    |    off
debug_print_parse    |    off
debug_print_plan    |    off
debug_print_rewritten    |    off
default_statistics_target    |    10
default_tablespace    |
default_text_search_config    |    pg_catalog.simple
default_transaction_isolation    |    read committed
default_transaction_read_only    |    off
default_with_oids    |    off
effective_cache_size    |    4000000kB
enable_bitmapscan    |    on
enable_hashagg    |    on
enable_hashjoin    |    on
enable_indexscan    |    on
enable_mergejoin    |    off
enable_nestloop    |    on
enable_seqscan    |    on
enable_sort    |    on
enable_tidscan    |    on
escape_string_warning    |    on
explain_pretty_print    |    on
extra_float_digits    |    0
from_collapse_limit    |    8
fsync    |    on
full_page_writes    |    on
geqo    |    off
geqo_effort    |    5
geqo_generations    |    0
geqo_pool_size    |    0
geqo_selection_bias    |    2
geqo_threshold    |    12
gin_fuzzy_search_limit    |    0
ignore_system_indexes    |    off
integer_datetimes    |    off
join_collapse_limit    |    8
krb_caseins_users    |    off
krb_server_hostname    |
krb_srvname    |    postgres
lc_collate    |    en_US.UTF-8
lc_ctype    |    en_US.UTF-8
lc_messages    |    en_US.UTF-8
lc_monetary    |    en_US.UTF-8
lc_numeric    |    en_US.UTF-8
lc_time    |    en_US.UTF-8
listen_addresses    |    127.0.0.1,208.75.198.149
local_preload_libraries    |
log_autovacuum_min_duration    |    -1
log_checkpoints    |    off
log_connections    |    off
log_destination    |    stderr
log_disconnections    |    off
log_duration    |    off
log_error_verbosity    |    default
log_executor_stats    |    off
log_hostname    |    off
log_line_prefix    |
log_lock_waits    |    off
log_min_duration_statement    |    -1
log_min_error_statement    |    error
log_min_messages    |    notice
log_parser_stats    |    off
log_planner_stats    |    off
log_rotation_age    |    0
log_rotation_size    |    0
log_statement    |    none
log_statement_stats    |    off
log_temp_files    |    -1
log_timezone    |    Asia/Kolkata
log_truncate_on_rotation    |    off
logging_collector    |    on
maintenance_work_mem    |    256MB
max_connections    |    100
max_files_per_process    |    1000
max_fsm_pages    |    500000
max_fsm_relations    |    500
max_function_args    |    100
max_identifier_length    |    63
max_index_keys    |    32
max_locks_per_transaction    |    64
max_prepared_transactions    |    5
max_stack_depth    |    5MB
password_encryption    |    on
port    |    5432
post_auth_delay    |    0
pre_auth_delay    |    0
random_page_cost    |    4
regex_flavor    |    advanced
search_path    |    "$user",public
seq_page_cost    |    1
server_encoding    |    UTF8
server_version    |    8.3.8
server_version_num    |    80308
session_replication_role    |    origin
shared_buffers    |    240MB
silent_mode    |    off
sql_inheritance    |    on
ssl    |    off
standard_conforming_strings    |    off
statement_timeout    |    0
superuser_reserved_connections    |    3
synchronize_seqscans    |    on
synchronous_commit    |    on
syslog_facility    |    LOCAL0
syslog_ident    |    postgres
tcp_keepalives_count    |    9
tcp_keepalives_idle    |    7200
tcp_keepalives_interval    |    75
temp_buffers    |    1024
temp_tablespaces    |
TimeZone    |    Asia/Kolkata
timezone_abbreviations    |    Default
trace_notify    |    off
trace_sort    |    off
track_activities    |    on
track_counts    |    on
transaction_isolation    |    read committed
transaction_read_only    |    off
transform_null_equals    |    off
unix_socket_group    |
unix_socket_permissions    |    511
update_process_title    |    on
vacuum_cost_delay    |    50ms
vacuum_cost_limit    |    200
vacuum_cost_page_dirty    |    20
vacuum_cost_page_hit    |    1
vacuum_cost_page_miss    |    10
vacuum_freeze_min_age    |    100000000
wal_buffers    |    10MB
wal_sync_method    |    fdatasync
wal_writer_delay    |    200ms
work_mem    |    64MB
xmlbinary    |    base64
xmloption    |    content
zero_damaged_pages    |    off
(176 rows)


Today we did more analysis and observed postgres processes that
continually reported status 'D' in top. The corresponding vmstat showed
a proportionate number of processes under the 'b' column, the
"uninterruptible" state.

We've been able to match long-running database queries to such
processes. This occurs under relatively low load average (say 4 out of
8) and can involve as little as a single SQL query.
It seems that many queries get into that state and that is causing our
load average to spike very high.
Queries are finishing even though we continue to see an increase in
postgres processes in 'D' state.
Are we facing some serious db locking?  What could lead to this?
(The box has 8G and 8 cores)

Thanks for any help,
Anne




-----Original Message-----
From: Andy Colson [mailto:]
Sent: Tuesday, January 25, 2011 2:13 PM
To: Anne Rosset
Cc: 
Subject: Re: [PERFORM] Queries becoming slow under heavy load

On 1/25/2011 3:37 PM, Anne Rosset wrote:
> Hi,
>
> We are running some performances tests. With a lot of concurrent
> access, queries get very slow. When there is no load, those queries
run fast.
>
> We kind of see a trend about these queries: it seems like the ones
> that become very slow have an ORDER BY or MAX in them.
>
> Here are our config settings:
>

<SNIP>

> It seems to me that we should try increasing shared_buffers. But do
> you have any other suggestions? Or do you see anything wrong in our
config?
>
> Thanks,
>
> Anne
>

While I applaud your attempt to get us lots of information,
unfortunately the one property you ask about (shared_buffers) is the one
I can't seem to find.

So, maybe you could post a bit more:

1) how many concurrent clients?
2) can we see an explain analyze for a query when its fast, and then
again when its slow?
3) Is this box dedicated to PG or are there other services running?
4) Looks like you have 8 Gig of ram, so I assume this is a 64 bit OS,
can you tell us what you have for:

shared_buffers
effective_cache_size
work_mem


5) Once many clients start hitting the db, it might not all fit into ram
and start hitting the HD, can you tell us what sort of IO you have
(sata, scsi, raid, # of disks, etc).

The stats from /proc/meminfo:
SwapTotal:     2097112 kB
SwapFree:      2096612 kB

Was this run when the system was busy?  Looks like you are not using any
swap, so that's good at least.  Oh, wait, there are two cat
/proc/meminfo's.  Is one from when it's fast and one from when it's slow?

Looks to me, in both cases, you are not using much memory at all.  (If
you happen to have 'free', its output is a little more readable; if you
wouldn't mind posting it, we only really need it for when the box is slow.)

-Andy

From:
Shaun Thomas
Date:

On 01/26/2011 10:04 AM, Anne Rosset wrote:

> We've been able to match  long running database queries to such
> processes. This occurs under relatively low load average (say 4 out of
> 8) and can involve as little as 1 single sql query.

The 'b' state means the process is blocked, waiting for... something. You
need to consider far more than just your CPU usage. If you have
the 'sar' utility, run it as 'sar 1 100' just to see how your system is
working. What you want to watch for is iowait.

If even one query is churning your disks, every single other query that
has to take even one block from disk instead of cache, is going to
stall. If you see an iowait of anything greater than 5%, you'll want to
check further on the device that contains your database with iostat. My
favorite use of this is 'iostat -dmx [device] 1' where [device] is the
block device where your data files are, if your WAL is somewhere else.

And yeah, your shared_buffers are kinda on the lowish side. Your
effective_cache_size is good, but you have a lot more room to increase
PG-specific memory.

Worse, however, are your checkpoints. Lord. Increase checkpoint_segments
to *at least* 20, and increase your checkpoint_completion_target to 0.7
or 0.8. Check your logs for checkpoint warnings, and I'll bet it's
constantly complaining about increasing your checkpoint segments. Every
checkpoint not started by the scheduled system risks a checkpoint spike,
which can flood your system with IO regardless of which queries are
running. That kind of IO storm will ruin your performance, and with only
3 checkpoint segments on a busy database, such spikes are probably
happening constantly.
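Concretely, that's something along these lines in postgresql.conf,
followed by a reload (the numbers are a starting point, not gospel):

  checkpoint_segments = 20
  checkpoint_completion_target = 0.8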

Unfortunately we still need to know more. This is just based on your PG
settings, and that's not really enough to know how "busy" your DB is.
One way to check is to log the contents of pg_stat_database, especially
the xact_commit and xact_rollback columns. Grab those with a timestamp.
If you get a snapshot of that every minute, you can figure out how many
queries you're processing per minute or per second pretty easily. We've
hit 8600 TPS before and don't have nearly the trouble you've been reporting.
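For example, a cron entry along these lines (path and database name are
made up) gives you a once-a-minute snapshot you can turn into TPS later:

  * * * * *  psql -At -d yourdb -c "SELECT now(), datname, xact_commit, xact_rollback FROM pg_stat_database" >> /var/tmp/pgstats.log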

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870



From:
"mark"
Date:

> -----Original Message-----
> From:  [mailto:pgsql-performance-
> ] On Behalf Of Ivan Voras
> Sent: Wednesday, January 26, 2011 6:25 AM
> To: 
> Subject: Re: [PERFORM] Queries becoming slow under heavy load
>
> On 25/01/2011 22:37, Anne Rosset wrote:
> > Hi,
> >
> > We are running some performances tests.  With a lot of concurrent
> > access,  queries get very slow. When there is no load, those queries
> run
> > fast.
>
> As others said, you need to stat how many concurrent clients are
> working
> on the database and also the number of logical CPUs (CPU cores,
> hyperthreading) are present in the machine. So far, as a rule of thumb,
> if you have more concurrent active connections (i.e. doing queries, not
> idling), you will experience a sharp decline in performance if this
> number exceeds the number of logical CPUs in the machine.
>
Depending on what version the OP is running - I didn't see where a version was given - if there is a large number of
idle connections it can affect things as well. Tom indicated to me this should be "much better" in 8.4 and later.


<slight deviation to put idle connection overhead in perspective>
We cut our idle connections from 600+ to a bit over 300 and saw a good drop in box load and better query responsiveness.
(We still get large user CPU load spikes though when a few hundred idle connection processes are woken up because they
all appear to be sleeping on the same semaphore and one of them has work to do.)

(Yeah, I know, get a pooler; too bad only pgbouncer seems to pool out idle connections with transaction pooling, but then I
lose prepared statements... I am still working on that part and getting off 8.3. Yes, our app tried to do its own quasi
connection pooling. When we deployed the app on a few hundred boxes, the error of this choice, made years ago when this app
lived on far fewer machines, is now pretty evident.)

>


From:
Scott Marlowe
Date:

On Wed, Jan 26, 2011 at 9:04 AM, Anne Rosset <> wrote:

<HUGE LIST OF SETTINGS DELETED>

PLEASE post just the settings you changed.  I'm not searching through
a list that big for the interesting bits.

> Today we did more analysis and observed  postgress processes that
> continually reported status 'D' in top.

Full stop.  The most likely problem here is that the query is now
hitting the disks and waiting.  If you have 1 disk and two users, the
access speed will drop by factors, usually much higher than 2.

To put it very simply, you need as many mirror pairs in your RAID-10
or as many disks in your RAID5 or RAID 6 as you have users reading the
disk drives.  If you're writing you need more and more disks too.
Mediating this issue we find things like SSD cache in ZFS or battery
backed RAID controllers.  They allow the reads and writes to be
streamlined quite a bit to the spinning disks, making it appear the
RAID array underneath it was much faster, had better access, and all
the sectors were near each other.   To an extent.

If you have the answer to the previous poster's question "can you tell
us what sort of IO you have (sata, scsi, raid, # of disks, etc)." you
should provide it.  If you've got a pair of 5k RPM SATA drives in a
RAID-1 you might need more hardware.

So, instead of just settings, show us a few carefully selected lines
of output from vmstat or iostat while this is happening.  Don't tell
us what you see, show us.
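For instance, capture something like this while the slowdown is actually
happening and paste the raw output:

  vmstat 1 10
  iostat -dmx 1 10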

From:
Scott Marlowe
Date:

On Wed, Jan 26, 2011 at 10:16 AM, Shaun Thomas <> wrote:
> Worse however, is your checkpoints. Lord. Increase checkpoint_segments to
> *at least* 20, and increase your checkpoint_completion_target to 0.7 or 0.8.
> Check your logs for checkpoint warnings, and I'll bet it's constantly
> complaining about increasing your checkpoint segments. Every checkpoint not
> started by the scheduled system risks a checkpoint spike, which can flood
> your system with IO regardless of which queries are running. That kind of IO
> storm will ruin your performance, and with only 3 checkpoint segments on a
> busy database, are probably happening constantly.

To Shaun:

Unless she's not write bound but read bound.  We can't tell because we
haven't seen the queries.  We haven't seen the output of iostat or
vmstat.

To Anne:

Another tool to recommend is sar.  it's part of the sysstat package on
debian / ubuntu / rhel.  you have to enable it in various ways, it'll
tell you when you try to run it after installing it.   It allows you
to look back over the last 7 days, 5 minutes at a time, to see the
trends on your servers.  Very useful stuff and easy to graph in a
spreadsheet or web page.  Or just read it.
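On Debian / Ubuntu that's roughly the following (on RHEL the collection
cron job is set up when the package is installed, if I remember right):

  apt-get install sysstat
  # then set ENABLED="true" in /etc/default/sysstat
  /etc/init.d/sysstat restart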

For instance, here's the output of sar on the data drive of a slave
(read only) server under slony replication.

sar -d -f sa25|grep "02:[01234].:.. AM.\+dev251-1"
Linux 2.6.32-27-server ()     01/25/2011     _x86_64_    (16 CPU)

12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz
avgqu-sz     await     svctm     %util
09:45:01 AM  dev251-1    481.21   6981.74   1745.82     18.14
4.86     10.10      1.57     75.65
09:55:01 AM  dev251-1    620.16  28539.52   2135.67     49.46
5.25      8.47      1.22     75.42
10:05:01 AM  dev251-1   1497.16  29283.95   1898.94     20.83
13.89      9.28      0.64     96.52
10:15:01 AM  dev251-1   1040.47  17123.89   2286.10     18.66
8.89      8.54      0.87     90.49
10:25:01 AM  dev251-1    562.97   8802.77   1515.50     18.33
4.84      8.60      1.41     79.57

Let me interpret for ya, in case it's not obvious.  IO utilization
runs from about 50% to about 90%.  When it's at 90% we are running 700
to 1000 tps, reading at a maximum of 15MB a second and writing at a
paltry 1M or so a second.  Average wait stays around 10ms.  If we use
sar -u from the same time period, we can match up iowait to this chart
and see if we were really waiting on IO or not.

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:45:01 AM     all     47.44      0.00      5.20      4.94      0.00     42.42
09:55:01 AM     all     46.42      0.00      5.63      5.77      0.00     42.18
10:05:01 AM     all     48.64      0.00      6.35     11.87      0.00     33.15
10:15:01 AM     all     46.94      0.00      5.79      8.81      0.00     38.46
10:25:01 AM     all     48.68      0.00      5.58      5.42      0.00     40.32

We can see that, at peak, 11% of our CPU power is waiting
on IO.  We have 16 CPUs, so each one is 100/16 or 6.25% of the
total.  So at 11% we have two cores on hold the whole time basically.
In real life on this machine we have ALL cpus waiting about 11% of the
time across the board.  But the math comes out the same.  We're
waiting on IO.

Here's a heavy db server, lots of ram, same time period.  sdh is one
of a large number of disks in a RAID-10 array.  md17 is that RAID-10
array (actually a RAID0 on top of a bunch of RAID-1s; I still
don't trust Linux's RAID-10 implementation).
12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz
avgqu-sz     await     svctm     %util
09:45:01          sdh      5.86      5.65    158.87     28.08
0.21     35.07      3.36      1.97
09:45:01         md17    253.78    168.69   1987.64      8.50
0.00      0.00      0.00      0.00
09:55:01          sdh      5.48      5.65    134.99     25.68
0.16     30.00      3.31      1.81
09:55:01         md17    215.13    157.56   1679.67      8.54
0.00      0.00      0.00      0.00
10:05:01          sdh      4.37      5.39    106.53     25.58
0.09     21.61      3.57      1.56
10:05:01         md17    170.81    133.76   1334.43      8.60
0.00      0.00      0.00      0.00
10:15:01          sdh      6.16      5.37    177.25     29.64
0.25     40.95      3.38      2.08
10:15:01         md17    280.63    137.88   2206.95      8.36
0.00      0.00      0.00      0.00
10:25:01          sdh      4.52      3.72    116.41     26.56
0.09     20.64      3.58      1.62
10:25:01         md17    187.65    107.59   1470.88      8.41
0.00      0.00      0.00      0.00

(Note that md devices do not show values for %util, svctm or await
hence the need for sdh)

This db fits the data set in ram, the other machine doesn't.  It had a
RAID controller, but that caught fire, and burned down.  The next
caught fire, burned down, and fell into the swamp.  It now has a
straight up SAS controller with no caching.  Numbers were even better
when it had a caching RAID controller, but I got tired of replacing
them.

OK, so md17 is handling only 280 tps, while the array on the other,
smaller server, was around 1,000.  The master is reading 5 or 6
sectors per second, while the slave is reading 30k sectors a second.
The master is writing at ~1500 to 2000 sectors a second, the slave is
similar.  The slave server here was IO bound very badly because it A:
didn't have enough memory to cache the data set, and B: hadn't had
time to warm up to get what memory it had to do the job.  It was
thrown into the mix mid morning rush and it fell flat on its ass.  If
it had been warmed up first (running it at 100:1 load factor by our
load balancing module to start) it would have been ok.  It would have
still had horrible IO numbers though.   Once the caches load by
1500hrs, the slave is reading at just 500 sectors / sec, child's play
really.

So, get sar running, and get some numbers from the machine when these
things are happening.  Look for how it looks before during and after
the crisis.

From:
"Anne Rosset"
Date:

Scott,
Thanks for your response.
We are over NFS for our storage ...

Here is what we see during our performance testing:
This is about 7 seconds after the query was sent to postgres:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      25   0  689m 399m  10m R 89.9  5.0   3868:44 java
 1846 postgres  16   0  474m 198m 103m R 75.2  2.5   0:28.69 postmaster
 2170 postgres  15   0  391m 203m 188m R 44.0  2.6   0:17.63 postmaster
 2555 httpd     18   0  298m  15m 4808 R 22.0  0.2   0:00.12 httpd
 2558 root      15   0 29056 2324 1424 R  1.8  0.0   0:00.01 top
 1207 httpd     15   0  337m  20m 7064 R  0.0  0.3   0:00.69 httpd
28312 postgres  16   0  396m 183m 162m D  0.0  2.3   0:50.82 postmaster  <---- this is the query here

Notice the 0% CPU, also, notice the 183m RES memory.

Ten seconds later:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      25   0  689m 399m  10m R 92.9  5.0   3868:53 java
 2657 root      15   0 29056 2328 1424 R  1.9  0.0   0:00.01 top
28312 postgres  16   0  396m 184m 162m D  0.0  2.3   0:50.84 postmaster  <---- here

Ten seconds after that:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      25   0  689m 399m  10m R 88.7  5.0   3869:02 java
 1845 postgres  16   0  473m 223m 127m D 22.6  2.8   0:26.39 postmaster
 2412 httpd     15   0 2245m 1.4g  16m R 18.9 17.8   0:02.48 java
  966 postgres  15   0  395m 242m 221m D  0.0  3.0   1:02.31 postmaster
 2680 root      15   0 29056 2336 1424 R  0.0  0.0   0:00.01 top
28312 postgres  16   0  396m 184m 163m D  0.0  2.3   0:50.85 postmaster  <--- here

etc....

and it's not until around the 221 second mark that we catch it consuming CPU:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      25   0  689m 399m  10m R 93.4  5.0   3872:07 java
28312 postgres  16   0  396m 225m 204m R  5.7  2.8   0:51.52 postmaster   <----- here
 3391 root      15   0 29056 2348 1424 R  1.9  0.0   0:00.01 top
 4297 root      16   0 10228  740  632 D  0.0  0.0  12:53.66 hald-addon-stor
26885 httpd     15   0 2263m 1.5g  16m R  0.0 19.0   0:00.01 java

Note that the load average is fine during this timeframe, ~4 out of 8, so plenty of CPU.

Looks like this is true "halting".

Further, or worse yet, this same behavior expands out to multiple processes, producing a true "back up". It can look
something like this. Notice the 0% cpu consumption:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7090 root      22   0  689m 399m  10m R 91.1  5.0   3874:32 java
 4139 root      15   0 29080 2344 1424 R  1.9  0.0   0:00.01 top
 1555 postgres  16   0  474m 258m 162m D  0.0  3.2   0:17.32 postmaster
 1846 postgres  16   0  474m 285m 189m D  0.0  3.6   0:47.43 postmaster
 2713 postgres  16   0  404m 202m 179m D  0.0  2.5   0:33.54 postmaster
 2801 postgres  16   0  391m 146m 131m D  0.0  1.8   0:04.48 postmaster
 2804 postgres  16   0  419m 172m 133m D  0.0  2.2   0:09.41 postmaster
 2825 postgres  16   0  473m 142m  49m D  0.0  1.8   0:04.12 postmaster

Thanks for any additional explanation/advice,
Anne

-----Original Message-----
From: Scott Marlowe [mailto:]
Sent: Wednesday, January 26, 2011 8:19 PM
To: Anne Rosset
Cc: 
Subject: Re: FW: [PERFORM] Queries becoming slow under heavy load

On Wed, Jan 26, 2011 at 9:04 AM, Anne Rosset <> wrote:

<HUGE LIST OF SETTINGS DELETED>

PLEASE post just the settings you changed.  I'm not searching through a list that big for the interesting bits.

> Today we did more analysis and observed  postgress processes that
> continually reported status 'D' in top.

Full stop.  The most likely problem here is that the query is now hitting the disks and waiting.  If you have 1 disk
and two users, the access speed will drop by factors, usually much higher than 2.

To put it very simply, you need as many mirror pairs in your RAID-10 or as many disks in your RAID5 or RAID 6 as you
have users reading the disk drives.  If you're writing you need more and more disks too.
Mediating this issue we find things like SSD cache in ZFS or battery backed RAID controllers.  They allow the reads and
writes to be streamlined quite a bit to the spinning disks, making it appear the RAID array underneath it was much
faster, had better access, and all
the sectors were near each other.   To an extent.

If you have the answer to the previous poster's question "can you tell us what sort of IO you have (sata, scsi, raid, #
of disks, etc)." you should provide it.  If you've got a pair of 5k RPM SATA drives in a
RAID-1 you might need more hardware.

So, instead of just settings, show us a few carefully selected lines of output from vmstat or iostat while this is
happening.  Don't tell us what you see, show us.

From:
Shaun Thomas
Date:

On 01/27/2011 11:12 PM, Anne Rosset wrote:

> Thanks for your response.
> We are over NFS for our storage ...

NFS? I'm not sure you know this, but NFS has major locking issues
that would make it a terrible candidate for hosting a database.

> and it's not until around the 221 second mark that we see catch it consuming CPU:
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   7090 root      25   0  689m 399m  10m R 93.4  5.0   3872:07 java
> 28312 postgres  16   0  396m 225m 204m R  5.7  2.8   0:51.52 postmaster<----- here
>   3391 root      15   0 29056 2348 1424 R  1.9  0.0   0:00.01 top
>   4297 root      16   0 10228  740  632 D  0.0  0.0  12:53.66 hald-addon-stor
> 26885 httpd     15   0 2263m 1.5g  16m R  0.0 19.0   0:00.01 java
>
> Note that the load average is fine during this timeframe, ~4 out of 8, so plenty of CPU.

Please listen to us. We asked you to use sar, or iostat, to tell
us how much the disk IO is being utilized. From your other
screenshots, there were at least two other PG processes that
were running and could have been thrashing the disk or locking
tables your "slow" query needed. If it's waiting for disk IO, the
CPU will remain low until it gets what it needs.

Not everything is about the CPU. Especially now that we know your DB is
running on top of NFS.

> Further, or worse yet, this same behavior expands out to multiple processes,
> producing a true "back up". It can look
> something like this. Notice the 0% cpu consumption:
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   7090 root      22   0  689m 399m  10m R 91.1  5.0   3874:32 java
>   4139 root      15   0 29080 2344 1424 R  1.9  0.0   0:00.01 top
>   1555 postgres  16   0  474m 258m 162m D  0.0  3.2   0:17.32 postmaster
>   1846 postgres  16   0  474m 285m 189m D  0.0  3.6   0:47.43 postmaster
>   2713 postgres  16   0  404m 202m 179m D  0.0  2.5   0:33.54 postmaster
>   2801 postgres  16   0  391m 146m 131m D  0.0  1.8   0:04.48 postmaster
>   2804 postgres  16   0  419m 172m 133m D  0.0  2.2   0:09.41 postmaster
>   2825 postgres  16   0  473m 142m  49m D  0.0  1.8   0:04.12 postmaster

Yes. And they could all be waiting for IO. Or NFS locking is blocking
the reads. Or... what is that Java app doing? We don't know the state
of your IO, and when you have 0% or very low CPU usage, you either have
locking contention or you're being IO starved.
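A quick way to see whether it's lock contention is to look for ungranted
locks (this assumes the 8.3 column names, procpid/current_query; yourdb
is a placeholder):

  psql -d yourdb -c "SELECT l.pid, l.locktype, l.mode, a.current_query FROM pg_locks l JOIN pg_stat_activity a ON a.procpid = l.pid WHERE NOT l.granted;"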

And what queries are these connections performing? You can check it by
getting the contents of the pg_stat_activity system view. If they're
selecting and still "slow", compare that against the iostat or sar
results. For instance, here's an IOSTAT of our system:

iostat -dmx dm-9 1

Linux 2.6.18-92.el5 (oslchi6pedb1)     01/28/2011

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-9              0.00     0.00 125.46 227.78     4.95     0.89    33.88     0.08    0.19   0.08   2.91

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-9              0.00     0.00  5.00  0.00     0.04     0.00    14.40     0.05   10.60  10.60   5.30

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-9              0.00     0.00  2.00  0.00     0.02     0.00    16.00     0.01    7.00   7.00   1.40

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-9              0.00     0.00  4.00 1184.00     0.04     4.62     8.04    27.23   11.73   0.06   6.80

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-9              0.00     0.00 11.00 847.00     0.09     3.31     8.10    29.31   49.65   0.79  67.90


That last column, %util, effectively tells us how saturated the
controller is. If the percentage is high, it's really working
hard to supply the data we're asking for, or trying to write. If
it's low, we're probably working from memory cache, or getting
less requests. There have been times our queries are "slow" and
when we check this stat, it's often at or above 90%, sometimes
for minutes at a time. That's almost always a clear indicator
you have IO contention. Queries can't work without the data
they need to return your results.
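Going back to pg_stat_activity: on 8.3 something like this shows what
each backend is doing (yourdb is a placeholder):

  psql -d yourdb -c "SELECT procpid, query_start, current_query FROM pg_stat_activity WHERE current_query <> '<IDLE>';"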

Sending us more CPU charts isn't going to help us in helping
you.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870



From:
Mladen Gogala
Date:

Shaun Thomas wrote:
> On 01/27/2011 11:12 PM, Anne Rosset wrote:
>
>
>> Thanks for your response.
>> We are over NFS for our storage ...
>>
>
> NFS? I'm not sure you know this, but NFS has major locking issues
> that would make it a terrible candidate for hosting a database.
>
>
That depends on the implementation. A vendor-supported NAS running NFSv3
or NFSv4 should be OK. There are other databases that can use it, too.
Some databases even have a built-in NFS client.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




From:
"Anne Rosset"
Date:

Thanks to all of you who replied and pointed to NFS as a potential
culprit.
Our issue was that pgsql's temp dir (pgsql_tmp) was set to the default
value ($PSQL_DIR/base/pgsql_tmp/), which was located on NFS.
Moving the temp dir to local disk got us a huge improvement.
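In case it helps anyone else: one way to do this on 8.3 is to point a
temp tablespace at a local disk (the paths and names below are made up;
a symlink on base/pgsql_tmp is another option):

  mkdir -p /local/pg_temp && chown postgres:postgres /local/pg_temp
  psql -d yourdb -c "CREATE TABLESPACE localtmp LOCATION '/local/pg_temp';"
  # then in postgresql.conf: temp_tablespaces = 'localtmp'   (and reload)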

Anne

-----Original Message-----
From: Shaun Thomas [mailto:]
Sent: Friday, January 28, 2011 7:31 AM
To: Anne Rosset
Cc: 
Subject: Re: FW: [PERFORM] Queries becoming slow under heavy load

On 01/27/2011 11:12 PM, Anne Rosset wrote:

> Thanks for your response.
> We are over NFS for our storage ...

NFS? I'm not sure you know this, but NFS has major locking issues that
would make it a terrible candidate for hosting a database.

> and it's not until around the 221 second mark that we see catch it
consuming CPU:
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   7090 root      25   0  689m 399m  10m R 93.4  5.0   3872:07 java
> 28312 postgres  16   0  396m 225m 204m R  5.7  2.8   0:51.52
postmaster<----- here
>   3391 root      15   0 29056 2348 1424 R  1.9  0.0   0:00.01 top
>   4297 root      16   0 10228  740  632 D  0.0  0.0  12:53.66
hald-addon-stor
> 26885 httpd     15   0 2263m 1.5g  16m R  0.0 19.0   0:00.01 java
>
> Note that the load average is fine during this timeframe, ~4 out of 8,
so plenty of CPU.

Please listen to us. We asked you to use sar, or iostat, to tell us how
much the disk IO is being utilized. From your other screenshots, there
were at least two other PG processes that were running and could have
been thrashing the disk or locking tables your "slow" query needed. If
it's waiting for disk IO, the CPU will remain low until it gets what it
needs.

Not everything is about the CPU. Especially now that we know your DB is
running on top of NFS.

> Further, or worse yet, this same behavior expands out to multiple
> processes, producing a true "back up". It can look something like
> this. Notice the 0% cpu consumption:
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   7090 root      22   0  689m 399m  10m R 91.1  5.0   3874:32 java
>   4139 root      15   0 29080 2344 1424 R  1.9  0.0   0:00.01 top
>   1555 postgres  16   0  474m 258m 162m D  0.0  3.2   0:17.32
postmaster
>   1846 postgres  16   0  474m 285m 189m D  0.0  3.6   0:47.43
postmaster
>   2713 postgres  16   0  404m 202m 179m D  0.0  2.5   0:33.54
postmaster
>   2801 postgres  16   0  391m 146m 131m D  0.0  1.8   0:04.48
postmaster
>   2804 postgres  16   0  419m 172m 133m D  0.0  2.2   0:09.41
postmaster
>   2825 postgres  16   0  473m 142m  49m D  0.0  1.8   0:04.12
postmaster

Yes. And they could all be waiting for IO. Or NFS locking is blocking
the reads. Or... what is that Java app doing? We don't know the state of
your IO, and when you have 0% or very low CPU usage, you either have
locking contention or you're being IO starved.

And what queries are these connections performing? You can check it by
getting the contents of the pg_stat_activity system view. If they're
selecting and still "slow", compare that against the iostat or sar
results. For instance, here's an IOSTAT of our system:

iostat -dmx dm-9 1

Linux 2.6.18-92.el5 (oslchi6pedb1)     01/28/2011

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
avgqu-sz   await  svctm  %util
dm-9              0.00     0.00 125.46 227.78     4.95     0.89    33.88
0.08    0.19   0.08   2.91

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
avgqu-sz   await  svctm  %util
dm-9              0.00     0.00  5.00  0.00     0.04     0.00    14.40
0.05   10.60  10.60   5.30

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
avgqu-sz   await  svctm  %util
dm-9              0.00     0.00  2.00  0.00     0.02     0.00    16.00
0.01    7.00   7.00   1.40

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
avgqu-sz   await  svctm  %util
dm-9              0.00     0.00  4.00 1184.00     0.04     4.62     8.04
27.23   11.73   0.06   6.80

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
avgqu-sz   await  svctm  %util
dm-9              0.00     0.00 11.00 847.00     0.09     3.31     8.10
29.31   49.65   0.79  67.90


That last column, %util, effectively tells us how saturated the
controller is. If the percentage is high, it's really working hard to
supply the data we're asking for, or trying to write. If it's low, we're
probably working from memory cache, or getting less requests. There have
been times our queries are "slow" and when we check this stat, it's
often at or above 90%, sometimes for minutes at a time. That's almost
always a clear indicator you have IO contention. Queries can't work
without the data they need to return your results.

Sending us more CPU charts isn't going to help us in helping you.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870



From:
Scott Marlowe
Date:

Excellent!  And you learned a bit more about how to monitor your
server while you were at it.  Win win!

On Thu, Feb 3, 2011 at 10:40 AM, Anne Rosset <> wrote:
> Thanks to all  of you who replied and pointed NFS as a potential
> culprit.
> Our issue was that  pgsql's temp dir (pgsql_tmp)  was set to the default
> value ( $PSQL_DIR/base/pgsql_tmp/)  which was located in NFS.
> Moving the temp dir to local disk got us  a huge improvement.
>
> Anne
>
> -----Original Message-----
> From: Shaun Thomas [mailto:]
> Sent: Friday, January 28, 2011 7:31 AM
> To: Anne Rosset
> Cc: 
> Subject: Re: FW: [PERFORM] Queries becoming slow under heavy load
>
> On 01/27/2011 11:12 PM, Anne Rosset wrote:
>
>> Thanks for your response.
>> We are over NFS for our storage ...
>
> NFS? I'm not sure you know this, but NFS has major locking issues that
> would make it a terrible candidate for hosting a database.
>
>> and it's not until around the 221 second mark that we see catch it
> consuming CPU:
>>
>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>   7090 root      25   0  689m 399m  10m R 93.4  5.0   3872:07 java
>> 28312 postgres  16   0  396m 225m 204m R  5.7  2.8   0:51.52
> postmaster<----- here
>>   3391 root      15   0 29056 2348 1424 R  1.9  0.0   0:00.01 top
>>   4297 root      16   0 10228  740  632 D  0.0  0.0  12:53.66
> hald-addon-stor
>> 26885 httpd     15   0 2263m 1.5g  16m R  0.0 19.0   0:00.01 java
>>
>> Note that the load average is fine during this timeframe, ~4 out of 8,
> so plenty of CPU.
>
> Please listen to us. We asked you to use sar, or iostat, to tell us how
> much the disk IO is being utilized. From your other screenshots, there
> were at least two other PG processes that were running and could have
> been thrashing the disk or locking tables your "slow" query needed. If
> it's waiting for disk IO, the CPU will remain low until it gets what it
> needs.
>
> Not everything is about the CPU. Especially now that we know your DB is
> running on top of NFS.
>
>> Further, or worse yet, this same behavior expands out to multiple
>> processes, producing a true "back up". It can look something like
>> this. Notice the 0% cpu consumption:
>>
>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>   7090 root      22   0  689m 399m  10m R 91.1  5.0   3874:32 java
>>   4139 root      15   0 29080 2344 1424 R  1.9  0.0   0:00.01 top
>>   1555 postgres  16   0  474m 258m 162m D  0.0  3.2   0:17.32
> postmaster
>>   1846 postgres  16   0  474m 285m 189m D  0.0  3.6   0:47.43
> postmaster
>>   2713 postgres  16   0  404m 202m 179m D  0.0  2.5   0:33.54
> postmaster
>>   2801 postgres  16   0  391m 146m 131m D  0.0  1.8   0:04.48
> postmaster
>>   2804 postgres  16   0  419m 172m 133m D  0.0  2.2   0:09.41
> postmaster
>>   2825 postgres  16   0  473m 142m  49m D  0.0  1.8   0:04.12
> postmaster
>
> Yes. And they could all be waiting for IO. Or NFS locking is blocking
> the reads. Or... what is that Java app doing? We don't know the state of
> your IO, and when you have 0% or very low CPU usage, you either have
> locking contention or you're being IO starved.
>
> And what queries are these connections performing? You can check it by
> getting the contents of the pg_stat_activity system view. If they're
> selecting and still "slow", compare that against the iostat or sar
> results. For instance, here's an IOSTAT of our system:
>
> iostat -dmx dm-9 1
>
> Linux 2.6.18-92.el5 (oslchi6pedb1)      01/28/2011
>
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> dm-9              0.00     0.00 125.46 227.78     4.95     0.89    33.88
> 0.08    0.19   0.08   2.91
>
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> dm-9              0.00     0.00  5.00  0.00     0.04     0.00    14.40
> 0.05   10.60  10.60   5.30
>
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> dm-9              0.00     0.00  2.00  0.00     0.02     0.00    16.00
> 0.01    7.00   7.00   1.40
>
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> dm-9              0.00     0.00  4.00 1184.00     0.04     4.62     8.04
> 27.23   11.73   0.06   6.80
>
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> dm-9              0.00     0.00 11.00 847.00     0.09     3.31     8.10
> 29.31   49.65   0.79  67.90
>
>
> That last column, %util, effectively tells us how saturated the
> controller is. If the percentage is high, it's really working hard to
> supply the data we're asking for, or trying to write. If it's low, we're
> probably working from memory cache, or getting less requests. There have
> been times our queries are "slow" and when we check this stat, it's
> often at or above 90%, sometimes for minutes at a time. That's almost
> always a clear indicator you have IO contention. Queries can't work
> without the data they need to return your results.
>
> Sending us more CPU charts isn't going to help us in helping you.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> 
>
>
>



--
To understand recursion, one must first understand recursion.