Thread: slow database, queries accumulating

slow database, queries accumulating

From

"Anjan Dave"

Date:

23 September 2005, 16:02:37

We are experiencing consistent slowness on the database for one application. This is more a reporting type of application, heavy on the bytea data type usage (gets rendered into PDFs in the app server). A lot of queries, mostly selects and a few random updates, get accumulated on the server – with increasing volume of users on the application. Below is a snapshot of top, with about 80 selects and 3 or 4 updates. Things get better eventually if I cancel (SIGINT) some of the oldest queries. I also see a few instances of shared locks not being granted during this time…I don’t even see high iowait or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f

| | 77922136 | 16761 | ShareLock | f

We (development) are looking into the query optimization (explain analyze, indexes, etc), and my understanding is that the queries when run for explain analyze execute fast, but during busy times, they become quite slow, taking from a few seconds to a few minutes to execute. I do see in the log that almost all queries do have either ORDER BY, or GROUP BY, or DISTINCT. Does it hurt to up the sort_mem to 3MB or 4MB? Should I up the effective_cache_size to 5 or 6GB? The app is does not need a lot of connections on the database, I can reduce it down from 600.

Based on the description above and the configuration below does any thing appear bad in config? Is there anything I can try in the configuration to improve performance?

The database size is about 4GB.

This is PG 7.4.7, RHAS3.0 (u5), Local 4 spindle RAID10 (15KRPM), and logs on a separate set of drives, RAID10. 6650 server, 4 x XEON, 12GB RAM.

Vacuum is done every night, full vacuum done once a week.

I had increased the shared_buffers and sort_memory recently, which didn’t help.

Thanks,
Anjan

10:44:51 up 14 days, 13:38, 2 users, load average: 0.98, 1.14, 1.12

264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped

CPU states: cpu user nice system irq softirq iowait idle

total 14.4% 0.0% 7.4% 0.0% 0.0% 0.0% 77.9%

cpu00 15.7% 0.0% 5.7% 0.0% 0.1% 0.0% 78.2%

cpu01 15.1% 0.0% 7.5% 0.0% 0.0% 0.1% 77.0%

cpu02 10.5% 0.0% 5.9% 0.0% 0.0% 0.0% 83.4%

cpu03 9.9% 0.0% 5.9% 0.0% 0.0% 0.0% 84.0%

cpu04 7.9% 0.0% 3.7% 0.0% 0.0% 0.0% 88.2%

cpu05 19.3% 0.0% 12.3% 0.0% 0.0% 0.0% 68.3%

cpu06 20.5% 0.0% 9.5% 0.0% 0.0% 0.1% 69.7%

cpu07 16.1% 0.0% 8.5% 0.0% 0.1% 0.3% 74.7%

Mem: 12081736k av, 7881972k used, 4199764k free, 0k shrd, 82372k buff

4823496k actv, 2066260k in_d, 2036k in_c

Swap: 4096532k av, 0k used, 4096532k free 6888900k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND

16773 postgres 15 0 245M 245M 240M S 0.0 2.0 1:16 7 postmaster

16880 postgres 15 0 245M 245M 240M S 0.1 2.0 0:49 6 postmaster

16765 postgres 15 0 245M 245M 240M S 0.0 2.0 1:16 0 postmaster

16825 postgres 15 0 245M 245M 240M S 0.0 2.0 1:02 5 postmaster

16774 postgres 15 0 245M 245M 240M S 0.1 2.0 1:16 0 postmaster

16748 postgres 15 0 245M 245M 240M S 0.0 2.0 1:19 5 postmaster

16881 postgres 15 0 245M 245M 240M S 0.1 2.0 0:50 7 postmaster

16762 postgres 15 0 245M 245M 240M S 0.0 2.0 1:14 4 postmaster

…

max_connections = 600

shared_buffers = 30000 #=234MB, up from 21760=170MB min 16, at least max_connections*2, 8KB each

sort_mem = 2048 # min 64, size in KB

vacuum_mem = 32768 # up from 16384 min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each

#max_fsm_relations = 1000 # min 100, ~50 bytes each

#fsync = true # turns forced synchronization on or off

#wal_sync_method = fsync # the default varies across platforms:

# fsync, fdatasync, open_sync, or open_datasync

#wal_buffers = 8 # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 125 # in logfile segments, min 1, 16MB each

checkpoint_timeout = 600 # range 30-3600, in seconds

#checkpoint_warning = 30 # 0 is off, in seconds

#commit_delay = 0 # range 0-100000, in microseconds

#commit_siblings = 5 # range 1-1000

# - Planner Method Enabling -

#enable_hashagg = true

#enable_hashjoin = true

#enable_indexscan = true

#enable_mergejoin = true

#enable_nestloop = true

#enable_seqscan = true

#enable_sort = true

#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 262144 # =2GB typically 8KB each

#random_page_cost = 4 # units are one sequential page fetch cost

#cpu_tuple_cost = 0.01 # (same)

#cpu_index_tuple_cost = 0.001 # (same)

#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = true

#geqo_threshold = 11

#geqo_effort = 1

#geqo_generations = 0

#geqo_pool_size = 0 # default based on tables in statement,

# range 128-1024

#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000

#from_collapse_limit = 8

#join_collapse_limit = 8 # 1 disables collapsing of explicit JOINs

Re: slow database, queries accumulating

From

"Dario"

Date:

27 September 2005, 15:12:45

I have read that 600 connections are a LOT (somebody correct me please if
I'm wrong), since each connections requires a process and your server must
serve this. Besides the overhead involved, you will end up with 1200
megabytes of sort_mem allocated (probably idle most of time)...

pgpool allows you to reuse process (similar to oracle shared servers). Fact:
I didn't have the need to use it. AFAICS, it's easy to use. (I'll try to
make it work and I'll share tests, but dunno know when)

long life, little spam and prosperity


-----Mensaje original-----
De: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org]En nombre de Anjan Dave
Enviado el: viernes, 23 de septiembre de 2005 13:02
Para: pgsql-performance@postgresql.org
Asunto: [PERFORM] slow database, queries accumulating


Hi

We are experiencing consistent slowness on the database for one application.
This is more a reporting type of application, heavy on the bytea data type
usage (gets rendered into PDFs in the app server). A lot of queries, mostly
selects and a few random updates, get accumulated on the server - with
increasing volume of users on the application. Below is a snapshot of top,
with about 80 selects and 3 or 4 updates. Things get better eventually if I
cancel (SIGINT) some of the oldest queries. I also see a few instances of
shared locks not being granted during this time.I don't even see high iowait
or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f
          |          |    77922136 | 16761 | ShareLock        | f



We (development) are looking into the query optimization (explain analyze,
indexes, etc), and my understanding is that the queries when run for explain
analyze execute fast, but during busy times, they become quite slow, taking
from a few seconds to a few minutes to execute. I do see in the log that
almost all queries do have either ORDER BY, or GROUP BY, or DISTINCT. Does
it hurt to up the sort_mem to 3MB or 4MB? Should I up the
effective_cache_size to 5 or 6GB? The app is does not need a lot of
connections on the database, I can reduce it down from 600.

Based on the description above and the configuration below does any thing
appear bad in config? Is there anything I can try in the configuration to
improve performance?


The database size is about 4GB.
This is PG 7.4.7, RHAS3.0 (u5), Local 4 spindle RAID10 (15KRPM), and logs on
a separate set of drives, RAID10. 6650 server, 4 x XEON, 12GB RAM.
Vacuum is done every night, full vacuum done once a week.
I had increased the shared_buffers and sort_memory recently, which didn't
help.

Thanks,
Anjan




10:44:51  up 14 days, 13:38,  2 users,  load average: 0.98, 1.14, 1.12
264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   14.4%    0.0%    7.4%   0.0%     0.0%    0.0%   77.9%
           cpu00   15.7%    0.0%    5.7%   0.0%     0.1%    0.0%   78.2%
           cpu01   15.1%    0.0%    7.5%   0.0%     0.0%    0.1%   77.0%
           cpu02   10.5%    0.0%    5.9%   0.0%     0.0%    0.0%   83.4%
           cpu03    9.9%    0.0%    5.9%   0.0%     0.0%    0.0%   84.0%
           cpu04    7.9%    0.0%    3.7%   0.0%     0.0%    0.0%   88.2%
           cpu05   19.3%    0.0%   12.3%   0.0%     0.0%    0.0%   68.3%
           cpu06   20.5%    0.0%    9.5%   0.0%     0.0%    0.1%   69.7%
           cpu07   16.1%    0.0%    8.5%   0.0%     0.1%    0.3%   74.7%
Mem:  12081736k av, 7881972k used, 4199764k free,       0k shrd,   82372k
buff
                   4823496k actv, 2066260k in_d,    2036k in_c
Swap: 4096532k av,       0k used, 4096532k free                 6888900k
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
16773 postgres  15   0  245M 245M  240M S     0.0  2.0   1:16   7 postmaster
16880 postgres  15   0  245M 245M  240M S     0.1  2.0   0:49   6 postmaster
16765 postgres  15   0  245M 245M  240M S     0.0  2.0   1:16   0 postmaster
16825 postgres  15   0  245M 245M  240M S     0.0  2.0   1:02   5 postmaster
16774 postgres  15   0  245M 245M  240M S     0.1  2.0   1:16   0 postmaster
16748 postgres  15   0  245M 245M  240M S     0.0  2.0   1:19   5 postmaster
16881 postgres  15   0  245M 245M  240M S     0.1  2.0   0:50   7 postmaster
16762 postgres  15   0  245M 245M  240M S     0.0  2.0   1:14   4 postmaster
.
.


max_connections = 600

shared_buffers = 30000  #=234MB, up from 21760=170MB min 16, at least
max_connections*2, 8KB each
sort_mem = 2048         # min 64, size in KB
vacuum_mem = 32768              # up from 16384 min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or
open_datasync
#wal_buffers = 8                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 125       # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000



# - Planner Method Enabling -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 262144   # =2GB typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 11
#geqo_effort = 1
#geqo_generations = 0
#geqo_pool_size = 0             # default based on tables in statement,
                                # range 128-1024
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs