Thread: CPU bound at 99%

CPU bound at 99%

From
Bryan Buecking
Date:
Hi,

I'm running into an performance problem where a Postgres db is running
at 99% CPU (4 cores) with about 500 concurrent connection doing various
queries from a web application. This problem started about a week ago,
and has been steadily going downhill. I have been tweaking the config a
bit, mainly shared_memory but have seen no noticeable improvements.

at any given time there is about 5-6 postgres in startup
(ps auxwww | grep postgres | grep startup | wc -l)

about 2300 connections in idle
(ps auxwww | grep postgres | idle)

and loads of "FATAL: sorry, too many clients already" being logged.

The server that connects to the db is an apache server using persistent
connections. MaxClients is 2048 thus the high number of connections
needed. Application was written in PHP using the Pear DB class.

Here are some typical queries taking place

(table media has about 40,000 records and category about 40):

LOG: duration: 66141.530 ms  statement:
    SELECT COUNT(*) AS CNT
    FROM media m JOIN category ca USING(category_id)
    WHERE CATEGORY_ROOT(m.category_id) = '-1'
    AND m.deleted_on IS NULL

LOG:  duration: 57828.983 ms  statement:
    SELECT COUNT(*) AS CNT
    FROM media m JOIN category ca USING(category_id)
    WHERE CATEGORY_ROOT(m.category_id) = '-1'
    AND m.deleted_on IS NULL AND m.POSTED_ON + interval '7 day'

System
======
cpu             Xeon(R) CPU 5160 @ 3.00GHz stepping 06 x 4
                L1, L2 = 32K, 4096K
mem             8GB
dbms            postgresql-server 8.2.4
disks         scsi0 : LSI Logic SAS based MegaRAID driver
        SCSI device sda: 142082048 512-byte hdwr sectors (72746 MB)
        SCSI device sda: 142082048 512-byte hdwr sectors (72746 MB)

Stats
======

top - 00:28:40 up 12:43,  1 user,  load average: 46.88, 36.55, 37.65
Tasks: 2184 total,  63 running, 2119 sleeping,   1 stopped,   1 zombie
Cpu0: 99.3% us,  0.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 0.2% si
Cpu1: 98.3% us,  1.4% sy,  0.0% ni,  0.2% id,  0.0% wa,  0.0% hi, 0.0% si
Cpu2: 99.5% us,  0.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 0.0% si
Cpu3: 99.5% us,  0.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 0.0% si
Mem:   8166004k total,  6400368k used,  1765636k free,   112080k buffers
Swap:  1020088k total,        0k used,  1020088k free,  3558764k cached


$ vmstat 3
procs ---------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 4  0    0 559428 109440 3558684    0    0    11    27   31   117 96  2  2  0
 5  0    0 558996 109452 3558672    0    0     0    41 1171   835 93  1  7  0
 4  0    0 558996 109452 3558740    0    0     0    38 1172   497 98  1  1  0
11  0    0 554516 109452 3558740    0    0     0    19 1236   610 97  1  2  0
25  0    0 549860 109452 3558740    0    0     0    32 1228   332 99  1  0  0
12  0    0 555412 109452 3558740    0    0     0     4 1148   284 99  1  0  0
15  0    0 555476 109452 3558740    0    0     0    23 1202   290 99  1  0  0
15  0    0 555476 109452 3558740    0    0     0     1 1125   260 99  1  0  0
16  0    0 555460 109452 3558740    0    0     0    12 1214   278 99  1  0  0


# -----------------------------
# PostgreSQL configuration file
# -----------------------------

#data_directory = 'ConfigDir'           # use data in another directory
                                        # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
                                        # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
                                        # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '(none)'           # write an extra PID file
                                        # (change requires restart)


#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = 'localhost'            # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost', '*' = all
                                        # (change requires restart)
port = 5432                             # (change requires restart)
max_connections = 2400                  # (change requires restart)
# Note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
superuser_reserved_connections = 3      # (change requires restart)
#unix_socket_directory = ''             # (change requires restart)
#unix_socket_group = ''                 # (change requires restart)
#unix_socket_permissions = 0777         # octal
                                        # (change requires restart)
#bonjour_name = ''                      # defaults to the computer name
                                        # (change requires restart)

# - Security & Authentication -

#authentication_timeout = 1min          # 1s-600s
#ssl = off                              # (change requires restart)
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''                # (change requires restart)
#krb_srvname = 'postgres'               # (change requires restart)
#krb_server_hostname = ''               # empty string matches any keytab entry
                                        # (change requires restart)
#krb_caseins_users = off                # (change requires restart)

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
                                        # 0 selects the system default
#tcp_keepalives_count = 0               # TCP_KEEPCNT;
                                        # 0 selects the system default


#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 600MB                  # min 128kB or max_connections*16kB
                                        # (change requires restart)
temp_buffers = 10MB                     # min 800kB
#max_prepared_transactions = 5          # can be 0 or more
                                        # (change requires restart)
# Note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 8MB                          # min 64kB
maintenance_work_mem = 512MB            # min 1MB
max_stack_depth = 8MB                   # min 100kB

# - Free Space Map -

max_fsm_pages = 1536000         # min max_fsm_relations*16, 6 bytes each
                                        # (change requires restart)
max_fsm_relations = 10000               # min 100, ~70 bytes each
                                        # (change requires restart)

# - Kernel Resource Usage -

max_files_per_process = 1024            # min 25
                                        # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_percent = 1.0             # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5              # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333           # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5              # 0-1000 buffers max written/round


#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = on                             # turns forced synchronization on or off
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_buffers = 64kB                     # min 32kB
                                        # (change requires restart)
#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3                # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
#checkpoint_warning = 30s               # 0 is off

# - Archiving -

#archive_command = ''           # command to use to archive a logfile segment
#archive_timeout = 0            # force a logfile segment switch after this
                                # many seconds; 0 is off


#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0                    # measured on an arbitrary scale
#random_page_cost = 4.0                 # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above
effective_cache_size = 1000MB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5                        # range 1-10
#geqo_pool_size = 0                     # selects default based on effort
#geqo_generations = 0                   # selects default based on effort
#geqo_selection_bias = 2.0              # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10         # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8                # 1 disables collapsing of explicit
                                        # JOINs


#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr'             # Valid values are combinations of
                                        # stderr, syslog and eventlog,
                                        # depending on platform.

# This is used when logging to stderr:
#redirect_stderr = off                  # Enable capturing of stderr into log
                                        # files
                                        # (change requires restart)

# These are only used if redirect_stderr is on:
#log_directory = 'pg_log'               # Directory where log files are written
                                        # Can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
                                        # Can include strftime() escapes
#log_truncate_on_rotation = off # If on, any existing log file of the same
                                        # name as the new log file will be
                                        # truncated rather than appended to. But
                                        # such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation. Default is
                                        # off, meaning append to existing files
                                        # in all cases.
#log_rotation_age = 1d                  # Automatic rotation of logfiles will
                                        # happen after that time.  0 to
                                        # disable.
#log_rotation_size = 10MB               # Automatic rotation of logfiles will
                                        # happen after that much log
                                        # output.  0 to disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'


# - When to Log -

#client_min_messages = notice           # Values, in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   log
                                        #   notice
                                        #   warning
                                        #   error

#log_min_messages = notice              # Values, in order of decreasing detail:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   log
                                        #   fatal
                                        #   panic

#log_error_verbosity = default          # terse, default, or verbose messages

#log_min_error_statement = error        # Values in order of increasing severity:
                                        #   debug5
                                        #   debug4
                                        #   debug3
                                        #   debug2
                                        #   debug1
                                        #   info
                                        #   notice
                                        #   warning
                                        #   error
                                        #   fatal
                                        #   panic (effectively off)

log_min_duration_statement = -1         # -1 is disabled, 0 logs all statements
                                        # and their durations.

#silent_mode = off                      # DO NOT USE without syslog or
                                        # redirect_stderr
                                        # (change requires restart)

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = off
#log_connections = off
log_disconnections = on
log_duration = on
log_line_prefix = '%u@%d %h %m'         # Special values:
                                        #   %u = user name
                                        #   %d = database name
                                        #   %r = remote host and port
                                        #   %h = remote host
                                        #   %p = PID
                                        #   %t = timestamp (no milliseconds)
                                        #   %m = timestamp with milliseconds
                                        #   %i = command tag
                                        #   %c = session id
                                        #   %l = session line number
                                        #   %s = session start timestamp
                                        #   %x = transaction id
                                        #   %q = stop here in non-session
                                        #        processes
                                        #   %% = '%'
                                        # e.g. '<%u%%%d> '
#log_statement = 'none'                 # none, ddl, mod, all
#log_hostname = off


#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Query/Index Statistics Collector -

#stats_command_string = on
#update_process_title = on

#stats_start_collector = on             # needed for block or row stats
                                        # (change requires restart)
#stats_block_level = off
#stats_row_level = off
#stats_reset_on_server_start = off      # (change requires restart)


# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off


#---------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#---------------------------------------------------------------------------

#autovacuum = off                       # enable autovacuum subprocess?
                                        # 'on' requires stats_start_collector
                                        # and stats_row_level to also be on
#autovacuum_naptime = 1min              # time between autovacuum runs
#autovacuum_vacuum_threshold = 500      # min # of tuple updates before
                                        # vacuum
#autovacuum_analyze_threshold = 250     # min # of tuple updates before
                                        # analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of rel size before
                                        # vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of rel size before
                                        # analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                        # (change requires restart)
#autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit


#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '"$user",public'         # schema names
#default_tablespace = ''                # a tablespace name, '' uses
                                        # the default
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#statement_timeout = 0                  # 0 is disabled
#vacuum_freeze_min_age = 100000000

# - Locale and Formatting -

datestyle = 'iso, mdy'
#timezone = unknown                     # actually, defaults to TZ
                                        # environment setting
#timezone_abbreviations = 'Default'     # select the set of available timezone
                                        # abbreviations. Currently, there are
                                        #   Default
                                        #   Australia
                                        #   India
                                        # However you can also create your own
                                        # file in share/timezonesets/.
#extra_float_digits = 0                 # min -15, max 2
#client_encoding = sql_ascii            # actually, defaults to database
                                        # encoding

# These settings are initialized by initdb -- they might be changed
lc_messages = 'C'                       # locale for system error message
                                        # strings
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting

# - Other Defaults -

#explain_pretty_print = on
#dynamic_library_path = '$libdir'
#local_preload_libraries = ''


#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1s
#max_locks_per_transaction = 64         # min 10
                                        # (change requires restart)
# Note: each lock table slot uses ~270 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.


#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = off
#array_nulls = on
#backslash_quote = safe_encoding        # on, off, or safe_encoding
#default_with_oids = off
#escape_string_warning = on
#standard_conforming_strings = off
#regex_flavor = advanced                # advanced, extended, or basic
#sql_inheritance = on

# - Other Platforms & Clients -

#transform_null_equals = off


#---------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#---------------------------------------------------------------------------

#custom_variable_classes = ''           # list of custom variable class names

--
Bryan Buecking

Re: CPU bound at 99%

From
"Joshua D. Drake"
Date:
On Wed, 23 Apr 2008 00:31:01 +0900
Bryan Buecking <buecking@gmail.com> wrote:

> at any given time there is about 5-6 postgres in startup
> (ps auxwww | grep postgres | grep startup | wc -l)
>
> about 2300 connections in idle
> (ps auxwww | grep postgres | idle)
>
> and loads of "FATAL: sorry, too many clients already" being logged.
>
> The server that connects to the db is an apache server using
> persistent connections. MaxClients is 2048 thus the high number of
> connections needed. Application was written in PHP using the Pear DB
> class.

Sounds like your pooler isn't reusing connections properly.

Sincerely,

Joshua D. Drake


--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: CPU bound at 99%

From
Erik Jones
Date:
On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote:

> Hi,
>
> I'm running into an performance problem where a Postgres db is running
> at 99% CPU (4 cores) with about 500 concurrent connection doing
> various
> queries from a web application. This problem started about a week ago,
> and has been steadily going downhill. I have been tweaking the
> config a
> bit, mainly shared_memory but have seen no noticeable improvements.
>
> at any given time there is about 5-6 postgres in startup
> (ps auxwww | grep postgres | grep startup | wc -l)
>
> about 2300 connections in idle
> (ps auxwww | grep postgres | idle)
>
> and loads of "FATAL: sorry, too many clients already" being logged.
>
> The server that connects to the db is an apache server using
> persistent
> connections. MaxClients is 2048 thus the high number of connections
> needed. Application was written in PHP using the Pear DB class.

Are you referring to PHP's persistent connections?  Do not use those.
Here's a thread that details the issues with why not:
http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php 
.  Basically, PHP's persistent connections are NOT pooling solution.
Us pgpool or somesuch.

<snip>

>
> max_connections = 2400

That is WAY too high.  Get a real pooler, such as pgpool, and drop
that down to 1000 and test from there.  I see you mentioned 500
concurrent connections.  Are each of those connections actually doing
something?  My guess that once you cut down on the number actual
connections you'll find that each connection can get it's work done
faster and you'll see that number drop significantly.  For example,
our application does anywhere from 200 - 600 transactions per second,
dependent on the time of day/week, and we never need more that 150 to
200 connections (although we do have the max_connections set to 500).

<snip>

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: CPU bound at 99%

From
Bryan Buecking
Date:
On Tue, Apr 22, 2008 at 08:41:09AM -0700, Joshua D. Drake wrote:
> On Wed, 23 Apr 2008 00:31:01 +0900
> Bryan Buecking <buecking@gmail.com> wrote:
>
> > at any given time there is about 5-6 postgres in startup
> > (ps auxwww | grep postgres | grep startup | wc -l)
> >
> > about 2300 connections in idle
> > (ps auxwww | grep postgres | idle)
> >
> > and loads of "FATAL: sorry, too many clients already" being logged.
> >
> > The server that connects to the db is an apache server using
> > persistent connections. MaxClients is 2048 thus the high number of
> > connections needed. Application was written in PHP using the Pear DB
> > class.
>
> Sounds like your pooler isn't reusing connections properly.

The persistent connections are working properly. The idle connections
are expected given that the Apache child process are not closing them
(A la non-persistent).  The connections do go away after 1000 requests
(MaxChildRequest).

I decided to move towards persistent connections since prior to
persistent connections the idle vs startup were reversed.

--
Bryan Buecking                http://www.starling-software.com

Re: CPU bound at 99%

From
Bryan Buecking
Date:
On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
> On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote:
>
> >max_connections = 2400
>
> That is WAY too high.  Get a real pooler, such as pgpool, and drop
> that down to 1000 and test from there.

I agree, but the number of idle connections dont' seem to affect
performace only memory usage. I'm trying to lessen the load of
connection setup. But sounds like this tax is minimal?

When these issues started happening, max_connections was set to 1000 and
I was not using persistent connections.

> I see you mentioned 500 concurrent connections. Are each of those
> connections actually doing something?

Yes out of the 2400 odd connections, 500 are either in SELECT or RESET.

> My guess that once you cut down on the number actual connections
> you'll find that each connection can get it's work done faster
> and you'll see that number drop significantly.

I agree, but not in this case.  I will look at using pooling.
--
Bryan Buecking                http://www.starling-software.com

Re: CPU bound at 99%

From
"Harald Armin Massa"
Date:
Bryan,

 > > about 2300 connections in idle
>  > > (ps auxwww | grep postgres | idle)

that is about 2300 processes being task scheduled by your kernel, each
of them using > 1 MB of RAM and some other ressources, are you sure
that this is what you want?

Usual recommended design for a web application:

start request, rent a connection from connection pool, do query, put
connection back, finish request, wait for next request

so to get 500 connections in parallel, you would have the outside
situaion of 500 browsers submitting requests within the time needed to
fullfill one request.

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

Re: CPU bound at 99%

From
Bryan Buecking
Date:
On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
>
> Are you referring to PHP's persistent connections?  Do not use those.
> Here's a thread that details the issues with why not:
> http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php .

Thanks for that article, very informative and persuasive enough that
I've turned off persistent connections.

--
Bryan Buecking                http://www.starling-software.com

Re: CPU bound at 99%

From
Rodrigo Gonzalez
Date:
Are tables vacuumed often?

Bryan Buecking escribió:
> On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
>
>> On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote:
>>
>>
>>> max_connections = 2400
>>>
>> That is WAY too high.  Get a real pooler, such as pgpool, and drop
>> that down to 1000 and test from there.
>>
>
> I agree, but the number of idle connections dont' seem to affect
> performace only memory usage. I'm trying to lessen the load of
> connection setup. But sounds like this tax is minimal?
>
> When these issues started happening, max_connections was set to 1000 and
> I was not using persistent connections.
>
>
>> I see you mentioned 500 concurrent connections. Are each of those
>> connections actually doing something?
>>
>
> Yes out of the 2400 odd connections, 500 are either in SELECT or RESET.
>
>
>> My guess that once you cut down on the number actual connections
>> you'll find that each connection can get it's work done faster
>> and you'll see that number drop significantly.
>>
>
> I agree, but not in this case.  I will look at using pooling.
>



Attachment

Re: CPU bound at 99%

From
Bryan Buecking
Date:
On Tue, Apr 22, 2008 at 01:21:03PM -0300, Rodrigo Gonzalez wrote:
> Are tables vacuumed often?

How often is often.  Right now db is vaccumed once a day.
--
Bryan Buecking                http://www.starling-software.com

Re: CPU bound at 99%

From
"Scott Marlowe"
Date:
On Tue, Apr 22, 2008 at 10:10 AM, Bryan Buecking <buecking@gmail.com> wrote:
>
>  I agree, but the number of idle connections dont' seem to affect
>  performace only memory usage. I'm trying to lessen the load of
>  connection setup. But sounds like this tax is minimal?

Not entirely true.  There are certain things that happen that require
one backend to notify ALL OTHER backends.  when this happens a lot,
then the system will slow to a crawl.

Re: CPU bound at 99%

From
Tom Lane
Date:
Bryan Buecking <buecking@gmail.com> writes:
> On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
>> That is WAY too high.  Get a real pooler, such as pgpool, and drop
>> that down to 1000 and test from there.

> I agree, but the number of idle connections dont' seem to affect
> performace only memory usage.

I doubt that's true (and your CPU load suggests the contrary as well).
There are common operations that have to scan the whole PGPROC array,
which has one entry per open connection.  What's worse, some of them
require exclusive lock on the array.

8.3 has some improvements in this area that will probably let it scale
to more connections than previous releases, but in any case connection
pooling is a good thing.

> I'm trying to lessen the load of
> connection setup. But sounds like this tax is minimal?

Not really.  You're better off reusing a connection over a large number
of queries.

            regards, tom lane

Re: CPU bound at 99%

From
Craig Ringer
Date:
Erik Jones wrote:

>> max_connections = 2400
>
> That is WAY too high.  Get a real pooler, such as pgpool, and drop that
> down to 1000 and test from there.  I see you mentioned 500 concurrent
> connections.  Are each of those connections actually doing something?
> My guess that once you cut down on the number actual connections you'll
> find that each connection can get it's work done faster and you'll see
> that number drop significantly.

It's not an issue for me - I'm expecting *never* to top 100 concurrent
connections, and many of those will be idle, with the usual load being
closer to 30 connections. Big stuff ;-)

However, I'm curious about what an idle backend really costs.

On my system each backend has an RSS of about 3.8MB, and a psql process
tends to be about 3.0MB. However, much of that will be shared library
bindings etc. The real cost per psql instance and associated backend
appears to be 1.5MB (measured with 10 connections using system free RAM
change) . If I use a little Python program to generate 50 connections
free system RAM drops by ~45MB and rises by the same amount when the
Python process exists and the backends die, so the backends presumably
use less than 1MB each of real unshared RAM.

Presumably the backends will grow if they perform some significant
queries and are then left idle. I haven't checked that.

At 1MB of RAM per backend that's not a trivial cost, but it's far from
earth shattering, especially allowing for the OS swapping out backends
that're idle for extended periods.

So ... what else does an idle backend cost? Is it reducing the amount of
shared memory available for use on complex queries? Are there some lists
PostgreSQL must scan for queries that get more expensive to examine as
the number of backends rise? Are there locking costs?

--
Craig Ringer

Re: CPU bound at 99%

From
PFC
Date:
> about 2300 connections in idle
> (ps auxwww | grep postgres | idle)

    [...]

> The server that connects to the db is an apache server using persistent
> connections. MaxClients is 2048 thus the high number of connections
> needed. Application was written in PHP using the Pear DB class.

    This is pretty classical.
    When your number of threads gets out of control, everything gets slower,
so more requests pile up, spawning more threads, this is positive
feedback, and in seconds all hell breaks loose. That's why I call it
imploding, like if it collapses under its own weight. There is a threshold
effect and it gets from working good to a crawl rather quickly once you
pass the threshold, as you experienced.

    Note that the same applies to Apache, PHP as well as Postgres : there is
a "sweet spot" in the number of threads, for optimum efficiency, depending
on how many cores you have. Too few threads, and it will be waiting for IO
or waiting for the database. Too many threads, and CPU cache utilization
becomes suboptimal and context switches eat your performance.

    This sweet spot is certainly not at 500 connections per core, either for
Postgres or for PHP. It is much lower, about 5-20 depending on your load.

    I will copypaste here an email I wrote to another person with the exact
same problem, and the exact same solution.
    Please read this carefully :

*********************************************************************

Basically there are three classes of websites in my book.
1- Low traffic (ie a few hits/s on dynamic pages), when performance
doesn't matter
2- High traffic (ie 10-100 hits/s on dynamic pages), when you must read
the rest of this email
3- Monster traffic (lots more than that) when you need to give some of
your cash to Akamai, get some load balancers, replicate your databases,
use lots of caching, etc. This is yahoo, flickr, meetic, etc.

Usually people whose web sites are imploding under load think they are in
class 3 but really most of them are in class 2 but using inadequate
technical solutions like MySQL, etc. I had a website with 200K members
that ran on a Celeron 1200 with 512 MB RAM, perfectly fine, and lighttpd
wasn't even visible in the top.

Good news for you is that the solution to your problem is pretty easy. You
should be able to solve that in about 4 hours.

Suppose you have some web servers for static content ; obviously you are
using lighttpd on that since it can service an "unlimited" (up to the OS
limit, something like 64K sockets) number of concurrent connections. You
could also use nginx or Zeus. I think Akamai uses Zeus. But Lighttpd is
perfectly fine (and free). For your static content servers you will want
to use lots of RAM for caching, if you serve images, put the small files
like thumbnails, css, javascript, html pages on a separate server so that
they are all served from RAM, use a cheap CPU since a Pentium-M  with
lighttpd will happily push 10K http hits/s if you don't wait for IO. Large
files should be on the second static server to avoid cache trashing on the
server which has all the frequently accessed small files.

Then you have some web servers for generating your dynamic content. Let's
suppose you have N CPU cores total.
With your N cores, the ideal number of threads would be N. However those
will also wait for IO and database operations, so you want to fill those
wait times with useful work, so maybe you will use something like 2...10
threads per core. This can only be determined by experimentation, it
depends on the type and length of your SQL queries so there is no "one
size fits all" answer.

Example. You have pages that take 20 ms to generate, and you have 100
requests for those coming up. Let's suppose you have one CPU core.

(Note : if your pages take longer than 10 ms, you have a problem. On the
previously mentioned website, now running on the cheapest Core 2 we could
find since the torrent tracker eats lots of CPU, pages take about 2-5 ms
to generate, even the forum pages with 30 posts on them. We use PHP with
compiled code caching and SQL is properly optimized). And, yes, it uses
MySQL. Once I wrote (as an experiment) an extremely simple forum which did
1400 pages/second (which is huge) with a desktop Core2 as the Postgres 8.2
server.

- You could use Apache in the old fasion way, have 100 threads, so all
your pages will take 20 ms x 100 = 2 seconds,
But the CPU cache utilisation will suck because of all those context
switches, you'll have 100 processes eating your RAM (count 8MB for a PHP
process), 100 database connections, 100 postgres processes, the locks will
stay on longer, transactions will last longer, you'll get more dead rows
to vacuum, etc.
And actually, since Apache will not buffer the output of your scripts, the
PHP or Perl interpreter will stay in memory (and hog a database
connection) until the client at the other end of the internets had loaded
all the data. If the guy has DSL, this can take 0.5 seconds, if he has
56K, much longer. So, you are likely to get much more than 100 processes
in your Apache, perhaps 150 or perhaps even 1000 if you are out of luck.
In this case the site usually implodes.

- You could have a lighttpd or squid proxy handling the client
connections, then funnelling that to a few threads generating the
webpages. Then, you don't care anymore about the slowness of the clients
because they are not hogging threads anymore. If you have 4 threads, your
requests will be processed in order, first come first served, 20 ms x 4 =
80 ms each average, the CPU cache will work better since you'll get much
less context switching, RAM will not be filled, postgres will be happy.

> So, the front-end proxy would have a number of max connections, say 200,

Number of connections to clients => don't set any values, sockets are free
in lighttpd.
Number of connections to PHP/fastcgi or apache/mod_perl backends => number
of cores x 2 to 5, adjust to taste

>   and it would connect to another httpd/mod_perl server behind with a
> lower number of connections, say 20. If the backend httpd server was
> busy, the proxy connection to it would just wait in a queue until it was
> available.

     Yes, it waits in a queue.

> Is that the kind of design you had in mind?

     Yes.
     The two key points are that :
     * Perl/PHP processes and their heavy resources (database connections,
RAM) are used only when they have work to do and not waiting for the
client.
     * The proxy must work this way :
         1- get and buffer request data from client (slow, up to 500 ms, up
to 2000 ms if user has emule or other crap hogging his upload)
         2- send request to backend (fast, on your LAN, < 1 ms)
         3- backend generates HTML and sends it to proxy (fast, LAN), proxy
buffers data
         4- backend is now free to process another request
         5- proxy sends buffered data to client (slow, up to 100-3000 ms)
     The slow parts (points 1 and 5) do not hog a perl/PHP backend.

     Do not use a transparent proxy ! The proxy must buffer requests and
data for this to work. Backends must never wait for the client. Lighttpd
will buffer everything, I believe Apache can be configured to do so. But I
prefer to use lighttpd for proxying, it is faster and the queuing works
better.

     Also, if you can use FastCGI, use it. I have never used mod_perl, but
with mod_php, you have a fixed startup cost every time a PHP interpreter
starts. With fastcgi, a number of PHP interpreter threads are spawned at
startup, so they are always ready, the startup cost is much smaller. You
can serve a small AJAX request with 1-2 database queries in less than 1 ms
if you are careful with your code (like, no heavyweight session
initialization on each page, using mmcache to avoid reparsing the PHP
everytime, etc).

     If you have several backend servers generating webpages, use sticky
sessions and put the session storage on the backends themselves, if you
use files use ReiserFS not ext3 which sucks when you have a large number
of session files in the same directory. Or use memcached, whatever, but
don't put sessions in the database, this gives you a nice tight bottleneck
when adding servers. If each and every one of your pages has an UPDATE
query to the sessions table you have a problem.

     As for why I like lighttpd, I am fond of the asynchronous select/poll
model for a webserver which needs to handle lots of concurrent
connections. When you have 50 open sockets threads are perfectly fine,
when you have 1000 a threaded server will implode. I wrote a bittorrent
tracker in Python using an asynchronous select/poll model ; it has been
handling about 150-400 HTTP hits per second for two years now, it has
about 100-200 concurrent opened sockets 24 hours a day, and the average
lifetime of a socket connection is 600 ms. There are 3 threads (webserver,
backend, deferred database operations) with some queues in between for the
plumbing. Serving an /announce HTTP request takes 350 microseconds of CPU
time. All using a purely interpreted language, lol. It uses half a core on
the Core 2 and about 40 MB of RAM.

    When lighttpd is overloaded (well, it's impossible to kill it with static
files unless it waits for disk IO, but if you overload the fastcgi
processes), requests are kicked out of the queue, so for instance it will
only serve 50% of the requests. But an overloaded apache will serve 0%
since you'll get 1000 threads, it'll swap, and everything will timeout and
crash.

********************************************************

End of copypaste.

    So :

    - You need to get less Postgres connections to let Postgres breathe and
use your CPU power to perform queries and not context switches and cache
management.
    - You need to get less PHP threads which will have the same effect on
your webserver.

    The way to do this is is actually pretty simple.

    - Frontend proxy (lighttpd), load balancer, whatever, sending static
requests to static servers, and dynamic requests to dynamic servers. If
the total size of your static files fits in the RAM of this server, make
the static server and the proxy the same lighttpd instance.

    - Backends for PHP : a number of servers running PHP/fastcgi, no web
servers at all, the lighttpd frontend can hit several PHP/fastcgi backends.

    - Use PHP persistent connections (which now appear to work in the latest
version, in fastcgi mode, I don't know about mod_php's persistent
connections though).
    - Or use pgpool or pgbouncer or another connection pooler, but only if
PHP's persistent connections do not work for you.

> 1:  Each apache / php process maintains its own connections, not
> sharing with others.  So it's NOT connection pooling, but people tend
> to think it is.

    True with mod_php (and sad). With fastcgi, you don't really care, since
the PHP processes are few and are active most of the time, no connection
hogging takes place unless you use many different users to connect to
postgres, in which case you should switch to pgpool.

> 2:  Each unique connection creates another persistent connection for
> an apache/php child process.  If you routinely connect to multiple
> servers / databases or as > 1 user, then each one of those
> combinations that is unique makes another persistent connection.

    True also for fastcgi, but if you don't do that, no problem.

> 3:  There's no facility in PHP to clean an old connection out and make
> sure it's in some kind of consistent state when you get it.  It's in
> exactly the same state it was when the previous php script finished
> with it.  Half completed transactions, partial sql statements,
> sequence functions like currval() may have values that don't apply to
> you.

    Apparently now fixed.

> 4:  pg_close can't close a persistent connection.  Once it's open, it
> stays open until the child process is harvested.

    Don't know about that.

> 5:  Apache, by default, is configured for 150 child processes.
> Postgresql, and many other databases for that matter, are configured
> for 100 or less.

    (and for good reason)

> Even if apache only opens one connection to one
> database with one user account, it will eventually try to open the
> 101st connection to postgresql and fail.  So, the default
> configuration of apache / postgresql for number of connections is
> unsafe for pconnect.

    fastcgi makes this problem disappear by separating the concept of "client
connection" from the concept of "web server thread". Not only will it make
Postgres happier, your PHP processing will be much faster too.

> 6:  The reason for connection pooling is primarily to twofold.  One is
> to allow very fast connections to your database when doing lots of
> small things where connection time will cost too much.  The other is
> to prevent your database from having lots of stale / idle connections
> that cause it to waste memory and to be slower since each backend
> needs to communicate with every other backend some amount of data some
> times.  pconnect takes care of the first problem, but exacerbates the
> second.

    Moot point with fastcgi.
    Unused PHP processes are removed in times of low traffic, along with
their connections.

> P.s. dont' think I'm dogging PHP, cause I'm not.  I use it all the
> time, and it's really great for simple small scripts that need to be
> done NOW and need to be lightweight.  I even use pconnect a bit.  But
> my machine is set for 50 or fewer apache children and 150 postgresql
> connects, and I only use pconnect on small, lightweight things that
> need to zoom.  Everything else gets regular old connect.

    Very true for mod_php, wrong for fastcgi : you can get extreme
performance with pconnect and a PHP code cache like turck/mm or
eaccelerator, down to 1 ms per page.

    Especially if you use PEAR which is very bloated, you nead a code cache
to avoid parsing it on every page.
    On previously mentioned website it cut the page time from 50 ms to 2 ms
on some pages because there was a lot of includes.









Re: CPU bound at 99%

From
Ivan Voras
Date:
Bryan Buecking wrote:
> On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
>> Are you referring to PHP's persistent connections?  Do not use those.
>> Here's a thread that details the issues with why not:
>> http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php .
>
> Thanks for that article, very informative and persuasive enough that
> I've turned off persistent connections.

Note that it's not always true - current recommended practice for PHP is
to run it in FastCGI, in which case even though there are hundreds of
Apache processes, there are only few PHP processes with their persistent
database connections (and unused PHP FastCGI servers get killed off
routinely) so you get almost "proper" pooling without the overhead.