Thread: server performance issues - suggestions for tuning

server performance issues - suggestions for tuning

From: Kevin Kempter

Hi List;

I've just inherited multiple postgres database servers in multiple data
centers across the US and Europe via a new contract I've just started.

Each night during the nightly batch processing several of the servers (2 in
particular) slow to a crawl - they are dedicated postgres database servers.
There is a lot of database activity going on, sometimes upwards of 200
concurrent queries, but I just don't think the machines should be this
pegged. I am in the process of cleaning up dead space - their #1 fix for
performance issues in the past has been to kill the current vacuum process.
Likewise I've just bumped shared_buffers to 150000 and work_mem to 250000.

Even at that I still see slow processing/high system loads at night. I have
noticed that killing the current vacuum process (autovacuum is turned on)
speeds up the entire machine significantly.

The servers are 4-CPU Intel boxes (not dual-core) with 4 GB of memory,
attached to RAID-10 arrays.

Any thoughts on where to start?

Below are the current/relevant/changed postgresql.conf settings.

Thanks in advance...

/Kevin




============== postgresql.conf (partial listing)========================
#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'        # schema names
#default_tablespace = ''        # a tablespace name, '' uses
                    # the default
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#statement_timeout = 0            # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown            # actually, defaults to TZ
                    # environment setting
#australian_timezones = off
#extra_float_digits = 0            # min -15, max 2
#client_encoding = sql_ascii        # actually, defaults to database
                    # encoding

# These settings are initialized by initdb -- they might be changed
lc_messages = 'en_US.UTF-8'            # locale for system error message
                    # strings
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting

# - Other Defaults -

#explain_pretty_print = on
#dynamic_library_path = '$libdir'


#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000        # in milliseconds
#max_locks_per_transaction = 64        # min 10
# note: each lock table slot uses ~220 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.


#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = off
#backslash_quote = safe_encoding    # on, off, or safe_encoding
#default_with_oids = off
#escape_string_warning = off
#regex_flavor = advanced        # advanced, extended, or basic
#sql_inheritance = on

# - Other Platforms & Clients -

#transform_null_equals = off


#---------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#---------------------------------------------------------------------------

#custom_variable_classes = ''        # list of custom variable class names
=============================================

Re: server performance issues - suggestions for tuning

From: Richard Huxton

Kevin Kempter wrote:
> Hi List;
>
> I've just inherited multiple postgres database servers in multiple data
> centers across the US and Europe via a new contract I've just started.
>
> Each night during the nightly batch processing several of the servers (2 in
> particular) slow to a crawl - they are dedicated postgres database servers.
> There is a lot of database activity going on, sometimes upwards of 200
> concurrent queries, but I just don't think the machines should be this
> pegged. I am in the process of cleaning up dead space - their #1 fix for
> performance issues in the past has been to kill the current vacuum process.
> Likewise I've just bumped shared_buffers to 150000 and work_mem to 250000.

Well, allowing vacuum to do its job can clearly only help matters. I'm
not sure about setting work_mem so high, though - 250000 is roughly
250 MB per sort or hash operation, and a single query can use several
of those at once. With 200 concurrent queries I'd worry about running
into swap. If you're doing it just for the batch processes it might
make sense.

You might well want to set maintenance_work_mem quite high though, for
any overnight maintenance.

A shared_buffers of 1.2GB isn't outrageous, but again with 200 backend
processes you'll want to consider how much memory each process will
consume. It could be that you're better off with a smaller
shared_buffers and relying more on the OS doing its disk caching.
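
Purely as an illustration of that direction (the numbers below are guesses
to experiment with, not measured recommendations, and effective_cache_size
is an extra knob I'm adding because it tells the planner how much the OS is
probably caching):

# postgresql.conf -- a possible starting point on a dedicated 4 GB box
shared_buffers = 100000          # ~800 MB in 8 kB pages, down from 150000 (~1.2 GB)
effective_cache_size = 262144    # ~2 GB in 8 kB pages, what the OS will likely cache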

> Even at that I still see slow processing/high system loads at night. I have
> noticed that killing the current vacuum process (autovacuum is turned on)
> speeds up the entire machine significantly.

If it's disk i/o that's the limiting factor you might want to look at
the "Cost-Based Vacuum Delay" section in the configuration settings.

> The servers are 4-CPU Intel boxes (not dual-core) with 4 GB of memory,
> attached to RAID-10 arrays.
>
> Any thoughts on where to start?

Make sure you are gathering stats - at least the stats_block_level stuff.
Then have a cron job make copies of the stats tables, adding a timestamp
column. That way you can run diffs against different time periods.
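
Something along these lines works (the table name is just illustrative, and
the same pattern applies to pg_statio_user_tables and friends):

-- one-time setup: a history table with the same columns plus a timestamp
CREATE TABLE stats_history AS
  SELECT now() AS snapped_at, * FROM pg_stat_user_tables LIMIT 0;

-- run this from a cron job (say every 15 minutes) via psql
INSERT INTO stats_history
  SELECT now(), * FROM pg_stat_user_tables;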

Pair this up with top/vmstat/iostat activity.

Use log_min_duration_statement to catch any long-running queries so you
can see if you're getting bad plans that push activity up.
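
On an 8.1/8.2-era server the relevant postgresql.conf lines look roughly
like this (the one-second threshold is arbitrary):

stats_start_collector = on
stats_block_level = on              # per-table block read / hit counters
stats_row_level = on                # per-table tuple counters (autovacuum needs these)
log_min_duration_statement = 1000   # log any statement running longer than 1000 ms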

Try and make only one change at a time, otherwise it's difficult to tell
what's helping/hurting.

--
   Richard Huxton
   Archonet Ltd

Re: server performance issues - suggestions for tuning

From: "Kevin Grittner"

>>> On Mon, Aug 27, 2007 at 11:13 PM, in message
<200708272213.14277.kevin@kevinkempterllc.com>, Kevin Kempter
<kevin@kevinkempterllc.com> wrote:
> Each night during the nightly batch processing several of the servers (2 in
> particular) slow to a crawl - they are dedicated postgres database servers.
> There is a lot of database activity going on, sometimes upwards of 200
> concurrent queries

> Any thoughts on where to start?

Is there any way to queue up these queries and limit how many are running at
a time?  I don't know what the experience of others is, but I've found that
when I have more than two to four queries running per CPU, throughput starts
to drop, and response time degrades even faster.

For purposes of illustration, let's forget for a moment that a query may
block waiting for I/O and another query might be able to use the CPU in the
meantime.  Then, think of it this way -- if you have one CPU and 100 queries
to run, each of which will take one second, and you start them all and they
time-slice, nobody gets anything back for 100 seconds, so that is your
average response time.  If you run them one at a time, only the last query
waits that long; the first finishes after one second, and the average
response time works out to about 50 seconds -- roughly half.  On top of
that, there is overhead to switching between processes, and there can be
contention for resources such as locks, both of which have a tendency to
further slow things down.

In the real world, there are multiple resources which can hold up a
query, so you get benefit from running more than one query at a time,
because they will often be using different resources.

But unless that machine has 50 CPUs, you will probably get better throughput
and response time by queuing the requests.

-Kevin



Re: server performance issues - suggestions for tuning

From: Andrew Sullivan

On Tue, Aug 28, 2007 at 08:12:06AM -0500, Kevin Grittner wrote:
>
> Is there any way to queue up these queries and limit how many are running at
> a time?

Sure: limit the number of connections to the database, and put a pool
in front.  It can indeed help.
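
For example, an external pooler such as pgbouncer can enforce the cap
without many application changes.  A rough sketch (the database name, paths
and pool sizes are placeholders, not recommendations):

; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction    ; a server connection is held only for one transaction
max_client_conn = 500      ; clients can still open lots of connections...
default_pool_size = 16     ; ...but only this many server connections per db/user pair

The batch jobs then connect to port 6432 instead of 5432, and excess
queries wait briefly in the pooler's queue rather than all competing for
the CPUs and disks at the same time.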

If you have a lot of bloat due to large numbers of failed vacuums,
however, I suspect your problem is I/O.  Vacuum churns through the
disk very aggressively, and if you're close to your I/O limit, it can
push you over the top.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
        --H.W. Fowler

Re: server performance issues - suggestions for tuning

From: "Scott Marlowe"

On 8/27/07, Kevin Kempter <kevin@kevinkempterllc.com> wrote:
> Hi List;
>
> I've just inherited multiple postgres database servers in multiple data
> centers across the US and Europe via a new contract I've just started.

What pg version are you working with, and on what OS / OS version?

> Each night during the nightly batch processing several of the servers (2 in
> particular) slow to a crawl - they are dedicated postgres database servers.
> There is a lot of database activity going on, sometimes upwards of 200
> concurrent queries, but I just don't think the machines should be this
> pegged. I am in the process of cleaning up dead space - their #1 fix for
> performance issues in the past has been to kill the current vacuum process.
> Likewise I've just bumped shared_buffers to 150000 and work_mem to 250000.

As mentioned before, that's way too big for work_mem.  Set it to something
reasonable, like 8M or so.  Then, if you've got one query that really
needs lots of memory to run well, you can set it higher for that
connection / query only.  You can even set work_mem to a particular
number for a particular user with the ALTER USER command.
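
For instance (the role name is made up, and the values are only
illustrative; work_mem is in kB here, though 8.2 and later also accept
suffixes like '256MB'):

-- raise it only for the session running the big nightly sort
SET work_mem = 262144;            -- ~256 MB, this connection only
-- ... run the heavy query ...
RESET work_mem;

-- or attach a higher value to the batch-processing role permanently
ALTER USER batch_user SET work_mem = 262144;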

Oh, and 200 concurrent queries is a LOT.

> Even at that I still see slow processing/high system loads at night. I have
> noticed that killing the current vacuum process (autovacuum is turned on)
> speeds up the entire machine significantly.

> The servers are 4-CPU Intel boxes (not dual-core) with 4 GB of memory,
> attached to RAID-10 arrays.

It sounds to me like your systems are I/O bound, at least when vacuum
is running.  If you want to get good performance and have vacuum run
in a reasonable amount of time, you might need to upgrade your RAID
subsystems.  Do you have battery-backed caching controllers?  Which
exact model of controller are you using?  How many drives are in your
RAID-10 arrays?  What types of queries are typical (OLAP versus OLTP, really)?

> Any thoughts on where to start?

Adjust the vacuum cost settings to reduce the impact vacuum has.

Increase the fsm settings as needed.

Run VACUUM VERBOSE to see if you've blown past your fsm settings and to see
what fsm settings you actually need (a rough sketch of that check is below).

Reindex particularly bloated tables / indexes.

Upgrade hardware if needed.
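
A rough sketch of the FSM check (the settings shown use the 8.1/8.2-era
names, the numbers are placeholders to size from the real report, and both
settings only take effect at server start):

-- run as a superuser against the whole database; the INFO lines at the end
-- report how many free-space-map page slots are needed vs. the current limit
VACUUM VERBOSE;

# postgresql.conf -- raise these if the report says you're over the limit
max_fsm_pages = 2000000      # placeholder; size it from the VACUUM VERBOSE output
max_fsm_relations = 2000     # placeholder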