Re: Several optimization options (config/hardware) - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Several optimization options (config/hardware)
Date
Msg-id da65dbee02134675004f8a8895badebd.squirrel@sq.gransy.com
Whole thread Raw
In response to Several optimization options (config/hardware)  (Martin Grotzke <martin.grotzke@googlemail.com>)
List pgsql-performance
Hi,

On 2 Květen 2012, 15:19, Martin Grotzke wrote:
> Hi,
>
> we want to see if we can gain better performance with our postgresql
> database. In the last year the amount of data growed from ~25G to now
> ~140G and we're currently developing a new feature that needs to get
> data faster from the database. The system is both read and write heavy.

What does the read/write heavy mean? How much data / transactions you need
to handle, how many clients, etc.?

> At first I want to give you an overview over the hardware, software and
> configuration and the changes that I see we could check out. I'd be very
> happy if you could review and tell if the one or the other is nonsense.
>
> Hardware:
> - CPU: 4x4 Cores Intel Xeon L5630  @ 2.13GHz
> - RAM: 64GB
> - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
>   (I don't know the actual number of discs)
> - A single partition for data and wal-files

Have you done any benchmarks with that hardware, to verify the
performance? Can you do that now (i.e. stopping the database so that you
can run them)?

>
> Software
> - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
> - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64
>
> Configuration (selected from settings)
> ------------------------------+-----------+--------+-------------------
>              name             |  setting  |  unit  |       source
> ------------------------------+-----------+--------+-------------------
>  autovacuum                   | on        | [NULL] | configuration file
>  checkpoint_completion_target | 0.5       | [NULL] | default
>  checkpoint_segments          | 16        |        | configuration file
>  checkpoint_timeout           | 300       | s      | default
>  commit_delay                 | 0         |        | default
>  default_statistics_target    | 100       |        | default
>  effective_cache_size         | 16384     | 8kB    | default
>  fsync                        | on        | [NULL] | default
>  log_min_duration_statement   | 250       | ms     | configuration file
>  log_temp_files               | -1        | kB     | default
>  maintenance_work_mem         | 16384     | kB     | default
>  max_connections              | 2000      |        | configuration file
>  random_page_cost             | 4         | [NULL] | default
>  shared_buffers               | 1310720   | 8kB    | configuration file
>  synchronous_commit           | on        | [NULL] | default
>  wal_buffers                  | 256       | 8kB    | configuration file
>  wal_sync_method              | fdatasync | [NULL] | default
>  wal_writer_delay             | 200       | ms     | default
>  work_mem                     | 1024      | kB     | default
> ------------------------------+-----------+--------+-------------------
>
> Some stats:
> $ free -m
>              total  used    free   shared  buffers  cached
> Mem:         64413  63764    649        0       37   60577
> -/+ buffers/cache:   3148   61264
> Swap:         8191    333    7858
>
> iostat shows nearly all the time ~100% io utilization of the disc
> serving the pg data / wal files.

That's rather useless value, especially if you don't know details about
the RAID array. With multiple spindles, the array may be 100% utilized
(ratio of time it spent servicing requests) yet it may absorb more.
Imagine a RAID with 2 drives, each 50% utilized. The array may report 100%
utilization yet it's actually 50% utilized ...

>
> I'd suggest the following changes:
>
> (Improve query planning)
> 1) Increase effective_cache_size to 48GB
> 2) Increase work_mem to 10MB (alternatively first activate
> log_temp_files to see if this is really needed
> 3) Reduce random_page_cost to 1
>
> (WAL / I/O)
> 4) Set synchronous_commit=off
> 5) Increase checkpoint_segments to 32
> 6) Increase wal_buffers to 16M
> 7) Add new discs (RAID) for wal files / pg_xlog
>
> (Misc)
> 8) Increase maintainance_work_mem to 1GB
>
> In parallel I'd review statistics like long running queries, index usage
> (which ones can be dropped) etc.

Reviewing long-running stats queries is a good starting point - you need
to find out where the bottleneck is (I/O, CPU, ...) and this may be
helpful.

Dropping unused indexes is quite difficult - most of the time I see the
case with multiple similar indexes, all of them are used but it's possible
to remove some of them with minimal performance impact.

> At first I'd like to try out 1) to 3) as they affect the query planner,
> so that some indices that are not used right now might be used then.

If you don't know where the issue is, it's difficult to give any advices.
But in general, I'd say this

1) setting effective_cache_size to 48G - seems like a good idea, better
match for your environment

2) increasing work_mem - might help, but you should check the slow queries
first (enabling log_temp_files is a good idea)

3) setting random_page_cost is a really bad idea IMHO, especially with
spinners, rather weak controller and unknown details about the array

So do (1), maybe (2) and I'd definitely vote against (3).

Regarding the other options:

4) synchronous_commit=off - well, this may improve the performance, but it
won't fix the underlying issues and it may introduce other
application-level issues (expecting the transaction to be committed etc.)

5) Increase checkpoint_segments to 32 - Do you see a lot of
checkpoint-related warnings in the log? If not, this probably won't fix
anything. If you actually do have issues with checkpoints, I'd recommend
increasing the default checkpoint timeout (eg. to 30 minutes),
significantly increasing the number of segments (e.g. to 64 or more) and
tuning the completion target (e.g. to 0.9).

6) Increase wal_buffers to 16M - may help, but I would not expect a
tremendous improvement.

7) Add new discs (RAID) for wal files / pg_xlog - good idea, moving those
to a separate spindles may help a lot.

> After this change I'd review index usage and clean up those / improve
> queries.
>
> Then, finally I'd test WAL / I/O related changes.

Why do you want to do this last? Chances are that writes are causing many
of the I/O issues (because it needs to actually fsync the data). Tuning
this will improve the general I/O performance etc.

> Do you think this makes sense? Do you see other improvements, or do you
> need some more information?

First of all, find out more about the RAID array. Do some basic I/O tests
(with dd etc.).

Moreover, I've noticed you do have max_connections=2000. That's insanely
high in most cases, unless you're using commit_delay/commit_siblings. A
reasonable value is usually something like "num of cpus + num of drives"
although that's just a rough estimate. But given that you have 16 cores,
I'd expect ~100 or something like that. If you need more, I'd recommend a
pooler (e.g. pgpool).

Tomas


pgsql-performance by date:

Previous
From: Martin Grotzke
Date:
Subject: Several optimization options (config/hardware)
Next
From: Jan Nielsen
Date:
Subject: Re: Configuration Recommendations