Thread: Several optimization options (config/hardware)

Several optimization options (config/hardware)

From: Martin Grotzke
Hi,

we want to see if we can gain better performance with our postgresql
database. In the last year the amount of data grew from ~25G to now
~140G, and we're currently developing a new feature that needs to get
data faster from the database. The system is both read and write heavy.

First I want to give you an overview of the hardware, software and
configuration, and the changes that I think we could check out. I'd be
very happy if you could review them and tell me if any of them are
nonsense.

Hardware:
- CPU: 4x4 Cores Intel Xeon L5630  @ 2.13GHz
- RAM: 64GB
- RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
  (I don't know the actual number of discs)
- A single partition for data and wal-files

Software
- RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
- postgresql90-server-9.0.6-1PGDG.rhel6.x86_64

Configuration (selected from settings)
------------------------------+-----------+--------+-------------------
             name             |  setting  |  unit  |       source
------------------------------+-----------+--------+-------------------
 autovacuum                   | on        | [NULL] | configuration file
 checkpoint_completion_target | 0.5       | [NULL] | default
 checkpoint_segments          | 16        |        | configuration file
 checkpoint_timeout           | 300       | s      | default
 commit_delay                 | 0         |        | default
 default_statistics_target    | 100       |        | default
 effective_cache_size         | 16384     | 8kB    | default
 fsync                        | on        | [NULL] | default
 log_min_duration_statement   | 250       | ms     | configuration file
 log_temp_files               | -1        | kB     | default
 maintenance_work_mem         | 16384     | kB     | default
 max_connections              | 2000      |        | configuration file
 random_page_cost             | 4         | [NULL] | default
 shared_buffers               | 1310720   | 8kB    | configuration file
 synchronous_commit           | on        | [NULL] | default
 wal_buffers                  | 256       | 8kB    | configuration file
 wal_sync_method              | fdatasync | [NULL] | default
 wal_writer_delay             | 200       | ms     | default
 work_mem                     | 1024      | kB     | default
------------------------------+-----------+--------+-------------------

Some stats:
$ free -m
             total  used    free   shared  buffers  cached
Mem:         64413  63764    649        0       37   60577
-/+ buffers/cache:   3148   61264
Swap:         8191    333    7858

iostat shows ~100% I/O utilization of the disc serving the pg data /
wal files nearly all the time.
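
For reference, we get those numbers from the extended per-device stats,
e.g. like this (the 5-second interval is arbitrary), looking at the
%util column for sda / VG01-www:

$ iostat -dxN 5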

I'd suggest the following changes:

(Improve query planning)
1) Increase effective_cache_size to 48GB
2) Increase work_mem to 10MB (alternatively, first activate
log_temp_files to see if this is really needed)
3) Reduce random_page_cost to 1

(WAL / I/O)
4) Set synchronous_commit=off
5) Increase checkpoint_segments to 32
6) Increase wal_buffers to 16M
7) Add new discs (RAID) for wal files / pg_xlog

(Misc)
8) Increase maintenance_work_mem to 1GB
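
As a postgresql.conf sketch, points 1) to 8) would look roughly like
this (same values as proposed above; a starting point to test, not
tested recommendations):

# query planning
effective_cache_size = 48GB
work_mem = 10MB                # after checking log_temp_files output
random_page_cost = 1.0
# WAL / I/O
synchronous_commit = off
checkpoint_segments = 32
wal_buffers = 16MB
# misc
maintenance_work_mem = 1GB
log_temp_files = 0             # log every temp file, to verify work_mem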

In parallel I'd review statistics like long-running queries and index
usage (which indices can be dropped), etc.
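
For the index usage review, a starting point might be the statistics
collector, e.g. the indexes with the fewest scans ("ourdb" is just a
placeholder for our database name):

$ psql -d ourdb -c "SELECT schemaname, relname, indexrelname, idx_scan
                      FROM pg_stat_user_indexes
                     ORDER BY idx_scan ASC LIMIT 20;"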

First I'd like to try out 1) to 3), as they affect the query planner,
so that some indices that are not used right now might get used.

After this change I'd review index usage and clean up those / improve
queries.

Then, finally I'd test WAL / I/O related changes.

Do you think this makes sense? Do you see other improvements, or do you
need some more information?

Thanx in advance,
cheers,
Martin



Re: Several optimization options (config/hardware)

From: "Tomas Vondra"
Hi,

On 2 May 2012, 15:19, Martin Grotzke wrote:
> Hi,
>
> we want to see if we can gain better performance with our postgresql
> database. In the last year the amount of data grew from ~25G to now
> ~140G, and we're currently developing a new feature that needs to get
> data faster from the database. The system is both read and write heavy.

What does read/write heavy mean? How much data / how many transactions
do you need to handle, how many clients, etc.?

> First I want to give you an overview of the hardware, software and
> configuration, and the changes that I think we could check out. I'd be
> very happy if you could review them and tell me if any of them are
> nonsense.
>
> Hardware:
> - CPU: 4x4 Cores Intel Xeon L5630  @ 2.13GHz
> - RAM: 64GB
> - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
>   (I don't know the actual number of discs)
> - A single partition for data and wal-files

Have you done any benchmarks with that hardware to verify the
performance? Can you do that now (i.e. stop the database so that you
can run them)?

>
> Software
> - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
> - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64
>
> Configuration (selected from settings)
> ------------------------------+-----------+--------+-------------------
>              name             |  setting  |  unit  |       source
> ------------------------------+-----------+--------+-------------------
>  autovacuum                   | on        | [NULL] | configuration file
>  checkpoint_completion_target | 0.5       | [NULL] | default
>  checkpoint_segments          | 16        |        | configuration file
>  checkpoint_timeout           | 300       | s      | default
>  commit_delay                 | 0         |        | default
>  default_statistics_target    | 100       |        | default
>  effective_cache_size         | 16384     | 8kB    | default
>  fsync                        | on        | [NULL] | default
>  log_min_duration_statement   | 250       | ms     | configuration file
>  log_temp_files               | -1        | kB     | default
>  maintenance_work_mem         | 16384     | kB     | default
>  max_connections              | 2000      |        | configuration file
>  random_page_cost             | 4         | [NULL] | default
>  shared_buffers               | 1310720   | 8kB    | configuration file
>  synchronous_commit           | on        | [NULL] | default
>  wal_buffers                  | 256       | 8kB    | configuration file
>  wal_sync_method              | fdatasync | [NULL] | default
>  wal_writer_delay             | 200       | ms     | default
>  work_mem                     | 1024      | kB     | default
> ------------------------------+-----------+--------+-------------------
>
> Some stats:
> $ free -m
>              total  used    free   shared  buffers  cached
> Mem:         64413  63764    649        0       37   60577
> -/+ buffers/cache:   3148   61264
> Swap:         8191    333    7858
>
> iostat shows ~100% I/O utilization of the disc serving the pg data /
> wal files nearly all the time.

That's a rather useless value, especially if you don't know details
about the RAID array. With multiple spindles, the array may be 100%
utilized (the ratio of time it spent servicing requests), yet it may be
able to absorb more. Imagine a RAID with 2 drives, each 50% utilized:
the array may report 100% utilization, yet it's actually only 50%
utilized ...

>
> I'd suggest the following changes:
>
> (Improve query planning)
> 1) Increase effective_cache_size to 48GB
> 2) Increase work_mem to 10MB (alternatively, first activate
> log_temp_files to see if this is really needed)
> 3) Reduce random_page_cost to 1
>
> (WAL / I/O)
> 4) Set synchronous_commit=off
> 5) Increase checkpoint_segments to 32
> 6) Increase wal_buffers to 16M
> 7) Add new discs (RAID) for wal files / pg_xlog
>
> (Misc)
> 8) Increase maintenance_work_mem to 1GB
>
> In parallel I'd review statistics like long-running queries and index
> usage (which indices can be dropped), etc.

Reviewing long-running queries is a good starting point - you need to
find out where the bottleneck is (I/O, CPU, ...) and this may be
helpful.

Dropping unused indexes is quite difficult - most of the time I see
cases with multiple similar indexes where all of them are used, but
it's possible to remove some of them with minimal performance impact.
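
Exact duplicates (indexes on the very same columns) are the easy part
and can be spotted from the catalog; the "similar" ones need manual
review. A sketch (this ignores opclasses and partial/expression index
details, so treat hits as candidates only):

SELECT indrelid::regclass AS tbl,
       array_agg(indexrelid::regclass) AS duplicate_indexes
  FROM pg_index
 GROUP BY indrelid, indkey
HAVING count(*) > 1;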

> First I'd like to try out 1) to 3), as they affect the query planner,
> so that some indices that are not used right now might get used.

If you don't know where the issue is, it's difficult to give any
advice. But in general, I'd say this:

1) setting effective_cache_size to 48G - seems like a good idea, a
better match for your environment

2) increasing work_mem - might help, but you should check the slow
queries first (enabling log_temp_files is a good idea - see the snippet
below)

3) setting random_page_cost to 1 is a really bad idea IMHO, especially
with spinners, a rather weak controller and unknown details about the
array

So do (1), maybe (2) and I'd definitely vote against (3).
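
For (2), log_temp_files can be enabled without a restart - set it in
postgresql.conf and reload (the data directory path is a placeholder):

log_temp_files = 0    # log all temp files together with their size

$ pg_ctl reload -D /path/to/data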

Regarding the other options:

4) synchronous_commit=off - well, this may improve performance, but it
won't fix the underlying issues and it may introduce other
application-level issues (code expecting a committed transaction to be
durable, etc.)

5) Increase checkpoint_segments to 32 - Do you see a lot of
checkpoint-related warnings in the log? If not, this probably won't fix
anything. If you actually do have issues with checkpoints, I'd
recommend increasing the checkpoint timeout (e.g. to 30 minutes),
significantly increasing the number of segments (e.g. to 64 or more)
and tuning the completion target (e.g. to 0.9) - see the sketch after
point 7.

6) Increase wal_buffers to 16M - may help, but I would not expect a
tremendous improvement.

7) Add new discs (RAID) for wal files / pg_xlog - good idea, moving
those to separate spindles may help a lot.
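
The checkpoint tuning from point 5 as a postgresql.conf sketch (the
concrete numbers are just the examples above, to be validated against
the logs):

checkpoint_timeout = 30min
checkpoint_segments = 64
checkpoint_completion_target = 0.9
log_checkpoints = on    # to see how often checkpoints actually happen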

> After this change I'd review index usage and clean up those / improve
> queries.
>
> Then, finally I'd test WAL / I/O related changes.

Why do you want to do this last? Chances are that writes are causing
many of the I/O issues (because the database needs to actually fsync
the data). Tuning this will improve the general I/O performance etc.

> Do you think this makes sense? Do you see other improvements, or do you
> need some more information?

First of all, find out more about the RAID array, and do some basic I/O
tests (with dd etc.).
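
A minimal dd sketch for sequential throughput (the path is a
placeholder on the same volume as the PG data; oflag/iflag=direct
bypasses the page cache so you measure the array, not RAM):

# ~10GB write, then read it back:
$ dd if=/dev/zero of=/path/on/that/volume/testfile bs=8k count=1310720 oflag=direct
$ dd if=/path/on/that/volume/testfile of=/dev/null bs=8k iflag=direct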

Moreover, I've noticed you have max_connections=2000. That's insanely
high in most cases, unless you're using commit_delay/commit_siblings. A
reasonable value is usually something like "num of cpus + num of
drives", although that's just a rough estimate. Given that you have 16
cores, I'd expect ~100 or something like that. If you need more, I'd
recommend a pooler (e.g. pgpool).
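
To see how many of those 2000 connections are actually in use before
sizing a pool, a quick check would be something like:

$ psql -c "SELECT count(*) FROM pg_stat_activity;"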

Tomas


Re: Several optimization options (config/hardware)

From: "Albe Laurenz"
Martin Grotzke wrote:
> we want to see if we can gain better performance with our postgresql
> database. In the last year the amount of data grew from ~25G to now
> ~140G, and we're currently developing a new feature that needs to get
> data faster from the database. The system is both read and write
> heavy.
>
> First I want to give you an overview of the hardware, software and
> configuration, and the changes that I think we could check out. I'd be
> very happy if you could review them and tell me if any of them are
> nonsense.
>
> Hardware:
> - CPU: 4x4 Cores Intel Xeon L5630  @ 2.13GHz
> - RAM: 64GB
> - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
>   (I don't know the actual number of discs)
> - A single partition for data and wal-files
>
> Software
> - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
> - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64

You could try different kernel I/O elevators and see if that improves
things.

I have had good experiences with elevator=deadline and elevator=noop.

Yours,
Laurenz Albe

Re: Several optimization options (config/hardware)

From: Martin Grotzke
Hi Laurenz,

On 05/03/2012 09:26 AM, Albe Laurenz wrote:
> Martin Grotzke wrote:
>> we want to see if we can gain better performance with our postgresql
>> database. In the last year the amount of data grew from ~25G to now
>> ~140G, and we're currently developing a new feature that needs to get
>> data faster from the database. The system is both read and write
>> heavy.
>>
>> First I want to give you an overview of the hardware, software and
>> configuration, and the changes that I think we could check out. I'd
>> be very happy if you could review them and tell me if any of them are
>> nonsense.
>>
>> Hardware:
>> - CPU: 4x4 Cores Intel Xeon L5630  @ 2.13GHz
>> - RAM: 64GB
>> - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
>>   (I don't know the actual number of discs)
>> - A single partition for data and wal-files
>>
>> Software
>> - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
>> - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64
>
> You could try different kernel I/O elevators and see if that improves
> things.
>
> I have had good experiences with elevator=deadline and elevator=noop.
Ok, great info.

Honestly, I'm not sure which device to look at to check the current
configuration.

mount/fstab shows the device /dev/mapper/VG01-www for the relevant
partition. When I check iostat, high utilization is reported for the
devices dm-4 and sda (always showing nearly the same numbers for util),
so I suspect that dm-4 is mapped onto sda.

This is the current config:
$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
$ cat /sys/block/dm-4/queue/scheduler
none

Which of them should be changed?
I'll also discuss this with our hosting provider next week; he'll know
what has to be done.

Cheers,
Martin



Re: Several optimization options (config/hardware)

From: "Albe Laurenz"
Martin Grotzke wrote:
>> You could try different kernel I/O elevators and see if that improves
>> things.
>>
>> I have had good experiences with elevator=deadline and elevator=noop.

> Ok, great info.
>
> Honestly, I'm not sure which device to look at to check the current
> configuration.
>
> mount/fstab shows the device /dev/mapper/VG01-www for the relevant
> partition. When I check iostat, high utilization is reported for the
> devices dm-4 and sda (always showing nearly the same numbers for
> util), so I suspect that dm-4 is mapped onto sda.

Use the option -N of "iostat" to see long device names.
You can use "lvm" to figure out the mapping.
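
A few ways to confirm the dm-4 -> sda mapping, assuming the device
names from your mail (the sysfs approach should work without root):

$ ls /sys/block/dm-4/slaves
$ sudo dmsetup deps /dev/mapper/VG01-www
$ sudo lvdisplay -m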

> This is the current config:
> $ cat /sys/block/sda/queue/scheduler
> noop anticipatory deadline [cfq]
> $ cat /sys/block/dm-4/queue/scheduler
> none

Do you mean literal "none" or do you mean that the file is empty?

> Which of them should be changed?
> I'll discuss this also with our hosting provider next week, he'll know
> what has to be done.

I'd just add "elevator=deadline" to the kernel line in /etc/grub.conf
and reboot, at least if it is a dedicated database machine.

But of course you want to change it on the fly first to test - not
knowing the answer to your question, I would change it on both devices
if I could.
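
The on-the-fly change would be something like this (root required, and
it does not survive a reboot):

$ sudo sh -c 'echo deadline > /sys/block/sda/queue/scheduler'
$ cat /sys/block/sda/queue/scheduler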

Yours,
Laurenz Albe

Re: Several optimization options (config/hardware)

From: Martin Grotzke
On 05/04/2012 09:57 AM, Albe Laurenz wrote:
> Martin Grotzke wrote:
>>> You could try different kernel I/O elevators and see if that
>>> improves things.
>>>
>>> I have had good experiences with elevator=deadline and
>>> elevator=noop.
>
>> Ok, great info.
>>
>> Honestly, I'm not sure which device to look at to check the current
>> configuration.
>>
>> mount/fstab shows the device /dev/mapper/VG01-www for the relevant
>> partition. When I check iostat, high utilization is reported for
>> the devices dm-4 and sda (always showing nearly the same numbers
>> for util), so I suspect that dm-4 is mapped onto sda.
>
> Use the option -N of "iostat" to see long device names. You can use
> "lvm" to figure out the mapping.
iostat with -N shows VG01-www for dm-4. For lvm/lvdisplay/dmsetup I get
"Permission denied", as I have no root/sudo permissions. I need to
check this with our hosting provider (hopefully we have a call today).


>> This is the current config:
>> $ cat /sys/block/sda/queue/scheduler
>> noop anticipatory deadline [cfq]
>> $ cat /sys/block/dm-4/queue/scheduler
>> none
>
> Do you mean literal "none" or do you mean that the file is empty?
"none" was the output of `cat /sys/block/dm-4/queue/scheduler`.


>> Which of them should be changed? I'll discuss this also with our
>> hosting provider next week, he'll know what has to be done.
>
> I'd just add "elevator=deadline" to the kernel line in
> /etc/grub.conf and reboot, at least if it is a dedicated database
> machine.
>
> But of course you want to change it on the fly first to test - not
> knowing the answer to your question, I would change it on both
> devices if I could.
Ok, makes sense.

Cheers,
Martin

