Re: Configuration Recommendations - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: Configuration Recommendations
Msg-id 4F96FFE1.2030809@peak6.com
In response to Configuration Recommendations  (Jan Nielsen <jan.sture.nielsen@gmail.com>)
Responses Re: Configuration Recommendations  (Jan Nielsen <jan.sture.nielsen@gmail.com>)
Re: Configuration Recommendations  (John Lister <john.lister@kickstone.co.uk>)
List pgsql-performance
On 04/23/2012 09:56 PM, Jan Nielsen wrote:

> The new hardware for the 50GB PG 9.0 machine is:
> * 24 cores across 2 sockets
> * 64 GB RAM
> * 10 x 15k SAS drives on SAN
> * 1 x 15k SAS drive local
> * CentOS 6.2 (2.6.32 kernel)

This is a pretty good build. Nice and middle-of-the-road for current
hardware. I think it's probably relevant what your "24 cores across 2
sockets" are, though. Then again, based on the 24-cores, I have to
assume you've got hex-core Xeons of some sort, with hyperthreading. That
suggests a higher end Sandy Bridge Xeon, like the X5645 or higher. If
that's the case, you're in good hands.

As a note, though... make sure you enable Turbo and other performance
settings (disable power-down of unused CPUs, etc.) in the BIOS when
setting this up. We found that the CPU defaults did not allow the OS to
control frequency scaling and were far too aggressive in cycling down
cores, such that cycling them back up had a non-zero cost. We saw roughly
a 20% improvement by forcing the CPUs into full online performance mode.
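
On the OS side you can also check and pin the cpufreq governor. This is
just a sketch; the BIOS option names vary by vendor, and the sysfs paths
below only exist if a frequency-scaling driver is loaded:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"    # force full-speed operation on every core
done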

> We are considering the following drive allocations:
>
> * 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG data
> * 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG indexes
> * 2 x 15k SAS drives, XFS, RAID 1 on SAN for PG xlog
> * 1 x 15k SAS drive, XFS, on local storage for OS

Please don't do this. If you have the system you just described, give
yourself an 8x RAID10, and the 2x RAID1. I've found that your indexes
will generally be about 1/3 to 1/2 the total size of your database. So,
not only does your data partition lose read spindles, but you've wasted
1/2 to 2/3 of your active drive space. This may not be a concern based
on your data growth curves, but it could be.
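
If you want to sanity-check that ratio against your existing 50GB
database before settling on a layout, something like this works (the
database name is just a placeholder):

psql -d yourdb -c "SELECT
  pg_size_pretty(sum(pg_relation_size(indexrelid))::bigint) AS index_size,
  pg_size_pretty(pg_database_size(current_database())) AS db_size
FROM pg_index;"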

In addition, add another OS drive and put it into a RAID-1. If you have
server-class hardware, you'll want that extra drive. I'm frankly
surprised you were even able to acquire a dual Xeon class server without
a RAID-1 for OS data by default.

I'm not sure if you've done metrics or not, but XFS performance is
highly dependent on your init and mount options. I can give you some
guidelines there, but one of the major changes is that the Linux 3.x
kernels have some impressive XFS performance improvements you won't see
using CentOS 6.2. Metadata handling in particular has undergone a massive
upgrade that drastically improves its parallel scalability.
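
As a purely illustrative starting point (the device path and the stripe
geometry are placeholders you'd match to your actual RAID layout, not a
tested recommendation for your SAN), the kind of options worth
benchmarking look like this:

mkfs.xfs -f -d su=64k,sw=4 -l size=128m /dev/mapper/pg_data
mount -o noatime,nodiratime,logbufs=8,logbsize=256k,inode64 \
    /dev/mapper/pg_data /db/pgdata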

If possible, you might consider the new Ubuntu 12.04 LTS that's coming
out soon. It should have the newer XFS performance. If not, consider
injecting a newer kernel into the CentOS 6.2 install. And again, testing
is the only way to know for sure.

And test with pgbench, if possible. I used this to get our XFS init and
mount options, along with other OS/kernel settings. You can have very
different performance numbers from dd/bonnie than from the access patterns
of real DB usage. As a hint, before you run any of these tests, write a
'3' to /proc/sys/vm/drop_caches and restart your PG instance.
You want to test your drives, not your memory. :)
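
A minimal sketch of that routine (the data directory, database name, and
scale/client counts are placeholders to adjust for your hardware):

sync
echo 3 > /proc/sys/vm/drop_caches    # drop the OS page cache
pg_ctl -D /db/pgdata restart         # restart PG to clear shared_buffers
createdb bench
pgbench -i -s 1000 bench             # scale 1000 is roughly 15GB of data
pgbench -c 24 -j 4 -T 600 bench      # 24 clients, 4 threads, 10 minutes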

> kernel.shmall = 4,294,967,296 (commas added for clarity)
> kernel.shmmax = 68,719,476,736 (commas added for clarity)
> kernel.sem = 250 32000 32 128
> vm.swappiness = 0
> dirty_ratio = 10
> dirty_background_ratio = 5

Good. Though you might consider lowering dirty_background_ratio. At that
setting, it won't even try to write out data until you have about 3GB of
dirty pages. Even high-end disk controllers only have 1GB of local
capacitor-backed cache. If you really do have a good SAN, it probably
has more than that, but try to induce a high-turnover database test to
see what happens during heavy IO. For instance, a heavy, long-running
pgbench should trigger several checkpoints and also flood the local write cache.
When that happens, monitor /proc/meminfo. Like this:

grep -A1 Dirty /proc/meminfo

That will tell you how much of your memory is dirty, but the 'Writeback'
entry is what you care about. If you see that as a non-zero value for
more than one consecutive check, you've saturated your write bandwidth
to the point performance will suffer. But the only way you can really
know any of this is with testing. Some SANs scale incredibly well to
large pool flushes, and others don't.
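
If testing does show Writeback staying non-zero, one option is to lower
the background flush threshold so the kernel starts writing before the
controller cache overflows. A sketch, with an illustrative value rather
than a tuned one:

echo 'vm.dirty_background_ratio = 1' >> /etc/sysctl.conf   # ~650MB on 64GB RAM
sysctl -p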

Also, make iostat your friend. Particularly with the -x option. During
your testing, keep one of these running in the background for the
devices on your SAN. Watch your %util column in particular. Graph it, if
you can. You can almost build a complete performance profile for
different workloads before you put a single byte of real data on this
hardware.
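
For example, this keeps extended per-device stats scrolling every five
seconds while a test runs (the log file name is arbitrary):

iostat -x 5 | tee san_iostat.log    # watch the %util column for your SAN LUNs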

> If there are "obviously correct" choices in PG configuration, this would
> be tremendously helpful information to me. I'm planning on using pgbench
> to test the configuration options.

You sound like you've read up on this quite a bit. Greg's book is a very
good thing to have and learn from. It'll cover all the basics about the
postgresql.conf file. I don't see how I could add much to that, so just
pay attention to what he says. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@peak6.com
