Thread: Peformance Tuning Opterons/ Hard Disk Layout

Peformance Tuning Opterons/ Hard Disk Layout

From
John Allgood
Date:
Hello Again

In the below statement you mention putting each database on its own raid
mirror.

"However, sticking with your arrangement, it would seem that you might be
able to get some extra performance if each database is on it's own raid,
since you are fairly likely to have 2 transactions occuring at the same
time, that don't affect eachother (since you wouldn't have any foreign
keys, etc on 2 separate databases.)"

That would take alot of disk drives to accomplish. I was thinking maybe
putting three or four databases on each raid and dividing the heaviest
used databases on each mirrored set. And for each of these sets have its
own mirror for pg_xlog. My question is what is the best way to setup
postgres databases on different disks. I have setup multiple postmasters
on this system as a test. The only problem was configuring each
databases "ie postgresql.conf, pg_hba.conf".  Is there anyway in
postgres to have everything in one cluster and have it seperated onto
multiple drives. Here is a example of what is was thinking about.

MIRROR1 - Database Group 1
MIRROR2 - pg_xlog for database group 1
MIRROR3 - Database Group 2
MIRROR4 - pg_xlog for database group 2
MIRROR5 - Database Group 3
MIRROR6 - pg_xlog for database group 3

This will take about 12 disk drives. I have a 14 bay Storage Bay I can
use two of the drives for hotspare's.




Thanks

John Allgood - ESC
Systems Administrator

Re: Peformance Tuning Opterons/ Hard Disk Layout

From
"Joel Fradkin"
Date:
I am no expert, but have been asking them a bunch and I think your missing a
key concept.

The data is best on several drives.
I could be completely off, but if I understood (I just finished doing the
same kind of thing minus several databases) you want your WAL on fast drives
in raid 1 and your data (as many drives as you can use) on raid 10 (can be
slower drives , but I saw you already have a bunch of 15k drives).
So you may get best performance just using one database rather then several
smaller ones on mirrored data drives. Keep in mind if you go with ES4 (I am
using AS4) and postgres 8 you can add spindles and move hard hit tables to
their own spindle.

Again I am no expert; just thought I would echo what I was informed.
I ended up using 2 15k drives in raid 1 for my WAL and 4 10k drives for my
data in raid 10. I ended up using links to these from the original install
of postgres on the raid 5, 4 15k drives inside the server itself. I believe
this gives me three separate raid arrays for my install with logs and such
on the raid 5, data on the raid 10 and wal on the raid 1. I am in the
testing and conversion phase and have found it very fast. I used a 4
processor Dell 6550, but think from what I have been told your computer
would have been a better choice (CPU wise). I am not using fibre but do have
a 14 drive powervault which I split to have the 15k's on one side and the
10k's on the other. So I am using both channels of the controller. I have
been told for me to get best performance I should add as many 10k drives to
my data array as I can (but this was all I had in my budget). I have room
for 3 more drives on that side of the powervault.

Best of luck on your project.

Joel


Re: Peformance Tuning Opterons/ Hard Disk Layout

From
John Arbash Meinel
Date:
John Allgood wrote:

> Hello Again
>
> In the below statement you mention putting each database on its own
> raid mirror.
>
> "However, sticking with your arrangement, it would seem that you might be
> able to get some extra performance if each database is on it's own raid,
> since you are fairly likely to have 2 transactions occuring at the same
> time, that don't affect eachother (since you wouldn't have any foreign
> keys, etc on 2 separate databases.)"
>
> That would take alot of disk drives to accomplish. I was thinking
> maybe putting three or four databases on each raid and dividing the
> heaviest used databases on each mirrored set. And for each of these
> sets have its own mirror for pg_xlog. My question is what is the best
> way to setup postgres databases on different disks. I have setup
> multiple postmasters on this system as a test. The only problem was
> configuring each databases "ie postgresql.conf, pg_hba.conf".  Is
> there anyway in postgres to have everything in one cluster and have it
> seperated onto multiple drives. Here is a example of what is was
> thinking about.
>
I think this is something that you would have to try and see what works.
My first feeling is that 8-disks in RAID10 is better than 4 sets of RAID1.

> MIRROR1 - Database Group 1
> MIRROR2 - pg_xlog for database group 1
> MIRROR3 - Database Group 2
> MIRROR4 - pg_xlog for database group 2
> MIRROR5 - Database Group 3
> MIRROR6 - pg_xlog for database group 3
>
> This will take about 12 disk drives. I have a 14 bay Storage Bay I can
> use two of the drives for hotspare's.
>
I would have all of them in 1 database cluster, which means they are all
served by the same postgres daemon. Which I believe means that they all
use the same pg_xlog. That means you only need 1 raid for pg_xlog,
though I would make it a 4-drive RAID10. (RAID1 is redundant, but
actually slower on writes, you need the 0 to speed up reading/writing, I
could be wrong).

I believe you can still split each database onto it's own raid later on
if you find that you need to.

So this is my proposal 1:
OS RAID (sounds like this is not in the Storage Bay).
4-drives RAID10 pg_xlog
8-drives RAID10 database cluster
2-drives Hot spares / RAID1

If you feel like you want to partition your databases, you could also do
proposal 2:
4-drives RAID10 pg_xlog
4-drives RAID10 databases master + 1-4
4-drives RAID10 databases 5-9
2-drives hotspare / RAID1

If you think partitioning is better than striping, you could do proposal 3:
4-drives RAID10 pg_xlog
2-drives RAID1 master database
2-drives RAID1 databases 1,2,3
2-drives RAID1 databases 4,5
2-drives RAID1 databases 6,7
2-drives RAID1 databases 8,9

There are certainly a lot of potential arrangements here, and it's not
like I've tried a lot of them. pg_xlog seems like a big enough
bottleneck that it would be good to put it on it's own RAID10, to make
it as fast as possible.

It also depends a lot on whether you will be write heavy/read heavy,
etc. RAID5 works quite well for reading, very poor for writing. But if
the only reason to have the master database is to perform read heavy
queries, and all the writing is done at night in bulk fashion with
careful tuning to avoid saturation, then maybe you would want to put the
master database on a RAID5 so that you can get extra disk space.
You could do proposal 4:
4-drive RAID10 pg_xlog
4-drive RAID5 master db
2-drive RAID1 dbs 1-3
2-drive RAID1 dbs 4-6
2-drive RAID1 dbs 7-9

You might also do some testing and find that pg_xlog doesn't deserve
it's own 4 disks, and they would be better off in the bulk tables.

Unfortunately a lot of this would come down to performance testing on
your dataset, with a real data load. Which isn't very easy to do.
I personally like the simplicity of proposal 1.

John
=:->

>
> Thanks
>
> John Allgood - ESC
> Systems Administrator



Attachment

Re: Peformance Tuning Opterons/ Hard Disk Layout

From
"Vig, Sandor (G/FI-2)"
Date:
-huh-

A lot of High-Tech ideas. But there is another way:
Simply measure the current IO load (pro DB if you must),
make an estimation how it could change in the future
(max. 3 years) and make a worst case scenario.

Than you should make a new array each time the
worst case scenario hits the IO bottleneck of your
config. (I mean the random read/write bandwith of
a raid array) than make so many raid arrays you
need. It's just that simple. :-)))

You should/must redesign it in every 3 years, that's for sure.

Vig Sándor




-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of John Arbash
Meinel
Sent: Thursday, February 24, 2005 8:41 PM
To: John Allgood
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout


John Allgood wrote:

> Hello Again
>
> In the below statement you mention putting each database on its own
> raid mirror.
>
> "However, sticking with your arrangement, it would seem that you might be
> able to get some extra performance if each database is on it's own raid,
> since you are fairly likely to have 2 transactions occuring at the same
> time, that don't affect eachother (since you wouldn't have any foreign
> keys, etc on 2 separate databases.)"
>
> That would take alot of disk drives to accomplish. I was thinking
> maybe putting three or four databases on each raid and dividing the
> heaviest used databases on each mirrored set. And for each of these
> sets have its own mirror for pg_xlog. My question is what is the best
> way to setup postgres databases on different disks. I have setup
> multiple postmasters on this system as a test. The only problem was
> configuring each databases "ie postgresql.conf, pg_hba.conf".  Is
> there anyway in postgres to have everything in one cluster and have it
> seperated onto multiple drives. Here is a example of what is was
> thinking about.
>
I think this is something that you would have to try and see what works.
My first feeling is that 8-disks in RAID10 is better than 4 sets of RAID1.

> MIRROR1 - Database Group 1
> MIRROR2 - pg_xlog for database group 1
> MIRROR3 - Database Group 2
> MIRROR4 - pg_xlog for database group 2
> MIRROR5 - Database Group 3
> MIRROR6 - pg_xlog for database group 3
>
> This will take about 12 disk drives. I have a 14 bay Storage Bay I can
> use two of the drives for hotspare's.
>
I would have all of them in 1 database cluster, which means they are all
served by the same postgres daemon. Which I believe means that they all
use the same pg_xlog. That means you only need 1 raid for pg_xlog,
though I would make it a 4-drive RAID10. (RAID1 is redundant, but
actually slower on writes, you need the 0 to speed up reading/writing, I
could be wrong).

I believe you can still split each database onto it's own raid later on
if you find that you need to.

So this is my proposal 1:
OS RAID (sounds like this is not in the Storage Bay).
4-drives RAID10 pg_xlog
8-drives RAID10 database cluster
2-drives Hot spares / RAID1

If you feel like you want to partition your databases, you could also do
proposal 2:
4-drives RAID10 pg_xlog
4-drives RAID10 databases master + 1-4
4-drives RAID10 databases 5-9
2-drives hotspare / RAID1

If you think partitioning is better than striping, you could do proposal 3:
4-drives RAID10 pg_xlog
2-drives RAID1 master database
2-drives RAID1 databases 1,2,3
2-drives RAID1 databases 4,5
2-drives RAID1 databases 6,7
2-drives RAID1 databases 8,9

There are certainly a lot of potential arrangements here, and it's not
like I've tried a lot of them. pg_xlog seems like a big enough
bottleneck that it would be good to put it on it's own RAID10, to make
it as fast as possible.

It also depends a lot on whether you will be write heavy/read heavy,
etc. RAID5 works quite well for reading, very poor for writing. But if
the only reason to have the master database is to perform read heavy
queries, and all the writing is done at night in bulk fashion with
careful tuning to avoid saturation, then maybe you would want to put the
master database on a RAID5 so that you can get extra disk space.
You could do proposal 4:
4-drive RAID10 pg_xlog
4-drive RAID5 master db
2-drive RAID1 dbs 1-3
2-drive RAID1 dbs 4-6
2-drive RAID1 dbs 7-9

You might also do some testing and find that pg_xlog doesn't deserve
it's own 4 disks, and they would be better off in the bulk tables.

Unfortunately a lot of this would come down to performance testing on
your dataset, with a real data load. Which isn't very easy to do.
I personally like the simplicity of proposal 1.

John
=:->

>
> Thanks
>
> John Allgood - ESC
> Systems Administrator



The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer.

Re: Peformance Tuning Opterons/ Hard Disk Layout

From
Markus Schaber
Date:
Hi, John,

John Allgood schrieb:
> My question is what is the best way to setup
> postgres databases on different disks. I have setup multiple postmasters
> on this system as a test. The only problem was configuring each
> databases "ie postgresql.conf, pg_hba.conf".  Is there anyway in
> postgres to have everything in one cluster and have it seperated onto
> multiple drives.

Using PostgreSQL 8.0, the newly introduced "tablespaces" solve all this:
http://www.postgresql.org/docs/8.0/interactive/manage-ag-tablespaces.html

Using PostgreSQL 7.4, you can relatively easy create single databases on
different drives. However, separating out single tables or indices
involves some black symlink magic. See google and
http://www.postgresql.org/docs/7.4/interactive/manage-ag-alternate-locs.html

HTH,
Markus
--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com