Thread: Peformance Tuning Opterons/ Hard Disk Layout
Hello Again In the below statement you mention putting each database on its own raid mirror. "However, sticking with your arrangement, it would seem that you might be able to get some extra performance if each database is on it's own raid, since you are fairly likely to have 2 transactions occuring at the same time, that don't affect eachother (since you wouldn't have any foreign keys, etc on 2 separate databases.)" That would take alot of disk drives to accomplish. I was thinking maybe putting three or four databases on each raid and dividing the heaviest used databases on each mirrored set. And for each of these sets have its own mirror for pg_xlog. My question is what is the best way to setup postgres databases on different disks. I have setup multiple postmasters on this system as a test. The only problem was configuring each databases "ie postgresql.conf, pg_hba.conf". Is there anyway in postgres to have everything in one cluster and have it seperated onto multiple drives. Here is a example of what is was thinking about. MIRROR1 - Database Group 1 MIRROR2 - pg_xlog for database group 1 MIRROR3 - Database Group 2 MIRROR4 - pg_xlog for database group 2 MIRROR5 - Database Group 3 MIRROR6 - pg_xlog for database group 3 This will take about 12 disk drives. I have a 14 bay Storage Bay I can use two of the drives for hotspare's. Thanks John Allgood - ESC Systems Administrator
I am no expert, but have been asking them a bunch and I think your missing a key concept. The data is best on several drives. I could be completely off, but if I understood (I just finished doing the same kind of thing minus several databases) you want your WAL on fast drives in raid 1 and your data (as many drives as you can use) on raid 10 (can be slower drives , but I saw you already have a bunch of 15k drives). So you may get best performance just using one database rather then several smaller ones on mirrored data drives. Keep in mind if you go with ES4 (I am using AS4) and postgres 8 you can add spindles and move hard hit tables to their own spindle. Again I am no expert; just thought I would echo what I was informed. I ended up using 2 15k drives in raid 1 for my WAL and 4 10k drives for my data in raid 10. I ended up using links to these from the original install of postgres on the raid 5, 4 15k drives inside the server itself. I believe this gives me three separate raid arrays for my install with logs and such on the raid 5, data on the raid 10 and wal on the raid 1. I am in the testing and conversion phase and have found it very fast. I used a 4 processor Dell 6550, but think from what I have been told your computer would have been a better choice (CPU wise). I am not using fibre but do have a 14 drive powervault which I split to have the 15k's on one side and the 10k's on the other. So I am using both channels of the controller. I have been told for me to get best performance I should add as many 10k drives to my data array as I can (but this was all I had in my budget). I have room for 3 more drives on that side of the powervault. Best of luck on your project. Joel
John Allgood wrote: > Hello Again > > In the below statement you mention putting each database on its own > raid mirror. > > "However, sticking with your arrangement, it would seem that you might be > able to get some extra performance if each database is on it's own raid, > since you are fairly likely to have 2 transactions occuring at the same > time, that don't affect eachother (since you wouldn't have any foreign > keys, etc on 2 separate databases.)" > > That would take alot of disk drives to accomplish. I was thinking > maybe putting three or four databases on each raid and dividing the > heaviest used databases on each mirrored set. And for each of these > sets have its own mirror for pg_xlog. My question is what is the best > way to setup postgres databases on different disks. I have setup > multiple postmasters on this system as a test. The only problem was > configuring each databases "ie postgresql.conf, pg_hba.conf". Is > there anyway in postgres to have everything in one cluster and have it > seperated onto multiple drives. Here is a example of what is was > thinking about. > I think this is something that you would have to try and see what works. My first feeling is that 8-disks in RAID10 is better than 4 sets of RAID1. > MIRROR1 - Database Group 1 > MIRROR2 - pg_xlog for database group 1 > MIRROR3 - Database Group 2 > MIRROR4 - pg_xlog for database group 2 > MIRROR5 - Database Group 3 > MIRROR6 - pg_xlog for database group 3 > > This will take about 12 disk drives. I have a 14 bay Storage Bay I can > use two of the drives for hotspare's. > I would have all of them in 1 database cluster, which means they are all served by the same postgres daemon. Which I believe means that they all use the same pg_xlog. That means you only need 1 raid for pg_xlog, though I would make it a 4-drive RAID10. (RAID1 is redundant, but actually slower on writes, you need the 0 to speed up reading/writing, I could be wrong). I believe you can still split each database onto it's own raid later on if you find that you need to. So this is my proposal 1: OS RAID (sounds like this is not in the Storage Bay). 4-drives RAID10 pg_xlog 8-drives RAID10 database cluster 2-drives Hot spares / RAID1 If you feel like you want to partition your databases, you could also do proposal 2: 4-drives RAID10 pg_xlog 4-drives RAID10 databases master + 1-4 4-drives RAID10 databases 5-9 2-drives hotspare / RAID1 If you think partitioning is better than striping, you could do proposal 3: 4-drives RAID10 pg_xlog 2-drives RAID1 master database 2-drives RAID1 databases 1,2,3 2-drives RAID1 databases 4,5 2-drives RAID1 databases 6,7 2-drives RAID1 databases 8,9 There are certainly a lot of potential arrangements here, and it's not like I've tried a lot of them. pg_xlog seems like a big enough bottleneck that it would be good to put it on it's own RAID10, to make it as fast as possible. It also depends a lot on whether you will be write heavy/read heavy, etc. RAID5 works quite well for reading, very poor for writing. But if the only reason to have the master database is to perform read heavy queries, and all the writing is done at night in bulk fashion with careful tuning to avoid saturation, then maybe you would want to put the master database on a RAID5 so that you can get extra disk space. You could do proposal 4: 4-drive RAID10 pg_xlog 4-drive RAID5 master db 2-drive RAID1 dbs 1-3 2-drive RAID1 dbs 4-6 2-drive RAID1 dbs 7-9 You might also do some testing and find that pg_xlog doesn't deserve it's own 4 disks, and they would be better off in the bulk tables. Unfortunately a lot of this would come down to performance testing on your dataset, with a real data load. Which isn't very easy to do. I personally like the simplicity of proposal 1. John =:-> > > Thanks > > John Allgood - ESC > Systems Administrator
Attachment
-huh- A lot of High-Tech ideas. But there is another way: Simply measure the current IO load (pro DB if you must), make an estimation how it could change in the future (max. 3 years) and make a worst case scenario. Than you should make a new array each time the worst case scenario hits the IO bottleneck of your config. (I mean the random read/write bandwith of a raid array) than make so many raid arrays you need. It's just that simple. :-))) You should/must redesign it in every 3 years, that's for sure. Vig Sándor -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of John Arbash Meinel Sent: Thursday, February 24, 2005 8:41 PM To: John Allgood Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout John Allgood wrote: > Hello Again > > In the below statement you mention putting each database on its own > raid mirror. > > "However, sticking with your arrangement, it would seem that you might be > able to get some extra performance if each database is on it's own raid, > since you are fairly likely to have 2 transactions occuring at the same > time, that don't affect eachother (since you wouldn't have any foreign > keys, etc on 2 separate databases.)" > > That would take alot of disk drives to accomplish. I was thinking > maybe putting three or four databases on each raid and dividing the > heaviest used databases on each mirrored set. And for each of these > sets have its own mirror for pg_xlog. My question is what is the best > way to setup postgres databases on different disks. I have setup > multiple postmasters on this system as a test. The only problem was > configuring each databases "ie postgresql.conf, pg_hba.conf". Is > there anyway in postgres to have everything in one cluster and have it > seperated onto multiple drives. Here is a example of what is was > thinking about. > I think this is something that you would have to try and see what works. My first feeling is that 8-disks in RAID10 is better than 4 sets of RAID1. > MIRROR1 - Database Group 1 > MIRROR2 - pg_xlog for database group 1 > MIRROR3 - Database Group 2 > MIRROR4 - pg_xlog for database group 2 > MIRROR5 - Database Group 3 > MIRROR6 - pg_xlog for database group 3 > > This will take about 12 disk drives. I have a 14 bay Storage Bay I can > use two of the drives for hotspare's. > I would have all of them in 1 database cluster, which means they are all served by the same postgres daemon. Which I believe means that they all use the same pg_xlog. That means you only need 1 raid for pg_xlog, though I would make it a 4-drive RAID10. (RAID1 is redundant, but actually slower on writes, you need the 0 to speed up reading/writing, I could be wrong). I believe you can still split each database onto it's own raid later on if you find that you need to. So this is my proposal 1: OS RAID (sounds like this is not in the Storage Bay). 4-drives RAID10 pg_xlog 8-drives RAID10 database cluster 2-drives Hot spares / RAID1 If you feel like you want to partition your databases, you could also do proposal 2: 4-drives RAID10 pg_xlog 4-drives RAID10 databases master + 1-4 4-drives RAID10 databases 5-9 2-drives hotspare / RAID1 If you think partitioning is better than striping, you could do proposal 3: 4-drives RAID10 pg_xlog 2-drives RAID1 master database 2-drives RAID1 databases 1,2,3 2-drives RAID1 databases 4,5 2-drives RAID1 databases 6,7 2-drives RAID1 databases 8,9 There are certainly a lot of potential arrangements here, and it's not like I've tried a lot of them. pg_xlog seems like a big enough bottleneck that it would be good to put it on it's own RAID10, to make it as fast as possible. It also depends a lot on whether you will be write heavy/read heavy, etc. RAID5 works quite well for reading, very poor for writing. But if the only reason to have the master database is to perform read heavy queries, and all the writing is done at night in bulk fashion with careful tuning to avoid saturation, then maybe you would want to put the master database on a RAID5 so that you can get extra disk space. You could do proposal 4: 4-drive RAID10 pg_xlog 4-drive RAID5 master db 2-drive RAID1 dbs 1-3 2-drive RAID1 dbs 4-6 2-drive RAID1 dbs 7-9 You might also do some testing and find that pg_xlog doesn't deserve it's own 4 disks, and they would be better off in the bulk tables. Unfortunately a lot of this would come down to performance testing on your dataset, with a real data load. Which isn't very easy to do. I personally like the simplicity of proposal 1. John =:-> > > Thanks > > John Allgood - ESC > Systems Administrator The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.
Hi, John, John Allgood schrieb: > My question is what is the best way to setup > postgres databases on different disks. I have setup multiple postmasters > on this system as a test. The only problem was configuring each > databases "ie postgresql.conf, pg_hba.conf". Is there anyway in > postgres to have everything in one cluster and have it seperated onto > multiple drives. Using PostgreSQL 8.0, the newly introduced "tablespaces" solve all this: http://www.postgresql.org/docs/8.0/interactive/manage-ag-tablespaces.html Using PostgreSQL 7.4, you can relatively easy create single databases on different drives. However, separating out single tables or indices involves some black symlink magic. See google and http://www.postgresql.org/docs/7.4/interactive/manage-ag-alternate-locs.html HTH, Markus -- markus schaber | dipl. informatiker logi-track ag | rennweg 14-16 | ch 8001 zürich phone +41-43-888 62 52 | fax +41-43-888 62 53 mailto:schabios@logi-track.com | www.logi-track.com