Re: SAN vs Internal Disks - Mailing list pgsql-performance
From | Joshua D. Drake |
---|---|
Subject | Re: SAN vs Internal Disks |
Date | |
Msg-id | 46E034F6.6080604@commandprompt.com |
In response to | Re: SAN vs Internal Disks ("Harsh Azad" <harsh.azad@gmail.com>) |
List | pgsql-performance |
Harsh Azad wrote:
> Thanks Scott, we have now requested IBM/EMC to provide test machines.
> Interestingly, since you mentioned the importance of RAID controllers and
> the drivers, we are planning to use CentOS 5 for hosting the DB.
>
> Firstly, I could only find PostgreSQL 8.1.x RPMs for CentOS 5 and could
> not find any RPM for 8.2.4. Is there an 8.2.4 RPM for CentOS 5?

Look under the RHEL section of ftp.postgresql.org

Joshua D. Drake

> Secondly, would investing in Red Hat Enterprise Linux give any performance
> advantage? I know all the SAN boxes are only certified on RHEL and not
> CentOS. Or, since CentOS is similar to RHEL, would it be fine?
>
> Regards,
> Harsh
>
> On 9/6/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On 9/6/07, Harsh Azad <harsh.azad@gmail.com> wrote:
>>> Hi,
>>>
>>> We are currently running our DB on a dual-core, dual-processor 3 GHz
>>> Xeon, 8 GB RAM, 4x SAS 146 GB 15K RPM on RAID 5.
>>>
>>> The current data size is about 50 GB, but we want to purchase the
>>> hardware to scale to about 1 TB, as we think our business will need to
>>> support that much soon.
>>>
>>> - Currently we have about 80% reads and 20% writes.
>>
>> For this type of load you should be running RAID 10, not RAID 5. Or, if
>> you must use RAID 5, use more disks and a battery-backed caching RAID
>> controller known to perform well with RAID 5 and large arrays.
>>
>>> - Currently with this configuration the database is showing signs of
>>> overloading.
>>
>> On I/O or CPU? If you're running out of CPU, then look at increasing CPU
>> horsepower and tuning PostgreSQL. If I/O, then you need to look into a
>> faster I/O subsystem.
>>
>>> - Autovacuum, etc. run on this database; VACUUM FULL runs nightly.
>>
>> Generally speaking, if you need to run VACUUM FULL, you're doing
>> something wrong. Is there a reason you're running VACUUM FULL, or is this
>> just precautionary? VACUUM FULL can bloat your indexes, so you shouldn't
>> run it regularly. Reindexing might be a better choice if you do need to
>> regularly shrink your DB. The better option is to monitor your FSM usage
>> and adjust the FSM / autovacuum settings as necessary.
>>
>>> - Currently CPU load is about 20%, memory utilization is full (but this
>>> is also due to Linux caching disk blocks), and I/O waits are frequent.
>>> - We have a load of about 400 queries per second.
>>
>> What do vmstat et al. say about CPU versus I/O wait?
>>
>>> Now we are considering purchasing our own servers, and in the process we
>>> are facing the usual dilemmas. First I'll list what machine we have
>>> decided to use:
>>>
>>> 2x quad-core Xeon 2.4 GHz (4-way, only 2 populated right now)
>>> 32 GB RAM
>>> OS-only storage - 2x SCSI 146 GB 15K RPM on RAID 1
>>> (data storage discussed below)
>>>
>>> We have already decided to split our database across 3 machines on the
>>> basis of disjoint sets of data, so we will be purchasing three of these
>>> boxes.
>>>
>>> HELP 1: Does something look wrong with the above configuration? I know
>>> there will be small differences between Opteron and Xeon, but do you
>>> think there is anything against going for 2.4 GHz quad Xeons
>>> (Clovertown, I think)?
>>
>> They look like good machines, plenty of memory.
>>
>>> HELP 2: The main confusion is with regard to data storage. We have the
>>> option of going for:
>>>
>>> A: An IBM N3700 SAN box with 12x FC 300 GB disks, partitioned into 3
>>> disks in RAID-4 for WAL/backup and 9 disks in RAID-DP for data, plus 2
>>> hot spares. We are also considering a similar solution from EMC - the
>>> CX310C.
>>>
>>> B: Go for internal or DAS-based storage. Here, for each server we should
>>> be able to have: 2x disks on RAID 1 for logs, 6x disks on RAID 10 for
>>> tablespace1, and 6x disks on RAID 10 for tablespace2. Or maybe 12x disks
>>> on RAID 10 for a single tablespace.
>>>
>>> What do I think? Well...
>>> SAN wins on manageability, replication (say, to a DR site), backup, etc.
>>> DAS wins on cost.
>>
>> The problem with SAN is that it's apparently very easy to build a big,
>> expensive system that performs poorly. We've seen reports of such here on
>> the lists a few times. I would definitely demand an evaluation period
>> from your supplier to make sure it performs well if you go SAN.
>>
>>> But for a moment keeping these aside, I wanted to discuss which one is a
>>> winner purely on the performance side. It feels like internal disks will
>>> perform better, but I need to understand a rough magnitude of the
>>> difference in performance to see if it's worth losing the manageability
>>> features.
>>
>> That really, really, really depends. The quality of RAID controllers for
>> either setup is very important, as is the driver support, etc. All things
>> being equal, I'd lean towards the local storage.
>>
>>> Also, if we choose to go with DAS, what would be the best tool to do
>>> async replication to a DR site, and maybe even, as an extra plus, a
>>> second read-only DB server to distribute select loads?
>>
>> Look at Slony, or PITR with continuous recovery. Of those two, I've only
>> used Slony in production, and I was very happy with its performance, and
>> it was very easy to write a bash script to monitor the replication for
>> failures.

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997   http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
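On the 8.2.4 RPM question above, a rough sketch of pulling the packages from
the RHEL area of ftp.postgresql.org; the directory path shown is only an
assumption about the FTP layout, so browse the RHEL section for the actual
8.2.4 build matching your architecture:

    # Hypothetical path -- confirm the real 8.2.4 directory for RHEL/CentOS 5
    # under the RHEL section of ftp.postgresql.org before downloading.
    wget 'ftp://ftp.postgresql.org/pub/binary/v8.2.4/linux/rpms/redhat/rhel-5-x86_64/*.rpm'
    # Install (or upgrade) the downloaded packages together.
    rpm -Uvh postgresql*.rpm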
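On monitoring FSM usage (in the 8.1/8.2 series), a database-wide VACUUM
VERBOSE ends with a free space map summary that can be compared against
max_fsm_pages in postgresql.conf. A minimal sketch; the database name is a
placeholder and the grep pattern only assumes the summary mentions the free
space map:

    # Run a database-wide VACUUM VERBOSE and keep only the free space map
    # summary lines (VACUUM's INFO output arrives on stderr in psql).
    # If the reported number of page slots needed exceeds max_fsm_pages,
    # raise max_fsm_pages (and possibly max_fsm_relations) and restart.
    psql -d mydb -c 'VACUUM VERBOSE;' 2>&1 | grep -i 'free space\|fsm'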
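On Scott's vmstat question, a quick way to tell a CPU-bound box from an
I/O-bound one (iostat assumes the sysstat package is installed):

    # Sample overall CPU and I/O wait every 5 seconds, 10 samples.
    # A large "wa" column with otherwise idle CPU points at the I/O
    # subsystem; high "us"/"sy" with little "wa" points at the CPUs.
    vmstat 5 10

    # Per-device view; %util close to 100 on the data volume means the
    # disks themselves are the bottleneck.
    iostat -x 5 10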
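And on watching Slony for failures, a rough sketch of the kind of bash check
Scott describes; the cluster schema name, database name, threshold, and alert
address are placeholders, and the sl_status columns should be verified
against your Slony-I version:

    #!/bin/bash
    # Check event lag on the replication cluster (schema "_mycluster" is a
    # placeholder) and mail an alert if it grows past an arbitrary threshold.
    LAG=$(psql -At -d mydb -c \
        "SELECT max(st_lag_num_events) FROM _mycluster.sl_status")
    if [ "${LAG:-0}" -gt 1000 ]; then
        echo "Slony lag is ${LAG} events" | mail -s "Slony lag alert" dba@example.com
    fi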