Thread: High volume inserts - more disks or more CPUs?
Seeking advice on system configuration (and I have read the techdocs.) We are converting a data collection system from Oracle to PostgreSQL 8.0. We are currently getting about 64 million rows per month; data is put into a new table each month. The number of simultaneous connections is very small: one that does all these inserts, and < 5 others that read. We trying to identify a server for this. Options are a 4-way Opteron with 4 SCSI disks, or a 2-way Opteron with 6 SCSI disks. The 4-CPU box currently has 16 GB of memory and the 2-CPU 4 GB, but we can move that memory around as necessary. (1) Would we be better off with more CPUs and fewer disks or fewer CPUs and more disks? (2) The techdocs suggest starting with 10% of available memory for shared buffers, which would be 1.6 GB on the 4-way. But I've seen posts here saying that anything more than 10,000 shared buffers (80 MB) provides little or no improvement. Where should we start? (3) If we go with more disks, should we attempt to split tables and indexes onto different drives (i.e., tablespaces), or just put all the disks in hardware RAID5 and use a single tablespace? I appreciate all suggestions. -- Guy Rouillier
Guy Rouillier wrote: > Seeking advice on system configuration (and I have read the techdocs.) Probably worth reading the archives for the "performance" list. > We are converting a data collection system from Oracle to PostgreSQL > 8.0. We are currently getting about 64 million rows per month; data is > put into a new table each month. The number of simultaneous connections > is very small: one that does all these inserts, and < 5 others that > read. > > We trying to identify a server for this. Options are a 4-way Opteron > with 4 SCSI disks, or a 2-way Opteron with 6 SCSI disks. The 4-CPU box > currently has 16 GB of memory and the 2-CPU 4 GB, but we can move that > memory around as necessary. > > (1) Would we be better off with more CPUs and fewer disks or fewer CPUs > and more disks? Usually, more disks. Obviously, you'll want to test your particular setup, but lots of RAM and lots of disk are generally more important than CPU. > (2) The techdocs suggest starting with 10% of available memory for > shared buffers, which would be 1.6 GB on the 4-way. But I've seen posts > here saying that anything more than 10,000 shared buffers (80 MB) > provides little or no improvement. Where should we start? Start at the "performance tuning" document below: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php > (3) If we go with more disks, should we attempt to split tables and > indexes onto different drives (i.e., tablespaces), or just put all the > disks in hardware RAID5 and use a single tablespace? Check the performance list archive for lots of discussion about this. You might want to put the WAL on separate disk(s) which will reduce the number available for storage. It depends on what your peak write-rate is. -- Richard Huxton Archonet Ltd
Hi! > -----Ursprüngliche Nachricht----- > Von: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] Im Auftrag von > Guy Rouillier > Gesendet: Montag, 13. Dezember 2004 07:17 > An: PostgreSQL General > Betreff: [GENERAL] High volume inserts - more disks or more CPUs? > (1) Would we be better off with more CPUs and fewer disks or > fewer CPUs and more disks? From my experience, it's generally a good idea to have as many disks as possible - CPU is secondary. Having enough RAM sothat at least the frequently accessed parts of your db data including the indexes fit completely into memory is also agood idea. > (3) If we go with more disks, should we attempt to split > tables and indexes onto different drives (i.e., tablespaces), > or just put all the disks in hardware RAID5 and use a single > tablespace? RAID5 is not an optimum choice for a database; switch to RAID0+1 if you can afford the disk space lost - this yields muchbetter insert performance than RAID5, as there's no parity calculation involved. There's another performance gain tobe achieved by moving the WAL-files to another RAID-set than the database files; splitting tablespaces across RAID-setsusually won't do much for you in terms of performance, but might be convenient when you think about scaling in size. Kind regards Markus
--=======174A76C======= Content-Type: text/plain; x-avg-checked=avg-ok-15674D89; charset=us-ascii; format=flowed Content-Transfer-Encoding: 8bit At 12:16 AM 12/13/2004 -0600, Guy Rouillier wrote: >(3) If we go with more disks, should we attempt to split tables and >indexes onto different drives (i.e., tablespaces), or just put all the >disks in hardware RAID5 and use a single tablespace? Fast inserts = fast writes. RAID5 = slower writes. You may wish to consider mirroring and striping aka RAID10... With a 4 disk array, reads could be 4X faster and writes could be 2X faster compared to single disk (assuming decent RAID controllers or RAID software). For the same number of disks, RAID5 would be slower than RAID10 but RAID5 will have more storage capacity. RAID10 would have slightly better redundancy - if 2 out of 4 of the right disks fail, you could still have all your data :). If the insert performance is more important then go for more disks over more CPU. If the read queries are more important than the insert performance AND the queries are likely to fit within RAM, then more CPU could be better. If you're not doing lots of fancy queries and the queries don't fit in 16GB, then go for the 50% more disks (6 vs 4). But I'm not a DB consultant ;). Regards, Link. --=======174A76C=======--