Thread: High volume inserts - more disks or more CPUs?

High volume inserts - more disks or more CPUs?

From
"Guy Rouillier"
Date:
Seeking advice on system configuration (and I have read the techdocs.)
We are converting a data collection system from Oracle to PostgreSQL
8.0.  We are currently getting about 64 million rows per month; data is
put into a new table each month.  The number of simultaneous connections
is very small: one that does all these inserts, and < 5 others that
read.

We trying to identify a server for this.  Options are a 4-way Opteron
with 4 SCSI disks, or a 2-way Opteron with 6 SCSI disks.  The 4-CPU box
currently has 16 GB of memory and the 2-CPU 4 GB, but we can move that
memory around as necessary.

(1) Would we be better off with more CPUs and fewer disks or fewer CPUs
and more disks?

(2) The techdocs suggest starting with 10% of available memory for
shared buffers, which would be 1.6 GB on the 4-way.  But I've seen posts
here saying that anything more than 10,000 shared buffers (80 MB)
provides little or no improvement.  Where should we start?

(3) If we go with more disks, should we attempt to split tables and
indexes onto different drives (i.e., tablespaces), or just put all the
disks in hardware RAID5 and use a single tablespace?

I appreciate all suggestions.

--
Guy Rouillier

Re: High volume inserts - more disks or more CPUs?

From
Richard Huxton
Date:
Guy Rouillier wrote:
> Seeking advice on system configuration (and I have read the techdocs.)

Probably worth reading the archives for the "performance" list.

> We are converting a data collection system from Oracle to PostgreSQL
> 8.0.  We are currently getting about 64 million rows per month; data is
> put into a new table each month.  The number of simultaneous connections
> is very small: one that does all these inserts, and < 5 others that
> read.
>
> We trying to identify a server for this.  Options are a 4-way Opteron
> with 4 SCSI disks, or a 2-way Opteron with 6 SCSI disks.  The 4-CPU box
> currently has 16 GB of memory and the 2-CPU 4 GB, but we can move that
> memory around as necessary.
>
> (1) Would we be better off with more CPUs and fewer disks or fewer CPUs
> and more disks?

Usually, more disks. Obviously, you'll want to test your particular
setup, but lots of RAM and lots of disk are generally more important
than CPU.

> (2) The techdocs suggest starting with 10% of available memory for
> shared buffers, which would be 1.6 GB on the 4-way.  But I've seen posts
> here saying that anything more than 10,000 shared buffers (80 MB)
> provides little or no improvement.  Where should we start?

Start at the "performance tuning" document below:
   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> (3) If we go with more disks, should we attempt to split tables and
> indexes onto different drives (i.e., tablespaces), or just put all the
> disks in hardware RAID5 and use a single tablespace?

Check the performance list archive for lots of discussion about this.
You might want to put the WAL on separate disk(s) which will reduce the
number available for storage. It depends on what your peak write-rate is.

--
   Richard Huxton
   Archonet Ltd

Re: High volume inserts - more disks or more CPUs?

From
"Markus Wollny"
Date:
Hi!

> -----Ursprüngliche Nachricht-----
> Von: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] Im Auftrag von
> Guy Rouillier
> Gesendet: Montag, 13. Dezember 2004 07:17
> An: PostgreSQL General
> Betreff: [GENERAL] High volume inserts - more disks or more CPUs?

> (1) Would we be better off with more CPUs and fewer disks or
> fewer CPUs and more disks?

From my experience, it's generally a good idea to have as many disks as possible - CPU is secondary. Having enough RAM
sothat at least the frequently accessed parts of your db data including the indexes fit completely into memory is also
agood idea.  

> (3) If we go with more disks, should we attempt to split
> tables and indexes onto different drives (i.e., tablespaces),
> or just put all the disks in hardware RAID5 and use a single
> tablespace?

RAID5 is not an optimum choice for a database; switch to RAID0+1 if you can afford the disk space lost - this yields
muchbetter insert performance than RAID5, as there's no parity calculation involved. There's another performance gain
tobe achieved by moving the WAL-files to another RAID-set than the database files; splitting tablespaces across
RAID-setsusually won't do much for you in terms of performance, but might be convenient when you think about scaling in
size.

Kind regards

   Markus

Re: High volume inserts - more disks or more CPUs?

From
Lincoln Yeoh
Date:
--=======174A76C=======
Content-Type: text/plain; x-avg-checked=avg-ok-15674D89; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 8bit

At 12:16 AM 12/13/2004 -0600, Guy Rouillier wrote:

>(3) If we go with more disks, should we attempt to split tables and
>indexes onto different drives (i.e., tablespaces), or just put all the
>disks in hardware RAID5 and use a single tablespace?

Fast inserts = fast writes.

RAID5 = slower writes.

You may wish to consider mirroring and striping aka RAID10... With a 4 disk
array, reads could be 4X faster and writes could be 2X faster compared to
single disk (assuming decent RAID controllers or RAID software). For the
same number of disks, RAID5 would be slower than RAID10 but RAID5 will have
more storage capacity. RAID10 would have slightly better redundancy - if 2
out of 4 of the right disks fail, you could still have all your data :).

If the insert performance is more important then go for more disks over
more CPU.

If the read queries are more important than the insert performance AND the
queries are likely to fit within RAM, then more CPU could be better.  If
you're not doing lots of fancy queries and the queries don't fit in 16GB,
then go for the 50% more disks (6 vs 4).

But I'm not a DB consultant ;).

Regards,
Link.




--=======174A76C=======--