Re: Hardware for PostgreSQL - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Hardware for PostgreSQL
Date
Msg-id 4728FAA1.7060905@fuzzy.cz
Whole thread Raw
In response to Hardware for PostgreSQL  (Ketema <ketema@gmail.com>)
Responses Re: Hardware for PostgreSQL  (Ow Mun Heng <Ow.Mun.Heng@wdc.com>)
Re: Hardware for PostgreSQL  (Magnus Hagander <magnus@hagander.net>)
List pgsql-performance
> I understand query tuning and table design play a large role in
> performance, but taking that factor away
> and focusing on just hardware, what is the best hardware to get for Pg
> to work at the highest level
> (meaning speed at returning results)?

Depends heavily on the particular application, but most performance
problems were caused by I/O (some of them because of bad table or
application design, some of them by slow drives).

> How does pg utilize multiple processors?  The more the better?

Linux version uses processes, so it's able to use multiple processors.
(Not sure about Windows version, but I guess it uses threads.)

> Are queries spread across multiple processors?
> Is Pg 64 bit?
> If so what processors are recommended?

Hard to tell, as for example I've seen several benchmarks about Xeons
from Intel, half of them saying that's the right CPU for PostgreSQL, the
other half saying there are better CPUs. But as I've said before - in
most cases the performance problems are caused by slow drives - take
your money and put them in more RAM / better drives (SCSI).

> I read this : http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/node12.html
> POSTGRESQL uses a multi-process model, meaning each database
> connection has its own Unix process. Because of this, all multi-cpu
> operating systems can spread multiple database connections among the
> available CPUs. However, if only a single database connection is
> active, it can only use one CPU. POSTGRESQL does not use multi-
> threading to allow a single process to use multiple CPUs.
>
> Its pretty old (2003) but is it still accurate?  if this statement is
> accurate how would it affect connection pooling software like pg_pool?

Yes, it's quite accurate. But see this (plus the rest of the documents
in the "Docs" section on that site)

http://www.powerpostgresql.com/PerfList

> RAM?  The more the merrier right? Understanding shmmax and the pg
> config file parameters for shared mem has to be adjusted to use it.
> Disks?  standard Raid rules right?  1 for safety 5 for best mix of
> performance and safety?

Yes, the more RAM you can get, the better the performance (usually). The
problem is you've forgotten to mention the size of the database and the
usage patterns. If the whole database fits into the RAM, the performance
can only increase in most cases.

In case of clients that mostly write data into the database, the amount
of RAM won't help too much as the data need to be written to the disk
anyway (unless you turn off 'fsync' which is a really stupid thing to do
in case of important data).

RAID - a quite difficult question and I'm not quite a master in this
field, so I'll just quote some simple truths from the article mentioned
above:

1) more spindles == better

    So buy multiple small disks rather than one large one, and spread the
    reads / writes across all of them using RAID 0, tablespaces or
    partitioning.

2) separate the transaction log from the database

    It's mostly written, and it's the most valuable data you have. And in
    case you use PITR, this is the only thing that really needs to be
    backed up.

3) RAID 0+1/1+0 > RAID 5

> Any preference of SCSI over SATA? What about using a High speed (fibre
> channel) mass storage device?

SCSI is definitely better than SATA - the SATA are consumer level
generally - the main criteria in it's development is capacity, and it
definitely can't compete with SCSI 10k drives when it comes to transfer
rates, seek times, CPU utilization, etc. (and that's what really matters
with databases). And you can even buy 15k SAS drives for reasonable
amount of money today ...

Tomas

pgsql-performance by date:

Previous
From: Pablo Alcaraz
Date:
Subject: Re: tables with 300+ partitions
Next
From: Ron St-Pierre
Date:
Subject: Re: hardware and For PostgreSQL