Hardware for PostgreSQL

From
Ketema
Date:
I am trying to build a very robust DB server that will support 1000+
concurrent users (we have already seen a max of 237, with no pooling in
use).  I have read so many articles now that I am just saturated.  I
have a general idea, but would like feedback from others.

I understand query tuning and table design play a large role in
performance, but setting that factor aside and focusing on hardware
alone, what is the best hardware to get for Pg to work at the highest
level (meaning speed at returning results)?

How does pg utilize multiple processors?  The more the better?
Are queries spread across multiple processors?
Is Pg 64 bit?
If so what processors are recommended?

I read this : http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/node12.html
POSTGRESQL uses a multi-process model, meaning each database
connection has its own Unix process. Because of this, all multi-cpu
operating systems can spread multiple database connections among the
available CPUs. However, if only a single database connection is
active, it can only use one CPU. POSTGRESQL does not use multi-
threading to allow a single process to use multiple CPUs.

It's pretty old (2003), but is it still accurate?  If this statement is
accurate, how would it affect connection pooling software like pg_pool?

RAM?  The more the merrier, right?  I understand shmmax and the
shared-memory parameters in the pg config file have to be adjusted to
use it.
Disks?  Standard RAID rules, right?  RAID 1 for safety, RAID 5 for the
best mix of performance and safety?
Any preference of SCSI over SATA?  What about using a high-speed (Fibre
Channel) mass storage device?

Who has built the biggest baddest Pg server out there and what do you
use?

Thanks!


Re: Hardware for PostgreSQL

From
Magnus Hagander
Date:
Ketema wrote:
> I am trying to build a very robust DB server that will support 1000+
> concurrent users (we have already seen a max of 237, with no pooling in
> use).  I have read so many articles now that I am just saturated.  I
> have a general idea, but would like feedback from others.
>
> I understand query tuning and table design play a large role in
> performance, but setting that factor aside and focusing on hardware
> alone, what is the best hardware to get for Pg to work at the highest
> level (meaning speed at returning results)?
>
> How does pg utilize multiple processors?  The more the better?

If you have many simultaneous queries, it will use more processors. If
you run just a single query at a time, it'll only use one CPU.

> Are queries spread across multiple processors?

No, not a single query. Max one CPU per query.


> Is Pg 64 bit?

Yes, if your OS and platform are.

> If so what processors are recommended?

AFAIK, the latest Intel and AMD CPUs are all good, and fairly equal.
Make sure you turn hyperthreading off. Multicore is fine, but not HT.


> I read this : http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/node12.html
> POSTGRESQL uses a multi-process model, meaning each database
> connection has its own Unix process. Because of this, all multi-cpu
> operating systems can spread multiple database connections among the
> available CPUs. However, if only a single database connection is
> active, it can only use one CPU. POSTGRESQL does not use multi-
> threading to allow a single process to use multiple CPUs.
>
> It's pretty old (2003), but is it still accurate?

Yes.


> If this statement is
> accurate, how would it affect connection pooling software like pg_pool?

Not at all, really. It's only how many running queries you have that
matters, not how many connections. There are other advantages to
pg_pool and friends, such as not having to fork new processes so often,
but it doesn't affect the spread over CPUs.


> RAM?  The more the merrier, right?  I understand shmmax and the
> shared-memory parameters in the pg config file have to be adjusted to use it.

Yes. As long as your database doesn't fit entirely in RAM with room over
for sorting and such, more RAM will make things faster in just about
every case.
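As a rough sketch of the arithmetic commonly used as a starting point
(the 16 GB figure and the 25%/75% fractions are assumptions, not tuned
recommendations for any particular workload):

```shell
# Back-of-envelope sketch: derive starting values for PostgreSQL's
# shared-memory settings from total RAM. Fractions are rules of thumb.
ram_mb=16384                              # assume a 16 GB box
shared_buffers_mb=$((ram_mb / 4))         # ~25% of RAM for shared_buffers
effective_cache_mb=$((ram_mb * 3 / 4))    # ~75% as effective_cache_size
echo "shared_buffers = ${shared_buffers_mb}MB"
echo "effective_cache_size = ${effective_cache_mb}MB"
# The kernel's shmmax must be at least shared_buffers plus some
# overhead, e.g. via kernel.shmmax in sysctl on Linux.
```

Benchmark before and after changing these; the right values depend on
the active data set and the rest of the stack.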


> Disks?  Standard RAID rules, right?  RAID 1 for safety, RAID 5 for the
> best mix of performance and safety?

RAID-10 for best mix of performance and safety. RAID-5 can give you a
decent compromise between cost and performance/safety.

And get a RAID controller with lots of cache memory with battery backup.
This is *very* important.

And remember - lots of spindles (disks) if you want good write
performance. Regardless of which RAID you use.


> Any preference of SCSI over SATA?  What about using a high-speed (Fibre
> Channel) mass storage device?

Absolutely SCSI or SAS, and not SATA. I see no point with plain FC
disks, but if you get a high end SAN solution with FC between the host
and the controllers, that's what you're going to be using. There are
things to be said both for using DAS and SAN - they both have their
advantages.


> Who has built the biggest baddest Pg server out there and what do you
> use?

Probably not me :-) The biggest one I've set up is 16 cores, 32 GB RAM
and no more than 800 GB disk... But it's very fast :-)

Oh, and I'd absolutely recommend you go for brand-name hardware, like IBM
or HP (or Sun or something if you don't want to go down the Intel path).

//Magnus

Re: Hardware for PostgreSQL

From
Arjen van der Meijden
Date:
On 31-10-2007 17:45 Ketema wrote:
> I understand query tuning and table design play a large role in
> performance, but setting that factor aside and focusing on hardware
> alone, what is the best hardware to get for Pg to work at the highest
> level (meaning speed at returning results)?

It really depends on your budget and workload. Will it be read-heavy or
write-heavy? How large will the database be? Are those concurrent users
actively executing queries or is the actual concurrent query load lower
(it normally is)?
You should probably also try to estimate the number of concurrently
executed queries and how heavy those queries are, as that is normally
the more important performance measure. And normally it's much lower
than the number of concurrently connected users.
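As a back-of-envelope sketch of that estimate (the 5% activity ratio
below is a pure assumption for illustration; measure your own workload):

```shell
# Estimate concurrently *executing* queries from connected users.
# The activity ratio is an assumed figure, not a rule.
connected_users=1000
active_pct=5        # % of users actually running a query at any instant
concurrent_queries=$((connected_users * active_pct / 100))
echo "expect roughly $concurrent_queries concurrent queries"
```

In other words, 1000 connected users may translate into only a few
dozen simultaneously executing queries, which is the number that
actually sizes the CPU and I/O requirements.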

> How does pg utilize multiple processors?  The more the better?
> Are queries spread across multiple processors?

It forks a process for each new connection and leaves the multi-CPU
scheduling to the OS. It does not spread a single query across multiple
CPUs. But with many concurrent users you normally don't want or need
that anyway; it would mainly add extra stress to your operating
system's scheduler.

> Is Pg 64 bit?
It can be compiled 64-bit and is available pre-compiled as 64-bit as well.

> If so what processors are recommended?

I think the x86-class CPUs deliver the most bang for the buck and are
the best tested with Postgres. Both AMD and Intel CPUs are pretty good,
but I think currently a system with two Intel quad-core CPUs is at a
very good price/performance point. Obviously you'll need to match the
CPUs to your load; you may need more CPU cores.

> It's pretty old (2003), but is it still accurate?  If this statement is
> accurate, how would it affect connection pooling software like pg_pool?

It just keeps the process alive as long as the connection isn't closed,
nothing fancy or worrisome going on there. That's just the behavior I'd
expect at the connection pool-level.

> RAM?  The more the merrier, right?  I understand shmmax and the
> shared-memory parameters in the pg config file have to be adjusted to use it.

More is better, but don't waste your money on it if you don't need it;
if the active part of your database is smaller than the RAM, increasing
it doesn't do that much. I would be especially careful with
configurations that require those very expensive 4 GB modules.

> Disks?  Standard RAID rules, right?  RAID 1 for safety, RAID 5 for the
> best mix of performance and safety?

Make sure you have a battery-backed controller (or multiple), but you
should consider RAID 10 if you have many writes, and RAID 5 or 50 if
you have a read-heavy environment. There are also people reporting that
it's faster to build several RAID 1s and have the OS combine them into
a RAID 10.
Be careful with the number of disks: in performance terms you're likely
better off with 16x 73 GB than with 8x 146 GB.
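The reasoning is that random-I/O capacity scales with the number of
spindles, not their size. A rough sketch (the per-drive IOPS figure is
an assumed ballpark for a 10k-rpm drive, not a measured value):

```shell
# Same total capacity, very different random-I/O capacity.
# ~150 random IOPS per 10k-rpm drive is an assumed ballpark figure.
iops_per_drive=150
echo "16x 73GB: $((16 * iops_per_drive)) IOPS"   # more, smaller disks
echo "8x 146GB: $((8 * iops_per_drive)) IOPS"    # fewer, larger disks
```

Seek-bound database workloads see roughly this difference; pure
sequential scans benefit much less from the extra spindles.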

> Any preference of SCSI over SATA?  What about using a high-speed (Fibre
> Channel) mass storage device?

I'd consider only SAS (Serial Attached SCSI, the successor to SCSI) for
a relatively small high-performance storage array. Fibre Channel is so
much more expensive that you'll likely get much less performance for
the same amount of money. And I'd only use SATA in such an environment
if the amount of storage, not its performance, is the main metric, i.e.
for file storage and backups.

Best regards,

Arjen

Re: Hardware for PostgreSQL

From
Tomas Vondra
Date:
> I understand query tuning and table design play a large role in
> performance, but setting that factor aside and focusing on hardware
> alone, what is the best hardware to get for Pg to work at the highest
> level (meaning speed at returning results)?

Depends heavily on the particular application, but in my experience
most performance problems are caused by I/O (some of them because of
bad table or application design, some by slow drives).

> How does pg utilize multiple processors?  The more the better?

Linux version uses processes, so it's able to use multiple processors.
(Not sure about Windows version, but I guess it uses threads.)

> Are queries spread across multiple processors?
> Is Pg 64 bit?
> If so what processors are recommended?

Hard to tell; for example, I've seen several benchmarks of Intel Xeons,
half of them saying that's the right CPU for PostgreSQL, the other half
saying there are better CPUs. But as I've said before, in most cases
performance problems are caused by slow drives - take your money and
put it into more RAM / better drives (SCSI).

> I read this : http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/node12.html
> POSTGRESQL uses a multi-process model, meaning each database
> connection has its own Unix process. Because of this, all multi-cpu
> operating systems can spread multiple database connections among the
> available CPUs. However, if only a single database connection is
> active, it can only use one CPU. POSTGRESQL does not use multi-
> threading to allow a single process to use multiple CPUs.
>
> It's pretty old (2003), but is it still accurate?  If this statement is
> accurate, how would it affect connection pooling software like pg_pool?

Yes, it's quite accurate. But see this (plus the rest of the documents
in the "Docs" section on that site):

http://www.powerpostgresql.com/PerfList

> RAM?  The more the merrier, right?  I understand shmmax and the
> shared-memory parameters in the pg config file have to be adjusted to use it.
> Disks?  Standard RAID rules, right?  RAID 1 for safety, RAID 5 for the
> best mix of performance and safety?

Yes, the more RAM you can get, the better the performance (usually). The
problem is you've forgotten to mention the size of the database and the
usage patterns. If the whole database fits into the RAM, the performance
can only increase in most cases.

In the case of clients that mostly write data into the database, the
amount of RAM won't help much, as the data needs to be written to disk
anyway (unless you turn off 'fsync', which is a really stupid thing to
do with important data).

RAID - a quite difficult question and I'm not quite a master in this
field, so I'll just quote some simple truths from the article mentioned
above:

1) more spindles == better

    So buy multiple small disks rather than one large one, and spread the
    reads / writes across all of them using RAID 0, tablespaces or
    partitioning.

2) separate the transaction log from the database

    It's mostly written, and it's the most valuable data you have. And in
    case you use PITR, this is the only thing that really needs to be
    backed up.

3) RAID 0+1/1+0 > RAID 5
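A common way to apply point 2 is to move pg_xlog onto a dedicated disk
and symlink it back into the data directory. A sketch using throwaway
placeholder paths (on a real system, stop the server first and use your
actual data directory and WAL disk mount point):

```shell
# Sketch only: relocating the WAL (pg_xlog) to a dedicated disk.
# mktemp directories stand in for the real data dir and WAL disk.
PGDATA=$(mktemp -d)           # placeholder for your real data directory
WALDISK=$(mktemp -d)          # placeholder for the dedicated WAL disk
mkdir -p "$PGDATA/pg_xlog"    # pretend this holds the WAL segments

mv "$PGDATA/pg_xlog" "$WALDISK/pg_xlog"     # move the WAL directory
ln -s "$WALDISK/pg_xlog" "$PGDATA/pg_xlog"  # leave a symlink behind
```

PostgreSQL follows the symlink transparently, and the WAL's sequential
writes no longer compete with the data files for head movement.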

> Any preference of SCSI over SATA?  What about using a high-speed (Fibre
> Channel) mass storage device?

SCSI is definitely better than SATA - SATA drives are generally
consumer-level; the main criterion in their development is capacity,
and they definitely can't compete with 10k SCSI drives when it comes to
transfer rates, seek times, CPU utilization, etc. (and that's what
really matters with databases). And you can even buy 15k SAS drives for
a reasonable amount of money today ...

Tomas

Re: Hardware for PostgreSQL

From
"Kevin Grittner"
Date:
>>> On Wed, Oct 31, 2007 at 11:45 AM, in message
<1193849108.591961.103570@22g2000hsm.googlegroups.com>, Ketema
<ketema@gmail.com> wrote:

> Who has built the biggest baddest Pg server out there and what do you
> use?

I don't think that would be us, but I can give you an example of
what can work.  We have a 220 GB database which is a replication
target for OLTP and supports a web site with about 2 million web
hits per day.  Daily, we have about 1 million database transactions
which modify data and 10 million which are read-only.  The box has
eight 4 GHz Xeon processors, 12 GB RAM, and RAID 5 with 13 live
spindles and two hot spares.  The RAID controller has 256 MB RAM
with battery backup.  This gives very good performance, properly
tuned.

Besides the PostgreSQL database, the box also runs middle tier
software written in Java.

I'll second the opinions that connection pooling and a high-quality
RAID controller with battery-backed RAM cache are crucial.

-Kevin




Re: Hardware for PostgreSQL

From
"Ericson Smith"
Date:
> > Who has built the biggest baddest Pg server out there and what do you
> > use?

In my last job we had a 360 GB database running on an 8-way Opteron
with 32 GB of RAM. Two of those beasts were connected to a SAN for
hot-failover purposes.

We did not have much web traffic, but tons of update/insert traffic -
millions per day, actually, on pretty hefty tables (column-wise).

- Ericson Smith
http://www.funadvice.com

Re: Hardware for PostgreSQL

From
Ow Mun Heng
Date:
On Wed, 2007-10-31 at 22:58 +0100, Tomas Vondra wrote:

> 2) separate the transaction log from the database
>
>     It's mostly written, and it's the most valuable data you have. And in
>     case you use PITR, this is the only thing that really needs to be
>     backed up.

My main DB datastore is in a raid1 array and the xlog is still
maintained in a single OS drive. Is this considered OK?


Re: Hardware for PostgreSQL

From
Magnus Hagander
Date:
Tomas Vondra wrote:
>> How does pg utilize multiple processors?  The more the better?
>
> Linux version uses processes, so it's able to use multiple processors.
> (Not sure about Windows version, but I guess it uses threads.)

No, the Windows version also uses processes.


//Magnus


Re: Hardware for PostgreSQL

From
Magnus Hagander
Date:
Ow Mun Heng wrote:
> On Wed, 2007-10-31 at 22:58 +0100, Tomas Vondra wrote:
>
>> 2) separate the transaction log from the database
>>
>>     It's mostly written, and it's the most valuable data you have. And in
>>     case you use PITR, this is the only thing that really needs to be
>>     backed up.
>
> My main DB datastore is in a raid1 array and the xlog is still
> maintained in a single OS drive. Is this considered OK?

Is your OS not RAIDed? I'd keep everything RAIDed one way or another -
otherwise you are certain to get downtime if the disk fails.

Also, if you don't have a *dedicated* disk for the xlog (putting it on
the OS disk doesn't make it dedicated), you miss out on most of the
performance advantage of doing it. The advantage is that the writes
will be sequential, so the disks don't have to seek; but if you have
other access on the same disk, that's no longer true.

You're likely better off (performance-wise) putting it on the same disk
as the database itself if that one has better RAID, for example.

//Magnus

Re: Hardware for PostgreSQL

From
Ow Mun Heng
Date:
On Thu, 2007-11-01 at 07:54 +0100, Magnus Hagander wrote:
> Ow Mun Heng wrote:
> > On Wed, 2007-10-31 at 22:58 +0100, Tomas Vondra wrote:
> >
> >> 2) separate the transaction log from the database
> >>
> >>     It's mostly written, and it's the most valuable data you have. And in
> >>     case you use PITR, this is the only thing that really needs to be
> >>     backed up.
> >
> > My main DB datastore is in a raid1 array and the xlog is still
> > maintained in a single OS drive. Is this considered OK?
>
> Is your OS not RAIDed? I'd keep everything RAIDed one way or another -
> otherwise you are certain to get downtime if the disk fails.

Nope, it's not RAIDed. It's a very low-end "server" running on IDE, max
4 drives: 1x 80 GB system and 3x 500 GB RAID 1 + 1 hot spare.

>
> Also, if you don't have a *dedicated* disk for the xlog (putting it on
> the OS disk doesn't make it dedicated), you miss out on most of the
> performance advantage of doing it. The advantage is that the writes
> will be sequential, so the disks don't have to seek; but if you have
> other access on the same disk, that's no longer true.

As of right now, budget constraints are making me make do with what
I've got (and it's not a whole lot).

>
> You're likely better off (performance-wise) putting it on the same disk
> as the database itself if that one has better RAID, for example.

I'm thinking along the lines of: since nothing much writes to the OS
disk, I should (keyword) be safe.

Thanks for the food for thought. Now.. time to find some dough to throw
around. :-)


Re: Hardware for PostgreSQL

From
Magnus Hagander
Date:
Ow Mun Heng wrote:
>> You're likely better off (performance-wise) putting it on the same disk
>> as the database itself if that one has better RAID, for example.
>
> I'm thinking along the lines of: since nothing much writes to the OS
> disk, I should (keyword) be safe.

Unless it's *always* in the cache (not so likely), reads will also move
the heads...

In the situation you have, I'd put the xlog on the same disk as the data
- mainly because it gives you RAID on it in case the disk breaks.

//Magnus

Re: Hardware for PostgreSQL

From
Adam Tauno Williams
Date:
> > You're likely better off (performance-wise) putting it on the same disk
> > as the database itself if that one has better RAID, for example.
> I'm thinking along the lines of: since nothing much writes to the OS
> disk, I should (keyword) be safe.

You are almost certainly wrong about this; think "syslog".

--
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org


Re: Hardware for PostgreSQL

From
Steve Crawford
Date:
Ketema wrote:
> I am trying to build a very robust DB server that will support 1000+
> concurrent users (we have already seen a max of 237, with no pooling in
> use).  I have read so many articles now that I am just saturated.  I
> have a general idea, but would like feedback from others.

Describe it a bit better. 1,000 users or 1,000 simultaneous connections?
I.e., do you have a front-end where someone logs on, gets a connection,
and keeps it for the duration, or is it a web-type app where each
request might connect-query-disconnect? If the latter, are your
connections persistent? How many queries/second do you expect?

How complex are the queries (retrieve single record or data-mining)?
Read-only or lots of updates? Do the read-queries need to be done every
time or are they candidates for caching?

> RAM?  The more the merrier right?

Generally, true. But once you reach the point that everything fits in
RAM, more is just wasted $$$. And, according to my reading, there are
cases where more RAM can hurt: basically, if your queries are large
enough to flush the cache, you don't benefit from caching but are still
hurt by the time spent checking the cache for the data.

> Who has built the biggest baddest Pg server out there and what do you
> use?

Not me.

Someone just showed me live system monitoring data on one of his several
PG machines. That one was clocking multi-thousand TPS on a server
(Sun??) with 128GB RAM. That much RAM makes "top" look amusing.

Several of the social-networking sites are using PG - generally
spreading load over several (dozens) of servers. They also make heavy
use of pooling and caching - think dedicated memcached servers offering
a combined pool of several TB RAM.

For pooling, pgbouncer seems to have a good reputation. Tests on my
current production server show it shaving a few ms off every
connect-query-disconnect cycle. Connects are fairly fast in PG but that
delay becomes a significant issue under heavy load.

Test pooling carefully, though. If you blindly run everything through
your pooler instead of just selected apps, you can end up with
unexpected problems when one client changes a backend setting like "set
statement_timeout to 5". If the next client assigned to that backend
connection runs a long-duration analysis query, it is likely to fail.
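One mitigation is to have the pooler reset session state between
clients. A minimal sketch of a pgbouncer configuration fragment
(server_reset_query and pool_mode are real pgbouncer settings; the rest
of a working config, such as the [databases] and auth sections, is
omitted here):

```shell
# Sketch only: write a pgbouncer fragment that resets session state
# (e.g. a leftover "set statement_timeout") between pooled clients.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[pgbouncer]
pool_mode = session
server_reset_query = RESET ALL
EOF
grep 'server_reset_query' "$cfg"
```

The reset query runs each time a server connection is handed back to
the pool, so one client's SET cannot surprise the next.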

Cheers,
Steve

Re: Hardware for PostgreSQL

From
Steve Crawford
Date:
Magnus Hagander wrote:
> Ow Mun Heng wrote:
>>> You're likely better off (performance-wise) putting it on the same disk
>>> as the database itself if that one has better RAID, for example.
>> I'm thinking along the lines of: since nothing much writes to the OS
>> disk, I should (keyword) be safe.
>
> Unless it's *always* in the cache (not so likely), reads will also move
> the heads...

And if you aren't mounted noatime, reads will also cause a write.

Cheers,
Steve

Re: Hardware for PostgreSQL

From
Ow Mun Heng
Date:
On Thu, 2007-11-01 at 11:16 -0700, Steve Crawford wrote:
> Magnus Hagander wrote:
> > Ow Mun Heng wrote:
> >>> You're likely better off (performance-wise) putting it on the same disk
> >>> as the database itself if that one has better RAID, for example.
> >> I'm thinking along the lines of: since nothing much writes to the OS
> >> disk, I should (keyword) be safe.
> >
> > Unless it's *always* in the cache (not so likely), reads will also move
> > the heads...
>
> And if you aren't mounted noatime, reads will also cause a write.


/dev/VolGroup00/LogVol01 /                       ext3    defaults,noatime        1 1
/dev/md0                /raid1_array            ext3    noatime,data=writeback   1 1

Yep..yep..

Re: Hardware for PostgreSQL

From
Jurgen Haan
Date:

Ketema wrote:

> RAM?  The more the merrier, right?  I understand shmmax and the
> shared-memory parameters in the pg config file have to be adjusted to use it.
> Disks?  Standard RAID rules, right?  RAID 1 for safety, RAID 5 for the
> best mix of performance and safety?
> Any preference of SCSI over SATA?  What about using a high-speed (Fibre
> Channel) mass storage device?

You might also check out NetApp NAS arrays.
Thanks to clever use of a big memory buffer and a custom filesystem on
a RAID-4 array, their performance is comparable to a big RAM disk
connected through NFS.

Works quite well with OLTP (at least for us).


Re: Hardware for PostgreSQL

From
Robert Treat
Date:
On Wednesday 31 October 2007 12:45, Ketema wrote:
> I am trying to build a very robust DB server that will support 1000+
> concurrent users (we have already seen a max of 237, with no pooling in
> use).  I have read so many articles now that I am just saturated.  I
> have a general idea, but would like feedback from others.
>

Most of the other answers you've gotten have been pretty good, but I had
some questions about the above; specifically, is there a reason you're
avoiding pooling? (Something like pgbouncer can work wonders.)  Are your
1000+ concurrent users working in something like a web environment,
where they won't need a 1:1 user:connection map to service them all, or
are these going to be more permanent connections into the system?  FWIW
I've done 1000 simultaneous connections on pretty basic hardware, but
you need to have the right kind of workload to be able to do it.

>
> Who has built the biggest baddest Pg server out there and what do you
> use?
>

While I'm not sure this will be that much help, I'd feel remiss if I
didn't point you to it:
http://www.lethargy.org/~jesus/archives/66-Big-Bad-PostgreSQL.html

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL