Large Databases redux - Mailing list pgsql-general

From Jason Herr
Subject Large Databases redux
Date
Msg-id CAKMjnc39-OmWo_xdrh3VYub83bA2DM_6aKNH-Pz+Cwh96rNJBg@mail.gmail.com
Responses Re: Large Databases redux  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
Hey,

In an attempt to NOT pollute the thread started by Kjetil Nygård, I decided to ask a very similar question with likely different data.  
I am interested in hearing recommendations on hardware specs in terms of Drives/RAM/shared_buffers/CPUs.  I have been doing some research/testing, and am looking for validation/repudiation/meh.

So, what I know:
The size on disk should be ~2TB for the database + WAL.
This is a warehousing database doing ETL from external data sources that will move to more OLAP function as features are developed.
Transactions/second aren't as important as volume/sec as we plan to use regular (hourly probably) bulk imports for inserts.
Deletions are rare and happen in bulk.
Data is vanilla.
24x7 environment, with some possibility of VERY short scheduled outages.
Transaction latency should preferably be under 30s; however, end users can form queries that take days.  Common OLAP queries will be cached after each bulk import.
Single selects on tables need to complete in 3ms.
One table can be 500k rows taking ~400GB with 3 indices of ~200GB (I am generating data right now and using numbers from the generation to power this query) and 18 columns, mostly real, with one timestamp with time zone.  The rest of the large tables have fewer rows, but similar column content.
Nightly backups and UPS are to be expected.
Data will be mined and stored elsewhere.
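
As a rough sanity check on the large-table figures above, here is a minimal sketch that compares the per-row size implied by ~400GB over 500k rows with a nominal row width. The split of 17 real columns plus one timestamp with time zone, the ~24-byte per-row header, and decimal gigabytes are all assumptions for illustration, not measurements.

```python
# Back-of-the-envelope row-size check for the largest table.
# From the post: 500k rows, ~400GB table.
# Assumptions (NOT from the post): 1GB = 10**9 bytes, 17 "real" columns
# (4 bytes each), 1 "timestamp with time zone" column (8 bytes), and a
# per-row header of roughly 24 bytes.

ROWS = 500_000
TABLE_BYTES = 400 * 10**9

REAL_BYTES = 4
TIMESTAMPTZ_BYTES = 8
ROW_HEADER_BYTES = 24  # approximate, assumed


def implied_row_bytes(table_bytes, rows):
    """Average bytes per row implied by the table size and row count."""
    return table_bytes / rows


def nominal_row_bytes(n_real=17, n_timestamptz=1):
    """Nominal row width from the assumed column types, ignoring padding."""
    return (ROW_HEADER_BYTES
            + n_real * REAL_BYTES
            + n_timestamptz * TIMESTAMPTZ_BYTES)


implied = implied_row_bytes(TABLE_BYTES, ROWS)
nominal = nominal_row_bytes()

print(f"implied bytes/row: {implied:,.0f}")
print(f"nominal bytes/row: {nominal}")
print(f"ratio: {implied / nominal:,.0f}x")
```

If the two numbers differ by orders of magnitude, the row count or the size estimate is probably worth rechecking before sizing disks.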

What I don't know:
Max simultaneous connections
Peak Transactions/second
Number of users (20-2000) hitting Web UI for analytic data 

I have my own theories based on what I've read and my puttering.  I think I can get away with a disk for the OS, a disk for the WAL, a disk for the large table (tablespaces) and a disk for the rest.  And when I say disk I mean storage device.  I'm thinking RAID1 on 15k disks for each set except the databases, and then RAID 10 or VERY large disks for those.

I don't really have useful information to give back at this point, but I will once I get everything running.  Also, I must work within some limitations, but I am still interested to see what the list has to say....


Thanks,
Jason


