Thread: Large Databases redux

From: Jason Herr
Hey,

In an attempt to NOT pollute the thread started by Kjetil Nygård, I decided to ask a very similar question with likely different data.  
I am interested in hearing recommendations on hardware specs in terms of Drives/RAM/shared_buffers/CPUs.  I have been doing some research/testing, and am looking for validation/repudiation/meh.

So, what I know:
The size on disk should be ~2TB for the database + WAL.
This is a warehousing database doing ETL from external data sources that will move to more OLAP function as features are developed.
Transactions/second aren't as important as volume/sec, as we plan to use regular (probably hourly) bulk imports for inserts.
Deletions are rare and happen in bulk.
Data is vanilla.
24x7 environment, with some possibility of VERY short timed outages.
Transaction latency would preferably be under 30s; however, end users can form queries that take days.  Common OLAP queries will be cached after each bulk import.
Single selects on tables need to complete in ~3ms.
One table can be ~500k rows taking ~400GB, with 3 indices of ~200GB (I am generating test data right now and using numbers from that generation), and 18 columns, mostly real, with one timestamp with time zone.  The rest of the large tables have fewer rows, but similar column content.
Nightly backups and UPS are to be expected.
Data will be mined and stored elsewhere.
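A quick sanity check of the big table's arithmetic above (a sketch; the float4 column widths and the ~24-byte tuple header are assumptions, not from the thread):

```python
# Back-of-the-envelope check of the large table's stated size.
# Assumptions (not from the thread): 17 float4 columns + 1 timestamptz,
# ~24 bytes of tuple header, ignoring padding and page overhead.

row_bytes = 17 * 4 + 8 + 24          # ~100 bytes per row
stated_rows = 500_000
stated_heap_bytes = 400 * 10**9      # ~400 GB on disk

implied_bytes_per_row = stated_heap_bytes / stated_rows
print(f"estimated row size: {row_bytes} bytes")
print(f"implied bytes/row from 500k rows @ 400GB: {implied_bytes_per_row:,.0f}")

# 400 GB at ~100 bytes/row is roughly 4 billion rows, so either the row
# count or the size figure looks off by a few orders of magnitude.
implied_rows = stated_heap_bytes / row_bytes
print(f"rows implied by 400GB at {row_bytes} B/row: {implied_rows:,.0f}")
```

If the real row count is closer to 500M than 500k, that changes the caching and indexing picture considerably.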

What I don't know:
Max simultaneous connections
Peak Transactions/second
Number of users (20-2000) hitting Web UI for analytic data 

I have my own theories based on what I've read and my puttering.  I think I can get away with a disk for the OS, disk for the WAL, disk for the large table (tablespaces) and a disk for the rest.  And when I say disk I mean storage device.  I'm thinking RAID1 15k disks for each set but the databases and then raid 10 or VERY large disks.

I don't really have useful information to give back at this point, but I will once I get everything running.  Also, I must work within some limitations, but I am still interested to see what the list has to say....


Thanks,
Jason



Re: Large Databases redux

From: John R Pierce
On 03/21/12 2:18 PM, Jason Herr wrote:
> I have my own theories based on what I've read and my puttering.  I
> think I can get away with a disk for the OS, disk for the WAL, disk
> for the large table (tablespaces) and a disk for the rest.  And when I
> say disk I mean storage device.  I'm thinking RAID1 15k disks for each
> set but the databases and then raid 10 or VERY large disks.

I think you're better off just making one big RAID10 out of all the
disks and putting everything on it, maybe in different file systems to
separate out file fragmentation.   This way the IO workload is evenly
distributed across all the disks.
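That layout could be sketched with Linux md + LVM roughly like this (device names, disk count, and sizes are placeholders, not from the thread):

```shell
# One big RAID10 across all data disks (8 disks here, purely illustrative).
mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/sd[b-i]

# Carve separate filesystems out of it so WAL, tables, and the rest
# fragment independently while still sharing all spindles.
pvcreate /dev/md0
vgcreate pgvg /dev/md0
lvcreate -L 100G -n wal  pgvg
lvcreate -L 1.5T -n data pgvg
mkfs.ext4 /dev/pgvg/wal
mkfs.ext4 /dev/pgvg/data
mkdir -p /pg/wal /pg/data
mount /dev/pgvg/wal  /pg/wal
mount /dev/pgvg/data /pg/data
```

The point of LVM here is only to get multiple filesystems on one array; a partitioned md device works just as well.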



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast


Re: Large Databases redux

From: Martijn van Oosterhout
On Wed, Mar 21, 2012 at 02:58:43PM -0700, John R Pierce wrote:
> On 03/21/12 2:18 PM, Jason Herr wrote:
> >I have my own theories based on what I've read and my puttering.
> >I think I can get away with a disk for the OS, disk for the WAL,
> >disk for the large table (tablespaces) and a disk for the rest.
> >And when I say disk I mean storage device.  I'm thinking RAID1 15k
> >disks for each set but the databases and then raid 10 or VERY
> >large disks.
>
> I think you're better off just making one big raid10 out of all the
> disks and putting everything on it, maybe in different file systems
> to separate out file fragmentation.   this way the IO workload is
> evenly distributed across all the disks.

That, and a good RAID controller with BBU cache will go a long way to
relieving the pain of fsync.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


Re: Large Databases redux

From: John R Pierce
On 03/21/12 3:20 PM, Martijn van Oosterhout wrote:
> That, and a good RAID controller with BBU cache will go a long way to
> relieving the pain of fsync.

Even better than BBU cache is the newer 'flash-backed cache'.   It works
the same, but uses a supercap rather than a battery and backs the cache
up to a flash chip, typically 1GB of cache on the newer SAS2 stuff.   This
way you don't need to replace the batteries every 3 years or whatever, or
deal with the complexity of moving the RAID card with its attached
battery if the mainboard or chassis fails.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast


Re: Large Databases redux

From: Marti Raudsepp
On Thu, Mar 22, 2012 at 00:20, Martijn van Oosterhout <kleptog@svana.org> wrote:
> That, and a good RAID controller with BBU cache will go a long way to
> relieving the pain of fsync.

Well, a BBU-cached RAID is helpful, but fsyncs are a minor problem in
data warehouse workloads, since inserts are done in large batches and
commits are rare. And you can run with synchronous_commit=off, since
it's always possible to reload the last batch after a power failure.
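That pattern might look like this in practice (a sketch; the staging table name and file path are made up):

```sql
-- Per-session only: a crash can lose the most recent commits, but the
-- batch can simply be re-loaded from the source file afterwards.
SET synchronous_commit = off;

BEGIN;
COPY staging_measurements FROM '/data/batch_20120321.csv' WITH (FORMAT csv);
INSERT INTO measurements SELECT * FROM staging_measurements;
TRUNCATE staging_measurements;
COMMIT;
```

Since synchronous_commit only risks losing recently *acknowledged* commits (never corrupting the database), it is a much safer knob than fsync=off.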

On Wed, Mar 21, 2012 at 23:18, Jason Herr <jaherr@gmail.com> wrote:
> Single selects on tables need to complete in ~3ms.

You've set yourself an impossible target: that's below the average
seek time of 15kRPM disks. For indexed single-row selects on
non-cached data, expect at least a few index page fetches and a heap
fetch, and potentially file system block map lookups. 20ms seems a
more plausible target, and with competing I/O activity, especially
other OLAP/DW queries and bulk data loads, even that is quite
optimistic.
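The arithmetic behind that (the ~3.5ms average seek is an assumed typical 15k-SAS figure, not from the thread):

```python
# Average rotational latency is the time for half a revolution.
rpm = 15_000
rotational_ms = (60_000 / rpm) / 2       # 2.0 ms
avg_seek_ms = 3.5                        # assumed typical 15k SAS seek

per_random_read_ms = rotational_ms + avg_seek_ms   # ~5.5 ms
print(f"one random read: ~{per_random_read_ms:.1f} ms")

# An uncached single-row select needs a few index page reads plus a
# heap fetch; 3-4 random reads already lands in the 15-20 ms range.
for reads in (1, 3, 4):
    print(f"{reads} random reads: ~{reads * per_random_read_ms:.1f} ms")
```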

If you have a high cached access correlation and lots of RAM, it might
be possible to keep the *average* below 3ms, but I don't know if you
can bet on that with 2TB of storage.

Regards,
Marti