Thread: Re: Hardware/OS recommendations for large databases (

Re: Hardware/OS recommendations for large databases (

From
"Luke Lonergan"
Date:
For data warehousing it's pretty well open and shut.  To use all CPUs and I/O channels on each query you will need MPP.

Has anyone done the math on the original post?  5TB takes how long to scan once?  If you want to wait less than a
couple of days just for a seq scan, you'd better be in the multi-GB-per-second range.
 

- Luke
--------------------------
Sent from my BlackBerry Wireless Device


-----Original Message-----
From: pgsql-performance-owner@postgresql.org <pgsql-performance-owner@postgresql.org>
To: pgsql-performance@postgresql.org <pgsql-performance@postgresql.org>
Sent: Sat Nov 26 13:51:18 2005
Subject: Re: [PERFORM] Hardware/OS recommendations for large databases (

>Another thought - I priced out a maxed out machine with 16 cores and
>128GB of RAM and 1.5TB of usable disk - $71,000.
>
>You could instead buy 8 machines that total 16 cores, 128GB RAM and 28TB
>of disk for $48,000, and it would be 16 times faster in scan rate, which
>is the most important factor for large databases.  The size would be 16
>rack units instead of 5, and you'd have to add a GigE switch for $1500.
>
>Scan rate for above SMP: 200MB/s
>
>Scan rate for above cluster: 3,200MB/s
>
>You could even go dual core and double the memory on the cluster and
>you'd about match the price of the "god box".
>
>- Luke

Luke, I assume you are talking about using the Greenplum MPP for this 
(otherwise I don't know how you are combining all the different systems).

If you are, then you are overlooking one very significant factor: the cost 
of the MPP software.  At $10K/CPU, the cluster has an extra $160K in software 
costs, which is more than double the hardware cost.

If money is no object then go for it, but if it is, then your comparison 
would be (ignoring software maintenance costs) the 16-core, 128GB RAM system 
vs. ~3 small systems totaling 6 cores and 48GB RAM.
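A back-of-the-envelope sketch of that cost argument in Python. The $10K/CPU license price is inferred from the $160K total for 16 CPUs, and the ~$6K-per-small-node hardware figure is derived from the $48K/8-machine quote; both are assumptions for illustration, not published prices.

```python
# Cost comparison for the cluster-vs-SMP argument, under assumed prices.
CLUSTER_HW = 48_000        # 8 machines: 16 cores, 128GB RAM, 28TB disk
SMP_HW = 71_000            # one 16-core, 128GB RAM, 1.5TB "god box"
LICENSE_PER_CPU = 10_000   # inferred from $160K quoted for 16 CPUs

# Full cluster with MPP licenses:
cluster_total = CLUSTER_HW + 16 * LICENSE_PER_CPU          # 208,000

# For roughly the SMP's budget: ~3 small nodes (2 CPUs each,
# ~$6K/node hardware) plus licenses -- David's 6-core comparison:
small_total = 3 * (CLUSTER_HW // 8) + 6 * LICENSE_PER_CPU  # 78,000

print(cluster_total, small_total)
```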

Yes, if scan speed is the bottleneck you still win with the small systems, 
but for most other uses the large system would win easily.  And in any case 
it's not the open-and-shut case that you keep presenting it as.

David Lang



Re: Hardware/OS recommendations for large databases (

From
David Lang
Date:
On Sun, 27 Nov 2005, Luke Lonergan wrote:

> For data warehousing it's pretty well open and shut.  To use all CPUs and
> I/O channels on each query you will need MPP.
>
> Has anyone done the math on the original post?  5TB takes how long to
> scan once?  If you want to wait less than a couple of days just for a
> seq scan, you'd better be in the multi-GB-per-second range.

If you truly need to scan the entire database then you are right; however,
indexes should be able to cut the amount you need to scan drastically.
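A rough sketch of that point, using the 200MB/s scan rate quoted elsewhere in the thread and an assumed 1% index selectivity (both illustrative, and ignoring that index-driven reads are random I/O, which can make the real indexed case slower than this best case):

```python
# How much an index can cut a 5TB scan, under assumed numbers.
# 1% selectivity and 200 MB/s are illustrative, not measurements;
# random I/O from index lookups would slow the indexed case in practice.
TB = 10**12
RATE = 200e6                                   # bytes/sec, sequential

full_scan_hours = 5 * TB / RATE / 3600         # whole-table seq scan
indexed_minutes = 0.01 * 5 * TB / RATE / 60    # read only the 1% selected

print(round(full_scan_hours, 1), round(indexed_minutes, 1))
```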

David Lang

Re: Hardware/OS recommendations for large databases (

From
Stephan Szabo
Date:
On Sun, 27 Nov 2005, Luke Lonergan wrote:

> Has anyone done the math on the original post?  5TB takes how long to
> scan once?  If you want to wait less than a couple of days just for a
> seq scan, you'd better be in the multi-GB-per-second range.

Err, I get about 31 megabytes/second to do 5TB in 170,000 seconds. I think
perhaps you were exaggerating a bit or adding additional overhead not
obvious from the above. ;)

---

At 1 gigabyte per second, 1 terabyte should take about 1000 seconds
(between 16 and 17 minutes).  The impressive 3.2 gigabytes per second
listed before (if it actually scans consistently at that rate) puts it at
a little over 5 minutes, I believe, for 1, so about 26 for 5 terabytes.
The 200 megabyte per second number puts it at about 7 hours for 5
terabytes AFAICS.
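Stephan's figures check out; a quick sketch, assuming decimal TB and GB throughout:

```python
# Reproducing the scan-time arithmetic above (decimal units).
TB = 10**12

def scan_seconds(size_bytes, rate_bytes_per_sec):
    return size_bytes / rate_bytes_per_sec

one_tb_at_1gbs = scan_seconds(1 * TB, 1e9)           # ~1000 s (16-17 min)
five_tb_at_3_2 = scan_seconds(5 * TB, 3.2e9) / 60    # ~26 min
five_tb_at_200 = scan_seconds(5 * TB, 200e6) / 3600  # ~7 h
five_tb_at_31 = scan_seconds(5 * TB, 31e6) / 86400   # ~2 days
```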

Re: Hardware/OS recommendations for large databases

From
Ron
Date:
At 01:18 AM 11/27/2005, Luke Lonergan wrote:
>For data warehousing it's pretty well open and shut.  To use all CPUs
>and I/O channels on each query you will need MPP.
>
>Has anyone done the math on the original post?  5TB takes how long
>to scan once?  If you want to wait less than a couple of days just
>for a seq scan, you'd better be in the multi-GB-per-second range.
More than a bit of hyperbole there, Luke.

Some common RW scenarios:
Dual 1GbE NICs => 200MBps => 5TB in 5x10^12 / 2x10^8 = 25,000 secs =
~6hrs 57mins.  Network stuff like re-transmits of dropped packets can
increase this, so network SLAs are critical.

Dual 10GbE NICs => ~1.6GBps (10GbE NICs can't yet do over ~800MBps
apiece) => 5x10^12 / 1.6x10^9 = 3,125 secs = ~52mins.  SLAs are even
more critical here.

If you are pushing 5TB around on a regular basis, you are not wasting
your time & money on commodity <= 300MBps RAID HW.  You'll be using
800MBps and 1600MBps high end stuff, which means you'll need ~1-2hrs
to sequentially scan 5TB on physical media.
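Ron's two network-bound scenarios, restated in Python (the ~100MB/s per GbE port and ~800MB/s per 10GbE port figures are his working assumptions):

```python
# Network-bound 5TB transfer times for the two NIC scenarios above.
TB = 10**12

dual_gbe_secs = 5 * TB / (2 * 100e6)    # two GbE NICs, ~100 MB/s each
dual_10gbe_secs = 5 * TB / (2 * 800e6)  # two 10GbE NICs, ~800 MB/s each

print(dual_gbe_secs / 3600)   # hours for dual GbE (~6h57m)
print(dual_10gbe_secs / 60)   # minutes for dual 10GbE (~52 min)
```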

Clever use of RAM can get a 5TB sequential scan down to ~17mins.

Yes, it's a lot of data.  But sequential scan times should be in the
mins or low single digit hours, not days.  Particularly if you use
RAM to maximum advantage.

Ron



Re: Hardware/OS recommendations for large databases

From
"Luke Lonergan"
Date:
Ron,

On 11/27/05 9:10 AM, "Ron" <rjpeace@earthlink.net> wrote:

> Clever use of RAM can get a 5TB sequential scan down to ~17mins.
>
> Yes, it's a lot of data.  But sequential scan times should be in the
> mins or low single digit hours, not days.  Particularly if you use
> RAM to maximum advantage.

Unfortunately, RAM doesn't help with scanning from disk at all.

WRT using network interfaces to help - it's interesting, but I think what
you'd want to connect to is other machines with storage on them.

- Luke



Re: Hardware/OS recommendations for large databases (

From
"Luke Lonergan"
Date:
Stephan,

On 11/27/05 7:48 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:

> On Sun, 27 Nov 2005, Luke Lonergan wrote:
>
>> Has anyone done the math on the original post?  5TB takes how long to
>> scan once?  If you want to wait less than a couple of days just for a
>> seq scan, you'd better be in the multi-GB-per-second range.
>
> Err, I get about 31 megabytes/second to do 5TB in 170,000 seconds. I think
> perhaps you were exaggerating a bit or adding additional overhead not
> obvious from the above. ;)

Thanks - the calculator on my blackberry was broken ;-)

> At 1 gigabyte per second, 1 terabyte should take about 1000 seconds
> (between 16 and 17 minutes).  The impressive 3.2 gigabytes per second
> listed before (if it actually scans consistently at that rate) puts it at
> a little over 5 minutes, I believe, for 1, so about 26 for 5 terabytes.
> The 200 megabyte per second number puts it at about 7 hours for 5
> terabytes AFAICS.

7 hours, days, same thing ;-)

On the reality of sustained scan rates like that:

We're getting 2.5GB/s sustained on a 2 year old machine with 16 hosts and 96
disks.  We run them in RAID0, which is only OK because MPP has built-in host
to host mirroring for fault management.

We just purchased a 4-way cluster with 8 drives each using the 3Ware 9550SX.
Our thought was to try the simplest approach first: a single RAID5, which
gets us 7 drives' worth of capacity and performance.  As I posted earlier,
we get about a 400MB/s seq scan rate on the RAID, but Postgres 8.0's current
scan-rate limit is 64% of that, or 256MB/s per host.  The 8.1 mods (thanks
Qingqing and Tom!) may increase that significantly toward the 400 max -
we've already merged the 8.1 codebase into MPP, so we'll also feature the
same enhancements.

Our next approach is to run these machines in a split RAID0 configuration:
RAID0 on 4 and 4 drives.  We then run an MPP "segment instance" bound to
each CPU and I/O channel.  At that point we'll have all 8 drives' worth of
performance and capacity per host, and we should get 333MB/s with current
MPP and perhaps over 400MB/s with MPP/8.1.  That would get us up to the
3.2GB/s for 8 hosts.
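The aggregate figure here assumes per-host scan rates simply add across the cluster with no coordination overhead (an assumption Stephan probes later in the thread). A minimal sketch of the arithmetic:

```python
# Cluster aggregate scan rate under a linear-scaling assumption
# (no coordination overhead between hosts).
HOSTS = 8
RAID_RATE = 400e6                   # bytes/sec raw seq scan per host

pg80_per_host = 0.64 * RAID_RATE    # Postgres 8.0 ceiling: 64% => 256 MB/s
aggregate = HOSTS * RAID_RATE       # best case: 3.2 GB/s for 8 hosts

print(pg80_per_host, aggregate)
```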

Even better, all operators are executed on all CPUs for each query, so
sorting, hashing, aggregation, etc. are run on all CPUs in the cluster.

- Luke



Re: Hardware/OS recommendations for large databases

From
Ron
Date:
At 02:11 PM 11/27/2005, Luke Lonergan wrote:
>Ron,
>
>On 11/27/05 9:10 AM, "Ron" <rjpeace@earthlink.net> wrote:
>
> > Clever use of RAM can get a 5TB sequential scan down to ~17mins.
> >
> > Yes, it's a lot of data.  But sequential scan times should be in the
> > mins or low single digit hours, not days.  Particularly if you use
> > RAM to maximum advantage.
>
>Unfortunately, RAM doesn't help with scanning from disk at all.
I agree with you if you are scanning a table "cold", having never
loaded it before, or if the system is not (or can't be) set up
properly with appropriate buffers.

However, outside of those two cases there are often tricks you can use
with enough RAM (and no, you don't need RAM equal to the size of the
item(s) being scanned) to substantially speed things up.  Best case,
you can get performance approximately equal to that of a RAM-resident scan.


>WRT using network interfaces to help - it's interesting, but I think what
>you'd want to connect to is other machines with storage on them.
Maybe.  Or maybe you want to concentrate your storage in a farm that
is connected by network or Fibre Channel to the rest of your
HW.  That's what a NAS or SAN is, after all.

"The rest of your HW" nowadays is often a cluster of RAM rich
hosts.  Assuming 64GB per host, 5TB can be split across ~79 hosts if
you want to make it all RAM resident.

Most don't have that kind of budget, but thankfully it is not usually
necessary to make all of the data RAM-resident in order to obtain most,
if not all, of the performance benefits you'd get if it were.
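Ron's ~79-host figure checks out, assuming decimal units and 64GB per host:

```python
# Hosts needed to hold 5TB entirely in RAM at 64GB per host.
import math

hosts = math.ceil(5 * 10**12 / (64 * 10**9))
print(hosts)  # 79
```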

Ron



Re: Hardware/OS recommendations for large databases (

From
Stephan Szabo
Date:
On Sun, 27 Nov 2005, Luke Lonergan wrote:

> Stephan,
>
> On 11/27/05 7:48 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:
>
> > On Sun, 27 Nov 2005, Luke Lonergan wrote:
> >
> >> Has anyone done the math on the original post?  5TB takes how long to
> >> scan once?  If you want to wait less than a couple of days just for a
> >> seq scan, you'd better be in the multi-GB-per-second range.
> >
> > Err, I get about 31 megabytes/second to do 5TB in 170,000 seconds. I think
> > perhaps you were exaggerating a bit or adding additional overhead not
> > obvious from the above. ;)
>
> Thanks - the calculator on my blackberry was broken ;-)

Well, it was suspiciously close to a factor of 60 off, which when working
in time could have just been a simple math error.

> > At 1 gigabyte per second, 1 terabyte should take about 1000 seconds
> > (between 16 and 17 minutes).  The impressive 3.2 gigabytes per second
> > listed before (if it actually scans consistently at that rate) puts it at
> > a little over 5 minutes, I believe, for 1, so about 26 for 5 terabytes.
> > The 200 megabyte per second number puts it at about 7 hours for 5
> > terabytes AFAICS.
>
> 7 hours, days, same thing ;-)
>
> On the reality of sustained scan rates like that:

Well, the reason I asked was that IIRC the 3.2 used earlier in the
discussion was exactly multiplying scanners and base rate (i.e., no
additional overhead).  I couldn't tell if that was back-of-the-envelope or
if the overhead was in fact negligible.  (Or I could be misremembering the
conversation.)  I don't doubt that it's possible to get the rate; I just
wasn't sure whether it was actually applicable to the ongoing discussion
of the comparison.