Re: Hardware/OS recommendations for large databases ( - Mailing list pgsql-performance

From David Lang
Subject Re: Hardware/OS recommendations for large databases (
Date
Msg-id Pine.LNX.4.62.0511271940400.2807@qnivq.ynat.uz
In response to Re: Hardware/OS recommendations for large databases (  (Brendan Duddridge <brendan@clickspace.com>)
List pgsql-performance
On Mon, 28 Nov 2005, Brendan Duddridge wrote:

> Hi David,
>
> Thanks for your reply. So how is that different than something like Slony2 or
> pgcluster with multi-master replication? Is it similar technology? We're
> currently looking for a good clustering solution that will work on our Apple
> Xserves and Xserve RAIDs.

MPP doesn't just split up the data, it splits up the processing as well.
if you have a 5 machine cluster, each machine holds 1/5 of your data
(plus a backup for one of the other machines), and when you do a query MPP
slices and dices the query to send a piece of it to each machine.
it then gets the responses from all the machines and combines them.

if you have to do a full table scan, for example, each machine would only
have to go through 20% of the data.
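
to make the split concrete, here is a toy sketch in Python of the scatter/gather idea described above. it is only an illustration, not how Greenplum actually implements MPP; the table contents, the modulo placement rule and all the names (NUM_MACHINES, scan_slice, etc.) are assumptions made up for the example.

```python
# Toy illustration of scatter/gather; NOT Greenplum's actual implementation.
# All names and the modulo placement rule are assumptions for this sketch.

NUM_MACHINES = 5

# pretend table: 1000 rows of (id, amount)
table = [(i, i % 7) for i in range(1000)]

# split the data: each machine holds the rows whose id maps to it
slices = [[] for _ in range(NUM_MACHINES)]
for row in table:
    slices[row[0] % NUM_MACHINES].append(row)


def scan_slice(rows):
    """What one machine does: a full scan of only its own slice,
    returning a partial result (sum of amounts, rows scanned)."""
    total = 0
    for _id, amount in rows:
        total += amount
    return total, len(rows)


# scatter: every machine scans its own slice (run one after another here;
# in a real cluster these scans would run in parallel on separate machines)
partials = [scan_slice(rows) for rows in slices]

# gather: combine the partial results into the final answer
grand_total = sum(partial_sum for partial_sum, _count in partials)
rows_scanned_per_machine = [count for _sum, count in partials]

print(grand_total)
print(rows_scanned_per_machine)
```

each simulated machine only touches its own 200 of the 1000 rows, and the combined total matches what a single machine would get from scanning the whole table.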

a Slony or pgcluster setup has each machine with a full copy of all the
data, only one machine can work on a given query at a time, and if you
have to do a full table scan one machine needs to read 100% of the data.
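
the difference in per-machine work can be put as simple arithmetic. the helper names below are hypothetical, and the storage figure assumes the backup copy a partitioned machine carries is the same size as its own slice.

```python
def rows_read_per_machine(total_rows, machines, partitioned):
    """Rows the busiest machine reads for one full table scan."""
    if partitioned:
        # MPP-style: the scan is divided across machines (ceiling division)
        return -(-total_rows // machines)
    # full-copy replication: the one machine running the query reads it all
    return total_rows


def stored_fraction(machines, partitioned):
    """Fraction of the whole data set kept on each machine."""
    if partitioned:
        # own slice plus a backup of one other machine's slice
        # (assumes all slices are the same size)
        return 2 / machines
    return 1.0


print(rows_read_per_machine(1_000_000, 5, partitioned=True))
print(rows_read_per_machine(1_000_000, 5, partitioned=False))
print(stored_fraction(5, partitioned=True))
```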

in many ways this is the holy grail of databases. almost all other areas
of computing can now be scaled by throwing more machines at the problem in
a cluster, with each machine just working on its piece of the problem,
but databases have had serious trouble doing the same and so have been
ruled by the 'big monster machine'. Oracle has been selling Oracle RAC for
a few years, and reports from people who have used it range drastically
(from it works great, to it's a total disaster), in part depending on the
types of queries that have been made.

Greenplum thinks that they have licked the problems for the more general
case (and that commodity networks are now fast enough to match disk speeds
in processing the data). if they are right, then when they hit full release
with the new version they should be cracking a lot of the
price/performance records on the big database benchmarks (TPC and
similar), and if their pricing is reasonable, they may be breaking them by
an order of magnitude or more. it's not unusual for the top machines to
spend more than $1,000,000 on just their disk arrays for those
systems; MPP could conceivably put together a cluster of $5K machines
that runs rings around them (and probably will for at least some of the
subtests; the big question is if they can sweep the board and take the top
spots outright).

they have more details (and marketing stuff) on their site at
http://www.greenplum.com/prod_deepgreen_cluster.html

don't get me wrong, I am very impressed with their stuff, but (having
ranted a little here on the list about them) I think MPP and its
performance is a bit off topic for the postgres performance list (at least
until the postgres project itself starts implementing similar features :-)

David Lang

> Thanks,
>
> ____________________________________________________________________
> Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com
>
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB  T2G 0V9
>
> http://www.clickspace.com
>
> On Nov 27, 2005, at 8:09 PM, David Lang wrote:
>
>> On Mon, 28 Nov 2005, Brendan Duddridge wrote:
>>
>>> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it
>>> possible to upgrade from Postgres 8.1 to Bizgres?
>>
>> MPP is the Greenplum proprietary extension to postgres that spreads the
>> data over multiple machines (raid, but with entire machines not just
>> drives, complete with data replication within the cluster to survive a
>> machine failing)
>>
>> for some types of queries they can definitely scale linearly with the
>> number of machines (other queries are far more difficult and the overhead
>> of coordinating the machines shows more. this is one of the key things that
>> the new version they recently announced the beta for is supposed to be
>> drastically improving)
>>
>> early in the year when I first looked at them their prices were exorbitant,
>> but Luke says I'm wildly mistaken on their current prices so call them for
>> details
>>
>> it uses the same interfaces as postgres so it should be a drop in
>> replacement to replace a single server with a cluster.
>>
>> it's fascinating technology to read about.
>>
>> I seem to remember reading that one of the other postgres companies is also
>> producing a clustered version of postgres, but I don't remember who and
>> know nothing about them.
>>
>> David Lang
>>
>
>
