Thread: Re: Hardware/OS recommendations for large databases (

Re: Hardware/OS recommendations for large databases (

From
"Merlin Moncure"
Date:
>
> It certainly makes quite a difference as I measure it:
>
> doing select count(1) from a 181000 page table (completely uncached) on my PIII:
>
> 8.0 : 32 s
> 8.1 : 25 s
>
> Note that the 'fastcount()' function takes 21 s in both cases - so all
> the improvement seems to be from the count overhead reduction.

Are you running Windows?  There is a big performance improvement in
count(*) from pg 8.0 to 8.1 on win32 that is not relevant to this debate...

Merlin

Re: Hardware/OS recommendations for large databases (

From
Mark Kirkwood
Date:
Merlin Moncure wrote:
>>It certainly makes quite a difference as I measure it:
>>
>>doing select count(1) from a 181000 page table (completely uncached) on my PIII:
>>
>>8.0 : 32 s
>>8.1 : 25 s
>>
>>Note that the 'fastcount()' function takes 21 s in both cases - so all
>>the improvement seems to be from the count overhead reduction.
>
>
> Are you running Windows?  There is a big performance improvement in
> count(*) from pg 8.0 to 8.1 on win32 that is not relevant to this debate...
>

No - FreeBSD 6.0 on a dual PIII 1 GHz. The slow CPU means that the 8.1
improvements are very noticeable!

A point of interest - applying Niels' palloc-avoiding changes to
nodeAgg.c and int8.c in 8.0 changes those results to:

8.0 + palloc-avoiding patch : 27 s

(I am guessing the remaining 2 s could be shaved off if I backported
8.1's virtual tuples - however, that looked like a lot of work.)

Cheers

Mark
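
The timings in this message and the quoted one can be split into a scan part and an aggregate-overhead part. A minimal sketch, assuming the 21 s 'fastcount()' time is a fair stand-in for the raw scan cost (that attribution is an assumption, and the helper name is made up for illustration):

```python
# Timings reported in the thread, in seconds, for a count over a 181000 page table.
SCAN_ONLY = 21  # 'fastcount()' time, reported as the same on 8.0 and 8.1

totals = {
    "8.0": 32,
    "8.0 + palloc-avoiding patch": 27,
    "8.1": 25,
}

def count_overhead(total_s, scan_s=SCAN_ONLY):
    """Time left once the scan-only cost is subtracted (hypothetical helper)."""
    return total_s - scan_s

overheads = {name: count_overhead(t) for name, t in totals.items()}
for name, extra in overheads.items():
    print(f"{name}: {extra} s on top of the {SCAN_ONLY} s scan")
```

Read this way, the patch accounts for 5 s of the 7 s gap between 8.0 and 8.1, which matches the "remaining 2 s" mentioned above.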

Re: Hardware/OS recommendations for large databases (

From
"Luke Lonergan"
Date:
Mark,

On 11/28/05 1:45 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:

>>> 8.0 : 32 s
>>> 8.1 : 25 s

A 22% reduction.

select count(1) on 12,900MB = 1617125 pages fully cached:

MPP based on 8.0 : 6.06s
MPP based on 8.1 : 4.45s

A 26% reduction.

I'll take it!

I am looking to back-port Tom's pre-8.2 changes and test again, maybe
tonight.

- Luke
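
For anyone checking the arithmetic, the two reductions can be recomputed from the quoted timings. This is only a sketch: the function name is illustrative, and the MB/s figures simply divide the stated 12,900 MB by the elapsed time, which says nothing about how many machines or disks were involved.

```python
def reduction_pct(before_s, after_s):
    """Percentage drop in elapsed time going from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s

uncached = reduction_pct(32, 25)      # Mark's PIII, 8.0 -> 8.1
cached = reduction_pct(6.06, 4.45)    # MPP, 8.0-based -> 8.1-based

# Aggregate scan rate over the fully cached 12,900 MB table.
rate_80 = 12900 / 6.06
rate_81 = 12900 / 4.45

print(f"uncached: {uncached:.1f}%  cached: {cached:.1f}%")
print(f"scan rate: {rate_80:.0f} MB/s -> {rate_81:.0f} MB/s")
```

The unrounded values come out at about 21.9% and 26.6%.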



Re: Hardware/OS recommendations for large databases (

From
Brendan Duddridge
Date:
Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it
possible to upgrade from Postgres 8.1 to Bizgres?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Nov 28, 2005, at 3:05 PM, Luke Lonergan wrote:

> Mark,
>
> On 11/28/05 1:45 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:
>
>>>> 8.0 : 32 s
>>>> 8.1 : 25 s
>
> A 22% reduction.
>
> select count(1) on 12,900MB = 1617125 pages fully cached:
>
> MPP based on 8.0 : 6.06s
> MPP based on 8.1 : 4.45s
>
> A 26% reduction.
>
> I'll take it!
>
> I am looking to back-port Tom's pre-8.2 changes and test again, maybe
> tonight.
>
> - Luke

Re: Hardware/OS recommendations for large databases (

From
Brendan Duddridge
Date:
Hi David,

Thanks for your reply. So how is that different than something like
Slony2 or pgcluster with multi-master replication? Is it similar
technology? We're currently looking for a good clustering solution
that will work on our Apple Xserves and Xserve RAIDs.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Nov 27, 2005, at 8:09 PM, David Lang wrote:

> On Mon, 28 Nov 2005, Brendan Duddridge wrote:
>
>> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is
>> it possible to upgrade from Postgres 8.1 to Bizgres?
>
> MPP is the Greenplum proprietary extension to Postgres that spreads
> the data over multiple machines (like RAID, but with entire machines
> rather than just drives, complete with data replication within the
> cluster to survive a machine failing).
>
> For some types of queries they can definitely scale linearly with
> the number of machines. Other queries are far more difficult, and
> the overhead of coordinating the machines shows more; this is one
> of the key things that the new version, whose beta they recently
> announced, is supposed to be drastically improving.
>
> Early in the year when I first looked at them their prices were
> exorbitant, but Luke says I'm wildly mistaken about their current
> prices, so call them for details.
>
> It uses the same interfaces as Postgres, so it should be a drop-in
> replacement when swapping a single server for a cluster.
>
> It's fascinating technology to read about.
>
> I seem to remember reading that one of the other Postgres companies
> is also producing a clustered version of Postgres, but I don't
> remember who and know nothing about them.
>
> David Lang
>



Re: Hardware/OS recommendations for large databases (

From
David Boreham
Date:
Brendan Duddridge wrote:

> Thanks for your reply. So how is that different than something like
> Slony2 or pgcluster with multi-master replication? Is it similar
> technology? We're currently looking for a good clustering solution
> that will work on our Apple Xserves and Xserve RAIDs.

I think you need to be more specific about what you're trying to do.
'clustering' encompasses so many things that it means almost nothing by
itself.

Slony provides facilities for replicating data. Its primary purpose is
to improve reliability. MPP distributes both data and queries. Its
primary purpose is to improve performance for a subset of all query types.



Re: Hardware/OS recommendations for large databases (

From
David Lang
Date:
On Mon, 28 Nov 2005, Brendan Duddridge wrote:

> Hi David,
>
> Thanks for your reply. So how is that different than something like Slony2 or
> pgcluster with multi-master replication? Is it similar technology? We're
> currently looking for a good clustering solution that will work on our Apple
> Xserves and Xserve RAIDs.

MPP doesn't just split up the data, it splits up the processing as well.
If you have a 5 machine cluster, each machine holds 1/5 of your data
(plus a backup for one of the other machines), and when you do a query MPP
slices and dices the query to send a subset of the query to each machine;
it then gets the responses from all the machines and combines them.

If you have to do a full table scan, for example, each machine would only
have to go through 20% of the data.

A Slony or pgcluster setup has each machine with a full copy of all the
data, only one machine can work on a given query at a time, and if you
have to do a full table scan one machine needs to read 100% of the data.
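
A toy sketch of the split-and-combine idea described above, using threads as stand-ins for machines. Everything here (the function names, the round-robin distribution, the thread pool) is an illustrative assumption and not how Greenplum actually implements MPP; the backup copies are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(rows, n_machines):
    """Deal rows round-robin into n_machines slices (stand-in for data distribution)."""
    return [rows[i::n_machines] for i in range(n_machines)]

def local_count(slice_rows, predicate):
    """What a single machine does: scan only the rows it holds."""
    return sum(1 for row in slice_rows if predicate(row))

def mpp_style_count(slices, predicate):
    """Run the same sub-query on every slice in parallel, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: local_count(s, predicate), slices))
    return sum(partials), partials

rows = list(range(1000))
slices = split_rows(rows, 5)          # 5 "machines", each holding 1/5 of the data
is_even = lambda r: r % 2 == 0

total, partials = mpp_style_count(slices, is_even)
single = local_count(rows, is_even)   # one machine reading 100% of the data

print(partials, total, single)
```

Each slice holds 200 of the 1000 rows, so every worker scans 20% of the data, and the combined count equals the single-machine result.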

In many ways this is the holy grail of databases. Almost all other areas
of computing can now be scaled by throwing more machines at the problem in
a cluster, with each machine just working on its piece of the problem,
but databases have had serious trouble doing the same and so have been
ruled by the 'big monster machine'. Oracle has been selling Oracle RAC for
a few years, and reports from people who have used it range drastically
(from "it works great" to "it's a total disaster"), in part depending on the
types of queries that have been made.

Greenplum thinks that they have licked the problems for the more general
case (and that commodity networks are now fast enough to match disk speeds
in processing the data). If they are right, then when they hit full release
with the new version they should be cracking a lot of the
price/performance records on the big database benchmarks (TPC and
similar), and if their pricing is reasonable, they may be breaking them by
an order of magnitude or more. It's not unusual for the top machines to
spend more than $1,000,000 on just their disk arrays for those
systems; MPP could conceivably put together a cluster of $5K machines
that runs rings around them (and probably will for at least some of the
subtests; the big question is whether they can sweep the board and take
the top spots outright).

They have more details (and marketing stuff) on their site at
http://www.greenplum.com/prod_deepgreen_cluster.html

Don't get me wrong, I am very impressed with their stuff, but (having
ranted a little here on the list about them) I think MPP and its
performance are a bit off topic for the postgres performance list (at least
until the postgres project itself starts implementing similar features :-)

David Lang

Re: Hardware/OS recommendations for large databases (

From
David Lang
Date:
On Mon, 28 Nov 2005, Brendan Duddridge wrote:

> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it
> possible to upgrade from Postgres 8.1 to Bizgres?

MPP is the Greenplum proprietary extension to Postgres that spreads the
data over multiple machines (like RAID, but with entire machines rather
than just drives, complete with data replication within the cluster to
survive a machine failing).

For some types of queries they can definitely scale linearly with the
number of machines. Other queries are far more difficult, and the overhead
of coordinating the machines shows more; this is one of the key things
that the new version, whose beta they recently announced, is supposed to
be drastically improving.

Early in the year when I first looked at them their prices were
exorbitant, but Luke says I'm wildly mistaken about their current prices,
so call them for details.

It uses the same interfaces as Postgres, so it should be a drop-in
replacement when swapping a single server for a cluster.

It's fascinating technology to read about.

I seem to remember reading that one of the other Postgres companies is
also producing a clustered version of Postgres, but I don't remember who
and know nothing about them.

David Lang