Thread: Re: Hardware/OS recommendations for large databases (
> It certainly makes quite a difference as I measure it:
>
> doing select count(1) from a 181000 page table (completely uncached) on my PIII:
>
> 8.0 : 32 s
> 8.1 : 25 s
>
> Note that the 'fastcount()' function takes 21 s in both cases - so all
> the improvement seems to be from the count overhead reduction.

Are you running Windows? There is a big performance improvement in count(*) on pg 8.0->8.1 on win32 that is not relevant to this debate...

Merlin
Merlin Moncure wrote:
>> It certainly makes quite a difference as I measure it:
>>
>> doing select count(1) from a 181000 page table (completely uncached) on my PIII:
>>
>> 8.0 : 32 s
>> 8.1 : 25 s
>>
>> Note that the 'fastcount()' function takes 21 s in both cases - so all
>> the improvement seems to be from the count overhead reduction.
>
> Are you running windows? There is a big performance improvement in
> count(*) on pg 8.0->8.1 on win32 that is not relevant to this debate...

No - FreeBSD 6.0 on a dual PIII 1 GHz. The slow CPU means that the 8.1 improvements are very noticeable!

A point of interest - applying Niels' palloc-avoiding changes to nodeAgg.c and int8.c in 8.0 changes those results to:

8.0 + palloc avoiding patch : 27 s

(I am guessing the remaining 2 s could be shaved off if I backported 8.1's virtual tuples - however that looked like a lot of work)

Cheers

Mark
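The timings in this message can be split into a scan floor and the time spent above it. The sketch below assumes, as the message suggests, that the 21 s 'fastcount()' run approximates the cost of reading the table with no count overhead; the variable and function names are illustrative only.

```python
# Timings reported in the message above, in seconds, for the 181000-page table.
SCAN_FLOOR_S = 21  # 'fastcount()' time, reported as the same on 8.0 and 8.1

timings_s = {
    "8.0": 32,
    "8.0 + palloc avoiding patch": 27,
    "8.1": 25,
}


def count_overhead(total_s, floor_s=SCAN_FLOOR_S):
    """Seconds spent above the scan floor (assumed here to be count overhead)."""
    return total_s - floor_s


overheads = {name: count_overhead(t) for name, t in timings_s.items()}

for name, secs in overheads.items():
    print(f"{name}: {secs} s above the {SCAN_FLOOR_S} s scan floor")
```

Under that assumption the count overhead drops from 11 s on 8.0 to 4 s on 8.1, and the patched 8.0 sits 2 s behind 8.1, which is the gap Mark attributes to virtual tuples.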
Mark,

On 11/28/05 1:45 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:

>>> 8.0 : 32 s
>>> 8.1 : 25 s

A 22% reduction.

select count(1) on 12,900MB = 1617125 pages fully cached:

MPP based on 8.0 : 6.06s
MPP based on 8.1 : 4.45s

A 26% reduction.

I'll take it!

I am looking to back-port Tom's pre-8.2 changes and test again, maybe tonight.

- Luke
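For anyone checking the percentages, here is a minimal sketch of the arithmetic. The scan-rate helper assumes the default 8 KB PostgreSQL page size, which the messages do not state explicitly; the function names are made up for this example.

```python
def percent_reduction(before_s, after_s):
    """Percentage by which the elapsed time dropped."""
    return 100.0 * (before_s - after_s) / before_s


def scan_rate_mb_s(pages, seconds, page_kb=8):
    """Scan rate in MB/s, assuming page_kb kilobytes per page (8 KB default)."""
    return pages * page_kb / 1024.0 / seconds


# Mark's uncached run on the dual PIII
print(round(percent_reduction(32, 25), 1))      # about 21.9
# Luke's fully cached MPP run
print(round(percent_reduction(6.06, 4.45), 1))  # about 26.6
# Mark's 181000-page table on 8.1, under the 8 KB page assumption
print(round(scan_rate_mb_s(181000, 25), 1))
```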
Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it possible to upgrade from Postgres 8.1 to Bizgres?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

On Nov 28, 2005, at 3:05 PM, Luke Lonergan wrote:

> Mark,
>
> On 11/28/05 1:45 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:
>
>>>> 8.0 : 32 s
>>>> 8.1 : 25 s
>
> A 22% reduction.
>
> select count(1) on 12,900MB = 1617125 pages fully cached:
>
> MPP based on 8.0 : 6.06s
> MPP based on 8.1 : 4.45s
>
> A 26% reduction.
>
> I'll take it!
>
> I am looking to back-port Tom's pre-8.2 changes and test again, maybe
> tonight.
>
> - Luke
Hi David,

Thanks for your reply. So how is that different from something like Slony2 or pgcluster with multi-master replication? Is it similar technology? We're currently looking for a good clustering solution that will work on our Apple Xserves and Xserve RAIDs.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

On Nov 27, 2005, at 8:09 PM, David Lang wrote:

> On Mon, 28 Nov 2005, Brendan Duddridge wrote:
>
>> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is
>> it possible to upgrade from Postgres 8.1 to Bizgres?
>
> MPP is the Greenplum proprietary extension to postgres that spreads
> the data over multiple machines (RAID, but with entire machines,
> not just drives, complete with data replication within the cluster
> to survive a machine failing).
>
> For some types of queries they can definitely scale linearly with
> the number of machines (other queries are far more difficult and
> the overhead of coordinating the machines shows more; this is one
> of the key things that the new version they recently announced the
> beta for is supposed to be drastically improving).
>
> Early in the year when I first looked at them their prices were
> exorbitant, but Luke says I'm wildly mistaken on their current
> prices, so call them for details.
>
> It uses the same interfaces as postgres so it should be a drop-in
> replacement to replace a single server with a cluster.
>
> It's fascinating technology to read about.
>
> I seem to remember reading that one of the other postgres companies
> is also producing a clustered version of postgres, but I don't
> remember who and know nothing about them.
>
> David Lang
Brendan Duddridge wrote:

> Thanks for your reply. So how is that different than something like
> Slony2 or pgcluster with multi-master replication? Is it similar
> technology? We're currently looking for a good clustering solution
> that will work on our Apple Xserves and Xserve RAIDs.

I think you need to be more specific about what you're trying to do. 'Clustering' encompasses so many things that it means almost nothing by itself.

Slony provides facilities for replicating data. Its primary purpose is to improve reliability.

MPP distributes both data and queries. Its primary purpose is to improve performance for a subset of all query types.
On Mon, 28 Nov 2005, Brendan Duddridge wrote:

> Hi David,
>
> Thanks for your reply. So how is that different than something like Slony2 or
> pgcluster with multi-master replication? Is it similar technology? We're
> currently looking for a good clustering solution that will work on our Apple
> Xserves and Xserve RAIDs.

MPP doesn't just split up the data, it splits up the processing as well. If you have a 5 machine cluster, each machine holds 1/5 of your data (plus a backup for one of the other machines), and when you do a query MPP slices and dices the query to send a subset of the query to each machine. It then gets the responses from all the machines and combines them.

If you have to do a full table scan, for example, each machine would only have to go through 20% of the data.

A Slony or pgcluster setup has each machine with a full copy of all the data, only one machine can work on a given query at a time, and if you have to do a full table scan one machine needs to read 100% of the data.

In many ways this is the holy grail of databases. Almost all other areas of computing can now be scaled by throwing more machines at the problem in a cluster, with each machine just working on its piece of the problem, but databases have had serious trouble doing the same and so have been ruled by the 'big monster machine'. Oracle has been selling Oracle RAC for a few years, and reports from people who have used it range drastically (from "it works great" to "it's a total disaster"), in part depending on the types of queries that have been made.
Greenplum thinks that they have licked the problems for the more general case (and that commodity networks are now fast enough to match disk speeds in processing the data). If they are right, then when they hit full release with the new version they should be cracking a lot of the price/performance records on the big database benchmarks (TPC and similar), and if their pricing is reasonable, they may be breaking them by an order of magnitude or more. It's not unusual for the top machines to spend more than $1,000,000 on just their disk arrays for those systems; MPP could conceivably put together a cluster of $5K machines that runs rings around them (and probably will for at least some of the subtests; the big question is whether they can sweep the board and take the top spots outright).

They have more details (and marketing stuff) on their site at http://www.greenplum.com/prod_deepgreen_cluster.html

Don't get me wrong, I am very impressed with their stuff, but (having ranted a little here on the list about them) I think MPP and its performance is a bit off topic for the postgres performance list (at least until the postgres project itself starts implementing similar features :-)

David Lang

> Thanks,
>
> ____________________________________________________________________
> Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com
>
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB T2G 0V9
>
> http://www.clickspace.com
>
> On Nov 27, 2005, at 8:09 PM, David Lang wrote:
>
>> On Mon, 28 Nov 2005, Brendan Duddridge wrote:
>>
>>> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it
>>> possible to upgrade from Postgres 8.1 to Bizgres?
>>
>> MPP is the Greenplum proprietary extension to postgres that spreads the
>> data over multiple machines (RAID, but with entire machines, not just
>> drives, complete with data replication within the cluster to survive a
>> machine failing).
>>
>> For some types of queries they can definitely scale linearly with the
>> number of machines (other queries are far more difficult and the overhead
>> of coordinating the machines shows more; this is one of the key things that
>> the new version they recently announced the beta for is supposed to be
>> drastically improving).
>>
>> Early in the year when I first looked at them their prices were exorbitant,
>> but Luke says I'm wildly mistaken on their current prices, so call them for
>> details.
>>
>> It uses the same interfaces as postgres so it should be a drop-in
>> replacement to replace a single server with a cluster.
>>
>> It's fascinating technology to read about.
>>
>> I seem to remember reading that one of the other postgres companies is also
>> producing a clustered version of postgres, but I don't remember who and
>> know nothing about them.
>>
>> David Lang
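The "slice and dice" idea David describes for a count over a 5 machine cluster can be sketched as a scatter/gather. This is only an illustration of the general pattern, not Greenplum's implementation: the round-robin distribution rule, the function names, and the use of threads to stand in for separate machines are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(rows, n_segments):
    """Assign row i to segment i % n_segments (an assumed distribution rule)."""
    segments = [[] for _ in range(n_segments)]
    for i, row in enumerate(rows):
        segments[i % n_segments].append(row)
    return segments


def local_count(segment_rows):
    """Work done by one machine: scan only its own slice of the table."""
    return sum(1 for _ in segment_rows)


def distributed_count(rows, n_segments):
    """Coordinator: send the count to every segment, then combine the results."""
    segments = split_round_robin(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partial_counts = list(pool.map(local_count, segments))
    return sum(partial_counts), partial_counts


total, partials = distributed_count(range(1000), 5)
print(total, partials)
```

Each simulated machine scans 200 of the 1000 rows, i.e. 20% of the data, whereas in a full-copy replication setup the single machine handling the query would scan all 1000.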
On Mon, 28 Nov 2005, Brendan Duddridge wrote:

> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it
> possible to upgrade from Postgres 8.1 to Bizgres?

MPP is the Greenplum proprietary extension to postgres that spreads the data over multiple machines (RAID, but with entire machines, not just drives, complete with data replication within the cluster to survive a machine failing).

For some types of queries they can definitely scale linearly with the number of machines (other queries are far more difficult and the overhead of coordinating the machines shows more; this is one of the key things that the new version they recently announced the beta for is supposed to be drastically improving).

Early in the year when I first looked at them their prices were exorbitant, but Luke says I'm wildly mistaken on their current prices, so call them for details.

It uses the same interfaces as postgres so it should be a drop-in replacement to replace a single server with a cluster.

It's fascinating technology to read about.

I seem to remember reading that one of the other postgres companies is also producing a clustered version of postgres, but I don't remember who and know nothing about them.

David Lang
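The "RAID, but with entire machines" description can be illustrated with a toy placement rule in which each machine also holds a backup of one neighbour's slice. The ring layout below is purely hypothetical, chosen for simplicity; the thread does not say how MPP actually places its backups, and the function names are invented for this sketch.

```python
def backup_host(machine, n_machines):
    """Machine that holds the backup copy of `machine`'s slice (assumed ring)."""
    if n_machines < 2:
        raise ValueError("need at least two machines to keep a backup")
    return (machine + 1) % n_machines


def serving_hosts(n_machines, failed):
    """For each data slice, the machine that serves it after `failed` goes down."""
    hosts = {}
    for slice_id in range(n_machines):
        if slice_id == failed:
            # The primary is gone, so fall back to the neighbour's backup copy.
            hosts[slice_id] = backup_host(slice_id, n_machines)
        else:
            hosts[slice_id] = slice_id
    return hosts


# Five machines, machine 2 fails: slice 2 is served from machine 3's backup.
print(serving_hosts(5, failed=2))
```

With this layout every slice stays readable after any single machine failure, at the cost of one surviving machine temporarily serving two slices.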