Thread: Postgres or Greenplum
Hi
I have been using Postgres for many years and have recently discovered Greenplum, which appears to be a heavily modified, Postgres-based, multi-node DB that is VERY fast.
All the tests that I have seen suggest that Greenplum, when implemented on a single server like Postgres but with several separate installations, can be many times faster than Postgres. This is achieved by using multiple DBs to store the data and using multiple logger and writer processes to make full use of all the resources of the server.
Has the Postgres development team ever considered using this technique to split the data into separate sequential files that can be accessed by multiple writer/reader processes? If so, what was the conclusion?
Finally, thanks for all the good work over the years!
Simon
Simon Windsor
Eml: simon.windsor@cornfield.org.uk
Tel: 01454 617689
Mob: 07590 324560
“There is nothing in the world that some man cannot make a little worse and sell a little cheaper, and he who considers price only is that man's lawful prey.”
"Simon Windsor" <simon.windsor@cornfield.me.uk> writes:
> I have been using Postgres for many years and have recently discovered
> Greenplum, which appears to be a heavily modified, Postgres-based, multi-node
> DB that is VERY fast.

Very fast on a very narrow set of use cases ...

regards, tom lane
On Tue, Jun 7, 2011 at 10:26 PM, Simon Windsor <simon.windsor@cornfield.me.uk> wrote:
> I have been using Postgres for many years and have recently discovered
> Greenplum, which appears to be a heavily modified, Postgres-based, multi-node
> DB that is VERY fast.
>
> All the tests that I have seen suggest that Greenplum, when implemented on a
> single server like Postgres but with several separate installations, can
> be many times faster than Postgres. This is achieved by using multiple
> DBs to store the data and using multiple logger and writer processes to
> make full use of all the resources of the server.
>
> Has the Postgres development team ever considered using this technique to
> split the data into separate sequential files that can be accessed by
> multiple writer/reader processes? If so, what was the conclusion?
>
> Finally, thanks for all the good work over the years!

Yes, I've looked at implementing parallel query a number of times. My estimate was that it's about 2 man-years of effort to do something worthwhile there, and so far nobody has offered funding for such a task. There was some recent discussion about obtaining funding, so we'll see how that goes. It is of course reasonably straightforward to achieve trivial parallelism, but that's mostly useless in the real world. So it's on the roadmap, but some way off yet.

Many commercial implementations exist, and IMHO the Greenplum solution is the best general-purpose DW solution currently available for PostgreSQL-like environments. Greenplum does have a community edition that is free to use, and your stated performance results match my experience. We've worked with a number of data warehouse customers hitting the limits and moving up to Greenplum. Once people give up the Oracle mantra, it frees them to consider a range of alternatives.

The main reason for deferring work on parallel query has been that other techniques have offered useful gains that were easier to achieve. For example, partitioning allowed PostgreSQL to dramatically reduce scan times with less complexity. Synchronous scans can also achieve good efficiencies for cases where total throughput is important. I expect to do more work on improving decision support query performance in the next release (9.2), so if anybody wishes to partially fund development, that would be much appreciated.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
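As a rough illustration of why partitioning cuts scan times, here is a toy Python sketch (not PostgreSQL code). It assumes a hypothetical table of (sale_date, amount) rows split into one partition per month; the names `build_partitions` and `total_for_month` are invented for this example. A query for a single month then touches only that month's partition instead of every row.

```python
from datetime import date


def month_key(d):
    """Partition key: the (year, month) a row belongs to."""
    return (d.year, d.month)


def build_partitions(rows):
    """Group rows into one list per month, mimicking range partitions."""
    partitions = {}
    for row in rows:
        partitions.setdefault(month_key(row[0]), []).append(row)
    return partitions


def total_for_month(partitions, year, month):
    """Scan only the single partition that can contain matching rows.

    Returns the summed amount and the number of rows actually scanned.
    """
    part = partitions.get((year, month), [])
    return sum(amount for _, amount in part), len(part)


# Hypothetical sample data: (sale_date, amount)
rows = [
    (date(2011, 5, 3), 10.0),
    (date(2011, 5, 20), 5.0),
    (date(2011, 6, 7), 7.5),
    (date(2011, 6, 8), 2.5),
    (date(2011, 7, 1), 1.0),
]

partitions = build_partitions(rows)
june_total, rows_scanned = total_for_month(partitions, 2011, 6)
```

In this sketch the June query reads two rows rather than all five. The saving grows with the number of partitions that the query's predicate allows the planner to skip.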
On Tue, 7 Jun 2011 23:04:04 +0100, Simon Riggs wrote:
> Many commercial implementations exist, and IMHO the Greenplum solution
> is the best general-purpose DW solution currently available for
> PostgreSQL-like environments. Greenplum does have a community edition
> that is free to use, and your stated performance results match my
> experience. We've worked with a number of data warehouse customers
> hitting the limits and moving up to Greenplum.

But I think Greenplum is "shared nothing", isn't it?

Regards,
Radek
On 07/06/2011 23.52, Tom Lane wrote:
> Very fast on a very narrow set of use cases ...

Can you explain a little (if possible)?

Thank you
Hi Radoslaw,

On Wed, 08 Jun 2011 07:30:33 +0200, Radosław Smogura <rsmogura@softperience.eu> wrote:
>
> But, I think GreenPlum is "share nothing", isn't it?

Yes, indeed. In very simple words, Greenplum is a parallel processing database solution that implements the "shared-nothing" architecture. One master server is responsible for managing data distribution and query processing among several segments, usually residing on multiple servers (which share no physical resource other than the network). The shared-nothing architecture allows the system to scale linearly by adding commodity hardware to the cluster.

There is a special version of Greenplum (Greenplum Community Edition) that can be used for testing and development (the license also allows usage in production environments under certain circumstances, for instance research). Greenplum is owned by EMC and is not open source.

Greenplum is typically used for data warehousing, as the main source for business intelligence and analytics queries.

If you are interested in topics like this, I suggest that you attend Char(11) (www.char11.org), the second edition of the Clustering, High Availability and Replication conference.

Cheers,
Gabriele

--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
Gabriele.Bartolini@2ndQuadrant.it - www.2ndQuadrant.it
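The master/segment arrangement described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Greenplum's implementation: the segment count, the CRC32-based distribution, and the function names are all invented for illustration, and threads stand in for what would be separate servers. The "master" hash-distributes rows across segments, each segment aggregates only its own rows, and the master combines the partial results.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segments


def segment_for(key):
    """Pick a segment from a deterministic hash of the distribution key."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SEGMENTS


def distribute(rows):
    """Master side: place each (key, value) row on exactly one segment."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for key, value in rows:
        segments[segment_for(key)].append((key, value))
    return segments


def local_sum(segment_rows):
    """Segment side: aggregate only the rows stored locally."""
    return sum(value for _, value in segment_rows)


def parallel_sum(segments):
    """Master side: run the local aggregates concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(local_sum, segments))
    return sum(partials)


# Hypothetical data: (customer_id, amount)
rows = [(customer_id, customer_id * 2) for customer_id in range(1, 101)]
segments = distribute(rows)
total = parallel_sum(segments)
```

Because no segment needs another segment's rows to compute its partial sum, adding segments spreads the scan work without adding shared state. Queries that must move rows between segments (for example, joins on a column other than the distribution key) do not divide this cleanly.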