Re: PostgreSQL, OLAP, and Large Clusters - Mailing list pgsql-general

From Ondrej Ivanič
Subject Re: PostgreSQL, OLAP, and Large Clusters
Date
Msg-id CAM6mieKG0XSeZ+A=jFGqGCPM9ZXSWQo+Ob6oL=cbXknZFN_ozQ@mail.gmail.com
Whole thread Raw
In response to PostgreSQL, OLAP, and Large Clusters  (Ryan Kelly <rpkelly22@gmail.com>)
Responses Re: PostgreSQL, OLAP, and Large Clusters
List pgsql-general
Hi,

On 26 September 2012 21:50, Ryan Kelly <rpkelly22@gmail.com> wrote:
> The size of our database is growing rather rapidly. We're concerned
> about how well Postgres will scale for OLAP-style queries over terabytes
> of data. Googling around doesn't yield great results for vanilla
> Postgres in this application, but generally links to other software like
> Greenplum, Netezza, and Aster Data (some of which are based off of
> Postgres). Too, there are solutions like Stado. But I'm concerned about
> the amount of effort to use such solutions and what we would have to
> give up feature-wise.

We were able to have 2TB of data (16 Cores, 96GB RAM, FC connected
SAN). Performance was good:
- up to 5sec for simple select with multiple where conditions (2 - 25)
order by any column (of 400 columns) and return top 10k
- up to 15sec for executing 5 queries (simultaneously) which return
top 20 combination of any two columns
- up to 25sec for executing 56 queries (using materialised aggregate tables)
- aggregation job ran every 15 minutes and completed under 2 minutes:
5mil rows -> aggregation -> 56 tables
- all queries can be executed over date range up to several months
(monthly partitioned tables, 6 months history)

but it was very hard to scale this solution. We  have tried:
- FusionIO cards: 10 to 100 times better performance, but very hard to
expand storage capacity; Cooling/power issues
- AsterData: nice SQL-MR feature and analytics (decision trees,
frequent items, clustering, ...); No libpq support and you have to use
JDBC or selected ODBC manager
- Greenplum (winer): performance comparable to FusionIO (10 to 50
times); we were able to remove aggregation job (because of columnar
store model); easy to port from postgres but could be complicated if
you are heavy pgpsql user

At this time I would try:
- Postgres-XC
- Stado
- Cassandra + Solr + Hadoop (for example DataStax Enterprise)
- Birst (http://www.birst.com/)

PS. For any commercial solution is better to buy their appliance (if
you can afford it...).

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)


pgsql-general by date:

Previous
From: Samuel Gendler
Date:
Subject: Re: [PERFORM] Inaccurate Explain Cost
Next
From: Adrian Klaver
Date:
Subject: Re: Odd Invalid type name error in postgresql 9.1