Thread: PG's suitability for high volume environment (many INSERTs and lots of aggregation reporting)

Hi. Further to my bafflement with "count(*)" queries, as described
in this thread:

http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php

It seems that whenever this question has come up, PostgreSQL has come
up very short on "count(*)" performance.

The performance is always slow, apparently because of the planner's
need to estimate row counts. I don't fully understand how the
statistics work (the explanation on the PG website is way too geeky
for me), but the columns I work with already have a statistics target
of 100. It isn't helping at all.
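
For concreteness, here is roughly what I did to raise the target, plus
a workaround I have seen suggested: reading the planner's own estimate
from pg_class instead of doing an exact count. The table and column
names here are made up, and reltuples is only an estimate that ANALYZE
refreshes, but it avoids the full scan an exact count(*) requires:

  -- raise the per-column statistics target, then refresh the stats
  ALTER TABLE clicks ALTER COLUMN product_id SET STATISTICS 100;
  ANALYZE clicks;

  -- approximate row count from the planner's statistics
  SELECT reltuples::bigint AS approx_rows
  FROM pg_class
  WHERE relname = 'clicks';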

We are now considering web-based logging functionality for users of
our website. This means the table could be heavily INSERTed into. We
get about 10 million hits a day, and I'm guessing that we will have to
keep this data around for a while.

My question: with that kind of volume and the underlying aggregation
functions (by product id, dates, possibly IP addresses or at least
countries of origin...), will PG ever be a good choice? Or should I be
looking at some other kind of tool? I wonder if OLAP tools would be
overkill for something that needs to look like a barebones version of
Google Analytics limited to our site.
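
To make that concrete, here is a rough sketch of the kind of table and
report I have in mind. All names are made up:

  CREATE TABLE hits (
      hit_time    timestamptz NOT NULL DEFAULT now(),
      product_id  integer     NOT NULL,
      ip_addr     inet,
      country     char(2)
  );

  -- typical report: daily hit counts per product over the last month
  SELECT product_id,
         date_trunc('day', hit_time) AS day,
         count(*) AS hits
  FROM hits
  WHERE hit_time >= now() - interval '30 days'
  GROUP BY product_id, date_trunc('day', hit_time)
  ORDER BY day, product_id;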

Appreciate any thoughts. If possible, I would prefer to avoid
suggestions of switching to MySQL and such!

Thanks!

Phoenix Kiula <phoenix.kiula@gmail.com> writes:

> My question: with that kind of volume and the underlying aggregation
> functions (by product id, dates, possibly IP addresses or at least
> countries of origin..) will PG ever be a good choice?

Well, only you're able to judge that for your own data and use cases.

Your query is sorting 10,000 records in half a second, which is not great but
not terrible either. I think the only way you'll be able to speed that up is
by changing your index design so that Postgres can access the rows you need
without sorting through all the irrelevant ones.
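
For example (just a sketch, guessing at a schema along the lines of the
hits table you described): if a report always filters on product_id and
sorts by hit_time, a composite index lets Postgres read the rows already
in order instead of sorting them:

  -- index matching both the WHERE clause and the sort order
  CREATE INDEX hits_product_time_idx ON hits (product_id, hit_time);

  -- can now be satisfied by an index scan with no separate sort step
  SELECT *
  FROM hits
  WHERE product_id = 42
  ORDER BY hit_time DESC
  LIMIT 100;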

I suspect others have already suggested this, but you might look at partial
indexes. If your queries are very dynamic against relatively static data, you
might look at building denormalized caches of the precalculated data.
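
To sketch both ideas with made-up names: a partial index covers only the
slice of rows the queries actually touch, and a summary table rolls the
raw rows up once so the reports never rescan them:

  -- partial index: only recent rows, if reports rarely look further back
  CREATE INDEX hits_recent_idx ON hits (hit_time)
      WHERE hit_time >= DATE '2009-01-01';

  -- denormalized cache: roll up one day at a time, report off the summary
  CREATE TABLE hits_daily (
      day         date    NOT NULL,
      product_id  integer NOT NULL,
      hits        bigint  NOT NULL,
      PRIMARY KEY (day, product_id)
  );

  INSERT INTO hits_daily (day, product_id, hits)
  SELECT hit_time::date, product_id, count(*)
  FROM hits
  WHERE hit_time >= DATE '2009-01-14'
    AND hit_time <  DATE '2009-01-15'
  GROUP BY 1, 2;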

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning