Home > mailing lists

Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting) - Mailing list pgsql-performance

From	Chris Browne
Subject	Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Date	January 29, 2009 17:13:18
Msg-id	87ab99esq8.fsf@dba2.int.libertyrms.com Whole thread Raw
In response to	PG performance in high volume environment (many INSERTs and lots of aggregation reporting) (Phoenix Kiula <phoenix.kiula@gmail.com>)
Responses	Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
List	pgsql-performance

Tree view

phoenix.kiula@gmail.com (Phoenix Kiula) writes:
> [Ppsted similar note to PG General but I suppose it's more appropriate
> in this list. Apologies for cross-posting.]
>
> Hi. Further to my bafflement with the "count(*)" queries as described
> in this thread:
>
> http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php
>
> It seems that whenever this question has come up, Postgresql comes up
> very short in terms of "count(*)" functions.
>
> The performance is always slow, because of the planner's need to guess
> and such. I don't fully understand how the statistics work (and the
> explanation on the PG website is way too geeky) but he columns I work
> with already have a stat level of 100. Not helping at all.

That's definitely *NOT* due to "planner's need to guess"; it's due to
there being some *specific* work that PostgreSQL needs to do that some
other databases can avoid due to different storage strategies.

The matter is quite succinctly described here:

http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Counting_rows_in_a_table

I'll just take one excerpt:
---------------------------
It is worth observing that it is only this precise form of aggregate
that must be so pessimistic; if augmented with a "WHERE" clause like

SELECT COUNT(*) FROM table WHERE status = 'something'

PostgreSQL, MySQL, and most other database implementations will take
advantage of available indexes against the restricted field(s) to
limit how many records must be counted, which can greatly accelerate
such queries.
---------------------------

It is common for systems where it is necessary for aggregation
reporting to be fast to do pre-computation of the aggregates, and that
is in no way specific to PostgreSQL.

If you need *really* fast aggregates, then it will be worthwhile to
put together triggers or procedures or something of the sort to help
pre-compute the aggregates.
--
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://linuxfinances.info/info/wp.html
"When you have eliminated the impossible, whatever remains, however
improbable, must be the truth." -- Sir Arthur Conan Doyle (1859-1930),
English author. Sherlock Holmes, in The Sign of Four, ch. 6 (1889).
[...but see the Holmesian Fallacy, due to Bob Frankston...
<http://www.frankston.com/public/Essays/Holmesian%20Fallacy.asp>]

pgsql-performance by date:

From: "Subbiah Stalin-XCGF84"
Date: 29 January 2009, 16:22:27
Subject: Sort performance

From: Robert Haas
Date: 29 January 2009, 19:09:58
Subject: Re: Using multiple cores for index creation?

Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting) - Mailing list pgsql-performance

Previous

Next