Re: getting count for a specific querry - Mailing list pgsql-sql

From Tom Lane
Subject Re: getting count for a specific querry
Date
Msg-id 16323.1112988473@sss.pgh.pa.us
Whole thread Raw
In response to Re: getting count for a specific querry  (Bob Henkel <luckyratfoot@gmail.com>)
Responses Re: getting count for a specific querry
List pgsql-sql
Bob Henkel <luckyratfoot@gmail.com> writes:
> From a simple/high level perspective why is this? That is why can't
> PostgreSQL do aggregates as well across large chunks of data. I'm
> assuming it extremely complicated. Otherwise the folks around here
> would have churned out a fix in a month or less and made this issue a
> past story.

You can find very detailed discussions of this in the archives, but
the basic reason is that we have a very general/extensible view of
aggregates (which is how come we can support custom aggregates).
An aggregate is a function that you feed all the input rows to, one
at a time, and then it produces the answer.  Nice, general, extensible,
and not at all optimizable :-(

Now in general that is the only way to do it, and so Scott's implication
that we always suck compared to other databases is really an
overstatement.  Ask another database to do a standard deviation
calculation, for instance, and it'll be just as slow.  However there are
special cases that other DBs can optimize that we don't even try to.
The big ones are:

* COUNT(*) across a whole table --- most non-MVCC databases keep tabs of
the physical number of the rows in the table, and so they can answer
this very quickly.  Postgres doesn't keep such a count, and under MVCC
rules it wouldn't necessarily be the right answer if we had it.
(BTW, count of rows satisfying a particular condition is a different
ballgame entirely; in most cases that can't be optimized at all, AFAIK.)
If you are willing to accept approximate answers there are various
tricks you can use --- see the archives --- but we don't get to fudge
on COUNT(*) itself because it's in the SQL standard.

* MIN or MAX of an indexed column --- most DBs can use an index scan to
find such a row relatively quickly, although whether this trick works or
not depends a whole lot on whether you have WHERE or GROUP BY and just
what those conditions look like.

You can fake the min/max answer in Postgres by doing the transformstion
to an indexable query by hand, for instance instead of MAX(col) doSELECT col FROM tab ORDER BY col DESC LIMIT 1;

There are periodic discussions in the hackers list about teaching the
planner to do that automatically, and it will probably happen someday;
but it's a complicated task and not exceedingly high on the priority list.
        regards, tom lane


pgsql-sql by date:

Previous
From: "Joel Fradkin"
Date:
Subject: Re: getting count for a specific querry
Next
From: Andrew Sullivan
Date:
Subject: Re: getting count for a specific querry