Adam,
> I do a lot of reporting based on such SQL statements. Counting rows from
> large datasets. Since the PG gurus don't seem to think this is such a big
> deal can someone enlighten me as to why?
I am not a core developer, but I will take a crack at your question anyway
based on my personal knowledge. I am sure that Tom, Bruce, or Stephan will
correct my mistaken assumptions.
Actually, the fact that MAX, MIN, and COUNT cannot use indexes is an annoyance
I think everyone would like to fix; it's just that, for the core team, the
technical challenge far outweighs the benefit. If you know a programmer who
wants to tackle it, go right ahead.
Presumably you've already read the many previous e-mails on why it is a
technical challenge.
Now, even if that challenge were solved, indexing for aggregates would still
be of limited usefulness because:
Few (if any) RDBMSs can use indexes for SUM, STDEV, or other "calculation"
aggregates. This is because the value of every single record must be included
and estimates are not possible, so the performance gain from using an index
is infinitesimal except for those RDBMSs with very slow file access times.
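For example (the table and column names here are made up), even with an
index on the column being summed:

    -- Every row's "amount" value must still be read to compute
    -- the total, so an index on "amount" buys essentially nothing:
    SELECT SUM(amount) FROM orders;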
For Postgres custom aggregates, using a standard index is impossible, since
the index can know nothing about an arbitrary, user-defined state-transition
function.
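To illustrate (a sketch only; the aggregate name is made up), a custom
aggregate is just a state-transition function the planner must call once per
row:

    -- A trivial custom aggregate built on the built-in int4pl
    -- (integer addition) function. No index could answer this,
    -- because the transition function could be anything at all.
    CREATE AGGREGATE my_sum (
        BASETYPE = integer,
        SFUNC    = int4pl,
        STYPE    = integer,
        INITCOND = '0'
    );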
That leaves MAX, MIN, and COUNT. In an ideal world, all of these aggregates
would be index-responsive for large data sets. Once again, for small data
sets or subsets, indexes are not useful. And there is a well-known workaround
for MIN and MAX (example below).
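The workaround, for anyone who hasn't seen it (table and column names are
made up), is to rewrite the aggregate as an ORDER BY ... LIMIT query, which
the planner can satisfy from a b-tree index on the column:

    -- Instead of SELECT MAX(id) FROM orders:
    SELECT id FROM orders ORDER BY id DESC LIMIT 1;
    -- Instead of SELECT MIN(id) FROM orders:
    SELECT id FROM orders ORDER BY id ASC LIMIT 1;
    -- If the column can contain NULLs, add WHERE id IS NOT NULL.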
So what we'd be looking at is either developing a special parser routine for
MIN, MAX, and COUNT (and those three only) just to index for those
aggregates, or coming up with a new type of index just for aggregates. The
first approach is a bit of a kludge that would require a lot of debugging;
the second is probably the best long-term solution, but would require a
great deal of innovation.
--
-Josh Berkus
Aglio Database Solutions
San Francisco