Re: [NOVICE] Aggregates and Indexes - Mailing list pgsql-sql

From Josh Berkus
Subject Re: [NOVICE] Aggregates and Indexes
Date
Msg-id 200208021320.56468.josh@agliodbs.com
Whole thread Raw
Responses Re: [NOVICE] Aggregates and Indexes
Re: [NOVICE] Aggregates and Indexes
List pgsql-sql
Adam,

> I do a lot of reporting based on such SQL statements.  Counting rows from
> large datasets.  Since the PG gurus don't seem to think this is such a big
> deal can someone enlighten me as to why?

I am not a core developer, but I will take a crack at your question anyway
based on my personal knowledge.  I am sure that Tom, Bruce, or Stephan will
correct my mistaken assumptions.

Actually, the inability to index MAX, MIN, and COUNT is an annoyance I think
everyone would like to fix; it's just that the technical challenge is much
greater than the benefit for the core team.   If you know a programmer who
wants to tackle it, go ahead.

Presumably you've already read the many previous e-mails on why it is a
techincal challenge.

Now, even if that challenge were solved, indexing for aggregates would still
be of limited usefulness because:

Few (if any) RDBMSs can index for SUM, STDEV, or other "calcuation"
aggregates.  This is because the value of every single record must be incuded
and estimates are not possible, so the performance gain from using an index
is infinitessimal except for those RDBMSs with very slow file access times.

For Postgres custom aggregates, using a standard index is impossible, for
reasons I think are obvious.

That leaves MAX, MIN, and COUNT.    All of these aggregates should, in an
ideal world, be index-responsive for large data sets.  Once again, for small
data sets or subsets, indexes are not useful.   And there is a workaround for
Min and Max.

So what we'd be looking at is either developing a special parser routine for
MIN, MAX, and COUNT (and them only) just to index for those aggregates, or
coming up with a new type of index just for aggregates.   The first approach
is a bit of kludge that would require a lot of debugging; the second is
probably the best long-term solution, but would require a great deal of
innovation.


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


pgsql-sql by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Seeking advice regarding a design problem
Next
From: Tom Lane
Date:
Subject: Re: [NOVICE] Aggregates and Indexes