Re: Aggregates and Indexes - Mailing list pgsql-novice
From | Ron Johnson |
---|---|
Subject | Re: Aggregates and Indexes |
Date | |
Msg-id | 1028334457.17801.92.camel@haggis Whole thread Raw |
In response to | Re: Aggregates and Indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-novice |
On Fri, 2002-08-02 at 15:43, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > For Postgres custom aggregates, using a standard index is impossible, for > > reasons I think are obvious. > > > That leaves MAX, MIN, and COUNT. All of these aggregates should, in an > > ideal world, be index-responsive for large data sets. > > While it's fairly clear how one might use an index for MAX/MIN > (basically, make the optimizer transform it into a SELECT ... ORDER BY > ... LIMIT 1 operation, which can then be done with an indexscan), > I really don't see how an index can help for COUNT. > > The real problem with COUNT is that any attempt to maintain such a value > on-the-fly creates a single-point bottleneck for all insertions and > deletions on the table. The perspective of most of the developers is > that that cost outweighs any possible savings from having an > instantaneous COUNT operation. Blecch! Who wants to keep current counts????? You are right that that is totally silly. The closed-source database I use (Rdb/VMS on Alpha VMS) takes great advantage of indexes when processing COUNT, SUM, MIN & MAX. For example: CREATE INDEX ndx1 on table1 (fld1, fld2); SELECT fld1, COUNT(*) FROM table1 WHERE fld1 = 'bar' GROUP BY fld1; Since the index stores key-values and oids that point to the data, Rdb looks at the "wedge" of ndx1 that matches "fld1 = 'bar'" and counts up the oids that meet the criteria. In more complicated queries, where there is only partial index support for the query, yes, you will have to hit the live table, but the index should minimize the number of pages in the table that the must read. Thus, the index still speeds up the query. > When you add in the issue that under MVCC there isn't a unique COUNT > that's the same for all transactions, it's just not worth thinking > about. (And do I need to point out that with WHERE conditions, > GROUP BY, or a variable COUNT argument, all hope of such optimization > disappears anyway? A global rowcount doesn't help in those cases.) > > The MAX/MIN issue will probably be addressed someday, but since there > is a good workaround it's not very high on anyone's TODO queue. We have > many more-pressing problems. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "The greatest dangers to liberty lurk in insidious encroachment | | by men of zeal, well-meaning, but without understanding." | | Justice Louis Brandeis, dissenting, Olmstead v US (1928) | +-----------------------------------------------------------------+
pgsql-novice by date: