On Tue, 2004-06-29 at 00:42, Harald Lau (Sector-X) wrote:
> Hi,
>
> I've experienced that PG up to current release does not make use of an index when aggregating. Which of course may
resultin unacceptable answering times
>
> This behaviour is reproducable on any table with any aggregat function in all of my databases on every machine
(PostgreSQL7.4.2 on i386-redhat-linux-gnu and PostgreSQL 7.2.1 on i686-pc-linux-gnu)
>
> f.e. querying against a 2.8-mio-records (2.800.000) table the_table
> SELECT count(*) FROM the_table
> => Seq scan -> takes about 12 sec
>
> SELECT Avg(num_found) AS NumFound FROM the_table --(index on num_found)
> => Seq scan -> takes about 10 sec
>
> SELECT Sum(num_found) AS TotalFound FROM the_table --(index on num_found)
> => Seq scan -> takes about 11 sec
>
> SELECT Max(date_) AS LatestDate FROM the_table --(index on date_)
> => Seq scan -> takes about 14 sec
>
> But
> SELECT date_ AS LatestDate FROM the_table ORDER BY date_ DESC LIMIT 1;
> => Index scan -> takes 0.18 msec
>
> MS SQLServer 2000: Use of an appropriate index _whenever_ aggregating.
>
> Am I doing something wrong?
Yes, you're expecting an MVCC database to behave like a row locking
database.
Due to the way PostgreSQL is put together, it can't count on an index
giving it values, only pointers to values, so to speak. This means it
can use an index, but it will still go to the table to get the right
value.
On the other hand, the trade off is that MVCC can handle much higher
parallel loads, usually.
Note that if you're aggregate is on sub subset of a table, then an index
scan can often be a big win, such as:
create table z(a int, b int);
insert into z values (1,1); (repeat a couple thousand times)
select avg(b) from z where a=3; <-- this can use the index
But note that in the above, the table's rows will still have to be
accessed to get the right values.