Re: count(*) slow on large tables - Mailing list pgsql-performance

From Greg Stark
Subject Re: count(*) slow on large tables
Date
Msg-id 87brsyrjiz.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: count(*) slow on large tables  (Christopher Browne <cbbrowne@libertyrms.info>)
List pgsql-performance
Christopher Browne <cbbrowne@libertyrms.info> writes:

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
>
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

Well it would be handy for a few other cases as well.

1 It would be useful for the case where you have a partial index with a
  matching where clause. The optimizer already considers using such indexes
  but it has to pay the cost of the tuple lookup, which is substantial.

2 It would be useful for the very common queries of the form
  WHERE x IN (select id from foo where some_indexed_expression)

  (Or the various equivalent forms including outer joins that test to see if
  the matching record was found and don't retrieve any other columns in the
  select list.)

3 It would be useful for many-many relationships where the intermediate table
  has only the two primary key columns being joined. If you create a
  multi-column index on the two columns it shouldn't need to look up the
  tuple. This would be effectively be nearly equivalent to an "index organized
  table".


4 It would be useful for just about all the referential integrity queries...


I don't mean to say this is definitely a good thing. The tradeoff in
complexity and time to maintain the index pages would be large. But don't
dismiss it as purely a count(*) optimization hack.

I know Oracle is capable of it and it can speed up your query a lot when you
remove that last unnecessary column from a join table allowing oracle to skip
the step of reading the table.

--
greg

pgsql-performance by date:

Previous
From: Dror Matalon
Date:
Subject: Re: count(*) slow on large tables
Next
From: Josh Berkus
Date:
Subject: Re: TPC-R benchmarks