Thread: select count(*) on large tables

From: Cosimo Streppone
Hello,

I've followed the last discussion about the particular case of
"select count(*)"s on large tables being somewhat slow.

I've seen also this issue already on the todo list, so I know
it is not a simple question.
The problem arises for me on very large tables, by which I mean
tables of 1 million rows and above.

The alternative solution I tried gives an excellent speed-up,
but unfortunately it is not a way out: it is based on parsing
the output of "EXPLAIN SELECT count(*)", which is
obviously *not* reliable.

The times always improve after a VACUUM (and possibly a REINDEX)
of the table, and then slowly get worse again.

Is there an estimated time for this issue to be resolved?
Can I help in some way (code, test cases, ...)?

--
Cosimo


Re: select count(*) on large tables

From: Dennis Bjorklund
On Thu, 8 Apr 2004, Cosimo Streppone wrote:

> The alternative solution I tried, that has an optimal
> speed up, unfortunately is not a way out, and it is based
> on "EXPLAIN SELECT count(*)" output parsing, which
> is obviously *not* reliable.

Try this to get the estimate:

   SELECT relname, reltuples from pg_class order by relname;
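For a single table you can fetch the estimate directly — a sketch,
with 'mytable' standing in for your table name; keep in mind that
reltuples is only as fresh as the last VACUUM or ANALYZE:

   SELECT reltuples::bigint AS estimated_rows
     FROM pg_class
    WHERE relname = 'mytable';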

> The times always get better doing a vacuum (and eventually
> reindex) of the table, and they slowly lower again.

Yes, the estimate is updated by ANALYZE.

> Is there an estimated time for this issue to be resolved?

It's not so easy to "fix". The naive fixes make other operations
slower; most notably, they make things less concurrent, which is bad
since it won't scale as well for many users.

You can always keep the count yourself and have triggers that update
the count on each insert and delete on the table. That will of course
make all inserts and deletes slower, but if you count all rows often,
maybe it's worth it. Most people do not need an exact count of all
rows in a table anyway; you usually count only the rows that match
some condition.
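A minimal sketch of such a counter (table and function names are
made up for illustration). Note that the single-row counter table
serializes concurrent inserts and deletes, which is exactly the
concurrency trade-off mentioned above:

   CREATE TABLE mytable_count (n bigint NOT NULL);
   INSERT INTO mytable_count SELECT count(*) FROM mytable;

   CREATE FUNCTION mytable_count_trig() RETURNS trigger AS '
   BEGIN
       IF TG_OP = ''INSERT'' THEN
           UPDATE mytable_count SET n = n + 1;
           RETURN NEW;
       END IF;
       UPDATE mytable_count SET n = n - 1;
       RETURN OLD;
   END;
   ' LANGUAGE plpgsql;

   CREATE TRIGGER mytable_count_update
       AFTER INSERT OR DELETE ON mytable
       FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

After that, SELECT n FROM mytable_count; returns at once regardless
of the table's size.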

--
/Dennis Björklund


Re: select count(*) on large tables

From: Tom Lane
Cosimo Streppone <cosimo@streppone.it> writes:
> Is there an estimated time for this issue to be resolved?

Approximately never.  It's a fundamental feature of Postgres' design.

As noted by Dennis, you can look at the pg_class statistics if a recent
estimate is good enough, or you can build user-level tracking tools if
you'd rather have fast count(*) than concurrent update capability.  But
don't sit around waiting for the developers to "fix this bug", because
it isn't a bug and it isn't going to be fixed.

            regards, tom lane