Thread: Make COUNT(*) Faster?
Hello all you PostgreSQL/SQL gurus!

I've started using PostgreSQL pretty recently, and I am quite disturbed
about the performance of a simple SELECT COUNT(*) FROM table. What should
(in my mind) be a nearly instantaneous operation instead takes nearly 700ms
in a table with only 87k rows of data! If I run an EXPLAIN on this query I
can see that it is doing a sequential scan, which seems quite needless, as
surely this information is cached in some secret location.

It is very possible that I am missing something, so I ask you: is there a
faster way to find out how many rows are in a table? I've tried doing a
COUNT(column) where I have an index on column, but it still does a
sequential scan and it is still very very slow. What are my options?

I offer you many thanks in advance,
Varun Mehta
On Thu, Jul 07, 2005 at 03:48:39PM -0700, Varun Mehta wrote:
> I've started using PostgreSQL pretty recently, and I am quite
> disturbed about the performance of a simple SELECT COUNT(*) FROM
> table. What should (in my mind) be a nearly instantaneous operation
> instead takes nearly 700ms in a table with only 87k rows of data!

Speeding up COUNT is on the developers' TODO list, but it's not as simple
as it might seem or it would have been done already. This has been brought
up many times over the years -- search the archives to see past discussion.
Words to search for include "MVCC," "index," and "visibility."

> If I run an EXPLAIN on this query I can see that it is doing a
> sequential scan, which seems quite needless, as surely this
> information is cached in some secret location.

If an estimate will suffice then you could use the table's
pg_class.reltuples value, but beware that it can be rather out of date.

http://www.postgresql.org/docs/8.0/static/catalog-pg-class.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
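A minimal sketch of the reltuples lookup Michael describes, assuming a
table named mytable (the name is just a placeholder):

    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'mytable';  -- estimate, refreshed by VACUUM/ANALYZE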
vmehta@apple.com (Varun Mehta) writes:
> If I run an EXPLAIN on this query I can see that it is doing a
> sequential scan, which seems quite needless, as surely this
> information is cached in some secret location.

That would in fact surely *NOT* be the case.

If you have multiple users performing updates on that table concurrently,
with the possibility of some of those updates rolling back, then it doesn't
make sense for there to be any such "one place" where a count would be
stored.

Consider the case where you ask for COUNT(*) while the following set of
transactions are outstanding:

1. A transaction which, as it turns out, will get rolled back, and which
   has inserted 40 tuples;

2. A transaction which has modified 10 tuples, thereby generating 10 dead
   tuples and adding 10 new ones;

3. 14 transactions, each of which has added 2 tuples to the table.

None of those transactions have COMMITted, so there are some 78 tuples
"in limbo" spread across 16 transactions.

If there were some "single secret place" with a count, how would you
suggest it address those 78 tuples and 16 transactions that aren't yet
(and maybe never will be) part of the count?

> It is very possible that I am missing something, so I ask you: is
> there a faster way to find out how many rows are in a table? I've
> tried doing a COUNT(column) where I have an index on column, but it
> still does a sequential scan and it is still very very slow. What
> are my options?

Use of the index doesn't help because the index isn't forcibly up to date.
It has no notion of marking "index tuples" as dead/not visible; visibility
information is only attached to the tuples themselves. Look up "MVCC" for
more details...
--
(format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78. "I will not tell my Legions of Terror
"And he must be taken alive!" The command will be: ``And try to take him
alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
Chris Browne <cbbrowne@acm.org> writes:
> vmehta@apple.com (Varun Mehta) writes:
>> If I run an EXPLAIN on this query I can see that it is doing a
>> sequential scan, which seems quite needless, as surely this
>> information is cached in some secret location.
> [ example scenario snipped ]
> If there were some "single secret place" with a count, how would you
> suggest it address those 78 tuples and 16 transactions that aren't yet
> (and maybe never will be) part of the count?

It's worse than that: once some of those transactions have committed, the
right answer is observer-dependent, since some onlooker transactions may
see those guys as committed while others think they are not yet committed.
So there could certainly not be just one secret place...

There are solutions suggested in the archives, but they all amount to
making COUNT(*)-with-no-WHERE-or-GROUP-BY-clause fast at the price of
nontrivial distributed overhead for all updates --- overhead that would be
paid whether or not the application ever did such a COUNT. That's not a
tradeoff we've wanted to make in general. You can implement it yourself
via triggers for specific tables that you think it's worth doing for.

Also, if an approximate answer is good enough, there are a whole other set
of possible solutions.

			regards, tom lane
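A minimal sketch of the per-table trigger approach Tom mentions, with
placeholder names throughout (note that the single-row counter table
serializes concurrent writers, which is part of the overhead he describes):

    CREATE TABLE mytable_rowcount (n bigint NOT NULL);
    INSERT INTO mytable_rowcount SELECT count(*) FROM mytable;

    CREATE FUNCTION mytable_count_trig() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE mytable_rowcount SET n = n + 1;
        ELSE  -- DELETE
            UPDATE mytable_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;  -- AFTER trigger; return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

    -- Exact count (for your transaction) without scanning mytable:
    SELECT n FROM mytable_rowcount;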
Chris Browne wrote:
> None of those transactions have COMMITted, so there are some 78 tuples
> "in limbo" spread across 16 transactions.
>
> If there were some "single secret place" with a count, how would you
> suggest it address those 78 tuples and 16 transactions that aren't yet
> (and maybe never will be) part of the count?

Hmmm, I understand this and don't doubt it, but out of curiosity, how does
the current SELECT COUNT(*) handle this? It doesn't lock the entire table
while counting (I assume), so the current implementation is really just an
approximate count in the above scenario anyway. Or even when not, since
the true 'count' is likely to have changed by the time the user does
anything with the result of SELECT COUNT(*) on any active table (and on an
inactive table, pg_class.reltuples is nearly as good as SELECT COUNT(*)
and far faster to get to).

I assume this has been beaten well past death, but I don't see why it
wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
instead of updating it only on vacuums.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
On Fri, Jul 08, 2005 at 07:12:26 -0700,
  Steve Wampler <swampler@noao.edu> wrote:
> Hmmm, I understand this and don't doubt it, but out of curiosity, how
> does the current SELECT COUNT(*) handle this? It doesn't lock the entire

It only counts tuples visible to the current transaction.

> table while counting (I assume) so the current implementation is really
> just an approximate count in the above scenario anyway. Or even when

No, it is an exact count.

> not, since the true 'count' is likely to have changed by the time the

There is no single true count. There is a separate true count for each
transaction.

> user does anything with the result of SELECT COUNT(*) on any active table
> (and on an inactive table, pg_class.reltuples is nearly as good as
> SELECT COUNT(*) and far faster to get to.)
>
> I assume this has been beaten well past death, but I don't see why it
> wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
> instead of updating it only on vacuums.

Because it costs resources to keep track of that, and people don't usually
need exact tuple counts for whole tables. Those that do and are willing to
pay the price can use triggers to maintain a count in a separate table.
Bruno Wolff III wrote:
> No, it is an exact count.

Yes, for the transaction, but it's an approximation of the number of
tuples in the table - which is probably what the people who worry about
its cost are more interested in (an approximate count for the table).
I'm also claiming that a true count for any active table is meaningless
and am *not* suggesting that effort be spent on trying to produce such a
true count.

>> I assume this has been beaten well past death, but I don't see why it
>> wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
>> instead of updating it only on vacuums.
>
> Because it costs resources to keep track of that and people don't
> usually need exact tuple counts for whole tables.

Yes, we agree completely! (Which is why I said 'a bit more' instead of
'exactly' above.) My uses for COUNT(*) are to get 'reasonable' approximate
counts of the table sizes - not true counts, but approximate values.
Unfortunately, pg_class.reltuples gets too far off too fast for me to use
it as a consistent guide to current table size.

If you Folks Who Know believe that simply keeping pg_class.reltuples
'closer' to the actual table size is too expensive, I'll accept that
[after all, I have to right now anyway], but I'm surprised that it is,
given all the other work that must go on at the start/close of a
transaction.

I also understand that 'reasonable' and 'closer' are vague terms. In the
example scenario where there were around 80 rows in an indeterminate
state, my claim is that, in a table of around a million rows, it doesn't
matter whether some portion of those indeterminate rows are included in an
approximation of the table size or not (though it might in a table of 100
'true' rows - but the decision to ask for a true 'transaction' count
(slow) or an approximate table size (fast) should be left to the user in
either case).

So, leave COUNT(*) alone. But it would be very handy to have a way to get
an approximate table size that is more accurate than is provided by a
pg_class.reltuples that is only updated on vacuums.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
> So, leave COUNT(*) alone. But it would be very handy to have a
> way to get an approximate table size that is more accurate than is
> provided by a pg_class.reltuples that is only updated on vacuums.

Create 2 sequences, one for counting tuple additions and one for counting
tuple deletions.

When you INSERT a tuple, bump the "added" sequence (select nextval());

When you DELETE a tuple, bump the "deleted" sequence (select nextval());

To retrieve an approximate count, take the current value of both sequences
(select directly -- don't use currval) and subtract the "deletes" from the
"adds".

This is a very fast tracking mechanism with the catch that it does not
handle rollbacks -- but you only wanted approximate. Put all of the logic
inside a pair of triggers and a function within the DB.
--
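A rough sketch of that approach (names are made up; the counters track
activity since the triggers were installed, so seed them from an initial
COUNT(*) if you want an absolute figure rather than a delta):

    CREATE SEQUENCE mytable_added;
    CREATE SEQUENCE mytable_deleted;

    CREATE FUNCTION mytable_track_trig() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            PERFORM nextval('mytable_added');
        ELSE  -- DELETE
            PERFORM nextval('mytable_deleted');
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_track AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_track_trig();

    -- Approximate count: read both sequences directly, don't use currval().
    SELECT a.last_value - d.last_value AS approx_count
    FROM mytable_added a, mytable_deleted d;

The appeal over a single counter row is that nextval() neither blocks
concurrent writers nor rolls back -- which is also exactly why the result
is only approximate.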
On 7/8/05, Steve Wampler <swampler@noao.edu> wrote:
> > None of those transactions have COMMITted, so there are some 78 tuples
> > "in limbo" spread across 16 transactions.
> >
> > If there were some "single secret place" with a count, how would you
> > suggest it address those 78 tuples and 16 transactions that aren't yet
> > (and maybe never will be) part of the count?
>
> Hmmm, I understand this and don't doubt it, but out of curiosity, how
> does the current SELECT COUNT(*) handle this? It doesn't lock the entire
> table while counting (I assume) so the current implementation is really
> just an approximate count in the above scenario anyway. Or even when
> not, since the true 'count' is likely to have changed by the time the
> user does anything with the result of SELECT COUNT(*) on any active table
> (and on an inactive table, pg_class.reltuples is nearly as good as
> SELECT COUNT(*) and far faster to get to.)
>
> I assume this has been beaten well past death, but I don't see why it
> wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
> instead of updating it only on vacuums.

Use

EXPLAIN SELECT * FROM yourcountedtable;

Planner seems to track estimated statistics on-the-fly. :) You can even
wrap EXPLAIN SELECT in a pgsql function if you need it.

Regards,
Dawid

PS: And be aware that these are 'statistics'. And the statement that there
are lies, big lies and statistics is sometimes true even for PostgreSQL. ;-)
On Fri, Jul 08, 2005 at 08:07:27AM -0700, Steve Wampler wrote:
> Bruno Wolff III wrote:
> > No, it is an exact count.
>
> Yes, for the transaction, but it's an approximation of the number of
> tuples in the table - which is probably what the people who worry about
> its cost are more interested in (an approximate count for the table).

You seem to have the wrong idea of "tuples in the table" here. You need to
think harder about MVCC visibility rules. Given MVCC, there isn't really a
"view from nowhere" in the system -- there's just the question of which
tuples are visible to which transaction.

For a little more, you might want to look at the presentation Tom Lane
made on this:

http://www.postgresql.org/files/developer/transactions.pdf

A

--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are against
all taxes for raising money to pay it off.
		--Alexander Hamilton
On 7/8/05, Rod Taylor <pg@rbt.ca> wrote:
> Create 2 sequences, one for counting tuple additions and one for
> counting tuple deletions.
>
> When you INSERT a tuple, bump the "added" sequence (select nextval());
>
> When you DELETE a tuple, bump the "deleted" sequence (select nextval());
>
> To retrieve an approximate count, take the current value of both
> sequences (select directly -- don't use currval) and subtract the
> "deletes" from the "adds".

Never thought of that! Good idea. :-)

Regards,
Dawid

PS: There aren't any on ROLLBACK triggers, right? ;-)))
Steve Wampler <swampler@noao.edu> writes:
> So, leave COUNT(*) alone. But it would be very handy to have a
> way to get an approximate table size that is more accurate than is
> provided by a pg_class.reltuples that is only updated on vacuums.

If you want something cheap, you could use the same technique the planner
uses nowadays: take RelationGetNumberOfBlocks() (which is guaranteed
accurate) and multiply by reltuples/relpages. I don't see anyplace where
RelationGetNumberOfBlocks is directly exposed to users now, but it'd be
trivial to code up a couple of C functions to provide this functionality.

			regards, tom lane
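The same arithmetic can be sketched in plain SQL on releases that expose
the table's on-disk size from SQL (pg_relation_size() arrived later than
the 8.0 series under discussion; mytable is a placeholder):

    SELECT (pg_relation_size(c.oid) / current_setting('block_size')::bigint)
           * c.reltuples / c.relpages AS approx_rows
    FROM pg_class c
    WHERE c.relname = 'mytable'
      AND c.relpages > 0;  -- guard against division by zero on empty tables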
On Fri, 2005-07-08 at 17:34 +0200, Dawid Kuroczko wrote:
> On 7/8/05, Rod Taylor <pg@rbt.ca> wrote:
> > Create 2 sequences, one for counting tuple additions and one for
> > counting tuple deletions.
> >
> > When you INSERT a tuple, bump the "added" sequence (select nextval());
> >
> > When you DELETE a tuple, bump the "deleted" sequence (select nextval());
> >
> > To retrieve an approximate count, take the current value of both
> > sequences (select directly -- don't use currval) and subtract the
> > "deletes" from the "adds".
>
> Never thought of that! Good idea. :-)
>
> Regards,
> Dawid
>
> PS: There aren't any on ROLLBACK triggers, right? ;-)))

No. You could make them RI triggers and have them deferred until commit,
though.
--
Tom Lane wrote:
> Steve Wampler <swampler@noao.edu> writes:
>> So, leave COUNT(*) alone. But it would be very handy to have a
>> way to get an approximate table size that is more accurate than is
>> provided by a pg_class.reltuples that is only updated on vacuums.
>
> If you want something cheap, you could use the same technique the
> planner uses nowadays: take RelationGetNumberOfBlocks() (which is
> guaranteed accurate) and multiply by reltuples/relpages. I don't
> see anyplace where RelationGetNumberOfBlocks is directly exposed to
> users now, but it'd be trivial to code up a couple of C functions to
> provide this functionality.

Yes - this would be an excellent approximation for my needs!

The solution that Dawid Kuroczko suggested (just call "explain select *
on ..." and parse the result) would be equivalent these days, right? (I
think in the 7.x versions the planner was just using pg_class.reltuples,
which wouldn't have helped.) If true, I can handle that parsing myself
easily enough without exposing RelationGetNumberOfBlocks.

Thanks (Tom and Dawid)!

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
> which wouldn't have helped.) If true, I can handle that parsing myself
> easily enough without exposing RelationGetNumberOfBlocks.

Is there a way to get EXPLAIN results in a non-text-formatted way for
easier use? I'm asking because it seems the feature set grows by the
minute in Postgres nowadays...
Steve Wampler <swampler@noao.edu> writes:
> Tom Lane wrote:
>> If you want something cheap, you could use the same technique the
>> planner uses nowadays: take RelationGetNumberOfBlocks() (which is
>> guaranteed accurate) and multiply by reltuples/relpages.

> Yes - this would be an excellent approximation for my needs! The
> solution that Dawid Kuroczko suggested (just call "explain select *
> on ..." and parse the result) would be equivalent these days, right?

Close enough (the planner actually does some additional heuristic stuff
to avoid going crazy on corner cases).

			regards, tom lane
> I'm also claiming that a true count for any active table is
> meaningless and am *not* suggesting that effort be spent on trying
> to produce such a true count.

That's a pretty big assumption that would in fact be WRONG.

We have managers interested in counting the number of objects we have
around (As a domain registry, what objects would you imagine those might
be :-)), and they're keen on possibly even being able to reconcile those
counts from day to day based on transaction activity.

Leaping into some sort of vague guesstimation would destroy the ability
to do any kind of analysis of activity, and I daresay enrage them.

There may be times that a really rough guess can suffice; there are other
times when exactness is absolutely vital. Creating a "fast but WRONG
COUNT(*)" which prevented getting the exact answer that the present
implementation provides would be a severe misfeature.
--
output = reverse("gro.gultn" "@" "enworbbc")
http://linuxdatabases.info/info/rdbms.html
"The test of a principle is whether it applies even to people you don't
like." -- Henry Spencer
Christopher Browne wrote:
>> I'm also claiming that a true count for any active table is
>> meaningless and am *not* suggesting that effort be spent on trying
>> to produce such a true count.
>
> That's a pretty big assumption that would in fact be WRONG.

Please reread the message from Bruno and reconcile the above statement
with his assertion (which I believe) that there is *no* single true count
for an active table. [I'm defining 'active' as currently undergoing
insert/copy/delete/update actions.]

> We have managers interested in counting the number of objects we have
> around (As a domain registry, what objects would you imagine those
> might be :-)), and they're keen on possibly even being able to
> reconcile those counts from day to day based on transaction activity.

If Bruno is correct, then they need to do this reconciliation from within
a single transaction (the same one that does the COUNT(*)) - or else they
are working on an 'inactive' table [one not currently accepting changes].
If neither condition holds, then isn't the result they are currently
getting from COUNT(*) *already* an approximation?

> Leaping into some sort of vague guesstimation would destroy the
> ability to do any kind of analysis of activity, and I daresay enrage
> them.

No doubt! Let's hope the above conditions hold.

> There may be times that a really rough guess can suffice; there are
> other times when exactness is absolutely vital.

But, as others have said, COUNT(*) does not return a true count for a
table, but rather just a true count for the *current transaction*. So
COUNT(*)'s from different simultaneous transactions may very well produce
different values.

> Creating a "fast but WRONG COUNT(*)" which prevented getting the exact
> answer that the present implementation provides would be a severe
> misfeature.

Agreed - note that I did not suggest replacing the current COUNT(*) with
an inexact version, but wanted (and now have) a quick way to get a
reasonable approximation of the current table size.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
On Fri, Jul 08, 2005 at 06:08:03PM +0200, PFC wrote:
> > which wouldn't have helped.) If true, I can handle that parsing myself
> > easily enough without exposing RelationGetNumberOfBlocks.
>
> Is there a way to get EXPLAIN results in a non-text-formatted way
> for easier use?
> I'm asking because it seems the feature set grows by the minute in
> Postgres nowadays...

Not yet, but a friend of mine was working on a patch to make it output in
a custom XML format, to be able to create a tool similar to Red Hat's
Visual Explain. I expect he will show up in pgsql-hackers sometime ...

In Spanish: http://www.ubiobio.cl/~gpoo/weblog/archives/000397.html

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program will work
forever" (Oliver Silfridge)
In article <758d5e7f05070808305c049aae@mail.gmail.com>,
Dawid Kuroczko <qnex42@gmail.com> writes:

> Use
>
> EXPLAIN SELECT * FROM yourcountedtable;
>
> Planner seems to track estimated statistics on-the-fly. :)
> You can even wrap EXPLAIN SELECT in a pgsql function if you
> need it.

Do you know how to do that? A function "approx_count(tablename)" would be
really handy, but

  FOR row IN EXECUTE 'EXPLAIN SELECT * FROM ' || tbl LOOP

fails with the following message:

  ERROR:  cannot open non-SELECT query as cursor

> PS: And be aware that these are 'statistics'. And the statement that there
> are lies, big lies and statistics is sometimes true even for PostgreSQL. ;-)

Does this mean that PostgreSQL believes only in statistics it has faked
itself? ;-)
Harald Fuchs <hf0614x@protecting.net> writes:
> FOR row IN EXECUTE 'EXPLAIN SELECT * FROM ' || tbl LOOP
> fails with the following message:
> ERROR:  cannot open non-SELECT query as cursor

[ checks CVS history... ] Use 8.0.2 or later.

			regards, tom lane
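A rough sketch of such an approx_count() function, on 8.0.2 or later per
Tom's note (the function name, variable names, and regexp are just
illustrative):

    CREATE FUNCTION approx_count(tbl text) RETURNS bigint AS $$
    DECLARE
        rec record;
        est bigint;
    BEGIN
        -- The first EXPLAIN line is the top plan node, which carries the
        -- planner's total row estimate.
        FOR rec IN EXECUTE 'EXPLAIN SELECT * FROM ' || tbl LOOP
            est := substring(rec."QUERY PLAN" FROM 'rows=([0-9]+)')::bigint;
            EXIT WHEN est IS NOT NULL;
        END LOOP;
        RETURN est;
    END;
    $$ LANGUAGE plpgsql;

    SELECT approx_count('mytable');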