Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1) - Mailing list pgsql-general

From David Johnston
Subject Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Date
Msg-id 00b001cc1caf$77062e10$65128a30$@yahoo.com
Whole thread Raw
In response to SELECT COUNT(*) execution time on large tables (v9.0.4-1)  ("Carlos Sotto Maior \(SIM\)" <csotto@sistemassim.com.br>)
Responses RES: SELECT COUNT(*) execution time on large tables (v9.0.4-1)  ("Carlos Sotto Maior \(SIM\)" <csotto@sistemassim.com.br>)
List pgsql-general
Counting live data is inherently imprecise.  There are supposedly some
system tables that can give you rough numbers.

You would be better off figuring out an alternative method to get the data
you desire and stop continually recounting all 5.7M records.

A Trigger driven counter, for insert and delete, is probably the most
obvious method.  Also, say for temporal data, cache the prior monthly counts
and only perform an actual count over the current (changing) month(s).

At your table size the brute-force approach is obviously not going to work
so an alternative method needs to be devised, one that eliminates
re-counting previously counted records.  The specific design is going to be
highly dependent on your specific requirements - which is why no generalized
solution exists.  If you provide the why behind the question, and not just
the question, people may be inclined to provide relevant suggestions.

Issuing a "count(*)" is not a need - it is an implementation.  The need is
what you end up doing with that number.

Lastly, the time you spent combing the system catalogs would have been
better spent perusing the FAQ linked to off the PostgreSQL homepage.  You
question, in almost the same words, is in the FAQ with a link to the wiki
which repeats all your observations and explains why the behavior is that
way; and suggests (links to) possible alternatives.  You may wish to go
there now to get more background and ideas.

David J.


> Hi,
>
> My application has a frequent need to issue a select count(*) on tables.
> Some have a large row count. (The example below are from a 5.7 M row;
> Some are larger).
>
> Issuing  either  SELECT COUNT(*)  or SELECT COUNT(<Primary_Key_Colum>)
> yelds a sequential scan on table;
>
> I have browsed catalog tables, digging for a real time Row.count but  so
far
> did not find any.
> QUESTION: Is there a better (faster) way to obtain the row count from a
> table?
>



pgsql-general by date:

Previous
From: preetika tyagi
Date:
Subject: Shared Buffer Size
Next
From: Derrick Rice
Date:
Subject: Re: Shared Buffer Size