Thread: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From
"Carlos Sotto Maior (SIM)"
Date:
Hi,

My application frequently needs to issue a SELECT COUNT(*) on tables. Some
have a large row count. (The examples below are from a 5.7 M row table; some
are larger.)

Issuing either SELECT COUNT(*) or SELECT COUNT(<primary_key_column>) yields
a sequential scan on the table.

I have browsed the catalog tables, digging for a real-time row count, but so
far have not found any.
QUESTION: Is there a better (faster) way to obtain the row count of a
table?


The table has been VACUUM ANALYZEd prior to the EXPLAIN ANALYZE (numbers are
from a test server).
------------------------------------------------------------------------------
explain analyze select count(*) as qtd from ut_mailing_client ;
"Aggregate  (cost=10000231424.23..10000231424.24 rows=1 width=0) (actual
time=7205.009..7205.010 rows=1 loops=1)"
"  ->  Seq Scan on ut_mailing_client  (cost=10000000000.00..10000217054.58
rows=5747858 width=0) (actual time=0.034..3866.765 rows=5747311 loops=1)"
"Total runtime: 7205.071 ms"
------------------------------------------------------------------------------
explain analyze select count(utm_id) as qtd from ut_mailing_client ;
"Aggregate  (cost=10000231424.23..10000231424.24 rows=1 width=4) (actual
time=7984.382..7984.382 rows=1 loops=1)"
"  ->  Seq Scan on ut_mailing_client  (cost=10000000000.00..10000217054.58
rows=5747858 width=4) (actual time=0.028..4114.177 rows=5747311 loops=1)"
"Total runtime: 7984.443 ms"
------------------------------------------------------------------------------
explain analyze select count(beneficio) as qtd from ut_mailing_client ;
"Aggregate  (cost=10000231424.23..10000231424.24 rows=1 width=11) (actual
time=7591.530..7591.530 rows=1 loops=1)"
"  ->  Seq Scan on ut_mailing_client  (cost=10000000000.00..10000217054.58
rows=5747858 width=11) (actual time=0.032..3845.412 rows=5747311 loops=1)"
"Total runtime: 7591.595 ms"

--TABLE STRUCTURE-------------------------------------------------------------
CREATE TABLE ut_mailing_client
(
  utm_id serial NOT NULL,
  beneficio character varying(10) NOT NULL,
.
.
.
  CONSTRAINT ut_mailing_client_pkey PRIMARY KEY (beneficio),
  CONSTRAINT ut_mailing_client_utm_id_key UNIQUE (utm_id)
)
WITH (
  OIDS=FALSE
);

-----VACUUM ANALYZE-----------------------------------------------------------
INFO:  vacuuming "public.ut_mailing_client"
INFO:  index "ut_mailing_client_pkey" now contains 5747311 row versions in
18926 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 6.50s/3.24u sec elapsed 39.03 sec.
INFO:  index "ut_mailing_client_utm_id_key" now contains 5747311 row
versions in 12615 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 4.28s/2.19u sec elapsed 26.05 sec.
INFO:  index "ut_mailing_client_utm_fk_lote_utm_dt_used_idx" now contains
5747311 row versions in 18926 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 6.39s/3.27u sec elapsed 38.90 sec.
INFO:  "ut_mailing_client": found 0 removable, 1179 nonremovable row
versions in 31 out of 159576 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 17.17s/8.71u sec elapsed 104.02 sec.
INFO:  vacuuming "pg_toast.pg_toast_69799"
INFO:  index "pg_toast_69799_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_69799": found 0 removable, 0 nonremovable row versions in 0
out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.ut_mailing_client"
INFO:  "ut_mailing_client": scanned 30000 of 159576 pages, containing
1080857 live rows and 0 dead rows; 30000 rows in sample, 5749295 estimated
total rows
Total query runtime: 111560 ms.

Carlos Sotto Maior
+55 11 8244-7899
csotto@sistemassim.com.br

Sistemas Sim Serviços e Tecnologia Ltda.
+55 11 5041-3086
Rua Tenente Gomes Ribeiro, 78
Vila Clementino (Próximo ao Metro Santa Cruz)
São Paulo - SP
04038-040



Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From
"David Johnston"
Date:
Counting live data is inherently imprecise.  There are supposedly some
system tables that can give you rough numbers.

You would be better off figuring out an alternative method to get the data
you desire and stop continually recounting all 5.7M records.

A trigger-driven counter, firing on insert and delete, is probably the most
obvious method.  Also, for temporal data, cache the prior monthly counts
and only perform an actual count over the current (changing) month(s).
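A minimal sketch of such a trigger-maintained counter (the counter table,
function, and trigger names are illustrative, not from this thread; written
against the poster's ut_mailing_client table):

```sql
-- One counter row per counted table; seeded with a one-time full count.
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n          bigint NOT NULL
);

INSERT INTO row_counts
SELECT 'ut_mailing_client', count(*) FROM ut_mailing_client;

-- Keep the counter in sync on every insert and delete.
CREATE OR REPLACE FUNCTION count_rows() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ut_mailing_client_count
    AFTER INSERT OR DELETE ON ut_mailing_client
    FOR EACH ROW EXECUTE PROCEDURE count_rows();
```

Reading the count then becomes a single-row primary-key lookup
(SELECT n FROM row_counts WHERE table_name = 'ut_mailing_client').  The
trade-off is that every writer now updates the same counter row, which can
become a contention point under heavy concurrent write load.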

At your table size the brute-force approach is obviously not going to work
so an alternative method needs to be devised, one that eliminates
re-counting previously counted records.  The specific design is going to be
highly dependent on your specific requirements - which is why no generalized
solution exists.  If you provide the why behind the question, and not just
the question, people may be inclined to provide relevant suggestions.

Issuing a "count(*)" is not a need - it is an implementation.  The need is
what you end up doing with that number.

Lastly, the time you spent combing the system catalogs would have been
better spent perusing the FAQ linked from the PostgreSQL homepage.  Your
question, in almost the same words, is in the FAQ, with a link to a wiki
page that repeats all your observations, explains why the behavior is that
way, and suggests (links to) possible alternatives.  You may wish to go
there now for more background and ideas.

David J.


> Hi,
>
> My application frequently needs to issue a SELECT COUNT(*) on tables.
> Some have a large row count. (The examples below are from a 5.7 M row
> table; some are larger.)
>
> Issuing either SELECT COUNT(*) or SELECT COUNT(<primary_key_column>)
> yields a sequential scan on the table.
>
> I have browsed the catalog tables, digging for a real-time row count, but
> so far have not found any.
> QUESTION: Is there a better (faster) way to obtain the row count of a
> table?
>



Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From
Greg Smith
Date:
On 05/27/2011 12:41 PM, Carlos Sotto Maior (SIM) wrote:
> I have browsed catalog tables, digging for a real time Row.count but  so far
> did not find any.
>

See http://wiki.postgresql.org/wiki/Slow_Counting which shows where the one
system count estimate lives, as well as linking to alternate approaches.
If you need an exact count and can't afford to run a full query to get one,
some sort of trigger-based approach is likely where you'll need to go.
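For reference, the system estimate that wiki page points at can be read
straight from pg_class (a sketch; reltuples is only as fresh as the last
VACUUM / ANALYZE / autovacuum run on the table):

```sql
-- Planner's estimated row count for the table; approximate, but instant.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.ut_mailing_client'::regclass;
```

Going by the numbers earlier in this thread, the estimate was 5,749,295
against a true count of 5,747,311, i.e. well under 0.1% off.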

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


RES: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From
"Carlos Sotto Maior (SIM)"
Date:
David, Thanks for your reply.

I will probably use the trigger-driven counter strategy combined with the
temporal one: caching totals up through last month and only counting the
current month, since the current month changes rapidly.

I also apologize for not being investigative enough. I did look at the wiki,
but maybe I did not pursue my question there in the proper manner. I will
surely get more acquainted with the wiki, to keep this channel free of
repeated questions.

Once again, thank you very much.

Carlos Sotto Maior
+55 11 8244-7899
csotto@sistemassim.com.br

Sistemas Sim Serviços e Tecnologia Ltda.
+55 11 5041-3086
Rua Tenente Gomes Ribeiro, 78
Vila Clementino (Próximo ao Metro Santa Cruz)
São Paulo - SP
04038-040

-----Original message-----
From: David Johnston [mailto:polobo@yahoo.com]
Sent: Friday, May 27, 2011 17:49
To: 'Carlos Sotto Maior (SIM)'; pgsql-general@postgresql.org
Subject: RE: [GENERAL] SELECT COUNT(*) execution time on large tables
(v9.0.4-1)




Re: RES: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From
salah jubeh
Date:
Please have a look at pg_stat_user_tables; there is a column called n_live_tup. But note that in some cases it may not match the actual row count.
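A quick check of that statistics view (view and column names as in this
reply; the value is maintained by the statistics collector and can lag
behind reality, especially right after bulk changes):

```sql
-- Collector-maintained live-tuple count; approximate, not transactional.
SELECT n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'ut_mailing_client';
```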

Regards


From: Carlos Sotto Maior (SIM) <csotto@sistemassim.com.br>
To: David Johnston <polobo@yahoo.com>; pgsql-general@postgresql.org
Sent: Mon, May 30, 2011 8:35:39 PM
Subject: RES: [GENERAL] SELECT COUNT(*) execution time on large tables (v9.0.4-1)




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general