Thread: Slow "Select count(*) ..." query on table with 60 Mio. rows

Slow "Select count(*) ..." query on table with 60 Mio. rows

From: tom

Hi,

=== Problem ===

I have a db table "data_measurand" with about 60,000,000 (60 million)
rows, and the following query takes about 20-30 seconds (run from psql):

mydb=# select count(*) from data_measurand;
  count
----------
 60846187
(1 row)


=== Question ===

- What can I do to improve performance on the data_measurand table?

=== Background ===

I created an application with Django 1.1 ( http://djangoproject.com ) to
collect, analyze, and visualize measurement data.

=== My System ===

= Postgres Version =
postgres=# select version();
version
---------------------------------------------------------------------
 PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real
(Debian 4.3.2-1.1) 4.3.2
(1 row)

I installed Postgres with apt-get on Debian Lenny without any
configuration changes.

= Debian Lenny Kernel Version =
lenny:~# uname  -a
Linux or.ammonit.com 2.6.26-2-686-bigmem #1 SMP Wed Nov 4 21:12:12 UTC
2009 i686 GNU/Linux

= Hardware =
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
cpu MHz    : 1000.000
cache size : 512 KB
MemTotal   : 8281516 kB (8 GB)

I use software RAID and LVM (Logical Volume Management). The filesystem
is ext3.



=== My Table Definitions ===

mydb=# \d data_measurand
                                  Table "public.data_measurand"
     Column      |          Type          |                          Modifiers
-----------------+------------------------+--------------------------------------------------------------
 id              | integer                | not null default nextval('data_measurand_id_seq'::regclass)
 entry_id        | integer                | not null
 sensor_id       | integer                | not null
 avg_value       | numeric(10,4)          |
 avg_count_value | integer                |
 min_value       | numeric(10,4)          |
 max_value       | numeric(10,4)          |
 sigma_value     | numeric(10,4)          |
 unit            | character varying(20)  | not null
 status          | integer                | not null
 comment         | character varying(255) | not null
Indexes:
    "data_measurand_pkey" PRIMARY KEY, btree (id)
    "data_measurand_entry_id_68e2e3fe" UNIQUE, btree (entry_id, sensor_id)
    "data_measurand_avg_count_value" btree (avg_count_value)
    "data_measurand_avg_value" btree (avg_value)
    "data_measurand_comment" btree (comment)
    "data_measurand_entry_id" btree (entry_id)
    "data_measurand_max_value" btree (max_value)
    "data_measurand_min_value" btree (min_value)
    "data_measurand_sensor_id" btree (sensor_id)
    "data_measurand_sigma_value" btree (sigma_value)
    "data_measurand_status" btree (status)
    "data_measurand_unit" btree (unit)
Foreign-key constraints:
    "entry_id_refs_id_50fa9bdf" FOREIGN KEY (entry_id) REFERENCES data_entry(id) DEFERRABLE INITIALLY DEFERRED
    "sensor_id_refs_id_5ed84c7c" FOREIGN KEY (sensor_id) REFERENCES sensor_sensor(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d data_entry
                                    Table "public.data_entry"
      Column      |           Type           |                        Modifiers
------------------+--------------------------+----------------------------------------------------------
 id               | integer                  | not null default nextval('data_entry_id_seq'::regclass)
 project_id       | integer                  | not null
 logger_id        | integer                  | not null
 original_file_id | integer                  | not null
 datetime         | timestamp with time zone | not null
Indexes:
    "data_entry_pkey" PRIMARY KEY, btree (id)
    "data_entry_logger_id_197f5d41" UNIQUE, btree (logger_id, datetime)
    "data_entry_datetime" btree (datetime)
    "data_entry_logger_id" btree (logger_id)
    "data_entry_original_file_id" btree (original_file_id)
    "data_entry_project_id" btree (project_id)
Foreign-key constraints:
    "logger_id_refs_id_5f73cf46" FOREIGN KEY (logger_id) REFERENCES logger_logger(id) DEFERRABLE INITIALLY DEFERRED
    "original_file_id_refs_id_44e8d3b1" FOREIGN KEY (original_file_id) REFERENCES data_originalfile(id) DEFERRABLE INITIALLY DEFERRED
    "project_id_refs_id_719fb302" FOREIGN KEY (project_id) REFERENCES project_project(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d project_project
                                   Table "public.project_project"
    Column     |          Type          |                           Modifiers
---------------+------------------------+---------------------------------------------------------------
 id            | integer                | not null default nextval('project_project_id_seq'::regclass)
 auth_group_id | integer                | not null
 name          | character varying(200) | not null
 timezone      | character varying(200) |
 longitude     | double precision       |
 latitude      | double precision       |
 altitude      | double precision       |
 comment       | text                   |
Indexes:
    "project_project_pkey" PRIMARY KEY, btree (id)
    "project_project_auth_group_id" btree (auth_group_id)
Foreign-key constraints:
    "auth_group_id_refs_id_267c7fe5" FOREIGN KEY (auth_group_id) REFERENCES auth_group(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d logger_logger
                                      Table "public.logger_logger"
        Column         |           Type           |                         Modifiers
-----------------------+--------------------------+-------------------------------------------------------------
 id                    | integer                  | not null default nextval('logger_logger_id_seq'::regclass)
 auth_group_id         | integer                  | not null
 project_id            | integer                  |
 serial                | character varying(50)    | not null
 type                  | character varying(30)    | not null
 comment               | text                     |
 last_email            | timestamp with time zone |
 last_checked_datetime | timestamp with time zone |
Indexes:
    "logger_logger_pkey" PRIMARY KEY, btree (id)
    "logger_logger_serial_key" UNIQUE, btree (serial)
    "logger_logger_auth_group_id" btree (auth_group_id)
    "logger_logger_last_checked_datetime" btree (last_checked_datetime)
    "logger_logger_project_id" btree (project_id)
Foreign-key constraints:
    "auth_group_id_refs_id_355ed859" FOREIGN KEY (auth_group_id) REFERENCES auth_group(id) DEFERRABLE INITIALLY DEFERRED
    "project_id_refs_id_5f4a56f3" FOREIGN KEY (project_id) REFERENCES project_project(id) DEFERRABLE INITIALLY DEFERRED




I hope that's enough information.

Cheers Tom


Re: Slow "Select count(*) ..." query on table with 60 Mio. rows

From: Matthew Wakeling

On Thu, 14 Jan 2010, tom wrote:
> I have a db table "data_measurand" with about 60,000,000 (60 million)
> rows, and the following query takes about 20-30 seconds (run from psql):
>
> mydb=# select count(*) from data_measurand;
>  count
> ----------
> 60846187
> (1 row)

Sounds pretty reasonable to me. Looking at your table, the rows are maybe
200 bytes wide? That's 12GB of data for Postgres to munch through. 30
seconds is really rather quick for that (400MB/s). What sort of RAID array
is managing to give you that much?
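
(The arithmetic: 60,846,187 rows * ~200 bytes/row is roughly 12 GB, and
12 GB / 30 s is roughly 400 MB/s.)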

> I use a software raid and LVM for Logical Volume Management. Filesystem
> is ext3

Ditch LVM.


This is an FAQ. Counting the rows in a table is an expensive operation in
Postgres. It can't be answered directly from an index. If you want, you
can keep track of the number of rows yourself with triggers, but beware
that this will slow down write access to the table.
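
A quick way to confirm what the planner does: EXPLAIN on a bare count(*)
shows an Aggregate sitting on top of a Seq Scan of the whole table (cost
and row figures elided below, since they depend on your statistics):

mydb=# EXPLAIN SELECT count(*) FROM data_measurand;
                       QUERY PLAN
---------------------------------------------------------
 Aggregate  (cost=...)
   ->  Seq Scan on data_measurand  (cost=...)
(2 rows)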

Matthew

--
 Nog:     Look! They've made me into an ensign!
 O'Brien: I didn't know things were going so badly.
 Nog:     Frightening, isn't it?

Re: Slow "Select count(*) ..." query on table with 60 Mio. rows

From: "A. Kretschmer"

In response to tom:
> Hi,
>
> === Problem ===
>
> I have a db table "data_measurand" with about 60,000,000 (60 million)
> rows, and the following query takes about 20-30 seconds (run from psql):
>
> mydb=# select count(*) from data_measurand;
>   count
> ----------
>  60846187
> (1 row)
>
>
> === Question ===
>
> - What can I do to improve performance on the data_measurand table?

Short answer: nothing.

Long answer: PG has to check the visibility of each record, so it
forces a sequential scan.

But you can get an estimate: ask pg_class (a system catalog); its
reltuples column contains an estimated row count.
http://www.postgresql.org/docs/current/static/catalog-pg-class.html
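
For example (a quick sketch; the estimate is only as fresh as the last
VACUUM or ANALYZE on the table):

mydb=# SELECT reltuples::bigint AS estimated_rows
mydb-#   FROM pg_class
mydb-#  WHERE relname = 'data_measurand';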

If you really need the exact row count, you can create a TRIGGER and
maintain the count on every INSERT and DELETE; a sketch follows below.
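
A minimal, untested sketch (the counter table and function names are my
own examples; note that the single-row counter table will serialize
concurrent writers, and row triggers will not see a TRUNCATE on 8.3):

-- on 8.3 you may first need: CREATE LANGUAGE plpgsql;

-- one-row helper table holding the running count
CREATE TABLE data_measurand_rowcount (n bigint NOT NULL);
INSERT INTO data_measurand_rowcount
    SELECT count(*) FROM data_measurand;

-- trigger function: bump the counter up or down per affected row
CREATE FUNCTION data_measurand_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE data_measurand_rowcount SET n = n + 1;
        RETURN NEW;
    ELSE  -- TG_OP = 'DELETE'
        UPDATE data_measurand_rowcount SET n = n - 1;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_measurand_count_trig
    AFTER INSERT OR DELETE ON data_measurand
    FOR EACH ROW EXECUTE PROCEDURE data_measurand_count();

Reading the count is then a single-row lookup:

mydb=# SELECT n FROM data_measurand_rowcount;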


Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99