Select count(*) on a 2B Rows Tables Takes ~20 Hours - Mailing list pgsql-performance

From Fd Habash
Subject Select count(*) on a 2B Rows Tables Takes ~20 Hours
Date
Msg-id 5b9a9f81.1c69fb81.43388.0e8b@mx.google.com
Whole thread Raw
Responses Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance

Based on my research in the forums and Google , it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC controls imposed upon the query leading a table scan. Also, the elapsed time increase linearly with table size.

 

However, I do not know if elapsed time I’m getting is to be expected.

 

Table reltuples in pg_class = 2,266,649,344 (pretty close)

Query = select count(*) from jim.sttyations ;

Elapsed time (ET) = 18.5 hrs

 

This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g). CPU usage during count run hovers around 20% with 20g of freeable memory.

 

Is this ET expected? If not, what could be slowing it down? I’m currently running explain analyze and I’ll share the final output when done.

 

I’m familiar with the ideas listed here https://www.citusdata.com/blog/2016/10/12/count-performance/

 

Table "jim.sttyations"

      Column       |           Type           |         Modifiers          | Storage  | Stats target | Description

-------------------+--------------------------+----------------------------+----------+--------------+-------------

stty_id            | bigint                   | not null                   | plain    |              |

 stty_hitlist_line  | text                     | not null                   | extended |              |

 stty_status        | text                     | not null default 'Y'::text | extended |              |

 stty_status_date   | timestamp with time zone | not null                   | plain    |              |

 vs_number         | integer                  | not null                   | plain    |              |

 stty_date_created  | timestamp with time zone | not null                   | plain    |              |

 stty_stty_id        | bigint                   |                            | plain    |              |

 stty_position      | bigint                   |                            | plain    |              |

 mstty_id           | bigint                   |                            | plain    |              |

 vsr_number        | integer                  |                            | plain    |              |

 stty_date_modified | timestamp with time zone |                            | plain    |              |

 stty_stored        | text                     | not null default 'N'::text | extended |              |

 stty_sequence      | text                     |                            | extended |              |

 stty_hash          | text                     |                            | extended |              |

Indexes:

    "stty_pk" PRIMARY KEY, btree (stty_id)

    "stty_indx_fk01" btree (stty_stty_id)

    "stty_indx_fk03" btree (vsr_number)

    "stty_indx_fk04" btree (vs_number)

    "stty_indx_pr01" btree (mstty_id, stty_id)

Check constraints:

    "stty_cnst_ck01" CHECK (stty_status = ANY (ARRAY['Y'::text, 'N'::text]))

    "stty_cnst_ck02" CHECK (stty_stored = ANY (ARRAY['N'::text, 'Y'::text]))

Foreign-key constraints:

    "stty_cnst_fk01" FOREIGN KEY (stty_stty_id) REFERENCES sttyations(stty_id) NOT VALID

    "stty_cnst_fk02" FOREIGN KEY (mstty_id) REFERENCES master_sttyations(mstty_id)

    "stty_cnst_fk03" FOREIGN KEY (vsr_number) REFERENCES valid_status_reasons(vsr_number)

 

----------------
Thank you

 

 

refpep-> select count(*) from jim.sttyations;

                                                    QUERY PLAN                                                   

------------------------------------------------------------------------------------------------------------------

Aggregate  (cost=73451291.77..73451291.78 rows=1 width=8)

   Output: count(*)

   ->  Index Only Scan using stty_indx_fk03 on jim.sttyations  (cost=0.58..67784668.41 rows=2266649344 width=0)

         Output: vsr_number

(4 rows)

 

pgsql-performance by date:

Previous
From: padusuma
Date:
Subject: Re: Performance of INSERT into temporary tables using psqlODBCdriver
Next
From: Justin Pryzby
Date:
Subject: Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours