Thread: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Based on my research in the forums and on Google, it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC visibility checks imposed on the query, which force a full table scan. Also, the elapsed time increases linearly with table size.
However, I do not know whether the elapsed time I’m getting is to be expected.
Table reltuples in pg_class = 2,266,649,344 (pretty close)
Query = select count(*) from jim.sttyations ;
Elapsed time (ET) = 18.5 hrs
This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61 GB RAM). CPU usage during the count hovers around 20%, with 20 GB of freeable memory.
Is this ET expected? If not, what could be slowing it down? I’m currently running explain analyze and I’ll share the final output when done.
I’m familiar with the ideas listed here https://www.citusdata.com/blog/2016/10/12/count-performance/
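The Citus post linked above recommends approximate counting when an exact figure isn't required. As a minimal sketch (reusing the table name from this thread), the planner's own estimate can be read straight from pg_class:

```sql
-- Approximate row count from the planner's statistics. This is refreshed
-- by ANALYZE/VACUUM, so it can lag the true count, but it returns instantly.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'jim.sttyations'::regclass;
```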
Table "jim.sttyations"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+--------------------------+----------------------------+----------+--------------+-------------
stty_id | bigint | not null | plain | |
stty_hitlist_line | text | not null | extended | |
stty_status | text | not null default 'Y'::text | extended | |
stty_status_date | timestamp with time zone | not null | plain | |
vs_number | integer | not null | plain | |
stty_date_created | timestamp with time zone | not null | plain | |
stty_stty_id | bigint | | plain | |
stty_position | bigint | | plain | |
mstty_id | bigint | | plain | |
vsr_number | integer | | plain | |
stty_date_modified | timestamp with time zone | | plain | |
stty_stored | text | not null default 'N'::text | extended | |
stty_sequence | text | | extended | |
stty_hash | text | | extended | |
Indexes:
"stty_pk" PRIMARY KEY, btree (stty_id)
"stty_indx_fk01" btree (stty_stty_id)
"stty_indx_fk03" btree (vsr_number)
"stty_indx_fk04" btree (vs_number)
"stty_indx_pr01" btree (mstty_id, stty_id)
Check constraints:
"stty_cnst_ck01" CHECK (stty_status = ANY (ARRAY['Y'::text, 'N'::text]))
"stty_cnst_ck02" CHECK (stty_stored = ANY (ARRAY['N'::text, 'Y'::text]))
Foreign-key constraints:
"stty_cnst_fk01" FOREIGN KEY (stty_stty_id) REFERENCES sttyations(stty_id) NOT VALID
"stty_cnst_fk02" FOREIGN KEY (mstty_id) REFERENCES master_sttyations(mstty_id)
"stty_cnst_fk03" FOREIGN KEY (vsr_number) REFERENCES valid_status_reasons(vsr_number)
----------------
Thank you
refpep-> select count(*) from jim.sttyations;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=73451291.77..73451291.78 rows=1 width=8)
Output: count(*)
-> Index Only Scan using stty_indx_fk03 on jim.sttyations (cost=0.58..67784668.41 rows=2266649344 width=0)
Output: vsr_number
(4 rows)
On Thu, Sep 13, 2018 at 01:33:54PM -0400, Fd Habash wrote:
> Is this ET expected? If not, what could be slowing it down? I’m currently
> running explain analyze and I’ll share the final output when done.

explain(analyze,BUFFERS) is what's probably interesting.

You're getting an index-only scan, but maybe still making many accesses to
the heap (table) for pages which aren't all-visible. You can maybe improve
by vacuuming (perhaps by daily cronjob, or by ALTER TABLE ... SET autovacuum
threshold or scale factor).

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581
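Justin's suggestion about all-visible pages can be checked directly: pg_class tracks how many pages the visibility map covers, and per-table autovacuum settings can be tightened so the map stays current. A sketch, with an illustrative scale factor rather than a recommendation:

```sql
-- How much of the table is marked all-visible? An index-only scan must
-- fetch the heap for every tuple on a page that isn't.
SELECT relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE oid = 'jim.sttyations'::regclass;

-- Vacuum to rebuild the visibility map, then make autovacuum fire more
-- often on this table (0.01 here is illustrative only).
VACUUM (VERBOSE) jim.sttyations;
ALTER TABLE jim.sttyations SET (autovacuum_vacuum_scale_factor = 0.01);
```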
Fd Habash <fmhabash@gmail.com> writes:
> Based on my research in the forums and Google, it is described in multiple
> places that ‘select count(*)’ is expected to be slow in Postgres because
> of the MVCC controls imposed upon the query leading to a table scan. Also,
> the elapsed time increases linearly with table size.
> However, I do not know if the elapsed time I’m getting is to be expected.
> Table reltuples in pg_class = 2,266,649,344 (pretty close)
> Query = select count(*) from jim.sttyations ;
> Elapsed time (ET) = 18.5 hrs

That's pretty awful.  My recollection is that in recent PG releases,
SELECT COUNT(*) runs at something on the order of 100ns/row given an
all-in-memory table.  Evidently you're rather badly I/O bound.

> This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).

Don't know much about Aurora, but I wonder whether you paid for
guaranteed (provisioned) IOPS, and if so what service level.

> refpep-> select count(*) from jim.sttyations;
>                                                     QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=73451291.77..73451291.78 rows=1 width=8)
>    Output: count(*)
>    ->  Index Only Scan using stty_indx_fk03 on jim.sttyations  (cost=0.58..67784668.41 rows=2266649344 width=0)
>          Output: vsr_number
> (4 rows)

Oh, hmm ... the 100ns figure I mentioned was for a seqscan.  IOS
could be a lot worse for a number of reasons, foremost being that
if the table isn't mostly all-visible then it'd involve a lot of
random heap access.  It might be interesting to try forcing a
seqscan plan (see enable_indexscan).

			regards, tom lane
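Tom's suggestion of forcing a seqscan can be tried per-session by switching off the relevant planner knobs (enable_indexonlyscan is the specific one for this plan, alongside enable_indexscan). A sketch, using SET LOCAL so the settings revert automatically:

```sql
-- Disable index-based plans for this transaction only, then re-run the
-- count to compare against the index-only scan.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM jim.sttyations;
ROLLBACK;  -- the SET LOCAL settings end with the transaction
```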
Just checked metrics while the count was running …
Read latency < 3.5 ms
Write latency < 4 ms
Read throughput ~ 40 MB/sec with sporadic peaks at 100
Read IOPS ~ 5000
QDepth < 3
----------------
Thank you
From: Tom Lane
Sent: Thursday, September 13, 2018 2:12 PM
To: Fd Habash
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Fd Habash <fmhabash@gmail.com> writes:
> Based on my research in the forums and Google , it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC controls imposed upon the query leading a table scan. Also, the elapsed time increase linearly with table size.
> However, I do not know if elapsed time I’m getting is to be expected.
> Table reltuples in pg_class = 2,266,649,344 (pretty close)
> Query = select count(*) from jim.sttyations ;
> Elapsed time (ET) = 18.5 hrs
That's pretty awful. My recollection is that in recent PG releases,
SELECT COUNT(*) runs at something on the order of 100ns/row given an
all-in-memory table. Evidently you're rather badly I/O bound.
> This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).
Don't know much about Aurora, but I wonder whether you paid for
guaranteed (provisioned) IOPS, and if so what service level.
> refpep-> select count(*) from jim.sttyations;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=73451291.77..73451291.78 rows=1 width=8)
> Output: count(*)
> -> Index Only Scan using stty_indx_fk03 on jim.sttyations (cost=0.58..67784668.41 rows=2266649344 width=0)
> Output: vsr_number
> (4 rows)
Oh, hmm ... the 100ns figure I mentioned was for a seqscan. IOS
could be a lot worse for a number of reasons, foremost being that
if the table isn't mostly all-visible then it'd involve a lot of
random heap access. It might be interesting to try forcing a
seqscan plan (see enable_indexscan).
regards, tom lane
Hi,

On 2018-09-13 14:12:02 -0400, Tom Lane wrote:
> > This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).
>
> Don't know much about Aurora, but I wonder whether you paid for
> guaranteed (provisioned) IOPS, and if so what service level.

Given that Aurora uses direct I/O and has largely replaced the storage
layer, I'm not sure how much we can help here. My understanding is that
access to blocks can require page-level "log reconciliation", which can
cause adverse I/O patterns. The direct I/O means that cache configuration
and prefetching are much more crucial.

If a lot of those tuples aren't frozen (don't quite know how that works
there), the clog accesses will also kill you if the table was filled over
many transactions, since clog's access characteristics over a lot of xids
are pretty bad with direct I/O.

Greetings,

Andres Freund
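Andres' point about unfrozen tuples is checkable: pg_class records the table's oldest unfrozen transaction id, and freezing can be forced. A sketch (a heavyweight operation on a 2-billion-row table):

```sql
-- Age of the oldest unfrozen xid in the table; a large age means many
-- tuples still carry xids whose commit status must be looked up in clog.
SELECT relfrozenxid, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE oid = 'jim.sttyations'::regclass;

-- Freeze all tuples now, so future scans skip the clog lookups.
VACUUM (FREEZE, VERBOSE) jim.sttyations;
```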
Fred Habash wrote:
> If I'm reading this correctly, it took 57M ms out of an elapsed time of
> 61M ms to read 45M pages from the filesystem?
> If the average service time per read is < 5 ms, is this a case of a
> bloated index where re-indexing is warranted?
>
> explain (analyze,buffers,timing,verbose,costs)
> select count(*) from jim.pitations ;
>                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=72893810.73..72893810.74 rows=1 width=8) (actual time=61141110.437..61141110.437 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=72620045 read=45297330
>    I/O Timings: read=57489958.088
>    ->  Index Only Scan using pit_indx_fk03 on jim.pitations  (cost=0.58..67227187.37 rows=2266649344 width=0) (actual time=42.327..60950272.189 rows=2269623575 loops=1)
>          Output: vsr_number
>          Heap Fetches: 499950392
>          Buffers: shared hit=72620045 read=45297330
>          I/O Timings: read=57489958.088
>  Planning time: 14.014 ms
>  Execution time: 61,141,110.516 ms
> (11 rows)

2269623575 / (45297330 + 72620045) ~ 20, so you have an average of 20 items
per block. That is few, and the index indeed seems bloated.

Looking at the read times, you average out at about 1 ms per block read
from I/O, but with that many blocks that's of course still a long time.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
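If the index really is bloated, it can be rebuilt without blocking writes. A sketch of the create-and-swap pattern (the indexed column, vsr_number, is inferred from the plan's Output line; REINDEX CONCURRENTLY only exists in PostgreSQL 12+, which postdates this thread):

```sql
-- Current size of the index under suspicion.
SELECT pg_size_pretty(pg_relation_size('jim.pit_indx_fk03'));

-- Build a fresh, compact index alongside the bloated one, then swap:
CREATE INDEX CONCURRENTLY pit_indx_fk03_new ON jim.pitations (vsr_number);
DROP INDEX jim.pit_indx_fk03;
ALTER INDEX jim.pit_indx_fk03_new RENAME TO pit_indx_fk03;
```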
Aside from I/O going to a different kind of storage, I don't think anything
Aurora-specific should be at play here.

Would the 118 million buffer accesses (hits + reads) only include the index
scan, or would that number also reflect buffers accessed for the 500 million
heap fetches?

While Aurora doesn't have a filesystem cache (since it's a different kind of
storage), it does default the buffer cache to 75% of memory to offset this.
It appears that, as Laurenz has pointed out, this is simply a lot of I/O
requests in a serial process.

BTW, that's 900 GB of data that was read (118 million buffers of 8 kB each)
on a box with only 61 GB of memory available for caching.

-Jeremy

Sent from my TI-83

> On Sep 17, 2018, at 12:04 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Fred Habash wrote:
>> If I'm reading this correctly, it took 57M ms out of an elapsed time of
>> 61M ms to read 45M pages from the filesystem?
>> If the average service time per read is < 5 ms, is this a case of a
>> bloated index where re-indexing is warranted?
>> [...]
>
> 2269623575 / (45297330 + 72620045) ~ 20, so you have an average of 20
> items per block. That is few, and the index indeed seems bloated.
>
> Looking at the read times, you average out at about 1 ms per block read
> from I/O, but with that many blocks that's of course still a long time.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
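Jeremy's 900 GB figure follows directly from the buffer counts in the plan; the arithmetic can be checked with a one-liner:

```sql
-- Total bytes touched during the scan: all buffers (hits + reads) times
-- the 8 kB page size.
SELECT pg_size_pretty((72620045 + 45297330)::bigint * 8192);  -- about 900 GB
```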