Thread: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Based on my research in the forums and on Google, it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC visibility checks imposed on the query, which force a full table scan. Also, the elapsed time increases linearly with table size.
However, I do not know whether the elapsed time I’m getting is to be expected.
Table reltuples in pg_class = 2,266,649,344 (pretty close)
Query = select count(*) from jim.sttyations ;
Elapsed time (ET) = 18.5 hrs
This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61 GB RAM). CPU usage during the count hovers around 20%, with 20 GB of freeable memory.
Is this ET expected? If not, what could be slowing it down? I’m currently running explain analyze and I’ll share the final output when done.
I’m familiar with the ideas listed here https://www.citusdata.com/blog/2016/10/12/count-performance/
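The Citus post linked above recommends approximate counting when an exact figure isn't required. As a minimal sketch (reusing the table name from this thread), the planner's own estimate can be read straight from pg_class:

```sql
-- Approximate row count from the planner's statistics. This is refreshed
-- by ANALYZE/VACUUM, so it can lag the true count, but it returns instantly.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'jim.sttyations'::regclass;
```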
Table "jim.sttyations"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+--------------------------+----------------------------+----------+--------------+-------------
stty_id | bigint | not null | plain | |
stty_hitlist_line | text | not null | extended | |
stty_status | text | not null default 'Y'::text | extended | |
stty_status_date | timestamp with time zone | not null | plain | |
vs_number | integer | not null | plain | |
stty_date_created | timestamp with time zone | not null | plain | |
stty_stty_id | bigint | | plain | |
stty_position | bigint | | plain | |
mstty_id | bigint | | plain | |
vsr_number | integer | | plain | |
stty_date_modified | timestamp with time zone | | plain | |
stty_stored | text | not null default 'N'::text | extended | |
stty_sequence | text | | extended | |
stty_hash | text | | extended | |
Indexes:
"stty_pk" PRIMARY KEY, btree (stty_id)
"stty_indx_fk01" btree (stty_stty_id)
"stty_indx_fk03" btree (vsr_number)
"stty_indx_fk04" btree (vs_number)
"stty_indx_pr01" btree (mstty_id, stty_id)
Check constraints:
"stty_cnst_ck01" CHECK (stty_status = ANY (ARRAY['Y'::text, 'N'::text]))
"stty_cnst_ck02" CHECK (stty_stored = ANY (ARRAY['N'::text, 'Y'::text]))
Foreign-key constraints:
"stty_cnst_fk01" FOREIGN KEY (stty_stty_id) REFERENCES sttyations(stty_id) NOT VALID
"stty_cnst_fk02" FOREIGN KEY (mstty_id) REFERENCES master_sttyations(mstty_id)
"stty_cnst_fk03" FOREIGN KEY (vsr_number) REFERENCES valid_status_reasons(vsr_number)
----------------
Thank you
refpep-> select count(*) from jim.sttyations;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=73451291.77..73451291.78 rows=1 width=8)
Output: count(*)
-> Index Only Scan using stty_indx_fk03 on jim.sttyations (cost=0.58..67784668.41 rows=2266649344 width=0)
Output: vsr_number
(4 rows)
On Thu, Sep 13, 2018 at 01:33:54PM -0400, Fd Habash wrote:
> Is this ET expected? If not, what could be slowing it down? I’m currently
> running explain analyze and I’ll share the final output when done.

explain(analyze,BUFFERS) is what's probably interesting.

You're getting an index-only scan, but maybe still making many accesses to
the heap (table) for pages which aren't all-visible. You can maybe improve
by vacuuming (perhaps by daily cronjob, or by ALTER TABLE ... SET autovacuum
threshold or scale factor).

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581
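Justin's suggestion about all-visible pages can be checked directly: pg_class tracks how many pages the visibility map covers, and per-table autovacuum settings can be tightened so the map stays current. A sketch, with an illustrative scale factor rather than a recommendation:

```sql
-- How much of the table is marked all-visible? An index-only scan must
-- fetch the heap for every tuple on a page that isn't.
SELECT relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE oid = 'jim.sttyations'::regclass;

-- Vacuum to rebuild the visibility map, then make autovacuum fire more
-- often on this table (0.01 here is illustrative only).
VACUUM (VERBOSE) jim.sttyations;
ALTER TABLE jim.sttyations SET (autovacuum_vacuum_scale_factor = 0.01);
```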
Fd Habash <fmhabash@gmail.com> writes:
> Based on my research in the forums and Google, it is described in multiple
> places that ‘select count(*)’ is expected to be slow in Postgres because
> of the MVCC controls imposed upon the query leading to a table scan. Also,
> the elapsed time increases linearly with table size.
> However, I do not know if the elapsed time I’m getting is to be expected.
> Table reltuples in pg_class = 2,266,649,344 (pretty close)
> Query = select count(*) from jim.sttyations ;
> Elapsed time (ET) = 18.5 hrs

That's pretty awful.  My recollection is that in recent PG releases,
SELECT COUNT(*) runs at something on the order of 100ns/row given an
all-in-memory table.  Evidently you're rather badly I/O bound.

> This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).

Don't know much about Aurora, but I wonder whether you paid for
guaranteed (provisioned) IOPS, and if so what service level.

> refpep-> select count(*) from jim.sttyations;
>                                                     QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=73451291.77..73451291.78 rows=1 width=8)
>    Output: count(*)
>    ->  Index Only Scan using stty_indx_fk03 on jim.sttyations  (cost=0.58..67784668.41 rows=2266649344 width=0)
>          Output: vsr_number
> (4 rows)

Oh, hmm ... the 100ns figure I mentioned was for a seqscan.  IOS
could be a lot worse for a number of reasons, foremost being that
if the table isn't mostly all-visible then it'd involve a lot of
random heap access.  It might be interesting to try forcing a
seqscan plan (see enable_indexscan).

			regards, tom lane
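Tom's suggestion of forcing a seqscan can be tried per-session by switching off the relevant planner knobs (enable_indexonlyscan is the specific one for this plan, alongside enable_indexscan). A sketch, using SET LOCAL so the settings revert automatically:

```sql
-- Disable index-based plans for this transaction only, then re-run the
-- count to compare against the index-only scan.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM jim.sttyations;
ROLLBACK;  -- the SET LOCAL settings end with the transaction
```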
Just checked metrics while the count was running …
Read latency < 3.5 ms
Write latency < 4 ms
Read throughput ~ 40 MB/sec with sporadic peaks at 100
Read IOPS ~ 5000
QDepth < 3
----------------
Thank you
From: Tom Lane
Sent: Thursday, September 13, 2018 2:12 PM
To: Fd Habash
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Fd Habash <fmhabash@gmail.com> writes:
> Based on my research in the forums and Google , it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC controls imposed upon the query leading a table scan. Also, the elapsed time increase linearly with table size.
> However, I do not know if elapsed time I’m getting is to be expected.
> Table reltuples in pg_class = 2,266,649,344 (pretty close)
> Query = select count(*) from jim.sttyations ;
> Elapsed time (ET) = 18.5 hrs
That's pretty awful. My recollection is that in recent PG releases,
SELECT COUNT(*) runs at something on the order of 100ns/row given an
all-in-memory table. Evidently you're rather badly I/O bound.
> This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).
Don't know much about Aurora, but I wonder whether you paid for
guaranteed (provisioned) IOPS, and if so what service level.
> refpep-> select count(*) from jim.sttyations;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=73451291.77..73451291.78 rows=1 width=8)
> Output: count(*)
> -> Index Only Scan using stty_indx_fk03 on jim.sttyations (cost=0.58..67784668.41 rows=2266649344 width=0)
> Output: vsr_number
> (4 rows)
Oh, hmm ... the 100ns figure I mentioned was for a seqscan. IOS
could be a lot worse for a number of reasons, foremost being that
if the table isn't mostly all-visible then it'd involve a lot of
random heap access. It might be interesting to try forcing a
seqscan plan (see enable_indexscan).
regards, tom lane
Hi,

On 2018-09-13 14:12:02 -0400, Tom Lane wrote:
> > This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).
>
> Don't know much about Aurora, but I wonder whether you paid for
> guaranteed (provisioned) IOPS, and if so what service level.

Given that Aurora uses direct I/O and has largely replaced the storage
layer, I'm not sure how much we can help here. My understanding is that
access to blocks can require page-level "log reconciliation", which can
cause adverse I/O patterns. The direct I/O means that cache configuration
and prefetching are much more crucial.

If a lot of those tuples aren't frozen (don't quite know how that works
there), the clog accesses will also kill you if the table was filled over
many transactions, since clog's access characteristics over a lot of xids
are pretty bad with direct I/O.

Greetings,

Andres Freund
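Andres' point about unfrozen tuples is checkable: pg_class records the table's oldest unfrozen transaction id, and freezing can be forced. A sketch (a heavyweight operation on a 2-billion-row table):

```sql
-- Age of the oldest unfrozen xid in the table; a large age means many
-- tuples still carry xids whose commit status must be looked up in clog.
SELECT relfrozenxid, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE oid = 'jim.sttyations'::regclass;

-- Freeze all tuples now, so future scans skip the clog lookups.
VACUUM (FREEZE, VERBOSE) jim.sttyations;
```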
Fred Habash wrote:
> If I'm reading this correctly, it took 57M ms out of an elapsed time of
> 61M ms to read 45M pages from the filesystem?
> If the average service time per read is < 5 ms, is this a case of a
> bloated index where re-indexing is warranted?
>
> explain (analyze,buffers,timing,verbose,costs)
> select count(*) from jim.pitations ;
>                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=72893810.73..72893810.74 rows=1 width=8) (actual time=61141110.437..61141110.437 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=72620045 read=45297330
>    I/O Timings: read=57489958.088
>    ->  Index Only Scan using pit_indx_fk03 on jim.pitations  (cost=0.58..67227187.37 rows=2266649344 width=0) (actual time=42.327..60950272.189 rows=2269623575 loops=1)
>          Output: vsr_number
>          Heap Fetches: 499950392
>          Buffers: shared hit=72620045 read=45297330
>          I/O Timings: read=57489958.088
>  Planning time: 14.014 ms
>  Execution time: 61,141,110.516 ms
> (11 rows)

2269623575 / (45297330 + 72620045) ~ 20, so you have an average of 20 items
per block. That is few, and the index indeed seems bloated.

Looking at the read times, you average out at about 1 ms per block read
from I/O, but with that many blocks that's of course still a long time.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
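If the index really is bloated, it can be rebuilt without blocking writes. A sketch of the create-and-swap pattern (the indexed column, vsr_number, is inferred from the plan's Output line; REINDEX CONCURRENTLY only exists in PostgreSQL 12+, which postdates this thread):

```sql
-- Current size of the index under suspicion.
SELECT pg_size_pretty(pg_relation_size('jim.pit_indx_fk03'));

-- Build a fresh, compact index alongside the bloated one, then swap:
CREATE INDEX CONCURRENTLY pit_indx_fk03_new ON jim.pitations (vsr_number);
DROP INDEX jim.pit_indx_fk03;
ALTER INDEX jim.pit_indx_fk03_new RENAME TO pit_indx_fk03;
```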
Aside from I/O going to a different kind of storage, I don't think anything
Aurora-specific should be at play here.

Would the 118 million buffer accesses (hits + reads) only include the index
scan, or would that number also reflect buffers accessed for the 500 million
heap fetches?

While Aurora doesn't have a filesystem cache (since it's a different kind of
storage), it does default the buffer cache to 75% of memory to offset this.
It appears that, as Laurenz has pointed out, this is simply a lot of I/O
requests in a serial process.

BTW, that's 900 GB of data that was read (118 million buffers of 8 kB each)
on a box with only 61 GB of memory available for caching.

-Jeremy

Sent from my TI-83

> On Sep 17, 2018, at 12:04 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Fred Habash wrote:
>> If I'm reading this correctly, it took 57M ms out of an elapsed time of
>> 61M ms to read 45M pages from the filesystem?
>> If the average service time per read is < 5 ms, is this a case of a
>> bloated index where re-indexing is warranted?
>> [...]
>
> 2269623575 / (45297330 + 72620045) ~ 20, so you have an average of 20
> items per block. That is few, and the index indeed seems bloated.
>
> Looking at the read times, you average out at about 1 ms per block read
> from I/O, but with that many blocks that's of course still a long time.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
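Jeremy's 900 GB figure follows directly from the buffer counts in the plan; the arithmetic can be checked with a one-liner:

```sql
-- Total bytes touched during the scan: all buffers (hits + reads) times
-- the 8 kB page size.
SELECT pg_size_pretty((72620045 + 45297330)::bigint * 8192);  -- about 900 GB
```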