Thread: Thick indexes - a look at count(1) query

Thick indexes - a look at count(1) query

From

"Gokulakannan Somasundaram"

Date:

16 January 2008, 10:17:41

Hi,<br />   I have submitted a new patch against thick indexes(indexes with snapshot)<br /><a
href="http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php">http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php.
</a><br/><br />I did look closely at improving the performance of count(1) queries. It worked well, when we are
selectinga sub-set of the result-set. But when i did a query like "select count(1) from table", it showed a improvement
inresponse time, but not to the extent, i wanted it to be. Let's have a look at the stats. <br /><br />gokul=# explain
analyzeselect count(1) from dd;<br />LOG:  EXECUTOR STATISTICS<br />DETAIL:  ! system usage stats:<br />        !      
0.277208elapsed 0.275457 user 0.000148 system sec<br />        !       [1.128422 user 0.004976 sys total]<br />       
!      0/0 [0/0] filesystem blocks in/out<br />        !       0/0 [0/0] page faults/reclaims, 0 [0] swaps<br />       
!      0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent<br />        !       0/39 [5/160] voluntary/involuntary
contextswitches <br />        ! buffer usage stats:<br />        !       Shared blocks:       1024 Logical
Reads,         0 Physical Reads,          0 written, buffer hit rate = 100.00%<br />        !       Local 
blocks:         0 read,          0 written, buffer hit rate = 0.00%<br />        !       Direct blocks:          0
read,         0 written<br />STATEMENT:  explain analyze select count(1) from dd;<br
/>                                                           QUERY
PLAN                                                           <br
/>----------------------------------------------------------------------------------------------------------------------------------<br
/> Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual time=276.838..276.838 rows=1 loops=1) <br />   ->  Index
OnlyScan using idx on dd  (cost=0.00..6741.42 rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)<br
/> Totalruntime: 276.928 ms<br />(3 rows)<br /><br />gokul=# set enable_indexscan=off;<br /> SET<br /><br />gokul=#
explainanalyze select count(1) from dd;<br />LOG:  EXECUTOR STATISTICS<br />DETAIL:  ! system usage stats:<br />       
!      0.331441 elapsed 0.258903 user 0.067953 system sec<br />        !       [1.906069 user 0.211479 sys total]<br
/>       !       0/0 [0/0] filesystem blocks in/out<br />        !       0/0 [0/0] page faults/reclaims, 0 [0] swaps<br
/>       !       0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent<br />        !       0/39 [9/312]
voluntary/involuntarycontext switches <br />        ! buffer usage stats:<br />        !       Shared blocks:      
5223Logical Reads,       4391 Physical Reads,          0 written, buffer hit rate = 15.93%<br />        !       Local 
blocks:         0 read,          0 written, buffer hit rate = 0.00%<br />        !       Direct blocks:          0
read,         0 written<br />STATEMENT:  explain analyze select count(1) from dd;<br
/>                                                  QUERY PLAN                                                    <br
/>-----------------------------------------------------------------------------------------------------------------<br
/> Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual time=331.075..331.076 rows=1 loops=1)<br />    ->  Seq
Scanon dd  (cost=0.00..7549.71 rows=232671 width=0) (actual time=0.042..203.958 rows=232679 loops=1)<br /> Total
runtime:331.167 ms<br />(3 rows)<br /><br />I have run the query multiple times and it shows the response time, around
whatis shown here <br />The table is just a multiple copy of pg_class table( approx 200000 rows). As it can be seen,
theLogical reads show a ratio of 1:5, but the  response time is not in the same ratio. I tried to profile and couldn't
findanything significant. Eventhough it shows 4391 physical reads, that's from OS cache,  since i ave already run the
querymultiple times. <br /><br />One more disadvantage with using select count(1) using index scan is that, it pollutes
theshared memory, unlike full-table scans. But something can be done in the regard.<br /><br /><br /><br />Any
thoughts?<br/><br />Thanks, <br />Gokul.<br />

Re: Thick indexes - a look at count(1) query

From

Decibel!

Date:

17 January 2008, 15:58:08

On Jan 16, 2008, at 8:07 AM, Gokulakannan Somasundaram wrote:
> gokul=# explain analyze select count(1) from dd;
> LOG:  EXECUTOR STATISTICS
> DETAIL:  ! system usage stats:
>         !       0.277208 elapsed 0.275457 user 0.000148 system sec
>         !       [1.128422 user 0.004976 sys total]
>         !       0/0 [0/0] filesystem blocks in/out
>         !       0/0 [0/0] page faults/reclaims, 0 [0] swaps
>         !       0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
>         !       0/39 [5/160] voluntary/involuntary context switches
>         ! buffer usage stats:
>         !       Shared blocks:       1024 Logical Reads,          0  
> Physical Reads,          0 written, buffer hit rate = 100.00%
>         !       Local  blocks:          0 read,          0 written,  
> buffer hit rate = 0.00%
>         !       Direct blocks:          0 read,          0 written
> STATEMENT:  explain analyze select count(1) from dd;
>                                                             QUERY PLAN
> ---------------------------------------------------------------------- 
> ------------------------------------------------------------
>  Aggregate  (cost=7323.10..7323.11 rows=1 width=0) (actual  
> time=276.838..276.838 rows=1 loops=1)
>    ->  Index Only Scan using idx on dd  (cost=0.00..6741.42  
> rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)
>  Total runtime: 276.928 ms
> (3 rows)
>
> gokul=# set enable_indexscan=off;
> SET
>
> gokul=# explain analyze select count(1) from dd;
> LOG:  EXECUTOR STATISTICS
> DETAIL:  ! system usage stats:
>         !       0.331441 elapsed 0.258903 user 0.067953 system sec
>         !       [1.906069 user 0.211479 sys total]
>         !       0/0 [0/0] filesystem blocks in/out
>         !       0/0 [0/0] page faults/reclaims, 0 [0] swaps
>         !       0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
>         !       0/39 [9/312] voluntary/involuntary context switches
>         ! buffer usage stats:
>         !       Shared blocks:       5223 Logical Reads,       4391  
> Physical Reads,          0 written, buffer hit rate = 15.93%
>         !       Local  blocks:          0 read,          0 written,  
> buffer hit rate = 0.00%
>         !       Direct blocks:          0 read,          0 written
> STATEMENT:  explain analyze select count(1) from dd;
>                                                    QUERY PLAN
> ---------------------------------------------------------------------- 
> -------------------------------------------
>  Aggregate  (cost=8131.39..8131.40 rows=1 width=0) (actual  
> time=331.075..331.076 rows=1 loops=1)
>    ->  Seq Scan on dd  (cost=0.00..7549.71 rows=232671 width=0)  
> (actual time=0.042..203.958 rows=232679 loops=1)
>  Total runtime: 331.167 ms
> (3 rows)
>
> I have run the query multiple times and it shows the response time,  
> around what is shown here
> The table is just a multiple copy of pg_class table( approx 200000  
> rows). As it can be seen, the Logical reads show a ratio of 1:5,  
> but the  response time is not in the same ratio. I tried to profile  
> and couldn't find anything significant. Eventhough it shows 4391  
> physical reads, that's from OS cache,  since i ave already run the  
> query multiple times.

Have you tried just executing the query with executor stats on? You  
could be seeing the overhead of explain analyze...
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Re: Thick indexes - a look at count(1) query

From

"Gokulakannan Somasundaram"

Date:

23 January 2008, 12:41:48

As you said, 'gettimeofday' is occupying 40% of the execution time. But without explain analyze(with just executor stats), i am not getting the time duration at all. Is there any other way, i can get the timings without this overhead?

Thanks,
Gokul.

On Jan 18, 2008 1:23 AM, Decibel! <decibel@decibel.org> wrote:

On Jan 16, 2008, at 8:07 AM, Gokulakannan Somasundaram wrote:
> gokul=# explain analyze select count(1) from dd;
> LOG: EXECUTOR STATISTICS
> DETAIL: ! system usage stats:
> ! 0.277208 elapsed 0.275457 user 0.000148 system sec
> ! [1.128422 user 0.004976 sys total]
> ! 0/0 [0/0] filesystem blocks in/out
> ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
> ! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
> ! 0/39 [5/160] voluntary/involuntary context switches
> ! buffer usage stats:
> ! Shared blocks: 1024 Logical Reads, 0
> Physical Reads, 0 written, buffer hit rate = 100.00%
> ! Local blocks: 0 read, 0 written,
> buffer hit rate = 0.00%
> ! Direct blocks: 0 read, 0 written
> STATEMENT: explain analyze select count(1) from dd;
> QUERY PLAN
> ----------------------------------------------------------------------
> ------------------------------------------------------------
> Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual
> time=276.838..276.838 rows=1 loops=1)
> -> Index Only Scan using idx on dd (cost= 0.00..6741.42
> rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)
> Total runtime: 276.928 ms
> (3 rows)
>
> gokul=# set enable_indexscan=off;
> SET
>
> gokul=# explain analyze select count(1) from dd;
> LOG: EXECUTOR STATISTICS
> DETAIL: ! system usage stats:
> ! 0.331441 elapsed 0.258903 user 0.067953 system sec
> ! [1.906069 user 0.211479 sys total]
> ! 0/0 [0/0] filesystem blocks in/out
> ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
> ! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
> ! 0/39 [9/312] voluntary/involuntary context switches
> ! buffer usage stats:
> ! Shared blocks: 5223 Logical Reads, 4391
> Physical Reads, 0 written, buffer hit rate = 15.93%
> ! Local blocks: 0 read, 0 written,
> buffer hit rate = 0.00%
> ! Direct blocks: 0 read, 0 written
> STATEMENT: explain analyze select count(1) from dd;
> QUERY PLAN
> ----------------------------------------------------------------------
> -------------------------------------------
> Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual
> time=331.075..331.076 rows=1 loops=1)
> -> Seq Scan on dd (cost=0.00..7549.71 rows=232671 width=0)
> (actual time=0.042..203.958 rows=232679 loops=1)
> Total runtime: 331.167 ms
> (3 rows)
>
> I have run the query multiple times and it shows the response time,
> around what is shown here
> The table is just a multiple copy of pg_class table( approx 200000
> rows). As it can be seen, the Logical reads show a ratio of 1:5,
> but the response time is not in the same ratio. I tried to profile
> and couldn't find anything significant. Eventhough it shows 4391
> physical reads, that's from OS cache, since i ave already run the
> query multiple times.

Have you tried just executing the query with executor stats on? You
could be seeing the overhead of explain analyze...
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Re: Thick indexes - a look at count(1) query

From

Martijn van Oosterhout

Date:

23 January 2008, 13:02:19

On Wed, Jan 23, 2008 at 10:02:14PM +0530, Gokulakannan Somasundaram wrote:
> As you said, 'gettimeofday' is occupying 40% of the execution time. But
> without explain analyze(with just executor stats), i am not getting the time
> duration at all.  Is there any other way, i can get the timings without this
> overhead?

\timing iirc.

Inside psql, check the help.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Re: Thick indexes - a look at count(1) query

From

"Guillaume Smet"

Date:

23 January 2008, 13:39:20

On Jan 23, 2008 5:48 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Wed, Jan 23, 2008 at 10:02:14PM +0530, Gokulakannan Somasundaram wrote:
> > As you said, 'gettimeofday' is occupying 40% of the execution time. But
> > without explain analyze(with just executor stats), i am not getting the time
> > duration at all.  Is there any other way, i can get the timings without this
> > overhead?
>
> \timing iirc.

Yep, it's \timing.

You can also use log_min_duration_statement and pgFouine to have
average stats on a longer run.

--
Guillaume