Thread: Thick indexes - a look at count(1) query
Hi,<br /> I have submitted a new patch against thick indexes(indexes with snapshot)<br /><a href="http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php">http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php. </a><br/><br />I did look closely at improving the performance of count(1) queries. It worked well, when we are selectinga sub-set of the result-set. But when i did a query like "select count(1) from table", it showed a improvement inresponse time, but not to the extent, i wanted it to be. Let's have a look at the stats. <br /><br />gokul=# explain analyzeselect count(1) from dd;<br />LOG: EXECUTOR STATISTICS<br />DETAIL: ! system usage stats:<br /> ! 0.277208elapsed 0.275457 user 0.000148 system sec<br /> ! [1.128422 user 0.004976 sys total]<br /> ! 0/0 [0/0] filesystem blocks in/out<br /> ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps<br /> ! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent<br /> ! 0/39 [5/160] voluntary/involuntary contextswitches <br /> ! buffer usage stats:<br /> ! Shared blocks: 1024 Logical Reads, 0 Physical Reads, 0 written, buffer hit rate = 100.00%<br /> ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%<br /> ! Direct blocks: 0 read, 0 written<br />STATEMENT: explain analyze select count(1) from dd;<br /> QUERY PLAN <br />----------------------------------------------------------------------------------------------------------------------------------<br /> Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual time=276.838..276.838 rows=1 loops=1) <br /> -> Index OnlyScan using idx on dd (cost=0.00..6741.42 rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)<br /> Totalruntime: 276.928 ms<br />(3 rows)<br /><br />gokul=# set enable_indexscan=off;<br /> SET<br /><br />gokul=# explainanalyze select count(1) from dd;<br />LOG: EXECUTOR STATISTICS<br />DETAIL: ! system usage stats:<br /> ! 0.331441 elapsed 0.258903 user 0.067953 system sec<br /> ! [1.906069 user 0.211479 sys total]<br /> ! 0/0 [0/0] filesystem blocks in/out<br /> ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps<br /> ! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent<br /> ! 0/39 [9/312] voluntary/involuntarycontext switches <br /> ! buffer usage stats:<br /> ! Shared blocks: 5223Logical Reads, 4391 Physical Reads, 0 written, buffer hit rate = 15.93%<br /> ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%<br /> ! Direct blocks: 0 read, 0 written<br />STATEMENT: explain analyze select count(1) from dd;<br /> QUERY PLAN <br />-----------------------------------------------------------------------------------------------------------------<br /> Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual time=331.075..331.076 rows=1 loops=1)<br /> -> Seq Scanon dd (cost=0.00..7549.71 rows=232671 width=0) (actual time=0.042..203.958 rows=232679 loops=1)<br /> Total runtime:331.167 ms<br />(3 rows)<br /><br />I have run the query multiple times and it shows the response time, around whatis shown here <br />The table is just a multiple copy of pg_class table( approx 200000 rows). As it can be seen, theLogical reads show a ratio of 1:5, but the response time is not in the same ratio. I tried to profile and couldn't findanything significant. Eventhough it shows 4391 physical reads, that's from OS cache, since i ave already run the querymultiple times. <br /><br />One more disadvantage with using select count(1) using index scan is that, it pollutes theshared memory, unlike full-table scans. But something can be done in the regard.<br /><br /><br /><br />Any thoughts?<br/><br />Thanks, <br />Gokul.<br />
On Jan 16, 2008, at 8:07 AM, Gokulakannan Somasundaram wrote: > gokul=# explain analyze select count(1) from dd; > LOG: EXECUTOR STATISTICS > DETAIL: ! system usage stats: > ! 0.277208 elapsed 0.275457 user 0.000148 system sec > ! [1.128422 user 0.004976 sys total] > ! 0/0 [0/0] filesystem blocks in/out > ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps > ! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent > ! 0/39 [5/160] voluntary/involuntary context switches > ! buffer usage stats: > ! Shared blocks: 1024 Logical Reads, 0 > Physical Reads, 0 written, buffer hit rate = 100.00% > ! Local blocks: 0 read, 0 written, > buffer hit rate = 0.00% > ! Direct blocks: 0 read, 0 written > STATEMENT: explain analyze select count(1) from dd; > QUERY PLAN > ---------------------------------------------------------------------- > ------------------------------------------------------------ > Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual > time=276.838..276.838 rows=1 loops=1) > -> Index Only Scan using idx on dd (cost=0.00..6741.42 > rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1) > Total runtime: 276.928 ms > (3 rows) > > gokul=# set enable_indexscan=off; > SET > > gokul=# explain analyze select count(1) from dd; > LOG: EXECUTOR STATISTICS > DETAIL: ! system usage stats: > ! 0.331441 elapsed 0.258903 user 0.067953 system sec > ! [1.906069 user 0.211479 sys total] > ! 0/0 [0/0] filesystem blocks in/out > ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps > ! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent > ! 0/39 [9/312] voluntary/involuntary context switches > ! buffer usage stats: > ! Shared blocks: 5223 Logical Reads, 4391 > Physical Reads, 0 written, buffer hit rate = 15.93% > ! Local blocks: 0 read, 0 written, > buffer hit rate = 0.00% > ! Direct blocks: 0 read, 0 written > STATEMENT: explain analyze select count(1) from dd; > QUERY PLAN > ---------------------------------------------------------------------- > ------------------------------------------- > Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual > time=331.075..331.076 rows=1 loops=1) > -> Seq Scan on dd (cost=0.00..7549.71 rows=232671 width=0) > (actual time=0.042..203.958 rows=232679 loops=1) > Total runtime: 331.167 ms > (3 rows) > > I have run the query multiple times and it shows the response time, > around what is shown here > The table is just a multiple copy of pg_class table( approx 200000 > rows). As it can be seen, the Logical reads show a ratio of 1:5, > but the response time is not in the same ratio. I tried to profile > and couldn't find anything significant. Eventhough it shows 4391 > physical reads, that's from OS cache, since i ave already run the > query multiple times. Have you tried just executing the query with executor stats on? You could be seeing the overhead of explain analyze... -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
As you said, 'gettimeofday' is occupying 40% of the execution time. But without explain analyze(with just executor stats), i am not getting the time duration at all. Is there any other way, i can get the timings without this overhead?
Thanks,
Gokul.
Thanks,
Gokul.
On Jan 18, 2008 1:23 AM, Decibel! <decibel@decibel.org> wrote:
Have you tried just executing the query with executor stats on? YouOn Jan 16, 2008, at 8:07 AM, Gokulakannan Somasundaram wrote:
> gokul=# explain analyze select count(1) from dd;
> LOG: EXECUTOR STATISTICS
> DETAIL: ! system usage stats:
> ! 0.277208 elapsed 0.275457 user 0.000148 system sec
> ! [1.128422 user 0.004976 sys total]
> ! 0/0 [0/0] filesystem blocks in/out
> ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
> ! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
> ! 0/39 [5/160] voluntary/involuntary context switches
> ! buffer usage stats:
> ! Shared blocks: 1024 Logical Reads, 0
> Physical Reads, 0 written, buffer hit rate = 100.00%
> ! Local blocks: 0 read, 0 written,
> buffer hit rate = 0.00%
> ! Direct blocks: 0 read, 0 written
> STATEMENT: explain analyze select count(1) from dd;
> QUERY PLAN
> ----------------------------------------------------------------------
> ------------------------------------------------------------
> Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual
> time=276.838..276.838 rows=1 loops=1)
> -> Index Only Scan using idx on dd (cost= 0.00..6741.42
> rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)
> Total runtime: 276.928 ms
> (3 rows)
>
> gokul=# set enable_indexscan=off;
> SET
>
> gokul=# explain analyze select count(1) from dd;
> LOG: EXECUTOR STATISTICS
> DETAIL: ! system usage stats:
> ! 0.331441 elapsed 0.258903 user 0.067953 system sec
> ! [1.906069 user 0.211479 sys total]
> ! 0/0 [0/0] filesystem blocks in/out
> ! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
> ! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
> ! 0/39 [9/312] voluntary/involuntary context switches
> ! buffer usage stats:
> ! Shared blocks: 5223 Logical Reads, 4391
> Physical Reads, 0 written, buffer hit rate = 15.93%
> ! Local blocks: 0 read, 0 written,
> buffer hit rate = 0.00%
> ! Direct blocks: 0 read, 0 written
> STATEMENT: explain analyze select count(1) from dd;
> QUERY PLAN
> ----------------------------------------------------------------------
> -------------------------------------------
> Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual
> time=331.075..331.076 rows=1 loops=1)
> -> Seq Scan on dd (cost=0.00..7549.71 rows=232671 width=0)
> (actual time=0.042..203.958 rows=232679 loops=1)
> Total runtime: 331.167 ms
> (3 rows)
>
> I have run the query multiple times and it shows the response time,
> around what is shown here
> The table is just a multiple copy of pg_class table( approx 200000
> rows). As it can be seen, the Logical reads show a ratio of 1:5,
> but the response time is not in the same ratio. I tried to profile
> and couldn't find anything significant. Eventhough it shows 4391
> physical reads, that's from OS cache, since i ave already run the
> query multiple times.
could be seeing the overhead of explain analyze...
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
On Wed, Jan 23, 2008 at 10:02:14PM +0530, Gokulakannan Somasundaram wrote: > As you said, 'gettimeofday' is occupying 40% of the execution time. But > without explain analyze(with just executor stats), i am not getting the time > duration at all. Is there any other way, i can get the timings without this > overhead? \timing iirc. Inside psql, check the help. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
On Jan 23, 2008 5:48 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Wed, Jan 23, 2008 at 10:02:14PM +0530, Gokulakannan Somasundaram wrote: > > As you said, 'gettimeofday' is occupying 40% of the execution time. But > > without explain analyze(with just executor stats), i am not getting the time > > duration at all. Is there any other way, i can get the timings without this > > overhead? > > \timing iirc. Yep, it's \timing. You can also use log_min_duration_statement and pgFouine to have average stats on a longer run. -- Guillaume