Thread: Selective usage of index in planner/optimizer (Too conservative?)

Selective usage of index in planner/optimizer (Too conservative?)

From
Ludwig Lim
Date:
Hi:

 I was testing a database when notice that it does not
used the new index I created. So after a couple of
VACUUM ANALYZE it tried the following test queries.

****  TEST CASE #1 ***********
loyalty=# set enable_seqscan=off;
SET VARIABLE
loyalty=# explain analyze select count(*) from points
where branch_cd=1 ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=119123.54..119123.54 rows=1 width=0)
(actual time=811.08..811.0
8 rows=1 loops=1)
  ->  Index Scan using idx_monthly_branch on points
(cost=0.00..1187
65.86 rows=143073 width=0) (actual time=0.19..689.75
rows=136790 loops=1)
Total runtime: 811.17 msec

*****  TEST CASE #2 *********
loyalty=# set enable_seqscan=on;
SET VARIABLE
loyalty=# explain analyze select count(*) from points
where branch_cd=1 ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=62752.34..62752.34 rows=1 width=0)
(actual time=3593.93..3593.9
3 rows=1 loops=1)
  ->  Seq Scan on points  (cost=0.00..62681.70
rows=28254 width=0) (a
ctual time=0.33..3471.54 rows=136790 loops=1)
Total runtime: 3594.01 msec


*** TEST CASE #3 (Sequential scan turned off) ******
loyalty=# explain select * from points where
branch_cd=5;
NOTICE:  QUERY PLAN:

Index Scan using idx_monthly_branch on points
(cost=0.00..49765.12 r
ows=16142 width=55)



   I am wondering why in test case #2 it did not use
an index scan, where as in case #3 it did. The number
of rows in test #2 and #3 are just a small subset of
table "points".

   The following are the number of elements in the
table:
   branch_cd = 1    --->     136,970
   branch_cd = 5    --->      39,385
   count(*)         --->   2,570,173

   Its rather strange why "SELECT COUNT(*)...WHERE
branch_cd=1" uses sequential scan even though it just
comprises 5.3% of whole table...

  I'ts also strange because of the ff: (Remember test
case 1 and 2 are the same query)

test 1   -->  seq_scan=off  --> 811.17 msec
test 2   -->  seq_scan=on   --> 3594.01 msec

  Test #1 have 400% improvement over Test #2, yet the
query plan for test #2 is the default.

  Are there way to let the planner improve the choice
in using an index or not?  BTW the "cost" variables
are set to the default for the test.


   Thank you in advance.

ludwig.


__________________________________________________
Do you Yahoo!?
New DSL Internet Access from SBC & Yahoo!
http://sbc.yahoo.com

Re: Selective usage of index in planner/optimizer (Too conservative?)

From
Tom Lane
Date:
Ludwig Lim <lud_nowhere_man@yahoo.com> writes:
> NOTICE:  QUERY PLAN:

> Aggregate  (cost=119123.54..119123.54 rows=1 width=0)
> (actual time=811.08..811.0
> 8 rows=1 loops=1)
>   ->  Index Scan using idx_monthly_branch on points
> (cost=0.00..1187
> 65.86 rows=143073 width=0) (actual time=0.19..689.75
> rows=136790 loops=1)
> Total runtime: 811.17 msec

> NOTICE:  QUERY PLAN:

> Aggregate  (cost=62752.34..62752.34 rows=1 width=0)
> (actual time=3593.93..3593.9
> 3 rows=1 loops=1)
>   ->  Seq Scan on points  (cost=0.00..62681.70
> rows=28254 width=0) (a
> ctual time=0.33..3471.54 rows=136790 loops=1)
> Total runtime: 3594.01 msec

Something fishy about this --- why is the estimated number of rows
different in the two cases (143073 vs 28254)?  Did you redo VACUUM
and/or ANALYZE in between?

>    I am wondering why in test case #2 it did not use
> an index scan, where as in case #3 it did.

Probably because it knows "branch_cd=5" is more selective than
"branch_cd=1".  It would be useful to see the pg_stats entry for
branch_cd.

>    Its rather strange why "SELECT COUNT(*)...WHERE
> branch_cd=1" uses sequential scan even though it just
> comprises 5.3% of whole table...

No, what's strange is that it's faster to use an indexscan for that.
The table must be very nearly in order by branch_cd; have you clustered
it recently?

            regards, tom lane

Re: Selective usage of index in planner/optimizer (Too conservative?)

From
Ludwig Lim
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Something fishy about this --- why is the estimated
> number of rows
> different in the two cases (143073 vs 28254)?  Did
> you redo VACUUM
> and/or ANALYZE in between?

   I neither VACUUMed nor ANALYZEd between the 2
cases.
>
> >    I am wondering why in test case #2 it did not
> use
> > an index scan, where as in case #3 it did.
>
> Probably because it knows "branch_cd=5" is more
> selective than
> "branch_cd=1".  It would be useful to see the
> pg_stats entry for
> branch_cd.

  Should I try altering the statistics? I tried
  ANALYZE points(branch_cd);
  but it still gave me the same results.

> >    Its rather strange why "SELECT COUNT(*)...WHERE
> > branch_cd=1" uses sequential scan even though it
> just
> > comprises 5.3% of whole table...

   What I mean is the table is rather large. (2
million rows) and I thought the planner would
automatically used an index to retrieve a small subset
(based on the percentage) of the large table.

> No, what's strange is that it's faster to use an
> indexscan for that.
> The table must be very nearly in order by branch_cd;
> have you clustered
> it recently?

  I never clustered the table.

   But prior to testing I dropped an index and create
a new one. Does dropping and creating index "confuse"
the planner even after a VACUUM ANALYZE?

   I seem to notice this trend everytime I add a new
index to the table. It would slow down and the
performance would gradually improve in a day or two.

  Should I try changing "cost" variables? I'm using
Pentium IV, with SCSI [RAID 5].

regards,

ludwig.


__________________________________________________
Do you Yahoo!?
New DSL Internet Access from SBC & Yahoo!
http://sbc.yahoo.com