Home > mailing lists

Re: force the use of a particular index - Mailing list pgsql-performance

From	Scott Cain
Subject	Re: force the use of a particular index
Date	July 12, 2003 00:07:33
Msg-id	1057937779.5767.17.camel@localhost.localdomain Whole thread Raw
In response to	Re: force the use of a particular index (Rod Taylor <rbt@rbt.ca>)
Responses	Re: force the use of a particular index
List	pgsql-performance

Tree view

Rod,

I see what you mean about the scan on the feature_pkey taking a long
time.  I tried several things to remedy that.  I created an index on
feature (feature_id,type_id) (which I don't think makes sense since
feature_id is the primary key, so add another column really doesn't
help).  I also created a index on feature (type_id, feature_id), but the
planner doesn't use it.  Also, there was an already existing index on
feature (type_id) that the planner never used.

One thing I tried that changed the query plan and improved performance
slightly (but still nowhere near what I need) was to add a partial index
on featureloc on (fmin,fmax) where scrfeature_id=6.  This is something I
could realistically do since there are relatively few (>30)
srcfeature_ids that I am interested in, so putting in place a partial
index for each of them would not be a big deal.  Nevertheless, the
performance is still not there.  Here is the EXPLAIN ANALYZE for this
situation:

                                                                           QUERY PLAN

  

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=156172.23..156200.11 rows=159 width=54) (actual time=63631.93..63631.93 rows=1 loops=1)
   ->  Sort  (cost=156172.23..156176.21 rows=1594 width=54) (actual time=63631.93..63631.93 rows=1 loops=1)
         Sort Key: f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
         ->  Hash Join  (cost=135100.30..156087.46 rows=1594 width=54) (actual time=63631.29..63631.79 rows=1 loops=1)
               Hash Cond: ("outer".feature_id = "inner".feature_id)
               ->  Index Scan using featureloc_src_6 on featureloc fl  (cost=0.00..18064.99 rows=101883 width=14)
(actualtime=26.11..430.00 rows=570 loops=1) 
                     Index Cond: ((fmin <= 2585581) AND (fmax >= 2565581))
                     Filter: (srcfeature_id = 6)
               ->  Hash  (cost=134601.43..134601.43 rows=48347 width=40) (actual time=63182.86..63182.86 rows=0
loops=1)
                     ->  Index Scan using feature_pkey on feature f  (cost=0.00..134601.43 rows=48347 width=40) (actual
time=69.98..62978.27rows=13825 loops=1) 
                           Filter: (type_id = 219)
 Total runtime: 63632.28 msec
(12 rows)

Any other ideas?

Thanks,
Scott

On Fri, 2003-07-11 at 09:38, Rod Taylor wrote:
> On Fri, 2003-07-11 at 13:17, Scott Cain wrote:
> > The problem (at least as it appears to me) is not that it is performing
> > a table scan instead of an index scan, it is that it is using the wrong
> > index.  Here is the output from EXPLAIN ANALYZE:
> >
> >                                                                          QUERY PLAN
> >
------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Unique  (cost=494008.47..494037.59 rows=166 width=54) (actual time=114660.37..114660.38 rows=1 loops=1)
> >    ->  Sort  (cost=494008.47..494012.63 rows=1664 width=54) (actual time=114660.37..114660.37 rows=1 loops=1)
> >          Sort Key: f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
> >          ->  Nested Loop  (cost=0.00..493919.44 rows=1664 width=54) (actual time=2596.13..114632.90 rows=1 loops=1)
> >                ->  Index Scan using feature_pkey on feature f  (cost=0.00..134601.43 rows=52231 width=40) (actual
time=105.74..56048.87rows=13825 loops=1) 
> >                      Filter: (type_id = 219)
> >                ->  Index Scan using featureloc_idx1 on featureloc fl  (cost=0.00..6.87 rows=1 width=14) (actual
time=4.23..4.23rows=0 loops=13825) 
> >                      Index Cond: ("outer".feature_id = fl.feature_id)
> >                      Filter: ((srcfeature_id = 6) AND (fmin <= 2585581) AND (fmax >= 2565581))
> >  Total runtime: 114660.91 msec
>
> > it is using on featureloc (featureloc_idx1) is on the foreign key
> > feature_id.  It should instead be using another index, featureloc_idx3,
> > which is built on (srcfeature_id, fmin, fmax).
>
> Nope.. The optimizer is right in the decision to use featureloc_idx1.
> You will notice it is expecting to retrieve a single row from this
> index, but the featureloc_idx3 is bound to be larger (due to indexing
> more data), thus take more disk reads for the exact same information (or
> in this case, lack thereof).
>
> What is taking a long time is the scan on feature_pkey. It looks like it
> is throwing away a ton of rows that are not type_id = 219.  Either that,
> or you do a pile of deletes and haven't run REINDEX recently.
>
> Create an index consisting of (feature_id, type_id).  This will probably
> make a significant different in execution time.
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

pgsql-performance by date:

From: Scott Cain
Date: 11 July 2003, 18:39:26
Subject: Re: force the use of a particular index

From: Rod Taylor
Date: 12 July 2003, 00:28:54
Subject: Re: force the use of a particular index

Re: force the use of a particular index - Mailing list pgsql-performance

Previous

Next