Thread: Slow IN query

Slow IN query

From
"Eric Jain"
Date:
Using an OR or IN query seems to be orders of magnitudes slower than
running a query twice. There is an unique index on 'id' and an index on
'model_ns, model'. The number of row returned is less than 800.
Everything is vacuumed and analyzed. Running on 7.4.1. Perhaps this
situation is something the optimizer could be cleverer about; if not,
never mind.

explain select * from statements
where model_ns='4' and model in ('P42655', 'Q9XFM4')
order by id;

 Index Scan using statements_pkey on statements  (cost=0.00..2166475.49
rows=107 width=113)
   Filter: ((model_ns = 4::smallint) AND ((model = 'P42655'::text) OR
(model = 'Q9XFM4'::text)))


explain select * from statements
where model_ns='4' and model = 'P42655'
union
select * from statements
where model_ns='4' and model = 'Q9XFM4'
order by id;

 Sort  (cost=425.84..426.11 rows=108 width=113)
   Sort Key: id
   ->  Unique  (cost=418.14..422.19 rows=108 width=113)
         ->  Sort  (cost=418.14..418.41 rows=108 width=113)
               Sort Key: id, model_ns, model, "statement", subject_ns,
subject, predicate_ns, predicate, object_ns, object, object_string,
object_number, object_boolean, generated
               ->  Append  (cost=0.00..414.49 rows=108 width=113)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..207.25
rows=54 width=113)
                           ->  Index Scan using statements_uniprot_idx
on statements  (cost=0.00..206.71 rows=54 width=113)
                                 Index Cond: ((model_ns = 4::smallint)
AND (model = 'P42655'::text))
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..207.25
rows=54 width=113)
                           ->  Index Scan using statements_uniprot_idx
on statements  (cost=0.00..206.71 rows=54 width=113)
                                 Index Cond: ((model_ns = 4::smallint)
AND (model = 'Q9XFM4'::text))


Re: Slow IN query

From
Tom Lane
Date:
"Eric Jain" <Eric.Jain@isb-sib.ch> writes:
> explain select * from statements
> where model_ns='4' and model in ('P42655', 'Q9XFM4')
> order by id;

This works reasonably well in CVS tip, but in 7.4 and earlier the
planner will not figure out that a multi-column index can be used unless
the OR condition is on the *first* column of the index.  Try flipping
the order of your index columns.

Stupid example in 7.4.2:

regression=# create table statements (model_ns smallint, model text,
regression(# unique(model_ns,model));
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "statements_model_ns_key" for table "statements"
CREATE TABLE
regression=# explain select * from statements where model_ns='4' and model in ('P42655', 'Q9XFM4');
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using statements_model_ns_key on statements  (cost=0.00..17.09 rows=1 width=34)
   Index Cond: (model_ns = 4::smallint)
   Filter: ((model = 'P42655'::text) OR (model = 'Q9XFM4'::text))
(3 rows)

regression=# drop table statements;
DROP TABLE
regression=# create table statements (model_ns smallint, model text,
regression(# unique(model,model_ns));
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "statements_model_key" for table "statements"
CREATE TABLE
regression=# explain select * from statements where model_ns='4' and model in ('P42655', 'Q9XFM4');
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using statements_model_key, statements_model_key on statements  (cost=0.00..9.66 rows=1 width=34)
   Index Cond: (((model = 'P42655'::text) AND (model_ns = 4::smallint)) OR ((model = 'Q9XFM4'::text) AND (model_ns =
4::smallint)))
   Filter: ((model_ns = 4::smallint) AND ((model = 'P42655'::text) OR (model = 'Q9XFM4'::text)))
(3 rows)

Development tip, however, is able to produce the latter plan in both
cases.

In your test case, I would imagine that the condition on model_ns alone
is being judged too unselective to make an index scan worthwhile.

            regards, tom lane