Thread: Slow IN query
Using an OR or IN query seems to be orders of magnitudes slower than running a query twice. There is an unique index on 'id' and an index on 'model_ns, model'. The number of row returned is less than 800. Everything is vacuumed and analyzed. Running on 7.4.1. Perhaps this situation is something the optimizer could be cleverer about; if not, never mind. explain select * from statements where model_ns='4' and model in ('P42655', 'Q9XFM4') order by id; Index Scan using statements_pkey on statements (cost=0.00..2166475.49 rows=107 width=113) Filter: ((model_ns = 4::smallint) AND ((model = 'P42655'::text) OR (model = 'Q9XFM4'::text))) explain select * from statements where model_ns='4' and model = 'P42655' union select * from statements where model_ns='4' and model = 'Q9XFM4' order by id; Sort (cost=425.84..426.11 rows=108 width=113) Sort Key: id -> Unique (cost=418.14..422.19 rows=108 width=113) -> Sort (cost=418.14..418.41 rows=108 width=113) Sort Key: id, model_ns, model, "statement", subject_ns, subject, predicate_ns, predicate, object_ns, object, object_string, object_number, object_boolean, generated -> Append (cost=0.00..414.49 rows=108 width=113) -> Subquery Scan "*SELECT* 1" (cost=0.00..207.25 rows=54 width=113) -> Index Scan using statements_uniprot_idx on statements (cost=0.00..206.71 rows=54 width=113) Index Cond: ((model_ns = 4::smallint) AND (model = 'P42655'::text)) -> Subquery Scan "*SELECT* 2" (cost=0.00..207.25 rows=54 width=113) -> Index Scan using statements_uniprot_idx on statements (cost=0.00..206.71 rows=54 width=113) Index Cond: ((model_ns = 4::smallint) AND (model = 'Q9XFM4'::text))
"Eric Jain" <Eric.Jain@isb-sib.ch> writes: > explain select * from statements > where model_ns='4' and model in ('P42655', 'Q9XFM4') > order by id; This works reasonably well in CVS tip, but in 7.4 and earlier the planner will not figure out that a multi-column index can be used unless the OR condition is on the *first* column of the index. Try flipping the order of your index columns. Stupid example in 7.4.2: regression=# create table statements (model_ns smallint, model text, regression(# unique(model_ns,model)); NOTICE: CREATE TABLE / UNIQUE will create implicit index "statements_model_ns_key" for table "statements" CREATE TABLE regression=# explain select * from statements where model_ns='4' and model in ('P42655', 'Q9XFM4'); QUERY PLAN -------------------------------------------------------------------------------------------- Index Scan using statements_model_ns_key on statements (cost=0.00..17.09 rows=1 width=34) Index Cond: (model_ns = 4::smallint) Filter: ((model = 'P42655'::text) OR (model = 'Q9XFM4'::text)) (3 rows) regression=# drop table statements; DROP TABLE regression=# create table statements (model_ns smallint, model text, regression(# unique(model,model_ns)); NOTICE: CREATE TABLE / UNIQUE will create implicit index "statements_model_key" for table "statements" CREATE TABLE regression=# explain select * from statements where model_ns='4' and model in ('P42655', 'Q9XFM4'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Index Scan using statements_model_key, statements_model_key on statements (cost=0.00..9.66 rows=1 width=34) Index Cond: (((model = 'P42655'::text) AND (model_ns = 4::smallint)) OR ((model = 'Q9XFM4'::text) AND (model_ns = 4::smallint))) Filter: ((model_ns = 4::smallint) AND ((model = 'P42655'::text) OR (model = 'Q9XFM4'::text))) (3 rows) Development tip, however, is able to produce the latter plan in both cases. In your test case, I would imagine that the condition on model_ns alone is being judged too unselective to make an index scan worthwhile. regards, tom lane