Thread: index not used if using IN or OR

index not used if using IN or OR

From
Mario Ivankovits
Date:
Hello !

Sorry if this has been discussed before, it is just hard to find in the
archives using the words "or" or "in" :-o

I use postgres-8.0 beta4 for windows.
I broke down my problem to a very simple table - two columns
"primary_key" and "secondary_key". Creates and Insert you will find below.

If I query the _empty_ freshly created table I get the following explain
result:

select * from tt where seckey = 1;
Index Scan using seckey_key on tt  (cost=0.00..17.07 rows=5 width=12)
  Index Cond: (seckey = 1)

If I use "OR" (or IN) things get worse:

select * from tt where seckey = 1 or seckey = 2
Seq Scan on tt  (cost=0.00..0.00 rows=1 width=12)
  Filter: ((seckey = 1) OR (seckey = 2))

Note the "Seq Scan" instead of using the index.

After populating the table with 8920 records and "analyze" the scenario
gets even worser:

select * from tt where seckey = 1;
Seq Scan on tt  (cost=0.00..168.50 rows=1669 width=12) (actual
time=0.000..15.000 rows=1784 loops=1)
  Filter: (seckey = 1)
Total runtime: 31.000 ms

Now also this simple query uses a "Seq Scan".

Now the questions are:
a) Why is the index not used if I use "OR" or "IN"
b) Why is the index not used after "analyze" ?

Any help is very appreciated!

Thanks,
Mario


// The table and data

CREATE TABLE tt (
  pkey int4 NOT NULL DEFAULT nextval('public."tt_PKEY_seq"'::text),
  seckey int8,
  CONSTRAINT pkey_key PRIMARY KEY (pkey)
)
WITHOUT OIDS;

CREATE INDEX seckey_key  ON tt  USING btree  (seckey);

// inserted many-many times
insert into tt values (default, 1);
insert into tt values (default, 2);
insert into tt values (default, 3);
insert into tt values (default, 4);
insert into tt values (default, 5);


Re: index not used if using IN or OR

From
Richard Huxton
Date:
Mario Ivankovits wrote:
> Hello !
>
> Sorry if this has been discussed before, it is just hard to find in the
> archives using the words "or" or "in" :-o
>
> I use postgres-8.0 beta4 for windows.
> I broke down my problem to a very simple table - two columns
> "primary_key" and "secondary_key". Creates and Insert you will find below.
>
> If I query the _empty_ freshly created table I get the following explain
> result:
>
> select * from tt where seckey = 1;
> Index Scan using seckey_key on tt  (cost=0.00..17.07 rows=5 width=12)
>  Index Cond: (seckey = 1)
>
> If I use "OR" (or IN) things get worse:
>
> select * from tt where seckey = 1 or seckey = 2
> Seq Scan on tt  (cost=0.00..0.00 rows=1 width=12)
>  Filter: ((seckey = 1) OR (seckey = 2))
>
> Note the "Seq Scan" instead of using the index.

But as you said, your table is *empty* - why would an index be faster?
Try running EXPLAIN ANALYSE on these queries and look at the actual times.

> After populating the table with 8920 records and "analyze" the scenario
> gets even worser:
>
> select * from tt where seckey = 1;
> Seq Scan on tt  (cost=0.00..168.50 rows=1669 width=12) (actual
> time=0.000..15.000 rows=1784 loops=1)
>  Filter: (seckey = 1)
> Total runtime: 31.000 ms
>
> Now also this simple query uses a "Seq Scan".

Well, it thinks it's going to be returning 1669 rows. If that's roughly
right, then scanning the table probably is faster.

Run the queries again with EXPLAIN ANALYSE. Also try issuing
   set enable_seqscan=false;
This will force the planner to use any indexes it finds. Compare the
times with and without, and don't forget to account for the effects of
caching.

--
   Richard Huxton
   Archonet Ltd

Re: index not used if using IN or OR

From
Tom Lane
Date:
Mario Ivankovits <mario@ops.co.at> writes:
> After populating the table with 8920 records and "analyze" the scenario
> gets even worser:

> select * from tt where seckey = 1;
> Seq Scan on tt  (cost=0.00..168.50 rows=1669 width=12) (actual
> time=0.000..15.000 rows=1784 loops=1)
>   Filter: (seckey = 1)
> Total runtime: 31.000 ms

> Now also this simple query uses a "Seq Scan".

Which is exactly what it *should* do, considering that it is selecting
1784 out of 8920 records.  Indexscans only win for small selectivities
--- the rule of thumb is that retrieving more than about 1% of the
records should use a seqscan.

            regards, tom lane