Index works with foo.val=bar.val but not foo.val<=bar.val - Mailing list pgsql-performance

From 8lhhxba02@sneakemail.com
Subject Index works with foo.val=bar.val but not foo.val<=bar.val
Date
Msg-id 908-19450@sneakemail.com
List pgsql-performance
Can anyone imagine why my query refuses to use an index for a
foo.val<=bar.val qualification, while the same query with foo.val=bar.val
(or with a constant, foo.val<=1) does use one? The i_value fields are ints
(int4), the selectivity of both queries is 2/10000, and I'm running
PostgreSQL 7.4.3.

Also, default_statistics_target is 1000, and I increased
effective_cache_size from 1000 to 80000, but that didn't help (and seems
somewhat unrelated anyway). Something appears to be fundamentally wrong,
because even with enable_seqscan=off and all the other plan-forcing
settings I tried, it never chooses an index scan.

If anyone has any ideas, please let me know; I've been trying to get this
working for a long time now.
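For what it's worth, the index is used when the inequality bound is a
constant rather than a join column; a quick sanity check along these lines
(the constant 1 is just an arbitrary test value) shows an index scan:

```sql
-- Inequality against a constant: this does use the index on bar.
EXPLAIN ANALYZE
SELECT cid FROM bar
WHERE op = 'i=i' AND i_value <= 1;
```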

This query produces the correct plan, using an index scan:

INSERT INTO answer
SELECT foo.cid as pid, bar.cid as sid,count(bar.cid) as cnt, 1
FROM foo, bar
   WHERE foo.field=bar.field AND
   (
    foo.op='i=i' AND bar.op='i=i' AND bar.i_value=foo.i_value
   )
GROUP BY foo.cid,bar.cid;

------explain analyze----------
  Subquery Scan "*SELECT*"  (cost=9.04..9.07 rows=2 width=20) (actual time=1.000..1.000 rows=2 loops=1)
    ->  HashAggregate  (cost=9.04..9.04 rows=2 width=8) (actual time=1.000..1.000 rows=2 loops=1)
          ->  Nested Loop  (cost=0.00..9.02 rows=2 width=8) (actual time=1.000..1.000 rows=2 loops=1)
                Join Filter: (("outer".field)::text = ("inner".field)::text)
                ->  Seq Scan on foo  (cost=0.00..1.01 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=1)
                      Filter: ((op)::text = 'i=i'::text)
                ->  Index Scan using bar_index_op_i_value on bar  (cost=0.00..7.98 rows=2 width=17) (actual time=1.000..1.000 rows=2 loops=1)
                      Index Cond: (((bar.op)::text = 'i=i'::text) AND (bar.i_value = "outer".i_value))
  Total runtime: 1.000 ms

-------------------------

This almost identical query (with = changed to <=) doesn't:

INSERT INTO answer
SELECT foo.cid as pid, bar.cid as sid,count(bar.cid) as cnt, 5
FROM foo, bar
   WHERE foo.field=bar.field AND
   (
    foo.op='i<=i' AND bar.op='i=i' AND bar.i_value<=foo.i_value
   )
GROUP BY foo.cid,bar.cid;

-------------Explain-------
Table  contains 9 rows
QUERY PLAN
----------
Subquery Scan "*SELECT*"  (cost=385.02..435.03 rows=3334 width=20) (actual time=50.000..50.000 rows=2 loops=1)
   ->  HashAggregate  (cost=385.02..393.35 rows=3334 width=8) (actual time=50.000..50.000 rows=2 loops=1)
         ->  Nested Loop  (cost=0.00..360.01 rows=3334 width=8) (actual time=9.000..50.000 rows=2 loops=1)
               Join Filter: ((("outer".field)::text = ("inner".field)::text) AND ("inner".i_value <= "outer".i_value))
               ->  Seq Scan on foo  (cost=0.00..1.01 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=1)
                     Filter: ((op)::text = 'i<=i'::text)
               ->  Seq Scan on bar  (cost=0.00..209.00 rows=10000 width=17) (actual time=0.000..29.000 rows=10000 loops=1)
                     Filter: ((op)::text = 'i=i'::text)
Total runtime: 51.000 ms
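I notice the rows=3334 estimate is suspiciously close to 10000/3. If I
read things right, the planner falls back to a default selectivity of 1/3
for an inequality join clause it can't estimate from statistics, which
would put the estimate right at the number in the plan:

```python
import math

# Back-of-the-envelope check of the planner's rows=3334 estimate.
# When PostgreSQL cannot estimate an inequality join clause from
# statistics, it falls back to a default selectivity of 1/3
# (DEFAULT_INEQ_SEL in the planner sources); rounding up here is
# an assumption to match the displayed figure.
DEFAULT_INEQ_SEL = 1.0 / 3.0

bar_rows = 10000                          # rows in bar
est = math.ceil(bar_rows * DEFAULT_INEQ_SEL)
print(est)
```

That gross overestimate (3334 vs. the actual 2 rows) would explain why
the index-scan plan looks unattractive to the planner.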

----------------------------

Both queries run against a table foo with 10,000 rows (i_value values
from 1 to 5000) and a table bar with a single row (i_value=1).

If table bar has more than one row, the planner resorts to a merge join
instead. Why can't I get this case to use an index scan as well?
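One experiment I haven't fully exhausted: disabling the competing join
methods for the session and re-running EXPLAIN, to see what the planner
thinks an index-based nested loop would cost (these are the standard
planner enable_* settings):

```sql
-- Steer the planner away from merge/hash joins for this session,
-- then see whether, and at what cost, it will pick a nested loop
-- with an index scan on bar.
SET enable_mergejoin = off;
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT foo.cid, bar.cid, count(bar.cid)
FROM foo, bar
WHERE foo.field = bar.field
  AND foo.op = 'i<=i' AND bar.op = 'i=i'
  AND bar.i_value <= foo.i_value
GROUP BY foo.cid, bar.cid;
```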

Here is what happens when bar has 20 rows (i_value from 1 to 20):

INSERT INTO answer
SELECT foo.cid as pid, bar.cid as sid,count(bar.cid) as cnt, 5
FROM foo, bar
   WHERE foo.field=bar.field AND
   (
    foo.op='i<=i' AND bar.op='i=i' AND bar.i_value<=foo.i_value
   )
GROUP BY foo.cid,bar.cid;

-------------Explain-------
Table  contains 8 rows
QUERY PLAN
----------
Subquery Scan "*SELECT*"  (cost=2.74..3.04 rows=20 width=20) (actual time=0.000..0.000 rows=20 loops=1)
   ->  HashAggregate  (cost=2.74..2.79 rows=20 width=8) (actual time=0.000..0.000 rows=20 loops=1)
         ->  Merge Join  (cost=1.29..2.59 rows=20 width=8) (actual time=0.000..0.000 rows=20 loops=1)
               Merge Cond: ("outer".i_value = "inner".i_value)
               Join Filter: (("inner".field)::text = ("outer".field)::text)
               ->  Index Scan using bar_index on bar  (cost=0.00..500.98 rows=10000 width=17) (actual time=0.000..0.000 rows=21 loops=1)
                     Filter: ((op)::text = 'i=i'::text)
               ->  Sort  (cost=1.29..1.32 rows=10 width=17) (actual time=0.000..0.000 rows=19 loops=1)
                     Sort Key: foo.i_value
                     ->  Seq Scan on foo  (cost=0.00..1.12 rows=10 width=17) (actual time=0.000..0.000 rows=10 loops=1)
                           Filter: ((op)::text = 'i=i'::text)
Total runtime: 4.000 ms

Thanks!

--Kris

"Love is like Pi: natural, irrational, and very important."
   -- Lisa Hoffman

