Thread: planer don't use index. bad plan for where id = x or id in (select ...)

planer don't use index. bad plan for where id = x or id in (select ...)

From
Pavel Stehule
Date:
Hello,

Pg make query 1. and 2. very fast (use index), but for query 3. dont use
index. I can solve its using select union, but I readed so pg 7.5 don't
problem with OR operator. I use cvs pg. I used vacuum analyze first.

table sp_op_product has 15K rows, sp_op_uct 37K rows;

regards
Pavel Stehule

query 1.

intra=# explain analyse select * from sp_op_uct where sp_op_id in (select
sp_op_id from sp_op_produkt where bal_zak = 34123);
                                                                    QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=7.81..74.59 rows=17 width=371) (actual
time=0.241..0.249 rows=1 loops=1)
   ->  HashAggregate  (cost=7.81..7.81 rows=12 width=4) (actual
time=0.118..0.120 rows=1 loops=1)
         ->  Index Scan using sp_op_product_bal_zak on sp_op_produkt
(cost=0.00..7.78 rows=12 width=4) (actual time=0.077..0.082 rows=1
loops=1)
               Index Cond: (bal_zak = 34123)
   ->  Index Scan using sp_op_uct_sp_op_id on sp_op_uct  (cost=0.00..5.54
rows=2 width=371) (actual time=0.068..0.072 rows=1 loops=1)
         Index Cond: (sp_op_uct.sp_op_id = "outer".sp_op_id)
 Total runtime: 1.846 ms

query 2.

intra=# explain ANALYZE select * from sp_op_uct where sp_op_id = 34123;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using sp_op_uct_sp_op_id on sp_op_uct  (cost=0.00..5.54 rows=2
width=371) (actual time=0.066..0.072 rows=1 loops=1)
   Index Cond: (sp_op_id = 34123)
 Total runtime: 0.287 ms

slow query 3.
intra=# EXPLAIN ANALYZE select * from sp_op_uct where sp_op_id = 34123 or
sp_op_id in (select sp_op_id from sp_op_produkt where bal_zak = 34123) ;
                                                                  QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on sp_op_uct  (cost=7.81..1283.83 rows=18602 width=371) (actual
time=3.176..240.379 rows=2 loops=1)
   Filter: ((sp_op_id = 34123) OR (hashed subplan))
   SubPlan
     ->  Index Scan using sp_op_product_bal_zak on sp_op_produkt
(cost=0.00..7.78 rows=12 width=4) (actual time=0.441..0.449 rows=1
loops=1)
           Index Cond: (bal_zak = 34123)
 Total runtime: 240.868 ms

Fast query 4.

intra=# explain ANALYZE select * from sp_op_uct where sp_op_id in (select
sp_op_id from sp_op_produkt where bal_zak = 34123) union select * from
sp_op_uct where sp_op_id = 34123;

                                          
QUERY PLAN
                  

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=80.72..82.39 rows=19 width=371) (actual time=0.857..0.868
rows=2 loops=1)
   ->  Sort  (cost=80.72..80.77 rows=19 width=371) (actual
time=0.852..0.854 rows=2 loops=1)
         Sort Key: sp_op_id, perioda, uozac, uokon, setup, timely,
uamount, "LIMIT", krok, kdy, uctuj, tarif, jednotka, merfunc, tarif_id,
hl_cinnost, merene, typ_fakturace, region, prod_group, rychlost, sdilene,
produkt, vl_sluzba, per_jedn, vpn, prov_inst, pevne_komutovane, spravnost,
fakt_text, pres_rychlost, pm, pm2, sp_kvalita
         ->  Append  (cost=7.81..80.32 rows=19 width=371) (actual
time=0.247..0.317 rows=2 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=7.81..74.76 rows=17
width=371) (actual time=0.244..0.256 rows=1 loops=1)
                     ->  Nested Loop  (cost=7.81..74.59 rows=17 width=371)
(actual time=0.197..0.205 rows=1 loops=1)
                           ->  HashAggregate  (cost=7.81..7.81 rows=12
width=4) (actual time=0.096..0.098 rows=1 loops=1)
                                 ->  Index Scan using
sp_op_product_bal_zak on sp_op_produkt  (cost=0.00..7.78 rows=12 width=4)
(actual time=0.040..0.044 rows=1 loops=1)
                                       Index Cond: (bal_zak = 34123)
                           ->  Index Scan using sp_op_uct_sp_op_id on
sp_op_uct  (cost=0.00..5.54 rows=2 width=371) (actual time=0.045..0.050
rows=1 loops=1)
                                 Index Cond: (sp_op_uct.sp_op_id =
"outer".sp_op_id)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..5.56 rows=2
width=371) (actual time=0.048..0.053 rows=1 loops=1)
                     ->  Index Scan using sp_op_uct_sp_op_id on sp_op_uct
(cost=0.00..5.54 rows=2 width=371) (actual time=0.019..0.022 rows=1
loops=1)
                           Index Cond: (sp_op_id = 34123)
 Total runtime: 2.413 ms



Pavel Stehule <stehule@kix.fsv.cvut.cz> writes:

> Hello,
>
> Pg make query 1. and 2. very fast (use index), but for query 3. dont use
> index. I can solve its using select union, but I readed so pg 7.5 don't
> problem with OR operator. I use cvs pg. I used vacuum analyze first.

I don't think even in CVS that there's much hope on this query with the OR
between the IN clause and a plain equals clause, but there are a few
improvements you could make to the UNION form:

> intra=# explain ANALYZE select * from sp_op_uct where sp_op_id in (select
> sp_op_id from sp_op_produkt where bal_zak = 34123) union select * from
> sp_op_uct where sp_op_id = 34123;

Try

SELECT *
  FROM sp_op_uct
 WHERE sp_op_id IN (
         SELECT sp_op_id
           FROM sp_op_produkt
          WHERE bal_zak = 34123
         UNION ALL
         SELECT 34123 AS sp_op_id
       )

You can select just the plain constant 34123 instead of having to fetch the
row from sp_op_uct. This probably won't save much actual time since the record
would have been in cache anyways. And you can use UNION ALL instead of UNION
since the IN will eliminate duplicates anyways. This avoids an extra
sort/uniquify step. Again it wouldn't really save much time since the extra
sort was only across 2 rows. But I like seeing simple clean plans even if they
aren't really much faster.

--
greg

Greg Stark <gsstark@mit.edu> writes:
> ... And you can use UNION ALL instead of UNION
> since the IN will eliminate duplicates anyways. This avoids an extra
> sort/uniquify step.

FWIW, CVS tip realizes that it doesn't need two unique-ification steps
in this scenario.  But I agree that UNION ALL is the cleaner way to
write the query.

            regards, tom lane