Re: [repost] partial index / funxtional idx or bad sql? - Mailing list pgsql-performance

From Tom Lane
Subject Re: [repost] partial index / funxtional idx or bad sql?
Date
Msg-id 8275.1052837172@sss.pgh.pa.us
Whole thread Raw
In response to Re: [repost] partial index / funxtional idx or bad sql?  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-performance
"Jim C. Nasby" <jim@nasby.net> writes:
> Wow, I'll have to keep that in mind. Shouldn't the optimizer be able to
> handle that? Could this get added to the TODO?

No, 'cause it's done (in CVS tip).

I'm actually a bit hesitant now to recommend that people do such things,
because the 7.4 optimizer is likely to produce a better plan from the
unmodified IN query than it will from any explicitly "improved" version.
The 7.4 code knows several ways to do IN efficiently, but when you
hand-transform the query you are forcing the choice; perhaps wrongly.

An example from CVS tip and the regression database in which hand
transformation forces a less efficient plan choice:

regression=# explain analyze select * from tenk1 a where unique1 in (select ten from tenk1);
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=483.17..484.91 rows=10 width=248) (actual time=407.14..409.16 rows=10 loops=1)
   Merge Cond: ("outer".unique1 = "inner".ten)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..1571.97 rows=10000 width=244) (actual time=0.41..1.60
rows=11loops=1) 
   ->  Sort  (cost=483.17..483.19 rows=10 width=4) (actual time=406.57..406.65 rows=10 loops=1)
         Sort Key: tenk1.ten
         ->  HashAggregate  (cost=483.00..483.00 rows=10 width=4) (actual time=406.08..406.26 rows=10 loops=1)
               ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=4) (actual time=0.19..261.84 rows=10000
loops=1)
 Total runtime: 410.74 msec
(8 rows)

regression=# explain analyze select * from tenk1 a, (select distinct ten from tenk1) b
regression-# where a.unique1 = b.ten;
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1122.39..1232.59 rows=10 width=248) (actual time=476.67..666.02 rows=10 loops=1)
   ->  Subquery Scan b  (cost=1122.39..1172.39 rows=10 width=4) (actual time=475.94..662.00 rows=10 loops=1)
         ->  Unique  (cost=1122.39..1172.39 rows=10 width=4) (actual time=475.89..661.65 rows=10 loops=1)
               ->  Sort  (cost=1122.39..1147.39 rows=10000 width=4) (actual time=475.85..559.27 rows=10000 loops=1)
                     Sort Key: ten
                     ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=4) (actual time=0.37..274.87 rows=10000
loops=1)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..6.01 rows=1 width=244) (actual time=0.27..0.31 rows=1
loops=10)
         Index Cond: (a.unique1 = "outer".ten)
 Total runtime: 687.53 msec
(9 rows)

So, for now, make the transformation ... but keep a note about the IN
version to try whenever you update to 7.4.

            regards, tom lane


pgsql-performance by date:

Previous
From: "Alfranio Junior"
Date:
Subject: Re: PERFORMANCE and SIZE
Next
From: Josh Berkus
Date:
Subject: Re: PERFORMANCE and SIZE