Re: Optimizer difference using function index between 7.3 and 7.4 - Mailing list pgsql-performance

From Tom Lane
Subject Re: Optimizer difference using function index between 7.3 and 7.4
Date
Msg-id 24252.1077123322@sss.pgh.pa.us
Whole thread Raw
In response to Optimizer difference using function index between 7.3 and 7.4  (Jeff Boes <jboes@nexcerpt.com>)
List pgsql-performance
Jeff Boes <jboes@nexcerpt.com> writes:
> Is this a bug in the optimizer? Or did something
> change about the way functional indexes are used?

In 7.3, the only possible plan for these queries was a nestloop or
nestloop with inner indexscan, because the planner could not generate
merge or hash joins on join conditions more complex than "var1 = var2".
You were fortunate that a nestloop was fast enough for your situation.

In 7.4 the planner can (as you see) generate both merge and hash options
for this query.  What it's not very good at yet is picking the best
option to use, because it doesn't have any statistics about the
distribution of functional indexes, and so it's pretty much guessing
about selectivity.

As of just a couple days ago, there is code in CVS tip that keeps and
uses stats about the values of functional index columns.  It seems
likely that this would help out tremendously in terms of estimating
the costs well for your problem.  Don't suppose you'd like to try
setting up a test system with your data and trying it ...

BTW, as best I can tell, the amazing speed for the mergejoin is a bit of
a fluke.

>  Merge Join  (cost=12.64..219974.16 rows=1705551 width=4) (actual
> time=17.928..17.928 rows=0 loops=1)
>    Merge Cond: (fn_urlrev("outer".path_base) = "inner"."?column2?")
>    ->  Index Scan using ix_links_3 on links l  (cost=0.00..173058.87
> rows=1705550 width=78) (actual time=0.229..0.285 rows=7 loops=1)
>    ->  Sort  (cost=12.64..13.14 rows=200 width=74) (actual
> time=9.652..9.871 rows=200 loops=1)
>          Sort Key: fn_urlrev(t.rev_path_base)
>          ->  Seq Scan on clm_tmp_links t  (cost=0.00..5.00 rows=200
> width=74) (actual time=0.166..5.753 rows=200 loops=1)
>  Total runtime: 18.125 ms

Notice how the indexscan on links is reporting that it only returned
7 rows.  Ordinarily you'd expect that it'd scan the whole table (and
that's what the cost estimate is expecting).  I think what must have
happened is that the scan stopped only a little way into the table,
because the sequence of values from the temp table ended with a value
that was close to the start of the range of values in the main table.
Mergejoin stops fetching as soon as it exhausts either input table.
This was good luck for you in this case but would likely not hold up
with another set of temp values.

            regards, tom lane

pgsql-performance by date:

Previous
From: Jeff Boes
Date:
Subject: Optimizer difference using function index between 7.3 and 7.4
Next
From: "Eric Jain"
Date:
Subject: Re: UPDATE with subquery too slow