Re: index scan of whole table, can't see why - Mailing list pgsql-performance

From Russell Smith
Subject Re: index scan of whole table, can't see why
Date
Msg-id 200501210838.19638.mr-russ@pws.com.au
Whole thread Raw
In response to Re: index scan of whole table, can't see why  ("Dan Langille" <dan@langille.org>)
Responses Re: index scan of whole table, can't see why
List pgsql-performance
On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
> On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

[snip]
> > Honestly I expected it to be slower (which it was), but I figured it's
> > worth seeing what alternate plans it'll generate (specifically to see how
> > it cost a nested loop on that join to compare to the fast plan).
> > Unfortunately, it generated a merge join, so I think it might require both
> > enable_hashjoin=false and enable_mergejoin=false to get it which is likely
> > to be even slower in practice but still may be useful to see.
>
> Setting both to false gives a dramatic performance boost.  See
> http://rafb.net/paste/results/b70KAi42.html
>
         ->  Materialize  (cost=15288.70..15316.36 rows=2766 width=35) (actual time=0.004..0.596 rows=135 loops=92)
               ->  Nested Loop  (cost=0.00..15288.70 rows=2766 width=35) (actual time=0.060..9.130 rows=135 loops=1)

The Planner here has a quite inaccurate guess at the number of rows that will match in the join.  An alternative to
turning off join types is to up the statistics on the Element columns because that's where the join is happening.
Hopefullythe planner will 
get a better idea.  However it may not be able too.  2766 rows vs 135 is quite likely to choose different plans.  As
youcan 
see you have had to turn off two join types to give something you wanted/expected.

> This gives suitable speed, but why does the plan vary so much with
> such a minor change in the WHERE clause?
Plan 1 - broken
       ->  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual time=0.056..16.161 rows=218 loops=1)

Plan 2 - deprecated
        ->  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35) (actual time=7.638..1158.128 rows=135 loops=1)

The performance difference is when the where is changed, you have a totally different set of selection options.
The Plan 1 and Plan 2 shown from your paste earlier, report that you are out by a factor of 2 for plan 1.  But for plan
2
its a factor of 20.  The planner is likely to make the wrong choice when the stats are out by that factor.

Beware what is a small "typing" change does not mean they queries are anything alight.

Regards

Russell Smith.

pgsql-performance by date:

Previous
From: Alex Turner
Date:
Subject: Re: [SQL] OFFSET impact on Performance???
Next
From: "Matt Casters"
Date:
Subject: Re: