Re: index scan of whole table, can't see why - Mailing list pgsql-performance

From Dan Langille
Subject Re: index scan of whole table, can't see why
Date
Msg-id 41F00CA8.8976.F79FE91@localhost
Whole thread Raw
In response to Re: index scan of whole table, can't see why  (Russell Smith <mr-russ@pws.com.au>)
List pgsql-performance
On 21 Jan 2005 at 8:38, Russell Smith wrote:

> On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
> > On 20 Jan 2005 at 7:26, Stephan Szabo wrote:
>
> [snip]
> > > Honestly I expected it to be slower (which it was), but I figured
> > > it's worth seeing what alternate plans it'll generate
> > > (specifically to see how it cost a nested loop on that join to
> > > compare to the fast plan). Unfortunately, it generated a merge
> > > join, so I think it might require both enable_hashjoin=false and
> > > enable_mergejoin=false to get it which is likely to be even slower
> > > in practice but still may be useful to see.
> >
> > Setting both to false gives a dramatic performance boost.  See
> > http://rafb.net/paste/results/b70KAi42.html
> >
>          ->  Materialize  (cost=15288.70..15316.36 rows=2766 width=35)
>          (actual time=0.004..0.596 rows=135 loops=92)
>                ->  Nested Loop  (cost=0.00..15288.70 rows=2766
>                width=35) (actual time=0.060..9.130 rows=135 loops=1)
>
> The Planner here has a quite inaccurate guess at the number of rows
> that will match in the join.  An alternative to turning off join types
> is to up the statistics on the Element columns because that's where
> the join is happening.  Hopefully the planner will get a better idea.
> However it may not be able too.  2766 rows vs 135 is quite likely to
> choose different plans.  As you can see you have had to turn off two
> join types to give something you wanted/expected.

Fair comment.  However, the statistics on ports.element_id,
ports.deprecated, ports.broken, and element.id are both set to 1000.

> > This gives suitable speed, but why does the plan vary so much with
> > such a minor change in the WHERE clause?
> Plan 1 - broken
>        ->  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual
>        time=0.056..16.161 rows=218 loops=1)
>
> Plan 2 - deprecated
>         ->  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35)
>         (actual time=7.638..1158.128 rows=135 loops=1)
>
> The performance difference is when the where is changed, you have a
> totally different set of selection options. The Plan 1 and Plan 2
> shown from your paste earlier, report that you are out by a factor of
> 2 for plan 1.  But for plan 2 its a factor of 20.  The planner is
> likely to make the wrong choice when the stats are out by that factor.
>
> Beware what is a small "typing" change does not mean they queries are
> anything alight.

Agreed.  I just did not expect such a dramatic change which a result
set that is similar.  Actually, they aren't that similar at all.

Thank you.
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


pgsql-performance by date:

Previous
From: "Bruno Almeida do Lago"
Date:
Subject: Re: PostgreSQL clustering VS MySQL clustering
Next
From: Josh Berkus
Date:
Subject: Re: PostgreSQL clustering VS MySQL clustering