Thread: index scan of whole table, can't see why

index scan of whole table, can't see why

From
"Dan Langille"
Date:
Hi folks,

Running on 7.4.2, recently vacuum analysed the three tables in
question.

The query plan in question changes dramatically when a WHERE clause
changes from ports.broken to ports.deprecated.  I don't see why.
Well, I do see why: a sequential scan of a 130,000 rows.  The query
goes from 13ms to 1100ms because the of this.  The full plans are at
http://rafb.net/paste/results/v8ccvQ54.html

I have tried some tuning by:

  set effective_cache_size to 4000, was 1000
  set random_page_cost to 1, was 4

The resulting plan changes, but no speed improvment, are at
http://rafb.net/paste/results/rV8khJ18.html

Any suggestions please?

--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


Re: index scan of whole table, can't see why

From
andrew@pillette.com
Date:
This is a multi-part message in MIME format.

--bound1106197232
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Let's see if I have been paying enough attention to the SQL gurus. The planner is making a different estimate of how
manydeprecated<>'' versus how many broken <> ''. I would try SET STATISTICS to a larger number on the ports table, and
re-analyze.

--bound1106197232--

Re: index scan of whole table, can't see why

From
Ragnar Hafstað
Date:
On Wed, 2005-01-19 at 21:00 -0800, andrew@pillette.com wrote:
> Let's see if I have been paying enough attention to the SQL gurus.
> The planner is making a different estimate of how many deprecated<>'' versus how many broken <> ''.
> I would try SET STATISTICS to a larger number on the ports table, and re-analyze.

that should not help, as the estimate is accurate, according to the
explain analyze.

gnari



Re: index scan of whole table, can't see why

From
Ragnar Hafstað
Date:
On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote:
> Hi folks,
>
> Running on 7.4.2, recently vacuum analysed the three tables in
> question.
>
> The query plan in question changes dramatically when a WHERE clause
> changes from ports.broken to ports.deprecated.  I don't see why.
> Well, I do see why: a sequential scan of a 130,000 rows.  The query
> goes from 13ms to 1100ms because the of this.  The full plans are at
> http://rafb.net/paste/results/v8ccvQ54.html
>
> I have tried some tuning by:
>
>   set effective_cache_size to 4000, was 1000
>   set random_page_cost to 1, was 4
>
> The resulting plan changes, but no speed improvment, are at
> http://rafb.net/paste/results/rV8khJ18.html
>

this just confirms that an indexscan is not always better than a
tablescan. by setting random_page_cost to 1, you deceiving the
planner into thinking that the indexscan is almost as effective
as a tablescan.

> Any suggestions please?

did you try to increase sort_mem ?

gnari



Re: index scan of whole table, can't see why

From
"Dan Langille"
Date:
On 20 Jan 2005 at 9:34, Ragnar Hafstað wrote:

> On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote:
> > Hi folks,
> >
> > Running on 7.4.2, recently vacuum analysed the three tables in
> > question.
> >
> > The query plan in question changes dramatically when a WHERE clause
> > changes from ports.broken to ports.deprecated.  I don't see why.
> > Well, I do see why: a sequential scan of a 130,000 rows.  The query
> > goes from 13ms to 1100ms because the of this.  The full plans are at
> > http://rafb.net/paste/results/v8ccvQ54.html
> >
> > I have tried some tuning by:
> >
> >   set effective_cache_size to 4000, was 1000
> >   set random_page_cost to 1, was 4
> >
> > The resulting plan changes, but no speed improvment, are at
> > http://rafb.net/paste/results/rV8khJ18.html
> >
>
> this just confirms that an indexscan is not always better than a
> tablescan. by setting random_page_cost to 1, you deceiving the
> planner into thinking that the indexscan is almost as effective
> as a tablescan.
>
> > Any suggestions please?
>
> did you try to increase sort_mem ?

I tried sort_mem = 4096 and then 16384. This did not make a
difference.  See http://rafb.net/paste/results/AVDqEm55.html

Thank you.
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


Re: index scan of whole table, can't see why

From
Stephan Szabo
Date:
On Wed, 19 Jan 2005, Dan Langille wrote:

> Hi folks,
>
> Running on 7.4.2, recently vacuum analysed the three tables in
> question.
>
> The query plan in question changes dramatically when a WHERE clause
> changes from ports.broken to ports.deprecated.  I don't see why.
> Well, I do see why: a sequential scan of a 130,000 rows.  The query
> goes from 13ms to 1100ms because the of this.  The full plans are at
> http://rafb.net/paste/results/v8ccvQ54.html
>
> I have tried some tuning by:
>
>   set effective_cache_size to 4000, was 1000
>   set random_page_cost to 1, was 4
>
> The resulting plan changes, but no speed improvment, are at
> http://rafb.net/paste/results/rV8khJ18.html
>
> Any suggestions please?

As a question, what does it do if enable_hashjoin is false? I'm wondering
if it'll pick a nested loop for that step for the element/ports join and
what it estimates the cost to be.


Re: index scan of whole table, can't see why

From
"Dan Langille"
Date:
On 20 Jan 2005 at 6:14, Stephan Szabo wrote:

> On Wed, 19 Jan 2005, Dan Langille wrote:
>
> > Hi folks,
> >
> > Running on 7.4.2, recently vacuum analysed the three tables in
> > question.
> >
> > The query plan in question changes dramatically when a WHERE clause
> > changes from ports.broken to ports.deprecated.  I don't see why.
> > Well, I do see why: a sequential scan of a 130,000 rows.  The query
> > goes from 13ms to 1100ms because the of this.  The full plans are at
> > http://rafb.net/paste/results/v8ccvQ54.html
> >
> > I have tried some tuning by:
> >
> >   set effective_cache_size to 4000, was 1000
> >   set random_page_cost to 1, was 4
> >
> > The resulting plan changes, but no speed improvment, are at
> > http://rafb.net/paste/results/rV8khJ18.html
> >
> > Any suggestions please?
>
> As a question, what does it do if enable_hashjoin is false? I'm wondering
> if it'll pick a nested loop for that step for the element/ports join and
> what it estimates the cost to be.

With enable_hashjoin = false, no speed improvement.  Execution plan
at http://rafb.net/paste/results/qtSFVM72.html

thanks
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


Re: index scan of whole table, can't see why

From
Stephan Szabo
Date:
On Thu, 20 Jan 2005, Dan Langille wrote:

> On 20 Jan 2005 at 6:14, Stephan Szabo wrote:
>
> > On Wed, 19 Jan 2005, Dan Langille wrote:
> >
> > > Hi folks,
> > >
> > > Running on 7.4.2, recently vacuum analysed the three tables in
> > > question.
> > >
> > > The query plan in question changes dramatically when a WHERE clause
> > > changes from ports.broken to ports.deprecated.  I don't see why.
> > > Well, I do see why: a sequential scan of a 130,000 rows.  The query
> > > goes from 13ms to 1100ms because the of this.  The full plans are at
> > > http://rafb.net/paste/results/v8ccvQ54.html
> > >
> > > I have tried some tuning by:
> > >
> > >   set effective_cache_size to 4000, was 1000
> > >   set random_page_cost to 1, was 4
> > >
> > > The resulting plan changes, but no speed improvment, are at
> > > http://rafb.net/paste/results/rV8khJ18.html
> > >
> > > Any suggestions please?
> >
> > As a question, what does it do if enable_hashjoin is false? I'm wondering
> > if it'll pick a nested loop for that step for the element/ports join and
> > what it estimates the cost to be.
>
> With enable_hashjoin = false, no speed improvement.  Execution plan
> at http://rafb.net/paste/results/qtSFVM72.html

Honestly I expected it to be slower (which it was), but I figured it's
worth seeing what alternate plans it'll generate (specifically to see how
it cost a nested loop on that join to compare to the fast plan).
Unfortunately, it generated a merge join, so I think it might require both
enable_hashjoin=false and enable_mergejoin=false to get it which is likely
to be even slower in practice but still may be useful to see.


Re: index scan of whole table, can't see why

From
"Dan Langille"
Date:
On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

> On Thu, 20 Jan 2005, Dan Langille wrote:
>
> > On 20 Jan 2005 at 6:14, Stephan Szabo wrote:
> >
> > > On Wed, 19 Jan 2005, Dan Langille wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Running on 7.4.2, recently vacuum analysed the three tables in
> > > > question.
> > > >
> > > > The query plan in question changes dramatically when a WHERE clause
> > > > changes from ports.broken to ports.deprecated.  I don't see why.
> > > > Well, I do see why: a sequential scan of a 130,000 rows.  The query
> > > > goes from 13ms to 1100ms because the of this.  The full plans are at
> > > > http://rafb.net/paste/results/v8ccvQ54.html
> > > >
> > > > I have tried some tuning by:
> > > >
> > > >   set effective_cache_size to 4000, was 1000
> > > >   set random_page_cost to 1, was 4
> > > >
> > > > The resulting plan changes, but no speed improvment, are at
> > > > http://rafb.net/paste/results/rV8khJ18.html
> > > >
> > > > Any suggestions please?
> > >
> > > As a question, what does it do if enable_hashjoin is false? I'm wondering
> > > if it'll pick a nested loop for that step for the element/ports join and
> > > what it estimates the cost to be.
> >
> > With enable_hashjoin = false, no speed improvement.  Execution plan
> > at http://rafb.net/paste/results/qtSFVM72.html
>
> Honestly I expected it to be slower (which it was), but I figured it's
> worth seeing what alternate plans it'll generate (specifically to see how
> it cost a nested loop on that join to compare to the fast plan).
> Unfortunately, it generated a merge join, so I think it might require both
> enable_hashjoin=false and enable_mergejoin=false to get it which is likely
> to be even slower in practice but still may be useful to see.

Setting both to false gives a dramatic performance boost.  See
http://rafb.net/paste/results/b70KAi42.html

This gives suitable speed, but why does the plan vary so much with
such a minor change in the WHERE clause?
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


Re: index scan of whole table, can't see why

From
Russell Smith
Date:
On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
> On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

[snip]
> > Honestly I expected it to be slower (which it was), but I figured it's
> > worth seeing what alternate plans it'll generate (specifically to see how
> > it cost a nested loop on that join to compare to the fast plan).
> > Unfortunately, it generated a merge join, so I think it might require both
> > enable_hashjoin=false and enable_mergejoin=false to get it which is likely
> > to be even slower in practice but still may be useful to see.
>
> Setting both to false gives a dramatic performance boost.  See
> http://rafb.net/paste/results/b70KAi42.html
>
         ->  Materialize  (cost=15288.70..15316.36 rows=2766 width=35) (actual time=0.004..0.596 rows=135 loops=92)
               ->  Nested Loop  (cost=0.00..15288.70 rows=2766 width=35) (actual time=0.060..9.130 rows=135 loops=1)

The Planner here has a quite inaccurate guess at the number of rows that will match in the join.  An alternative to
turning off join types is to up the statistics on the Element columns because that's where the join is happening.
Hopefullythe planner will 
get a better idea.  However it may not be able too.  2766 rows vs 135 is quite likely to choose different plans.  As
youcan 
see you have had to turn off two join types to give something you wanted/expected.

> This gives suitable speed, but why does the plan vary so much with
> such a minor change in the WHERE clause?
Plan 1 - broken
       ->  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual time=0.056..16.161 rows=218 loops=1)

Plan 2 - deprecated
        ->  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35) (actual time=7.638..1158.128 rows=135 loops=1)

The performance difference is when the where is changed, you have a totally different set of selection options.
The Plan 1 and Plan 2 shown from your paste earlier, report that you are out by a factor of 2 for plan 1.  But for plan
2
its a factor of 20.  The planner is likely to make the wrong choice when the stats are out by that factor.

Beware what is a small "typing" change does not mean they queries are anything alight.

Regards

Russell Smith.

Re: index scan of whole table, can't see why

From
"Dan Langille"
Date:
On 21 Jan 2005 at 8:38, Russell Smith wrote:

> On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
> > On 20 Jan 2005 at 7:26, Stephan Szabo wrote:
>
> [snip]
> > > Honestly I expected it to be slower (which it was), but I figured
> > > it's worth seeing what alternate plans it'll generate
> > > (specifically to see how it cost a nested loop on that join to
> > > compare to the fast plan). Unfortunately, it generated a merge
> > > join, so I think it might require both enable_hashjoin=false and
> > > enable_mergejoin=false to get it which is likely to be even slower
> > > in practice but still may be useful to see.
> >
> > Setting both to false gives a dramatic performance boost.  See
> > http://rafb.net/paste/results/b70KAi42.html
> >
>          ->  Materialize  (cost=15288.70..15316.36 rows=2766 width=35)
>          (actual time=0.004..0.596 rows=135 loops=92)
>                ->  Nested Loop  (cost=0.00..15288.70 rows=2766
>                width=35) (actual time=0.060..9.130 rows=135 loops=1)
>
> The Planner here has a quite inaccurate guess at the number of rows
> that will match in the join.  An alternative to turning off join types
> is to up the statistics on the Element columns because that's where
> the join is happening.  Hopefully the planner will get a better idea.
> However it may not be able too.  2766 rows vs 135 is quite likely to
> choose different plans.  As you can see you have had to turn off two
> join types to give something you wanted/expected.

Fair comment.  However, the statistics on ports.element_id,
ports.deprecated, ports.broken, and element.id are both set to 1000.

> > This gives suitable speed, but why does the plan vary so much with
> > such a minor change in the WHERE clause?
> Plan 1 - broken
>        ->  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual
>        time=0.056..16.161 rows=218 loops=1)
>
> Plan 2 - deprecated
>         ->  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35)
>         (actual time=7.638..1158.128 rows=135 loops=1)
>
> The performance difference is when the where is changed, you have a
> totally different set of selection options. The Plan 1 and Plan 2
> shown from your paste earlier, report that you are out by a factor of
> 2 for plan 1.  But for plan 2 its a factor of 20.  The planner is
> likely to make the wrong choice when the stats are out by that factor.
>
> Beware what is a small "typing" change does not mean they queries are
> anything alight.

Agreed.  I just did not expect such a dramatic change which a result
set that is similar.  Actually, they aren't that similar at all.

Thank you.
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/