Thread: index scan of whole table, can't see why
Hi folks, Running on 7.4.2, recently vacuum analysed the three tables in question. The query plan in question changes dramatically when a WHERE clause changes from ports.broken to ports.deprecated. I don't see why. Well, I do see why: a sequential scan of a 130,000 rows. The query goes from 13ms to 1100ms because the of this. The full plans are at http://rafb.net/paste/results/v8ccvQ54.html I have tried some tuning by: set effective_cache_size to 4000, was 1000 set random_page_cost to 1, was 4 The resulting plan changes, but no speed improvment, are at http://rafb.net/paste/results/rV8khJ18.html Any suggestions please? -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
This is a multi-part message in MIME format. --bound1106197232 Content-Type: text/plain Content-Transfer-Encoding: 7bit Let's see if I have been paying enough attention to the SQL gurus. The planner is making a different estimate of how manydeprecated<>'' versus how many broken <> ''. I would try SET STATISTICS to a larger number on the ports table, and re-analyze. --bound1106197232--
On Wed, 2005-01-19 at 21:00 -0800, andrew@pillette.com wrote: > Let's see if I have been paying enough attention to the SQL gurus. > The planner is making a different estimate of how many deprecated<>'' versus how many broken <> ''. > I would try SET STATISTICS to a larger number on the ports table, and re-analyze. that should not help, as the estimate is accurate, according to the explain analyze. gnari
On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote: > Hi folks, > > Running on 7.4.2, recently vacuum analysed the three tables in > question. > > The query plan in question changes dramatically when a WHERE clause > changes from ports.broken to ports.deprecated. I don't see why. > Well, I do see why: a sequential scan of a 130,000 rows. The query > goes from 13ms to 1100ms because the of this. The full plans are at > http://rafb.net/paste/results/v8ccvQ54.html > > I have tried some tuning by: > > set effective_cache_size to 4000, was 1000 > set random_page_cost to 1, was 4 > > The resulting plan changes, but no speed improvment, are at > http://rafb.net/paste/results/rV8khJ18.html > this just confirms that an indexscan is not always better than a tablescan. by setting random_page_cost to 1, you deceiving the planner into thinking that the indexscan is almost as effective as a tablescan. > Any suggestions please? did you try to increase sort_mem ? gnari
On 20 Jan 2005 at 9:34, Ragnar Hafstað wrote: > On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote: > > Hi folks, > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > question. > > > > The query plan in question changes dramatically when a WHERE clause > > changes from ports.broken to ports.deprecated. I don't see why. > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > goes from 13ms to 1100ms because the of this. The full plans are at > > http://rafb.net/paste/results/v8ccvQ54.html > > > > I have tried some tuning by: > > > > set effective_cache_size to 4000, was 1000 > > set random_page_cost to 1, was 4 > > > > The resulting plan changes, but no speed improvment, are at > > http://rafb.net/paste/results/rV8khJ18.html > > > > this just confirms that an indexscan is not always better than a > tablescan. by setting random_page_cost to 1, you deceiving the > planner into thinking that the indexscan is almost as effective > as a tablescan. > > > Any suggestions please? > > did you try to increase sort_mem ? I tried sort_mem = 4096 and then 16384. This did not make a difference. See http://rafb.net/paste/results/AVDqEm55.html Thank you. -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
On Wed, 19 Jan 2005, Dan Langille wrote: > Hi folks, > > Running on 7.4.2, recently vacuum analysed the three tables in > question. > > The query plan in question changes dramatically when a WHERE clause > changes from ports.broken to ports.deprecated. I don't see why. > Well, I do see why: a sequential scan of a 130,000 rows. The query > goes from 13ms to 1100ms because the of this. The full plans are at > http://rafb.net/paste/results/v8ccvQ54.html > > I have tried some tuning by: > > set effective_cache_size to 4000, was 1000 > set random_page_cost to 1, was 4 > > The resulting plan changes, but no speed improvment, are at > http://rafb.net/paste/results/rV8khJ18.html > > Any suggestions please? As a question, what does it do if enable_hashjoin is false? I'm wondering if it'll pick a nested loop for that step for the element/ports join and what it estimates the cost to be.
On 20 Jan 2005 at 6:14, Stephan Szabo wrote: > On Wed, 19 Jan 2005, Dan Langille wrote: > > > Hi folks, > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > question. > > > > The query plan in question changes dramatically when a WHERE clause > > changes from ports.broken to ports.deprecated. I don't see why. > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > goes from 13ms to 1100ms because the of this. The full plans are at > > http://rafb.net/paste/results/v8ccvQ54.html > > > > I have tried some tuning by: > > > > set effective_cache_size to 4000, was 1000 > > set random_page_cost to 1, was 4 > > > > The resulting plan changes, but no speed improvment, are at > > http://rafb.net/paste/results/rV8khJ18.html > > > > Any suggestions please? > > As a question, what does it do if enable_hashjoin is false? I'm wondering > if it'll pick a nested loop for that step for the element/ports join and > what it estimates the cost to be. With enable_hashjoin = false, no speed improvement. Execution plan at http://rafb.net/paste/results/qtSFVM72.html thanks -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
On Thu, 20 Jan 2005, Dan Langille wrote: > On 20 Jan 2005 at 6:14, Stephan Szabo wrote: > > > On Wed, 19 Jan 2005, Dan Langille wrote: > > > > > Hi folks, > > > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > > question. > > > > > > The query plan in question changes dramatically when a WHERE clause > > > changes from ports.broken to ports.deprecated. I don't see why. > > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > > goes from 13ms to 1100ms because the of this. The full plans are at > > > http://rafb.net/paste/results/v8ccvQ54.html > > > > > > I have tried some tuning by: > > > > > > set effective_cache_size to 4000, was 1000 > > > set random_page_cost to 1, was 4 > > > > > > The resulting plan changes, but no speed improvment, are at > > > http://rafb.net/paste/results/rV8khJ18.html > > > > > > Any suggestions please? > > > > As a question, what does it do if enable_hashjoin is false? I'm wondering > > if it'll pick a nested loop for that step for the element/ports join and > > what it estimates the cost to be. > > With enable_hashjoin = false, no speed improvement. Execution plan > at http://rafb.net/paste/results/qtSFVM72.html Honestly I expected it to be slower (which it was), but I figured it's worth seeing what alternate plans it'll generate (specifically to see how it cost a nested loop on that join to compare to the fast plan). Unfortunately, it generated a merge join, so I think it might require both enable_hashjoin=false and enable_mergejoin=false to get it which is likely to be even slower in practice but still may be useful to see.
On 20 Jan 2005 at 7:26, Stephan Szabo wrote: > On Thu, 20 Jan 2005, Dan Langille wrote: > > > On 20 Jan 2005 at 6:14, Stephan Szabo wrote: > > > > > On Wed, 19 Jan 2005, Dan Langille wrote: > > > > > > > Hi folks, > > > > > > > > Running on 7.4.2, recently vacuum analysed the three tables in > > > > question. > > > > > > > > The query plan in question changes dramatically when a WHERE clause > > > > changes from ports.broken to ports.deprecated. I don't see why. > > > > Well, I do see why: a sequential scan of a 130,000 rows. The query > > > > goes from 13ms to 1100ms because the of this. The full plans are at > > > > http://rafb.net/paste/results/v8ccvQ54.html > > > > > > > > I have tried some tuning by: > > > > > > > > set effective_cache_size to 4000, was 1000 > > > > set random_page_cost to 1, was 4 > > > > > > > > The resulting plan changes, but no speed improvment, are at > > > > http://rafb.net/paste/results/rV8khJ18.html > > > > > > > > Any suggestions please? > > > > > > As a question, what does it do if enable_hashjoin is false? I'm wondering > > > if it'll pick a nested loop for that step for the element/ports join and > > > what it estimates the cost to be. > > > > With enable_hashjoin = false, no speed improvement. Execution plan > > at http://rafb.net/paste/results/qtSFVM72.html > > Honestly I expected it to be slower (which it was), but I figured it's > worth seeing what alternate plans it'll generate (specifically to see how > it cost a nested loop on that join to compare to the fast plan). > Unfortunately, it generated a merge join, so I think it might require both > enable_hashjoin=false and enable_mergejoin=false to get it which is likely > to be even slower in practice but still may be useful to see. Setting both to false gives a dramatic performance boost. See http://rafb.net/paste/results/b70KAi42.html This gives suitable speed, but why does the plan vary so much with such a minor change in the WHERE clause? -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote: > On 20 Jan 2005 at 7:26, Stephan Szabo wrote: [snip] > > Honestly I expected it to be slower (which it was), but I figured it's > > worth seeing what alternate plans it'll generate (specifically to see how > > it cost a nested loop on that join to compare to the fast plan). > > Unfortunately, it generated a merge join, so I think it might require both > > enable_hashjoin=false and enable_mergejoin=false to get it which is likely > > to be even slower in practice but still may be useful to see. > > Setting both to false gives a dramatic performance boost. See > http://rafb.net/paste/results/b70KAi42.html > -> Materialize (cost=15288.70..15316.36 rows=2766 width=35) (actual time=0.004..0.596 rows=135 loops=92) -> Nested Loop (cost=0.00..15288.70 rows=2766 width=35) (actual time=0.060..9.130 rows=135 loops=1) The Planner here has a quite inaccurate guess at the number of rows that will match in the join. An alternative to turning off join types is to up the statistics on the Element columns because that's where the join is happening. Hopefullythe planner will get a better idea. However it may not be able too. 2766 rows vs 135 is quite likely to choose different plans. As youcan see you have had to turn off two join types to give something you wanted/expected. > This gives suitable speed, but why does the plan vary so much with > such a minor change in the WHERE clause? Plan 1 - broken -> Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual time=0.056..16.161 rows=218 loops=1) Plan 2 - deprecated -> Hash Join (cost=3676.78..10144.06 rows=2767 width=35) (actual time=7.638..1158.128 rows=135 loops=1) The performance difference is when the where is changed, you have a totally different set of selection options. The Plan 1 and Plan 2 shown from your paste earlier, report that you are out by a factor of 2 for plan 1. But for plan 2 its a factor of 20. The planner is likely to make the wrong choice when the stats are out by that factor. Beware what is a small "typing" change does not mean they queries are anything alight. Regards Russell Smith.
On 21 Jan 2005 at 8:38, Russell Smith wrote: > On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote: > > On 20 Jan 2005 at 7:26, Stephan Szabo wrote: > > [snip] > > > Honestly I expected it to be slower (which it was), but I figured > > > it's worth seeing what alternate plans it'll generate > > > (specifically to see how it cost a nested loop on that join to > > > compare to the fast plan). Unfortunately, it generated a merge > > > join, so I think it might require both enable_hashjoin=false and > > > enable_mergejoin=false to get it which is likely to be even slower > > > in practice but still may be useful to see. > > > > Setting both to false gives a dramatic performance boost. See > > http://rafb.net/paste/results/b70KAi42.html > > > -> Materialize (cost=15288.70..15316.36 rows=2766 width=35) > (actual time=0.004..0.596 rows=135 loops=92) > -> Nested Loop (cost=0.00..15288.70 rows=2766 > width=35) (actual time=0.060..9.130 rows=135 loops=1) > > The Planner here has a quite inaccurate guess at the number of rows > that will match in the join. An alternative to turning off join types > is to up the statistics on the Element columns because that's where > the join is happening. Hopefully the planner will get a better idea. > However it may not be able too. 2766 rows vs 135 is quite likely to > choose different plans. As you can see you have had to turn off two > join types to give something you wanted/expected. Fair comment. However, the statistics on ports.element_id, ports.deprecated, ports.broken, and element.id are both set to 1000. > > This gives suitable speed, but why does the plan vary so much with > > such a minor change in the WHERE clause? > Plan 1 - broken > -> Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual > time=0.056..16.161 rows=218 loops=1) > > Plan 2 - deprecated > -> Hash Join (cost=3676.78..10144.06 rows=2767 width=35) > (actual time=7.638..1158.128 rows=135 loops=1) > > The performance difference is when the where is changed, you have a > totally different set of selection options. The Plan 1 and Plan 2 > shown from your paste earlier, report that you are out by a factor of > 2 for plan 1. But for plan 2 its a factor of 20. The planner is > likely to make the wrong choice when the stats are out by that factor. > > Beware what is a small "typing" change does not mean they queries are > anything alight. Agreed. I just did not expect such a dramatic change which a result set that is similar. Actually, they aren't that similar at all. Thank you. -- Dan Langille : http://www.langille.org/ BSDCan - The Technical BSD Conference - http://www.bsdcan.org/