Thread: Hash join in 8.3
I´m running some compatibility and performance tests, between two servers with 8.1 and 8.3 as follows : [1] 8.1: postgres 8.1.9 / 1GB Ram / P4 3Ghz / Debian 2.6.18-4-686 [2] 8.3: postgres 8.3 b4 / 1GB Ram / P4 3Ghz / Debian 2.6.18-4-686 [2] is faster for every single operation, but I found something with the planner that seems odd. Consider this structure: create table test ( i bigint unique not null, t text ); populated with 4 million rows with generate_series(1,4000000) create table jtest ( i bigint not null, constraint jtestfk foreign key (i) references test (i) ); populated with 6 million rows And the query: # select j.i, t.t from jtest j inner join test t on t.i = j.i where (j.i*1.5) between 3000000 and 4000000; Planner for [1]: Nested Loop (cost=0.00..270192.02 rows=20000 width=41) (actual time=4192.514..32781.498 rows=1333334 loops=1) -> Seq Scan on jtest j (cost=0.00..179412.02 rows=30000 width=8) (actual time=4147.813..19195.877 rows=1333334 loops=1) Filter: ((((i)::numeric * 1.5) >= 3000000::numeric) AND (((i)::numeric * 1.5) <= 4000000::numeric)) -> Index Scan using test_i_key on test t (cost=0.00..3.01 rows=1 width=41) (actual time=0.007..0.008 rows=1 loops=1333334) Index Cond: (t.i = "outer".i) Total runtime: 33372.300 ms Planner for [2]: Hash Join (cost=176924.02..297518.03 rows=20000 width=38) (actual time=125715.079..239893.461 rows=1333334 loops=1) Hash Cond: (t.i = j.i) -> Seq Scan on test t (cost=0.00..75394.00 rows=4000000 width=38) (actual time=0.051..4344.157 rows=4000000 loops=1) -> Hash (cost=176549.02..176549.02 rows=30000 width=8) (actual time=11711.708..11711.708 rows=1333334 loops=1) -> Seq Scan on jtest j (cost=0.00..176549.02 rows=30000 width=8) (actual time=2228.052..10812.444 rows=1333334 loops=1) Filter: ((((i)::numeric * 1.5) >= 3000000::numeric) AND (((i)::numeric * 1.5) <= 4000000::numeric)) Total runtime: 240461.273 ms Besides the (expected) weak guess on rows for both servers on seq scan on jtest, there is something nasty with [2] that prevents the planner to use the index. For some reason, [1] uses the index first, and then seq scan to filter. [2] seq scans filter first, and hash aggregate instead of using the index. Now, turning off hashing: # set enable_hashjoin=off; # set enable_hashagg=off; Again for [2]: Merge Join (cost=178781.75..328370.60 rows=20000 width=38) (actual time=15703.086..18799.493 rows=1333334 loops=1) Merge Cond: (t.i = j.i) -> Index Scan using test_i_key on test t (cost=0.00..139273.96 rows=4000018 width=38) (actual time=0.125..2895.835 rows=2666667 loops=1) -> Sort (cost=178781.75..178856.75 rows=30000 width=8) (actual time=12423.001..13007.569 rows=1333334 loops=1) Sort Key: j.i Sort Method: quicksort Memory: 84852kB -> Seq Scan on jtest j (cost=0.00..176550.85 rows=30000 width=8) (actual time=2076.495..10417.157 rows=1333334 loops=1) Filter: ((((i)::numeric * 1.5) >= 3000000::numeric) AND (((i)::numeric * 1.5) <= 4000000::numeric)) Total runtime: 19340.734 ms Works fine now. Quicksort and index scan. Some points here: 1. The query is kinda stupid, but its a compatibility test (I´m trying to figure out how many queries must be rewritten for 8.3) 2. Vacuum is up2date! 3. Is there any way to make [2] use the index ? -- []´s, André Volpato Ecom Tecnologia LTDA - Análise e Desenvolvimento andre.volpato@ecomtecnologia.com.br
=?ISO-8859-1?Q?Andr=E9_Volpato?= <andre.volpato@ecomtecnologia.com.br> writes: > Besides the (expected) weak guess on rows for both servers on seq scan > on jtest, there is something nasty with [2] that prevents the planner to > use the index. There isn't anything "preventing" either version from choosing any of the three plans, as you can easily prove for yourself by experimenting with enable_nestloop/enable_mergejoin/enable_hashjoin. The cost estimates seem close enough that random variations in ANALYZE stats would change which one looks cheapest. regards, tom lane
André Volpato <andre.volpato@ecomtecnologia.com.br> writes: > And the query: > > # select j.i, t.t from jtest j inner join test t on t.i = j.i where (j.i*1.5) > between 3000000 and 4000000; > > Planner for [1]: > Nested Loop (cost=0.00..270192.02 rows=20000 width=41) (actual > Planner for [2]: > Hash Join (cost=176924.02..297518.03 rows=20000 width=38) (actual > Now, turning off hashing: > # set enable_hashjoin=off; > # set enable_hashagg=off; > > Again for [2]: > Merge Join (cost=178781.75..328370.60 rows=20000 width=38) (actual I think the answer is that if you have bad statistics you'll get a bad plan and which bad plan is going to be pretty much random. But I'm curious if you turn off mergejoin whether you can get a Nested Loop plan and what cost 8.3 gives it. It looks to me like 8.3 came up with a higher cost for Nested Loop than 8.1.9 (I think 8.1.10 came out with some planner fixes btw) and so it's deciding these other plans are better. And they might have been better for the imaginary scenario that the planner thinks is going on. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Gregory Stark escreveu: > André Volpato <andre.volpato@ecomtecnologia.com.br> writes: > >> And the query: >> >> # select j.i, t.t from jtest j inner join test t on t.i = j.i where (j.i*1.5) >> between 3000000 and 4000000; >> >> Planner for [1]: >> Nested Loop (cost=0.00..270192.02 rows=20000 width=41) (actual >> >> Planner for [2]: >> Hash Join (cost=176924.02..297518.03 rows=20000 width=38) (actual >> >> Now, turning off hashing: >> # set enable_hashjoin=off; >> # set enable_hashagg=off; >> >> Again for [2]: >> Merge Join (cost=178781.75..328370.60 rows=20000 width=38) (actual >> > > I think the answer is that if you have bad statistics you'll get a bad plan > and which bad plan is going to be pretty much random. > I believe the statistics are ok, I´ve runned vacuum analyze before all those tries. > But I'm curious if you turn off mergejoin whether you can get a Nested Loop > plan and what cost 8.3 gives it. It looks to me like 8.3 came up with a higher > cost for Nested Loop than 8.1.9 (I think 8.1.10 came out with some planner > fixes btw) and so it's deciding these other plans are better. And they might > have been better for the imaginary scenario that the planner thinks is going > on. Not anymore :) Nested Loop (cost=0.00..389461.65 rows=20000 width=38) Total runtime: 22934.656 ms Without hash and merge, the plan is exactly the same for 8.1 and 8.3. No inicial cost for nested loops... it seems that hash < merge < nested in this case. -- []´s, André Volpato Ecom Tecnologia LTDA - Análise e Desenvolvimento andre.volpato@ecomtecnologia.com.br
Gregory Stark <stark@enterprisedb.com> writes: > But I'm curious if you turn off mergejoin whether you can get a Nested Loop > plan and what cost 8.3 gives it. It looks to me like 8.3 came up with a higher > cost for Nested Loop than 8.1.9 (I think 8.1.10 came out with some planner > fixes btw) and so it's deciding these other plans are better. And they might > have been better for the imaginary scenario that the planner thinks is going > on. Actually, now that I think about it, 8.3 should be *more* likely than 8.1 to choose a nestloop-with-inner-indexscan plan. 8.1 didn't have the changes to allow a discount for repeated inner indexscans. I'm wondering if (a) the 8.1 installation being compared to had some planner cost parameter changes that were not copied into the 8.3 installation; or (b) the only reason 8.1 likes the nestloop plan is that it has no statistics on the test tables, whereas 8.3 does have stats because of autovacuum being on by default. regards, tom lane
Tom Lane escreveu: > Gregory Stark <stark@enterprisedb.com> writes: > >> But I'm curious if you turn off mergejoin whether you can get a Nested Loop >> plan and what cost 8.3 gives it. It looks to me like 8.3 came up with a higher >> cost for Nested Loop than 8.1.9 (I think 8.1.10 came out with some planner >> fixes btw) and so it's deciding these other plans are better. And they might >> have been better for the imaginary scenario that the planner thinks is going >> on. >> Let me show this part again, to make things easier to understand =) Nested Loop (cost=0.00..389461.65 rows=20000 width=38) Total runtime: 22934.656 ms Without hash and merge, the plan is exactly the same for 8.1 and 8.3. No inicial cost for nested loops... it seems that hash < merge < nested in this case. > Actually, now that I think about it, 8.3 should be *more* likely than > 8.1 to choose a nestloop-with-inner-indexscan plan. 8.1 didn't have the > changes to allow a discount for repeated inner indexscans. > > I'm wondering if > > (a) the 8.1 installation being compared to had some planner cost > parameter changes that were not copied into the 8.3 installation; or > The parameters are at their default. The only changes made are in shared buffers, work and maintenance mem. > (b) the only reason 8.1 likes the nestloop plan is that it has no > statistics on the test tables, whereas 8.3 does have stats because > of autovacuum being on by default. > I dont know if I understand what test tables you are talking about. To run this tests I disabled autovacuum, and run "vacuum analyze test / jtest" before. -- []´s, André Volpato Ecom Tecnologia LTDA - Análise e Desenvolvimento andre.volpato@ecomtecnologia.com.br
Tom Lane escreveu: <blockquote cite="mid:18544.1197574622@sss.pgh.pa.us" type="cite"><pre wrap="">Gregory Stark <a class="moz-txt-link-rfc2396E"href="mailto:stark@enterprisedb.com"><stark@enterprisedb.com></a> writes: </pre><blockquotetype="cite"><pre wrap="">But I'm curious if you turn off mergejoin whether you can get a Nested Loop plan and what cost 8.3 gives it. It looks to me like 8.3 came up with a higher cost for Nested Loop than 8.1.9 (I think 8.1.10 came out with some planner fixes btw) and so it's deciding these other plans are better. And they might have been better for the imaginary scenario that the planner thinks is going on. </pre></blockquote><pre wrap=""> Actually, now that I think about it, 8.3 should be *more* likely than 8.1 to choose a nestloop-with-inner-indexscan plan. 8.1 didn't have the changes to allow a discount for repeated inner indexscans. I'm wondering if (a) the 8.1 installation being compared to had some planner cost parameter changes that were not copied into the 8.3 installation; or (b) the only reason 8.1 likes the nestloop plan is that it has no statistics on the test tables, whereas 8.3 does have stats because of autovacuum being on by default. regards, tom lane </pre></blockquote><br /> I think I found the answer!<br /><br /> 8.1: likes nested loop even aftervacuumdb on the database.<br /><br /> 8.3: likes hash at first time but:<br /> - after vacuumdb *on the database* (Iwas running on the tables.....), it turns out to:<br /> Merge Join (cost=178779.93..328503.44 rows=30000 width=38) in20005.207 ms<br /> #set enable_mergejoin=off;<br /> Hash Join (cost=156644.00..365204.03 rows=30000 width=38) in 29104.390ms<br /> * a very faster hash here, seqscanning the smaller table before the bigger one. Tricky!<br /><br /> Iwont trust table vacuums anymore...<br /><br /><pre class="moz-signature" cols="72">-- []´s, André Volpato Ecom Tecnologia LTDA - Análise e Desenvolvimento <a class="moz-txt-link-abbreviated" href="mailto:andre.volpato@ecomtecnologia.com.br">andre.volpato@ecomtecnologia.com.br</a></pre>
André Volpato <andre.volpato@ecomtecnologia.com.br> writes: > Gregory Stark escreveu: >> André Volpato <andre.volpato@ecomtecnologia.com.br> writes: >> >> I think the answer is that if you have bad statistics you'll get a bad plan >> and which bad plan is going to be pretty much random. >> > I believe the statistics are ok, I´ve runned vacuum analyze before all those > tries. Sorry, I should have said "bad estimates". That is, because of the j*1.5 BETWEEN 3000000 AND 4000000 clause the optimizer isn't going to be able to come up with a good estimate of how many rows that will match. What plan it picks when it has such a bad estimate is going to be pretty random, dependant on just what plans would be good in a situation entirely unrelated to the reality. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
André Volpato <andre.volpato@ecomtecnologia.com.br> writes: > I think I found the answer!<br> > <br> > 8.1: likes nested loop even after vacuumdb on the database.<br> > <br> > 8.3: likes hash at first time but:<br> > - after vacuumdb *on the database* (I was running on the tables.....), > it turns out to:<br> > Merge Join (cost=178779.93..328503.44 rows=30000 width=38) in > 20005.207 ms<br> > #set enable_mergejoin=off;<br> > Hash Join (cost=156644.00..365204.03 rows=30000 width=38) in > 29104.390 ms<br> > * a very faster hash here, seqscanning the smaller table before the > bigger one. Tricky!<br> > <br> > I wont trust table vacuums anymore...<br> > <br> > <pre class="moz-signature" cols="72">-- HTML-only mail isn't looked upon too favourably here. You keep saying "vacuum" which makes me think maybe you're not actually analyzing your tables at all. "vacuum" doesn't analyze the tables, you have to run "analyze" (or "vacuum analyze") for that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark escreveu: > André Volpato <andre.volpato@ecomtecnologia.com.br> writes: > >> I think I found the answer! >> >> 8.1: likes nested loop even after vacuumdb on the database.<br> >> >> 8.3: likes hash at first time but: >> - after vacuumdb *on the database* (I was running on the tables.....), it turns out to: >> Merge Join (cost=178779.93..328503.44 rows=30000 width=38) in 20005.207 ms >> # set enable_mergejoin=off; >> Hash Join(cost=156644.00..365204.03 rows=30000 width=38) in 29104.390 ms >> * a very faster hash here, seqscanning the smaller table before the bigger one. Tricky! >> >> I wont trust table vacuums anymore... >> >> > > HTML-only mail isn't looked upon too favourably here. > My bad. Tbird for some reason isn´t auto-removing html in sent mail to @postgresql.org. > You keep saying "vacuum" which makes me think maybe you're not actually > analyzing your tables at all. "vacuum" doesn't analyze the tables, you have to > run "analyze" (or "vacuum analyze") for that. I always run vaccumm analyze. The plan only changes in 8.3 after "vacuumdb -v -z database". > Sorry, I should have said "bad estimates". That is, because of the > > j*1.5 BETWEEN 3000000 AND 4000000 That's supposed to be that way :) I think all of this worth for me to have a clue on how the planner goes in bad sql, wich causes bad estimates. Thank you all for your support! -- []´s, André Volpato Ecom Tecnologia LTDA - Análise e Desenvolvimento andre.volpato@ecomtecnologia.com.br