Thread: how to change the probe table in a hash join
Hi,
Below are two query plans for the same SQL, with and without an index. I noticed that the hash join order changed after the index was created, and this is not what I want: the planner now hashes the big table and probes it with rows from the small table. In Oracle it is simple to add a hint choosing which table is hashed and which is probed; in Postgres, however, I don't know how to change this behavior.
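(Core Postgres has no hint syntax; the usual levers are the planner GUCs, set per session. A minimal sketch against the tables in the query below; the settings are standard parameters, but the values are only illustrative:)

```sql
-- Planner choices can only be nudged, not dictated, e.g. per session:
SET enable_hashjoin = off;  -- rule out hash joins entirely (often too blunt)
SET work_mem = '64MB';      -- smaller work_mem makes hashing a big table look costly
EXPLAIN ANALYZE
SELECT DISTINCT cs_id
FROM lookup_weekly n
JOIN term_weekly s ON s.b_id = n.b_id AND s.date = n.date
WHERE term IN ('cat'::text);
RESET ALL;                  -- restore session defaults
```

(Third-party extensions such as pg_hint_plan do provide Oracle-style hint comments, but nothing in stock Postgres does.)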
--plan 1, 10 seconds were spent on the sequential scan of the term_weekly table.
dev=# explain analyze select distinct cs_id from lookup_weekly n inner join term_weekly s on s.b_id=n.b_id and s.date=n.date where term in ('cat'::text);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2100211.06..2100211.11 rows=5 width=4) (actual time=27095.470..27095.487 rows=138 loops=1)
-> Hash Join (cost=954343.95..2100211.04 rows=5 width=4) (actual time=24088.912..27095.206 rows=160 loops=1)
Hash Cond: (((n.b_id)::text = (s.b_id)::text) AND (n.date = s.date))
-> Append (cost=0.00..862153.59 rows=37828460 width=52) (actual time=0.016..10923.091 rows=37828459 loops=1)
-> Seq Scan on lookup_weekly n (cost=0.00..0.00 rows=1 width=524) (actual time=0.001..0.001 rows=0 loops=1)
-> Seq Scan on lookup_weekly_20131130 n_1 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.015..1229.217 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131207 n_2 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=5.225..1177.539 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131214 n_3 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=5.756..1274.135 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131221 n_4 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=4.269..1131.570 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131228 n_5 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=9.383..1110.435 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20140426 n_6 (cost=0.00..91715.42 rows=4042442 width=52) (actual time=8.137..947.724 rows=4042442 loops=1)
-> Seq Scan on lookup_weekly_20140503 n_7 (cost=0.00..93516.49 rows=4118149 width=52) (actual time=7.717..791.339 rows=4118149 loops=1)
-> Seq Scan on lookup_weekly_20140329 n_8 (cost=0.00..88100.78 rows=3874278 width=52) (actual time=0.004..637.297 rows=3874278 loops=1)
-> Hash (cost=954343.47..954343.47 rows=32 width=61) (actual time=10604.327..10604.327 rows=553 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 43kB
-> Append (cost=0.00..954343.47 rows=32 width=61) (actual time=10.009..10602.075 rows=553 loops=1)
-> Seq Scan on term_weekly s (cost=0.00..0.00 rows=1 width=520) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (term = 'cat'::text)
-> Seq Scan on term_weekly_20140503 s_1 (cost=0.00..262030.12 rows=8 width=46) (actual time=10.007..3738.945 rows=166 loops=1)
Filter: (term = 'cat'::text)
Rows Removed by Filter: 8516324
-> Seq Scan on term_weekly_20140510 s_2 (cost=0.00..246131.35 rows=8 width=46) (actual time=52.059..2316.793 rows=152 loops=1)
Filter: (term = 'cat'::text)
Rows Removed by Filter: 8010196
-> Seq Scan on term_weekly_20140517 s_3 (cost=0.00..233644.94 rows=8 width=46) (actual time=26.661..2504.273 rows=135 loops=1)
Filter: (term = 'cat'::text)
Rows Removed by Filter: 7632420
-> Seq Scan on term_weekly_20140524 s_4 (cost=0.00..212537.06 rows=7 width=46) (actual time=49.773..2041.578 rows=100 loops=1)
Filter: (term = 'cat'::text)
Rows Removed by Filter: 6950865
Total runtime: 27095.639 ms
(31 rows)
--plan 2, only 1 second is spent on the index scan of the term_weekly table; however, because it picks the big table to hash, building the hash takes 22 seconds. The advantage gained from the index is completely lost because of this join order.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1429795.17..1429795.22 rows=5 width=4) (actual time=22991.289..22991.307 rows=138 loops=1)
-> Hash Join (cost=1429580.49..1429795.15 rows=5 width=4) (actual time=22963.340..22991.214 rows=160 loops=1)
Hash Cond: (((s.b_id)::text = (n.b_id)::text) AND (s.date = n.date))
-> Append (cost=0.00..142.77 rows=32 width=61) (actual time=0.052..1.125 rows=553 loops=1)
-> Seq Scan on term_weekly s (cost=0.00..0.00 rows=1 width=520) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140503_3 on term_weekly_20140503 s_1 (cost=0.56..36.70 rows=8 width=46) (actual time=0.051..0.353 rows=166 loops=1)
Index Cond: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140510_3 on term_weekly_20140510 s_2 (cost=0.56..36.70 rows=8 width=46) (actual time=0.043..0.293 rows=152 loops=1)
Index Cond: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140517_3 on term_weekly_20140517 s_3 (cost=0.56..36.70 rows=8 width=46) (actual time=0.029..0.244 rows=135 loops=1)
Index Cond: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140524_3 on term_weekly_20140524 s_4 (cost=0.56..32.68 rows=7 width=46) (actual time=0.024..0.192 rows=100 loops=1)
Index Cond: (term = 'cat'::text)
-> Hash (cost=862153.59..862153.59 rows=37828460 width=52) (actual time=22939.457..22939.457 rows=37828459 loops=1)
Buckets: 4194304 Batches: 1 Memory Usage: 3144960kB
-> Append (cost=0.00..862153.59 rows=37828460 width=52) (actual time=0.010..9100.690 rows=37828459 loops=1)
-> Seq Scan on lookup_weekly n (cost=0.00..0.00 rows=1 width=524) (actual time=0.001..0.001 rows=0 loops=1)
-> Seq Scan on lookup_weekly_20131130 n_1 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.008..1099.194 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131207 n_2 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.004..861.678 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131214 n_3 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.004..860.374 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131221 n_4 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.003..852.169 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131228 n_5 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.005..835.201 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20140426 n_6 (cost=0.00..91715.42 rows=4042442 width=52) (actual time=0.005..663.261 rows=4042442 loops=1)
-> Seq Scan on lookup_weekly_20140503 n_7 (cost=0.00..93516.49 rows=4118149 width=52) (actual time=0.006..678.281 rows=4118149 loops=1)
-> Seq Scan on lookup_weekly_20140329 n_8 (cost=0.00..88100.78 rows=3874278 width=52) (actual time=0.003..635.296 rows=3874278 loops=1)
Total runtime: 22995.361 ms
(27 rows)
Thanks,
Suya
On Wed, Sep 10, 2014 at 10:05 PM, Huang, Suya <Suya.Huang@au.experian.com> wrote:
--plan 1, 10 seconds were spent on the sequential scan of the term_weekly table.
dev=# explain analyze select distinct cs_id from lookup_weekly n inner join term_weekly s on s.b_id=n.b_id and s.date=n.date where term in ('cat'::text);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2100211.06..2100211.11 rows=5 width=4) (actual time=27095.470..27095.487 rows=138 loops=1)
...
--plan 2, only 1 second is spent on the index scan of the term_weekly table; however, because it picks the big table to hash, building the hash takes 22 seconds. The advantage gained from the index is completely lost because of this join order.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1429795.17..1429795.22 rows=5 width=4) (actual time=22991.289..22991.307 rows=138 loops=1)
...
Am I reading something wrong here? I haven't looked at the whole plan, but the second one is faster overall, so why do you think you need a hint, or to change what the planner chose? To me it looks like using the index is the best option in this situation. Could you try running both queries several times and taking the min/max/avg time of each?
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: Friday, September 12, 2014 4:09 AM
To: Matheus de Oliveira
Cc: Huang, Suya; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how to change the probe table in a hash join
On Thu, Sep 11, 2014 at 7:09 AM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:
On Wed, Sep 10, 2014 at 10:05 PM, Huang, Suya <Suya.Huang@au.experian.com> wrote:
--plan 1, 10 seconds were spent on the sequential scan of the term_weekly table.
dev=# explain analyze select distinct cs_id from lookup_weekly n inner join term_weekly s on s.b_id=n.b_id and s.date=n.date where term in ('cat'::text);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2100211.06..2100211.11 rows=5 width=4) (actual time=27095.470..27095.487 rows=138 loops=1)
...
--plan 2, only 1 second is spent on the index scan of the term_weekly table; however, because it picks the big table to hash, building the hash takes 22 seconds. The advantage gained from the index is completely lost because of this join order.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1429795.17..1429795.22 rows=5 width=4) (actual time=22991.289..22991.307 rows=138 loops=1)
...
Am I reading something wrong here? I haven't looked at the whole plan, but the second one is faster overall, so why do you think you need a hint, or to change what the planner chose? To me it looks like using the index is the best option in this situation. Could you try running both queries several times and taking the min/max/avg time of each?
The difference in time could be a caching effect, not a reproducible difference.
The 2nd plan uses 3GB of memory, and there might be better uses for that memory.
Currently memory use is un-costed, other than the "cliff costing" once the planner thinks it will exceed work_mem, which I think is a problem. Just because I will let you use 4GB of memory if you will really benefit from it doesn't mean you should use 4GB gratuitously.
Suya, what happens if you lower work_mem setting? Does it revert to the plan you want?
Cheers,
Jeff
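(Jeff's suggestion can be tried without touching the server-wide setting; a sketch, scoping the change to a single transaction, with the 1GB value only as an example:)

```sql
BEGIN;
SET LOCAL work_mem = '1GB';  -- applies only until COMMIT/ROLLBACK
EXPLAIN ANALYZE
SELECT DISTINCT cs_id
FROM lookup_weekly n
JOIN term_weekly s ON s.b_id = n.b_id AND s.date = n.date
WHERE term IN ('cat'::text);
ROLLBACK;  -- EXPLAIN ANALYZE runs the query, but this SELECT has no side effects
```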
Hey Jeff,
It’s quite interesting: after I reduced work_mem to 1GB, it chose the right plan. Also, if I first create a temporary table and then join against it, it also chooses the right plan. Is this a defect of the PG optimizer? When doing a hash join, it is unable to pick the small table as the hashed side once the query gets complicated (and it isn't really that complicated in this case).
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1524294.96..1524295.01 rows=5 width=4) (actual time=13409.960..13409.979 rows=138 loops=1)
-> Hash Join (cost=143.25..1524294.94 rows=5 width=4) (actual time=10648.440..13409.718 rows=160 loops=1)
Hash Cond: (((n.b_id)::text = (s.b_id)::text) AND (n.date = s.date))
-> Append (cost=0.00..862153.59 rows=37828460 width=52) (actual time=0.006..8152.938 rows=37828459 loops=1)
-> Seq Scan on lookup_weekly n (cost=0.00..0.00 rows=1 width=524) (actual time=0.000..0.000 rows=0 loops=1)
-> Seq Scan on lookup_weekly_20131130 n_1 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.006..743.985 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131207 n_2 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.003..894.061 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131214 n_3 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.008..746.660 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131221 n_4 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.004..750.305 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20131228 n_5 (cost=0.00..117764.18 rows=5158718 width=52) (actual time=0.004..741.233 rows=5158718 loops=1)
-> Seq Scan on lookup_weekly_20140426 n_6 (cost=0.00..91715.42 rows=4042442 width=52) (actual time=0.010..595.792 rows=4042442 loops=1)
-> Seq Scan on lookup_weekly_20140503 n_7 (cost=0.00..93516.49 rows=4118149 width=52) (actual time=0.009..598.208 rows=4118149 loops=1)
-> Seq Scan on lookup_weekly_20140329 n_8 (cost=0.00..88100.78 rows=3874278 width=52) (actual time=0.004..574.846 rows=3874278 loops=1)
-> Hash (cost=142.77..142.77 rows=32 width=61) (actual time=0.924..0.924 rows=553 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 43kB
-> Append (cost=0.00..142.77 rows=32 width=61) (actual time=0.031..0.752 rows=553 loops=1)
-> Seq Scan on term_weekly s (cost=0.00..0.00 rows=1 width=520) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140503_3 on term_weekly_20140503 s_1 (cost=0.56..36.70 rows=8 width=46) (actual time=0.031..0.225 rows=166 loops=1)
Index Cond: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140510_3 on term_weekly_20140510 s_2 (cost=0.56..36.70 rows=8 width=46) (actual time=0.023..0.192 rows=152 loops=1)
Index Cond: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140517_3 on term_weekly_20140517 s_3 (cost=0.56..36.70 rows=8 width=46) (actual time=0.022..0.176 rows=135 loops=1)
Index Cond: (term = 'cat'::text)
-> Index Scan using idx_term_weekly_20140524_3 on term_weekly_20140524 s_4 (cost=0.56..32.68 rows=7 width=46) (actual time=0.016..0.126 rows=100 loops=1)
Index Cond: (term = 'cat'::text)
Total runtime: 13410.097 ms
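(The temporary-table workaround mentioned above might look like the following sketch; the temp table name is made up, and the columns follow the join conditions in the query:)

```sql
-- Materializing the filtered small side first leaves the planner a tiny,
-- freshly analyzed table, which it then naturally picks as the hashed side.
CREATE TEMP TABLE term_cat AS
SELECT b_id, date
FROM term_weekly
WHERE term = 'cat'::text;

ANALYZE term_cat;  -- give the planner accurate row counts

SELECT DISTINCT cs_id
FROM lookup_weekly n
JOIN term_cat s ON s.b_id = n.b_id AND s.date = n.date;
```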
Thanks,
Suya