query plan question, nested loop vs hash join - Mailing list pgsql-performance
From | Andrey Lizenko |
---|---|
Subject | query plan question, nested loop vs hash join |
Date | |
Msg-id | CADKuZZB=TeBszkxmzPGeZ14cBm4wpCqK9TsfXyUoooamPzekOw@mail.gmail.com Whole thread Raw |
Responses |
Re: query plan question, nested loop vs hash join
|
List | pgsql-performance |
Hi,
I have similar problem as in
server version is 9.3.4
Here is only two quite simple tables:
db_new=# \d activities_example
Table "public.activities_example"
Column | Type | Modifiers
----------------+---------+-----------
id | integer |
order_chain_id | integer |
Indexes:
"activities_example_idx" btree (order_chain_id)
db_new=# \d orders_example
Table "public.orders_example"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
Number of rows as below:
db_new=# select count(*) from activities_example ;
count
---------
3059965
db_new=# select count(*) from orders_example ;
count
-------
19038
db_new=# select count(*) from activities_example where order_chain_id in (select id from orders_example);
count
-------
91426
(1 row)
and I can see that planner uses hashjoin with all enabled options and nested loop with disabled parameter:
db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=513.36..57547.59 rows=89551 width=8) (actual time=18.340..966.367 rows=91426 loops=1)
Hash Cond: (activities_example.order_chain_id = orders_example.id)
-> Seq Scan on activities_example (cost=0.00..44139.65 rows=3059965 width=8) (actual time=0.018..294.216 rows=3059965 loops=1)
-> Hash (cost=275.38..275.38 rows=19038 width=4) (actual time=5.458..5.458 rows=19038 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 670kB
-> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.308 rows=19038 loops=1)
Total runtime: 970.234 ms
(7 rows)
db_new=# set enable_hashjoin = off;
SET
db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1629.09..166451.01 rows=89551 width=8) (actual time=16.091..116.476 rows=91426 loops=1)
-> Unique (cost=1628.66..1723.85 rows=19038 width=4) (actual time=15.929..23.156 rows=19038 loops=1)
-> Sort (cost=1628.66..1676.25 rows=19038 width=4) (actual time=15.892..19.884 rows=19038 loops=1)
Sort Key: orders_example.id
Sort Method: external sort Disk: 264kB
-> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.747 rows=19038 loops=1)
-> Index Scan using activities_example_idx on activities_example (cost=0.43..8.60 rows=5 width=8) (actual time=0.002..0.004 rows=5 loops=19038)
Index Cond: (order_chain_id = orders_example.id)
Total runtime: 121.366 ms
(9 rows)
second runtime is much more quicker.
What is the reason of "Seq Scan on activities_example" in the first case?
Is it possible to force optimizer choose the second plan without doing "set enable_hashjoin = off;" ?
Increasing of 'effective_cache_size' leads to similar thing with mergejoin,
other options (work_mem, shared_buffers. etc) do not change anything.
other options (work_mem, shared_buffers. etc) do not change anything.
Thanks in advance.
--
Regards, Andrey Lizenko
pgsql-performance by date: