Thread: query plan question, nested loop vs hash join
Hi,
I have similar problem as in
server version is 9.3.4
Here is only two quite simple tables:
db_new=# \d activities_example
Table "public.activities_example"
Column | Type | Modifiers
----------------+---------+-----------
id | integer |
order_chain_id | integer |
Indexes:
"activities_example_idx" btree (order_chain_id)
db_new=# \d orders_example
Table "public.orders_example"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
Number of rows as below:
db_new=# select count(*) from activities_example ;
count
---------
3059965
db_new=# select count(*) from orders_example ;
count
-------
19038
db_new=# select count(*) from activities_example where order_chain_id in (select id from orders_example);
count
-------
91426
(1 row)
and I can see that planner uses hashjoin with all enabled options and nested loop with disabled parameter:
db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=513.36..57547.59 rows=89551 width=8) (actual time=18.340..966.367 rows=91426 loops=1)
Hash Cond: (activities_example.order_chain_id = orders_example.id)
-> Seq Scan on activities_example (cost=0.00..44139.65 rows=3059965 width=8) (actual time=0.018..294.216 rows=3059965 loops=1)
-> Hash (cost=275.38..275.38 rows=19038 width=4) (actual time=5.458..5.458 rows=19038 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 670kB
-> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.308 rows=19038 loops=1)
Total runtime: 970.234 ms
(7 rows)
db_new=# set enable_hashjoin = off;
SET
db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1629.09..166451.01 rows=89551 width=8) (actual time=16.091..116.476 rows=91426 loops=1)
-> Unique (cost=1628.66..1723.85 rows=19038 width=4) (actual time=15.929..23.156 rows=19038 loops=1)
-> Sort (cost=1628.66..1676.25 rows=19038 width=4) (actual time=15.892..19.884 rows=19038 loops=1)
Sort Key: orders_example.id
Sort Method: external sort Disk: 264kB
-> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.747 rows=19038 loops=1)
-> Index Scan using activities_example_idx on activities_example (cost=0.43..8.60 rows=5 width=8) (actual time=0.002..0.004 rows=5 loops=19038)
Index Cond: (order_chain_id = orders_example.id)
Total runtime: 121.366 ms
(9 rows)
second runtime is much more quicker.
What is the reason of "Seq Scan on activities_example" in the first case?
Is it possible to force optimizer choose the second plan without doing "set enable_hashjoin = off;" ?
Increasing of 'effective_cache_size' leads to similar thing with mergejoin,
other options (work_mem, shared_buffers. etc) do not change anything.
other options (work_mem, shared_buffers. etc) do not change anything.
Thanks in advance.
--
Regards, Andrey Lizenko
On Fri, Oct 3, 2014 at 6:38 PM, Andrey Lizenko <lizenko79@gmail.com> wrote: > Is it possible to force optimizer choose the second plan without doing "set > enable_hashjoin = off;" ? > > Increasing of 'effective_cache_size' leads to similar thing with mergejoin, > other options (work_mem, shared_buffers. etc) do not change anything. Have you tried changing random_page_cost? In small databases where most of the data is cached anyway, lowering random_page_cost to somewhere between 1 and 2 usually leads to better planner decisions. Regards, Marti
Thanks for your reply, Marti, as I answered to Tom couple of days ago adjusting of 'effective_cache_size' to 80% of RAM and 'random_page_cost' from 2 to 1 helped me.
On 8 October 2014 00:26, Marti Raudsepp <marti@juffo.org> wrote:
On Fri, Oct 3, 2014 at 6:38 PM, Andrey Lizenko <lizenko79@gmail.com> wrote:
> Is it possible to force optimizer choose the second plan without doing "set
> enable_hashjoin = off;" ?
>
> Increasing of 'effective_cache_size' leads to similar thing with mergejoin,
> other options (work_mem, shared_buffers. etc) do not change anything.
Have you tried changing random_page_cost?
In small databases where most of the data is cached anyway, lowering
random_page_cost to somewhere between 1 and 2 usually leads to better
planner decisions.
Regards,
Marti
С уважением, Андрей Лизенко