Home > mailing lists

query plan question, nested loop vs hash join - Mailing list pgsql-performance

From	Andrey Lizenko
Subject	query plan question, nested loop vs hash join
Date	October 7, 2014 22:27:31
Msg-id	CADKuZZB=TeBszkxmzPGeZ14cBm4wpCqK9TsfXyUoooamPzekOw@mail.gmail.com Whole thread Raw
Responses	Re: query plan question, nested loop vs hash join
List	pgsql-performance

Tree view

Hi,

I have similar problem as in

http://www.postgresql.org/message-id/flat/52B311C4.1070108@gmail.com#52B311C4.1070108@gmail.com

server version is 9.3.4

Here is only two quite simple tables:

db_new=# \d activities_example

Table "public.activities_example"

Column | Type | Modifiers

----------------+---------+-----------

id | integer |

order_chain_id | integer |

Indexes:

"activities_example_idx" btree (order_chain_id)

db_new=# \d orders_example

Table "public.orders_example"

Column | Type | Modifiers

--------+---------+-----------

id | integer |

Number of rows as below:

db_new=# select count(*) from activities_example ;

count

---------

3059965

db_new=# select count(*) from orders_example ;

count

-------

19038

db_new=# select count(*) from activities_example where order_chain_id in (select id from orders_example);

count

-------

91426

(1 row)

and I can see that planner uses hashjoin with all enabled options and nested loop with disabled parameter:

db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------

Hash Semi Join (cost=513.36..57547.59 rows=89551 width=8) (actual time=18.340..966.367 rows=91426 loops=1)

Hash Cond: (activities_example.order_chain_id = orders_example.id)

-> Seq Scan on activities_example (cost=0.00..44139.65 rows=3059965 width=8) (actual time=0.018..294.216 rows=3059965 loops=1)

-> Hash (cost=275.38..275.38 rows=19038 width=4) (actual time=5.458..5.458 rows=19038 loops=1)

Buckets: 2048 Batches: 1 Memory Usage: 670kB

-> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.308 rows=19038 loops=1)

Total runtime: 970.234 ms

(7 rows)

db_new=# set enable_hashjoin = off;

SET

db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example);

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=1629.09..166451.01 rows=89551 width=8) (actual time=16.091..116.476 rows=91426 loops=1)

-> Unique (cost=1628.66..1723.85 rows=19038 width=4) (actual time=15.929..23.156 rows=19038 loops=1)

-> Sort (cost=1628.66..1676.25 rows=19038 width=4) (actual time=15.892..19.884 rows=19038 loops=1)

Sort Key: orders_example.id

Sort Method: external sort Disk: 264kB

-> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.747 rows=19038 loops=1)

-> Index Scan using activities_example_idx on activities_example (cost=0.43..8.60 rows=5 width=8) (actual time=0.002..0.004 rows=5 loops=19038)

Index Cond: (order_chain_id = orders_example.id)

Total runtime: 121.366 ms

(9 rows)

second runtime is much more quicker.

What is the reason of "Seq Scan on activities_example" in the first case?

Is it possible to force optimizer choose the second plan without doing "set enable_hashjoin = off;" ?

Increasing of 'effective_cache_size' leads to similar thing with mergejoin,
other options (work_mem, shared_buffers. etc) do not change anything.

Thanks in advance.

Regards, Andrey Lizenko

pgsql-performance by date:

From: George Neuner
Date: 07 October 2014, 22:27:29
Subject: help: function failing

From: Mike Blackwell
Date: 07 October 2014, 22:55:56
Subject: Re: pg_basebackup - odd performance

query plan question, nested loop vs hash join - Mailing list pgsql-performance

Previous

Next