Re: Selection of join algorithm. - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Selection of join algorithm.
Date
Msg-id CAMkU=1xde+VJbNhCcVLuD=8BvjN4_g94S07Pnkr1h5tiCn0sYA@mail.gmail.com
Whole thread Raw
In response to Selection of join algorithm.  (Ishaya Bhatt <ishayabhatt@gmail.com>)
List pgsql-hackers
On Sat, Mar 8, 2014 at 6:18 AM, Ishaya Bhatt <ishayabhatt@gmail.com> wrote:
Hi,

I am trying to analyze join performance. But I see that even for a table having 100,000 rows and join attribute as primary key, postgres always performs hash join.

Can anyone please tell me under which conditions merge join or nested loop join is invoked?


Unless you trying to look into the source code of postgresql to see how the internals of the planner works, this should really go to pgsql-performance@postgresql.org, not to hackers.

A nested loop would be favored if there were some WHERE condition that filtered out nearly all of the rows of the "outer" table.  In that case, only a small amount of the inner table needs to be accessed, and so reading the whole thing to hash it would be too expensive.

A merge join would be favored if you used an "ORDER BY" to ask for the data to be sorted in the same order as the merge join would naturally deliver it in.

If the data is too large to fit in work_mem, it might favor either the merge join or nested loop compared to the hash join. This stuff is hard to discuss in the abstract.  It is probably best to use the enable_*join settings to see what it does with your actual data (or better yet a synthetic data set whose generator you can share with us).

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: db_user_namespace a "temporary measure"
Next
From: "Wang, Jing"
Date:
Subject: issue log message to suggest VACUUM FULL if a table is nearly empty