
Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions - Mailing list pgsql-general

From	David G. Johnston
Subject	Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Date	July 25, 2017 05:58:18
Msg-id	CAKFQuwZXN8wOfYfKK=8pt4VUnXur0bGJH9uh2216eLQ6W9gFzw@mail.gmail.com
In response to	Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
List	pgsql-general


On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The cost to form the inner hash is basically negligible whether it's
> de-duped or not, but if it's not (known) de-duped then the cost
> estimate for the semijoin is going to rise some, and that discourages
> selecting it.

Why does the "hash semi join" care about duplication of values on the inner relation? Doesn't it only care whether a given bucket exists irrespective of its contents?
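To make the question concrete, here is a toy model of a hash join and a hash semi join. It is a sketch only: the function names (`build_hash_table`, `hash_join`, `hash_semi_join`) and the fixed bucket count are made up for illustration, and nothing here reflects PostgreSQL's executor or its cost model. It shows that a probe cannot stop at "the bucket is non-empty", because different keys can share a bucket, and that a semi join stops at the first matching entry while a regular join emits one row per match.

```python
def build_hash_table(inner_keys, n_buckets=4):
    """Place every inner key in a bucket chain; duplicates are kept."""
    buckets = [[] for _ in range(n_buckets)]
    for key in inner_keys:
        buckets[hash(key) % n_buckets].append(key)
    return buckets


def hash_join(outer_keys, buckets):
    """Regular join: one output row per matching inner entry."""
    out = []
    for key in outer_keys:
        for candidate in buckets[hash(key) % len(buckets)]:
            if candidate == key:  # keys in one bucket can differ
                out.append(key)
    return out


def hash_semi_join(outer_keys, buckets):
    """Semi join: at most one output row per outer key."""
    out = []
    for key in outer_keys:
        for candidate in buckets[hash(key) % len(buckets)]:
            if candidate == key:
                out.append(key)
                break  # first match is enough
    return out


# Hypothetical data: 1 is duplicated on the inner side, and 1, 5, 9
# all hash to the same bucket when there are 4 buckets.
inner = [1, 1, 5, 2]
outer = [1, 5, 9, 3]
table = build_hash_table(inner)
joined = hash_join(outer, table)
semi_joined = hash_semi_join(outer, table)
```

In this toy run the outer key 9 lands in a non-empty bucket yet has no match, so the bucket contents still have to be compared key by key.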

Looking at those EXPLAIN outputs, it would seem that a "hash semi join" is simply inherently more expensive to execute than a "hash join", and that de-duping the inner relation would have to be quite expensive to overcome the gap. I cannot reconcile this with the previous paragraph, though...

Pointing me to the readme or code file (comments) that explains this in more detail would be welcome. Not sure what to grep for - "Hash Semi Join" only turns up a couple of expected output results...

Thx.

David J.

