Re: HashJoin w/option to unique-ify inner rel - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HashJoin w/option to unique-ify inner rel
Date
Msg-id 603c8f070904161728p60da50a0o4833fbe9ba1a4f94@mail.gmail.com
Whole thread Raw
In response to Re: HashJoin w/option to unique-ify inner rel  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Apr 16, 2009 at 7:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Upon further review, it appears that a big part of this problem is
>> that cost_hashjoin() doesn't understand that it needs cost semi-joins
>> differently from inner or left joins.
>
> Yeah, I have a note to look into that before 8.4 final.  The same is
> true for nestloops: stopping after hitting one match to the current
> outer can make a big difference, and that's not reflected in costsize.c
> yet.  I'm not sure whether cost_mergejoin needs to care.

Cool.  It's worth noting that this is also a problem for anti-joins,
which can also stop after hitting one match.

For merge join, it looks like you might save something if there are
duplicates in both the inner and outer lists.  If the outer side
contains 1,1,1,1,1,2 and the inner side contains 1,1,1,2,2, then an
INNER JOIN or LEFT JOIN will rescan the first three tuples of the
inner side three times, whereas a SEMI or ANTI JOIN will scan the
first tuple three times and then the next two only once.  But I'm not
sure if we can estimate this accurately enough to matter.

...Robert


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: HashJoin w/option to unique-ify inner rel
Next
From: Robert Haas
Date:
Subject: Re: HashJoin w/option to unique-ify inner rel