Thread: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Tom Lane
Date:
Support RIGHT and FULL OUTER JOIN in hash joins.

This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join.  For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition.  (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)

To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple.  The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple.  So we aren't increasing the size of the hashtable at all for
the feature.

To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin.  Other than that decision, it was pretty
straightforward.

Branch
------
master

Details
-------
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=f4e4b3274317d9ce30de7e7e5b04dece7c4e1791

Modified Files
--------------
src/backend/executor/nodeHash.c         |  151 ++++++++-
src/backend/executor/nodeHashjoin.c     |  576 ++++++++++++++++++-------------
src/backend/optimizer/path/equivclass.c |    7 +-
src/backend/optimizer/path/joinpath.c   |  145 +++-----
src/backend/optimizer/path/joinrels.c   |   12 +
src/backend/optimizer/plan/initsplan.c  |   10 +-
src/include/access/htup.h               |   23 ++
src/include/executor/hashjoin.h         |    2 +
src/include/executor/nodeHash.h         |   10 +-
src/include/nodes/execnodes.h           |   23 +-
10 files changed, 596 insertions(+), 363 deletions(-)


Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Robert Haas
Date:
On Thu, Dec 30, 2010 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Support RIGHT and FULL OUTER JOIN in hash joins.

This is cool, but on first blush your changes to
add_paths_to_joinrel() appear to be total nonsense.  I think the
problem is that have_nonmergeable_clause is really a misnomer - it's
not clear until you read through the whole diff that this won't get
set for inner/left joins.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> This is cool, but on first blush your changes to
> add_paths_to_joinrel() appear to be total nonsense.  I think the
> problem is that have_nonmergeable_clause is really a misnomer - it's
> not clear until you read through the whole diff that this won't get
> set for inner/left joins.

Yeah, I wasn't totally satisfied with that variable name either.
Do you have a better idea?

            regards, tom lane

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Tom Lane
Date:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This is cool, but on first blush your changes to
>> add_paths_to_joinrel() appear to be total nonsense.  I think the
>> problem is that have_nonmergeable_clause is really a misnomer - it's
>> not clear until you read through the whole diff that this won't get
>> set for inner/left joins.

> Yeah, I wasn't totally satisfied with that variable name either.
> Do you have a better idea?

On reflection, how about inverting the flag's value and calling it
mergejoin_allowed or some such?

            regards, tom lane

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Robert Haas
Date:
On Thu, Dec 30, 2010 at 9:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> This is cool, but on first blush your changes to
>>> add_paths_to_joinrel() appear to be total nonsense.  I think the
>>> problem is that have_nonmergeable_clause is really a misnomer - it's
>>> not clear until you read through the whole diff that this won't get
>>> set for inner/left joins.
>
>> Yeah, I wasn't totally satisfied with that variable name either.
>> Do you have a better idea?
>
> On reflection, how about inverting the flag's value and calling it
> mergejoin_allowed or some such?

Yeah, that's better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 30, 2010 at 9:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On reflection, how about inverting the flag's value and calling it
>> mergejoin_allowed or some such?

> Yeah, that's better.

OK, done that way.

            regards, tom lane

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Hitoshi Harada
Date:
2011/1/1 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 30, 2010 at 9:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> On reflection, how about inverting the flag's value and calling it
>>> mergejoin_allowed or some such?
>
>> Yeah, that's better.
>
> OK, done that way.

It looks to me like mergejoin_allowed should be initialized to false.
If enable_mergejoin is off and jointype != JOIN_FULL then mergejoin is
not allowed, isn't it? Sorry for noise if it's only my confusing.

Regards,

--
Hitoshi Harada

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> It looks to me like mergejoin_allowed should be initialized to false.
> If enable_mergejoin is off and jointype != JOIN_FULL then mergejoin is
> not allowed, isn't it? Sorry for noise if it's only my confusing.

No, the code is correct as-is: we need the flag to be set true by
default, because the way that the function is using it, we'll also skip
some nestloop possibilities when it's false.  Note the comment for step
2.

It could be that a different name for that flag variable would be a
better idea, but neither Robert nor I could come up with a better one.

            regards, tom lane

Re: pgsql: Support RIGHT and FULL OUTER JOIN in hash joins.

From
Hitoshi Harada
Date:
2011/5/24 Tom Lane <tgl@sss.pgh.pa.us>:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> It looks to me like mergejoin_allowed should be initialized to false.
>> If enable_mergejoin is off and jointype != JOIN_FULL then mergejoin is
>> not allowed, isn't it? Sorry for noise if it's only my confusing.
>
> No, the code is correct as-is: we need the flag to be set true by
> default, because the way that the function is using it, we'll also skip
> some nestloop possibilities when it's false.  Note the comment for step
> 2.
>
> It could be that a different name for that flag variable would be a
> better idea, but neither Robert nor I could come up with a better one.

Ah, ok. I think I now understand it. "mergejoin_doable" flag. Thanks.

Regards,


--
Hitoshi Harada