FULL JOIN planner deficiency - Mailing list pgsql-hackers

From Tom Lane
Subject FULL JOIN planner deficiency
Date
Msg-id 32090.1539378124@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
Consider this simple query:

regression=# explain select * from
    int8_tbl as a1 full join (select 1 as id) as a2 on (a1.q1 = a2.id);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Full Join  (cost=0.03..1.11 rows=5 width=20)
   Hash Cond: (a1.q1 = (1))
   ->  Seq Scan on int8_tbl a1  (cost=0.00..1.05 rows=5 width=16)
   ->  Hash  (cost=0.02..0.02 rows=1 width=4)
         ->  Result  (cost=0.00..0.01 rows=1 width=4)
(5 rows)

Not too exciting-looking.  But this ought to be exactly equivalent:

regression=# create table dual();
CREATE TABLE
regression=# insert into dual default values;
INSERT 0 1
regression=# explain select * from
    int8_tbl as a1 full join (select 1 as id from dual) as a2 on (a1.q1 = a2.id);
ERROR:  FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

I ran into this while testing the patch mentioned in
<5395.1539275668@sss.pgh.pa.us>, which basically causes the FROM-less
subselect to be treated the same as the "FROM dual" case.  But it's
a pre-existing, and long-standing, problem.

The root of the problem is that once the constant "1" has been pulled
up from the sub-select, we have a join qual that looks like "a1.q1 = 1",
and that is not a mergeable or hashable join qual, because it fails to
compare expressions from the two sides of the join.

I spent awhile thinking about whether we could generalize our notion
of mergeability, or hashability, to make this work, but it soon made
my head hurt.  Even if it's possible it would likely not be a change
we'd want to backpatch.

However, there's another way to deal with it, which is to wrap the
pulled-up constant in a PlaceHolderVar, which will cause it to act
like a Var for the purpose of recognizing a qual as mergeable/hashable.
The attached two-line (sans tests) patch does this and fixes the problem.

While this could in theory reduce our ability to optimize things
(by making expressions look unequal that formerly looked equal),
I do not think it's a big problem because our ability to optimize
full joins is pretty darn limited anyway.

Given the lack of complaints, I'm not real sure whether this is
worth back-patching.  Thoughts?

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 688b3a1..cd6e119 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
*************** replace_vars_in_jointree(Node *jtnode,
*** 2044,2049 ****
--- 2044,2061 ----
          }
          replace_vars_in_jointree(j->larg, context, lowest_nulling_outer_join);
          replace_vars_in_jointree(j->rarg, context, lowest_nulling_outer_join);
+
+         /*
+          * Use PHVs within the join quals of a full join, even when it's the
+          * lowest nulling outer join.  Otherwise, we cannot identify which
+          * side of the join a pulled-up var-free expression came from, which
+          * can lead to failure to make a plan at all because none of the quals
+          * appear to be mergeable or hashable conditions.  For this purpose we
+          * don't care about the state of wrap_non_vars, so leave it alone.
+          */
+         if (j->jointype == JOIN_FULL)
+             context->need_phvs = true;
+
          j->quals = pullup_replace_vars(j->quals, context);

          /*

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Maximum password length
Next
From: Isaac Morland
Date:
Subject: Re: Maximum password length