Re: [HACKERS] PoC: full merge join on comparison clause - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] PoC: full merge join on comparison clause
Date
Msg-id CAFjFpRdPUCXOe9bcAQWnRBTU1jAtPPa8fMJrn6oAtttBj=HniA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] PoC: full merge join on comparison clause  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Responses Re: [HACKERS] PoC: full merge join on comparison clause  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] PoC: full merge join on comparison clause  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
List pgsql-hackers
Hi,
I have started reviewing these patches. I haven't grasped the design
yet. But here are some comments on the first patch.


-    clauses = (MergeJoinClause) palloc0(nClauses *
sizeof(MergeJoinClauseData));
+    parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses *
sizeof(MergeJoinClauseData));

crosses 80 characters.

-        StrategyNumber opstrategy = mergestrategies[iClause];
+        StrategyNumber sort_strategy = mergestrategies[iClause];
-        int            op_strategy;
+        int            join_strategy;
I don't see a reason why should we change the name of variable here. These are
operator strategies and there's no need to change their names. The name change
is introducing unnecessary diffs.

+        clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args),
(PlanState *) parent);
+        clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args),
(PlanState *) parent);

cross 80 characters.

 /*
@@ -378,20 +375,29 @@ MJEvalInnerValues(MergeJoinState *mergestate,
TupleTableSlot *innerslot)
     return result;
 }

+/* Tuple comparison result */
+typedef enum
+{
+    MJCR_NextInner = 1,
+    MJCR_NextOuter = -1,
+    MJCR_Join = 0
+} MJCompareResult;
+
 /*
  * MJCompare
  *
- * Compare the mergejoinable values of the current two input tuples
- * and return 0 if they are equal (ie, the mergejoin equalities all
- * succeed), >0 if outer > inner, <0 if outer < inner.
+ * Compare the mergejoinable values of the current two input tuples.
+ * If they are equal, i.e., the mergejoin equalities all succeed,
+ * return MJCR_Join, if outer > inner, MJCR_NextInner, and else
+ * MJCR_NextOuter.
  *
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, respectively.
  */
-static int
+static MJCompareResult
 MJCompare(MergeJoinState *mergestate)
 {

I am not sure about this change as well. MJCompare()'s job is to compare given
keys in the two tuples and return the comparison result. The result was used as
it is to decide which side to advance in an equality based merge join. But for
inequality based merge join the result needs to be interpreted further. I think
we should write a wrapper around MJCompare which interprets the result rather
than changing MJCompare itself. OR at least change the name of MJCompare. The
first option is better in case we use MJCompare for purposes other than merge
join in future. I am not sure what those could be, but say a merge based union
or something like that.

     /*
      * Sweep through the existing EquivalenceClasses looking for matches to
@@ -273,7 +274,7 @@ process_equivalence(PlannerInfo *root,
         /*
          * A "match" requires matching sets of btree opfamilies.  Use of
          * equal() for this test has implications discussed in the comments
-         * for get_mergejoin_opfamilies().
+         * for get_equality_opfamilies().

I think we should leave mergejoin word in there or at least indicate that these
are btree opfamilies so that we don't confuse it with hash equality operator
families.

It will be good if you can write something about why these changes are
required in the file. If you are using git format-patch, you could
write a commit message that gets added to the patch. That way, it
leaves there for anybody to review.

I am having a difficult time reading the next patch. There are various
changes in the second patch, which I don't understand the reason
behind. I think some comments will help, in as commit message or in
the code.

I will continue reviewing the patches.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Antonin Houska
Date:
Subject: Re: Push down Aggregates below joins
Next
From: Alexander Kuzmenkov
Date:
Subject: Re: Generating partitioning tuple conversion maps faster