Re: [HACKERS] PoC: full merge join on comparison clause - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] PoC: full merge join on comparison clause |
Date | |
Msg-id | CAFjFpRfAxa6CEKVA4E0mYXY+R+=2+o7H+Yk5h3PMfdRx5cTYvQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] PoC: full merge join on comparison clause (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>) |
Responses |
Re: [HACKERS] PoC: full merge join on comparison clause
|
List | pgsql-hackers |
On Tue, Jul 10, 2018 at 10:06 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > I tried to fix the things you mentioned and improve the comments. Among > other changes, there is now a description of how merge join works with > inequalities at the top of nodeMergejoin.c. It also explains why we only > support one inequality clause. Thanks for the commit messages. I would use word "in-equality" instead of "comparison" since equality is also a comparison. > > Some particular points: > > On 07/06/2018 04:01 PM, Ashutosh Bapat wrote: >> >> - StrategyNumber opstrategy = mergestrategies[iClause]; >> + StrategyNumber sort_strategy = mergestrategies[iClause]; >> - int op_strategy; >> + int join_strategy; >> I don't see a reason why should we change the name of variable here. These >> are >> operator strategies and there's no need to change their names. The name >> change >> is introducing unnecessary diffs. > > > These variables have different meaning but their names differ only with an > underscore. When I had to change this function, I made mistakes because of > this. I'd keep the descriptive names to avoid further confusion. Should this > be a separate patch? No, 0001 suffice. But I am still not sure that the variable name change is worth the trouble. Anyway, will leave this for a committer to judge. > > This is just a cross-check for the planner. Added a comment. We should > probably use a separate error code for internal errors as opposed to user > errors, but I'm not sure if we have one, I see just elog(ERROR) being used > everywhere. elog(ERROR) is fine. Thanks for the comments. - if (op_mergejoinable(opno, exprType(leftarg)) && - !contain_volatile_functions((Node *) clause)) - restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno); + if (!contain_volatile_functions((Node *) clause)) + { + if (op_mergejoinable_equality(opno, exprType(leftarg))) Why is this condition split. Also why is the change in the order of conditions? + { + restrictinfo->mergeopfamilies = get_btree_equality_opfamilies(opno); + restrictinfo->is_mj_equality = true; Comparing this with the original code, I think, is_mj_equality should be true if restrictinfo->mergeopfamilies is not NIL. There is no way that a clause can act as an equality clause when there are no families in which the operator is an equality operator. If restrictinfo->mergeopfamilies can not be NIL here, probably we should add an Assert and a bit of explanation as to why is_mj_equality is true. With this work the meaning of oprcanmerge (See pg_operator catalog and also CREATE OPERATOR syntax) changes. Every btree operator can now be used to perform a merge join. oprcanmerge however only indicates whether an operator is an equality or not. Have you thought about that? Do we require to re-define oprcanmerge? + * + * If the inequality clause is not the last one, or if there are several + * of them, this algorithm doesn't work, because it is not possible to + * sort the inputs in such a way that given an outer tuple, the matching + * inner tuples form a contiguous interval. I think, it should be possible to use this technique with more than one inequality clauses as long as all the operators require the input to be ordered in the same direction and the clauses are ANDed. In that case the for a given outer tuple the matching inner tuples form a contiguous interval. I think it's better to straighten out these things before diving further into the 2nd patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: