Re: Removing unneeded self joins - Mailing list pgsql-hackers

From jian he
Subject Re: Removing unneeded self joins
Date
Msg-id CACJufxF-ubw739W+33_rFSA+YTdRTDcKwVidKwXAS8NCy4nx+Q@mail.gmail.com
Whole thread Raw
In response to Re: Removing unneeded self joins  (Andrei Lepikhov <lepihov@gmail.com>)
Responses Re: Removing unneeded self joins
List pgsql-hackers
On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 7/2/24 07:25, jian he wrote:
> > to make sure it's correct, I have added a lot of tests,
> > Some of this may be contrived, maybe some of the tests are redundant.
> Thanks for your job!
> I passed through the patches and have some notes:
> 1. Patch 0001 has not been applied anymore since the previous week's
> changes in the core. Also, there is one place with trailing whitespace.

thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.

also thanks to Alexander Korotkov's tip.
I added a bool change_RangeTblRef to ChangeVarNodes_context.
default is true, so won't influence ChangeVarNodes.
After that create a function ChangeVarNodesExtended.
so now replace_varno replaced by ChangeVarNodes.


now in ChangeVarNodes_walker we've add:
```
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
int relid = -1;
bool is_req_equal =
(rinfo->required_relids == rinfo->clause_relids);
bool clause_relids_is_multiple =
(bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
...
}
```
but this part, we don't have much comments, adding some comments would be good.
but I am not sure how.


static bool
match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
Index relid)
but actually we call it via:
if (!match_unique_clauses(root, inner, uclauses, outer->relid))

I am not sure whether the second argument is "inner" or "outer".
Maybe it will cause confusion.
same with innerrel_is_unique_ext.



/*
* At this stage, joininfo lists of inner and outer can contain
* only clauses, required for a superior outer join that can't
* influence this optimization. So, we can avoid to call the
* build_joinrel_restrictlist() routine.
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);
build_joinrel_restrictlist require joinrel, innerrel, outrel, but here
we only have innerrel, outterrel.
so i am confused with the comments.



i add following code snippets after generate_join_implied_equalities
```
if (restrictlist == NIL)
continue
```



I have some confusion with the comments.
/*
* Determine if the inner table can duplicate outer rows.  We must
* bypass the unique rel cache here since we're possibly using a
* subset of join quals. We can use 'force_cache' == true when all
* join quals are self-join quals.  Otherwise, we could end up
* putting false negatives in the cache.
*/
if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
outer, JOIN_INNER, selfjoinquals,
list_length(otherjoinquals) == 0,
&uclauses))
continue;

"unique rel cache", not sure the meaning, obviously, "relcache" has a
different meaning.
so i am slightly confused with
"We must bypass the unique rel cache here since we're possibly using a
 subset of join quals"



i have refactored comments below
```
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
```.
please check v5-0002 for comments refactor.

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Pluggable cumulative statistics
Next
From: Tatsuo Ishii
Date:
Subject: CFbot failed on Windows platform