Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Removing unneeded self joins
Date
Msg-id 2375492.jE0xQCEvom@aivenronan
Whole thread Raw
In response to Re: Removing unneeded self joins  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: Removing unneeded self joins
List pgsql-hackers
Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> On 19/5/2022 16:47, Ronan Dunklau wrote:
> > I'll take a look at that one.
>
> New version of the patch, rebased on current master:
> 1. pgindent over the patch have passed.
> 2. number of changed files is reduced.
> 3. Some documentation and comments is added.

Hello Andrey,

Thanks for the updates.

The general approach seems sensible to me, so I'm going to focus on some details.

In a very recent thread [1], Tom Lane is proposing to add infrastructure to make Var aware of their nullability by
outerjoins. I wonder if that would help with avoiding the need for adding is not null clauses when the column is known
notnull ? 
If we have a precedent for adding a BitmapSet to the Var itself, maybe the whole discussion regarding keeping track of
nullabilitycan be extended to the original column nullability ? 

Also, I saw it was mentioned earlier in the thread but how difficult would it be to process the transformed quals
throughthe EquivalenceClass machinery and the qual simplification ?  
For example, if the target audience of this patch is ORM, or inlined views, it wouldn't surprise me to see queries of
thiskind in the wild, which could be avoided altogether: 

postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a = s2.a where s1.b = 2 and s2.b =3;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on sj s2
   Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)


+       for (counter = 0; counter < list_length(*sources);)
+       {
+               ListCell   *cell = list_nth_cell(*sources, counter);
+               RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+               int                     counter1;
+
....
+                 ec->ec_members = list_delete_cell(ec->ec_members, cell);


Why don't you use foreach() and foreach_delete_current macros for iterating and removing items in the lists, both in
update_ec_membersand update_ec_sources ? 


+                               if ((bms_is_member(k, info->syn_lefthand) ^
+                                        bms_is_member(r, info->syn_lefthand)) ||
+                                       (bms_is_member(k, info->syn_righthand) ^
+                                        bms_is_member(r, info->syn_righthand)))

I think this is more compact and easier to follow than the previous version, but I'm not sure how common it is in
postgressource code to use that kind of construction ? 

Some review about the comments:


I see you keep using the terms "the keeping relation" and "the removing relation" in reference to the relation that is
keptand the one that is removed. 
Aside from the grammar (the kept relation or the removed relation), maybe it would make it clearer to call them
somethingelse. In other parts of the code, you used "the remaining relation / the removed relation" which makes sense. 

 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.

This could be rephrased: ", optionally replacing it with another relid. The latter is used by the self-join removing
process."


[1] https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us

--
Ronan Dunklau





pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: list of TransactionIds
Next
From: Peter Eisentraut
Date:
Subject: Re: pgsql: Change timeline field of IDENTIFY_SYSTEM to int8