Re: Removing Functionally Dependent GROUP BY Columns - Mailing list pgsql-hackers

From David Rowley
Subject Re: Removing Functionally Dependent GROUP BY Columns
Date
Msg-id CAKJS1f_WgEuiazFEoXpTB_vTDZ_FHKrPxwZ0BiQdMwQ_4kmYmA@mail.gmail.com
Whole thread Raw
In response to Re: Removing Functionally Dependent GROUP BY Columns  (Julien Rouhaud <julien.rouhaud@dalibo.com>)
Responses Re: Removing Functionally Dependent GROUP BY Columns  (Julien Rouhaud <julien.rouhaud@dalibo.com>)
List pgsql-hackers
Many thanks for the thorough review!

On 12 January 2016 at 23:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

In identify_key_vars()

+       /* Only PK constraints are of interest for now, see comment above */
+       if (con->contype != CONSTRAINT_PRIMARY)
+           continue;

I think the comment should better refer to
remove_useless_groupby_columns() (don't optimize further than what
check_functional_grouping() does).


I've improved this by explaining it more clearly.
 

+       /*
+        * Exit if the constraint is deferrable, there's no point in looking
+        * for another PK constriant, as there can only be one.
+        */

small typo on "constriant"


Fixed.
 


+               {
+                   varlistmatches = bms_add_member(varlistmatches,
varlistidx);
+                   found_col = true;
+                   /* don't break, there might be dupicates */
+               }

small typo on "dupicates". Also, I thought transformGroupClauseExpr()
called in the parse analysis make sure that there isn't any duplicate in
the GROUP BY clause. Do I miss something?


No I missed something :) You are right. I've put a break; here and a comment to explain why.
 

in remove_useless_groupby_columns()

+       /*
+        * Skip if there were no Vars found in the GROUP BY clause which
belong
+        * to this relation. We can also skip if there's only 1 Var, as
there
+        * needs to be more than one Var for there to be any columns
which are
+        * functionally dependent on another column.
+        */


This comment doesn't sound very clear. I'm not a native english speaker,
so maybe it's just me.


Yes it was badly worded. I've fixed in the attached.
 
+   /*
+    * If we found any surplus Vars in the GROUP BY clause, then we'll build
+    * a new GROUP BY clause without these surplus Vars.
+    */
+   if (anysurplus)
+   {
+       List *new_groupby = NIL;
+
+       foreach (lc, root->parse->groupClause)
+       {
+           SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
+           TargetEntry *tle;
+           Var *var;
+
+           tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
+           var = (Var *) tle->expr;
+
+           if (!IsA(var, Var))
+               continue;
+ [...]

if var isn't a Var, it needs to be added in new_groupby.


Yes you're right, well, at least I've written enough code to ensure that it's not needed.
Technically we could look inside non-Vars and check if the Expr is just made up of Vars from a single relation, and then we could also make that surplus if we find other Vars which make up the table's primary key.  I didn't make these changes as I thought it was a less likely scenario. It wouldn't be too much extra code to add however. I've went and added an XXX comment to explain that there might be future optimisation opportunities in the future around this. 
 
I've attached an updated patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Truncation of identifiers
Next
From: Gavin Flower
Date:
Subject: Re: Truncation of identifiers