Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

From Andy Fan
Subject Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date
Msg-id CAKU4AWqM_HM1HHVnFhYx2dhnOwV0HGOGgPtFOO4ogz1FFtLKQg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Erase the distinctClause if the result is unique by definition  (Andy Fan <zhihui.fan1213@gmail.com>)
Responses Re: [PATCH] Erase the distinctClause if the result is unique by definition
List pgsql-hackers
Upload the newest patch so that the cfbot can pass.  The last patch failed 
because some explain without the (cost off). 

I'm still on the way to figure out how to handle aggregation calls without 
aggregation path.  Probably we can get there by hacking some 
ExprEvalPushStep for Aggref node.  But since the current patch not tied 
with this closely,   so I would put this patch for review first. 



On Wed, Mar 4, 2020 at 9:13 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

 
* There are some changes in existing regression cases that aren't
visibly related to the stated purpose of the patch, eg it now
notices that "select distinct max(unique2) from tenk1" doesn't
require an explicit DISTINCT step.  That's not wrong, but I wonder
if maybe you should subdivide this patch into more than one patch,
because that must be coming from some separate change.  I'm also
wondering what caused the plan change in expected/join.out.

Per my purpose it should be in the same patch,  the logical here is we 
have distinct in the sql and the query is distinct already since the max
function (the rule is defined in query_is_distinct_agg which is splited from 
the original query_is_distinct_for clause).  

I think I was right until I come into contrib/postgres_fdw/sql/postgres_fdw.sql. 
Per my understanding,  the query the result of  "select max(a) from t"  is unique
since the aggregation function and has no group clause there.  But in the 
postgres_fdw.sql case,  the Query->hasAggs is true for "select distinct 
(select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) 
from ft2 t2 where t2.c2 % 6 = 0 order by 1;"  This looks very strange to me. 
Is my understanding wrong or there is a bug here?

query->hasAggs was set to true in the following call stack. 

 pstate->p_hasAggs = true;     

.. 

 qry->hasAggs =  pstate->p_hasAggs;    


0  in check_agglevels_and_constraints of parse_agg.c:343
1  in transformAggregateCall of parse_agg.c:236
2  in ParseFuncOrColumn of parse_func.c:805
3  in transformFuncCall of parse_expr.c:1558
4  in transformExprRecurse of parse_expr.c:265
5  in transformExpr of parse_expr.c:155
6  in transformTargetEntry of parse_target.c:105
7  in transformTargetList of parse_target.c:193
8  in transformSelectStmt of analyze.c:1224
9  in transformStmt of analyze.c:301

You can see the new updated patch which should fix all the issues you point out 
except the one for supporting group by.   The another reason for this patch will 
not be the final one is because the changes for postgres_fdw.out is too arbitrary. 
uploading it now just for reference.  (The new introduced guc variable can be
removed at last, keeping it now just make sure the testing is easier.)


At a high level, I'm a bit disturbed that this focuses only on DISTINCT
and doesn't (appear to) have any equivalent intelligence for GROUP BY,
though surely that offers much the same opportunities for optimization.
It seems like it'd be worthwhile to take a couple steps back and see
if we couldn't recast the logic to work for both.


OK,  Looks group by is a bit harder  than distinct since the aggregation function. 
I will go through the code to see where to add this logic.  
 

Can we grantee  any_aggr_func(a) == a  if only 1 row returned,  if so, we can do
some work on the pathtarget/reltarget by transforming the Aggref to raw expr.
I checked the execution path of the aggregation call, looks it depends on Agg node 
which is the thing we want to remove.  

We can't grantee  any_aggr_func(a) == a  when only 1 row returned, so the above 
method doesn't work.   do you have any suggestion for this? 
Attachment

pgsql-hackers by date:

Previous
From: Arseny Sher
Date:
Subject: Re: logical copy_replication_slot issues
Next
From: Kartyshov Ivan
Date:
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed