Re: WIP: Aggregation push-down - Mailing list pgsql-hackers

From Andy Fan
Subject Re: WIP: Aggregation push-down
Date
Msg-id CAKU4AWpzp1ZaDdD+hX8q4zpXhgNg9S93zNwmAc4Q3h5Xqj0NSA@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Aggregation push-down  (Antonin Houska <ah@cybertec.at>)
Responses Re: WIP: Aggregation push-down
List pgsql-hackers


On Fri, Apr 24, 2020 at 8:10 PM Antonin Houska <ah@cybertec.at> wrote:
Andy Fan <zhihui.fan1213@gmail.com> wrote:

> The more tests on your patch, the more powerful I feel it is!

Thanks for the appreciation. Given the poor progress it's increasingly hard
for me to find motivation to work on it. I'll try to be more professional :-)


Let's not give up:)  I see your patch can push down the aggregation in 3 cases
at least:

1).  The group by clause exists in the join eq clause.
2).  The group by clause doesn't exist in join eq clause, but we have 
pk on on side of join eq clause. 
3).  The group by clause doesn't exist in join eq clause, and we don't have 
the pk as well. 

Tom well explained the correctness of the first 2 cases [1] and probably the case
3) is correct as well, but it is a bit of hard to understand.  If this is a case for others
as well, that probably make the review harder. 

So my little suggestion is can we split the patch into some smaller commit
to handle each case?  like: commit 1 & 2 handles case 1 & 2,commit 3 handles
join relation as well.   commit 4 handles the case 3.   Commit 5 can avoid the 
two-phase aggregation for case 2.  Commit 6 introduces the aggmultifn.  and
commit 7 to handle some outer join case Ashutosh raised at [2].  However not
all the cases need to be handled at one time. 

Actually when I read the code, I want such separation by my own to verify my
understanding is correct,  but I don't have such time at least now. It maybe much
easier for you since you are the author.  I will be pretty pleasure to help in any case
after I close my current working item (Hopefully in 2 months). 

> At the code level, I did some slight changes on init_grouping_targets which may
> make the code easier to read.  You are free to to use/not use it.

I'm going to accept your change of create_rel_agg_info(), but I hesitate about
the changes to init_grouping_targets().

First, is it worth to spend CPU cycles on construction of an extra list
grouping_columns? Is there a corner case in which we cannot simply pass
grouping_columns=target->exprs to check_functional_grouping()?

Second, it's obvious that you prefer the style

    foreach ()
    {
        if ()
           ...
        else if ()
           ...
        else
           ...
    }

over this

    foreach ()
    {
        if ()
        {
           ...
           continue;
        }

        if ()
        {
           ...
           continue;
        }

        ...
    }

I often prefer the latter and I see that the existing planner code uses this
style quite often too. I think the reason is that it allows for more complex
tests, while the "else-if style" requires all tests to take place inside the
"if ()" expression. However, if several (not necessarily tightly related)
tests become "compressed" this way, it's less obvious how where to put
comments.  Indeed it seems that some comments got lost due to your changes.

Your explanation looks reasonable,  thanks for that.  the changes also used
some builtin function to avoid the one more level for-loop. like tlist_member.  

As for the high level design, based on my current knowledge,  probably no need
to change.  


 

pgsql-hackers by date:

Previous
From: 曾文旌
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: David Rowley
Date:
Subject: Re: Subplan result caching