Thread: WIP: Aggregation push-down - take2

WIP: Aggregation push-down - take2

From
"Fujii.Yuki@df.MitsubishiElectric.co.jp"
Date:
Hi everyone.

I develop postgresql's extension such as fdw in my work.
I'm interested in using postgresql for OLAP.
After [1] having been withdrawn, I reviewed [1].
I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign table.
So, I would like to ask you a question on this patch in this new thread.

I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and this patch.
  1. Things I found
   I execute a query which contain join of postgres_fdw's foreign table and a table and aggregation of the join result.
   In my setting, my prototype reduce this query's response by 93%.
  2. Differences between my prototype and this patch
   (1) Pushdown aggregation of foeign table if FDW pushdown partial aggregation
   (2) postgres_fdw pushdowns some partial aggregations
I attached my prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of accept of this patch's function.
I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any reviewers / committers are
interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Attachment

RE: WIP: Aggregation push-down - take2

From
"Fujii.Yuki@df.MitsubishiElectric.co.jp"
Date:
Hi everyone.

I rebased the following patches which were submitted in [1].
    v17-0001-Introduce-RelInfoList-structure.patch
    v17-0002-Aggregate-push-down-basic-functionality.patch
    v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch

I checked I can apply the rebased patch to commit 2cd2569c72b8920048e35c31c9be30a6170e1410.

I'm going to register the rebased patch in next commitfest.

Sincerely yours,
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

> -----Original Message-----
> From: Fujii.Yuki@df.MitsubishiElectric.co.jp
> <Fujii.Yuki@df.MitsubishiElectric.co.jp>
> Sent: Friday, April 15, 2022 4:33 PM
> To: pgsql-hackers@lists.postgresql.org
> Cc: david@pgmasters.net; ah@cybertec.at; tgl@sss.pgh.pa.us; Tomas Vondra
> <tomas.vondra@enterprisedb.com>; zhihui.fan1213@gmail.com;
> legrand_legrand@hotmail.com; daniel@yesql.se
> Subject: [CAUTION!! MELCO?] WIP: Aggregation push-down - take2
>
> Hi everyone.
>
> I develop postgresql's extension such as fdw in my work.
> I'm interested in using postgresql for OLAP.
> After [1] having been withdrawn, I reviewed [1].
> I think that this patch is realy useful when using OLAP queries.
> Furthermore, I think it would be more useful if this patch works on a foreign
> table.
> So, I would like to ask you a question on this patch in this new thread.
>
> I changed this patch a little and confirmed that my idea is true.
> The followings are things I found and differences of between my prototype and
> this patch.
>   1. Things I found
>    I execute a query which contain join of postgres_fdw's foreign table and a
> table and aggregation of the join result.
>    In my setting, my prototype reduce this query's response by 93%.
>   2. Differences between my prototype and this patch
>    (1) Pushdown aggregation of foeign table if FDW pushdown partial
> aggregation
>    (2) postgres_fdw pushdowns some partial aggregations I attached my
> prototype source code and content of my experiment.
> I want to resume development of this patch if there is some possibility of
> accept of this patch's function.
> I took a contact to Mr.Houska on resuming development of this patch.
> As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any
> reviewers / committers are interested to work on the patch.
> Is anyone interested in my work?
>
> Sincerely yours.
> Yuuki Fujii
>
> [1] https://commitfest.postgresql.org/32/
>
> --
> Yuuki Fujii
> Information Technology R&D Center Mitsubishi Electric Corporation

Attachment

Re: WIP: Aggregation push-down - take2

From
Tomas Vondra
Date:
Hi,

On 7/12/22 08:49, Fujii.Yuki@df.MitsubishiElectric.co.jp wrote:
> Hi everyone.
> 
> I rebased the following patches which were submitted in [1].
>     v17-0001-Introduce-RelInfoList-structure.patch
>     v17-0002-Aggregate-push-down-basic-functionality.patch
>     v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
> 
> I checked I can apply the rebased patch to commit 2cd2569c72b8920048e35c31c9be30a6170e1410.
> 
> I'm going to register the rebased patch in next commitfest.
> 
I've started looking at this patch series again, but I wonder what's the
plan. The last patch version no longer applies, so I rebased it - see
the attachment. The failures were pretty minor, but there're two warnings:

pathnode.c:3174:11: warning: variable 'agg_exprs' set but not used
[-Wunused-but-set-variable]
        Node       *agg_exprs;
                    ^
pathnode.c:3252:11: warning: variable 'agg_exprs' set but not used
[-Wunused-but-set-variable]
        Node       *agg_exprs;
                    ^

so there seem to be some loose ends. Moreover, there are two failures in
make check, due to plan changes like this:

+ Finalize GroupAggregate
    Group Key: p.i
-   ->  Nested Loop
-         ->  Partial HashAggregate
-               Group Key: c1.parent
-               ->  Seq Scan on agg_pushdown_child1 c1
-         ->  Index Scan using agg_pushdown_parent_pkey on ...
-               Index Cond: (i = c1.parent)
-(8 rows)
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on ...
+                     Index Cond: (i = c1.parent)
+(10 rows)

This seems somewhat strange - maybe the plan is correct, but the extra
sort seems unnecessary.

However, maybe I'm confused/missing something? The above message says
v17 having parts 0001-0003, but there's only one patch in v18. So maybe
I failed to apply some prior patch?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: WIP: Aggregation push-down - take2

From
Antonin Houska
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> Hi,
> 
> On 7/12/22 08:49, Fujii.Yuki@df.MitsubishiElectric.co.jp wrote:
> > Hi everyone.
> > 
> > I rebased the following patches which were submitted in [1].
> >     v17-0001-Introduce-RelInfoList-structure.patch
> >     v17-0002-Aggregate-push-down-basic-functionality.patch
> >     v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
> > 
> > I checked I can apply the rebased patch to commit 2cd2569c72b8920048e35c31c9be30a6170e1410.
> > 
> > I'm going to register the rebased patch in next commitfest.
> > 
> I've started looking at this patch series again, but I wonder what's the
> plan. The last patch version no longer applies, so I rebased it - see
> the attachment. The failures were pretty minor, but there're two warnings:
> 
> pathnode.c:3174:11: warning: variable 'agg_exprs' set but not used
> [-Wunused-but-set-variable]
>         Node       *agg_exprs;
>                     ^
> pathnode.c:3252:11: warning: variable 'agg_exprs' set but not used
> [-Wunused-but-set-variable]
>         Node       *agg_exprs;
>                     ^
> 
> so there seem to be some loose ends. Moreover, there are two failures in
> make check, due to plan changes like this:
> 
> + Finalize GroupAggregate
>     Group Key: p.i
> -   ->  Nested Loop
> -         ->  Partial HashAggregate
> -               Group Key: c1.parent
> -               ->  Seq Scan on agg_pushdown_child1 c1
> -         ->  Index Scan using agg_pushdown_parent_pkey on ...
> -               Index Cond: (i = c1.parent)
> -(8 rows)
> +   ->  Sort
> +         Sort Key: p.i
> +         ->  Nested Loop
> +               ->  Partial HashAggregate
> +                     Group Key: c1.parent
> +                     ->  Seq Scan on agg_pushdown_child1 c1
> +               ->  Index Scan using agg_pushdown_parent_pkey on ...
> +                     Index Cond: (i = c1.parent)
> +(10 rows)
> 
> This seems somewhat strange - maybe the plan is correct, but the extra
> sort seems unnecessary.
> 
> However, maybe I'm confused/missing something? The above message says
> v17 having parts 0001-0003, but there's only one patch in v18. So maybe
> I failed to apply some prior patch?

I've rebased the last version I had on my workstation (v17), the regression
tests just worked. Maybe v18 was messed up. v20 is attached.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

From 04219da3cbf5be5e1e6243dc3615a57a3925c7de Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 4 Nov 2022 15:02:57 +0100
Subject: [PATCH 1/3] Introduce RelInfoList structure.

This patch puts join_rel_list and join_rel_hash fields of PlannerInfo
structure into a new structure RelInfoList. It also adjusts add_join_rel() and
find_join_rel() functions so they only call add_rel_info() and find_rel_info()
respectively.

fetch_upper_rel() now uses the new API and the hash table as well because the
list stored in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG] will contain many
relations as soon as the aggregate push-down feature is added.
---
 contrib/postgres_fdw/postgres_fdw.c    |   3 +-
 src/backend/optimizer/geqo/geqo_eval.c |  12 +-
 src/backend/optimizer/plan/planmain.c  |   3 +-
 src/backend/optimizer/util/relnode.c   | 170 ++++++++++++++-----------
 src/include/nodes/pathnodes.h          |  31 +++--
 5 files changed, 126 insertions(+), 93 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8d7500abfb..bb1125e57c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5777,7 +5777,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
      */
     Assert(fpinfo->relation_index == 0);    /* shouldn't be set yet */
     fpinfo->relation_index =
-        list_length(root->parse->rtable) + list_length(root->join_rel_list);
+        list_length(root->parse->rtable) +
+        list_length(root->join_rel_list->items);
 
     return true;
 }
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 004481d608..7ad0baaa0f 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -92,11 +92,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
      *
      * join_rel_level[] shouldn't be in use, so just Assert it isn't.
      */
-    savelength = list_length(root->join_rel_list);
-    savehash = root->join_rel_hash;
+    savelength = list_length(root->join_rel_list->items);
+    savehash = root->join_rel_list->hash;
     Assert(root->join_rel_level == NULL);
 
-    root->join_rel_hash = NULL;
+    root->join_rel_list->hash = NULL;
 
     /* construct the best path for the given combination of relations */
     joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
      * Restore join_rel_list to its former state, and put back original
      * hashtable if any.
      */
-    root->join_rel_list = list_truncate(root->join_rel_list,
-                                        savelength);
-    root->join_rel_hash = savehash;
+    root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+                                               savelength);
+    root->join_rel_list->hash = savehash;
 
     /* release all the memory acquired within gimme_tree */
     MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 63deed27c9..55de28f073 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,7 @@ query_planner(PlannerInfo *root,
      * NOTE: append_rel_list was set up by subquery_planner, so do not touch
      * here.
      */
-    root->join_rel_list = NIL;
-    root->join_rel_hash = NULL;
+    root->join_rel_list = makeNode(RelInfoList);
     root->join_rel_level = NULL;
     root->join_cur_level = 0;
     root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 1786a3dadd..c75d2d1f19 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -32,11 +32,15 @@
 #include "utils/lsyscache.h"
 
 
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
 {
-    Relids        join_relids;    /* hash key --- MUST BE FIRST */
-    RelOptInfo *join_rel;
-} JoinHashEntry;
+    Relids        relids;            /* hash key --- MUST BE FIRST */
+    void       *data;
+} RelInfoEntry;
 
 static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
                                 RelOptInfo *input_rel);
@@ -389,11 +393,11 @@ find_base_rel(PlannerInfo *root, int relid)
 }
 
 /*
- * build_join_rel_hash
- *      Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ *      Construct the auxiliary hash table for relation specific data.
  */
 static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
 {
     HTAB       *hashtab;
     HASHCTL        hash_ctl;
@@ -401,47 +405,49 @@ build_join_rel_hash(PlannerInfo *root)
 
     /* Create the hash table */
     hash_ctl.keysize = sizeof(Relids);
-    hash_ctl.entrysize = sizeof(JoinHashEntry);
+    hash_ctl.entrysize = sizeof(RelInfoEntry);
     hash_ctl.hash = bitmap_hash;
     hash_ctl.match = bitmap_match;
     hash_ctl.hcxt = CurrentMemoryContext;
-    hashtab = hash_create("JoinRelHashTable",
+    hashtab = hash_create("RelHashTable",
                           256L,
                           &hash_ctl,
                           HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
 
     /* Insert all the already-existing joinrels */
-    foreach(l, root->join_rel_list)
+    foreach(l, list->items)
     {
-        RelOptInfo *rel = (RelOptInfo *) lfirst(l);
-        JoinHashEntry *hentry;
+        RelOptInfo       *rel = lfirst_node(RelOptInfo, l);
+        RelInfoEntry *hentry;
         bool        found;
 
-        hentry = (JoinHashEntry *) hash_search(hashtab,
-                                               &(rel->relids),
-                                               HASH_ENTER,
-                                               &found);
+        hentry = (RelInfoEntry *) hash_search(hashtab,
+                                              &rel->relids,
+                                              HASH_ENTER,
+                                              &found);
         Assert(!found);
-        hentry->join_rel = rel;
+        hentry->data = rel;
     }
 
-    root->join_rel_hash = hashtab;
+    list->hash = hashtab;
 }
 
 /*
- * find_join_rel
- *      Returns relation entry corresponding to 'relids' (a set of RT indexes),
- *      or NULL if none exists.  This is for join relations.
+ * find_rel_info
+ *      Find a base or join relation entry.
  */
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static void *
+find_rel_info(RelInfoList *list, Relids relids)
 {
+    if (list == NULL)
+        return NULL;
+
     /*
      * Switch to using hash lookup when list grows "too long".  The threshold
      * is arbitrary and is known only here.
      */
-    if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
-        build_join_rel_hash(root);
+    if (!list->hash && list_length(list->items) > 32)
+        build_rel_hash(list);
 
     /*
      * Use either hashtable lookup or linear search, as appropriate.
@@ -451,34 +457,82 @@ find_join_rel(PlannerInfo *root, Relids relids)
      * so would force relids out of a register and thus probably slow down the
      * list-search case.
      */
-    if (root->join_rel_hash)
+    if (list->hash)
     {
         Relids        hashkey = relids;
-        JoinHashEntry *hentry;
+        RelInfoEntry *hentry;
 
-        hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-                                               &hashkey,
-                                               HASH_FIND,
-                                               NULL);
+        hentry = (RelInfoEntry *) hash_search(list->hash,
+                                              &hashkey,
+                                              HASH_FIND,
+                                              NULL);
         if (hentry)
-            return hentry->join_rel;
+            return hentry->data;
     }
     else
     {
         ListCell   *l;
 
-        foreach(l, root->join_rel_list)
+        foreach(l, list->items)
         {
-            RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+            RelOptInfo   *item = lfirst_node(RelOptInfo, l);
 
-            if (bms_equal(rel->relids, relids))
-                return rel;
+            if (bms_equal(item->relids, relids))
+                return item;
         }
     }
 
     return NULL;
 }
 
+/*
+ * find_join_rel
+ *      Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ *      or NULL if none exists.  This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+    return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ *        Add relation specific info to a list, and also add it to the auxiliary
+ *        hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+    /* GEQO requires us to append the new joinrel to the end of the list! */
+    list->items = lappend(list->items, rel);
+
+    /* store it into the auxiliary hashtable if there is one. */
+    if (list->hash)
+    {
+        RelInfoEntry *hentry;
+        bool        found;
+
+        hentry = (RelInfoEntry *) hash_search(list->hash,
+                                              &rel->relids,
+                                              HASH_ENTER,
+                                              &found);
+        Assert(!found);
+        hentry->data = rel;
+    }
+}
+
+/*
+ * add_join_rel
+ *        Add given join relation to the list of join relations in the given
+ *        PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+    add_rel_info(root->join_rel_list, joinrel);
+}
+
 /*
  * set_foreign_rel_properties
  *        Set up foreign-join fields if outer and inner relation are foreign
@@ -529,32 +583,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
     }
 }
 
-/*
- * add_join_rel
- *        Add given join relation to the list of join relations in the given
- *        PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
-    /* GEQO requires us to append the new joinrel to the end of the list! */
-    root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
-    /* store it into the auxiliary hashtable if there is one. */
-    if (root->join_rel_hash)
-    {
-        JoinHashEntry *hentry;
-        bool        found;
-
-        hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-                                               &(joinrel->relids),
-                                               HASH_ENTER,
-                                               &found);
-        Assert(!found);
-        hentry->join_rel = joinrel;
-    }
-}
-
 /*
  * build_join_rel
  *      Returns relation entry corresponding to the union of two given rels,
@@ -1223,22 +1251,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
 RelOptInfo *
 fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 {
+    RelInfoList *list = &root->upper_rels[kind];
     RelOptInfo *upperrel;
-    ListCell   *lc;
-
-    /*
-     * For the moment, our indexing data structure is just a List for each
-     * relation kind.  If we ever get so many of one kind that this stops
-     * working well, we can improve it.  No code outside this function should
-     * assume anything about how to find a particular upperrel.
-     */
 
     /* If we already made this upperrel for the query, return it */
-    foreach(lc, root->upper_rels[kind])
+    if (list)
     {
-        upperrel = (RelOptInfo *) lfirst(lc);
-
-        if (bms_equal(upperrel->relids, relids))
+        upperrel = find_rel_info(list, relids);
+        if (upperrel)
             return upperrel;
     }
 
@@ -1257,7 +1277,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
     upperrel->cheapest_unique_path = NULL;
     upperrel->cheapest_parameterized_paths = NIL;
 
-    root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+    add_rel_info(&root->upper_rels[kind], upperrel);
 
     return upperrel;
 }
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 09342d128d..0ca7d5ab51 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
     /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;
 
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when rel_hash is not NULL.  Note that we still maintain
+ * the list even when using the hash table for lookups; this simplifies life
+ * for GEQO.
+ */
+typedef struct RelInfoList
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    List       *items;
+    struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
 /*----------
  * PlannerGlobal
  *        Global information for planning/optimization
@@ -260,15 +279,9 @@ struct PlannerInfo
 
     /*
      * join_rel_list is a list of all join-relation RelOptInfos we have
-     * considered in this planning run.  For small problems we just scan the
-     * list to do lookups, but when there are many join relations we build a
-     * hash table for faster lookups.  The hash table is present and valid
-     * when join_rel_hash is not NULL.  Note that we still maintain the list
-     * even when using the hash table for lookups; this simplifies life for
-     * GEQO.
+     * considered in this planning run.
      */
-    List       *join_rel_list;
-    struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+    struct RelInfoList *join_rel_list;    /* list of join-relation RelOptInfos */
 
     /*
      * When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -395,7 +408,7 @@ struct PlannerInfo
      * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
      * upper rel.
      */
-    List       *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+    RelInfoList       upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
     /* Result tlists chosen by grouping_planner for upper-stage processing */
     struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
-- 
2.31.1

From 36e6671a291d8f69293c89312a4064d7e5b2382c Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 4 Nov 2022 15:02:57 +0100
Subject: [PATCH 2/3] Aggregate push-down - basic functionality.

With this patch, partial aggregation can be applied to a base relation or to a
join, and the resulting "grouped" relations can be joined to other "plain"
relations. Once all tables are joined, the aggregation is finalized. See
README for more information.

The next patches will enable the aggregate push-down feature for parallel
query processing, for partitioned tables and for foreign tables.
---
 src/backend/commands/trigger.c             |   2 +-
 src/backend/optimizer/README               |  69 ++
 src/backend/optimizer/path/allpaths.c      | 147 ++++
 src/backend/optimizer/path/costsize.c      |  17 +-
 src/backend/optimizer/path/equivclass.c    | 130 +++
 src/backend/optimizer/path/joinrels.c      | 193 ++++-
 src/backend/optimizer/plan/initsplan.c     | 289 +++++++
 src/backend/optimizer/plan/planmain.c      |  12 +
 src/backend/optimizer/plan/planner.c       |  44 +-
 src/backend/optimizer/plan/setrefs.c       |  33 +
 src/backend/optimizer/prep/prepagg.c       | 264 +++---
 src/backend/optimizer/prep/prepjointree.c  |   1 +
 src/backend/optimizer/util/pathnode.c      | 144 +++-
 src/backend/optimizer/util/relnode.c       | 949 ++++++++++++++++++++-
 src/backend/optimizer/util/tlist.c         |  31 +
 src/backend/utils/misc/guc_tables.c        |  10 +
 src/include/nodes/pathnodes.h              |  94 ++
 src/include/optimizer/clauses.h            |   3 +-
 src/include/optimizer/pathnode.h           |  19 +-
 src/include/optimizer/paths.h              |   6 +
 src/include/optimizer/planmain.h           |   1 +
 src/include/optimizer/prep.h               |   2 +
 src/include/optimizer/tlist.h              |   4 +-
 src/test/regress/expected/agg_pushdown.out | 216 +++++
 src/test/regress/expected/sysviews.out     |   3 +-
 src/test/regress/expected/triggers.out     |   2 +-
 src/test/regress/parallel_schedule         |   2 +
 src/test/regress/sql/agg_pushdown.sql      | 115 +++
 28 files changed, 2608 insertions(+), 194 deletions(-)
 create mode 100644 src/test/regress/expected/agg_pushdown.out
 create mode 100644 src/test/regress/sql/agg_pushdown.sql

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index e64145e710..182e6161e0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -264,7 +264,7 @@ CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                          errmsg("\"%s\" is a partitioned table",
                                 RelationGetRelationName(rel)),
-                         errdetail("ROW triggers with transition tables are not supported on partitioned tables.")));
+                         errdetail("Triggers on partitioned tables cannot have transition tables.")));
         }
     }
     else if (rel->rd_rel->relkind == RELKIND_VIEW)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 41c120e0cd..2fd1a96269 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1158,3 +1158,72 @@ breaking down aggregation or grouping over a partitioned relation into
 aggregation or grouping over its partitions is called partitionwise
 aggregation.  Especially when the partition keys match the GROUP BY clause,
 this can be significantly faster than the regular method.
+
+Aggregate push-down
+-------------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of Agg node. However, if the groups are large enough, it may be more
+efficient to apply the partial aggregation to the output of base relation
+scan, and finalize it when we have all relations of the query joined:
+
+  EXPLAIN
+  SELECT a.i, avg(b.y)
+  FROM a JOIN b ON b.j = a.i
+  GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Nested Loop
+          ->  Partial HashAggregate
+                Group Key: b.j
+                ->  Seq Scan on b
+          ->  Index Only Scan using a_pkey on a
+                Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that there's often no GROUP BY expression to be used for the partial
+aggregation, so we use equivalence classes to derive grouping expression: in
+the example above, the grouping key "b.j" was derived from "a.i".
+
+Also note that in this case the partial aggregate uses the "b.j" as grouping
+column although the column does not appear in the query target list. The point
+is that "b.j" is needed to evaluate the join condition, and there's no other
+way for the partial aggregate to emit its values.
+
+Besides base relation, the aggregation can also be pushed down to join:
+
+  EXPLAIN
+  SELECT a.i, avg(b.y + c.v)
+  FROM   a JOIN b ON b.j = a.i
+         JOIN c ON c.k = a.i
+  WHERE b.j = c.k GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Hash Join
+      Hash Cond: (b.j = a.i)
+      ->  Partial HashAggregate
+        Group Key: b.j
+        ->  Hash Join
+              Hash Cond: (b.j = c.k)
+              ->  Seq Scan on b
+              ->  Hash
+                ->  Seq Scan on c
+      ->  Hash
+        ->  Seq Scan on a
+
+Whether the Agg node is created out of base relation or out of join, it's
+added to a separate RelOptInfo that we call "grouped relation". Grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. Join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4ddaed31a4..f00f900ff4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -62,6 +62,7 @@ typedef struct pushdown_safety_info
 
 /* These parameters are set by GUC */
 bool        enable_geqo = false;    /* just in case GUC doesn't set it */
+bool        enable_agg_pushdown;
 int            geqo_threshold;
 int            min_parallel_table_scan_size;
 int            min_parallel_index_scan_size;
@@ -75,6 +76,7 @@ join_search_hook_type join_search_hook = NULL;
 
 static void set_base_rel_consider_startup(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
                          Index rti, RangeTblEntry *rte);
@@ -126,6 +128,9 @@ static void set_result_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                 RangeTblEntry *rte);
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                    RangeTblEntry *rte);
+static void add_grouped_path(PlannerInfo *root, RelOptInfo *rel,
+                             Path *subpath, AggStrategy aggstrategy,
+                             RelAggInfo *agg_info);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                       pushdown_safety_info *safetyInfo);
@@ -188,6 +193,13 @@ make_one_rel(PlannerInfo *root, List *joinlist)
      */
     set_base_rel_sizes(root);
 
+    /*
+     * Now that the sizes are known, we can estimate the sizes of the grouped
+     * relations.
+     */
+    if (root->grouped_var_list)
+        setup_base_grouped_rels(root);
+
     /*
      * We should now have size estimates for every actual table involved in
      * the query, and we also know which if any have been deleted from the
@@ -328,6 +340,48 @@ set_base_rel_sizes(PlannerInfo *root)
     }
 }
 
+/*
+ * setup_based_grouped_rels
+ *      For each "plain" relation build a grouped relation if aggregate pushdown
+ *    is possible and if this relation is suitable for partial aggregation.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+    Index        rti;
+
+    for (rti = 1; rti < root->simple_rel_array_size; rti++)
+    {
+        RelOptInfo *brel = root->simple_rel_array[rti];
+        RelOptInfo *rel_grouped;
+        RelAggInfo *agg_info;
+
+        /* there may be empty slots corresponding to non-baserel RTEs */
+        if (brel == NULL)
+            continue;
+
+        Assert(brel->relid == rti); /* sanity check on array */
+
+        /*
+         * The aggregate push-down feature only makes sense if there are
+         * multiple base rels in the query.
+         */
+        if (!bms_nonempty_difference(root->all_baserels, brel->relids))
+            continue;
+
+        /* ignore RTEs that are "other rels" */
+        if (brel->reloptkind != RELOPT_BASEREL)
+            continue;
+
+        rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
+        if (rel_grouped)
+        {
+            /* Make the relation available for joining. */
+            add_grouped_rel(root, rel_grouped, agg_info);
+        }
+    }
+}
+
 /*
  * set_base_rel_pathlists
  *      Finds all paths available for scanning each base-relation entry.
@@ -500,8 +554,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                 }
                 else
                 {
+                    RelOptInfo *rel_grouped;
+                    RelAggInfo *agg_info;
+
                     /* Plain relation */
                     set_plain_rel_pathlist(root, rel, rte);
+
+                    /* Add paths to the grouped relation if one exists. */
+                    rel_grouped = find_grouped_rel(root, rel->relids,
+                                                   &agg_info);
+                    if (rel_grouped)
+                    {
+                        generate_grouping_paths(root, rel_grouped, rel,
+                                                agg_info);
+                        set_cheapest(rel_grouped);
+                    }
                 }
                 break;
             case RTE_SUBQUERY:
@@ -3263,6 +3330,80 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
     }
 }
 
+/*
+ * generate_grouping_paths
+ *         Create partially aggregated paths and add them to grouped relation.
+ *
+ * "rel_plain" is base or join relation whose paths are not grouped.
+ */
+void
+generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+                        RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+    ListCell   *lc;
+
+    if (IS_DUMMY_REL(rel_plain))
+    {
+        mark_dummy_rel(rel_grouped);
+        return;
+    }
+
+    foreach(lc, rel_plain->pathlist)
+    {
+        Path       *path = (Path *) lfirst(lc);
+
+        /*
+         * Since the path originates from the non-grouped relation which is
+         * not aware of the aggregate push-down, we must ensure that it
+         * provides the correct input for aggregation.
+         */
+        path = (Path *) create_projection_path(root, rel_grouped, path,
+                                               agg_info->agg_input);
+
+        /*
+         * add_grouped_path() will check whether the path has suitable
+         * pathkeys.
+         */
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info);
+
+        /*
+         * Repeated creation of hash table (for new parameter values) should
+         * be possible, does not sound like a good idea in terms of
+         * efficiency.
+         */
+        if (path->param_info == NULL)
+            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info);
+    }
+
+    /* Could not generate any grouped paths? */
+    if (rel_grouped->pathlist == NIL)
+        mark_dummy_rel(rel_grouped);
+}
+
+/*
+ * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
+ */
+static void
+add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+                 AggStrategy aggstrategy, RelAggInfo *agg_info)
+{
+    Path       *agg_path;
+
+
+    if (aggstrategy == AGG_HASHED)
+        agg_path = (Path *) create_agg_hashed_path(root, rel, subpath,
+                                                   agg_info);
+    else if (aggstrategy == AGG_SORTED)
+        agg_path = (Path *) create_agg_sorted_path(root, rel, subpath,
+                                                   agg_info);
+    else
+        elog(ERROR, "unexpected strategy %d", aggstrategy);
+
+    /* Add the grouped path to the list of grouped base paths. */
+    if (agg_path != NULL)
+        add_path(rel, (Path *) agg_path);
+}
+
 /*
  * make_rel_from_joinlist
  *      Build access paths using a "joinlist" to guide the join path search.
@@ -3404,6 +3545,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 
     for (lev = 2; lev <= levels_needed; lev++)
     {
+        RelOptInfo *rel_grouped;
         ListCell   *lc;
 
         /*
@@ -3441,6 +3583,11 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
             /* Find and save the cheapest paths for this rel */
             set_cheapest(rel);
 
+            /* The same for grouped relation if one exists. */
+            rel_grouped = find_grouped_rel(root, rel->relids, NULL);
+            if (rel_grouped)
+                set_cheapest(rel_grouped);
+
 #ifdef OPTIMIZER_DEBUG
             debug_print_rel(root, rel);
 #endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4c6b1d1f55..b34ad90d08 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4999,7 +4999,6 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
                                0,
                                JOIN_INNER,
                                NULL);
-
     rel->rows = clamp_row_est(nrows);
 
     cost_qual_eval(&rel->baserestrictcost, rel->baserestrictinfo, root);
@@ -6016,11 +6015,11 @@ set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target)
     foreach(lc, target->exprs)
     {
         Node       *node = (Node *) lfirst(lc);
+        int32        item_width;
 
         if (IsA(node, Var))
         {
             Var           *var = (Var *) node;
-            int32        item_width;
 
             /* We should not see any upper-level Vars here */
             Assert(var->varlevelsup == 0);
@@ -6052,6 +6051,20 @@ set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target)
             Assert(item_width > 0);
             tuple_width += item_width;
         }
+        else if (IsA(node, Aggref))
+        {
+            /*
+             * If the target is evaluated by AggPath, it'll care of cost
+             * estimate. If the target is above AggPath (typically target of a
+             * join relation that contains grouped relation), the cost of
+             * Aggref should not be accounted for again.
+             *
+             * On the other hand, width is always needed.
+             */
+            item_width = get_typavgwidth(exprType(node), exprTypmod(node));
+            Assert(item_width > 0);
+            tuple_width += item_width;
+        }
         else
         {
             /*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index e65b967b1f..483daeb5de 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -3149,6 +3149,136 @@ is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist)
     return false;
 }
 
+/*
+ * translate_expression_to_rels
+ *        If the appropriate equivalence classes exist, replace vars in
+ *        gvi->gvexpr with vars whose varno is equal to relid. Return NULL if
+ *        translation is not possible or needed.
+ *
+ * Note: Currently we only translate Var expressions. This is subject to
+ * change as the aggregate push-down feature gets enhanced.
+ */
+GroupedVarInfo *
+translate_expression_to_rel(PlannerInfo *root, GroupedVarInfo *gvi,
+                            Index relid)
+{
+    Var           *var;
+    ListCell   *l1;
+    bool        found_orig = false;
+    Var           *var_translated = NULL;
+    GroupedVarInfo *result;
+
+    /* Can't do anything w/o equivalence classes. */
+    if (root->eq_classes == NIL)
+        return NULL;
+
+    var = castNode(Var, gvi->gvexpr);
+
+    /*
+     * Do we need to translate the var?
+     */
+    if (var->varno == relid)
+        return NULL;
+
+    /*
+     * Find the replacement var.
+     */
+    foreach(l1, root->eq_classes)
+    {
+        EquivalenceClass *ec = lfirst_node(EquivalenceClass, l1);
+        ListCell   *l2;
+
+        /* TODO Check if any other EC kind should be ignored. */
+        if (ec->ec_has_volatile || ec->ec_below_outer_join || ec->ec_broken)
+            continue;
+
+        /* Single-element EC can hardly help in translations. */
+        if (list_length(ec->ec_members) == 1)
+            continue;
+
+        /*
+         * Collect all vars of this EC and their varnos.
+         *
+         * ec->ec_relids does not help because we're only interested in a
+         * subset of EC members.
+         */
+        foreach(l2, ec->ec_members)
+        {
+            EquivalenceMember *em = lfirst_node(EquivalenceMember, l2);
+            Var           *ec_var;
+
+            /*
+             * The grouping expressions derived here are used to evaluate
+             * possibility to push aggregation down to RELOPT_BASEREL or
+             * RELOPT_JOINREL relations, and to construct reltargets for the
+             * grouped rels. We're not interested at the moment whether the
+             * relations do have children.
+             */
+            if (em->em_is_child)
+                continue;
+
+            if (!IsA(em->em_expr, Var))
+                continue;
+
+            ec_var = castNode(Var, em->em_expr);
+            if (equal(ec_var, var))
+                found_orig = true;
+            else if (ec_var->varno == relid)
+                var_translated = ec_var;
+
+            if (found_orig && var_translated)
+            {
+                /*
+                 * The replacement Var must have the same data type, otherwise
+                 * the values are not guaranteed to be grouped in the same way
+                 * as values of the original Var.
+                 */
+                if (ec_var->vartype != var->vartype)
+                    return NULL;
+
+                break;
+            }
+        }
+
+        if (found_orig)
+        {
+            /*
+             * The same expression probably does not exist in multiple ECs.
+             */
+            if (var_translated == NULL)
+            {
+                /*
+                 * Failed to translate the expression.
+                 */
+                return NULL;
+            }
+            else
+            {
+                /* Success. */
+                break;
+            }
+        }
+        else
+        {
+            /*
+             * Vars of the requested relid can be in the next ECs too.
+             */
+            var_translated = NULL;
+        }
+    }
+
+    if (!found_orig)
+        return NULL;
+
+    result = makeNode(GroupedVarInfo);
+    memcpy(result, gvi, sizeof(GroupedVarInfo));
+
+    result->gv_eval_at = bms_make_singleton(relid);
+    result->gvexpr = (Expr *) var_translated;
+
+    return result;
+}
+
 /*
  * is_redundant_with_indexclauses
  *        Test whether rinfo is redundant with any clause in the IndexClause
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..09a92541c0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -21,6 +21,7 @@
 #include "optimizer/paths.h"
 #include "partitioning/partbounds.h"
 #include "utils/memutils.h"
+#include "utils/selfuncs.h"
 
 
 static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +36,10 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
 static bool restriction_is_constant_false(List *restrictlist,
                                           RelOptInfo *joinrel,
                                           bool only_pushed_down);
+static RelOptInfo *make_join_rel_common(PlannerInfo *root, RelOptInfo *rel1,
+                                        RelOptInfo *rel2,
+                                        RelAggInfo *agg_info,
+                                        RelOptInfo *rel_agg_input);
 static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
                                         RelOptInfo *rel2, RelOptInfo *joinrel,
                                         SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -669,21 +674,20 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
     return true;
 }
 
-
 /*
- * make_join_rel
- *       Find or create a join RelOptInfo that represents the join of
- *       the two given rels, and add to it path information for paths
- *       created with the two rels as outer and inner rel.
- *       (The join rel may already contain paths generated from other
- *       pairs of rels that add up to the same set of base rels.)
+ * make_join_rel_common
+ *     The workhorse of make_join_rel().
+ *
+ *    'agg_info' contains the reltarget of grouped relation and everything we
+ *    need to aggregate the join result. If NULL, then the join relation should
+ *    not be grouped.
  *
- * NB: will return NULL if attempted join is not valid.  This can happen
- * when working with outer joins, or with IN or EXISTS clauses that have been
- * turned into joins.
+ *    'rel_agg_input' describes the AggPath input relation if the join output
+ *    should be aggregated. If NULL is passed, do not aggregate the join output.
  */
-RelOptInfo *
-make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
+static RelOptInfo *
+make_join_rel_common(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
+                     RelAggInfo *agg_info, RelOptInfo *rel_agg_input)
 {
     Relids        joinrelids;
     SpecialJoinInfo *sjinfo;
@@ -744,7 +748,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
      * goes with this particular joining.
      */
     joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-                             &restrictlist);
+                             &restrictlist, agg_info);
 
     /*
      * If we've already proven this join is empty, we needn't consider any
@@ -757,14 +761,173 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
     }
 
     /* Add paths to the join relation. */
-    populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-                                restrictlist);
+    if (rel_agg_input == NULL)
+    {
+        /*
+         * Simply join the input relations, whether both are plain or one of
+         * them is grouped.
+         */
+        populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+                                    restrictlist);
+    }
+    else
+    {
+        /* The join relation is grouped. */
+        Assert(agg_info != NULL);
+
+        /*
+         * Apply partial aggregation to the paths of rel_agg_input and add the
+         * resulting paths to joinrel.
+         */
+        generate_grouping_paths(root, joinrel, rel_agg_input, agg_info);
+    }
 
     bms_free(joinrelids);
 
     return joinrel;
 }
 
+/*
+ * make_join_rel_combined
+ *     Join grouped relation to non-grouped one.
+ */
+static void
+make_join_rel_combined(PlannerInfo *root, RelOptInfo *rel1,
+                       RelOptInfo *rel2,
+                       RelAggInfo *agg_info)
+{
+    RelOptInfo *rel1_grouped;
+    RelOptInfo *rel2_grouped;
+    bool        rel1_grouped_useful = false;
+    bool        rel2_grouped_useful = false;
+
+    /* Retrieve the grouped relations. */
+    rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+    rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+    /*
+     * Dummy rel may indicate a join relation that is able to generate grouped
+     * paths as such (i.e. it has valid agg_info), but for which the path
+     * actually could not be created (e.g. only AGG_HASHED strategy was
+     * possible but work_mem was not sufficient for hash table).
+     */
+    rel1_grouped_useful = rel1_grouped != NULL && !IS_DUMMY_REL(rel1_grouped);
+    rel2_grouped_useful = rel2_grouped != NULL && !IS_DUMMY_REL(rel2_grouped);
+
+    /* Nothing to do if there's no grouped relation. */
+    if (!rel1_grouped_useful && !rel2_grouped_useful)
+        return;
+
+    if (rel1_grouped_useful)
+        make_join_rel_common(root, rel1_grouped, rel2, agg_info, NULL);
+
+    if (rel2_grouped_useful)
+        make_join_rel_common(root, rel1, rel2_grouped, agg_info, NULL);
+
+    /*
+     * Join of two grouped relations is currently not supported. In such a
+     * case, grouping of one side would change the occurrence of the other
+     * side's aggregate transient states on the input of the final
+     * aggregation. This can be handled by adjusting the transient states, but
+     * it's not worth the effort because it's hard to find a use case for this
+     * kind of join.
+     *
+     * XXX If the join of two grouped rels is implemented someday, note that
+     * both rels can have aggregates, so it'd be hard to join grouped rel to
+     * non-grouped here: 1) such a "mixed join" would require a special
+     * target, 2) both AGGSPLIT_FINAL_DESERIAL and AGGSPLIT_SIMPLE aggregates
+     * could appear in the target of the final aggregation node, originating
+     * from the grouped and the non-grouped input rel respectively.
+     */
+}
+
+/*
+ * make_join_rel
+ *       Find or create a join RelOptInfo that represents the join of
+ *       the two given rels, and add to it path information for paths
+ *       created with the two rels as outer and inner rel.
+ *       (The join rel may already contain paths generated from other
+ *       pairs of rels that add up to the same set of base rels.)
+ *
+ *       In addition to creating an ordinary join relation, try to create a
+ *       grouped one. There are two strategies to achieve that: join a grouped
+ *       relation to plain one, or join two plain relations and apply partial
+ *       aggregation to the result.
+ *
+ * NB: will return NULL if attempted join is not valid.  This can happen when
+ * working with outer joins, or with IN or EXISTS clauses that have been
+ * turned into joins. Besides that, NULL is also returned if caller is
+ * interested in a grouped relation but it could not be created.
+ *
+ * Only the plain relation is returned; if grouped relation exists, it can be
+ * retrieved using find_grouped_rel().
+ */
+RelOptInfo *
+make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
+{
+    Relids        joinrelids;
+    RelAggInfo *agg_info = NULL;
+    RelOptInfo *joinrel,
+               *joinrel_plain;
+
+    /* 1) form the plain join. */
+    joinrel = make_join_rel_common(root, rel1, rel2, NULL, NULL);
+    joinrel_plain = joinrel;
+
+    if (joinrel_plain == NULL)
+        return joinrel_plain;
+
+    /*
+     * We're done if there are no grouping expressions nor aggregates.
+     */
+    if (root->grouped_var_list == NIL)
+        return joinrel_plain;
+
+    joinrelids = bms_union(rel1->relids, rel2->relids);
+    joinrel = find_grouped_rel(root, joinrelids, &agg_info);
+
+    if (joinrel != NULL)
+    {
+        /*
+         * If the same grouped joinrel was already formed, just with the base
+         * rels divided between rel1 and rel2 in a different way, the matching
+         * agg_info should already be there.
+         */
+        Assert(agg_info != NULL);
+    }
+    else
+    {
+        /*
+         * agg_info must be created from scratch.
+         */
+        agg_info = create_rel_agg_info(root, joinrel_plain);
+
+        /* Cannot we build grouped join? */
+        if (agg_info == NULL)
+            return joinrel_plain;
+
+        /*
+         * The number of aggregate input rows is simply the number of rows of
+         * the non-grouped relation, which should have been estimated by now.
+         */
+        agg_info->input_rows = joinrel_plain->rows;
+    }
+
+    /*
+     * 2) join two plain rels and aggregate the join paths. Aggregate
+     * push-down only makes sense if the join is not the top-level one.
+     */
+    if (bms_nonempty_difference(root->all_baserels, joinrelids))
+        make_join_rel_common(root, rel1, rel2, agg_info, joinrel_plain);
+
+    /*
+     * 3) combine plain and grouped relations.
+     */
+    make_join_rel_combined(root, rel1, rel2, agg_info);
+
+    return joinrel_plain;
+}
+
 /*
  * populate_joinrel_with_paths
  *      Add paths to the given joinrel for given pair of joining relations. The
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index fd8cbb1dc7..8e913c92d8 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/nbtree.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
@@ -48,6 +49,8 @@ typedef struct PostponedQual
 } PostponedQual;
 
 
+static void create_aggregate_grouped_var_infos(PlannerInfo *root);
+static void create_grouping_expr_grouped_var_infos(PlannerInfo *root);
 static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
                                        Index rtindex);
 static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -270,6 +273,292 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
     }
 }
 
+/*
+ * Add GroupedVarInfo to grouped_var_list for each aggregate as well as for
+ * each possible grouping expression.
+ *
+ * root->group_pathkeys must be setup before this function is called.
+ */
+extern void
+setup_aggregate_pushdown(PlannerInfo *root)
+{
+    ListCell   *lc;
+
+    /*
+     * Isn't user interested in the aggregate push-down feature?
+     */
+    if (!enable_agg_pushdown)
+        return;
+
+    /* The feature can only be applied to grouped aggregation. */
+    if (!root->parse->groupClause)
+        return;
+
+    /*
+     * Grouping sets require multiple different groupings but the base
+     * relation can only generate one.
+     */
+    if (root->parse->groupingSets)
+        return;
+
+    /*
+     * SRF is not allowed in the aggregate argument and we don't even want it
+     * in the GROUP BY clause, so forbid it in general. It needs to be
+     * analyzed if evaluation of a GROUP BY clause containing SRF below the
+     * query targetlist would be correct. Currently it does not seem to be an
+     * important use case.
+     */
+    if (root->parse->hasTargetSRFs)
+        return;
+
+    /* Create GroupedVarInfo per (distinct) aggregate. */
+    create_aggregate_grouped_var_infos(root);
+
+    /* Isn't there any aggregate to be pushed down? */
+    if (root->grouped_var_list == NIL)
+        return;
+
+    /* Create GroupedVarInfo per grouping expression. */
+    create_grouping_expr_grouped_var_infos(root);
+
+    /* Isn't there any useful grouping expression for aggregate push-down? */
+    if (root->grouped_var_list == NIL)
+        return;
+
+    /*
+     * Now that we know that grouping can be pushed down, search for the
+     * maximum sortgroupref. The base relations may need it if extra grouping
+     * expressions get added to them.
+     */
+    Assert(root->max_sortgroupref == 0);
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+        if (te->ressortgroupref > root->max_sortgroupref)
+            root->max_sortgroupref = te->ressortgroupref;
+    }
+}
+
+/*
+ * Create GroupedVarInfo for each distinct aggregate.
+ *
+ * If any aggregate is not suitable, set root->grouped_var_list to NIL and
+ * return.
+ */
+static void
+create_aggregate_grouped_var_infos(PlannerInfo *root)
+{
+    List       *tlist_exprs;
+    ListCell   *lc;
+
+    Assert(root->grouped_var_list == NIL);
+
+    tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+                                  PVC_INCLUDE_AGGREGATES);
+
+    /*
+     * Although GroupingFunc is related to root->parse->groupingSets, this
+     * field does not necessarily reflect its presence.
+     */
+    foreach(lc, tlist_exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+
+        if (IsA(expr, GroupingFunc))
+            return;
+    }
+
+    /*
+     * Aggregates within the HAVING clause need to be processed in the same
+     * way as those in the main targetlist.
+     *
+     * Note that the contained aggregates will be pushed down, but the
+     * containing HAVING clause must be ignored until the aggregation is
+     * finalized.
+     */
+    if (root->parse->havingQual != NULL)
+    {
+        List       *having_exprs;
+
+        having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+                                       PVC_INCLUDE_AGGREGATES);
+        if (having_exprs != NIL)
+            tlist_exprs = list_concat(tlist_exprs, having_exprs);
+    }
+
+    if (tlist_exprs == NIL)
+        return;
+
+    foreach(lc, tlist_exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+        Aggref       *aggref;
+        ListCell   *lc2;
+        GroupedVarInfo *gvi;
+        bool        exists;
+
+        /*
+         * tlist_exprs may also contain Vars, but we only need Aggrefs.
+         */
+        if (IsA(expr, Var))
+            continue;
+
+        aggref = castNode(Aggref, expr);
+
+        /* TODO Think if (some of) these can be handled. */
+        if (aggref->aggvariadic ||
+            aggref->aggdirectargs || aggref->aggorder ||
+            aggref->aggdistinct)
+        {
+            /*
+             * Aggregation push-down is not useful if at least one aggregate
+             * cannot be evaluated below the top-level join.
+             *
+             * XXX Is it worth freeing the GroupedVarInfos and their subtrees?
+             */
+            root->grouped_var_list = NIL;
+            break;
+        }
+
+        /* Does GroupedVarInfo for this aggregate already exist? */
+        exists = false;
+        foreach(lc2, root->grouped_var_list)
+        {
+            gvi = lfirst_node(GroupedVarInfo, lc2);
+
+            if (equal(expr, gvi->gvexpr))
+            {
+                exists = true;
+                break;
+            }
+        }
+
+        /* Construct a new GroupedVarInfo if does not exist yet. */
+        if (!exists)
+        {
+            Relids        relids;
+
+            gvi = makeNode(GroupedVarInfo);
+            gvi->gvexpr = (Expr *) copyObject(aggref);
+
+            /* Find out where the aggregate should be evaluated. */
+            relids = pull_varnos(root, (Node *) aggref);
+            if (!bms_is_empty(relids))
+                gvi->gv_eval_at = relids;
+            else
+                gvi->gv_eval_at = NULL;
+
+            root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+        }
+    }
+
+    list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupedVarInfo for each expression usable as grouping key.
+ *
+ * In addition to the expressions of the query targetlist, group_pathkeys is
+ * also considered the source of grouping expressions. That increases the
+ * chance to get the relation output grouped.
+ */
+static void
+create_grouping_expr_grouped_var_infos(PlannerInfo *root)
+{
+    ListCell   *l1,
+               *l2;
+    List       *exprs = NIL;
+    List       *sortgrouprefs = NIL;
+
+    /*
+     * Make sure GroupedVarInfo exists for each expression usable as grouping
+     * key.
+     */
+    foreach(l1, root->parse->groupClause)
+    {
+        SortGroupClause *sgClause;
+        TargetEntry *te;
+        Index        sortgroupref;
+        TypeCacheEntry *tce;
+        Oid            equalimageproc;
+
+        sgClause = lfirst_node(SortGroupClause, l1);
+        te = get_sortgroupclause_tle(sgClause, root->processed_tlist);
+        sortgroupref = te->ressortgroupref;
+
+        Assert(sortgroupref > 0);
+
+        /*
+         * Non-zero sortgroupref does not necessarily imply grouping
+         * expression: data can also be sorted by aggregate.
+         */
+        if (IsA(te->expr, Aggref))
+            continue;
+
+        /*
+         * The aggregate push-down feature currently supports only plain Vars
+         * as grouping expressions.
+         */
+        if (!IsA(te->expr, Var))
+        {
+            root->grouped_var_list = NIL;
+            return;
+        }
+
+        /*
+         * Aggregate push-down is only possible if equality of grouping keys
+         * per the equality operator implies bitwise equality. Otherwise, if
+         * we put keys of different byte images into the same group, we lose
+         * some information that may be needed to evaluate join clauses above
+         * the pushed-down aggregate node, or the WHERE clause.
+         *
+         * For example, the NUMERIC data type is not supported because values
+         * that fall into the same group according to the equality operator
+         * (e.g. 0 and 0.0) can have different scale.
+         */
+        tce = lookup_type_cache(exprType((Node *) te->expr),
+                                TYPECACHE_BTREE_OPFAMILY);
+        if (!OidIsValid(tce->btree_opf) ||
+            !OidIsValid(tce->btree_opintype))
+            goto fail;
+
+        equalimageproc = get_opfamily_proc(tce->btree_opf,
+                                           tce->btree_opintype,
+                                           tce->btree_opintype,
+                                           BTEQUALIMAGE_PROC);
+        if (!OidIsValid(equalimageproc) ||
+            !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+                                               tce->typcollation,
+                                               ObjectIdGetDatum(tce->btree_opintype))))
+            goto fail;
+
+        exprs = lappend(exprs, te->expr);
+        sortgrouprefs = lappend_int(sortgrouprefs, sortgroupref);
+    }
+
+    /*
+     * Construct GroupedVarInfo for each expression.
+     */
+    forboth(l1, exprs, l2, sortgrouprefs)
+    {
+        Var           *var = lfirst_node(Var, l1);
+        int            sortgroupref = lfirst_int(l2);
+        GroupedVarInfo *gvi = makeNode(GroupedVarInfo);
+
+        gvi->gvexpr = (Expr *) copyObject(var);
+        gvi->sortgroupref = sortgroupref;
+
+        /* Find out where the expression should be evaluated. */
+        gvi->gv_eval_at = bms_make_singleton(var->varno);
+
+        root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+    }
+    return;
+
+fail:
+    root->grouped_var_list = NIL;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 55de28f073..3302673c59 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -66,6 +66,7 @@ query_planner(PlannerInfo *root,
      * here.
      */
     root->join_rel_list = makeNode(RelInfoList);
+    root->agg_info_list = makeNode(RelInfoList);
     root->join_rel_level = NULL;
     root->join_cur_level = 0;
     root->canon_pathkeys = NIL;
@@ -76,6 +77,7 @@ query_planner(PlannerInfo *root,
     root->placeholder_list = NIL;
     root->placeholder_array = NULL;
     root->placeholder_array_size = 0;
+    root->grouped_var_list = NIL;
     root->fkey_list = NIL;
     root->initial_rels = NIL;
 
@@ -254,6 +256,16 @@ query_planner(PlannerInfo *root,
      */
     extract_restriction_or_clauses(root);
 
+    /*
+     * If the query result can be grouped, check if any grouping can be
+     * performed below the top-level join. If so, setup
+     * root->grouped_var_list.
+     *
+     * The base relations should be fully initialized now, so that we have
+     * enough info to decide whether grouping is possible.
+     */
+    setup_aggregate_pushdown(root);
+
     /*
      * Now expand appendrels by adding "otherrels" for their children.  We
      * delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 493a3af0fa..0ada3ba3eb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -629,6 +629,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
     root->processed_tlist = NIL;
+    root->max_sortgroupref = 0;
     root->update_colnos = NIL;
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
@@ -3856,11 +3857,11 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
         bool        force_rel_creation;
 
         /*
-         * If we're doing partitionwise aggregation at this level, force
-         * creation of a partially_grouped_rel so we can add partitionwise
-         * paths to it.
+         * If we're doing partitionwise aggregation at this level or if
+         * aggregate push-down succeeded to create some paths, force creation
+         * of a partially_grouped_rel so we can add the related paths to it.
          */
-        force_rel_creation = (patype == PARTITIONWISE_AGGREGATE_PARTIAL);
+        force_rel_creation = patype == PARTITIONWISE_AGGREGATE_PARTIAL;
 
         partially_grouped_rel =
             create_partial_grouping_paths(root,
@@ -3893,10 +3894,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
     /* Gather any partially grouped partial paths. */
     if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
-    {
         gather_grouping_paths(root, partially_grouped_rel);
+
+    /*
+     * The non-partial paths can come either from the Gather above or from
+     * aggregate push-down.
+     */
+    if (partially_grouped_rel && partially_grouped_rel->pathlist)
         set_cheapest(partially_grouped_rel);
-    }
 
     /*
      * Estimate number of groups.
@@ -6837,6 +6842,13 @@ create_partial_grouping_paths(PlannerInfo *root,
     bool        can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
     bool        can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
 
+    /*
+     * The output relation could have been already created due to aggregate
+     * push-down.
+     */
+    partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+    Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
+
     /*
      * Consider whether we should generate partially aggregated non-partial
      * paths.  We can only do this if we have a non-partial path, and only if
@@ -6863,16 +6875,18 @@ create_partial_grouping_paths(PlannerInfo *root,
      */
     if (cheapest_total_path == NULL &&
         cheapest_partial_path == NULL &&
-        !force_rel_creation)
+        !force_rel_creation &&
+        partially_grouped_rel == NULL)
         return NULL;
 
     /*
      * Build a new upper relation to represent the result of partially
      * aggregating the rows from the input relation.
      */
-    partially_grouped_rel = fetch_upper_rel(root,
-                                            UPPERREL_PARTIAL_GROUP_AGG,
-                                            grouped_rel->relids);
+    if (partially_grouped_rel == NULL)
+        partially_grouped_rel = fetch_upper_rel(root,
+                                                UPPERREL_PARTIAL_GROUP_AGG,
+                                                grouped_rel->relids);
     partially_grouped_rel->consider_parallel =
         grouped_rel->consider_parallel;
     partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -6886,10 +6900,14 @@ create_partial_grouping_paths(PlannerInfo *root,
      * emit the same tlist as regular aggregate paths, because (1) we must
      * include Vars and Aggrefs needed in HAVING, which might not appear in
      * the result tlist, and (2) the Aggrefs must be set in partial mode.
+     *
+     * If the target was already created for the sake of aggregate push-down,
+     * it should be compatible with what we'd create here.
      */
-    partially_grouped_rel->reltarget =
-        make_partial_grouping_target(root, grouped_rel->reltarget,
-                                     extra->havingQual);
+    if (partially_grouped_rel->reltarget->exprs == NIL)
+        partially_grouped_rel->reltarget =
+            make_partial_grouping_target(root, grouped_rel->reltarget,
+                                         extra->havingQual);
 
     if (!extra->partial_costs_set)
     {
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..72657be7ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2870,6 +2870,39 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
         /* No referent found for Var */
         elog(ERROR, "variable not found in subplan target lists");
     }
+    if (IsA(node, Aggref))
+    {
+        Aggref       *aggref = castNode(Aggref, node);
+
+        /*
+         * The upper plan targetlist can contain Aggref whose value has
+         * already been evaluated by the subplan. However this can only happen
+         * with specific value of aggsplit.
+         */
+        if (aggref->aggsplit == AGGSPLIT_INITIAL_SERIAL)
+        {
+            /* See if the Aggref has bubbled up from a lower plan node */
+            if (context->outer_itlist && context->outer_itlist->has_non_vars)
+            {
+                newvar = search_indexed_tlist_for_non_var((Expr *) node,
+                                                          context->outer_itlist,
+                                                          OUTER_VAR);
+                if (newvar)
+                    return (Node *) newvar;
+            }
+            if (context->inner_itlist && context->inner_itlist->has_non_vars)
+            {
+                newvar = search_indexed_tlist_for_non_var((Expr *) node,
+                                                          context->inner_itlist,
+                                                          INNER_VAR);
+                if (newvar)
+                    return (Node *) newvar;
+            }
+        }
+
+        /* No referent found for Aggref */
+        elog(ERROR, "Aggref not found in subplan target lists");
+    }
     if (IsA(node, PlaceHolderVar))
     {
         PlaceHolderVar *phv = (PlaceHolderVar *) node;
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index da89b55402..7bb747ee6b 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -64,6 +64,10 @@ static int    find_compatible_trans(PlannerInfo *root, Aggref *newagg,
                                   Datum initValue, bool initValueIsNull,
                                   List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
+static void get_agg_clause_costs_trans(PlannerInfo *root, AggSplit aggsplit,
+                                       AggTransInfo *transinfo, AggClauseCosts *costs);
+static void get_agg_clause_costs_agginfo(PlannerInfo *root, AggSplit aggsplit,
+                                         AggInfo *agginfo, AggClauseCosts *costs);
 
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
@@ -546,132 +550,176 @@ get_agg_clause_costs(PlannerInfo *root, AggSplit aggsplit, AggClauseCosts *costs
     {
         AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
 
-        /*
-         * Add the appropriate component function execution costs to
-         * appropriate totals.
-         */
-        if (DO_AGGSPLIT_COMBINE(aggsplit))
-        {
-            /* charge for combining previously aggregated states */
-            add_function_cost(root, transinfo->combinefn_oid, NULL,
-                              &costs->transCost);
-        }
-        else
-            add_function_cost(root, transinfo->transfn_oid, NULL,
-                              &costs->transCost);
-        if (DO_AGGSPLIT_DESERIALIZE(aggsplit) &&
-            OidIsValid(transinfo->deserialfn_oid))
-            add_function_cost(root, transinfo->deserialfn_oid, NULL,
-                              &costs->transCost);
-        if (DO_AGGSPLIT_SERIALIZE(aggsplit) &&
-            OidIsValid(transinfo->serialfn_oid))
-            add_function_cost(root, transinfo->serialfn_oid, NULL,
-                              &costs->finalCost);
+        get_agg_clause_costs_trans(root, aggsplit, transinfo, costs);
+    }
 
-        /*
-         * These costs are incurred only by the initial aggregate node, so we
-         * mustn't include them again at upper levels.
-         */
-        if (!DO_AGGSPLIT_COMBINE(aggsplit))
-        {
-            /* add the input expressions' cost to per-input-row costs */
-            QualCost    argcosts;
+    foreach(lc, root->agginfos)
+    {
+        AggInfo    *agginfo = (AggInfo *) lfirst(lc);
 
-            cost_qual_eval_node(&argcosts, (Node *) transinfo->args, root);
-            costs->transCost.startup += argcosts.startup;
-            costs->transCost.per_tuple += argcosts.per_tuple;
+        get_agg_clause_costs_agginfo(root, aggsplit, agginfo, costs);
 
-            /*
-             * Add any filter's cost to per-input-row costs.
-             *
-             * XXX Ideally we should reduce input expression costs according
-             * to filter selectivity, but it's not clear it's worth the
-             * trouble.
-             */
-            if (transinfo->aggfilter)
-            {
-                cost_qual_eval_node(&argcosts, (Node *) transinfo->aggfilter,
-                                    root);
-                costs->transCost.startup += argcosts.startup;
-                costs->transCost.per_tuple += argcosts.per_tuple;
-            }
-        }
+    }
+}
+
+/*
+ * Like get_agg_clause_costs(), but only consider aggregates passed in the
+ * 'aggrefs' list.
+ */
+void
+get_agg_clause_costs_some(PlannerInfo *root, AggSplit aggsplit, List *aggrefs,
+                          AggClauseCosts *costs)
+{
+    ListCell    *lc;
+
+    foreach(lc, aggrefs)
+    {
+        Aggref    *aggref    = lfirst_node(Aggref, lc);
+        AggTransInfo *aggtrans = (AggTransInfo *) list_nth(root->aggtransinfos,
+                                                           aggref->aggtransno);
+        AggInfo    *agginfo = list_nth(root->agginfos, aggref->aggno);
+
+
+        get_agg_clause_costs_trans(root, aggsplit, aggtrans, costs);
+        get_agg_clause_costs_agginfo(root, aggsplit, agginfo, costs);
+    }
+}
+
+/*
+ * Sub-routine of get_agg_clause_costs(), to process a single AggTransInfo.
+ */
+static void
+get_agg_clause_costs_trans(PlannerInfo *root, AggSplit aggsplit,
+                           AggTransInfo *transinfo, AggClauseCosts *costs)
+{
+    /*
+     * Add the appropriate component function execution costs to appropriate
+     * totals.
+     */
+    if (DO_AGGSPLIT_COMBINE(aggsplit))
+    {
+        /* charge for combining previously aggregated states */
+        add_function_cost(root, transinfo->combinefn_oid, NULL,
+                          &costs->transCost);
+    }
+    else
+        add_function_cost(root, transinfo->transfn_oid, NULL,
+                          &costs->transCost);
+    if (DO_AGGSPLIT_DESERIALIZE(aggsplit) &&
+        OidIsValid(transinfo->deserialfn_oid))
+        add_function_cost(root, transinfo->deserialfn_oid, NULL,
+                          &costs->transCost);
+    if (DO_AGGSPLIT_SERIALIZE(aggsplit) &&
+        OidIsValid(transinfo->serialfn_oid))
+        add_function_cost(root, transinfo->serialfn_oid, NULL,
+                          &costs->finalCost);
+
+    /*
+     * These costs are incurred only by the initial aggregate node, so we
+     * mustn't include them again at upper levels.
+     */
+    if (!DO_AGGSPLIT_COMBINE(aggsplit))
+    {
+        /* add the input expressions' cost to per-input-row costs */
+        QualCost    argcosts;
+
+        cost_qual_eval_node(&argcosts, (Node *) transinfo->args, root);
+        costs->transCost.startup += argcosts.startup;
+        costs->transCost.per_tuple += argcosts.per_tuple;
 
         /*
-         * If the transition type is pass-by-value then it doesn't add
-         * anything to the required size of the hashtable.  If it is
-         * pass-by-reference then we have to add the estimated size of the
-         * value itself, plus palloc overhead.
+         * Add any filter's cost to per-input-row costs.
+         *
+         * XXX Ideally we should reduce input expression costs according to
+         * filter selectivity, but it's not clear it's worth the trouble.
          */
-        if (!transinfo->transtypeByVal)
+        if (transinfo->aggfilter)
         {
-            int32        avgwidth;
+            cost_qual_eval_node(&argcosts, (Node *) transinfo->aggfilter,
+                                root);
+            costs->transCost.startup += argcosts.startup;
+            costs->transCost.per_tuple += argcosts.per_tuple;
+        }
+    }
 
-            /* Use average width if aggregate definition gave one */
-            if (transinfo->aggtransspace > 0)
-                avgwidth = transinfo->aggtransspace;
-            else if (transinfo->transfn_oid == F_ARRAY_APPEND)
-            {
-                /*
-                 * If the transition function is array_append(), it'll use an
-                 * expanded array as transvalue, which will occupy at least
-                 * ALLOCSET_SMALL_INITSIZE and possibly more.  Use that as the
-                 * estimate for lack of a better idea.
-                 */
-                avgwidth = ALLOCSET_SMALL_INITSIZE;
-            }
-            else
-            {
-                avgwidth = get_typavgwidth(transinfo->aggtranstype, transinfo->aggtranstypmod);
-            }
+    /*
+     * If the transition type is pass-by-value then it doesn't add anything to
+     * the required size of the hashtable.  If it is pass-by-reference then we
+     * have to add the estimated size of the value itself, plus palloc
+     * overhead.
+     */
+    if (!transinfo->transtypeByVal)
+    {
+        int32        avgwidth;
 
-            avgwidth = MAXALIGN(avgwidth);
-            costs->transitionSpace += avgwidth + 2 * sizeof(void *);
-        }
-        else if (transinfo->aggtranstype == INTERNALOID)
+        /* Use average width if aggregate definition gave one */
+        if (transinfo->aggtransspace > 0)
+            avgwidth = transinfo->aggtransspace;
+        else if (transinfo->transfn_oid == F_ARRAY_APPEND)
         {
             /*
-             * INTERNAL transition type is a special case: although INTERNAL
-             * is pass-by-value, it's almost certainly being used as a pointer
-             * to some large data structure.  The aggregate definition can
-             * provide an estimate of the size.  If it doesn't, then we assume
-             * ALLOCSET_DEFAULT_INITSIZE, which is a good guess if the data is
-             * being kept in a private memory context, as is done by
-             * array_agg() for instance.
+             * If the transition function is array_append(), it'll use an
+             * expanded array as transvalue, which will occupy at least
+             * ALLOCSET_SMALL_INITSIZE and possibly more.  Use that as the
+             * estimate for lack of a better idea.
              */
-            if (transinfo->aggtransspace > 0)
-                costs->transitionSpace += transinfo->aggtransspace;
-            else
-                costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
+            avgwidth = ALLOCSET_SMALL_INITSIZE;
+        }
+        else
+        {
+            avgwidth = get_typavgwidth(transinfo->aggtranstype, transinfo->aggtranstypmod);
         }
-    }
 
-    foreach(lc, root->agginfos)
+        avgwidth = MAXALIGN(avgwidth);
+        costs->transitionSpace += avgwidth + 2 * sizeof(void *);
+    }
+    else if (transinfo->aggtranstype == INTERNALOID)
     {
-        AggInfo    *agginfo = lfirst_node(AggInfo, lc);
-        Aggref       *aggref = linitial_node(Aggref, agginfo->aggrefs);
-
         /*
-         * Add the appropriate component function execution costs to
-         * appropriate totals.
+         * INTERNAL transition type is a special case: although INTERNAL is
+         * pass-by-value, it's almost certainly being used as a pointer to
+         * some large data structure.  The aggregate definition can provide an
+         * estimate of the size.  If it doesn't, then we assume
+         * ALLOCSET_DEFAULT_INITSIZE, which is a good guess if the data is
+         * being kept in a private memory context, as is done by array_agg()
+         * for instance.
          */
-        if (!DO_AGGSPLIT_SKIPFINAL(aggsplit) &&
-            OidIsValid(agginfo->finalfn_oid))
-            add_function_cost(root, agginfo->finalfn_oid, NULL,
-                              &costs->finalCost);
+        if (transinfo->aggtransspace > 0)
+            costs->transitionSpace += transinfo->aggtransspace;
+        else
+            costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
+    }
+}
 
-        /*
-         * If there are direct arguments, treat their evaluation cost like the
-         * cost of the finalfn.
-         */
-        if (aggref->aggdirectargs)
-        {
-            QualCost    argcosts;
+/*
+ * Sub-routine of get_agg_clause_costs(), to process a single AggInfo.
+ */
+static void
+get_agg_clause_costs_agginfo(PlannerInfo *root, AggSplit aggsplit,
+                             AggInfo *agginfo, AggClauseCosts *costs)
+{
+    Aggref       *aggref = linitial_node(Aggref, agginfo->aggrefs);
 
-            cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
-                                root);
-            costs->finalCost.startup += argcosts.startup;
-            costs->finalCost.per_tuple += argcosts.per_tuple;
-        }
+    /*
+     * Add the appropriate component function execution costs to appropriate
+     * totals.
+     */
+    if (!DO_AGGSPLIT_SKIPFINAL(aggsplit) &&
+        OidIsValid(agginfo->finalfn_oid))
+        add_function_cost(root, agginfo->finalfn_oid, NULL,
+                          &costs->finalCost);
+
+    /*
+     * If there are direct arguments, treat their evaluation cost like the
+     * cost of the finalfn.
+     */
+    if (aggref->aggdirectargs)
+    {
+        QualCost    argcosts;
+
+        cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
+                            root);
+        costs->finalCost.startup += argcosts.startup;
+        costs->finalCost.per_tuple += argcosts.per_tuple;
     }
 }
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 41c7066d90..1fc0466b5c 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1007,6 +1007,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
     subroot->processed_tlist = NIL;
+    root->max_sortgroupref = 0;
     subroot->update_colnos = NIL;
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6dd11329fb..7025ebf94b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2670,8 +2670,7 @@ create_projection_path(PlannerInfo *root,
     pathnode->path.pathtype = T_Result;
     pathnode->path.parent = rel;
     pathnode->path.pathtarget = target;
-    /* For now, assume we are above any joins, so no parameterization */
-    pathnode->path.param_info = NULL;
+    pathnode->path.param_info = subpath->param_info;
     pathnode->path.parallel_aware = false;
     pathnode->path.parallel_safe = rel->consider_parallel &&
         subpath->parallel_safe &&
@@ -3163,6 +3162,147 @@ create_agg_path(PlannerInfo *root,
     return pathnode;
 }
 
+/*
+ * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
+ *
+ * NULL is returned if sorting of subpath output is not suitable.
+ */
+AggPath *
+create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+                       RelAggInfo *agg_info)
+{
+    List       *agg_exprs;
+    AggSplit    aggsplit;
+    AggClauseCosts agg_costs;
+    PathTarget *target;
+    double        dNumGroups;
+    ListCell   *lc1;
+    List       *key_subset = NIL;
+    AggPath    *result = NULL;
+
+    aggsplit = AGGSPLIT_INITIAL_SERIAL;
+    agg_exprs = agg_info->agg_exprs;
+    target = agg_info->target;
+
+    if (subpath->pathkeys == NIL)
+        return NULL;
+
+    if (!grouping_is_sortable(root->parse->groupClause))
+        return NULL;
+
+    /*
+     * Find all query pathkeys that our relation does affect.
+     */
+    foreach(lc1, root->group_pathkeys)
+    {
+        PathKey    *gkey = castNode(PathKey, lfirst(lc1));
+        ListCell   *lc2;
+
+        foreach(lc2, subpath->pathkeys)
+        {
+            PathKey    *skey = castNode(PathKey, lfirst(lc2));
+
+            if (skey == gkey)
+            {
+                key_subset = lappend(key_subset, gkey);
+                break;
+            }
+        }
+    }
+
+    if (key_subset == NIL)
+        return NULL;
+
+    /* Check if AGG_SORTED is useful for the whole query.  */
+    if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
+        return NULL;
+
+    MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+    get_agg_clause_costs_some(root, aggsplit, agg_exprs, &agg_costs);
+
+    Assert(agg_info->group_exprs != NIL);
+    dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
+                                     subpath->rows, NULL, NULL);
+
+    /*
+     * qual is NIL because the HAVING clause cannot be evaluated until the
+     * final value of the aggregate is known.
+     */
+    result = create_agg_path(root, rel, subpath, target,
+                             AGG_SORTED, aggsplit,
+                             agg_info->group_clauses,
+                             NIL,
+                             &agg_costs,
+                             dNumGroups);
+
+    /* The agg path should require no fewer parameters than the plain one. */
+    result->path.param_info = subpath->param_info;
+
+    return result;
+}
+
+/*
+ * Apply AGG_HASHED aggregation to subpath.
+ */
+AggPath *
+create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
+                       Path *subpath, RelAggInfo *agg_info)
+{
+    bool        can_hash;
+    List       *agg_exprs;
+    AggSplit    aggsplit;
+    AggClauseCosts agg_costs;
+    PathTarget *target;
+    double        dNumGroups;
+    double        hashaggtablesize;
+    Query       *parse = root->parse;
+    AggPath    *result = NULL;
+
+    /* Do not try to create hash table for each parameter value. */
+    Assert(subpath->param_info == NULL);
+
+    aggsplit = AGGSPLIT_INITIAL_SERIAL;
+    agg_exprs = agg_info->agg_exprs;
+    target = agg_info->target;
+
+    MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+    get_agg_clause_costs_some(root, aggsplit, agg_exprs, &agg_costs);
+
+    can_hash = (parse->groupClause != NIL &&
+                parse->groupingSets == NIL &&
+                root->numOrderedAggs == 0 &&
+                grouping_is_hashable(parse->groupClause));
+
+    if (can_hash)
+    {
+        Assert(agg_info->group_exprs != NIL);
+        dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
+                                         subpath->rows, NULL, NULL);
+
+        hashaggtablesize = estimate_hashagg_tablesize(root, subpath,
+                                                      &agg_costs,
+                                                      dNumGroups);
+
+        if (hashaggtablesize < work_mem * 1024L)
+        {
+            /*
+             * qual is NIL because the HAVING clause cannot be evaluated until
+             * the final value of the aggregate is known.
+             */
+            result = create_agg_path(root, rel, subpath,
+                                     target,
+                                     AGG_HASHED,
+                                     aggsplit,
+                                     agg_info->group_clauses,
+                                     NIL,
+                                     &agg_costs,
+                                     dNumGroups);
+        }
+    }
+
+    return result;
+}
+
 /*
  * create_groupingsets_path
  *      Creates a pathnode that represents performing GROUPING SETS aggregation
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c75d2d1f19..1f124b9713 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,17 +18,23 @@
 
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "catalog/pg_class_d.h"
+#include "catalog/pg_constraint.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/inherit.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
 #include "utils/hsearch.h"
+#include "utils/selfuncs.h"
 #include "utils/lsyscache.h"
 
 
@@ -76,6 +82,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
                                        RelOptInfo *childrel,
                                        int nappinfos,
                                        AppendRelInfo **appinfos);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+                                  PathTarget *target, PathTarget *agg_input,
+                                  List *gvis, List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
 
 
 /*
@@ -369,6 +380,110 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
     return rel;
 }
 
+/*
+ * build_simple_grouped_rel
+ *      Construct a new RelOptInfo for a grouped base relation out of an
+ *      existing non-grouped relation. On success, pointer to the corresponding
+ *      RelAggInfo is stored in *agg_info_p in addition to returning the grouped
+ *      relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+                         RelAggInfo **agg_info_p)
+{
+    RangeTblEntry *rte;
+    RelOptInfo *rel_plain,
+               *rel_grouped;
+    RelAggInfo *agg_info;
+
+    /* Isn't there any grouping expression to be pushed down? */
+    if (root->grouped_var_list == NIL)
+        return NULL;
+
+    rel_plain = root->simple_rel_array[relid];
+
+    /* Caller should only pass rti that represents base relation. */
+    Assert(rel_plain != NULL);
+
+    /*
+     * Not all RTE kinds are supported when grouping is considered.
+     *
+     * TODO Consider relaxing some of these restrictions.
+     */
+    rte = root->simple_rte_array[rel_plain->relid];
+    if (rte->rtekind != RTE_RELATION ||
+        rte->relkind == RELKIND_FOREIGN_TABLE ||
+        rte->tablesample != NULL)
+        return NULL;
+
+    /*
+     * Grouped append relation is not supported yet.
+     */
+    if (rte->inh)
+        return NULL;
+
+    /*
+     * Currently we do not support child relations ("other rels").
+     */
+    if (rel_plain->reloptkind != RELOPT_BASEREL)
+        return NULL;
+
+    /*
+     * Prepare the information we need for aggregation of the rel contents.
+     */
+    agg_info = create_rel_agg_info(root, rel_plain);
+    if (agg_info == NULL)
+        return NULL;
+
+    /*
+     * TODO Consider if 1) a flat copy is o.k., 2) it's safer in terms of
+     * adding new fields to RelOptInfo) to copy everything and then reset some
+     * fields, or to zero the structure and copy individual fields.
+     */
+    rel_grouped = makeNode(RelOptInfo);
+    memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+    /*
+     * Note on consider_startup: while the AGG_HASHED strategy needs the whole
+     * relation, AGG_SORTED does not. Therefore we do not force
+     * consider_startup to false.
+     */
+
+    /*
+     * Set the appropriate target for grouped paths.
+     *
+     * reltarget should match the target of partially aggregated paths.
+     */
+    rel_grouped->reltarget = agg_info->target;
+
+    /*
+     * Grouped paths must not be mixed with the plain ones.
+     */
+    rel_grouped->pathlist = NIL;
+    rel_grouped->partial_pathlist = NIL;
+    rel_grouped->cheapest_startup_path = NULL;
+    rel_grouped->cheapest_total_path = NULL;
+    rel_grouped->cheapest_unique_path = NULL;
+    rel_grouped->cheapest_parameterized_paths = NIL;
+
+    /*
+     * The number of aggregation input rows is simply the number of rows of
+     * the non-grouped relation, which should have been estimated by now.
+     */
+    agg_info->input_rows = rel_plain->rows;
+
+    /*
+     * The number of output rows is supposedly different (lower) due to
+     * grouping.
+     */
+    rel_grouped->rows = estimate_num_groups(root, agg_info->group_exprs,
+                                            agg_info->input_rows, NULL,
+                                            NULL);
+
+    *agg_info_p = agg_info;
+    return rel_grouped;
+}
+
 /*
  * find_base_rel
  *      Find a base or other relation entry, which must already exist.
@@ -417,16 +532,20 @@ build_rel_hash(RelInfoList *list)
     /* Insert all the already-existing joinrels */
     foreach(l, list->items)
     {
-        RelOptInfo       *rel = lfirst_node(RelOptInfo, l);
+        void       *item = lfirst(l);
         RelInfoEntry *hentry;
         bool        found;
+        Relids        relids;
+
+        Assert(IsA(item, RelOptInfo));
+        relids = ((RelOptInfo *) item)->relids;
 
         hentry = (RelInfoEntry *) hash_search(hashtab,
-                                              &rel->relids,
+                                              &relids,
                                               HASH_ENTER,
                                               &found);
         Assert(!found);
-        hentry->data = rel;
+        hentry->data = item;
     }
 
     list->hash = hashtab;
@@ -475,9 +594,17 @@ find_rel_info(RelInfoList *list, Relids relids)
 
         foreach(l, list->items)
         {
-            RelOptInfo   *item = lfirst_node(RelOptInfo, l);
+            void       *item = lfirst(l);
+            Relids        item_relids = NULL;
 
-            if (bms_equal(item->relids, relids))
+            Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+            if (IsA(item, RelOptInfo))
+                item_relids = ((RelOptInfo *) item)->relids;
+            else if (IsA(item, RelAggInfo))
+                item_relids = ((RelAggInfo *) item)->relids;
+
+            if (bms_equal(item_relids, relids))
                 return item;
         }
     }
@@ -502,23 +629,31 @@ find_join_rel(PlannerInfo *root, Relids relids)
  *        hashtable if there is one.
  */
 static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
 {
+    Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
     /* GEQO requires us to append the new joinrel to the end of the list! */
-    list->items = lappend(list->items, rel);
+    list->items = lappend(list->items, data);
 
     /* store it into the auxiliary hashtable if there is one. */
     if (list->hash)
     {
+        Relids        relids;
         RelInfoEntry *hentry;
         bool        found;
 
+        if (IsA(data, RelOptInfo))
+            relids = ((RelOptInfo *) data)->relids;
+        else if (IsA(data, RelAggInfo))
+            relids = ((RelAggInfo *) data)->relids;
+
         hentry = (RelInfoEntry *) hash_search(list->hash,
-                                              &rel->relids,
+                                              &relids,
                                               HASH_ENTER,
                                               &found);
         Assert(!found);
-        hentry->data = rel;
+        hentry->data = data;
     }
 }
 
@@ -533,6 +668,63 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
     add_rel_info(root->join_rel_list, joinrel);
 }
 
+/*
+ * add_grouped_rel
+ *        Add grouped base or join relation to the list of grouped relations in
+ *        the given PlannerInfo. Also add the corresponding RelAggInfo to
+ *        agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+    add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+    add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ *      Returns grouped relation entry (base or join relation) corresponding to
+ *      'relids' or NULL if none exists.
+ *
+ * If agg_info_p is a valid pointer, then pointer to RelAggInfo that
+ * corresponds to the relation returned is assigned to *agg_info_p.
+ *
+ * The call fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG, ...) should
+ * return the same relation if it exists, however the behavior is different if
+ * the relation is not there. find_grouped_rel() should be used in
+ * query_planner() and subroutines.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+    RelOptInfo *rel;
+
+    rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+                                       relids);
+    if (rel == NULL)
+    {
+        if (agg_info_p)
+            *agg_info_p = NULL;
+
+        return NULL;
+    }
+
+    /* Is caller interested in RelAggInfo? */
+    if (agg_info_p)
+    {
+        RelAggInfo *agg_info;
+
+        agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+        /* The relation exists, so the agg_info should be there too. */
+        Assert(agg_info != NULL);
+
+        *agg_info_p = agg_info;
+    }
+
+    return rel;
+}
+
 /*
  * set_foreign_rel_properties
  *        Set up foreign-join fields if outer and inner relation are foreign
@@ -595,6 +787,7 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
  * 'restrictlist_ptr': result variable.  If not NULL, *restrictlist_ptr
  *        receives the list of RestrictInfo nodes that apply to this
  *        particular pair of joinable relations.
+ * 'agg_info' indicates that grouped join relation should be created.
  *
  * restrictlist_ptr makes the routine's API a little grotty, but it saves
  * duplicated calculation of the restrictlist...
@@ -605,10 +798,12 @@ build_join_rel(PlannerInfo *root,
                RelOptInfo *outer_rel,
                RelOptInfo *inner_rel,
                SpecialJoinInfo *sjinfo,
-               List **restrictlist_ptr)
+               List **restrictlist_ptr,
+               RelAggInfo *agg_info)
 {
     RelOptInfo *joinrel;
     List       *restrictlist;
+    bool        grouped = agg_info != NULL;
 
     /* This function should be used only for join between parents. */
     Assert(!IS_OTHER_REL(outer_rel) && !IS_OTHER_REL(inner_rel));
@@ -616,7 +811,8 @@ build_join_rel(PlannerInfo *root,
     /*
      * See if we already have a joinrel for this set of base rels.
      */
-    joinrel = find_join_rel(root, joinrelids);
+    joinrel = !grouped ? find_join_rel(root, joinrelids) :
+        find_grouped_rel(root, joinrelids, NULL);
 
     if (joinrel)
     {
@@ -715,9 +911,21 @@ build_join_rel(PlannerInfo *root,
      * and inner rels we first try to build it from.  But the contents should
      * be the same regardless.
      */
-    build_joinrel_tlist(root, joinrel, outer_rel);
-    build_joinrel_tlist(root, joinrel, inner_rel);
-    add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
+    if (!grouped)
+    {
+        joinrel->reltarget = create_empty_pathtarget();
+        build_joinrel_tlist(root, joinrel, outer_rel);
+        build_joinrel_tlist(root, joinrel, inner_rel);
+        add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
+    }
+    else
+    {
+        /*
+         * The target for grouped join should already have its cost and width
+         * computed, see create_rel_agg_info().
+         */
+        joinrel->reltarget = agg_info->target;
+    }
 
     /*
      * add_placeholders_to_joinrel also took care of adding the ph_lateral
@@ -749,49 +957,75 @@ build_join_rel(PlannerInfo *root,
     joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
     /* Store the partition information. */
-    build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
-                                 sjinfo->jointype);
+    if (!grouped)
+        build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
+                                     restrictlist, sjinfo->jointype);
 
-    /*
-     * Set estimates of the joinrel's size.
-     */
-    set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
-                               sjinfo, restrictlist);
+    if (!grouped)
+    {
+        /*
+         * Set estimates of the joinrel's size.
+         */
+        set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
+                                   sjinfo, restrictlist);
 
-    /*
-     * Set the consider_parallel flag if this joinrel could potentially be
-     * scanned within a parallel worker.  If this flag is false for either
-     * inner_rel or outer_rel, then it must be false for the joinrel also.
-     * Even if both are true, there might be parallel-restricted expressions
-     * in the targetlist or quals.
-     *
-     * Note that if there are more than two rels in this relation, they could
-     * be divided between inner_rel and outer_rel in any arbitrary way.  We
-     * assume this doesn't matter, because we should hit all the same baserels
-     * and joinclauses while building up to this joinrel no matter which we
-     * take; therefore, we should make the same decision here however we get
-     * here.
-     */
-    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
-        is_parallel_safe(root, (Node *) restrictlist) &&
-        is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
-        joinrel->consider_parallel = true;
+        /*
+         * Set the consider_parallel flag if this joinrel could potentially be
+         * scanned within a parallel worker.  If this flag is false for either
+         * inner_rel or outer_rel, then it must be false for the joinrel also.
+         * Even if both are true, there might be parallel-restricted
+         * expressions in the targetlist or quals.
+         *
+         * Note that if there are more than two rels in this relation, they
+         * could be divided between inner_rel and outer_rel in any arbitrary
+         * way.  We assume this doesn't matter, because we should hit all the
+         * same baserels and joinclauses while building up to this joinrel no
+         * matter which we take; therefore, we should make the same decision
+         * here however we get here.
+         */
+        if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
+            is_parallel_safe(root, (Node *) restrictlist) &&
+            is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
+            joinrel->consider_parallel = true;
+    }
+    else
+    {
+        /*
+         * Grouping essentially changes the number of rows.
+         *
+         * XXX We do not distinguish whether two plain rels are joined and the
+         * result is aggregated, or the aggregation has been already applied
+         * to one of the input rels. Is this worth extra effort, e.g.
+         * maintaining a separate RelOptInfo for each case (one difficulty
+         * that would introduce is construction of AppendPath)?
+         */
+        joinrel->rows = estimate_num_groups(root, agg_info->group_exprs,
+                                            agg_info->input_rows, NULL, NULL);
+    }
 
     /* Add the joinrel to the PlannerInfo. */
-    add_join_rel(root, joinrel);
+    if (!grouped)
+        add_join_rel(root, joinrel);
+    else
+        add_grouped_rel(root, joinrel, agg_info);
 
     /*
-     * Also, if dynamic-programming join search is active, add the new joinrel
-     * to the appropriate sublist.  Note: you might think the Assert on number
-     * of members should be for equality, but some of the level 1 rels might
-     * have been joinrels already, so we can only assert <=.
+     * Also, if dynamic-programming join search is active, add the new
+     * joinrelset to the appropriate sublist.  Note: you might think the
+     * Assert on number of members should be for equality, but some of the
+     * level 1 rels might have been joinrels already, so we can only assert
+     * <=.
+     *
+     * Do noting for grouped relation as it's stored aside from
+     * join_rel_level.
      */
-    if (root->join_rel_level)
+    if (root->join_rel_level && !grouped)
     {
         Assert(root->join_cur_level > 0);
-        Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+        Assert(root->join_cur_level <= bms_num_members(joinrelids));
         root->join_rel_level[root->join_cur_level] =
-            lappend(root->join_rel_level[root->join_cur_level], joinrel);
+            lappend(root->join_rel_level[root->join_cur_level],
+                    joinrel);
     }
 
     return joinrel;
@@ -2079,3 +2313,624 @@ build_child_join_reltarget(PlannerInfo *root,
     childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
     childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * Check if the relation can produce grouped paths and return the information
+ * it'll need for it. The passed relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+    List       *gvis;
+    List       *aggregates = NIL;
+    bool        found_other_rel_agg;
+    ListCell   *lc;
+    RelAggInfo *result;
+    PathTarget *agg_input;
+    PathTarget *target = NULL;
+    List       *grp_exprs_extra = NIL;
+    List       *group_clauses_final;
+    int            i;
+
+    /*
+     * The function shouldn't have been called if there's no opportunity for
+     * aggregation push-down.
+     */
+    Assert(root->grouped_var_list != NIL);
+
+    /*
+     * The current implementation of aggregation push-down cannot handle
+     * PlaceHolderVar (PHV).
+     *
+     * If we knew that the PHV should be evaluated in this target (and of
+     * course, if its expression matched some Aggref argument), we'd just let
+     * init_grouping_targets add that Aggref. On the other hand, if we knew
+     * that the PHV is evaluated below the current rel, we could ignore it
+     * because the referencing Aggref would take care of propagation of the
+     * value to upper joins.
+     *
+     * The problem is that the same PHV can be evaluated in the target of the
+     * current rel or in that of lower rel --- depending on the input paths.
+     * For example, consider rel->relids = {A, B, C} and if ph_eval_at = {B,
+     * C}. Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+     * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+     */
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Expr       *expr = lfirst(lc);
+
+        if (IsA(expr, PlaceHolderVar))
+            return NULL;
+    }
+
+    if (IS_SIMPLE_REL(rel))
+    {
+        RangeTblEntry *rte = root->simple_rte_array[rel->relid];;
+
+        /*
+         * rtekind != RTE_RELATION case is not supported yet.
+         */
+        if (rte->rtekind != RTE_RELATION)
+            return NULL;
+    }
+
+    /* Caller should only pass base relations or joins. */
+    Assert(rel->reloptkind == RELOPT_BASEREL ||
+           rel->reloptkind == RELOPT_JOINREL);
+
+    /*
+     * If any outer join can set the attribute value to NULL, the Agg plan
+     * would receive different input at the base rel level.
+     *
+     * XXX For RELOPT_JOINREL, do not return if all the joins that can set any
+     * entry of the grouped target (do we need to postpone this check until
+     * the grouped target is available, and init_grouping_targets take care?)
+     * of this rel to NULL are provably below rel. (It's ok if rel is one of
+     * these joins.)
+     */
+    if (bms_overlap(rel->relids, root->nullable_baserels))
+        return NULL;
+
+    /*
+     * Use equivalence classes to generate additional grouping expressions for
+     * the current rel. Without these we might not be able to apply
+     * aggregation to the relation result set.
+     *
+     * It's important that create_grouping_expr_grouped_var_infos has
+     * processed the explicit grouping columns by now. If the grouping clause
+     * contains multiple expressions belonging to the same EC, the original
+     * (i.e. not derived) one should be preferred when we build grouping
+     * target for a relation. Otherwise we have a problem when trying to match
+     * target entries to grouping clauses during plan creation, see
+     * get_grouping_expression().
+     */
+    gvis = list_copy(root->grouped_var_list);
+    foreach(lc, root->grouped_var_list)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+        int            relid = -1;
+
+        /* Only interested in grouping expressions. */
+        if (IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+        {
+            GroupedVarInfo *gvi_trans;
+
+            gvi_trans = translate_expression_to_rel(root, gvi, relid);
+            if (gvi_trans != NULL)
+                gvis = lappend(gvis, gvi_trans);
+        }
+    }
+
+    /*
+     * Check if some aggregates or grouping expressions can be evaluated in
+     * this relation's target, and collect all vars referenced by these
+     * aggregates / grouping expressions;
+     */
+    found_other_rel_agg = false;
+    foreach(lc, gvis)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+
+        /*
+         * The subset includes gv_eval_at uninitialized, which includes
+         * Aggref.aggstar.
+         */
+        if (bms_is_subset(gvi->gv_eval_at, rel->relids))
+        {
+            /*
+             * init_grouping_targets will handle plain Var grouping
+             * expressions because it needs to look them up in
+             * grouped_var_list anyway.
+             */
+            if (IsA(gvi->gvexpr, Var))
+                continue;
+
+            /*
+             * Currently, GroupedVarInfo only handles Vars and Aggrefs.
+             */
+            Assert(IsA(gvi->gvexpr, Aggref));
+
+            gvi->agg_partial = (Aggref *) copyObject(gvi->gvexpr);
+            mark_partial_aggref(gvi->agg_partial, AGGSPLIT_INITIAL_SERIAL);
+
+            /*
+             * Accept the aggregate.
+             */
+            aggregates = lappend(aggregates, gvi);
+        }
+        else if (IsA(gvi->gvexpr, Aggref))
+        {
+            /*
+             * Remember that there is at least one aggregate expression that
+             * needs something else than this rel.
+             */
+            found_other_rel_agg = true;
+
+            /*
+             * This condition effectively terminates creation of the
+             * RelAggInfo, so there's no reason to check the next
+             * GroupedVarInfo.
+             */
+            break;
+        }
+    }
+
+    /*
+     * Grouping makes little sense w/o aggregate function and w/o grouping
+     * expressions.
+     */
+    if (aggregates == NIL)
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    /*
+     * Give up if some other aggregate(s) need relations other than the
+     * current one.
+     *
+     * If the aggregate needs the current rel plus anything else, then the
+     * problem is that grouping of the current relation could make some input
+     * variables unavailable for the "higher aggregate", and it'd also
+     * decrease the number of input rows the "higher aggregate" receives.
+     *
+     * If the aggregate does not even need the current rel, then neither the
+     * current rel nor anything else should be grouped because we do not
+     * support join of two grouped relations.
+     */
+    if (found_other_rel_agg)
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    /*
+     * Create target for grouped paths as well as one for the input paths of
+     * the aggregation paths.
+     */
+    target = create_empty_pathtarget();
+    agg_input = create_empty_pathtarget();
+
+    /*
+     * Cannot suitable targets for the aggregation push-down be derived?
+     */
+    if (!init_grouping_targets(root, rel, target, agg_input, gvis,
+                               &grp_exprs_extra))
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    list_free(gvis);
+
+    /*
+     * Aggregation push-down makes no sense w/o grouping expressions.
+     */
+    if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+        return NULL;
+
+    group_clauses_final = root->parse->groupClause;
+
+    /*
+     * If the aggregation target should have extra grouping expressions (in
+     * order to emit input vars for join conditions), add them now. This step
+     * includes assignment of tleSortGroupRef's which we can generate now.
+     */
+    if (list_length(grp_exprs_extra) > 0)
+    {
+        Index        sortgroupref;
+
+        /*
+         * We'll have to add some clauses, but query group clause must be
+         * preserved.
+         */
+        group_clauses_final = list_copy(group_clauses_final);
+
+        /*
+         * Always start at root->max_sortgroupref. The extra grouping
+         * expressions aren't used during the final aggregation, so the
+         * sortgroupref values don't need to be unique across the query. Thus
+         * we don't have to increase root->max_sortgroupref, which makes
+         * recognition of the extra grouping expressions pretty easy.
+         */
+        sortgroupref = root->max_sortgroupref;
+
+        /*
+         * Generate the SortGroupClause's and add the expressions to the
+         * target.
+         */
+        foreach(lc, grp_exprs_extra)
+        {
+            Var           *var = lfirst_node(Var, lc);
+            SortGroupClause *cl = makeNode(SortGroupClause);
+
+            /*
+             * Initialize the SortGroupClause.
+             *
+             * As the final aggregation will not use this grouping expression,
+             * we don't care whether sortop is < or >. The value of
+             * nulls_first should not matter for the same reason.
+             */
+            cl->tleSortGroupRef = ++sortgroupref;
+            get_sort_group_operators(var->vartype,
+                                     false, true, false,
+                                     &cl->sortop, &cl->eqop, NULL,
+                                     &cl->hashable);
+            group_clauses_final = lappend(group_clauses_final, cl);
+            add_column_to_pathtarget(target, (Expr *) var,
+                                     cl->tleSortGroupRef);
+
+            /*
+             * The aggregation input target must emit this var too.
+             */
+            add_column_to_pathtarget(agg_input, (Expr *) var,
+                                     cl->tleSortGroupRef);
+        }
+    }
+
+    /*
+     * Add aggregates to the grouping target.
+     */
+    foreach(lc, aggregates)
+    {
+        GroupedVarInfo *gvi;
+
+        gvi = lfirst_node(GroupedVarInfo, lc);
+        add_column_to_pathtarget(target, (Expr *) gvi->agg_partial,
+                                 gvi->sortgroupref);
+    }
+
+    /*
+     * Build a list of grouping expressions and a list of the corresponding
+     * SortGroupClauses.
+     */
+    i = 0;
+    result = makeNode(RelAggInfo);
+    foreach(lc, target->exprs)
+    {
+        Index        sortgroupref = 0;
+        SortGroupClause *cl;
+        Expr       *texpr;
+
+        texpr = (Expr *) lfirst(lc);
+
+        if (IsA(texpr, Aggref))
+        {
+            /*
+             * Once we see Aggref, no grouping expressions should follow.
+             */
+            break;
+        }
+
+        /*
+         * Find the clause by sortgroupref.
+         */
+        sortgroupref = target->sortgrouprefs[i++];
+
+        /*
+         * Besides being an aggregate, the target expression should have no
+         * other reason then being a column of a relation functionally
+         * dependent on the GROUP BY clause. So it's not actually a grouping
+         * column.
+         */
+        if (sortgroupref == 0)
+            continue;
+
+        /*
+         * group_clause_final contains the "local" clauses, so this search
+         * should succeed.
+         */
+        cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+        result->group_clauses = list_append_unique(result->group_clauses,
+                                                   cl);
+
+        /*
+         * Add only unique clauses because of joins (both sides of a join can
+         * point at the same grouping clause). XXX Is it worth adding a bool
+         * argument indicating that we're dealing with join right now?
+         */
+        result->group_exprs = list_append_unique(result->group_exprs,
+                                                 texpr);
+    }
+
+    /*
+     * Since neither target nor agg_input is supposed to be identical to the
+     * source reltarget, compute the width and cost again.
+     *
+     * target does not yet contain aggregates, but these will be accounted by
+     * AggPath.
+     */
+    set_pathtarget_cost_width(root, target);
+    set_pathtarget_cost_width(root, agg_input);
+
+    result->relids = bms_copy(rel->relids);
+    result->target = target;
+    result->agg_input = agg_input;
+
+    /* Finally collect the aggregates. */
+    while (lc != NULL)
+    {
+        Aggref       *aggref = lfirst_node(Aggref, lc);
+
+        /*
+         * Partial aggregation is what the grouped paths should do.
+         */
+        result->agg_exprs = lappend(result->agg_exprs, aggref);
+        lc = lnext(target->exprs, lc);
+    }
+
+    /* The "input_rows" field should be set by caller. */
+    return result;
+}
+
+/*
+ * Initialize target for grouped paths (target) as well as a target for paths
+ * that generate input for aggregation (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * gvis a list of GroupedVarInfo's possibly useful for rel.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+                      PathTarget *target, PathTarget *agg_input,
+                      List *gvis, List **group_exprs_extra_p)
+{
+    ListCell   *lc;
+    List       *possibly_dependent = NIL;
+    Var           *tvar;
+
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Index        sortgroupref;
+
+        /*
+         * Given that PlaceHolderVar currently prevents us from doing
+         * aggregation push-down, the source target cannot contain anything
+         * more complex than a Var.
+         */
+        tvar = lfirst_node(Var, lc);
+
+        sortgroupref = get_expression_sortgroupref((Expr *) tvar, gvis);
+        if (sortgroupref > 0)
+        {
+            /*
+             * If the target expression can be used as the grouping key, we
+             * don't have to worry whether it can be emitted by the AggPath
+             * pushed down to relation / join.
+             */
+            add_column_to_pathtarget(target, (Expr *) tvar, sortgroupref);
+
+            /*
+             * As for agg_input, add the original expression but set
+             * sortgroupref in addition.
+             */
+            add_column_to_pathtarget(agg_input, (Expr *) tvar, sortgroupref);
+        }
+        else
+        {
+            if (is_var_needed_by_join(root, tvar, rel))
+            {
+                /*
+                 * The variable is needed for a join, however it's neither in
+                 * the GROUP BY clause nor can it be derived from it using EC.
+                 * (Otherwise it would have to be added to the targets above.)
+                 * We need to construct special SortGroupClause for that
+                 * variable.
+                 *
+                 * Note that its tleSortGroupRef needs to be unique within
+                 * agg_input, so we need to postpone creation of the
+                 * SortGroupClause's until we're done with the iteration of
+                 * rel->reltarget->exprs. Also it makes sense for the caller
+                 * to do some more check before it starts to create those
+                 * SortGroupClause's.
+                 */
+                *group_exprs_extra_p = lappend(*group_exprs_extra_p, tvar);
+            }
+            else if (is_var_in_aggref_only(root, tvar))
+            {
+                /*
+                 * Another reason we might need this variable is that some
+                 * aggregate pushed down to this relation references it. In
+                 * such a case, add that var to agg_input, but not to
+                 * "target". However, if the aggregate is not the only reason
+                 * for the var to be in the target, some more checks need to
+                 * be performed below.
+                 */
+                add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+            }
+            else
+            {
+                /*
+                 * The Var can be functionally dependent on another expression
+                 * of the target, but we cannot check until the other
+                 * expressions are in the target.
+                 */
+                possibly_dependent = lappend(possibly_dependent, tvar);
+            }
+        }
+    }
+
+    /*
+     * Now we can check whether the expression is functionally dependent on
+     * another one.
+     */
+    foreach(lc, possibly_dependent)
+    {
+        List       *deps = NIL;
+        RangeTblEntry *rte;
+
+        tvar = lfirst_node(Var, lc);
+        rte = root->simple_rte_array[tvar->varno];
+
+        /*
+         * Check if the Var can be in the grouping key even though it's not
+         * mentioned by the GROUP BY clause (and could not be derived using
+         * ECs).
+         */
+        if (check_functional_grouping(rte->relid, tvar->varno,
+                                      tvar->varlevelsup,
+                                      target->exprs, &deps))
+        {
+            /*
+             * The var shouldn't be actually used for grouping key evaluation
+             * (instead, the one this depends on will be), so sortgroupref
+             * should not be important.
+             */
+            add_new_column_to_pathtarget(target, (Expr *) tvar);
+            add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+        }
+        else
+        {
+            /*
+             * As long as the query is semantically correct, arriving here
+             * means that the var is referenced by a generic grouping
+             * expression but not referenced by any join.
+             *
+             * If the aggregate push-down will support generic grouping
+             * expression sin the future, create_rel_agg_info() will have to
+             * add this variable to "agg_input" target and also add the whole
+             * generic expression to "target".
+             */
+            return false;
+        }
+    }
+
+    return true;
+}
+
+/*
+ * Check whether given variable appears in Aggref(s) which we consider usable
+ * at relation / join level, and only in the Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+    ListCell   *lc;
+    bool        found = false;
+
+    foreach(lc, root->grouped_var_list)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+        ListCell   *lc2;
+        List       *vars;
+
+        if (!IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        if (!bms_is_member(var->varno, gvi->gv_eval_at))
+            continue;
+
+        /*
+         * XXX Consider some sort of caching.
+         */
+        vars = pull_var_clause((Node *) gvi->gvexpr, PVC_RECURSE_AGGREGATES);
+        foreach(lc2, vars)
+        {
+            Var           *v = lfirst_node(Var, lc2);
+
+            if (equal(v, var))
+            {
+                found = true;
+                break;
+            }
+
+        }
+        list_free(vars);
+
+        if (found)
+            break;
+    }
+
+    /* No aggregate references the Var? */
+    if (!found)
+        return false;
+
+    /* Does the Var appear in the target outside aggregates? */
+    found = false;
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+        if (IsA(te->expr, Aggref))
+            continue;
+
+        if (equal(te->expr, var))
+            return false;
+
+    }
+
+    /* The Var is in aggregate(s) and only there. */
+    return true;
+}
+
+/*
+ * Check if given variable is needed by joins above the current rel?
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation "b" for the
+ * following query:
+ *
+ *    SELECT a.i, avg(b.y)
+ *    FROM a JOIN b ON b.j = a.i
+ *    GROUP BY a.i;
+ *
+ * If we aggregate the "b" relation alone, the column "b.j" needs to be used
+ * as the grouping key because otherwise it cannot find its way to the input
+ * of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+    Relids        relids_no_top;
+    int            ndx;
+    RelOptInfo *baserel;
+
+    /*
+     * The relids we're not interested in do include 0, which is the top-level
+     * targetlist. The only reason for relids to contain 0 should be that
+     * arg_var is referenced either by aggregate or by grouping expression,
+     * but right now we're interested in the *other* reasons. (As soon
+     * aggregation is pushed down, the aggregates in the query targetlist no
+     * longer need direct reference to arg_var anyway.)
+     */
+
+    relids_no_top = bms_copy(rel->relids);
+    bms_add_member(relids_no_top, 0);
+
+    baserel = find_base_rel(root, var->varno);
+    ndx = var->varattno - baserel->min_attr;
+    if (bms_nonempty_difference(baserel->attr_needed[ndx],
+                                relids_no_top))
+        return true;
+
+    return false;
+}
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 784a1af82d..943ffb0a67 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -820,6 +820,37 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
     }
 }
 
+/*
+ * Return sortgroupref if expr can be used as the grouping expression in an
+ * AggPath at relation or join level, or 0 if it can't.
+ *
+ * gvis a list of a list of GroupedVarInfo's available for the query,
+ * including those derived using equivalence classes.
+ */
+Index
+get_expression_sortgroupref(Expr *expr, List *gvis)
+{
+    ListCell   *lc;
+
+    foreach(lc, gvis)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+
+        if (IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        if (equal(gvi->gvexpr, expr))
+        {
+            Assert(gvi->sortgroupref > 0);
+
+            return gvi->sortgroupref;
+        }
+    }
+
+    /* The expression cannot be used as grouping key. */
+    return 0;
+}
+
 /*
  * split_pathtarget_at_srfs
  *        Split given PathTarget into multiple levels to position SRFs safely
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..8b9ec81418 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -939,6 +939,16 @@ struct config_bool ConfigureNamesBool[] =
         false,
         NULL, NULL, NULL
     },
+    {
+        {"enable_agg_pushdown", PGC_USERSET, QUERY_TUNING_METHOD,
+            gettext_noop("Enables aggregate push-down."),
+            NULL,
+            GUC_EXPLAIN
+        },
+        &enable_agg_pushdown,
+        false,
+        NULL, NULL, NULL
+    },
     {
         {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
             gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ca7d5ab51..07459c423f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -370,6 +370,9 @@ struct PlannerInfo
     /* list of PlaceHolderInfos */
     List       *placeholder_list;
 
+    /* List of GroupedVarInfos. */
+    List       *grouped_var_list;
+
     /* array of PlaceHolderInfos indexed by phid */
     struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
     /* allocated size of array */
@@ -410,6 +413,12 @@ struct PlannerInfo
      */
     RelInfoList       upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
+    /*
+     * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+     * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+     */
+    struct RelInfoList *agg_info_list;
+
     /* Result tlists chosen by grouping_planner for upper-stage processing */
     struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
 
@@ -424,6 +433,12 @@ struct PlannerInfo
      */
     List       *processed_tlist;
 
+    /*
+     * The maximum ressortgroupref among target entries in processed_list.
+     * Useful when adding extra grouping expressions for partial aggregation.
+     */
+    int            max_sortgroupref;
+
     /*
      * For UPDATE, this list contains the target table's attribute numbers to
      * which the first N entries of processed_tlist are to be assigned.  (Any
@@ -1025,6 +1040,62 @@ typedef struct RelOptInfo
     ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
      (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
 
+/*
+ * RelAggInfo
+ *        Information needed to create grouped paths for base rels and joins.
+ *
+ * "relids" is the set of base-relation identifiers, just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to joinin grouped path to a non-grouped one.  This target
+ * can contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "group_clauses" and "group_exprs" are lists of SortGroupClause and the
+ * corresponding grouping expressions respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ *
+ * "rel_grouped" is the relation containing the partially aggregated paths.
+ */
+typedef struct RelAggInfo
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    Relids        relids;            /* Base rels contained in this grouped rel. */
+
+    struct PathTarget *target;    /* Target for grouped paths. */
+
+    struct PathTarget *agg_input;    /* pathtarget of paths that generate input
+                                     * for aggregation paths. */
+
+    double        input_rows;
+
+    List       *group_clauses;
+    List       *group_exprs;
+
+    List       *agg_exprs;        /* Aggref expressions. */
+
+    RelOptInfo *rel_grouped;    /* Grouped relation. */
+} RelAggInfo;
+
 /*
  * IndexOptInfo
  *        Per-index information for planning/optimization
@@ -2888,6 +2959,29 @@ typedef struct PlaceHolderInfo
     int32        ph_width;
 } PlaceHolderInfo;
 
+/*
+ * GroupedVarInfo exists for each expression that can be used as an aggregate
+ * or grouping expression evaluated below a join.
+ *
+ * TODO Rename, perhaps to GroupedTargetEntry? (Also rename the variables of
+ * this type.)
+ */
+typedef struct GroupedVarInfo
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    Expr       *gvexpr;            /* the represented expression. */
+    Aggref       *agg_partial;    /* if gvexpr is aggregate, agg_partial is the
+                                 * corresponding partial aggregate */
+    Index        sortgroupref;    /* If gvexpr is a grouping expression, this is
+                                 * the tleSortGroupRef of the corresponding
+                                 * SortGroupClause. */
+    Relids        gv_eval_at;        /* lowest level we can evaluate the expression
+                                 * at or NULL if it can happen anywhere. */
+} GroupedVarInfo;
+
 /*
  * This struct describes one potentially index-optimizable MIN/MAX aggregate
  * function.  MinMaxAggPath contains a list of these, and if we accept that
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ff242d1b6d..a4d2249a11 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -54,5 +54,6 @@ extern Query *inline_set_returning_function(PlannerInfo *root,
                                             RangeTblEntry *rte);
 
 extern Bitmapset *pull_paramids(Expr *expr);
-
+extern GroupedVarInfo *translate_expression_to_rel(PlannerInfo *root,
+                                                   GroupedVarInfo *gvi, Index relid);
 #endif                            /* CLAUSES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 050f00e79a..4aea6bd94f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -230,6 +230,14 @@ extern AggPath *create_agg_path(PlannerInfo *root,
                                 List *qual,
                                 const AggClauseCosts *aggcosts,
                                 double numGroups);
+extern AggPath *create_agg_sorted_path(PlannerInfo *root,
+                                       RelOptInfo *rel,
+                                       Path *subpath,
+                                       RelAggInfo *agg_info);
+extern AggPath *create_agg_hashed_path(PlannerInfo *root,
+                                       RelOptInfo *rel,
+                                       Path *subpath,
+                                       RelAggInfo *agg_info);
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
                                                   RelOptInfo *rel,
                                                   Path *subpath,
@@ -303,14 +311,21 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
 extern void expand_planner_arrays(PlannerInfo *root, int add_size);
 extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
                                     RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+                                            RelAggInfo **agg_info_p);
 extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
 extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+                            RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+                                    RelAggInfo **agg_info_p);
 extern RelOptInfo *build_join_rel(PlannerInfo *root,
                                   Relids joinrelids,
                                   RelOptInfo *outer_rel,
                                   RelOptInfo *inner_rel,
                                   SpecialJoinInfo *sjinfo,
-                                  List **restrictlist_ptr);
+                                  List **restrictlist_ptr,
+                                  RelAggInfo *agg_info);
 extern Relids min_join_parameterization(PlannerInfo *root,
                                         Relids joinrelids,
                                         RelOptInfo *outer_rel,
@@ -336,5 +351,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
                                         RelOptInfo *outer_rel, RelOptInfo *inner_rel,
                                         RelOptInfo *parent_joinrel, List *restrictlist,
                                         SpecialJoinInfo *sjinfo, JoinType jointype);
-
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
 #endif                            /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 41f765d342..0a8e09a2e2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
  * allpaths.c
  */
 extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_agg_pushdown;
 extern PGDLLIMPORT int geqo_threshold;
 extern PGDLLIMPORT int min_parallel_table_scan_size;
 extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -56,6 +57,11 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                   bool override_rows);
 extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                          bool override_rows);
+extern void generate_grouping_paths(PlannerInfo *root,
+                                    RelOptInfo *rel_grouped,
+                                    RelOptInfo *rel_plain,
+                                    RelAggInfo *agg_info);
+
 extern int    compute_parallel_worker(RelOptInfo *rel, double heap_pages,
                                     double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9dffdcfd1e..5a253e2283 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
                                    Relids where_needed);
+extern void setup_aggregate_pushdown(PlannerInfo *root);
 extern void find_lateral_references(PlannerInfo *root);
 extern void create_lateral_join_info(PlannerInfo *root);
 extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 5b4f350b33..c8e0f2a0d7 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -46,6 +46,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
  */
 extern void get_agg_clause_costs(PlannerInfo *root, AggSplit aggsplit,
                                  AggClauseCosts *costs);
+extern void get_agg_clause_costs_some(PlannerInfo *root, AggSplit aggsplit,
+                                      List *aggrefs, AggClauseCosts *costs);
 extern void preprocess_aggrefs(PlannerInfo *root, Node *clause);
 
 /*
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 04668ba1c0..6e71ed47ab 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -49,8 +49,10 @@ extern void split_pathtarget_at_srfs(PlannerInfo *root,
                                      PathTarget *target, PathTarget *input_target,
                                      List **targets, List **targets_contain_srfs);
 
+/* TODO Find the best location for this one. */
+extern Index get_expression_sortgroupref(Expr *expr, List *gvis);
+
 /* Convenience macro to get a PathTarget with valid cost/width fields */
 #define create_pathtarget(root, tlist) \
     set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
-
 #endif                            /* TLIST_H */
diff --git a/src/test/regress/expected/agg_pushdown.out b/src/test/regress/expected/agg_pushdown.out
new file mode 100644
index 0000000000..03a5ccf571
--- /dev/null
+++ b/src/test/regress/expected/agg_pushdown.out
@@ -0,0 +1,216 @@
+CREATE TABLE agg_pushdown_parent (
+    i int primary key,
+    x int);
+CREATE TABLE agg_pushdown_child1 (
+    j int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (j, parent));
+CREATE INDEX ON agg_pushdown_child1(parent);
+CREATE TABLE agg_pushdown_child2 (
+    k int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (k, parent));;
+INSERT INTO agg_pushdown_parent(i, x)
+SELECT n, n
+FROM generate_series(0, 7) AS s(n);
+INSERT INTO agg_pushdown_child1(j, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+INSERT INTO agg_pushdown_child2(k, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+ANALYZE;
+SET enable_agg_pushdown TO on;
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+--
+-- In addition, check that functionally dependent column "c.x" can be
+-- referenced by SELECT although GROUP BY references "p.i".
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(10 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.i = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Seq Scan on agg_pushdown_child1 c1
+(11 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Merge Join
+         Merge Cond: (p.i = c1.parent)
+         ->  Sort
+               Sort Key: p.i
+               ->  Seq Scan on agg_pushdown_parent p
+         ->  Sort
+               Sort Key: c1.parent
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+(12 rows)
+
+-- Restore the default values.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+-- Scan index on agg_pushdown_child1(parent) column and aggregate the result
+-- using AGG_SORTED strategy.
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                         QUERY PLAN                                          
+---------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Nested Loop
+         ->  Partial GroupAggregate
+               Group Key: c1.parent
+               ->  Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+         ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+               Index Cond: (i = c1.parent)
+(8 rows)
+
+SET enable_seqscan TO on;
+-- Join "c1" to "p.x" column, i.e. one that is not in the GROUP BY clause. The
+-- planner should still use "c1.parent" as grouping expression for partial
+-- aggregation, although it's not in the same equivalence class as the GROUP
+-- BY expression ("p.i"). The reason to use "c1.parent" for partial
+-- aggregation is that this is the only way for "c1" to provide the join
+-- expression with input data.
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.x GROUP BY p.i;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.x = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Seq Scan on agg_pushdown_child1 c1
+(11 rows)
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Nested Loop
+                           ->  Seq Scan on agg_pushdown_child1 c1
+                           ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+                                 Index Cond: ((k = c1.j) AND (parent = c1.parent))
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(13 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.i = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Hash Join
+                                 Hash Cond: ((c1.parent = c2.parent) AND (c1.j = c2.k))
+                                 ->  Seq Scan on agg_pushdown_child1 c1
+                                 ->  Hash
+                                       ->  Seq Scan on agg_pushdown_child2 c2
+(15 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Merge Join
+         Merge Cond: (c1.parent = p.i)
+         ->  Sort
+               Sort Key: c1.parent
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Merge Join
+                           Merge Cond: ((c1.j = c2.k) AND (c1.parent = c2.parent))
+                           ->  Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                           ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+         ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(13 rows)
+
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 579b861d84..442f7f9b41 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -111,6 +111,7 @@ select count(*) = 0 as ok from pg_stat_wal_receiver;
 select name, setting from pg_settings where name like 'enable%';
               name              | setting 
 --------------------------------+---------
+ enable_agg_pushdown            | off
  enable_async_append            | on
  enable_bitmapscan              | on
  enable_gathermerge             | on
@@ -131,7 +132,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(20 rows)
+(21 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 6d80ab1a6d..8b8eadd181 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2046,7 +2046,7 @@ create trigger failed after update on parted_trig
   referencing old table as old_table
   for each row execute procedure trigger_nothing();
 ERROR:  "parted_trig" is a partitioned table
-DETAIL:  ROW triggers with transition tables are not supported on partitioned tables.
+DETAIL:  Triggers on partitioned tables cannot have transition tables.
 drop table parted_trig;
 --
 -- Verify trigger creation for partitioned tables, and drop behavior
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 9a139f1e24..1fbf5321da 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -99,6 +99,8 @@ test: select_parallel
 test: write_parallel
 test: vacuum_parallel
 
+test: agg_pushdown
+
 # no relation related tests can be put in this group
 test: publication subscription
 
diff --git a/src/test/regress/sql/agg_pushdown.sql b/src/test/regress/sql/agg_pushdown.sql
new file mode 100644
index 0000000000..0a4614592b
--- /dev/null
+++ b/src/test/regress/sql/agg_pushdown.sql
@@ -0,0 +1,115 @@
+CREATE TABLE agg_pushdown_parent (
+    i int primary key,
+    x int);
+
+CREATE TABLE agg_pushdown_child1 (
+    j int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (j, parent));
+
+CREATE INDEX ON agg_pushdown_child1(parent);
+
+CREATE TABLE agg_pushdown_child2 (
+    k int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (k, parent));;
+
+INSERT INTO agg_pushdown_parent(i, x)
+SELECT n, n
+FROM generate_series(0, 7) AS s(n);
+
+INSERT INTO agg_pushdown_child1(j, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+
+INSERT INTO agg_pushdown_child2(k, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+
+ANALYZE;
+
+SET enable_agg_pushdown TO on;
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+--
+-- In addition, check that functionally dependent column "c.x" can be
+-- referenced by SELECT although GROUP BY references "p.i".
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Restore the default values.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+
+-- Scan index on agg_pushdown_child1(parent) column and aggregate the result
+-- using AGG_SORTED strategy.
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+SET enable_seqscan TO on;
+
+-- Join "c1" to "p.x" column, i.e. one that is not in the GROUP BY clause. The
+-- planner should still use "c1.parent" as grouping expression for partial
+-- aggregation, although it's not in the same equivalence class as the GROUP
+-- BY expression ("p.i"). The reason to use "c1.parent" for partial
+-- aggregation is that this is the only way for "c1" to provide the join
+-- expression with input data.
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.x GROUP BY p.i;
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
-- 
2.31.1

From e8904d1137a262dc2b9a6ed83e0ced9c87fcd684 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 4 Nov 2022 15:02:57 +0100
Subject: [PATCH 3/3] Use also partial paths as the input for grouped paths.

---
 src/backend/commands/trigger.c                |   2 +-
 src/backend/optimizer/path/allpaths.c         |  46 +++++-
 src/backend/optimizer/util/relnode.c          |  46 +++---
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/test/regress/expected/agg_pushdown.out    | 156 ++++++++++++++++++
 src/test/regress/expected/triggers.out        |   2 +-
 src/test/regress/sql/agg_pushdown.sql         |  65 ++++++++
 7 files changed, 286 insertions(+), 32 deletions(-)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 182e6161e0..e64145e710 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -264,7 +264,7 @@ CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                          errmsg("\"%s\" is a partitioned table",
                                 RelationGetRelationName(rel)),
-                         errdetail("Triggers on partitioned tables cannot have transition tables.")));
+                         errdetail("ROW triggers with transition tables are not supported on partitioned tables.")));
         }
     }
     else if (rel->rd_rel->relkind == RELKIND_VIEW)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f00f900ff4..32b3dedc71 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -130,7 +130,7 @@ static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                    RangeTblEntry *rte);
 static void add_grouped_path(PlannerInfo *root, RelOptInfo *rel,
                              Path *subpath, AggStrategy aggstrategy,
-                             RelAggInfo *agg_info);
+                             RelAggInfo *agg_info, bool partial);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                       pushdown_safety_info *safetyInfo);
@@ -3341,6 +3341,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
                         RelOptInfo *rel_plain, RelAggInfo *agg_info)
 {
     ListCell   *lc;
+    Path       *path;
 
     if (IS_DUMMY_REL(rel_plain))
     {
@@ -3350,7 +3351,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
 
     foreach(lc, rel_plain->pathlist)
     {
-        Path       *path = (Path *) lfirst(lc);
+        path = (Path *) lfirst(lc);
 
         /*
          * Since the path originates from the non-grouped relation which is
@@ -3364,7 +3365,8 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
          * add_grouped_path() will check whether the path has suitable
          * pathkeys.
          */
-        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info);
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info,
+                         false);
 
         /*
          * Repeated creation of hash table (for new parameter values) should
@@ -3372,12 +3374,38 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
          * efficiency.
          */
         if (path->param_info == NULL)
-            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info);
+            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info,
+                             false);
     }
 
     /* Could not generate any grouped paths? */
     if (rel_grouped->pathlist == NIL)
+    {
         mark_dummy_rel(rel_grouped);
+        return;
+    }
+
+    /*
+     * Almost the same for partial paths.
+     *
+     * The difference is that parameterized paths are never created, see
+     * add_partial_path() for explanation.
+     */
+    foreach(lc, rel_plain->partial_pathlist)
+    {
+        path = (Path *) lfirst(lc);
+
+        if (path->param_info != NULL)
+            continue;
+
+        path = (Path *) create_projection_path(root, rel_grouped, path,
+                                               agg_info->agg_input);
+
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info,
+                         true);
+        add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info,
+                         true);
+    }
 }
 
 /*
@@ -3385,7 +3413,8 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
  */
 static void
 add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
-                 AggStrategy aggstrategy, RelAggInfo *agg_info)
+                 AggStrategy aggstrategy, RelAggInfo *agg_info,
+                 bool partial)
 {
     Path       *agg_path;
 
@@ -3401,7 +3430,12 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 
     /* Add the grouped path to the list of grouped base paths. */
     if (agg_path != NULL)
-        add_path(rel, (Path *) agg_path);
+    {
+        if (!partial)
+            add_path(rel, (Path *) agg_path);
+        else
+            add_partial_path(rel, (Path *) agg_path);
+    }
 }
 
 /*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 1f124b9713..ce2e267e91 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -961,33 +961,12 @@ build_join_rel(PlannerInfo *root,
         build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
                                      restrictlist, sjinfo->jointype);
 
+    /*
+     * Set estimates of the joinrel's size.
+     */
     if (!grouped)
-    {
-        /*
-         * Set estimates of the joinrel's size.
-         */
         set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
                                    sjinfo, restrictlist);
-
-        /*
-         * Set the consider_parallel flag if this joinrel could potentially be
-         * scanned within a parallel worker.  If this flag is false for either
-         * inner_rel or outer_rel, then it must be false for the joinrel also.
-         * Even if both are true, there might be parallel-restricted
-         * expressions in the targetlist or quals.
-         *
-         * Note that if there are more than two rels in this relation, they
-         * could be divided between inner_rel and outer_rel in any arbitrary
-         * way.  We assume this doesn't matter, because we should hit all the
-         * same baserels and joinclauses while building up to this joinrel no
-         * matter which we take; therefore, we should make the same decision
-         * here however we get here.
-         */
-        if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
-            is_parallel_safe(root, (Node *) restrictlist) &&
-            is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
-            joinrel->consider_parallel = true;
-    }
     else
     {
         /*
@@ -1003,6 +982,25 @@ build_join_rel(PlannerInfo *root,
                                             agg_info->input_rows, NULL, NULL);
     }
 
+    /*
+     * Set the consider_parallel flag if this joinrel could potentially be
+     * scanned within a parallel worker.  If this flag is false for either
+     * inner_rel or outer_rel, then it must be false for the joinrel also.
+     * Even if both are true, there might be parallel-restricted expressions
+     * in the targetlist or quals.
+     *
+     * Note that if there are more than two rels in this relation, they could
+     * be divided between inner_rel and outer_rel in any arbitrary way.  We
+     * assume this doesn't matter, because we should hit all the same baserels
+     * and joinclauses while building up to this joinrel no matter which we
+     * take; therefore, we should make the same decision here however we get
+     * here.
+     */
+    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
+        is_parallel_safe(root, (Node *) restrictlist) &&
+        is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
+        joinrel->consider_parallel = true;
+
     /* Add the joinrel to the PlannerInfo. */
     if (!grouped)
         add_join_rel(root, joinrel);
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..89f944d83a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -385,6 +385,7 @@
 #enable_partition_pruning = on
 #enable_partitionwise_join = off
 #enable_partitionwise_aggregate = off
+#enable_agg_pushdown = off
 #enable_seqscan = on
 #enable_sort = on
 #enable_tidscan = on
diff --git a/src/test/regress/expected/agg_pushdown.out b/src/test/regress/expected/agg_pushdown.out
index 03a5ccf571..66d36d122e 100644
--- a/src/test/regress/expected/agg_pushdown.out
+++ b/src/test/regress/expected/agg_pushdown.out
@@ -214,3 +214,159 @@ c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
          ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
 (13 rows)
 
+-- Most of the tests above with parallel query processing enforced.
+SET min_parallel_index_scan_size = 0;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+-- Partially aggregate a single relation.
+--
+-- Nestloop join.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Nested Loop
+               ->  Partial GroupAggregate
+                     Group Key: c1.parent
+                     ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(10 rows)
+
+-- Hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Gather
+               Workers Planned: 1
+               ->  Parallel Hash Join
+                     Hash Cond: (c1.parent = p.i)
+                     ->  Partial GroupAggregate
+                           Group Key: c1.parent
+                           ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+                     ->  Parallel Hash
+                           ->  Parallel Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(13 rows)
+
+-- Merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Merge Join
+               Merge Cond: (c1.parent = p.i)
+               ->  Partial GroupAggregate
+                     Group Key: c1.parent
+                     ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(10 rows)
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Sort
+               Sort Key: p.i
+               ->  Nested Loop
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Nested Loop
+                                 ->  Parallel Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                                 ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+                                       Index Cond: ((k = c1.j) AND (parent = c1.parent))
+                     ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                           Index Cond: (i = c1.parent)
+(15 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                       QUERY PLAN

 

+------------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Sort
+               Sort Key: p.i
+               ->  Parallel Hash Join
+                     Hash Cond: (c1.parent = p.i)
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Parallel Hash Join
+                                 Hash Cond: ((c1.parent = c2.parent) AND (c1.j = c2.k))
+                                 ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1
c1
+                                 ->  Parallel Hash
+                                       ->  Parallel Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2
c2
+                     ->  Parallel Hash
+                           ->  Parallel Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(17 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Merge Join
+               Merge Cond: (c1.parent = p.i)
+               ->  Sort
+                     Sort Key: c1.parent
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Merge Join
+                                 Merge Cond: ((c1.j = c2.k) AND (c1.parent = c2.parent))
+                                 ->  Parallel Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                                 ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(15 rows)
+
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 8b8eadd181..6d80ab1a6d 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2046,7 +2046,7 @@ create trigger failed after update on parted_trig
   referencing old table as old_table
   for each row execute procedure trigger_nothing();
 ERROR:  "parted_trig" is a partitioned table
-DETAIL:  Triggers on partitioned tables cannot have transition tables.
+DETAIL:  ROW triggers with transition tables are not supported on partitioned tables.
 drop table parted_trig;
 --
 -- Verify trigger creation for partitioned tables, and drop behavior
diff --git a/src/test/regress/sql/agg_pushdown.sql b/src/test/regress/sql/agg_pushdown.sql
index 0a4614592b..49ba6dd67c 100644
--- a/src/test/regress/sql/agg_pushdown.sql
+++ b/src/test/regress/sql/agg_pushdown.sql
@@ -113,3 +113,68 @@ EXPLAIN (COSTS off)
 SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
 agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
 c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- Most of the tests above with parallel query processing enforced.
+SET min_parallel_index_scan_size = 0;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+
+-- Partially aggregate a single relation.
+--
+-- Nestloop join.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
-- 
2.31.1


Re: WIP: Aggregation push-down - take2

From
Tomas Vondra
Date:
Hi,

I did a quick initial review of the v20 patch series. I plan to do a
more thorough review over the next couple days, if time permits. In
general I think the patch is in pretty good shape.

I've added a bunch of comments in a number of places - see the "review
comments" parts for each of the original parts. That should make it
easier to deal with all the items. I'll go through the main stuff here:

1) I was somewhat confused why we even need RelInfoList, when it merely
wraps existing fields, but I guess it's because we need multiple such
pairs - one for joins, one for grouped rels. Correct?

2) While reading the README, I was somewhat confused because it seems to
suggest we have to push the aggregate only to baserel level, but then it
also talks about pushing to other places (above joins). There's a couple
other places in the README that confused me a bit, see the XXX comments.

In general, I think the README focuses on explaining the motivation,
i.e. why we want to do this, but it's somewhat light on how it's done.
The other parts talk about the implementation in more detail.

3) I tweaked a couple places in allpaths.c to make it more readable, but
I admit that's a somewhat subjective measure, so feel free to undo that.

4) setup_base_grouped_rels compares bitmaps before looking at
reloptkind, which seems to be cheaper so maybe the checks should happen
in the opposite order (not a huge difference, though)

5) add_grouped_path seems to be a bit confusing, because the name makes
it look like it does about the same stuff as add_path/add_partial_path,
when that's not quite true

6) 0002 failed to add enable_agg_pushdown to the sample file, which
leads to a failure in regression tests

7) when I change enable_agg_pushdown to true and run regression tests, I
get a bunch of failures like

   ERROR:  WindowFunc found where not expected

Seems we don't handle window functions correctly somewhere, or maybe
setup_aggregate_pushdown should check/reject hasWindowFuncs too?

8) create_ordinary_grouping_paths changes when set_cheapest() gets
called, but I can't quite convince myself the change is correct. How
come it's correct to check pathlist instead of partial_pathlist (as before).

9) I see create_agg_sorted_path is quite picky about the subpath
pathkeys, essentially requiring it to be a prefix of group_pathkeys.
Seems unnecessary, no? Even if we sort/group on different pathkeys, that
reduces the cardinality, and we may do sort later (or just finalize
using hashagg).

Furthermore, we generally try creating a sort with the proper ordering
in other places - why not here? I mean, if subpath has pathkeys=A and we
need [A,B], we could try adding suitable IncrementalSort, no? Or even
full Sort, or something. Or is that not beneficial here?

10) I don't understand why create_agg_hashed_path limits the hashtable
size to work_mem - shouldn't it do something like cost_agg to account
for spilling to disk?

11) There's an unnecessary/unrelated change in trigger.c.

12) I improved/reworded a couple comments where I initially was unsure
what exactly that means. Hopefully I got it right.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: WIP: Aggregation push-down - take2

From
Antonin Houska
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> Hi,
> 
> I did a quick initial review of the v20 patch series. I plan to do a
> more thorough review over the next couple days, if time permits. In
> general I think the patch is in pretty good shape.

Thanks.

> I've added a bunch of comments in a number of places - see the "review
> comments" parts for each of the original parts. That should make it
> easier to deal with all the items. I'll go through the main stuff here:

Unless I miss something, all these items are covered in context below, except
for this one:

> 7) when I change enable_agg_pushdown to true and run regression tests, I
> get a bunch of failures like
> 
>    ERROR:  WindowFunc found where not expected
> 
> Seems we don't handle window functions correctly somewhere, or maybe
> setup_aggregate_pushdown should check/reject hasWindowFuncs too?

We don't need to reject window functions, window functions are processed after
grouping/aggregation. The problem I noticed in the regression tests was that a
window function referenced a (non-window) aggregate. We just need to ensure
that pull_var_clause() recurses into that window function in such cases:

Besides the next version, v21-fixes.patch file is attached. It tries to
summarize all the changes between v21 and v22. (I wonder if this attachment
makes the cfbot fail.)


diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 8e913c92d8..8dc39765f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
     Assert(root->grouped_var_list == NIL);
 
     tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
-                                  PVC_INCLUDE_AGGREGATES);
+                                  PVC_INCLUDE_AGGREGATES |
+                                  PVC_RECURSE_WINDOWFUNCS);
 
     /*
      * Although GroupingFunc is related to root->parse->groupingSets, this


> ---
>  src/backend/optimizer/util/relnode.c | 11 +++++++++++
>  src/include/nodes/pathnodes.h        |  3 +++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
> index 94720865f47..d4367ba14a5 100644
> --- a/src/backend/optimizer/util/relnode.c
> +++ b/src/backend/optimizer/util/relnode.c
> @@ -382,6 +382,12 @@ find_base_rel(PlannerInfo *root, int relid)
>  /*
>   * build_rel_hash
>   *      Construct the auxiliary hash table for relation specific data.
> + *
> + * XXX Why is this renamed, leaving out the "join" part? Are we going to use
> + * it for other purposes?

Yes, besides join relation, it's used to find the "grouped relation" by
Relids. This change tries to follow the suggestion "Maybe an appropriate
preliminary patch ..." in [1], but I haven't got any feedback whether my
understanding was correct.

> + * XXX Also, why change the API and not pass PlannerInfo? Seems pretty usual
> + * for planner functions.

I think that the reason was that, with the patch applied, PlannerInfo contains
multiple fields of the RelInfoList type, so build_rel_hash() needs an
information which one it should process. Passing the exact field is simpler
than passing PlannerInfo plus some additional information.

>   */
>  static void
>  build_rel_hash(RelInfoList *list)
> @@ -422,6 +428,11 @@ build_rel_hash(RelInfoList *list)
>  /*
>   * find_rel_info
>   *      Find a base or join relation entry.
> + *
> + * XXX Why change the API and not pass PlannerInfo? Seems pretty usual
> + * for planner functions.

For the same reason that build_rel_hash() receives the list explicitly, see
above.

> + * XXX I don't understand why we need both this and find_join_rel.

Perhaps I just wanted to keep the call sites of find_join_rel() untouched. I
think that

    find_join_rel(root, relids);

is a little bit easier to read than

    (RelOptInfo *) find_rel_info(root->join_rel_list, relids);

>   */
>  static void *
>  find_rel_info(RelInfoList *list, Relids relids)
> diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
> index 0ca7d5ab51e..018ce755720 100644
> --- a/src/include/nodes/pathnodes.h
> +++ b/src/include/nodes/pathnodes.h
> @@ -88,6 +88,9 @@ typedef enum UpperRelationKind
>   * present and valid when rel_hash is not NULL.  Note that we still maintain
>   * the list even when using the hash table for lookups; this simplifies life
>   * for GEQO.
> + *
> + * XXX I wonder why we actually need a separate node, merely wrapping fields
> + * that already existed ...

This is so that the existing fields can still be printed out
(nodes/outfuncs.c).

> diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
> index 2fd1a962699..6f6b7d0b93b 100644
> --- a/src/backend/optimizer/README
> +++ b/src/backend/optimizer/README
> @@ -1168,6 +1168,12 @@ input of Agg node. However, if the groups are large enough, it may be more
>  efficient to apply the partial aggregation to the output of base relation
>  scan, and finalize it when we have all relations of the query joined:
>  
> +XXX review: Hmm, do we need to push it all the way down to base relations? Or
> +would it make sense to do the agg on an intermediate level? Say, we're joining
> +three tables A, B and C. Maybe the agg could/should be evaluated on top of join
> +A+B, before joining with C? Say, maybe the aggregate references columns from
> +both base relations?
> +
>    EXPLAIN
>    SELECT a.i, avg(b.y)
>    FROM a JOIN b ON b.j = a.i

Another example below does show the partial aggregates at join level.

> +XXX Perhaps mention this may also mean the partial ggregate could be pushed
> +to a remote server with FDW partitions?

Even if it's not implemented in the current patch version?

> +
>  Note that there's often no GROUP BY expression to be used for the partial
>  aggregation, so we use equivalence classes to derive grouping expression: in
>  the example above, the grouping key "b.j" was derived from "a.i".
>  
> +XXX I think this is slightly confusing - there is a GROUP BY expression for the
> +partial aggregate, but as stated in the query it may not reference the side of
> +a join explicitly.

ok, changed.

>  Also note that in this case the partial aggregate uses the "b.j" as grouping
>  column although the column does not appear in the query target list. The point
>  is that "b.j" is needed to evaluate the join condition, and there's no other
>  way for the partial aggregate to emit its values.
>  
> +XXX Not sure I understand what this is trying to say. Firstly, maybe it'd be
> +helpful to show targetlists in the EXPLAIN, i.e. do it as VERBOSE. But more
> +importantly, isn't this a direct consequence of the equivalence classes stuff
> +mentioned in the preceding paragraph?

The equivalence class is just a mechanism to derive expressions which are not
explicitly mentioned in the query, but there's always a question whether you
need to derive any expression for particular table or not. Here I tried to
explain that the choice of join columns is related to the choice of grouping
keys for the partial aggregate.

I've deleted this paragraph and added a note to the previous one.

>  Besides base relation, the aggregation can also be pushed down to join:
>  
>    EXPLAIN
> @@ -1217,6 +1235,10 @@ Besides base relation, the aggregation can also be pushed down to join:
>        ->  Hash
>          ->  Seq Scan on a
>  
> +XXX Aha, so this is pretty-much an answer to my earlier comment, and matches
> +my example with three tables. Maybe this suggests the initial reference to
> +base relations is a bit confusing.

I tried to use the simplest example to demonstrate the concepts, then extended
it to the partially-aggregated joins.

> +XXX I think this is a good explanation of the motivation for this patch, but
> +maybe it'd be good to go into more details about how we decide if it's correct
> +to actually do the pushdown, data structures etc. Similar to earlier parts of
> +this README.

Added two paragraphs, see "Regarding correctness...".

> diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> index f00f900ff41..6d2c2f4fc36 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -196,9 +196,10 @@ make_one_rel(PlannerInfo *root, List *joinlist)
>      /*
>       * Now that the sizes are known, we can estimate the sizes of the grouped
>       * relations.
> +     *
> +     * XXX Seems more consistent with code nearby.
>       */
> -    if (root->grouped_var_list)
> -        setup_base_grouped_rels(root);
> +    setup_base_grouped_rels(root);

In general I prefer not calling a function if it's obvious that it's not
needed, but on the other hand the test of the 'grouped_var_list' field may be
considered disturbing from the caller's perspective. I've got no strong
opinion on this, so I can accept this proposal.

>  
>  /*
> - * setup_based_grouped_rels
> + * setup_base_grouped_rels
>   *      For each "plain" relation build a grouped relation if aggregate pushdown
>   *    is possible and if this relation is suitable for partial aggregation.
>   */

Fixed, thanks.

>  {
>      Index        rti;
>  
> +    /* If there are no grouped relations, estimate their sizes. */
> +    if (!root->grouped_var_list)
> +        return;
> +

Accepted, but with different wording (s/relations/expressions/).

> +        /* XXX Shouldn't this check be earlier? Seems cheaper than the check
> +         * calling bms_nonempty_difference, for example. */
>          if (brel->reloptkind != RELOPT_BASEREL)
>              continue;

Right, moved.

>          rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
> -        if (rel_grouped)
> -        {
> -            /* Make the relation available for joining. */
> -            add_grouped_rel(root, rel_grouped, agg_info);
> -        }
> +
> +        /* XXX When does this happen? */
> +        if (!rel_grouped)
> +            continue;
> +
> +        /* Make the relation available for joining. */
> +        add_grouped_rel(root, rel_grouped, agg_info);

I'd use the "continue" statement if there was a lot of code in the "if
(rel_grouped) {...}" branch, but no strong preference in this case, so
accepted.

>      }
>  }
>  
> @@ -560,6 +569,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
>                      /* Plain relation */
>                      set_plain_rel_pathlist(root, rel, rte);
>  
> +                    /* XXX Shouldn't this really be part of set_plain_rel_pathlist? */
> +
>                      /* Add paths to the grouped relation if one exists. */
>                      rel_grouped = find_grouped_rel(root, rel->relids,

Yes, it can. Moved.

> @@ -3382,6 +3393,11 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
>  
>  /*
>   * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
> + *
> + * XXX I think this is potentially quite confusing, because the existing "add"
> + * functions add_path and add_partial_path only check if the proposed path is
> + * dominated by an existing path, pathkeys, etc. But this does more than that,
> + * perhaps even constructing new path etc.
>   */
>  static void
>  add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,

Maybe, but I don't have a good idea of an alternative name.
create_group_path() already exists and the create_*_path() functions are
rather low-level. Maybe generate_grouped_path(), and at the same time rename
generate_grouping_paths() to generate_grouped_paths()? In general, the
generate_*_path*() functions do non-trivial things and eventually call
add_path().

> @@ -3399,9 +3414,16 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>      else
>          elog(ERROR, "unexpected strategy %d", aggstrategy);
>  
> +    /*
> +     * Bail out if we failed to create a suitable aggregated path. This can
> +     * happen e.g. then the path does not support hashing (for AGG_HASHED),
> +     * or when the input path is not sorted.
> +     */
> +    if (agg_path == NULL)
> +        return;
> +
>      /* Add the grouped path to the list of grouped base paths. */
> -    if (agg_path != NULL)
> -        add_path(rel, (Path *) agg_path);
> +    add_path(rel, (Path *) agg_path);

ok, changed.

>  }
>  
>  /*
> @@ -3545,7 +3567,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
>  
>      for (lev = 2; lev <= levels_needed; lev++)
>      {
> -        RelOptInfo *rel_grouped;
>          ListCell   *lc;
>  
>          /*
> @@ -3567,6 +3588,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
>           */
>          foreach(lc, root->join_rel_level[lev])
>          {
> +            RelOptInfo *rel_grouped;
> +
>              rel = (RelOptInfo *) lfirst(lc);

Sure, fixed.

> diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> index 8e913c92d8b..d7a9de9645e 100644
> --- a/src/backend/optimizer/plan/initsplan.c
> +++ b/src/backend/optimizer/plan/initsplan.c
> @@ -278,6 +278,8 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
>   * each possible grouping expression.
>   *
>   * root->group_pathkeys must be setup before this function is called.
> + *
> + * XXX Perhaps this should check/reject hasWindowFuncs too?

create_window_paths() is called after create_grouping_paths() (see
grouping_planner()), so it should not care whether the input (possibly
grouped) paths involve the aggregate push-down or not.

>   */
>  extern void
>  setup_aggregate_pushdown(PlannerInfo *root)
> @@ -311,6 +313,12 @@ setup_aggregate_pushdown(PlannerInfo *root)
>      if (root->parse->hasTargetSRFs)
>          return;
>  
> +    /*
> +     * XXX Maybe it'd be better to move create_aggregate_grouped_var_infos and
> +     * create_grouping_expr_grouped_var_infos to a function returning bool, and
> +     * only check that here.
> +     */
> +

Hm, it looks to me like too much "indirection", and also a decriptive function
name would be tricky to invent.

>      /* Create GroupedVarInfo per (distinct) aggregate. */
>      create_aggregate_grouped_var_infos(root);
>  
> @@ -329,6 +337,8 @@ setup_aggregate_pushdown(PlannerInfo *root)
>       * Now that we know that grouping can be pushed down, search for the
>       * maximum sortgroupref. The base relations may need it if extra grouping
>       * expressions get added to them.
> +     *
> +     * XXX Shouldn't we do that only when adding extra grouping expressions?
>       */
>      Assert(root->max_sortgroupref == 0);
>      foreach(lc, root->processed_tlist)

We don't know at this (early) stage whether those "extra grouping expression"
will be needed for at least one relation. (max_sortgroupref is used by
create_rel_agg_info())

> diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> index 0ada3ba3ebe..2f4db69c1f9 100644
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -3899,6 +3899,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
>      /*
>       * The non-partial paths can come either from the Gather above or from
>       * aggregate push-down.
> +     *
> +     * XXX I can't quite convince myself this is correct. How come it's fine
> +     * to check pathlist and then call set_cheapest() on partially_grouped_rel?
> +     * Maybe it's correct and the comment merely needs to explain this.

It's not clear to me what makes you confused. Without my patch, the code looks
like this:

    if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
    {
        gather_grouping_paths(root, partially_grouped_rel);
        set_cheapest(partially_grouped_rel);
    }

Here gather_grouping_paths() adds paths to partially_grouped_rel->pathlist. My
patch calls set_cheapest() independent from gather_grouping_paths() because
the paths requiring the aggregate finalization can also be generated by the
aggregate push-down feature.

>       */
>      if (partially_grouped_rel && partially_grouped_rel->pathlist)
>          set_cheapest(partially_grouped_rel);
> @@ -6847,6 +6851,12 @@ create_partial_grouping_paths(PlannerInfo *root,
>       * push-down.
>       */
>      partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
> +
> +    /*
> +     * If the relation already exists, it must have been created by aggregate
> +     * pushdown. We can't check how exactly it got created, but we can at least
> +     * check that aggregate pushdown is enabled.
> +     */
>      Assert(enable_agg_pushdown || partially_grouped_rel == NULL);

ok, done.

> @@ -6872,6 +6882,8 @@ create_partial_grouping_paths(PlannerInfo *root,
>       * If we can't partially aggregate partial paths, and we can't partially
>       * aggregate non-partial paths, then don't bother creating the new
>       * RelOptInfo at all, unless the caller specified force_rel_creation.
> +     *
> +     * XXX Not sure why we're checking the partially_grouped_rel here?
>       */
>      if (cheapest_total_path == NULL &&
>          cheapest_partial_path == NULL &&

I think (but not verified yet) that without this test the function could
return NULL for reasons unrelated to the aggregate push-down. Nevertheless, I
realize now that there's no aggregate push-down specific processing in the
function. I've adjusted it so that it does return, but the returned value is
partially_grouped_rel rather than NULL.

> @@ -6881,7 +6893,9 @@ create_partial_grouping_paths(PlannerInfo *root,
>  
>      /*
>       * Build a new upper relation to represent the result of partially
> -     * aggregating the rows from the input relation.
> +     * aggregating the rows from the input relation. The relation may
> +     * already exist due to aggregate pushdown, in which case we don't
> +     * need to create it.
>       */
>      if (partially_grouped_rel == NULL)
>          partially_grouped_rel = fetch_upper_rel(root,

ok, done.

> @@ -6903,6 +6917,8 @@ create_partial_grouping_paths(PlannerInfo *root,
>       *
>       * If the target was already created for the sake of aggregate push-down,
>       * it should be compatible with what we'd create here.
> +     *
> +     * XXX Why is this checking reltarget->exprs? What does that mean? 
>       */
>      if (partially_grouped_rel->reltarget->exprs == NIL)
>          partially_grouped_rel->reltarget =

I've added this comment:

     * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
     * push-down generated no paths), it created an empty target. Should we
     * change the convention and have it assign NULL to reltarget instead?  Or
     * should we introduce a function like is_pathtarget_empty()?

> diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
> index 7025ebf94be..395bd093d34 100644
> --- a/src/backend/optimizer/util/pathnode.c
> +++ b/src/backend/optimizer/util/pathnode.c
> @@ -3163,9 +3163,21 @@ create_agg_path(PlannerInfo *root,
>  }
>  
>  /*
> + * create_agg_sorted_path
> + *        Creates a pathnode performing sorted aggregation/grouping
> + *
>   * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
>   *
>   * NULL is returned if sorting of subpath output is not suitable.
> + *
> + * XXX I'm a bit confused why we need this? We now have create_agg_path and also
> + * create_agg_sorted_path and create_agg_hashed_path.

Do you mean that the function names are confusing? The functions
create_agg_sorted_path() and create_agg_hashed_path() do some checks /
preparation for the call of the existing function create_agg_path(), which is
more low-level. Should the names be something like
create_partial_agg_sorted_path() and create_partial_agg_hashed_path() ?

> + *
> + * XXX This assumes the input path to be sorted in a suitable way, but for
> + * regular aggregation we check that separately and then perhaps add sort
> + * if needed (possibly incremental one). That is, we don't do such checks
> + * in create_agg_path. Shouldn't we do the same thing before calling this
> + * new functions?
>   */
>  AggPath *
>  create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> @@ -3184,6 +3196,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>      agg_exprs = agg_info->agg_exprs;
>      target = agg_info->target;

Likewise, it seems that you'd like to see different function name and maybe
different location of this function. Both create_agg_sorted_path() and
create_agg_hashed_path() are rather wrappers for create_agg_path().

>  
> +    /* Bail out if the input path is not sorted at all. */
>      if (subpath->pathkeys == NIL)
>          return NULL;

ok, done.

> @@ -3192,6 +3205,18 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>  
>      /*
>       * Find all query pathkeys that our relation does affect.
> +     *
> +     * XXX Not sure what "that our relation does affect" means? Also, we
> +     * are not looking at query_pathkeys but group_pathkeys, so that's a
> +     * bit confusing. Perhaps something like this would be better:
> +     *

Indeed, the check of pathkeys was weird, I've reworked it.

> @@ -3210,10 +3235,21 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>          }
>      }
>  
> +    /* Bail out if the subquery has no pathkeys for the grouping. */
>      if (key_subset == NIL)
>          return NULL;
>  
> -    /* Check if AGG_SORTED is useful for the whole query.  */
> +    /*
> +     * Check if AGG_SORTED is useful for the whole query.
> +     *
> +     * XXX So this means we require the group pathkeys matched to the
> +     * subpath have to be a prefix of subpath->pathkeys. Why is that
> +     * necessary? We'll reduce the cardinality, and in the worst case
> +     * we'll have to add a separate sort (full or incremental). Or we
> +     * could finalize using hashed aggregate.

Although with different arguments, pathkeys_contained_in() is still used in
the new version of the patch. I've added a TODO comment about the incremental
sort (it did not exist when I was writing the patch), but what do you mean by
"reducing the cardinality"? Eventually the partial aggregate should reduce the
cardinality, but for the AGG_SORT strategy to work, the input sorting must be
such that the executor can recognize the group boundaries.

> +     *
> +     * XXX Doesn't seem to change any regression tests when disabled.
> +     */
>      if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
>          return NULL;

"disabled" means removal of this part (including the return statement), or
returning NULL unconditionally? Whatever you mean, please check with the new
version.

> @@ -3231,7 +3267,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>      result = create_agg_path(root, rel, subpath, target,
>                               AGG_SORTED, aggsplit,
>                               agg_info->group_clauses,
> -                             NIL,
> +                             NIL,    /* qual for HAVING clause */
>                               &agg_costs,
>                               dNumGroups);

ok, done here as well as in create_agg_hashed_path().

> @@ -3283,6 +3319,9 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
>                                                        &agg_costs,
>                                                        dNumGroups);
>  
> +        /*
> +         * XXX But we can spill to disk in hashagg now, no?
> +         */
>          if (hashaggtablesize < work_mem * 1024L)
>          {

Yes, we can. It wasn't possible while I was writing the patch. Fixed.

> diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
> index 868d21c351e..6e87ada684b 100644
> --- a/src/backend/utils/misc/postgresql.conf.sample
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> @@ -388,6 +388,7 @@
>  #enable_seqscan = on
>  #enable_sort = on
>  #enable_tidscan = on
> +#enable_agg_pushdown = on

Done.

> diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> index 1055ea70940..05192ca549a 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -3352,7 +3352,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
>                          RelOptInfo *rel_plain, RelAggInfo *agg_info)
>  {
>      ListCell   *lc;
> -    Path       *path;
> +    Path       *path;    /* XXX why declare at this level, not in the loops */
>  

I usually do it this way, not sure why. Perhaps because it's less typing :-) I
changed that in the next version so that we don't waste time arguing about
unimportant things.

[1] https://www.postgresql.org/message-id/9726.1542577439%40sss.pgh.pa.us

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

From f1f7a407cd0e0e05a15a6de5c85c2d8620a1ae2a Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 17 Nov 2022 10:41:11 +0100
Subject: [PATCH 1/3] Introduce RelInfoList structure.

This patch puts join_rel_list and join_rel_hash fields of PlannerInfo
structure into a new structure RelInfoList. It also adjusts add_join_rel() and
find_join_rel() functions so they only call add_rel_info() and find_rel_info()
respectively.

fetch_upper_rel() now uses the new API and the hash table as well because the
list stored in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG] will contain many
relations as soon as the aggregate push-down feature is added.
---
 contrib/postgres_fdw/postgres_fdw.c    |   3 +-
 src/backend/optimizer/geqo/geqo_eval.c |  12 +-
 src/backend/optimizer/plan/planmain.c  |   3 +-
 src/backend/optimizer/util/relnode.c   | 170 ++++++++++++++-----------
 src/include/nodes/pathnodes.h          |  31 +++--
 5 files changed, 126 insertions(+), 93 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8d7500abfb..bb1125e57c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5777,7 +5777,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
      */
     Assert(fpinfo->relation_index == 0);    /* shouldn't be set yet */
     fpinfo->relation_index =
-        list_length(root->parse->rtable) + list_length(root->join_rel_list);
+        list_length(root->parse->rtable) +
+        list_length(root->join_rel_list->items);
 
     return true;
 }
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 004481d608..7ad0baaa0f 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -92,11 +92,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
      *
      * join_rel_level[] shouldn't be in use, so just Assert it isn't.
      */
-    savelength = list_length(root->join_rel_list);
-    savehash = root->join_rel_hash;
+    savelength = list_length(root->join_rel_list->items);
+    savehash = root->join_rel_list->hash;
     Assert(root->join_rel_level == NULL);
 
-    root->join_rel_hash = NULL;
+    root->join_rel_list->hash = NULL;
 
     /* construct the best path for the given combination of relations */
     joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
      * Restore join_rel_list to its former state, and put back original
      * hashtable if any.
      */
-    root->join_rel_list = list_truncate(root->join_rel_list,
-                                        savelength);
-    root->join_rel_hash = savehash;
+    root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+                                               savelength);
+    root->join_rel_list->hash = savehash;
 
     /* release all the memory acquired within gimme_tree */
     MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 63deed27c9..55de28f073 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,7 @@ query_planner(PlannerInfo *root,
      * NOTE: append_rel_list was set up by subquery_planner, so do not touch
      * here.
      */
-    root->join_rel_list = NIL;
-    root->join_rel_hash = NULL;
+    root->join_rel_list = makeNode(RelInfoList);
     root->join_rel_level = NULL;
     root->join_cur_level = 0;
     root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7b4434e7f..94720865f4 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -32,11 +32,15 @@
 #include "utils/lsyscache.h"
 
 
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
 {
-    Relids        join_relids;    /* hash key --- MUST BE FIRST */
-    RelOptInfo *join_rel;
-} JoinHashEntry;
+    Relids        relids;            /* hash key --- MUST BE FIRST */
+    void       *data;
+} RelInfoEntry;
 
 static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
                                 RelOptInfo *input_rel);
@@ -376,11 +380,11 @@ find_base_rel(PlannerInfo *root, int relid)
 }
 
 /*
- * build_join_rel_hash
- *      Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ *      Construct the auxiliary hash table for relation specific data.
  */
 static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
 {
     HTAB       *hashtab;
     HASHCTL        hash_ctl;
@@ -388,47 +392,49 @@ build_join_rel_hash(PlannerInfo *root)
 
     /* Create the hash table */
     hash_ctl.keysize = sizeof(Relids);
-    hash_ctl.entrysize = sizeof(JoinHashEntry);
+    hash_ctl.entrysize = sizeof(RelInfoEntry);
     hash_ctl.hash = bitmap_hash;
     hash_ctl.match = bitmap_match;
     hash_ctl.hcxt = CurrentMemoryContext;
-    hashtab = hash_create("JoinRelHashTable",
+    hashtab = hash_create("RelHashTable",
                           256L,
                           &hash_ctl,
                           HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
 
     /* Insert all the already-existing joinrels */
-    foreach(l, root->join_rel_list)
+    foreach(l, list->items)
     {
-        RelOptInfo *rel = (RelOptInfo *) lfirst(l);
-        JoinHashEntry *hentry;
+        RelOptInfo       *rel = lfirst_node(RelOptInfo, l);
+        RelInfoEntry *hentry;
         bool        found;
 
-        hentry = (JoinHashEntry *) hash_search(hashtab,
-                                               &(rel->relids),
-                                               HASH_ENTER,
-                                               &found);
+        hentry = (RelInfoEntry *) hash_search(hashtab,
+                                              &rel->relids,
+                                              HASH_ENTER,
+                                              &found);
         Assert(!found);
-        hentry->join_rel = rel;
+        hentry->data = rel;
     }
 
-    root->join_rel_hash = hashtab;
+    list->hash = hashtab;
 }
 
 /*
- * find_join_rel
- *      Returns relation entry corresponding to 'relids' (a set of RT indexes),
- *      or NULL if none exists.  This is for join relations.
+ * find_rel_info
+ *      Find a base or join relation entry.
  */
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static void *
+find_rel_info(RelInfoList *list, Relids relids)
 {
+    if (list == NULL)
+        return NULL;
+
     /*
      * Switch to using hash lookup when list grows "too long".  The threshold
      * is arbitrary and is known only here.
      */
-    if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
-        build_join_rel_hash(root);
+    if (!list->hash && list_length(list->items) > 32)
+        build_rel_hash(list);
 
     /*
      * Use either hashtable lookup or linear search, as appropriate.
@@ -438,34 +444,82 @@ find_join_rel(PlannerInfo *root, Relids relids)
      * so would force relids out of a register and thus probably slow down the
      * list-search case.
      */
-    if (root->join_rel_hash)
+    if (list->hash)
     {
         Relids        hashkey = relids;
-        JoinHashEntry *hentry;
+        RelInfoEntry *hentry;
 
-        hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-                                               &hashkey,
-                                               HASH_FIND,
-                                               NULL);
+        hentry = (RelInfoEntry *) hash_search(list->hash,
+                                              &hashkey,
+                                              HASH_FIND,
+                                              NULL);
         if (hentry)
-            return hentry->join_rel;
+            return hentry->data;
     }
     else
     {
         ListCell   *l;
 
-        foreach(l, root->join_rel_list)
+        foreach(l, list->items)
         {
-            RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+            RelOptInfo   *item = lfirst_node(RelOptInfo, l);
 
-            if (bms_equal(rel->relids, relids))
-                return rel;
+            if (bms_equal(item->relids, relids))
+                return item;
         }
     }
 
     return NULL;
 }
 
+/*
+ * find_join_rel
+ *      Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ *      or NULL if none exists.  This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+    return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ *        Add relation specific info to a list, and also add it to the auxiliary
+ *        hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+    /* GEQO requires us to append the new joinrel to the end of the list! */
+    list->items = lappend(list->items, rel);
+
+    /* store it into the auxiliary hashtable if there is one. */
+    if (list->hash)
+    {
+        RelInfoEntry *hentry;
+        bool        found;
+
+        hentry = (RelInfoEntry *) hash_search(list->hash,
+                                              &rel->relids,
+                                              HASH_ENTER,
+                                              &found);
+        Assert(!found);
+        hentry->data = rel;
+    }
+}
+
+/*
+ * add_join_rel
+ *        Add given join relation to the list of join relations in the given
+ *        PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+    add_rel_info(root->join_rel_list, joinrel);
+}
+
 /*
  * set_foreign_rel_properties
  *        Set up foreign-join fields if outer and inner relation are foreign
@@ -516,32 +570,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
     }
 }
 
-/*
- * add_join_rel
- *        Add given join relation to the list of join relations in the given
- *        PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
-    /* GEQO requires us to append the new joinrel to the end of the list! */
-    root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
-    /* store it into the auxiliary hashtable if there is one. */
-    if (root->join_rel_hash)
-    {
-        JoinHashEntry *hentry;
-        bool        found;
-
-        hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-                                               &(joinrel->relids),
-                                               HASH_ENTER,
-                                               &found);
-        Assert(!found);
-        hentry->join_rel = joinrel;
-    }
-}
-
 /*
  * build_join_rel
  *      Returns relation entry corresponding to the union of two given rels,
@@ -1210,22 +1238,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
 RelOptInfo *
 fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 {
+    RelInfoList *list = &root->upper_rels[kind];
     RelOptInfo *upperrel;
-    ListCell   *lc;
-
-    /*
-     * For the moment, our indexing data structure is just a List for each
-     * relation kind.  If we ever get so many of one kind that this stops
-     * working well, we can improve it.  No code outside this function should
-     * assume anything about how to find a particular upperrel.
-     */
 
     /* If we already made this upperrel for the query, return it */
-    foreach(lc, root->upper_rels[kind])
+    if (list)
     {
-        upperrel = (RelOptInfo *) lfirst(lc);
-
-        if (bms_equal(upperrel->relids, relids))
+        upperrel = find_rel_info(list, relids);
+        if (upperrel)
             return upperrel;
     }
 
@@ -1244,7 +1264,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
     upperrel->cheapest_unique_path = NULL;
     upperrel->cheapest_parameterized_paths = NIL;
 
-    root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+    add_rel_info(&root->upper_rels[kind], upperrel);
 
     return upperrel;
 }
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a544b313d3..869854e235 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
     /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;
 
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when rel_hash is not NULL.  Note that we still maintain
+ * the list even when using the hash table for lookups; this simplifies life
+ * for GEQO.
+ */
+typedef struct RelInfoList
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    List       *items;
+    struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
 /*----------
  * PlannerGlobal
  *        Global information for planning/optimization
@@ -260,15 +279,9 @@ struct PlannerInfo
 
     /*
      * join_rel_list is a list of all join-relation RelOptInfos we have
-     * considered in this planning run.  For small problems we just scan the
-     * list to do lookups, but when there are many join relations we build a
-     * hash table for faster lookups.  The hash table is present and valid
-     * when join_rel_hash is not NULL.  Note that we still maintain the list
-     * even when using the hash table for lookups; this simplifies life for
-     * GEQO.
+     * considered in this planning run.
      */
-    List       *join_rel_list;
-    struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+    struct RelInfoList *join_rel_list;    /* list of join-relation RelOptInfos */
 
     /*
      * When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -395,7 +408,7 @@ struct PlannerInfo
      * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
      * upper rel.
      */
-    List       *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+    RelInfoList       upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
     /* Result tlists chosen by grouping_planner for upper-stage processing */
     struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
-- 
2.31.1

From 912e1976d0f9e7604558a8f5a70fb3c32b2646c7 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 17 Nov 2022 10:41:11 +0100
Subject: [PATCH 2/3] Aggregate push-down - basic functionality.

With this patch, partial aggregation can be applied to a base relation or to a
join, and the resulting "grouped" relations can be joined to other "plain"
relations. Once all tables are joined, the aggregation is finalized. See
README for more information.

The next patches will enable the aggregate push-down feature for parallel
query processing, for partitioned tables and for foreign tables.
---
 src/backend/optimizer/README                  |  89 ++
 src/backend/optimizer/path/allpaths.c         | 157 +++
 src/backend/optimizer/path/costsize.c         |  16 +-
 src/backend/optimizer/path/equivclass.c       | 130 +++
 src/backend/optimizer/path/joinrels.c         | 193 +++-
 src/backend/optimizer/plan/initsplan.c        | 290 +++++
 src/backend/optimizer/plan/planmain.c         |  12 +
 src/backend/optimizer/plan/planner.c          |  71 +-
 src/backend/optimizer/plan/setrefs.c          |  33 +
 src/backend/optimizer/prep/prepagg.c          | 264 +++--
 src/backend/optimizer/prep/prepjointree.c     |   1 +
 src/backend/optimizer/util/pathnode.c         | 126 ++-
 src/backend/optimizer/util/relnode.c          | 998 +++++++++++++++++-
 src/backend/optimizer/util/tlist.c            |  31 +
 src/backend/utils/misc/guc_tables.c           |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h                 |  96 ++
 src/include/optimizer/clauses.h               |   3 +-
 src/include/optimizer/pathnode.h              |  19 +-
 src/include/optimizer/paths.h                 |   6 +
 src/include/optimizer/planmain.h              |   1 +
 src/include/optimizer/prep.h                  |   2 +
 src/include/optimizer/tlist.h                 |   4 +-
 src/test/regress/expected/agg_pushdown.out    | 216 ++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/parallel_schedule            |   2 +
 src/test/regress/sql/agg_pushdown.sql         | 115 ++
 27 files changed, 2694 insertions(+), 195 deletions(-)
 create mode 100644 src/test/regress/expected/agg_pushdown.out
 create mode 100644 src/test/regress/sql/agg_pushdown.sql

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 41c120e0cd..db97bd254d 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1158,3 +1158,92 @@ breaking down aggregation or grouping over a partitioned relation into
 aggregation or grouping over its partitions is called partitionwise
 aggregation.  Especially when the partition keys match the GROUP BY clause,
 this can be significantly faster than the regular method.
+
+Aggregate push-down
+-------------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of Agg node. However, if the groups are large enough, it may be more
+efficient to apply the partial aggregation to the output of base relation
+scan, and finalize it when we have all relations of the query joined:
+
+  EXPLAIN
+  SELECT a.i, avg(b.y)
+  FROM a JOIN b ON b.j = a.i
+  GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Nested Loop
+          ->  Partial HashAggregate
+                Group Key: b.j
+                ->  Seq Scan on b
+          ->  Index Only Scan using a_pkey on a
+                Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate: the
+only way for the partial aggregate to provide upper join(s) with input values
+is to have the join input expression(s) in the grouping key: besides grouping
+columns, the partial aggregate can only produce the transient states of the
+aggregate functions, but aggregate functions cannot be referenced by the JOIN
+clauses.
+
+Regarding correctness, join node considers the output of the partial aggregate
+to be equivalent to the output of a plain (non-aggregated) relation scan. That
+is, a group (i.e. a row of the partial aggregate output) matches the other
+side of the join if and only if each row of the non-aggregate relation
+does. In other words, all rows belonging to the same group have the same value
+of the join columns (As mentioned above, a join cannot reference other output
+expressions of the partial aggregate than the grouping expressions.).
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either grouping expression
+or aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partiall aggregation. The point is that those
+NULL values would not appear on the input of the pushed-down, so it could
+either put the rows into groups in a different way than the aggregate at the
+top of the plan, or it could compute wrong values of the aggregate functions.
+
+Besides base relation, the aggregation can also be pushed down to join:
+
+  EXPLAIN
+  SELECT a.i, avg(b.y + c.v)
+  FROM   a JOIN b ON b.j = a.i
+         JOIN c ON c.k = a.i
+  WHERE b.j = c.k GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Hash Join
+      Hash Cond: (b.j = a.i)
+      ->  Partial HashAggregate
+        Group Key: b.j
+        ->  Hash Join
+              Hash Cond: (b.j = c.k)
+              ->  Seq Scan on b
+              ->  Hash
+                ->  Seq Scan on c
+      ->  Hash
+        ->  Seq Scan on a
+
+Whether the Agg node is created out of base relation or out of join, it's
+added to a separate RelOptInfo that we call "grouped relation". Grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. Join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4ddaed31a4..6638311ead 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -62,6 +62,7 @@ typedef struct pushdown_safety_info
 
 /* These parameters are set by GUC */
 bool        enable_geqo = false;    /* just in case GUC doesn't set it */
+bool        enable_agg_pushdown;
 int            geqo_threshold;
 int            min_parallel_table_scan_size;
 int            min_parallel_index_scan_size;
@@ -75,6 +76,7 @@ join_search_hook_type join_search_hook = NULL;
 
 static void set_base_rel_consider_startup(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
                          Index rti, RangeTblEntry *rte);
@@ -126,6 +128,9 @@ static void set_result_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                 RangeTblEntry *rte);
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                    RangeTblEntry *rte);
+static void add_grouped_path(PlannerInfo *root, RelOptInfo *rel,
+                             Path *subpath, AggStrategy aggstrategy,
+                             RelAggInfo *agg_info);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                       pushdown_safety_info *safetyInfo);
@@ -188,6 +193,12 @@ make_one_rel(PlannerInfo *root, List *joinlist)
      */
     set_base_rel_sizes(root);
 
+    /*
+     * Now that the sizes are known, we can estimate the sizes of the grouped
+     * relations.
+     */
+    setup_base_grouped_rels(root);
+
     /*
      * We should now have size estimates for every actual table involved in
      * the query, and we also know which if any have been deleted from the
@@ -328,6 +339,54 @@ set_base_rel_sizes(PlannerInfo *root)
     }
 }
 
+/*
+ * setup_base_grouped_rels
+ *      For each "plain" relation build a grouped relation if aggregate pushdown
+ *    is possible and if this relation is suitable for partial aggregation.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+    Index        rti;
+
+    /* If there are no grouping expressions, no aggregate push-down. */
+    if (!root->grouped_var_list)
+        return;
+
+
+    for (rti = 1; rti < root->simple_rel_array_size; rti++)
+    {
+        RelOptInfo *brel = root->simple_rel_array[rti];
+        RelOptInfo *rel_grouped;
+        RelAggInfo *agg_info;
+
+        /* there may be empty slots corresponding to non-baserel RTEs */
+        if (brel == NULL)
+            continue;
+
+        Assert(brel->relid == rti); /* sanity check on array */
+
+        /* ignore RTEs that are "other rels" */
+        if (brel->reloptkind != RELOPT_BASEREL)
+            continue;
+
+        /*
+         * The aggregate push-down feature only makes sense if there are
+         * multiple base rels in the query.
+         */
+        if (!bms_nonempty_difference(root->all_baserels, brel->relids))
+            continue;
+
+        rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
+        /* Couldn't any aggregate be pushed down to this relation? */
+        if (!rel_grouped)
+            continue;
+
+        /* Make the relation available for joining. */
+        add_grouped_rel(root, rel_grouped, agg_info);
+    }
+}
+
 /*
  * set_base_rel_pathlists
  *      Finds all paths available for scanning each base-relation entry.
@@ -769,6 +828,8 @@ static void
 set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
     Relids        required_outer;
+    RelOptInfo *rel_grouped;
+    RelAggInfo *agg_info;
 
     /*
      * We don't support pushing join clauses into the quals of a seqscan, but
@@ -789,6 +850,14 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
     /* Consider TID scans */
     create_tidscan_paths(root, rel);
+
+    /* Add paths to the grouped relation if one exists. */
+    rel_grouped = find_grouped_rel(root, rel->relids, &agg_info);
+    if (!rel_grouped)
+        return;
+
+    generate_grouping_paths(root, rel_grouped, rel, agg_info);
+    set_cheapest(rel_grouped);
 }
 
 /*
@@ -3263,6 +3332,87 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
     }
 }
 
+/*
+ * generate_grouping_paths
+ *         Create partially aggregated paths and add them to grouped relation.
+ *
+ * "rel_plain" is base or join relation whose paths are not grouped.
+ */
+void
+generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+                        RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+    ListCell   *lc;
+
+    if (IS_DUMMY_REL(rel_plain))
+    {
+        mark_dummy_rel(rel_grouped);
+        return;
+    }
+
+    foreach(lc, rel_plain->pathlist)
+    {
+        Path       *path = (Path *) lfirst(lc);
+
+        /*
+         * Since the path originates from the non-grouped relation which is
+         * not aware of the aggregate push-down, we must ensure that it
+         * provides the correct input for aggregation.
+         */
+        path = (Path *) create_projection_path(root, rel_grouped, path,
+                                               agg_info->agg_input);
+
+        /*
+         * add_grouped_path() will check whether the path has suitable
+         * pathkeys.
+         */
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info);
+
+        /*
+         * Repeated creation of hash table (for new parameter values) should
+         * be possible, does not sound like a good idea in terms of
+         * efficiency.
+         */
+        if (path->param_info == NULL)
+            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info);
+    }
+
+    /* Could not generate any grouped paths? */
+    if (rel_grouped->pathlist == NIL)
+        mark_dummy_rel(rel_grouped);
+}
+
+/*
+ * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
+ */
+static void
+add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+                 AggStrategy aggstrategy, RelAggInfo *agg_info)
+{
+    Path       *agg_path;
+
+
+    if (aggstrategy == AGG_HASHED)
+        agg_path = (Path *) create_agg_hashed_path(root, rel, subpath,
+                                                   agg_info);
+    else if (aggstrategy == AGG_SORTED)
+        agg_path = (Path *) create_agg_sorted_path(root, rel, subpath,
+                                                   agg_info);
+    else
+        elog(ERROR, "unexpected strategy %d", aggstrategy);
+
+    /*
+     * Bail out if we failed to create a suitable aggregated path. This can
+     * happen e.g. then the path does not support hashing (for AGG_HASHED),
+     * or when the input path is not sorted.
+     */
+    if (agg_path == NULL)
+        return;
+
+    /* Add the grouped path to the list of grouped base paths. */
+    add_path(rel, (Path *) agg_path);
+}
+
 /*
  * make_rel_from_joinlist
  *      Build access paths using a "joinlist" to guide the join path search.
@@ -3425,6 +3575,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
          */
         foreach(lc, root->join_rel_level[lev])
         {
+            RelOptInfo *rel_grouped;
+
             rel = (RelOptInfo *) lfirst(lc);
 
             /* Create paths for partitionwise joins. */
@@ -3441,6 +3593,11 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
             /* Find and save the cheapest paths for this rel */
             set_cheapest(rel);
 
+            /* The same for grouped relation if one exists. */
+            rel_grouped = find_grouped_rel(root, rel->relids, NULL);
+            if (rel_grouped)
+                set_cheapest(rel_grouped);
+
 #ifdef OPTIMIZER_DEBUG
             debug_print_rel(root, rel);
 #endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4c6b1d1f55..4688f561f0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -6016,11 +6016,11 @@ set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target)
     foreach(lc, target->exprs)
     {
         Node       *node = (Node *) lfirst(lc);
+        int32        item_width;
 
         if (IsA(node, Var))
         {
             Var           *var = (Var *) node;
-            int32        item_width;
 
             /* We should not see any upper-level Vars here */
             Assert(var->varlevelsup == 0);
@@ -6052,6 +6052,20 @@ set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target)
             Assert(item_width > 0);
             tuple_width += item_width;
         }
+        else if (IsA(node, Aggref))
+        {
+            /*
+             * If the target is evaluated by AggPath, it'll care of cost
+             * estimate. If the target is above AggPath (typically target of a
+             * join relation that contains grouped relation), the cost of
+             * Aggref should not be accounted for again.
+             *
+             * On the other hand, width is always needed.
+             */
+            item_width = get_typavgwidth(exprType(node), exprTypmod(node));
+            Assert(item_width > 0);
+            tuple_width += item_width;
+        }
         else
         {
             /*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index e65b967b1f..483daeb5de 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -3149,6 +3149,136 @@ is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist)
     return false;
 }
 
+/*
+ * translate_expression_to_rels
+ *        If the appropriate equivalence classes exist, replace vars in
+ *        gvi->gvexpr with vars whose varno is equal to relid. Return NULL if
+ *        translation is not possible or needed.
+ *
+ * Note: Currently we only translate Var expressions. This is subject to
+ * change as the aggregate push-down feature gets enhanced.
+ */
+GroupedVarInfo *
+translate_expression_to_rel(PlannerInfo *root, GroupedVarInfo *gvi,
+                            Index relid)
+{
+    Var           *var;
+    ListCell   *l1;
+    bool        found_orig = false;
+    Var           *var_translated = NULL;
+    GroupedVarInfo *result;
+
+    /* Can't do anything w/o equivalence classes. */
+    if (root->eq_classes == NIL)
+        return NULL;
+
+    var = castNode(Var, gvi->gvexpr);
+
+    /*
+     * Do we need to translate the var?
+     */
+    if (var->varno == relid)
+        return NULL;
+
+    /*
+     * Find the replacement var.
+     */
+    foreach(l1, root->eq_classes)
+    {
+        EquivalenceClass *ec = lfirst_node(EquivalenceClass, l1);
+        ListCell   *l2;
+
+        /* TODO Check if any other EC kind should be ignored. */
+        if (ec->ec_has_volatile || ec->ec_below_outer_join || ec->ec_broken)
+            continue;
+
+        /* Single-element EC can hardly help in translations. */
+        if (list_length(ec->ec_members) == 1)
+            continue;
+
+        /*
+         * Collect all vars of this EC and their varnos.
+         *
+         * ec->ec_relids does not help because we're only interested in a
+         * subset of EC members.
+         */
+        foreach(l2, ec->ec_members)
+        {
+            EquivalenceMember *em = lfirst_node(EquivalenceMember, l2);
+            Var           *ec_var;
+
+            /*
+             * The grouping expressions derived here are used to evaluate
+             * possibility to push aggregation down to RELOPT_BASEREL or
+             * RELOPT_JOINREL relations, and to construct reltargets for the
+             * grouped rels. We're not interested at the moment whether the
+             * relations do have children.
+             */
+            if (em->em_is_child)
+                continue;
+
+            if (!IsA(em->em_expr, Var))
+                continue;
+
+            ec_var = castNode(Var, em->em_expr);
+            if (equal(ec_var, var))
+                found_orig = true;
+            else if (ec_var->varno == relid)
+                var_translated = ec_var;
+
+            if (found_orig && var_translated)
+            {
+                /*
+                 * The replacement Var must have the same data type, otherwise
+                 * the values are not guaranteed to be grouped in the same way
+                 * as values of the original Var.
+                 */
+                if (ec_var->vartype != var->vartype)
+                    return NULL;
+
+                break;
+            }
+        }
+
+        if (found_orig)
+        {
+            /*
+             * The same expression probably does not exist in multiple ECs.
+             */
+            if (var_translated == NULL)
+            {
+                /*
+                 * Failed to translate the expression.
+                 */
+                return NULL;
+            }
+            else
+            {
+                /* Success. */
+                break;
+            }
+        }
+        else
+        {
+            /*
+             * Vars of the requested relid can be in the next ECs too.
+             */
+            var_translated = NULL;
+        }
+    }
+
+    if (!found_orig)
+        return NULL;
+
+    result = makeNode(GroupedVarInfo);
+    memcpy(result, gvi, sizeof(GroupedVarInfo));
+
+    result->gv_eval_at = bms_make_singleton(relid);
+    result->gvexpr = (Expr *) var_translated;
+
+    return result;
+}
+
 /*
  * is_redundant_with_indexclauses
  *        Test whether rinfo is redundant with any clause in the IndexClause
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..09a92541c0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -21,6 +21,7 @@
 #include "optimizer/paths.h"
 #include "partitioning/partbounds.h"
 #include "utils/memutils.h"
+#include "utils/selfuncs.h"
 
 
 static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +36,10 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
 static bool restriction_is_constant_false(List *restrictlist,
                                           RelOptInfo *joinrel,
                                           bool only_pushed_down);
+static RelOptInfo *make_join_rel_common(PlannerInfo *root, RelOptInfo *rel1,
+                                        RelOptInfo *rel2,
+                                        RelAggInfo *agg_info,
+                                        RelOptInfo *rel_agg_input);
 static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
                                         RelOptInfo *rel2, RelOptInfo *joinrel,
                                         SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -669,21 +674,20 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
     return true;
 }
 
-
 /*
- * make_join_rel
- *       Find or create a join RelOptInfo that represents the join of
- *       the two given rels, and add to it path information for paths
- *       created with the two rels as outer and inner rel.
- *       (The join rel may already contain paths generated from other
- *       pairs of rels that add up to the same set of base rels.)
+ * make_join_rel_common
+ *     The workhorse of make_join_rel().
+ *
+ *    'agg_info' contains the reltarget of grouped relation and everything we
+ *    need to aggregate the join result. If NULL, then the join relation should
+ *    not be grouped.
  *
- * NB: will return NULL if attempted join is not valid.  This can happen
- * when working with outer joins, or with IN or EXISTS clauses that have been
- * turned into joins.
+ *    'rel_agg_input' describes the AggPath input relation if the join output
+ *    should be aggregated. If NULL is passed, do not aggregate the join output.
  */
-RelOptInfo *
-make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
+static RelOptInfo *
+make_join_rel_common(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
+                     RelAggInfo *agg_info, RelOptInfo *rel_agg_input)
 {
     Relids        joinrelids;
     SpecialJoinInfo *sjinfo;
@@ -744,7 +748,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
      * goes with this particular joining.
      */
     joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-                             &restrictlist);
+                             &restrictlist, agg_info);
 
     /*
      * If we've already proven this join is empty, we needn't consider any
@@ -757,14 +761,173 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
     }
 
     /* Add paths to the join relation. */
-    populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-                                restrictlist);
+    if (rel_agg_input == NULL)
+    {
+        /*
+         * Simply join the input relations, whether both are plain or one of
+         * them is grouped.
+         */
+        populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+                                    restrictlist);
+    }
+    else
+    {
+        /* The join relation is grouped. */
+        Assert(agg_info != NULL);
+
+        /*
+         * Apply partial aggregation to the paths of rel_agg_input and add the
+         * resulting paths to joinrel.
+         */
+        generate_grouping_paths(root, joinrel, rel_agg_input, agg_info);
+    }
 
     bms_free(joinrelids);
 
     return joinrel;
 }
 
+/*
+ * make_join_rel_combined
+ *     Join grouped relation to non-grouped one.
+ */
+static void
+make_join_rel_combined(PlannerInfo *root, RelOptInfo *rel1,
+                       RelOptInfo *rel2,
+                       RelAggInfo *agg_info)
+{
+    RelOptInfo *rel1_grouped;
+    RelOptInfo *rel2_grouped;
+    bool        rel1_grouped_useful = false;
+    bool        rel2_grouped_useful = false;
+
+    /* Retrieve the grouped relations. */
+    rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+    rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+    /*
+     * Dummy rel may indicate a join relation that is able to generate grouped
+     * paths as such (i.e. it has valid agg_info), but for which the path
+     * actually could not be created (e.g. only AGG_HASHED strategy was
+     * possible but work_mem was not sufficient for hash table).
+     */
+    rel1_grouped_useful = rel1_grouped != NULL && !IS_DUMMY_REL(rel1_grouped);
+    rel2_grouped_useful = rel2_grouped != NULL && !IS_DUMMY_REL(rel2_grouped);
+
+    /* Nothing to do if there's no grouped relation. */
+    if (!rel1_grouped_useful && !rel2_grouped_useful)
+        return;
+
+    if (rel1_grouped_useful)
+        make_join_rel_common(root, rel1_grouped, rel2, agg_info, NULL);
+
+    if (rel2_grouped_useful)
+        make_join_rel_common(root, rel1, rel2_grouped, agg_info, NULL);
+
+    /*
+     * Join of two grouped relations is currently not supported. In such a
+     * case, grouping of one side would change the occurrence of the other
+     * side's aggregate transient states on the input of the final
+     * aggregation. This can be handled by adjusting the transient states, but
+     * it's not worth the effort because it's hard to find a use case for this
+     * kind of join.
+     *
+     * XXX If the join of two grouped rels is implemented someday, note that
+     * both rels can have aggregates, so it'd be hard to join grouped rel to
+     * non-grouped here: 1) such a "mixed join" would require a special
+     * target, 2) both AGGSPLIT_FINAL_DESERIAL and AGGSPLIT_SIMPLE aggregates
+     * could appear in the target of the final aggregation node, originating
+     * from the grouped and the non-grouped input rel respectively.
+     */
+}
+
+/*
+ * make_join_rel
+ *       Find or create a join RelOptInfo that represents the join of
+ *       the two given rels, and add to it path information for paths
+ *       created with the two rels as outer and inner rel.
+ *       (The join rel may already contain paths generated from other
+ *       pairs of rels that add up to the same set of base rels.)
+ *
+ *       In addition to creating an ordinary join relation, try to create a
+ *       grouped one. There are two strategies to achieve that: join a grouped
+ *       relation to plain one, or join two plain relations and apply partial
+ *       aggregation to the result.
+ *
+ * NB: will return NULL if attempted join is not valid.  This can happen when
+ * working with outer joins, or with IN or EXISTS clauses that have been
+ * turned into joins. Besides that, NULL is also returned if caller is
+ * interested in a grouped relation but it could not be created.
+ *
+ * Only the plain relation is returned; if grouped relation exists, it can be
+ * retrieved using find_grouped_rel().
+ */
+RelOptInfo *
+make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
+{
+    Relids        joinrelids;
+    RelAggInfo *agg_info = NULL;
+    RelOptInfo *joinrel,
+               *joinrel_plain;
+
+    /* 1) form the plain join. */
+    joinrel = make_join_rel_common(root, rel1, rel2, NULL, NULL);
+    joinrel_plain = joinrel;
+
+    if (joinrel_plain == NULL)
+        return joinrel_plain;
+
+    /*
+     * We're done if there are no grouping expressions nor aggregates.
+     */
+    if (root->grouped_var_list == NIL)
+        return joinrel_plain;
+
+    joinrelids = bms_union(rel1->relids, rel2->relids);
+    joinrel = find_grouped_rel(root, joinrelids, &agg_info);
+
+    if (joinrel != NULL)
+    {
+        /*
+         * If the same grouped joinrel was already formed, just with the base
+         * rels divided between rel1 and rel2 in a different way, the matching
+         * agg_info should already be there.
+         */
+        Assert(agg_info != NULL);
+    }
+    else
+    {
+        /*
+         * agg_info must be created from scratch.
+         */
+        agg_info = create_rel_agg_info(root, joinrel_plain);
+
+        /* Cannot we build grouped join? */
+        if (agg_info == NULL)
+            return joinrel_plain;
+
+        /*
+         * The number of aggregate input rows is simply the number of rows of
+         * the non-grouped relation, which should have been estimated by now.
+         */
+        agg_info->input_rows = joinrel_plain->rows;
+    }
+
+    /*
+     * 2) join two plain rels and aggregate the join paths. Aggregate
+     * push-down only makes sense if the join is not the top-level one.
+     */
+    if (bms_nonempty_difference(root->all_baserels, joinrelids))
+        make_join_rel_common(root, rel1, rel2, agg_info, joinrel_plain);
+
+    /*
+     * 3) combine plain and grouped relations.
+     */
+    make_join_rel_combined(root, rel1, rel2, agg_info);
+
+    return joinrel_plain;
+}
+
 /*
  * populate_joinrel_with_paths
  *      Add paths to the given joinrel for given pair of joining relations. The
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index fd8cbb1dc7..8dc39765f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/nbtree.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
@@ -48,6 +49,8 @@ typedef struct PostponedQual
 } PostponedQual;
 
 
+static void create_aggregate_grouped_var_infos(PlannerInfo *root);
+static void create_grouping_expr_grouped_var_infos(PlannerInfo *root);
 static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
                                        Index rtindex);
 static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -270,6 +273,293 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
     }
 }
 
+/*
+ * Add GroupedVarInfo to grouped_var_list for each aggregate as well as for
+ * each possible grouping expression.
+ *
+ * root->group_pathkeys must be setup before this function is called.
+ */
+extern void
+setup_aggregate_pushdown(PlannerInfo *root)
+{
+    ListCell   *lc;
+
+    /*
+     * Isn't user interested in the aggregate push-down feature?
+     */
+    if (!enable_agg_pushdown)
+        return;
+
+    /* The feature can only be applied to grouped aggregation. */
+    if (!root->parse->groupClause)
+        return;
+
+    /*
+     * Grouping sets require multiple different groupings but the base
+     * relation can only generate one.
+     */
+    if (root->parse->groupingSets)
+        return;
+
+    /*
+     * SRF is not allowed in the aggregate argument and we don't even want it
+     * in the GROUP BY clause, so forbid it in general. It needs to be
+     * analyzed if evaluation of a GROUP BY clause containing SRF below the
+     * query targetlist would be correct. Currently it does not seem to be an
+     * important use case.
+     */
+    if (root->parse->hasTargetSRFs)
+        return;
+
+    /* Create GroupedVarInfo per (distinct) aggregate. */
+    create_aggregate_grouped_var_infos(root);
+
+    /* Isn't there any aggregate to be pushed down? */
+    if (root->grouped_var_list == NIL)
+        return;
+
+    /* Create GroupedVarInfo per grouping expression. */
+    create_grouping_expr_grouped_var_infos(root);
+
+    /* Isn't there any useful grouping expression for aggregate push-down? */
+    if (root->grouped_var_list == NIL)
+        return;
+
+    /*
+     * Now that we know that grouping can be pushed down, search for the
+     * maximum sortgroupref. The base relations may need it if extra grouping
+     * expressions get added to them.
+     */
+    Assert(root->max_sortgroupref == 0);
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+        if (te->ressortgroupref > root->max_sortgroupref)
+            root->max_sortgroupref = te->ressortgroupref;
+    }
+}
+
+/*
+ * Create GroupedVarInfo for each distinct aggregate.
+ *
+ * If any aggregate is not suitable, set root->grouped_var_list to NIL and
+ * return.
+ */
+static void
+create_aggregate_grouped_var_infos(PlannerInfo *root)
+{
+    List       *tlist_exprs;
+    ListCell   *lc;
+
+    Assert(root->grouped_var_list == NIL);
+
+    tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+                                  PVC_INCLUDE_AGGREGATES |
+                                  PVC_RECURSE_WINDOWFUNCS);
+
+    /*
+     * Although GroupingFunc is related to root->parse->groupingSets, this
+     * field does not necessarily reflect its presence.
+     */
+    foreach(lc, tlist_exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+
+        if (IsA(expr, GroupingFunc))
+            return;
+    }
+
+    /*
+     * Aggregates within the HAVING clause need to be processed in the same
+     * way as those in the main targetlist.
+     *
+     * Note that the contained aggregates will be pushed down, but the
+     * containing HAVING clause must be ignored until the aggregation is
+     * finalized.
+     */
+    if (root->parse->havingQual != NULL)
+    {
+        List       *having_exprs;
+
+        having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+                                       PVC_INCLUDE_AGGREGATES);
+        if (having_exprs != NIL)
+            tlist_exprs = list_concat(tlist_exprs, having_exprs);
+    }
+
+    if (tlist_exprs == NIL)
+        return;
+
+    foreach(lc, tlist_exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+        Aggref       *aggref;
+        ListCell   *lc2;
+        GroupedVarInfo *gvi;
+        bool        exists;
+
+        /*
+         * tlist_exprs may also contain Vars, but we only need Aggrefs.
+         */
+        if (IsA(expr, Var))
+            continue;
+
+        aggref = castNode(Aggref, expr);
+
+        /* TODO Think if (some of) these can be handled. */
+        if (aggref->aggvariadic ||
+            aggref->aggdirectargs || aggref->aggorder ||
+            aggref->aggdistinct)
+        {
+            /*
+             * Aggregation push-down is not useful if at least one aggregate
+             * cannot be evaluated below the top-level join.
+             *
+             * XXX Is it worth freeing the GroupedVarInfos and their subtrees?
+             */
+            root->grouped_var_list = NIL;
+            break;
+        }
+
+        /* Does GroupedVarInfo for this aggregate already exist? */
+        exists = false;
+        foreach(lc2, root->grouped_var_list)
+        {
+            gvi = lfirst_node(GroupedVarInfo, lc2);
+
+            if (equal(expr, gvi->gvexpr))
+            {
+                exists = true;
+                break;
+            }
+        }
+
+        /* Construct a new GroupedVarInfo if does not exist yet. */
+        if (!exists)
+        {
+            Relids        relids;
+
+            gvi = makeNode(GroupedVarInfo);
+            gvi->gvexpr = (Expr *) copyObject(aggref);
+
+            /* Find out where the aggregate should be evaluated. */
+            relids = pull_varnos(root, (Node *) aggref);
+            if (!bms_is_empty(relids))
+                gvi->gv_eval_at = relids;
+            else
+                gvi->gv_eval_at = NULL;
+
+            root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+        }
+    }
+
+    list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupedVarInfo for each expression usable as grouping key.
+ *
+ * In addition to the expressions of the query targetlist, group_pathkeys is
+ * also considered the source of grouping expressions. That increases the
+ * chance to get the relation output grouped.
+ */
+static void
+create_grouping_expr_grouped_var_infos(PlannerInfo *root)
+{
+    ListCell   *l1,
+               *l2;
+    List       *exprs = NIL;
+    List       *sortgrouprefs = NIL;
+
+    /*
+     * Make sure GroupedVarInfo exists for each expression usable as grouping
+     * key.
+     */
+    foreach(l1, root->parse->groupClause)
+    {
+        SortGroupClause *sgClause;
+        TargetEntry *te;
+        Index        sortgroupref;
+        TypeCacheEntry *tce;
+        Oid            equalimageproc;
+
+        sgClause = lfirst_node(SortGroupClause, l1);
+        te = get_sortgroupclause_tle(sgClause, root->processed_tlist);
+        sortgroupref = te->ressortgroupref;
+
+        Assert(sortgroupref > 0);
+
+        /*
+         * Non-zero sortgroupref does not necessarily imply grouping
+         * expression: data can also be sorted by aggregate.
+         */
+        if (IsA(te->expr, Aggref))
+            continue;
+
+        /*
+         * The aggregate push-down feature currently supports only plain Vars
+         * as grouping expressions.
+         */
+        if (!IsA(te->expr, Var))
+        {
+            root->grouped_var_list = NIL;
+            return;
+        }
+
+        /*
+         * Aggregate push-down is only possible if equality of grouping keys
+         * per the equality operator implies bitwise equality. Otherwise, if
+         * we put keys of different byte images into the same group, we lose
+         * some information that may be needed to evaluate join clauses above
+         * the pushed-down aggregate node, or the WHERE clause.
+         *
+         * For example, the NUMERIC data type is not supported because values
+         * that fall into the same group according to the equality operator
+         * (e.g. 0 and 0.0) can have different scale.
+         */
+        tce = lookup_type_cache(exprType((Node *) te->expr),
+                                TYPECACHE_BTREE_OPFAMILY);
+        if (!OidIsValid(tce->btree_opf) ||
+            !OidIsValid(tce->btree_opintype))
+            goto fail;
+
+        equalimageproc = get_opfamily_proc(tce->btree_opf,
+                                           tce->btree_opintype,
+                                           tce->btree_opintype,
+                                           BTEQUALIMAGE_PROC);
+        if (!OidIsValid(equalimageproc) ||
+            !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+                                               tce->typcollation,
+                                               ObjectIdGetDatum(tce->btree_opintype))))
+            goto fail;
+
+        exprs = lappend(exprs, te->expr);
+        sortgrouprefs = lappend_int(sortgrouprefs, sortgroupref);
+    }
+
+    /*
+     * Construct GroupedVarInfo for each expression.
+     */
+    forboth(l1, exprs, l2, sortgrouprefs)
+    {
+        Var           *var = lfirst_node(Var, l1);
+        int            sortgroupref = lfirst_int(l2);
+        GroupedVarInfo *gvi = makeNode(GroupedVarInfo);
+
+        gvi->gvexpr = (Expr *) copyObject(var);
+        gvi->sortgroupref = sortgroupref;
+
+        /* Find out where the expression should be evaluated. */
+        gvi->gv_eval_at = bms_make_singleton(var->varno);
+
+        root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+    }
+    return;
+
+fail:
+    root->grouped_var_list = NIL;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 55de28f073..3302673c59 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -66,6 +66,7 @@ query_planner(PlannerInfo *root,
      * here.
      */
     root->join_rel_list = makeNode(RelInfoList);
+    root->agg_info_list = makeNode(RelInfoList);
     root->join_rel_level = NULL;
     root->join_cur_level = 0;
     root->canon_pathkeys = NIL;
@@ -76,6 +77,7 @@ query_planner(PlannerInfo *root,
     root->placeholder_list = NIL;
     root->placeholder_array = NULL;
     root->placeholder_array_size = 0;
+    root->grouped_var_list = NIL;
     root->fkey_list = NIL;
     root->initial_rels = NIL;
 
@@ -254,6 +256,16 @@ query_planner(PlannerInfo *root,
      */
     extract_restriction_or_clauses(root);
 
+    /*
+     * If the query result can be grouped, check if any grouping can be
+     * performed below the top-level join. If so, setup
+     * root->grouped_var_list.
+     *
+     * The base relations should be fully initialized now, so that we have
+     * enough info to decide whether grouping is possible.
+     */
+    setup_aggregate_pushdown(root);
+
     /*
      * Now expand appendrels by adding "otherrels" for their children.  We
      * delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 493a3af0fa..3292b4b419 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -629,6 +629,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
     root->processed_tlist = NIL;
+    root->max_sortgroupref = 0;
     root->update_colnos = NIL;
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
@@ -3856,11 +3857,11 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
         bool        force_rel_creation;
 
         /*
-         * If we're doing partitionwise aggregation at this level, force
-         * creation of a partially_grouped_rel so we can add partitionwise
-         * paths to it.
+         * If we're doing partitionwise aggregation at this level or if
+         * aggregate push-down succeeded to create some paths, force creation
+         * of a partially_grouped_rel so we can add the related paths to it.
          */
-        force_rel_creation = (patype == PARTITIONWISE_AGGREGATE_PARTIAL);
+        force_rel_creation = patype == PARTITIONWISE_AGGREGATE_PARTIAL;
 
         partially_grouped_rel =
             create_partial_grouping_paths(root,
@@ -3893,10 +3894,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
     /* Gather any partially grouped partial paths. */
     if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
-    {
         gather_grouping_paths(root, partially_grouped_rel);
+
+    /*
+     * The non-partial paths can come either from the Gather above or from
+     * aggregate push-down.
+     */
+    if (partially_grouped_rel && partially_grouped_rel->pathlist)
         set_cheapest(partially_grouped_rel);
-    }
 
     /*
      * Estimate number of groups.
@@ -6837,6 +6842,19 @@ create_partial_grouping_paths(PlannerInfo *root,
     bool        can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
     bool        can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
 
+    /*
+     * The output relation could have been already created due to aggregate
+     * push-down.
+     */
+    partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+
+    /*
+     * If the relation already exists, it must have been created by aggregate
+     * pushdown. We can't check how exactly it got created, but we can at
+     * least check that aggregate pushdown is enabled.
+     */
+    Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
+
     /*
      * Consider whether we should generate partially aggregated non-partial
      * paths.  We can only do this if we have a non-partial path, and only if
@@ -6859,20 +6877,30 @@ create_partial_grouping_paths(PlannerInfo *root,
     /*
      * If we can't partially aggregate partial paths, and we can't partially
      * aggregate non-partial paths, then don't bother creating the new
-     * RelOptInfo at all, unless the caller specified force_rel_creation.
+     * RelOptInfo at all, unless the caller specified force_rel_creation. However
      */
     if (cheapest_total_path == NULL &&
         cheapest_partial_path == NULL &&
         !force_rel_creation)
-        return NULL;
+    {
+        /*
+         * If partially_grouped_rel exists, it should contain paths generated
+         * by the aggregate push-down feature, so the caller is interested in
+         * it.
+         */
+        return partially_grouped_rel;
+    }
 
     /*
      * Build a new upper relation to represent the result of partially
-     * aggregating the rows from the input relation.
-     */
-    partially_grouped_rel = fetch_upper_rel(root,
-                                            UPPERREL_PARTIAL_GROUP_AGG,
-                                            grouped_rel->relids);
+     * aggregating the rows from the input relation. The relation may already
+     * exist due to aggregate pushdown, in which case we don't need to create
+     * it.
+     */
+    if (partially_grouped_rel == NULL)
+        partially_grouped_rel = fetch_upper_rel(root,
+                                                UPPERREL_PARTIAL_GROUP_AGG,
+                                                grouped_rel->relids);
     partially_grouped_rel->consider_parallel =
         grouped_rel->consider_parallel;
     partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -6886,10 +6914,19 @@ create_partial_grouping_paths(PlannerInfo *root,
      * emit the same tlist as regular aggregate paths, because (1) we must
      * include Vars and Aggrefs needed in HAVING, which might not appear in
      * the result tlist, and (2) the Aggrefs must be set in partial mode.
-     */
-    partially_grouped_rel->reltarget =
-        make_partial_grouping_target(root, grouped_rel->reltarget,
-                                     extra->havingQual);
+     *
+     * If the target was already created for the sake of aggregate push-down,
+     * it should be compatible with what we'd create here.
+     *
+     * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
+     * push-down generated no paths), it created an empty target. Should we
+     * change the convention and have it assign NULL to reltarget instead?  Or
+     * should we introduce a function like is_pathtarget_empty()?
+     */
+    if (partially_grouped_rel->reltarget->exprs == NIL)
+        partially_grouped_rel->reltarget =
+            make_partial_grouping_target(root, grouped_rel->reltarget,
+                                         extra->havingQual);
 
     if (!extra->partial_costs_set)
     {
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..72657be7ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2870,6 +2870,39 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
         /* No referent found for Var */
         elog(ERROR, "variable not found in subplan target lists");
     }
+    if (IsA(node, Aggref))
+    {
+        Aggref       *aggref = castNode(Aggref, node);
+
+        /*
+         * The upper plan targetlist can contain Aggref whose value has
+         * already been evaluated by the subplan. However this can only happen
+         * with specific value of aggsplit.
+         */
+        if (aggref->aggsplit == AGGSPLIT_INITIAL_SERIAL)
+        {
+            /* See if the Aggref has bubbled up from a lower plan node */
+            if (context->outer_itlist && context->outer_itlist->has_non_vars)
+            {
+                newvar = search_indexed_tlist_for_non_var((Expr *) node,
+                                                          context->outer_itlist,
+                                                          OUTER_VAR);
+                if (newvar)
+                    return (Node *) newvar;
+            }
+            if (context->inner_itlist && context->inner_itlist->has_non_vars)
+            {
+                newvar = search_indexed_tlist_for_non_var((Expr *) node,
+                                                          context->inner_itlist,
+                                                          INNER_VAR);
+                if (newvar)
+                    return (Node *) newvar;
+            }
+        }
+
+        /* No referent found for Aggref */
+        elog(ERROR, "Aggref not found in subplan target lists");
+    }
     if (IsA(node, PlaceHolderVar))
     {
         PlaceHolderVar *phv = (PlaceHolderVar *) node;
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index da89b55402..7bb747ee6b 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -64,6 +64,10 @@ static int    find_compatible_trans(PlannerInfo *root, Aggref *newagg,
                                   Datum initValue, bool initValueIsNull,
                                   List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
+static void get_agg_clause_costs_trans(PlannerInfo *root, AggSplit aggsplit,
+                                       AggTransInfo *transinfo, AggClauseCosts *costs);
+static void get_agg_clause_costs_agginfo(PlannerInfo *root, AggSplit aggsplit,
+                                         AggInfo *agginfo, AggClauseCosts *costs);
 
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
@@ -546,132 +550,176 @@ get_agg_clause_costs(PlannerInfo *root, AggSplit aggsplit, AggClauseCosts *costs
     {
         AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
 
-        /*
-         * Add the appropriate component function execution costs to
-         * appropriate totals.
-         */
-        if (DO_AGGSPLIT_COMBINE(aggsplit))
-        {
-            /* charge for combining previously aggregated states */
-            add_function_cost(root, transinfo->combinefn_oid, NULL,
-                              &costs->transCost);
-        }
-        else
-            add_function_cost(root, transinfo->transfn_oid, NULL,
-                              &costs->transCost);
-        if (DO_AGGSPLIT_DESERIALIZE(aggsplit) &&
-            OidIsValid(transinfo->deserialfn_oid))
-            add_function_cost(root, transinfo->deserialfn_oid, NULL,
-                              &costs->transCost);
-        if (DO_AGGSPLIT_SERIALIZE(aggsplit) &&
-            OidIsValid(transinfo->serialfn_oid))
-            add_function_cost(root, transinfo->serialfn_oid, NULL,
-                              &costs->finalCost);
+        get_agg_clause_costs_trans(root, aggsplit, transinfo, costs);
+    }
 
-        /*
-         * These costs are incurred only by the initial aggregate node, so we
-         * mustn't include them again at upper levels.
-         */
-        if (!DO_AGGSPLIT_COMBINE(aggsplit))
-        {
-            /* add the input expressions' cost to per-input-row costs */
-            QualCost    argcosts;
+    foreach(lc, root->agginfos)
+    {
+        AggInfo    *agginfo = (AggInfo *) lfirst(lc);
 
-            cost_qual_eval_node(&argcosts, (Node *) transinfo->args, root);
-            costs->transCost.startup += argcosts.startup;
-            costs->transCost.per_tuple += argcosts.per_tuple;
+        get_agg_clause_costs_agginfo(root, aggsplit, agginfo, costs);
 
-            /*
-             * Add any filter's cost to per-input-row costs.
-             *
-             * XXX Ideally we should reduce input expression costs according
-             * to filter selectivity, but it's not clear it's worth the
-             * trouble.
-             */
-            if (transinfo->aggfilter)
-            {
-                cost_qual_eval_node(&argcosts, (Node *) transinfo->aggfilter,
-                                    root);
-                costs->transCost.startup += argcosts.startup;
-                costs->transCost.per_tuple += argcosts.per_tuple;
-            }
-        }
+    }
+}
+
+/*
+ * Like get_agg_clause_costs(), but only consider aggregates passed in the
+ * 'aggrefs' list.
+ */
+void
+get_agg_clause_costs_some(PlannerInfo *root, AggSplit aggsplit, List *aggrefs,
+                          AggClauseCosts *costs)
+{
+    ListCell    *lc;
+
+    foreach(lc, aggrefs)
+    {
+        Aggref    *aggref    = lfirst_node(Aggref, lc);
+        AggTransInfo *aggtrans = (AggTransInfo *) list_nth(root->aggtransinfos,
+                                                           aggref->aggtransno);
+        AggInfo    *agginfo = list_nth(root->agginfos, aggref->aggno);
+
+
+        get_agg_clause_costs_trans(root, aggsplit, aggtrans, costs);
+        get_agg_clause_costs_agginfo(root, aggsplit, agginfo, costs);
+    }
+}
+
+/*
+ * Sub-routine of get_agg_clause_costs(), to process a single AggTransInfo.
+ */
+static void
+get_agg_clause_costs_trans(PlannerInfo *root, AggSplit aggsplit,
+                           AggTransInfo *transinfo, AggClauseCosts *costs)
+{
+    /*
+     * Add the appropriate component function execution costs to appropriate
+     * totals.
+     */
+    if (DO_AGGSPLIT_COMBINE(aggsplit))
+    {
+        /* charge for combining previously aggregated states */
+        add_function_cost(root, transinfo->combinefn_oid, NULL,
+                          &costs->transCost);
+    }
+    else
+        add_function_cost(root, transinfo->transfn_oid, NULL,
+                          &costs->transCost);
+    if (DO_AGGSPLIT_DESERIALIZE(aggsplit) &&
+        OidIsValid(transinfo->deserialfn_oid))
+        add_function_cost(root, transinfo->deserialfn_oid, NULL,
+                          &costs->transCost);
+    if (DO_AGGSPLIT_SERIALIZE(aggsplit) &&
+        OidIsValid(transinfo->serialfn_oid))
+        add_function_cost(root, transinfo->serialfn_oid, NULL,
+                          &costs->finalCost);
+
+    /*
+     * These costs are incurred only by the initial aggregate node, so we
+     * mustn't include them again at upper levels.
+     */
+    if (!DO_AGGSPLIT_COMBINE(aggsplit))
+    {
+        /* add the input expressions' cost to per-input-row costs */
+        QualCost    argcosts;
+
+        cost_qual_eval_node(&argcosts, (Node *) transinfo->args, root);
+        costs->transCost.startup += argcosts.startup;
+        costs->transCost.per_tuple += argcosts.per_tuple;
 
         /*
-         * If the transition type is pass-by-value then it doesn't add
-         * anything to the required size of the hashtable.  If it is
-         * pass-by-reference then we have to add the estimated size of the
-         * value itself, plus palloc overhead.
+         * Add any filter's cost to per-input-row costs.
+         *
+         * XXX Ideally we should reduce input expression costs according to
+         * filter selectivity, but it's not clear it's worth the trouble.
          */
-        if (!transinfo->transtypeByVal)
+        if (transinfo->aggfilter)
         {
-            int32        avgwidth;
+            cost_qual_eval_node(&argcosts, (Node *) transinfo->aggfilter,
+                                root);
+            costs->transCost.startup += argcosts.startup;
+            costs->transCost.per_tuple += argcosts.per_tuple;
+        }
+    }
 
-            /* Use average width if aggregate definition gave one */
-            if (transinfo->aggtransspace > 0)
-                avgwidth = transinfo->aggtransspace;
-            else if (transinfo->transfn_oid == F_ARRAY_APPEND)
-            {
-                /*
-                 * If the transition function is array_append(), it'll use an
-                 * expanded array as transvalue, which will occupy at least
-                 * ALLOCSET_SMALL_INITSIZE and possibly more.  Use that as the
-                 * estimate for lack of a better idea.
-                 */
-                avgwidth = ALLOCSET_SMALL_INITSIZE;
-            }
-            else
-            {
-                avgwidth = get_typavgwidth(transinfo->aggtranstype, transinfo->aggtranstypmod);
-            }
+    /*
+     * If the transition type is pass-by-value then it doesn't add anything to
+     * the required size of the hashtable.  If it is pass-by-reference then we
+     * have to add the estimated size of the value itself, plus palloc
+     * overhead.
+     */
+    if (!transinfo->transtypeByVal)
+    {
+        int32        avgwidth;
 
-            avgwidth = MAXALIGN(avgwidth);
-            costs->transitionSpace += avgwidth + 2 * sizeof(void *);
-        }
-        else if (transinfo->aggtranstype == INTERNALOID)
+        /* Use average width if aggregate definition gave one */
+        if (transinfo->aggtransspace > 0)
+            avgwidth = transinfo->aggtransspace;
+        else if (transinfo->transfn_oid == F_ARRAY_APPEND)
         {
             /*
-             * INTERNAL transition type is a special case: although INTERNAL
-             * is pass-by-value, it's almost certainly being used as a pointer
-             * to some large data structure.  The aggregate definition can
-             * provide an estimate of the size.  If it doesn't, then we assume
-             * ALLOCSET_DEFAULT_INITSIZE, which is a good guess if the data is
-             * being kept in a private memory context, as is done by
-             * array_agg() for instance.
+             * If the transition function is array_append(), it'll use an
+             * expanded array as transvalue, which will occupy at least
+             * ALLOCSET_SMALL_INITSIZE and possibly more.  Use that as the
+             * estimate for lack of a better idea.
              */
-            if (transinfo->aggtransspace > 0)
-                costs->transitionSpace += transinfo->aggtransspace;
-            else
-                costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
+            avgwidth = ALLOCSET_SMALL_INITSIZE;
+        }
+        else
+        {
+            avgwidth = get_typavgwidth(transinfo->aggtranstype, transinfo->aggtranstypmod);
         }
-    }
 
-    foreach(lc, root->agginfos)
+        avgwidth = MAXALIGN(avgwidth);
+        costs->transitionSpace += avgwidth + 2 * sizeof(void *);
+    }
+    else if (transinfo->aggtranstype == INTERNALOID)
     {
-        AggInfo    *agginfo = lfirst_node(AggInfo, lc);
-        Aggref       *aggref = linitial_node(Aggref, agginfo->aggrefs);
-
         /*
-         * Add the appropriate component function execution costs to
-         * appropriate totals.
+         * INTERNAL transition type is a special case: although INTERNAL is
+         * pass-by-value, it's almost certainly being used as a pointer to
+         * some large data structure.  The aggregate definition can provide an
+         * estimate of the size.  If it doesn't, then we assume
+         * ALLOCSET_DEFAULT_INITSIZE, which is a good guess if the data is
+         * being kept in a private memory context, as is done by array_agg()
+         * for instance.
          */
-        if (!DO_AGGSPLIT_SKIPFINAL(aggsplit) &&
-            OidIsValid(agginfo->finalfn_oid))
-            add_function_cost(root, agginfo->finalfn_oid, NULL,
-                              &costs->finalCost);
+        if (transinfo->aggtransspace > 0)
+            costs->transitionSpace += transinfo->aggtransspace;
+        else
+            costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
+    }
+}
 
-        /*
-         * If there are direct arguments, treat their evaluation cost like the
-         * cost of the finalfn.
-         */
-        if (aggref->aggdirectargs)
-        {
-            QualCost    argcosts;
+/*
+ * Sub-routine of get_agg_clause_costs(), to process a single AggInfo.
+ */
+static void
+get_agg_clause_costs_agginfo(PlannerInfo *root, AggSplit aggsplit,
+                             AggInfo *agginfo, AggClauseCosts *costs)
+{
+    Aggref       *aggref = linitial_node(Aggref, agginfo->aggrefs);
 
-            cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
-                                root);
-            costs->finalCost.startup += argcosts.startup;
-            costs->finalCost.per_tuple += argcosts.per_tuple;
-        }
+    /*
+     * Add the appropriate component function execution costs to appropriate
+     * totals.
+     */
+    if (!DO_AGGSPLIT_SKIPFINAL(aggsplit) &&
+        OidIsValid(agginfo->finalfn_oid))
+        add_function_cost(root, agginfo->finalfn_oid, NULL,
+                          &costs->finalCost);
+
+    /*
+     * If there are direct arguments, treat their evaluation cost like the
+     * cost of the finalfn.
+     */
+    if (aggref->aggdirectargs)
+    {
+        QualCost    argcosts;
+
+        cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
+                            root);
+        costs->finalCost.startup += argcosts.startup;
+        costs->finalCost.per_tuple += argcosts.per_tuple;
     }
 }
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index f4cdb879c2..c4f16de7c1 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1007,6 +1007,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
     subroot->processed_tlist = NIL;
+    root->max_sortgroupref = 0;
     subroot->update_colnos = NIL;
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6dd11329fb..2627e2f252 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2670,8 +2670,7 @@ create_projection_path(PlannerInfo *root,
     pathnode->path.pathtype = T_Result;
     pathnode->path.parent = rel;
     pathnode->path.pathtarget = target;
-    /* For now, assume we are above any joins, so no parameterization */
-    pathnode->path.param_info = NULL;
+    pathnode->path.param_info = subpath->param_info;
     pathnode->path.parallel_aware = false;
     pathnode->path.parallel_safe = rel->consider_parallel &&
         subpath->parallel_safe &&
@@ -3163,6 +3162,129 @@ create_agg_path(PlannerInfo *root,
     return pathnode;
 }
 
+/*
+ * create_agg_sorted_path
+ *        Creates a pathnode performing sorted aggregation/grouping
+ *
+ * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
+ *
+ * NULL is returned if sorting of subpath output is not suitable.
+ */
+AggPath *
+create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+                       RelAggInfo *agg_info)
+{
+    List       *agg_exprs;
+    AggSplit    aggsplit;
+    AggClauseCosts agg_costs;
+    PathTarget *target;
+    double        dNumGroups;
+    AggPath    *result = NULL;
+
+    aggsplit = AGGSPLIT_INITIAL_SERIAL;
+    agg_exprs = agg_info->agg_exprs;
+    target = agg_info->target;
+
+    /* group_pathkeys are necessary to evaluate the sorting. */
+    if (agg_info->group_pathkeys == NIL)
+        return NULL;
+
+    /*
+     * The input path must be sorted in a specific way, but if it's not sorted
+     * at all, it's not useful for AGG_SORTED.
+     */
+    if (subpath->pathkeys == NIL)
+        return NULL;
+
+    /* Are the grouping clauses suitable for sorted aggregation? */
+    if (!grouping_is_sortable(agg_info->group_clauses))
+        return NULL;
+
+    /*
+     * Is the input path sorted enough for this grouping? TODO Consider using
+     * incremental sort if the sorting is "almost sufficient".
+     */
+    if (!pathkeys_contained_in(agg_info->group_pathkeys, subpath->pathkeys))
+        return NULL;
+
+    MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+    get_agg_clause_costs_some(root, aggsplit, agg_exprs, &agg_costs);
+
+    Assert(agg_info->group_exprs != NIL);
+    dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
+                                     subpath->rows, NULL, NULL);
+
+    /*
+     * qual is NIL because the HAVING clause cannot be evaluated until the
+     * final value of the aggregate is known.
+     */
+    result = create_agg_path(root, rel, subpath, target,
+                             AGG_SORTED, aggsplit,
+                             agg_info->group_clauses,
+                             NIL,    /* qual for HAVING clause */
+                             &agg_costs,
+                             dNumGroups);
+
+    /* The agg path should require no fewer parameters than the plain one. */
+    result->path.param_info = subpath->param_info;
+
+    return result;
+}
+
+/*
+ * Apply AGG_HASHED aggregation to subpath.
+ */
+AggPath *
+create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
+                       Path *subpath, RelAggInfo *agg_info)
+{
+    bool        can_hash;
+    List       *agg_exprs;
+    AggSplit    aggsplit;
+    AggClauseCosts agg_costs;
+    PathTarget *target;
+    double        dNumGroups;
+    Query       *parse = root->parse;
+    AggPath    *result = NULL;
+
+    /* Do not try to create hash table for each parameter value. */
+    Assert(subpath->param_info == NULL);
+
+    aggsplit = AGGSPLIT_INITIAL_SERIAL;
+    agg_exprs = agg_info->agg_exprs;
+    target = agg_info->target;
+
+    MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+    get_agg_clause_costs_some(root, aggsplit, agg_exprs, &agg_costs);
+
+    can_hash = (parse->groupClause != NIL &&
+                parse->groupingSets == NIL &&
+                root->numOrderedAggs == 0 &&
+                grouping_is_hashable(parse->groupClause));
+
+    if (can_hash)
+    {
+        Assert(agg_info->group_exprs != NIL);
+        dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
+                                         subpath->rows, NULL, NULL);
+
+        /*
+         * qual is NIL because the HAVING clause cannot be evaluated until the
+         * final value of the aggregate is known.
+         */
+        result = create_agg_path(root, rel, subpath,
+                                 target,
+                                 AGG_HASHED,
+                                 aggsplit,
+                                 agg_info->group_clauses,
+                                 NIL, /* qual for HAVING clause */
+                                 &agg_costs,
+                                 dNumGroups);
+    }
+
+    return result;
+}
+
 /*
  * create_groupingsets_path
  *      Creates a pathnode that represents performing GROUPING SETS aggregation
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 94720865f4..556f25bece 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,17 +18,23 @@
 
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "catalog/pg_class_d.h"
+#include "catalog/pg_constraint.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/inherit.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
 #include "utils/hsearch.h"
+#include "utils/selfuncs.h"
 #include "utils/lsyscache.h"
 
 
@@ -76,6 +82,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
                                        RelOptInfo *childrel,
                                        int nappinfos,
                                        AppendRelInfo **appinfos);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+                                  PathTarget *target, PathTarget *agg_input,
+                                  List *gvis, List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
 
 
 /*
@@ -356,6 +367,110 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
     return rel;
 }
 
+/*
+ * build_simple_grouped_rel
+ *      Construct a new RelOptInfo for a grouped base relation out of an
+ *      existing non-grouped relation. On success, pointer to the corresponding
+ *      RelAggInfo is stored in *agg_info_p in addition to returning the grouped
+ *      relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+                         RelAggInfo **agg_info_p)
+{
+    RangeTblEntry *rte;
+    RelOptInfo *rel_plain,
+               *rel_grouped;
+    RelAggInfo *agg_info;
+
+    /* Isn't there any grouping expression to be pushed down? */
+    if (root->grouped_var_list == NIL)
+        return NULL;
+
+    rel_plain = root->simple_rel_array[relid];
+
+    /* Caller should only pass rti that represents base relation. */
+    Assert(rel_plain != NULL);
+
+    /*
+     * Not all RTE kinds are supported when grouping is considered.
+     *
+     * TODO Consider relaxing some of these restrictions.
+     */
+    rte = root->simple_rte_array[rel_plain->relid];
+    if (rte->rtekind != RTE_RELATION ||
+        rte->relkind == RELKIND_FOREIGN_TABLE ||
+        rte->tablesample != NULL)
+        return NULL;
+
+    /*
+     * Grouped append relation is not supported yet.
+     */
+    if (rte->inh)
+        return NULL;
+
+    /*
+     * Currently we do not support child relations ("other rels").
+     */
+    if (rel_plain->reloptkind != RELOPT_BASEREL)
+        return NULL;
+
+    /*
+     * Prepare the information we need for aggregation of the rel contents.
+     */
+    agg_info = create_rel_agg_info(root, rel_plain);
+    if (agg_info == NULL)
+        return NULL;
+
+    /*
+     * TODO Consider if 1) a flat copy is o.k., 2) it's safer in terms of
+     * adding new fields to RelOptInfo) to copy everything and then reset some
+     * fields, or to zero the structure and copy individual fields.
+     */
+    rel_grouped = makeNode(RelOptInfo);
+    memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+    /*
+     * Note on consider_startup: while the AGG_HASHED strategy needs the whole
+     * relation, AGG_SORTED does not. Therefore we do not force
+     * consider_startup to false.
+     */
+
+    /*
+     * Set the appropriate target for grouped paths.
+     *
+     * reltarget should match the target of partially aggregated paths.
+     */
+    rel_grouped->reltarget = agg_info->target;
+
+    /*
+     * Grouped paths must not be mixed with the plain ones.
+     */
+    rel_grouped->pathlist = NIL;
+    rel_grouped->partial_pathlist = NIL;
+    rel_grouped->cheapest_startup_path = NULL;
+    rel_grouped->cheapest_total_path = NULL;
+    rel_grouped->cheapest_unique_path = NULL;
+    rel_grouped->cheapest_parameterized_paths = NIL;
+
+    /*
+     * The number of aggregation input rows is simply the number of rows of
+     * the non-grouped relation, which should have been estimated by now.
+     */
+    agg_info->input_rows = rel_plain->rows;
+
+    /*
+     * The number of output rows is supposedly different (lower) due to
+     * grouping.
+     */
+    rel_grouped->rows = estimate_num_groups(root, agg_info->group_exprs,
+                                            agg_info->input_rows, NULL,
+                                            NULL);
+
+    *agg_info_p = agg_info;
+    return rel_grouped;
+}
+
 /*
  * find_base_rel
  *      Find a base or other relation entry, which must already exist.
@@ -404,16 +519,20 @@ build_rel_hash(RelInfoList *list)
     /* Insert all the already-existing joinrels */
     foreach(l, list->items)
     {
-        RelOptInfo       *rel = lfirst_node(RelOptInfo, l);
+        void       *item = lfirst(l);
         RelInfoEntry *hentry;
         bool        found;
+        Relids        relids;
+
+        Assert(IsA(item, RelOptInfo));
+        relids = ((RelOptInfo *) item)->relids;
 
         hentry = (RelInfoEntry *) hash_search(hashtab,
-                                              &rel->relids,
+                                              &relids,
                                               HASH_ENTER,
                                               &found);
         Assert(!found);
-        hentry->data = rel;
+        hentry->data = item;
     }
 
     list->hash = hashtab;
@@ -462,9 +581,17 @@ find_rel_info(RelInfoList *list, Relids relids)
 
         foreach(l, list->items)
         {
-            RelOptInfo   *item = lfirst_node(RelOptInfo, l);
+            void       *item = lfirst(l);
+            Relids        item_relids = NULL;
 
-            if (bms_equal(item->relids, relids))
+            Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+            if (IsA(item, RelOptInfo))
+                item_relids = ((RelOptInfo *) item)->relids;
+            else if (IsA(item, RelAggInfo))
+                item_relids = ((RelAggInfo *) item)->relids;
+
+            if (bms_equal(item_relids, relids))
                 return item;
         }
     }
@@ -489,23 +616,31 @@ find_join_rel(PlannerInfo *root, Relids relids)
  *        hashtable if there is one.
  */
 static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
 {
+    Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
     /* GEQO requires us to append the new joinrel to the end of the list! */
-    list->items = lappend(list->items, rel);
+    list->items = lappend(list->items, data);
 
     /* store it into the auxiliary hashtable if there is one. */
     if (list->hash)
     {
+        Relids        relids;
         RelInfoEntry *hentry;
         bool        found;
 
+        if (IsA(data, RelOptInfo))
+            relids = ((RelOptInfo *) data)->relids;
+        else if (IsA(data, RelAggInfo))
+            relids = ((RelAggInfo *) data)->relids;
+
         hentry = (RelInfoEntry *) hash_search(list->hash,
-                                              &rel->relids,
+                                              &relids,
                                               HASH_ENTER,
                                               &found);
         Assert(!found);
-        hentry->data = rel;
+        hentry->data = data;
     }
 }
 
@@ -520,6 +655,63 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
     add_rel_info(root->join_rel_list, joinrel);
 }
 
+/*
+ * add_grouped_rel
+ *        Add grouped base or join relation to the list of grouped relations in
+ *        the given PlannerInfo. Also add the corresponding RelAggInfo to
+ *        agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+    add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+    add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ *      Returns grouped relation entry (base or join relation) corresponding to
+ *      'relids' or NULL if none exists.
+ *
+ * If agg_info_p is a valid pointer, then pointer to RelAggInfo that
+ * corresponds to the relation returned is assigned to *agg_info_p.
+ *
+ * The call fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG, ...) should
+ * return the same relation if it exists, however the behavior is different if
+ * the relation is not there. find_grouped_rel() should be used in
+ * query_planner() and subroutines.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+    RelOptInfo *rel;
+
+    rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+                                       relids);
+    if (rel == NULL)
+    {
+        if (agg_info_p)
+            *agg_info_p = NULL;
+
+        return NULL;
+    }
+
+    /* Is caller interested in RelAggInfo? */
+    if (agg_info_p)
+    {
+        RelAggInfo *agg_info;
+
+        agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+        /* The relation exists, so the agg_info should be there too. */
+        Assert(agg_info != NULL);
+
+        *agg_info_p = agg_info;
+    }
+
+    return rel;
+}
+
 /*
  * set_foreign_rel_properties
  *        Set up foreign-join fields if outer and inner relation are foreign
@@ -582,6 +774,7 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
  * 'restrictlist_ptr': result variable.  If not NULL, *restrictlist_ptr
  *        receives the list of RestrictInfo nodes that apply to this
  *        particular pair of joinable relations.
+ * 'agg_info' indicates that grouped join relation should be created.
  *
  * restrictlist_ptr makes the routine's API a little grotty, but it saves
  * duplicated calculation of the restrictlist...
@@ -592,10 +785,12 @@ build_join_rel(PlannerInfo *root,
                RelOptInfo *outer_rel,
                RelOptInfo *inner_rel,
                SpecialJoinInfo *sjinfo,
-               List **restrictlist_ptr)
+               List **restrictlist_ptr,
+               RelAggInfo *agg_info)
 {
     RelOptInfo *joinrel;
     List       *restrictlist;
+    bool        grouped = agg_info != NULL;
 
     /* This function should be used only for join between parents. */
     Assert(!IS_OTHER_REL(outer_rel) && !IS_OTHER_REL(inner_rel));
@@ -603,7 +798,8 @@ build_join_rel(PlannerInfo *root,
     /*
      * See if we already have a joinrel for this set of base rels.
      */
-    joinrel = find_join_rel(root, joinrelids);
+    joinrel = !grouped ? find_join_rel(root, joinrelids) :
+        find_grouped_rel(root, joinrelids, NULL);
 
     if (joinrel)
     {
@@ -702,9 +898,21 @@ build_join_rel(PlannerInfo *root,
      * and inner rels we first try to build it from.  But the contents should
      * be the same regardless.
      */
-    build_joinrel_tlist(root, joinrel, outer_rel);
-    build_joinrel_tlist(root, joinrel, inner_rel);
-    add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
+    if (!grouped)
+    {
+        joinrel->reltarget = create_empty_pathtarget();
+        build_joinrel_tlist(root, joinrel, outer_rel);
+        build_joinrel_tlist(root, joinrel, inner_rel);
+        add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
+    }
+    else
+    {
+        /*
+         * The target for grouped join should already have its cost and width
+         * computed, see create_rel_agg_info().
+         */
+        joinrel->reltarget = agg_info->target;
+    }
 
     /*
      * add_placeholders_to_joinrel also took care of adding the ph_lateral
@@ -736,49 +944,75 @@ build_join_rel(PlannerInfo *root,
     joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
     /* Store the partition information. */
-    build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
-                                 sjinfo->jointype);
+    if (!grouped)
+        build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
+                                     restrictlist, sjinfo->jointype);
 
-    /*
-     * Set estimates of the joinrel's size.
-     */
-    set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
-                               sjinfo, restrictlist);
+    if (!grouped)
+    {
+        /*
+         * Set estimates of the joinrel's size.
+         */
+        set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
+                                   sjinfo, restrictlist);
 
-    /*
-     * Set the consider_parallel flag if this joinrel could potentially be
-     * scanned within a parallel worker.  If this flag is false for either
-     * inner_rel or outer_rel, then it must be false for the joinrel also.
-     * Even if both are true, there might be parallel-restricted expressions
-     * in the targetlist or quals.
-     *
-     * Note that if there are more than two rels in this relation, they could
-     * be divided between inner_rel and outer_rel in any arbitrary way.  We
-     * assume this doesn't matter, because we should hit all the same baserels
-     * and joinclauses while building up to this joinrel no matter which we
-     * take; therefore, we should make the same decision here however we get
-     * here.
-     */
-    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
-        is_parallel_safe(root, (Node *) restrictlist) &&
-        is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
-        joinrel->consider_parallel = true;
+        /*
+         * Set the consider_parallel flag if this joinrel could potentially be
+         * scanned within a parallel worker.  If this flag is false for either
+         * inner_rel or outer_rel, then it must be false for the joinrel also.
+         * Even if both are true, there might be parallel-restricted
+         * expressions in the targetlist or quals.
+         *
+         * Note that if there are more than two rels in this relation, they
+         * could be divided between inner_rel and outer_rel in any arbitrary
+         * way.  We assume this doesn't matter, because we should hit all the
+         * same baserels and joinclauses while building up to this joinrel no
+         * matter which we take; therefore, we should make the same decision
+         * here however we get here.
+         */
+        if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
+            is_parallel_safe(root, (Node *) restrictlist) &&
+            is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
+            joinrel->consider_parallel = true;
+    }
+    else
+    {
+        /*
+         * Grouping essentially changes the number of rows.
+         *
+         * XXX We do not distinguish whether two plain rels are joined and the
+         * result is aggregated, or the aggregation has been already applied
+         * to one of the input rels. Is this worth extra effort, e.g.
+         * maintaining a separate RelOptInfo for each case (one difficulty
+         * that would introduce is construction of AppendPath)?
+         */
+        joinrel->rows = estimate_num_groups(root, agg_info->group_exprs,
+                                            agg_info->input_rows, NULL, NULL);
+    }
 
     /* Add the joinrel to the PlannerInfo. */
-    add_join_rel(root, joinrel);
+    if (!grouped)
+        add_join_rel(root, joinrel);
+    else
+        add_grouped_rel(root, joinrel, agg_info);
 
     /*
-     * Also, if dynamic-programming join search is active, add the new joinrel
-     * to the appropriate sublist.  Note: you might think the Assert on number
-     * of members should be for equality, but some of the level 1 rels might
-     * have been joinrels already, so we can only assert <=.
+     * Also, if dynamic-programming join search is active, add the new
+     * joinrelset to the appropriate sublist.  Note: you might think the
+     * Assert on number of members should be for equality, but some of the
+     * level 1 rels might have been joinrels already, so we can only assert
+     * <=.
+     *
+     * Do noting for grouped relation as it's stored aside from
+     * join_rel_level.
      */
-    if (root->join_rel_level)
+    if (root->join_rel_level && !grouped)
     {
         Assert(root->join_cur_level > 0);
-        Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+        Assert(root->join_cur_level <= bms_num_members(joinrelids));
         root->join_rel_level[root->join_cur_level] =
-            lappend(root->join_rel_level[root->join_cur_level], joinrel);
+            lappend(root->join_rel_level[root->join_cur_level],
+                    joinrel);
     }
 
     return joinrel;
@@ -2066,3 +2300,673 @@ build_child_join_reltarget(PlannerInfo *root,
     childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
     childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * Check if the relation can produce grouped paths and return the information
+ * it'll need for it. The passed relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+    List       *gvis;
+    List       *aggregates = NIL;
+    bool        found_other_rel_agg;
+    ListCell   *lc;
+    RelAggInfo *result;
+    PathTarget *agg_input;
+    PathTarget *target = NULL;
+    List       *grp_exprs_extra = NIL;
+    List       *group_clauses_final;
+    int            i;
+    bool        pk_found, pk_missing;
+
+    /*
+     * The function shouldn't have been called if there's no opportunity for
+     * aggregate push-down.
+     */
+    Assert(root->grouped_var_list != NIL);
+
+    /*
+     * The current implementation of aggregate push-down cannot handle
+     * PlaceHolderVar (PHV).
+     *
+     * If we knew that the PHV should be evaluated in this target (and of
+     * course, if its expression matched some Aggref argument), we'd just let
+     * init_grouping_targets add that Aggref. On the other hand, if we knew
+     * that the PHV is evaluated below the current rel, we could ignore it
+     * because the referencing Aggref would take care of propagation of the
+     * value to upper joins.
+     *
+     * The problem is that the same PHV can be evaluated in the target of the
+     * current rel or in that of lower rel --- depending on the input paths.
+     * For example, consider rel->relids = {A, B, C} and if ph_eval_at = {B,
+     * C}. Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+     * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+     */
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Expr       *expr = lfirst(lc);
+
+        if (IsA(expr, PlaceHolderVar))
+            return NULL;
+    }
+
+    if (IS_SIMPLE_REL(rel))
+    {
+        RangeTblEntry *rte = root->simple_rte_array[rel->relid];;
+
+        /*
+         * rtekind != RTE_RELATION case is not supported yet.
+         */
+        if (rte->rtekind != RTE_RELATION)
+            return NULL;
+    }
+
+    /* Caller should only pass base relations or joins. */
+    Assert(rel->reloptkind == RELOPT_BASEREL ||
+           rel->reloptkind == RELOPT_JOINREL);
+
+    /*
+     * If any outer join can set the attribute value to NULL, the Agg plan
+     * would receive different input at the base rel level.
+     *
+     * XXX For RELOPT_JOINREL, do not return if all the joins that can set any
+     * entry of the grouped target (do we need to postpone this check until
+     * the grouped target is available, and init_grouping_targets take care?)
+     * of this rel to NULL are provably below rel. (It's ok if rel is one of
+     * these joins.)
+     */
+    if (bms_overlap(rel->relids, root->nullable_baserels))
+        return NULL;
+
+    /*
+     * Use equivalence classes to generate additional grouping expressions for
+     * the current rel. Without these we might not be able to apply
+     * aggregation to the relation result set.
+     *
+     * It's important that create_grouping_expr_grouped_var_infos has
+     * processed the explicit grouping columns by now. If the grouping clause
+     * contains multiple expressions belonging to the same EC, the original
+     * (i.e. not derived) one should be preferred when we build grouping
+     * target for a relation. Otherwise we have a problem when trying to match
+     * target entries to grouping clauses during plan creation, see
+     * get_grouping_expression().
+     */
+    gvis = list_copy(root->grouped_var_list);
+    foreach(lc, root->grouped_var_list)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+        int            relid = -1;
+
+        /* Only interested in grouping expressions. */
+        if (IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+        {
+            GroupedVarInfo *gvi_trans;
+
+            gvi_trans = translate_expression_to_rel(root, gvi, relid);
+            if (gvi_trans != NULL)
+                gvis = lappend(gvis, gvi_trans);
+        }
+    }
+
+    /*
+     * Check if some aggregates or grouping expressions can be evaluated in
+     * this relation's target, and collect all vars referenced by these
+     * aggregates / grouping expressions;
+     */
+    found_other_rel_agg = false;
+    foreach(lc, gvis)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+
+        /*
+         * The subset includes gv_eval_at uninitialized, which includes
+         * Aggref.aggstar.
+         */
+        if (bms_is_subset(gvi->gv_eval_at, rel->relids))
+        {
+            /*
+             * init_grouping_targets will handle plain Var grouping
+             * expressions because it needs to look them up in
+             * grouped_var_list anyway.
+             */
+            if (IsA(gvi->gvexpr, Var))
+                continue;
+
+            /*
+             * Currently, GroupedVarInfo only handles Vars and Aggrefs.
+             */
+            Assert(IsA(gvi->gvexpr, Aggref));
+
+            gvi->agg_partial = (Aggref *) copyObject(gvi->gvexpr);
+            mark_partial_aggref(gvi->agg_partial, AGGSPLIT_INITIAL_SERIAL);
+
+            /*
+             * Accept the aggregate.
+             */
+            aggregates = lappend(aggregates, gvi);
+        }
+        else if (IsA(gvi->gvexpr, Aggref))
+        {
+            /*
+             * Remember that there is at least one aggregate expression that
+             * needs something else than this rel.
+             */
+            found_other_rel_agg = true;
+
+            /*
+             * This condition effectively terminates creation of the
+             * RelAggInfo, so there's no reason to check the next
+             * GroupedVarInfo.
+             */
+            break;
+        }
+    }
+
+    /*
+     * Grouping makes little sense w/o aggregate function and w/o grouping
+     * expressions.
+     */
+    if (aggregates == NIL)
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    /*
+     * Give up if some other aggregate(s) need relations other than the
+     * current one.
+     *
+     * If the aggregate needs the current rel plus anything else, then the
+     * problem is that grouping of the current relation could make some input
+     * variables unavailable for the "higher aggregate", and it'd also
+     * decrease the number of input rows the "higher aggregate" receives.
+     *
+     * If the aggregate does not even need the current rel, then neither the
+     * current rel nor anything else should be grouped because we do not
+     * support join of two grouped relations.
+     */
+    if (found_other_rel_agg)
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    /*
+     * Create target for grouped paths as well as one for the input paths of
+     * the aggregation paths.
+     */
+    target = create_empty_pathtarget();
+    agg_input = create_empty_pathtarget();
+
+    /*
+     * Cannot suitable targets for the aggregation push-down be derived?
+     */
+    if (!init_grouping_targets(root, rel, target, agg_input, gvis,
+                               &grp_exprs_extra))
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    list_free(gvis);
+
+    /*
+     * Aggregation push-down makes no sense w/o grouping expressions.
+     */
+    if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+        return NULL;
+
+    group_clauses_final = root->parse->groupClause;
+
+    /*
+     * If the aggregation target should have extra grouping expressions (in
+     * order to emit input vars for join conditions), add them now. This step
+     * includes assignment of tleSortGroupRef's which we can generate now.
+     */
+    if (list_length(grp_exprs_extra) > 0)
+    {
+        Index        sortgroupref;
+
+        /*
+         * We'll have to add some clauses, but query group clause must be
+         * preserved.
+         */
+        group_clauses_final = list_copy(group_clauses_final);
+
+        /*
+         * Always start at root->max_sortgroupref. The extra grouping
+         * expressions aren't used during the final aggregation, so the
+         * sortgroupref values don't need to be unique across the query. Thus
+         * we don't have to increase root->max_sortgroupref, which makes
+         * recognition of the extra grouping expressions pretty easy.
+         */
+        sortgroupref = root->max_sortgroupref;
+
+        /*
+         * Generate the SortGroupClause's and add the expressions to the
+         * target.
+         */
+        foreach(lc, grp_exprs_extra)
+        {
+            Var           *var = lfirst_node(Var, lc);
+            SortGroupClause *cl = makeNode(SortGroupClause);
+
+            /*
+             * Initialize the SortGroupClause.
+             *
+             * As the final aggregation will not use this grouping expression,
+             * we don't care whether sortop is < or >. The value of
+             * nulls_first should not matter for the same reason.
+             */
+            cl->tleSortGroupRef = ++sortgroupref;
+            get_sort_group_operators(var->vartype,
+                                     false, true, false,
+                                     &cl->sortop, &cl->eqop, NULL,
+                                     &cl->hashable);
+            group_clauses_final = lappend(group_clauses_final, cl);
+            add_column_to_pathtarget(target, (Expr *) var,
+                                     cl->tleSortGroupRef);
+
+            /*
+             * The aggregation input target must emit this var too.
+             */
+            add_column_to_pathtarget(agg_input, (Expr *) var,
+                                     cl->tleSortGroupRef);
+        }
+    }
+
+    /*
+     * Add aggregates to the grouping target.
+     */
+    foreach(lc, aggregates)
+    {
+        GroupedVarInfo *gvi;
+
+        gvi = lfirst_node(GroupedVarInfo, lc);
+        add_column_to_pathtarget(target, (Expr *) gvi->agg_partial,
+                                 gvi->sortgroupref);
+    }
+
+    /*
+     * Build a list of grouping expressions and a list of the corresponding
+     * SortGroupClauses.
+     */
+    i = 0;
+    result = makeNode(RelAggInfo);
+    pk_missing = false;
+    foreach(lc, target->exprs)
+    {
+        Index        sortgroupref = 0;
+        SortGroupClause *cl;
+        Expr       *texpr;
+        ListCell    *lc2;
+
+        texpr = (Expr *) lfirst(lc);
+
+        if (IsA(texpr, Aggref))
+        {
+            /*
+             * Once we see Aggref, no grouping expressions should follow.
+             */
+            break;
+        }
+
+        /*
+         * Find the clause by sortgroupref.
+         */
+        sortgroupref = target->sortgrouprefs[i++];
+
+        /*
+         * Besides being an aggregate, the target expression should have no
+         * other reason to be there than being a column of a relation
+         * functionally dependent on the GROUP BY clause. So it's not actually
+         * a grouping column.
+         */
+        if (sortgroupref == 0)
+            continue;
+
+        /*
+         * group_clause_final contains the "local" clauses, so this search
+         * should succeed.
+         */
+        cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+        result->group_clauses = list_append_unique(result->group_clauses,
+                                                   cl);
+
+        /*
+         * Add only unique clauses because of joins (both sides of a join can
+         * point at the same grouping clause). XXX Is it worth adding a bool
+         * argument indicating that we're dealing with join right now?
+         */
+        result->group_exprs = list_append_unique(result->group_exprs,
+                                                 texpr);
+
+        /*
+         * Try to find PathKey for the expression, but don't if we already saw
+         * an expression w/o the PathKey.
+         */
+        if (pk_missing)
+            continue;
+
+        pk_found = false;
+        foreach(lc2, root->group_pathkeys)
+        {
+            PathKey        *pkey = lfirst_node(PathKey, lc2);
+            EquivalenceClass *ec = pkey->pk_eclass;
+            ListCell    *lc3;
+
+            foreach(lc3, ec->ec_members)
+            {
+                EquivalenceMember    *em = lfirst_node(EquivalenceMember, lc3);
+
+                if (equal(texpr, em->em_expr))
+                {
+                    result->group_pathkeys = lappend(result->group_pathkeys,
+                                                     pkey);
+                    pk_found = true;
+                    break;
+                }
+            }
+            if (pk_found)
+                break;
+        }
+
+        /*
+         * If no PathKey was found, the expression was probably generated out
+         * of grp_exprs_extra. If we don't have a single PathKey,
+         * group_pathkeys is not useful, so clear it.
+         */
+        if (!pk_found)
+        {
+            list_free(result->group_pathkeys);
+            result->group_pathkeys = NIL;
+            /*
+             * Do not spend cycles looking for the PathKey for other
+             * expressions.
+             */
+            pk_missing = true;
+        }
+    }
+
+    /*
+     * Since neither target nor agg_input is supposed to be identical to the
+     * source reltarget, compute the width and cost again.
+     *
+     * target does not yet contain aggregates, but these will be accounted by
+     * AggPath.
+     */
+    set_pathtarget_cost_width(root, target);
+    set_pathtarget_cost_width(root, agg_input);
+
+    result->relids = bms_copy(rel->relids);
+    result->target = target;
+    result->agg_input = agg_input;
+
+    /* Finally collect the aggregates. */
+    while (lc != NULL)
+    {
+        Aggref       *aggref = lfirst_node(Aggref, lc);
+
+        /*
+         * Partial aggregation is what the grouped paths should do.
+         */
+        result->agg_exprs = lappend(result->agg_exprs, aggref);
+        lc = lnext(target->exprs, lc);
+    }
+
+    /* The "input_rows" field should be set by caller. */
+    return result;
+}
+
+/*
+ * Initialize target for grouped paths (target) as well as a target for paths
+ * that generate input for aggregation (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * gvis a list of GroupedVarInfo's possibly useful for rel.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+                      PathTarget *target, PathTarget *agg_input,
+                      List *gvis, List **group_exprs_extra_p)
+{
+    ListCell   *lc;
+    List       *possibly_dependent = NIL;
+    Var           *tvar;
+
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Index        sortgroupref;
+
+        /*
+         * Given that PlaceHolderVar currently prevents us from doing
+         * aggregation push-down, the source target cannot contain anything
+         * more complex than a Var.
+         */
+        tvar = lfirst_node(Var, lc);
+
+        sortgroupref = get_expression_sortgroupref((Expr *) tvar, gvis);
+        if (sortgroupref > 0)
+        {
+            /*
+             * If the target expression can be used as the grouping key, we
+             * don't have to worry whether it can be emitted by the AggPath
+             * pushed down to relation / join.
+             */
+            add_column_to_pathtarget(target, (Expr *) tvar, sortgroupref);
+
+            /*
+             * As for agg_input, add the original expression but set
+             * sortgroupref in addition.
+             */
+            add_column_to_pathtarget(agg_input, (Expr *) tvar, sortgroupref);
+        }
+        else
+        {
+            if (is_var_needed_by_join(root, tvar, rel))
+            {
+                /*
+                 * The variable is needed for a join, however it's neither in
+                 * the GROUP BY clause nor can it be derived from it using EC.
+                 * (Otherwise it would have to be added to the targets above.)
+                 * We need to construct special SortGroupClause for that
+                 * variable.
+                 *
+                 * Note that its tleSortGroupRef needs to be unique within
+                 * agg_input, so we need to postpone creation of the
+                 * SortGroupClause's until we're done with the iteration of
+                 * rel->reltarget->exprs. Also it makes sense for the caller
+                 * to do some more check before it starts to create those
+                 * SortGroupClause's.
+                 */
+                *group_exprs_extra_p = lappend(*group_exprs_extra_p, tvar);
+            }
+            else if (is_var_in_aggref_only(root, tvar))
+            {
+                /*
+                 * Another reason we might need this variable is that some
+                 * aggregate pushed down to this relation references it. In
+                 * such a case, add that var to agg_input, but not to
+                 * "target". However, if the aggregate is not the only reason
+                 * for the var to be in the target, some more checks need to
+                 * be performed below.
+                 */
+                add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+            }
+            else
+            {
+                /*
+                 * The Var can be functionally dependent on another expression
+                 * of the target, but we cannot check until the other
+                 * expressions are in the target.
+                 */
+                possibly_dependent = lappend(possibly_dependent, tvar);
+            }
+        }
+    }
+
+    /*
+     * Now we can check whether the expression is functionally dependent on
+     * another one.
+     */
+    foreach(lc, possibly_dependent)
+    {
+        List       *deps = NIL;
+        RangeTblEntry *rte;
+
+        tvar = lfirst_node(Var, lc);
+        rte = root->simple_rte_array[tvar->varno];
+
+        /*
+         * Check if the Var can be in the grouping key even though it's not
+         * mentioned by the GROUP BY clause (and could not be derived using
+         * ECs).
+         */
+        if (check_functional_grouping(rte->relid, tvar->varno,
+                                      tvar->varlevelsup,
+                                      target->exprs, &deps))
+        {
+            /*
+             * The var shouldn't be actually used for grouping key evaluation
+             * (instead, the one this depends on will be), so sortgroupref
+             * should not be important.
+             */
+            add_new_column_to_pathtarget(target, (Expr *) tvar);
+            add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+        }
+        else
+        {
+            /*
+             * As long as the query is semantically correct, arriving here
+             * means that the var is referenced by a generic grouping
+             * expression but not referenced by any join.
+             *
+             * If the aggregate push-down will support generic grouping
+             * expression sin the future, create_rel_agg_info() will have to
+             * add this variable to "agg_input" target and also add the whole
+             * generic expression to "target".
+             */
+            return false;
+        }
+    }
+
+    return true;
+}
+
+/*
+ * Check whether given variable appears in Aggref(s) which we consider usable
+ * at relation / join level, and only in the Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+    ListCell   *lc;
+    bool        found = false;
+
+    foreach(lc, root->grouped_var_list)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+        ListCell   *lc2;
+        List       *vars;
+
+        if (!IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        if (!bms_is_member(var->varno, gvi->gv_eval_at))
+            continue;
+
+        /*
+         * XXX Consider some sort of caching.
+         */
+        vars = pull_var_clause((Node *) gvi->gvexpr, PVC_RECURSE_AGGREGATES);
+        foreach(lc2, vars)
+        {
+            Var           *v = lfirst_node(Var, lc2);
+
+            if (equal(v, var))
+            {
+                found = true;
+                break;
+            }
+
+        }
+        list_free(vars);
+
+        if (found)
+            break;
+    }
+
+    /* No aggregate references the Var? */
+    if (!found)
+        return false;
+
+    /* Does the Var appear in the target outside aggregates? */
+    found = false;
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+        if (IsA(te->expr, Aggref))
+            continue;
+
+        if (equal(te->expr, var))
+            return false;
+
+    }
+
+    /* The Var is in aggregate(s) and only there. */
+    return true;
+}
+
+/*
+ * Check if given variable is needed by joins above the current rel?
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation "b" for the
+ * following query:
+ *
+ *    SELECT a.i, avg(b.y)
+ *    FROM a JOIN b ON b.j = a.i
+ *    GROUP BY a.i;
+ *
+ * If we aggregate the "b" relation alone, the column "b.j" needs to be used
+ * as the grouping key because otherwise it cannot find its way to the input
+ * of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+    Relids        relids_no_top;
+    int            ndx;
+    RelOptInfo *baserel;
+
+    /*
+     * The relids we're not interested in do include 0, which is the top-level
+     * targetlist. The only reason for relids to contain 0 should be that
+     * arg_var is referenced either by aggregate or by grouping expression,
+     * but right now we're interested in the *other* reasons. (As soon
+     * aggregation is pushed down, the aggregates in the query targetlist no
+     * longer need direct reference to arg_var anyway.)
+     */
+
+    relids_no_top = bms_copy(rel->relids);
+    bms_add_member(relids_no_top, 0);
+
+    baserel = find_base_rel(root, var->varno);
+    ndx = var->varattno - baserel->min_attr;
+    if (bms_nonempty_difference(baserel->attr_needed[ndx],
+                                relids_no_top))
+        return true;
+
+    return false;
+}
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 784a1af82d..943ffb0a67 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -820,6 +820,37 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
     }
 }
 
+/*
+ * Return sortgroupref if expr can be used as the grouping expression in an
+ * AggPath at relation or join level, or 0 if it can't.
+ *
+ * gvis a list of a list of GroupedVarInfo's available for the query,
+ * including those derived using equivalence classes.
+ */
+Index
+get_expression_sortgroupref(Expr *expr, List *gvis)
+{
+    ListCell   *lc;
+
+    foreach(lc, gvis)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+
+        if (IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        if (equal(gvi->gvexpr, expr))
+        {
+            Assert(gvi->sortgroupref > 0);
+
+            return gvi->sortgroupref;
+        }
+    }
+
+    /* The expression cannot be used as grouping key. */
+    return 0;
+}
+
 /*
  * split_pathtarget_at_srfs
  *        Split given PathTarget into multiple levels to position SRFs safely
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..8b9ec81418 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -939,6 +939,16 @@ struct config_bool ConfigureNamesBool[] =
         false,
         NULL, NULL, NULL
     },
+    {
+        {"enable_agg_pushdown", PGC_USERSET, QUERY_TUNING_METHOD,
+            gettext_noop("Enables aggregate push-down."),
+            NULL,
+            GUC_EXPLAIN
+        },
+        &enable_agg_pushdown,
+        false,
+        NULL, NULL, NULL
+    },
     {
         {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
             gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..6e87ada684 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -388,6 +388,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_tidscan = on
+#enable_agg_pushdown = on
 
 # - Planner Cost Constants -
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 869854e235..6d46e8d140 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -370,6 +370,9 @@ struct PlannerInfo
     /* list of PlaceHolderInfos */
     List       *placeholder_list;
 
+    /* List of GroupedVarInfos. */
+    List       *grouped_var_list;
+
     /* array of PlaceHolderInfos indexed by phid */
     struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
     /* allocated size of array */
@@ -410,6 +413,12 @@ struct PlannerInfo
      */
     RelInfoList       upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
+    /*
+     * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+     * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+     */
+    struct RelInfoList *agg_info_list;
+
     /* Result tlists chosen by grouping_planner for upper-stage processing */
     struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
 
@@ -424,6 +433,12 @@ struct PlannerInfo
      */
     List       *processed_tlist;
 
+    /*
+     * The maximum ressortgroupref among target entries in processed_list.
+     * Useful when adding extra grouping expressions for partial aggregation.
+     */
+    int            max_sortgroupref;
+
     /*
      * For UPDATE, this list contains the target table's attribute numbers to
      * which the first N entries of processed_tlist are to be assigned.  (Any
@@ -1023,6 +1038,64 @@ typedef struct RelOptInfo
     ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
      (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
 
+/*
+ * RelAggInfo
+ *        Information needed to create grouped paths for base rels and joins.
+ *
+ * "relids" is the set of base-relation identifiers, just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to joinin grouped path to a non-grouped one.  This target
+ * can contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ *
+ * "rel_grouped" is the relation containing the partially aggregated paths.
+ */
+typedef struct RelAggInfo
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    Relids        relids;            /* Base rels contained in this grouped rel. */
+
+    struct PathTarget *target;    /* Target for grouped paths. */
+
+    struct PathTarget *agg_input;    /* pathtarget of paths that generate input
+                                     * for aggregation paths. */
+
+    double        input_rows;
+
+    List       *group_clauses;
+    List       *group_exprs;
+    List       *group_pathkeys;
+
+    List       *agg_exprs;        /* Aggref expressions. */
+
+    RelOptInfo *rel_grouped;    /* Grouped relation. */
+} RelAggInfo;
+
 /*
  * IndexOptInfo
  *        Per-index information for planning/optimization
@@ -2886,6 +2959,29 @@ typedef struct PlaceHolderInfo
     int32        ph_width;
 } PlaceHolderInfo;
 
+/*
+ * GroupedVarInfo exists for each expression that can be used as an aggregate
+ * or grouping expression evaluated below a join.
+ *
+ * TODO Rename, perhaps to GroupedTargetEntry? (Also rename the variables of
+ * this type.)
+ */
+typedef struct GroupedVarInfo
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    Expr       *gvexpr;            /* the represented expression. */
+    Aggref       *agg_partial;    /* if gvexpr is aggregate, agg_partial is the
+                                 * corresponding partial aggregate */
+    Index        sortgroupref;    /* If gvexpr is a grouping expression, this is
+                                 * the tleSortGroupRef of the corresponding
+                                 * SortGroupClause. */
+    Relids        gv_eval_at;        /* lowest level we can evaluate the expression
+                                 * at or NULL if it can happen anywhere. */
+} GroupedVarInfo;
+
 /*
  * This struct describes one potentially index-optimizable MIN/MAX aggregate
  * function.  MinMaxAggPath contains a list of these, and if we accept that
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ff242d1b6d..a4d2249a11 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -54,5 +54,6 @@ extern Query *inline_set_returning_function(PlannerInfo *root,
                                             RangeTblEntry *rte);
 
 extern Bitmapset *pull_paramids(Expr *expr);
-
+extern GroupedVarInfo *translate_expression_to_rel(PlannerInfo *root,
+                                                   GroupedVarInfo *gvi, Index relid);
 #endif                            /* CLAUSES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 050f00e79a..4aea6bd94f 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -230,6 +230,14 @@ extern AggPath *create_agg_path(PlannerInfo *root,
                                 List *qual,
                                 const AggClauseCosts *aggcosts,
                                 double numGroups);
+extern AggPath *create_agg_sorted_path(PlannerInfo *root,
+                                       RelOptInfo *rel,
+                                       Path *subpath,
+                                       RelAggInfo *agg_info);
+extern AggPath *create_agg_hashed_path(PlannerInfo *root,
+                                       RelOptInfo *rel,
+                                       Path *subpath,
+                                       RelAggInfo *agg_info);
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
                                                   RelOptInfo *rel,
                                                   Path *subpath,
@@ -303,14 +311,21 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
 extern void expand_planner_arrays(PlannerInfo *root, int add_size);
 extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
                                     RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+                                            RelAggInfo **agg_info_p);
 extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
 extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+                            RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+                                    RelAggInfo **agg_info_p);
 extern RelOptInfo *build_join_rel(PlannerInfo *root,
                                   Relids joinrelids,
                                   RelOptInfo *outer_rel,
                                   RelOptInfo *inner_rel,
                                   SpecialJoinInfo *sjinfo,
-                                  List **restrictlist_ptr);
+                                  List **restrictlist_ptr,
+                                  RelAggInfo *agg_info);
 extern Relids min_join_parameterization(PlannerInfo *root,
                                         Relids joinrelids,
                                         RelOptInfo *outer_rel,
@@ -336,5 +351,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
                                         RelOptInfo *outer_rel, RelOptInfo *inner_rel,
                                         RelOptInfo *parent_joinrel, List *restrictlist,
                                         SpecialJoinInfo *sjinfo, JoinType jointype);
-
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
 #endif                            /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 41f765d342..0a8e09a2e2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
  * allpaths.c
  */
 extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_agg_pushdown;
 extern PGDLLIMPORT int geqo_threshold;
 extern PGDLLIMPORT int min_parallel_table_scan_size;
 extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -56,6 +57,11 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                   bool override_rows);
 extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                          bool override_rows);
+extern void generate_grouping_paths(PlannerInfo *root,
+                                    RelOptInfo *rel_grouped,
+                                    RelOptInfo *rel_plain,
+                                    RelAggInfo *agg_info);
+
 extern int    compute_parallel_worker(RelOptInfo *rel, double heap_pages,
                                     double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9dffdcfd1e..5a253e2283 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
                                    Relids where_needed);
+extern void setup_aggregate_pushdown(PlannerInfo *root);
 extern void find_lateral_references(PlannerInfo *root);
 extern void create_lateral_join_info(PlannerInfo *root);
 extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 5b4f350b33..c8e0f2a0d7 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -46,6 +46,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
  */
 extern void get_agg_clause_costs(PlannerInfo *root, AggSplit aggsplit,
                                  AggClauseCosts *costs);
+extern void get_agg_clause_costs_some(PlannerInfo *root, AggSplit aggsplit,
+                                      List *aggrefs, AggClauseCosts *costs);
 extern void preprocess_aggrefs(PlannerInfo *root, Node *clause);
 
 /*
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 04668ba1c0..6e71ed47ab 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -49,8 +49,10 @@ extern void split_pathtarget_at_srfs(PlannerInfo *root,
                                      PathTarget *target, PathTarget *input_target,
                                      List **targets, List **targets_contain_srfs);
 
+/* TODO Find the best location for this one. */
+extern Index get_expression_sortgroupref(Expr *expr, List *gvis);
+
 /* Convenience macro to get a PathTarget with valid cost/width fields */
 #define create_pathtarget(root, tlist) \
     set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
-
 #endif                            /* TLIST_H */
diff --git a/src/test/regress/expected/agg_pushdown.out b/src/test/regress/expected/agg_pushdown.out
new file mode 100644
index 0000000000..03a5ccf571
--- /dev/null
+++ b/src/test/regress/expected/agg_pushdown.out
@@ -0,0 +1,216 @@
+CREATE TABLE agg_pushdown_parent (
+    i int primary key,
+    x int);
+CREATE TABLE agg_pushdown_child1 (
+    j int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (j, parent));
+CREATE INDEX ON agg_pushdown_child1(parent);
+CREATE TABLE agg_pushdown_child2 (
+    k int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (k, parent));;
+INSERT INTO agg_pushdown_parent(i, x)
+SELECT n, n
+FROM generate_series(0, 7) AS s(n);
+INSERT INTO agg_pushdown_child1(j, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+INSERT INTO agg_pushdown_child2(k, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+ANALYZE;
+SET enable_agg_pushdown TO on;
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+--
+-- In addition, check that functionally dependent column "c.x" can be
+-- referenced by SELECT although GROUP BY references "p.i".
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(10 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.i = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Seq Scan on agg_pushdown_child1 c1
+(11 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Merge Join
+         Merge Cond: (p.i = c1.parent)
+         ->  Sort
+               Sort Key: p.i
+               ->  Seq Scan on agg_pushdown_parent p
+         ->  Sort
+               Sort Key: c1.parent
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+(12 rows)
+
+-- Restore the default values.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+-- Scan index on agg_pushdown_child1(parent) column and aggregate the result
+-- using AGG_SORTED strategy.
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                         QUERY PLAN                                          
+---------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Nested Loop
+         ->  Partial GroupAggregate
+               Group Key: c1.parent
+               ->  Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+         ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+               Index Cond: (i = c1.parent)
+(8 rows)
+
+SET enable_seqscan TO on;
+-- Join "c1" to "p.x" column, i.e. one that is not in the GROUP BY clause. The
+-- planner should still use "c1.parent" as grouping expression for partial
+-- aggregation, although it's not in the same equivalence class as the GROUP
+-- BY expression ("p.i"). The reason to use "c1.parent" for partial
+-- aggregation is that this is the only way for "c1" to provide the join
+-- expression with input data.
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.x GROUP BY p.i;
+                            QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.x = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Seq Scan on agg_pushdown_child1 c1
+(11 rows)
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Nested Loop
+                           ->  Seq Scan on agg_pushdown_child1 c1
+                           ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+                                 Index Cond: ((k = c1.j) AND (parent = c1.parent))
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(13 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.i = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Hash Join
+                                 Hash Cond: ((c1.parent = c2.parent) AND (c1.j = c2.k))
+                                 ->  Seq Scan on agg_pushdown_child1 c1
+                                 ->  Hash
+                                       ->  Seq Scan on agg_pushdown_child2 c2
+(15 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Merge Join
+         Merge Cond: (c1.parent = p.i)
+         ->  Sort
+               Sort Key: c1.parent
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Merge Join
+                           Merge Cond: ((c1.j = c2.k) AND (c1.parent = c2.parent))
+                           ->  Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                           ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+         ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(13 rows)
+
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 579b861d84..442f7f9b41 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -111,6 +111,7 @@ select count(*) = 0 as ok from pg_stat_wal_receiver;
 select name, setting from pg_settings where name like 'enable%';
               name              | setting 
 --------------------------------+---------
+ enable_agg_pushdown            | off
  enable_async_append            | on
  enable_bitmapscan              | on
  enable_gathermerge             | on
@@ -131,7 +132,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(20 rows)
+(21 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 9a139f1e24..1fbf5321da 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -99,6 +99,8 @@ test: select_parallel
 test: write_parallel
 test: vacuum_parallel
 
+test: agg_pushdown
+
 # no relation related tests can be put in this group
 test: publication subscription
 
diff --git a/src/test/regress/sql/agg_pushdown.sql b/src/test/regress/sql/agg_pushdown.sql
new file mode 100644
index 0000000000..0a4614592b
--- /dev/null
+++ b/src/test/regress/sql/agg_pushdown.sql
@@ -0,0 +1,115 @@
+CREATE TABLE agg_pushdown_parent (
+    i int primary key,
+    x int);
+
+CREATE TABLE agg_pushdown_child1 (
+    j int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (j, parent));
+
+CREATE INDEX ON agg_pushdown_child1(parent);
+
+CREATE TABLE agg_pushdown_child2 (
+    k int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (k, parent));;
+
+INSERT INTO agg_pushdown_parent(i, x)
+SELECT n, n
+FROM generate_series(0, 7) AS s(n);
+
+INSERT INTO agg_pushdown_child1(j, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+
+INSERT INTO agg_pushdown_child2(k, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+
+ANALYZE;
+
+SET enable_agg_pushdown TO on;
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+--
+-- In addition, check that functionally dependent column "c.x" can be
+-- referenced by SELECT although GROUP BY references "p.i".
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Restore the default values.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+
+-- Scan index on agg_pushdown_child1(parent) column and aggregate the result
+-- using AGG_SORTED strategy.
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+SET enable_seqscan TO on;
+
+-- Join "c1" to "p.x" column, i.e. one that is not in the GROUP BY clause. The
+-- planner should still use "c1.parent" as grouping expression for partial
+-- aggregation, although it's not in the same equivalence class as the GROUP
+-- BY expression ("p.i"). The reason to use "c1.parent" for partial
+-- aggregation is that this is the only way for "c1" to provide the join
+-- expression with input data.
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.x GROUP BY p.i;
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
-- 
2.31.1

From a6bd3d1c101b0248d86a2f705ad6109f117539ca Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Thu, 17 Nov 2022 10:41:12 +0100
Subject: [PATCH 3/3] Use also partial paths as the input for grouped paths.

---
 src/backend/optimizer/path/allpaths.c      |  44 +++++-
 src/backend/optimizer/util/relnode.c       |  46 +++---
 src/test/regress/expected/agg_pushdown.out | 156 +++++++++++++++++++++
 src/test/regress/sql/agg_pushdown.sql      |  65 +++++++++
 4 files changed, 281 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6638311ead..50f0cf8365 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -130,7 +130,7 @@ static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                    RangeTblEntry *rte);
 static void add_grouped_path(PlannerInfo *root, RelOptInfo *rel,
                              Path *subpath, AggStrategy aggstrategy,
-                             RelAggInfo *agg_info);
+                             RelAggInfo *agg_info, bool partial);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                       pushdown_safety_info *safetyInfo);
@@ -3343,6 +3343,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
                         RelOptInfo *rel_plain, RelAggInfo *agg_info)
 {
     ListCell   *lc;
+    Path       *path;
 
     if (IS_DUMMY_REL(rel_plain))
     {
@@ -3352,7 +3353,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
 
     foreach(lc, rel_plain->pathlist)
     {
-        Path       *path = (Path *) lfirst(lc);
+        path = (Path *) lfirst(lc);
 
         /*
          * Since the path originates from the non-grouped relation which is
@@ -3366,7 +3367,8 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
          * add_grouped_path() will check whether the path has suitable
          * pathkeys.
          */
-        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info);
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info,
+                         false);
 
         /*
          * Repeated creation of hash table (for new parameter values) should
@@ -3374,12 +3376,38 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
          * efficiency.
          */
         if (path->param_info == NULL)
-            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info);
+            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info,
+                             false);
     }
 
     /* Could not generate any grouped paths? */
     if (rel_grouped->pathlist == NIL)
+    {
         mark_dummy_rel(rel_grouped);
+        return;
+    }
+
+    /*
+     * Almost the same for partial paths.
+     *
+     * The difference is that parameterized paths are never created, see
+     * add_partial_path() for explanation.
+     */
+    foreach(lc, rel_plain->partial_pathlist)
+    {
+        path = (Path *) lfirst(lc);
+
+        if (path->param_info != NULL)
+            continue;
+
+        path = (Path *) create_projection_path(root, rel_grouped, path,
+                                               agg_info->agg_input);
+
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info,
+                         true);
+        add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info,
+                         true);
+    }
 }
 
 /*
@@ -3387,7 +3415,8 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
  */
 static void
 add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
-                 AggStrategy aggstrategy, RelAggInfo *agg_info)
+                 AggStrategy aggstrategy, RelAggInfo *agg_info,
+                 bool partial)
 {
     Path       *agg_path;
 
@@ -3410,7 +3439,10 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
         return;
 
     /* Add the grouped path to the list of grouped base paths. */
-    add_path(rel, (Path *) agg_path);
+    if (!partial)
+        add_path(rel, (Path *) agg_path);
+    else
+        add_partial_path(rel, (Path *) agg_path);
 }
 
 /*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 556f25bece..17aa8804ff 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -948,33 +948,12 @@ build_join_rel(PlannerInfo *root,
         build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
                                      restrictlist, sjinfo->jointype);
 
+    /*
+     * Set estimates of the joinrel's size.
+     */
     if (!grouped)
-    {
-        /*
-         * Set estimates of the joinrel's size.
-         */
         set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
                                    sjinfo, restrictlist);
-
-        /*
-         * Set the consider_parallel flag if this joinrel could potentially be
-         * scanned within a parallel worker.  If this flag is false for either
-         * inner_rel or outer_rel, then it must be false for the joinrel also.
-         * Even if both are true, there might be parallel-restricted
-         * expressions in the targetlist or quals.
-         *
-         * Note that if there are more than two rels in this relation, they
-         * could be divided between inner_rel and outer_rel in any arbitrary
-         * way.  We assume this doesn't matter, because we should hit all the
-         * same baserels and joinclauses while building up to this joinrel no
-         * matter which we take; therefore, we should make the same decision
-         * here however we get here.
-         */
-        if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
-            is_parallel_safe(root, (Node *) restrictlist) &&
-            is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
-            joinrel->consider_parallel = true;
-    }
     else
     {
         /*
@@ -990,6 +969,25 @@ build_join_rel(PlannerInfo *root,
                                             agg_info->input_rows, NULL, NULL);
     }
 
+    /*
+     * Set the consider_parallel flag if this joinrel could potentially be
+     * scanned within a parallel worker.  If this flag is false for either
+     * inner_rel or outer_rel, then it must be false for the joinrel also.
+     * Even if both are true, there might be parallel-restricted expressions
+     * in the targetlist or quals.
+     *
+     * Note that if there are more than two rels in this relation, they could
+     * be divided between inner_rel and outer_rel in any arbitrary way.  We
+     * assume this doesn't matter, because we should hit all the same baserels
+     * and joinclauses while building up to this joinrel no matter which we
+     * take; therefore, we should make the same decision here however we get
+     * here.
+     */
+    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
+        is_parallel_safe(root, (Node *) restrictlist) &&
+        is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
+        joinrel->consider_parallel = true;
+
     /* Add the joinrel to the PlannerInfo. */
     if (!grouped)
         add_join_rel(root, joinrel);
diff --git a/src/test/regress/expected/agg_pushdown.out b/src/test/regress/expected/agg_pushdown.out
index 03a5ccf571..66d36d122e 100644
--- a/src/test/regress/expected/agg_pushdown.out
+++ b/src/test/regress/expected/agg_pushdown.out
@@ -214,3 +214,159 @@ c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
          ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
 (13 rows)
 
+-- Most of the tests above with parallel query processing enforced.
+SET min_parallel_index_scan_size = 0;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+-- Partially aggregate a single relation.
+--
+-- Nestloop join.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Nested Loop
+               ->  Partial GroupAggregate
+                     Group Key: c1.parent
+                     ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(10 rows)
+
+-- Hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Gather
+               Workers Planned: 1
+               ->  Parallel Hash Join
+                     Hash Cond: (c1.parent = p.i)
+                     ->  Partial GroupAggregate
+                           Group Key: c1.parent
+                           ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+                     ->  Parallel Hash
+                           ->  Parallel Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(13 rows)
+
+-- Merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Merge Join
+               Merge Cond: (c1.parent = p.i)
+               ->  Partial GroupAggregate
+                     Group Key: c1.parent
+                     ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(10 rows)
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Sort
+               Sort Key: p.i
+               ->  Nested Loop
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Nested Loop
+                                 ->  Parallel Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                                 ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+                                       Index Cond: ((k = c1.j) AND (parent = c1.parent))
+                     ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                           Index Cond: (i = c1.parent)
+(15 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                       QUERY PLAN

 

+------------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Sort
+               Sort Key: p.i
+               ->  Parallel Hash Join
+                     Hash Cond: (c1.parent = p.i)
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Parallel Hash Join
+                                 Hash Cond: ((c1.parent = c2.parent) AND (c1.j = c2.k))
+                                 ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1
c1
+                                 ->  Parallel Hash
+                                       ->  Parallel Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2
c2
+                     ->  Parallel Hash
+                           ->  Parallel Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(17 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Merge Join
+               Merge Cond: (c1.parent = p.i)
+               ->  Sort
+                     Sort Key: c1.parent
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Merge Join
+                                 Merge Cond: ((c1.j = c2.k) AND (c1.parent = c2.parent))
+                                 ->  Parallel Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                                 ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(15 rows)
+
diff --git a/src/test/regress/sql/agg_pushdown.sql b/src/test/regress/sql/agg_pushdown.sql
index 0a4614592b..49ba6dd67c 100644
--- a/src/test/regress/sql/agg_pushdown.sql
+++ b/src/test/regress/sql/agg_pushdown.sql
@@ -113,3 +113,68 @@ EXPLAIN (COSTS off)
 SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
 agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
 c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- Most of the tests above with parallel query processing enforced.
+SET min_parallel_index_scan_size = 0;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+
+-- Partially aggregate a single relation.
+--
+-- Nestloop join.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
-- 
2.31.1

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2fd1a96269..db97bd254d 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1186,14 +1186,34 @@ Thus the join above the partial aggregate node receives fewer input rows, and
 so the number of outer-to-inner pairs of tuples to be checked can be
 significantly lower, which can in turn lead to considerably lower join cost.
 
-Note that there's often no GROUP BY expression to be used for the partial
-aggregation, so we use equivalence classes to derive grouping expression: in
-the example above, the grouping key "b.j" was derived from "a.i".
-
-Also note that in this case the partial aggregate uses the "b.j" as grouping
-column although the column does not appear in the query target list. The point
-is that "b.j" is needed to evaluate the join condition, and there's no other
-way for the partial aggregate to emit its values.
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate: the
+only way for the partial aggregate to provide upper join(s) with input values
+is to have the join input expression(s) in the grouping key: besides grouping
+columns, the partial aggregate can only produce the transient states of the
+aggregate functions, but aggregate functions cannot be referenced by the JOIN
+clauses.
+
+Regarding correctness, join node considers the output of the partial aggregate
+to be equivalent to the output of a plain (non-aggregated) relation scan. That
+is, a group (i.e. a row of the partial aggregate output) matches the other
+side of the join if and only if each row of the non-aggregate relation
+does. In other words, all rows belonging to the same group have the same value
+of the join columns (As mentioned above, a join cannot reference other output
+expressions of the partial aggregate than the grouping expressions.).
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either grouping expression
+or aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partiall aggregation. The point is that those
+NULL values would not appear on the input of the pushed-down aggregate, so it could
+either put the rows into groups in a different way than the aggregate at the
+top of the plan, or it could compute wrong values of the aggregate functions.
 
 Besides base relation, the aggregation can also be pushed down to join:
 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 32b3dedc71..50f0cf8365 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -197,8 +197,7 @@ make_one_rel(PlannerInfo *root, List *joinlist)
      * Now that the sizes are known, we can estimate the sizes of the grouped
      * relations.
      */
-    if (root->grouped_var_list)
-        setup_base_grouped_rels(root);
+    setup_base_grouped_rels(root);
 
     /*
      * We should now have size estimates for every actual table involved in
@@ -341,7 +340,7 @@ set_base_rel_sizes(PlannerInfo *root)
 }
 
 /*
- * setup_based_grouped_rels
+ * setup_base_grouped_rels
  *      For each "plain" relation build a grouped relation if aggregate pushdown
  *    is possible and if this relation is suitable for partial aggregation.
  */
@@ -350,6 +349,11 @@ setup_base_grouped_rels(PlannerInfo *root)
 {
     Index        rti;
 
+    /* If there are no grouping expressions, no aggregate push-down. */
+    if (!root->grouped_var_list)
+        return;
+
+
     for (rti = 1; rti < root->simple_rel_array_size; rti++)
     {
         RelOptInfo *brel = root->simple_rel_array[rti];
@@ -362,6 +366,10 @@ setup_base_grouped_rels(PlannerInfo *root)
 
         Assert(brel->relid == rti); /* sanity check on array */
 
+        /* ignore RTEs that are "other rels" */
+        if (brel->reloptkind != RELOPT_BASEREL)
+            continue;
+
         /*
          * The aggregate push-down feature only makes sense if there are
          * multiple base rels in the query.
@@ -369,16 +377,13 @@ setup_base_grouped_rels(PlannerInfo *root)
         if (!bms_nonempty_difference(root->all_baserels, brel->relids))
             continue;
 
-        /* ignore RTEs that are "other rels" */
-        if (brel->reloptkind != RELOPT_BASEREL)
+        rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
+        /* Couldn't any aggregate be pushed down to this relation? */
+        if (!rel_grouped)
             continue;
 
-        rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
-        if (rel_grouped)
-        {
-            /* Make the relation available for joining. */
-            add_grouped_rel(root, rel_grouped, agg_info);
-        }
+        /* Make the relation available for joining. */
+        add_grouped_rel(root, rel_grouped, agg_info);
     }
 }
 
@@ -554,21 +559,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                 }
                 else
                 {
-                    RelOptInfo *rel_grouped;
-                    RelAggInfo *agg_info;
-
                     /* Plain relation */
                     set_plain_rel_pathlist(root, rel, rte);
-
-                    /* Add paths to the grouped relation if one exists. */
-                    rel_grouped = find_grouped_rel(root, rel->relids,
-                                                   &agg_info);
-                    if (rel_grouped)
-                    {
-                        generate_grouping_paths(root, rel_grouped, rel,
-                                                agg_info);
-                        set_cheapest(rel_grouped);
-                    }
                 }
                 break;
             case RTE_SUBQUERY:
@@ -836,6 +828,8 @@ static void
 set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
     Relids        required_outer;
+    RelOptInfo *rel_grouped;
+    RelAggInfo *agg_info;
 
     /*
      * We don't support pushing join clauses into the quals of a seqscan, but
@@ -856,6 +850,14 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
     /* Consider TID scans */
     create_tidscan_paths(root, rel);
+
+    /* Add paths to the grouped relation if one exists. */
+    rel_grouped = find_grouped_rel(root, rel->relids, &agg_info);
+    if (!rel_grouped)
+        return;
+
+    generate_grouping_paths(root, rel_grouped, rel, agg_info);
+    set_cheapest(rel_grouped);
 }
 
 /*
@@ -3428,14 +3430,19 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
     else
         elog(ERROR, "unexpected strategy %d", aggstrategy);
 
+    /*
+     * Bail out if we failed to create a suitable aggregated path. This can
+     * happen e.g. then the path does not support hashing (for AGG_HASHED),
+     * or when the input path is not sorted.
+     */
+    if (agg_path == NULL)
+        return;
+
     /* Add the grouped path to the list of grouped base paths. */
-    if (agg_path != NULL)
-    {
-        if (!partial)
-            add_path(rel, (Path *) agg_path);
-        else
-            add_partial_path(rel, (Path *) agg_path);
-    }
+    if (!partial)
+        add_path(rel, (Path *) agg_path);
+    else
+        add_partial_path(rel, (Path *) agg_path);
 }
 
 /*
@@ -3579,7 +3586,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 
     for (lev = 2; lev <= levels_needed; lev++)
     {
-        RelOptInfo *rel_grouped;
         ListCell   *lc;
 
         /*
@@ -3601,6 +3607,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
          */
         foreach(lc, root->join_rel_level[lev])
         {
+            RelOptInfo *rel_grouped;
+
             rel = (RelOptInfo *) lfirst(lc);
 
             /* Create paths for partitionwise joins. */
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b34ad90d08..4688f561f0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4999,6 +4999,7 @@ set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
                                0,
                                JOIN_INNER,
                                NULL);
+
     rel->rows = clamp_row_est(nrows);
 
     cost_qual_eval(&rel->baserestrictcost, rel->baserestrictinfo, root);
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 8e913c92d8..8dc39765f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
     Assert(root->grouped_var_list == NIL);
 
     tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
-                                  PVC_INCLUDE_AGGREGATES);
+                                  PVC_INCLUDE_AGGREGATES |
+                                  PVC_RECURSE_WINDOWFUNCS);
 
     /*
      * Although GroupingFunc is related to root->parse->groupingSets, this
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0ada3ba3eb..3292b4b419 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6847,6 +6847,12 @@ create_partial_grouping_paths(PlannerInfo *root,
      * push-down.
      */
     partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+
+    /*
+     * If the relation already exists, it must have been created by aggregate
+     * pushdown. We can't check how exactly it got created, but we can at
+     * least check that aggregate pushdown is enabled.
+     */
     Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
 
     /*
@@ -6871,17 +6877,25 @@ create_partial_grouping_paths(PlannerInfo *root,
     /*
      * If we can't partially aggregate partial paths, and we can't partially
      * aggregate non-partial paths, then don't bother creating the new
-     * RelOptInfo at all, unless the caller specified force_rel_creation.
+     * RelOptInfo at all, unless the caller specified force_rel_creation. However
      */
     if (cheapest_total_path == NULL &&
         cheapest_partial_path == NULL &&
-        !force_rel_creation &&
-        partially_grouped_rel == NULL)
-        return NULL;
+        !force_rel_creation)
+    {
+        /*
+         * If partially_grouped_rel exists, it should contain paths generated
+         * by the aggregate push-down feature, so the caller is interested in
+         * it.
+         */
+        return partially_grouped_rel;
+    }
 
     /*
      * Build a new upper relation to represent the result of partially
-     * aggregating the rows from the input relation.
+     * aggregating the rows from the input relation. The relation may already
+     * exist due to aggregate pushdown, in which case we don't need to create
+     * it.
      */
     if (partially_grouped_rel == NULL)
         partially_grouped_rel = fetch_upper_rel(root,
@@ -6903,6 +6917,11 @@ create_partial_grouping_paths(PlannerInfo *root,
      *
      * If the target was already created for the sake of aggregate push-down,
      * it should be compatible with what we'd create here.
+     *
+     * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
+     * push-down generated no paths), it created an empty target. Should we
+     * change the convention and have it assign NULL to reltarget instead?  Or
+     * should we introduce a function like is_pathtarget_empty()?
      */
     if (partially_grouped_rel->reltarget->exprs == NIL)
         partially_grouped_rel->reltarget =
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7025ebf94b..2627e2f252 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3163,6 +3163,9 @@ create_agg_path(PlannerInfo *root,
 }
 
 /*
+ * create_agg_sorted_path
+ *        Creates a pathnode performing sorted aggregation/grouping
+ *
  * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
  *
  * NULL is returned if sorting of subpath output is not suitable.
@@ -3176,45 +3179,32 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
     AggClauseCosts agg_costs;
     PathTarget *target;
     double        dNumGroups;
-    ListCell   *lc1;
-    List       *key_subset = NIL;
     AggPath    *result = NULL;
 
     aggsplit = AGGSPLIT_INITIAL_SERIAL;
     agg_exprs = agg_info->agg_exprs;
     target = agg_info->target;
 
-    if (subpath->pathkeys == NIL)
-        return NULL;
-
-    if (!grouping_is_sortable(root->parse->groupClause))
+    /* group_pathkeys are necessary to evaluate the sorting. */
+    if (agg_info->group_pathkeys == NIL)
         return NULL;
 
     /*
-     * Find all query pathkeys that our relation does affect.
+     * The input path must be sorted in a specific way, but if it's not sorted
+     * at all, it's not useful for AGG_SORTED.
      */
-    foreach(lc1, root->group_pathkeys)
-    {
-        PathKey    *gkey = castNode(PathKey, lfirst(lc1));
-        ListCell   *lc2;
-
-        foreach(lc2, subpath->pathkeys)
-        {
-            PathKey    *skey = castNode(PathKey, lfirst(lc2));
-
-            if (skey == gkey)
-            {
-                key_subset = lappend(key_subset, gkey);
-                break;
-            }
-        }
-    }
+    if (subpath->pathkeys == NIL)
+        return NULL;
 
-    if (key_subset == NIL)
+    /* Are the grouping clauses suitable for sorted aggregation? */
+    if (!grouping_is_sortable(agg_info->group_clauses))
         return NULL;
 
-    /* Check if AGG_SORTED is useful for the whole query.  */
-    if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
+    /*
+     * Is the input path sorted enough for this grouping? TODO Consider using
+     * incremental sort if the sorting is "almost sufficient".
+     */
+    if (!pathkeys_contained_in(agg_info->group_pathkeys, subpath->pathkeys))
         return NULL;
 
     MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
@@ -3231,7 +3221,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
     result = create_agg_path(root, rel, subpath, target,
                              AGG_SORTED, aggsplit,
                              agg_info->group_clauses,
-                             NIL,
+                             NIL,    /* qual for HAVING clause */
                              &agg_costs,
                              dNumGroups);
 
@@ -3254,7 +3244,6 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
     AggClauseCosts agg_costs;
     PathTarget *target;
     double        dNumGroups;
-    double        hashaggtablesize;
     Query       *parse = root->parse;
     AggPath    *result = NULL;
 
@@ -3279,25 +3268,18 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
         dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
                                          subpath->rows, NULL, NULL);
 
-        hashaggtablesize = estimate_hashagg_tablesize(root, subpath,
-                                                      &agg_costs,
-                                                      dNumGroups);
-
-        if (hashaggtablesize < work_mem * 1024L)
-        {
-            /*
-             * qual is NIL because the HAVING clause cannot be evaluated until
-             * the final value of the aggregate is known.
-             */
-            result = create_agg_path(root, rel, subpath,
-                                     target,
-                                     AGG_HASHED,
-                                     aggsplit,
-                                     agg_info->group_clauses,
-                                     NIL,
-                                     &agg_costs,
-                                     dNumGroups);
-        }
+        /*
+         * qual is NIL because the HAVING clause cannot be evaluated until the
+         * final value of the aggregate is known.
+         */
+        result = create_agg_path(root, rel, subpath,
+                                 target,
+                                 AGG_HASHED,
+                                 aggsplit,
+                                 agg_info->group_clauses,
+                                 NIL, /* qual for HAVING clause */
+                                 &agg_costs,
+                                 dNumGroups);
     }
 
     return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ce2e267e91..fcec58f10a 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2330,15 +2330,16 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
     List       *grp_exprs_extra = NIL;
     List       *group_clauses_final;
     int            i;
+    bool        pk_found, pk_missing;
 
     /*
      * The function shouldn't have been called if there's no opportunity for
-     * aggregation push-down.
+     * aggregate push-down.
      */
     Assert(root->grouped_var_list != NIL);
 
     /*
-     * The current implementation of aggregation push-down cannot handle
+     * The current implementation of aggregate push-down cannot handle
      * PlaceHolderVar (PHV).
      *
      * If we knew that the PHV should be evaluated in this target (and of
@@ -2608,11 +2609,13 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
      */
     i = 0;
     result = makeNode(RelAggInfo);
+    pk_missing = false;
     foreach(lc, target->exprs)
     {
         Index        sortgroupref = 0;
         SortGroupClause *cl;
         Expr       *texpr;
+        ListCell    *lc2;
 
         texpr = (Expr *) lfirst(lc);
 
@@ -2631,9 +2634,9 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
 
         /*
          * Besides being an aggregate, the target expression should have no
-         * other reason then being a column of a relation functionally
-         * dependent on the GROUP BY clause. So it's not actually a grouping
-         * column.
+         * other reason to be there than being a column of a relation
+         * functionally dependent on the GROUP BY clause. So it's not actually
+         * a grouping column.
          */
         if (sortgroupref == 0)
             continue;
@@ -2654,6 +2657,52 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
          */
         result->group_exprs = list_append_unique(result->group_exprs,
                                                  texpr);
+
+        /*
+         * Try to find PathKey for the expression, but don't if we already saw
+         * an expression w/o the PathKey.
+         */
+        if (pk_missing)
+            continue;
+
+        pk_found = false;
+        foreach(lc2, root->group_pathkeys)
+        {
+            PathKey        *pkey = lfirst_node(PathKey, lc2);
+            EquivalenceClass *ec = pkey->pk_eclass;
+            ListCell    *lc3;
+
+            foreach(lc3, ec->ec_members)
+            {
+                EquivalenceMember    *em = lfirst_node(EquivalenceMember, lc3);
+
+                if (equal(texpr, em->em_expr))
+                {
+                    result->group_pathkeys = lappend(result->group_pathkeys,
+                                                     pkey);
+                    pk_found = true;
+                    break;
+                }
+            }
+            if (pk_found)
+                break;
+        }
+
+        /*
+         * If no PathKey was found, the expression was probably generated out
+         * of grp_exprs_extra. If we don't have a single PathKey,
+         * group_pathkeys is not useful, so clear it.
+         */
+        if (!pk_found)
+        {
+            list_free(result->group_pathkeys);
+            result->group_pathkeys = NIL;
+            /*
+             * Do not spend cycles looking for the PathKey for other
+             * expressions.
+             */
+            pk_missing = true;
+        }
     }
 
     /*
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 89f944d83a..310c5ff774 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -389,6 +389,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_tidscan = on
+#enable_agg_pushdown = on
 
 # - Planner Cost Constants -
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07459c423f..38e105b6de 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,8 +1065,9 @@ typedef struct RelOptInfo
  * actually just a workspace for users of the structure, i.e. not initialized
  * when instance of the structure is created.
  *
- * "group_clauses" and "group_exprs" are lists of SortGroupClause and the
- * corresponding grouping expressions respectively.
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
  *
  * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
  * paths.
@@ -1090,6 +1091,7 @@ typedef struct RelAggInfo
 
     List       *group_clauses;
     List       *group_exprs;
+    List       *group_pathkeys;
 
     List       *agg_exprs;        /* Aggref expressions. */
 
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 442f7f9b41..da67f3a901 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -111,7 +111,7 @@ select count(*) = 0 as ok from pg_stat_wal_receiver;
 select name, setting from pg_settings where name like 'enable%';
               name              | setting 
 --------------------------------+---------
- enable_agg_pushdown            | off
+ enable_agg_pushdown            | on
  enable_async_append            | on
  enable_bitmapscan              | on
  enable_gathermerge             | on

Re: WIP: Aggregation push-down - take2

From
vignesh C
Date:
On Thu, 17 Nov 2022 at 16:34, Antonin Houska <ah@cybertec.at> wrote:
>
> Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> > Hi,
> >
> > I did a quick initial review of the v20 patch series. I plan to do a
> > more thorough review over the next couple days, if time permits. In
> > general I think the patch is in pretty good shape.
>
> Thanks.
>
> > I've added a bunch of comments in a number of places - see the "review
> > comments" parts for each of the original parts. That should make it
> > easier to deal with all the items. I'll go through the main stuff here:
>
> Unless I miss something, all these items are covered in context below, except
> for this one:
>
> > 7) when I change enable_agg_pushdown to true and run regression tests, I
> > get a bunch of failures like
> >
> >    ERROR:  WindowFunc found where not expected
> >
> > Seems we don't handle window functions correctly somewhere, or maybe
> > setup_aggregate_pushdown should check/reject hasWindowFuncs too?
>
> We don't need to reject window functions, window functions are processed after
> grouping/aggregation. The problem I noticed in the regression tests was that a
> window function referenced a (non-window) aggregate. We just need to ensure
> that pull_var_clause() recurses into that window function in such cases:
>
> Besides the next version, v21-fixes.patch file is attached. It tries to
> summarize all the changes between v21 and v22. (I wonder if this attachment
> makes the cfbot fail.)
>
>
> diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> index 8e913c92d8..8dc39765f2 100644
> --- a/src/backend/optimizer/plan/initsplan.c
> +++ b/src/backend/optimizer/plan/initsplan.c
> @@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
>         Assert(root->grouped_var_list == NIL);
>
>         tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
> -                                                                 PVC_INCLUDE_AGGREGATES);
> +                                                                 PVC_INCLUDE_AGGREGATES |
> +                                                                 PVC_RECURSE_WINDOWFUNCS);
>
>         /*
>          * Although GroupingFunc is related to root->parse->groupingSets, this
>
>
> > ---
> >  src/backend/optimizer/util/relnode.c | 11 +++++++++++
> >  src/include/nodes/pathnodes.h        |  3 +++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
> > index 94720865f47..d4367ba14a5 100644
> > --- a/src/backend/optimizer/util/relnode.c
> > +++ b/src/backend/optimizer/util/relnode.c
> > @@ -382,6 +382,12 @@ find_base_rel(PlannerInfo *root, int relid)
> >  /*
> >   * build_rel_hash
> >   *     Construct the auxiliary hash table for relation specific data.
> > + *
> > + * XXX Why is this renamed, leaving out the "join" part? Are we going to use
> > + * it for other purposes?
>
> Yes, besides join relation, it's used to find the "grouped relation" by
> Relids. This change tries to follow the suggestion "Maybe an appropriate
> preliminary patch ..." in [1], but I haven't got any feedback whether my
> understanding was correct.
>
> > + * XXX Also, why change the API and not pass PlannerInfo? Seems pretty usual
> > + * for planner functions.
>
> I think that the reason was that, with the patch applied, PlannerInfo contains
> multiple fields of the RelInfoList type, so build_rel_hash() needs an
> information which one it should process. Passing the exact field is simpler
> than passing PlannerInfo plus some additional information.
>
> >   */
> >  static void
> >  build_rel_hash(RelInfoList *list)
> > @@ -422,6 +428,11 @@ build_rel_hash(RelInfoList *list)
> >  /*
> >   * find_rel_info
> >   *     Find a base or join relation entry.
> > + *
> > + * XXX Why change the API and not pass PlannerInfo? Seems pretty usual
> > + * for planner functions.
>
> For the same reason that build_rel_hash() receives the list explicitly, see
> above.
>
> > + * XXX I don't understand why we need both this and find_join_rel.
>
> Perhaps I just wanted to keep the call sites of find_join_rel() untouched. I
> think that
>
>     find_join_rel(root, relids);
>
> is a little bit easier to read than
>
>     (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
>
> >   */
> >  static void *
> >  find_rel_info(RelInfoList *list, Relids relids)
> > diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
> > index 0ca7d5ab51e..018ce755720 100644
> > --- a/src/include/nodes/pathnodes.h
> > +++ b/src/include/nodes/pathnodes.h
> > @@ -88,6 +88,9 @@ typedef enum UpperRelationKind
> >   * present and valid when rel_hash is not NULL.  Note that we still maintain
> >   * the list even when using the hash table for lookups; this simplifies life
> >   * for GEQO.
> > + *
> > + * XXX I wonder why we actually need a separate node, merely wrapping fields
> > + * that already existed ...
>
> This is so that the existing fields can still be printed out
> (nodes/outfuncs.c).
>
> > diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
> > index 2fd1a962699..6f6b7d0b93b 100644
> > --- a/src/backend/optimizer/README
> > +++ b/src/backend/optimizer/README
> > @@ -1168,6 +1168,12 @@ input of Agg node. However, if the groups are large enough, it may be more
> >  efficient to apply the partial aggregation to the output of base relation
> >  scan, and finalize it when we have all relations of the query joined:
> >
> > +XXX review: Hmm, do we need to push it all the way down to base relations? Or
> > +would it make sense to do the agg on an intermediate level? Say, we're joining
> > +three tables A, B and C. Maybe the agg could/should be evaluated on top of join
> > +A+B, before joining with C? Say, maybe the aggregate references columns from
> > +both base relations?
> > +
> >    EXPLAIN
> >    SELECT a.i, avg(b.y)
> >    FROM a JOIN b ON b.j = a.i
>
> Another example below does show the partial aggregates at join level.
>
> > +XXX Perhaps mention this may also mean the partial ggregate could be pushed
> > +to a remote server with FDW partitions?
>
> Even if it's not implemented in the current patch version?
>
> > +
> >  Note that there's often no GROUP BY expression to be used for the partial
> >  aggregation, so we use equivalence classes to derive grouping expression: in
> >  the example above, the grouping key "b.j" was derived from "a.i".
> >
> > +XXX I think this is slightly confusing - there is a GROUP BY expression for the
> > +partial aggregate, but as stated in the query it may not reference the side of
> > +a join explicitly.
>
> ok, changed.
>
> >  Also note that in this case the partial aggregate uses the "b.j" as grouping
> >  column although the column does not appear in the query target list. The point
> >  is that "b.j" is needed to evaluate the join condition, and there's no other
> >  way for the partial aggregate to emit its values.
> >
> > +XXX Not sure I understand what this is trying to say. Firstly, maybe it'd be
> > +helpful to show targetlists in the EXPLAIN, i.e. do it as VERBOSE. But more
> > +importantly, isn't this a direct consequence of the equivalence classes stuff
> > +mentioned in the preceding paragraph?
>
> The equivalence class is just a mechanism to derive expressions which are not
> explicitly mentioned in the query, but there's always a question whether you
> need to derive any expression for particular table or not. Here I tried to
> explain that the choice of join columns is related to the choice of grouping
> keys for the partial aggregate.
>
> I've deleted this paragraph and added a note to the previous one.
>
> >  Besides base relation, the aggregation can also be pushed down to join:
> >
> >    EXPLAIN
> > @@ -1217,6 +1235,10 @@ Besides base relation, the aggregation can also be pushed down to join:
> >         ->  Hash
> >               ->  Seq Scan on a
> >
> > +XXX Aha, so this is pretty-much an answer to my earlier comment, and matches
> > +my example with three tables. Maybe this suggests the initial reference to
> > +base relations is a bit confusing.
>
> I tried to use the simplest example to demonstrate the concepts, then extended
> it to the partially-aggregated joins.
>
> > +XXX I think this is a good explanation of the motivation for this patch, but
> > +maybe it'd be good to go into more details about how we decide if it's correct
> > +to actually do the pushdown, data structures etc. Similar to earlier parts of
> > +this README.
>
> Added two paragraphs, see "Regarding correctness...".
>
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index f00f900ff41..6d2c2f4fc36 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -196,9 +196,10 @@ make_one_rel(PlannerInfo *root, List *joinlist)
> >       /*
> >        * Now that the sizes are known, we can estimate the sizes of the grouped
> >        * relations.
> > +      *
> > +      * XXX Seems more consistent with code nearby.
> >        */
> > -     if (root->grouped_var_list)
> > -             setup_base_grouped_rels(root);
> > +     setup_base_grouped_rels(root);
>
> In general I prefer not calling a function if it's obvious that it's not
> needed, but on the other hand the test of the 'grouped_var_list' field may be
> considered disturbing from the caller's perspective. I've got no strong
> opinion on this, so I can accept this proposal.
>
> >
> >  /*
> > - * setup_based_grouped_rels
> > + * setup_base_grouped_rels
> >   *     For each "plain" relation build a grouped relation if aggregate pushdown
> >   *    is possible and if this relation is suitable for partial aggregation.
> >   */
>
> Fixed, thanks.
>
> >  {
> >       Index           rti;
> >
> > +     /* If there are no grouped relations, estimate their sizes. */
> > +     if (!root->grouped_var_list)
> > +             return;
> > +
>
> Accepted, but with different wording (s/relations/expressions/).
>
> > +             /* XXX Shouldn't this check be earlier? Seems cheaper than the check
> > +              * calling bms_nonempty_difference, for example. */
> >               if (brel->reloptkind != RELOPT_BASEREL)
> >                       continue;
>
> Right, moved.
>
> >               rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
> > -             if (rel_grouped)
> > -             {
> > -                     /* Make the relation available for joining. */
> > -                     add_grouped_rel(root, rel_grouped, agg_info);
> > -             }
> > +
> > +             /* XXX When does this happen? */
> > +             if (!rel_grouped)
> > +                     continue;
> > +
> > +             /* Make the relation available for joining. */
> > +             add_grouped_rel(root, rel_grouped, agg_info);
>
> I'd use the "continue" statement if there was a lot of code in the "if
> (rel_grouped) {...}" branch, but no strong preference in this case, so
> accepted.
>
> >       }
> >  }
> >
> > @@ -560,6 +569,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> >                                       /* Plain relation */
> >                                       set_plain_rel_pathlist(root, rel, rte);
> >
> > +                                     /* XXX Shouldn't this really be part of set_plain_rel_pathlist? */
> > +
> >                                       /* Add paths to the grouped relation if one exists. */
> >                                       rel_grouped = find_grouped_rel(root, rel->relids,
>
> Yes, it can. Moved.
>
> > @@ -3382,6 +3393,11 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
> >
> >  /*
> >   * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
> > + *
> > + * XXX I think this is potentially quite confusing, because the existing "add"
> > + * functions add_path and add_partial_path only check if the proposed path is
> > + * dominated by an existing path, pathkeys, etc. But this does more than that,
> > + * perhaps even constructing new path etc.
> >   */
> >  static void
> >  add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>
> Maybe, but I don't have a good idea of an alternative name.
> create_group_path() already exists and the create_*_path() functions are
> rather low-level. Maybe generate_grouped_path(), and at the same time rename
> generate_grouping_paths() to generate_grouped_paths()? In general, the
> generate_*_path*() functions do non-trivial things and eventually call
> add_path().
>
> > @@ -3399,9 +3414,16 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >       else
> >               elog(ERROR, "unexpected strategy %d", aggstrategy);
> >
> > +     /*
> > +      * Bail out if we failed to create a suitable aggregated path. This can
> > +      * happen e.g. then the path does not support hashing (for AGG_HASHED),
> > +      * or when the input path is not sorted.
> > +      */
> > +     if (agg_path == NULL)
> > +             return;
> > +
> >       /* Add the grouped path to the list of grouped base paths. */
> > -     if (agg_path != NULL)
> > -             add_path(rel, (Path *) agg_path);
> > +     add_path(rel, (Path *) agg_path);
>
> ok, changed.
>
> >  }
> >
> >  /*
> > @@ -3545,7 +3567,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
> >
> >       for (lev = 2; lev <= levels_needed; lev++)
> >       {
> > -             RelOptInfo *rel_grouped;
> >               ListCell   *lc;
> >
> >               /*
> > @@ -3567,6 +3588,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
> >                */
> >               foreach(lc, root->join_rel_level[lev])
> >               {
> > +                     RelOptInfo *rel_grouped;
> > +
> >                       rel = (RelOptInfo *) lfirst(lc);
>
> Sure, fixed.
>
> > diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> > index 8e913c92d8b..d7a9de9645e 100644
> > --- a/src/backend/optimizer/plan/initsplan.c
> > +++ b/src/backend/optimizer/plan/initsplan.c
> > @@ -278,6 +278,8 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
> >   * each possible grouping expression.
> >   *
> >   * root->group_pathkeys must be setup before this function is called.
> > + *
> > + * XXX Perhaps this should check/reject hasWindowFuncs too?
>
> create_window_paths() is called after create_grouping_paths() (see
> grouping_planner()), so it should not care whether the input (possibly
> grouped) paths involve the aggregate push-down or not.
>
> >   */
> >  extern void
> >  setup_aggregate_pushdown(PlannerInfo *root)
> > @@ -311,6 +313,12 @@ setup_aggregate_pushdown(PlannerInfo *root)
> >       if (root->parse->hasTargetSRFs)
> >               return;
> >
> > +     /*
> > +      * XXX Maybe it'd be better to move create_aggregate_grouped_var_infos and
> > +      * create_grouping_expr_grouped_var_infos to a function returning bool, and
> > +      * only check that here.
> > +      */
> > +
>
> Hm, it looks to me like too much "indirection", and also a decriptive function
> name would be tricky to invent.
>
> >       /* Create GroupedVarInfo per (distinct) aggregate. */
> >       create_aggregate_grouped_var_infos(root);
> >
> > @@ -329,6 +337,8 @@ setup_aggregate_pushdown(PlannerInfo *root)
> >        * Now that we know that grouping can be pushed down, search for the
> >        * maximum sortgroupref. The base relations may need it if extra grouping
> >        * expressions get added to them.
> > +      *
> > +      * XXX Shouldn't we do that only when adding extra grouping expressions?
> >        */
> >       Assert(root->max_sortgroupref == 0);
> >       foreach(lc, root->processed_tlist)
>
> We don't know at this (early) stage whether those "extra grouping expression"
> will be needed for at least one relation. (max_sortgroupref is used by
> create_rel_agg_info())
>
> > diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> > index 0ada3ba3ebe..2f4db69c1f9 100644
> > --- a/src/backend/optimizer/plan/planner.c
> > +++ b/src/backend/optimizer/plan/planner.c
> > @@ -3899,6 +3899,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
> >       /*
> >        * The non-partial paths can come either from the Gather above or from
> >        * aggregate push-down.
> > +      *
> > +      * XXX I can't quite convince myself this is correct. How come it's fine
> > +      * to check pathlist and then call set_cheapest() on partially_grouped_rel?
> > +      * Maybe it's correct and the comment merely needs to explain this.
>
> It's not clear to me what makes you confused. Without my patch, the code looks
> like this:
>
>     if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
>     {
>         gather_grouping_paths(root, partially_grouped_rel);
>         set_cheapest(partially_grouped_rel);
>     }
>
> Here gather_grouping_paths() adds paths to partially_grouped_rel->pathlist. My
> patch calls set_cheapest() independent from gather_grouping_paths() because
> the paths requiring the aggregate finalization can also be generated by the
> aggregate push-down feature.
>
> >        */
> >       if (partially_grouped_rel && partially_grouped_rel->pathlist)
> >               set_cheapest(partially_grouped_rel);
> > @@ -6847,6 +6851,12 @@ create_partial_grouping_paths(PlannerInfo *root,
> >        * push-down.
> >        */
> >       partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
> > +
> > +     /*
> > +      * If the relation already exists, it must have been created by aggregate
> > +      * pushdown. We can't check how exactly it got created, but we can at least
> > +      * check that aggregate pushdown is enabled.
> > +      */
> >       Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
>
> ok, done.
>
> > @@ -6872,6 +6882,8 @@ create_partial_grouping_paths(PlannerInfo *root,
> >        * If we can't partially aggregate partial paths, and we can't partially
> >        * aggregate non-partial paths, then don't bother creating the new
> >        * RelOptInfo at all, unless the caller specified force_rel_creation.
> > +      *
> > +      * XXX Not sure why we're checking the partially_grouped_rel here?
> >        */
> >       if (cheapest_total_path == NULL &&
> >               cheapest_partial_path == NULL &&
>
> I think (but not verified yet) that without this test the function could
> return NULL for reasons unrelated to the aggregate push-down. Nevertheless, I
> realize now that there's no aggregate push-down specific processing in the
> function. I've adjusted it so that it does return, but the returned value is
> partially_grouped_rel rather than NULL.
>
> > @@ -6881,7 +6893,9 @@ create_partial_grouping_paths(PlannerInfo *root,
> >
> >       /*
> >        * Build a new upper relation to represent the result of partially
> > -      * aggregating the rows from the input relation.
> > +      * aggregating the rows from the input relation. The relation may
> > +      * already exist due to aggregate pushdown, in which case we don't
> > +      * need to create it.
> >        */
> >       if (partially_grouped_rel == NULL)
> >               partially_grouped_rel = fetch_upper_rel(root,
>
> ok, done.
>
> > @@ -6903,6 +6917,8 @@ create_partial_grouping_paths(PlannerInfo *root,
> >        *
> >        * If the target was already created for the sake of aggregate push-down,
> >        * it should be compatible with what we'd create here.
> > +      *
> > +      * XXX Why is this checking reltarget->exprs? What does that mean?
> >        */
> >       if (partially_grouped_rel->reltarget->exprs == NIL)
> >               partially_grouped_rel->reltarget =
>
> I've added this comment:
>
>          * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
>          * push-down generated no paths), it created an empty target. Should we
>          * change the convention and have it assign NULL to reltarget instead?  Or
>          * should we introduce a function like is_pathtarget_empty()?
>
> > diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
> > index 7025ebf94be..395bd093d34 100644
> > --- a/src/backend/optimizer/util/pathnode.c
> > +++ b/src/backend/optimizer/util/pathnode.c
> > @@ -3163,9 +3163,21 @@ create_agg_path(PlannerInfo *root,
> >  }
> >
> >  /*
> > + * create_agg_sorted_path
> > + *           Creates a pathnode performing sorted aggregation/grouping
> > + *
> >   * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
> >   *
> >   * NULL is returned if sorting of subpath output is not suitable.
> > + *
> > + * XXX I'm a bit confused why we need this? We now have create_agg_path and also
> > + * create_agg_sorted_path and create_agg_hashed_path.
>
> Do you mean that the function names are confusing? The functions
> create_agg_sorted_path() and create_agg_hashed_path() do some checks /
> preparation for the call of the existing function create_agg_path(), which is
> more low-level. Should the names be something like
> create_partial_agg_sorted_path() and create_partial_agg_hashed_path() ?
>
> > + *
> > + * XXX This assumes the input path to be sorted in a suitable way, but for
> > + * regular aggregation we check that separately and then perhaps add sort
> > + * if needed (possibly incremental one). That is, we don't do such checks
> > + * in create_agg_path. Shouldn't we do the same thing before calling this
> > + * new functions?
> >   */
> >  AggPath *
> >  create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > @@ -3184,6 +3196,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >       agg_exprs = agg_info->agg_exprs;
> >       target = agg_info->target;
>
> Likewise, it seems that you'd like to see different function name and maybe
> different location of this function. Both create_agg_sorted_path() and
> create_agg_hashed_path() are rather wrappers for create_agg_path().
>
> >
> > +     /* Bail out if the input path is not sorted at all. */
> >       if (subpath->pathkeys == NIL)
> >               return NULL;
>
> ok, done.
>
> > @@ -3192,6 +3205,18 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >
> >       /*
> >        * Find all query pathkeys that our relation does affect.
> > +      *
> > +      * XXX Not sure what "that our relation does affect" means? Also, we
> > +      * are not looking at query_pathkeys but group_pathkeys, so that's a
> > +      * bit confusing. Perhaps something like this would be better:
> > +      *
>
> Indeed, the check of pathkeys was weird, I've reworked it.
>
> > @@ -3210,10 +3235,21 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >               }
> >       }
> >
> > +     /* Bail out if the subquery has no pathkeys for the grouping. */
> >       if (key_subset == NIL)
> >               return NULL;
> >
> > -     /* Check if AGG_SORTED is useful for the whole query.  */
> > +     /*
> > +      * Check if AGG_SORTED is useful for the whole query.
> > +      *
> > +      * XXX So this means we require the group pathkeys matched to the
> > +      * subpath have to be a prefix of subpath->pathkeys. Why is that
> > +      * necessary? We'll reduce the cardinality, and in the worst case
> > +      * we'll have to add a separate sort (full or incremental). Or we
> > +      * could finalize using hashed aggregate.
>
> Although with different arguments, pathkeys_contained_in() is still used in
> the new version of the patch. I've added a TODO comment about the incremental
> sort (it did not exist when I was writing the patch), but what do you mean by
> "reducing the cardinality"? Eventually the partial aggregate should reduce the
> cardinality, but for the AGG_SORT strategy to work, the input sorting must be
> such that the executor can recognize the group boundaries.
>
> > +      *
> > +      * XXX Doesn't seem to change any regression tests when disabled.
> > +      */
> >       if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
> >               return NULL;
>
> "disabled" means removal of this part (including the return statement), or
> returning NULL unconditionally? Whatever you mean, please check with the new
> version.
>
> > @@ -3231,7 +3267,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >       result = create_agg_path(root, rel, subpath, target,
> >                                                        AGG_SORTED, aggsplit,
> >                                                        agg_info->group_clauses,
> > -                                                      NIL,
> > +                                                      NIL,   /* qual for HAVING clause */
> >                                                        &agg_costs,
> >                                                        dNumGroups);
>
> ok, done here as well as in create_agg_hashed_path().
>
> > @@ -3283,6 +3319,9 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
> >
&agg_costs,
> >
dNumGroups);
> >
> > +             /*
> > +              * XXX But we can spill to disk in hashagg now, no?
> > +              */
> >               if (hashaggtablesize < work_mem * 1024L)
> >               {
>
> Yes, we can. It wasn't possible while I was writing the patch. Fixed.
>
> > diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
> > index 868d21c351e..6e87ada684b 100644
> > --- a/src/backend/utils/misc/postgresql.conf.sample
> > +++ b/src/backend/utils/misc/postgresql.conf.sample
> > @@ -388,6 +388,7 @@
> >  #enable_seqscan = on
> >  #enable_sort = on
> >  #enable_tidscan = on
> > +#enable_agg_pushdown = on
>
> Done.
>
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index 1055ea70940..05192ca549a 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3352,7 +3352,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
> >                                               RelOptInfo *rel_plain, RelAggInfo *agg_info)
> >  {
> >       ListCell   *lc;
> > -     Path       *path;
> > +     Path       *path;       /* XXX why declare at this level, not in the loops */
> >
>
> I usually do it this way, not sure why. Perhaps because it's less typing :-) I
> changed that in the next version so that we don't waste time arguing about
> unimportant things.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
5212d447fa53518458cbe609092b347803a667c5 ===
=== applying patch ./v21-fixes.patch
patching file src/backend/optimizer/README
Hunk #1 FAILED at 1186.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/optimizer/README.rej
patching file src/backend/optimizer/path/allpaths.c
Hunk #1 FAILED at 197.
Hunk #2 FAILED at 341.
Hunk #3 succeeded at 339 with fuzz 1 (offset -11 lines).
Hunk #4 succeeded at 1014 with fuzz 2 (offset 647 lines).
Hunk #5 FAILED at 378.
Hunk #6 FAILED at 563.
Hunk #7 succeeded at 2793 with fuzz 1 (offset 1948 lines).
Hunk #8 FAILED at 867.
Hunk #9 FAILED at 3439.
Hunk #10 FAILED at 3590.
Hunk #11 succeeded at 3430 (offset -182 lines).
7 out of 11 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/allpaths.c.rej
patching file src/backend/optimizer/path/costsize.c

[1] - http://cfbot.cputube.org/patch_41_3764.log

Regards,
Vignesh



Re: WIP: Aggregation push-down - take2

From
Antonin Houska
Date:
vignesh C <vignesh21@gmail.com> wrote:

> The patch does not apply on top of HEAD as in [1], please post a rebased patch:

> [1] - http://cfbot.cputube.org/patch_41_3764.log

This is the next version (only rebased, no other changes).

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

From 388be92d9e1c577968896aada3901619468791cc Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 4 Jan 2023 14:41:39 +0100
Subject: [PATCH 1/3] Introduce RelInfoList structure.

This patch puts join_rel_list and join_rel_hash fields of PlannerInfo
structure into a new structure RelInfoList. It also adjusts add_join_rel() and
find_join_rel() functions so they only call add_rel_info() and find_rel_info()
respectively.

fetch_upper_rel() now uses the new API and the hash table as well because the
list stored in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG] will contain many
relations as soon as the aggregate push-down feature is added.
---
 contrib/postgres_fdw/postgres_fdw.c    |   3 +-
 src/backend/optimizer/geqo/geqo_eval.c |  12 +-
 src/backend/optimizer/plan/planmain.c  |   3 +-
 src/backend/optimizer/util/relnode.c   | 170 ++++++++++++++-----------
 src/include/nodes/pathnodes.h          |  31 +++--
 5 files changed, 126 insertions(+), 93 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 332b4a5cde..231ee967b0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5960,7 +5960,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
      */
     Assert(fpinfo->relation_index == 0);    /* shouldn't be set yet */
     fpinfo->relation_index =
-        list_length(root->parse->rtable) + list_length(root->join_rel_list);
+        list_length(root->parse->rtable) +
+        list_length(root->join_rel_list->items);
 
     return true;
 }
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 6d5d1a7eb2..1df62b71a9 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -92,11 +92,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
      *
      * join_rel_level[] shouldn't be in use, so just Assert it isn't.
      */
-    savelength = list_length(root->join_rel_list);
-    savehash = root->join_rel_hash;
+    savelength = list_length(root->join_rel_list->items);
+    savehash = root->join_rel_list->hash;
     Assert(root->join_rel_level == NULL);
 
-    root->join_rel_hash = NULL;
+    root->join_rel_list->hash = NULL;
 
     /* construct the best path for the given combination of relations */
     joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
      * Restore join_rel_list to its former state, and put back original
      * hashtable if any.
      */
-    root->join_rel_list = list_truncate(root->join_rel_list,
-                                        savelength);
-    root->join_rel_hash = savehash;
+    root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+                                               savelength);
+    root->join_rel_list->hash = savehash;
 
     /* release all the memory acquired within gimme_tree */
     MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 4c17407e5d..b4bfbe6e32 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,7 @@ query_planner(PlannerInfo *root,
      * NOTE: append_rel_list was set up by subquery_planner, so do not touch
      * here.
      */
-    root->join_rel_list = NIL;
-    root->join_rel_hash = NULL;
+    root->join_rel_list = makeNode(RelInfoList);
     root->join_rel_level = NULL;
     root->join_cur_level = 0;
     root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 75bc20c7c9..f8ccd1a5db 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -33,11 +33,15 @@
 #include "utils/lsyscache.h"
 
 
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
 {
-    Relids        join_relids;    /* hash key --- MUST BE FIRST */
-    RelOptInfo *join_rel;
-} JoinHashEntry;
+    Relids        relids;            /* hash key --- MUST BE FIRST */
+    void       *data;
+} RelInfoEntry;
 
 static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
                                 RelOptInfo *input_rel);
@@ -395,11 +399,11 @@ find_base_rel(PlannerInfo *root, int relid)
 }
 
 /*
- * build_join_rel_hash
- *      Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ *      Construct the auxiliary hash table for relation specific data.
  */
 static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
 {
     HTAB       *hashtab;
     HASHCTL        hash_ctl;
@@ -407,47 +411,49 @@ build_join_rel_hash(PlannerInfo *root)
 
     /* Create the hash table */
     hash_ctl.keysize = sizeof(Relids);
-    hash_ctl.entrysize = sizeof(JoinHashEntry);
+    hash_ctl.entrysize = sizeof(RelInfoEntry);
     hash_ctl.hash = bitmap_hash;
     hash_ctl.match = bitmap_match;
     hash_ctl.hcxt = CurrentMemoryContext;
-    hashtab = hash_create("JoinRelHashTable",
+    hashtab = hash_create("RelHashTable",
                           256L,
                           &hash_ctl,
                           HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
 
     /* Insert all the already-existing joinrels */
-    foreach(l, root->join_rel_list)
+    foreach(l, list->items)
     {
-        RelOptInfo *rel = (RelOptInfo *) lfirst(l);
-        JoinHashEntry *hentry;
+        RelOptInfo       *rel = lfirst_node(RelOptInfo, l);
+        RelInfoEntry *hentry;
         bool        found;
 
-        hentry = (JoinHashEntry *) hash_search(hashtab,
-                                               &(rel->relids),
-                                               HASH_ENTER,
-                                               &found);
+        hentry = (RelInfoEntry *) hash_search(hashtab,
+                                              &rel->relids,
+                                              HASH_ENTER,
+                                              &found);
         Assert(!found);
-        hentry->join_rel = rel;
+        hentry->data = rel;
     }
 
-    root->join_rel_hash = hashtab;
+    list->hash = hashtab;
 }
 
 /*
- * find_join_rel
- *      Returns relation entry corresponding to 'relids' (a set of RT indexes),
- *      or NULL if none exists.  This is for join relations.
+ * find_rel_info
+ *      Find a base or join relation entry.
  */
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static void *
+find_rel_info(RelInfoList *list, Relids relids)
 {
+    if (list == NULL)
+        return NULL;
+
     /*
      * Switch to using hash lookup when list grows "too long".  The threshold
      * is arbitrary and is known only here.
      */
-    if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
-        build_join_rel_hash(root);
+    if (!list->hash && list_length(list->items) > 32)
+        build_rel_hash(list);
 
     /*
      * Use either hashtable lookup or linear search, as appropriate.
@@ -457,34 +463,82 @@ find_join_rel(PlannerInfo *root, Relids relids)
      * so would force relids out of a register and thus probably slow down the
      * list-search case.
      */
-    if (root->join_rel_hash)
+    if (list->hash)
     {
         Relids        hashkey = relids;
-        JoinHashEntry *hentry;
+        RelInfoEntry *hentry;
 
-        hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-                                               &hashkey,
-                                               HASH_FIND,
-                                               NULL);
+        hentry = (RelInfoEntry *) hash_search(list->hash,
+                                              &hashkey,
+                                              HASH_FIND,
+                                              NULL);
         if (hentry)
-            return hentry->join_rel;
+            return hentry->data;
     }
     else
     {
         ListCell   *l;
 
-        foreach(l, root->join_rel_list)
+        foreach(l, list->items)
         {
-            RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+            RelOptInfo   *item = lfirst_node(RelOptInfo, l);
 
-            if (bms_equal(rel->relids, relids))
-                return rel;
+            if (bms_equal(item->relids, relids))
+                return item;
         }
     }
 
     return NULL;
 }
 
+/*
+ * find_join_rel
+ *      Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ *      or NULL if none exists.  This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+    return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ *        Add relation specific info to a list, and also add it to the auxiliary
+ *        hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+    /* GEQO requires us to append the new joinrel to the end of the list! */
+    list->items = lappend(list->items, rel);
+
+    /* store it into the auxiliary hashtable if there is one. */
+    if (list->hash)
+    {
+        RelInfoEntry *hentry;
+        bool        found;
+
+        hentry = (RelInfoEntry *) hash_search(list->hash,
+                                              &rel->relids,
+                                              HASH_ENTER,
+                                              &found);
+        Assert(!found);
+        hentry->data = rel;
+    }
+}
+
+/*
+ * add_join_rel
+ *        Add given join relation to the list of join relations in the given
+ *        PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+    add_rel_info(root->join_rel_list, joinrel);
+}
+
 /*
  * set_foreign_rel_properties
  *        Set up foreign-join fields if outer and inner relation are foreign
@@ -535,32 +589,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
     }
 }
 
-/*
- * add_join_rel
- *        Add given join relation to the list of join relations in the given
- *        PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
-    /* GEQO requires us to append the new joinrel to the end of the list! */
-    root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
-    /* store it into the auxiliary hashtable if there is one. */
-    if (root->join_rel_hash)
-    {
-        JoinHashEntry *hentry;
-        bool        found;
-
-        hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-                                               &(joinrel->relids),
-                                               HASH_ENTER,
-                                               &found);
-        Assert(!found);
-        hentry->join_rel = joinrel;
-    }
-}
-
 /*
  * build_join_rel
  *      Returns relation entry corresponding to the union of two given rels,
@@ -1229,22 +1257,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
 RelOptInfo *
 fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 {
+    RelInfoList *list = &root->upper_rels[kind];
     RelOptInfo *upperrel;
-    ListCell   *lc;
-
-    /*
-     * For the moment, our indexing data structure is just a List for each
-     * relation kind.  If we ever get so many of one kind that this stops
-     * working well, we can improve it.  No code outside this function should
-     * assume anything about how to find a particular upperrel.
-     */
 
     /* If we already made this upperrel for the query, return it */
-    foreach(lc, root->upper_rels[kind])
+    if (list)
     {
-        upperrel = (RelOptInfo *) lfirst(lc);
-
-        if (bms_equal(upperrel->relids, relids))
+        upperrel = find_rel_info(list, relids);
+        if (upperrel)
             return upperrel;
     }
 
@@ -1263,7 +1283,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
     upperrel->cheapest_unique_path = NULL;
     upperrel->cheapest_parameterized_paths = NIL;
 
-    root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+    add_rel_info(&root->upper_rels[kind], upperrel);
 
     return upperrel;
 }
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1827e50647..9790c058f9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
     /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;
 
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when rel_hash is not NULL.  Note that we still maintain
+ * the list even when using the hash table for lookups; this simplifies life
+ * for GEQO.
+ */
+typedef struct RelInfoList
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    List       *items;
+    struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
 /*----------
  * PlannerGlobal
  *        Global information for planning/optimization
@@ -266,15 +285,9 @@ struct PlannerInfo
 
     /*
      * join_rel_list is a list of all join-relation RelOptInfos we have
-     * considered in this planning run.  For small problems we just scan the
-     * list to do lookups, but when there are many join relations we build a
-     * hash table for faster lookups.  The hash table is present and valid
-     * when join_rel_hash is not NULL.  Note that we still maintain the list
-     * even when using the hash table for lookups; this simplifies life for
-     * GEQO.
+     * considered in this planning run.
      */
-    List       *join_rel_list;
-    struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+    struct RelInfoList *join_rel_list;    /* list of join-relation RelOptInfos */
 
     /*
      * When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -401,7 +414,7 @@ struct PlannerInfo
      * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
      * upper rel.
      */
-    List       *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+    RelInfoList       upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
     /* Result tlists chosen by grouping_planner for upper-stage processing */
     struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
-- 
2.31.1

From 668afaba18d8fc98e4f0a9747190168d3847912c Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 4 Jan 2023 14:41:39 +0100
Subject: [PATCH 2/3] Aggregate push-down - basic functionality.

With this patch, partial aggregation can be applied to a base relation or to a
join, and the resulting "grouped" relations can be joined to other "plain"
relations. Once all tables are joined, the aggregation is finalized. See
README for more information.

The next patches will enable the aggregate push-down feature for parallel
query processing, for partitioned tables and for foreign tables.
---
 src/backend/optimizer/README                  |  89 ++
 src/backend/optimizer/path/allpaths.c         | 157 +++
 src/backend/optimizer/path/costsize.c         |  16 +-
 src/backend/optimizer/path/equivclass.c       | 130 +++
 src/backend/optimizer/path/joinrels.c         | 193 +++-
 src/backend/optimizer/plan/initsplan.c        | 290 +++++
 src/backend/optimizer/plan/planmain.c         |  12 +
 src/backend/optimizer/plan/planner.c          |  71 +-
 src/backend/optimizer/plan/setrefs.c          |  33 +
 src/backend/optimizer/prep/prepagg.c          | 264 +++--
 src/backend/optimizer/prep/prepjointree.c     |   1 +
 src/backend/optimizer/util/pathnode.c         | 126 ++-
 src/backend/optimizer/util/relnode.c          | 998 +++++++++++++++++-
 src/backend/optimizer/util/tlist.c            |  31 +
 src/backend/utils/misc/guc_tables.c           |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h                 |  96 ++
 src/include/optimizer/clauses.h               |   3 +-
 src/include/optimizer/pathnode.h              |  19 +-
 src/include/optimizer/paths.h                 |   6 +
 src/include/optimizer/planmain.h              |   1 +
 src/include/optimizer/prep.h                  |   2 +
 src/include/optimizer/tlist.h                 |   4 +-
 src/test/regress/expected/agg_pushdown.out    | 216 ++++
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/parallel_schedule            |   2 +
 src/test/regress/sql/agg_pushdown.sql         | 115 ++
 27 files changed, 2694 insertions(+), 195 deletions(-)
 create mode 100644 src/test/regress/expected/agg_pushdown.out
 create mode 100644 src/test/regress/sql/agg_pushdown.sql

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 41c120e0cd..db97bd254d 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1158,3 +1158,92 @@ breaking down aggregation or grouping over a partitioned relation into
 aggregation or grouping over its partitions is called partitionwise
 aggregation.  Especially when the partition keys match the GROUP BY clause,
 this can be significantly faster than the regular method.
+
+Aggregate push-down
+-------------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of Agg node. However, if the groups are large enough, it may be more
+efficient to apply the partial aggregation to the output of base relation
+scan, and finalize it when we have all relations of the query joined:
+
+  EXPLAIN
+  SELECT a.i, avg(b.y)
+  FROM a JOIN b ON b.j = a.i
+  GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Nested Loop
+          ->  Partial HashAggregate
+                Group Key: b.j
+                ->  Seq Scan on b
+          ->  Index Only Scan using a_pkey on a
+                Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate: the
+only way for the partial aggregate to provide upper join(s) with input values
+is to have the join input expression(s) in the grouping key: besides grouping
+columns, the partial aggregate can only produce the transient states of the
+aggregate functions, but aggregate functions cannot be referenced by the JOIN
+clauses.
+
+Regarding correctness, join node considers the output of the partial aggregate
+to be equivalent to the output of a plain (non-aggregated) relation scan. That
+is, a group (i.e. a row of the partial aggregate output) matches the other
+side of the join if and only if each row of the non-aggregate relation
+does. In other words, all rows belonging to the same group have the same value
+of the join columns (As mentioned above, a join cannot reference other output
+expressions of the partial aggregate than the grouping expressions.).
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either grouping expression
+or aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partiall aggregation. The point is that those
+NULL values would not appear on the input of the pushed-down, so it could
+either put the rows into groups in a different way than the aggregate at the
+top of the plan, or it could compute wrong values of the aggregate functions.
+
+Besides base relation, the aggregation can also be pushed down to join:
+
+  EXPLAIN
+  SELECT a.i, avg(b.y + c.v)
+  FROM   a JOIN b ON b.j = a.i
+         JOIN c ON c.k = a.i
+  WHERE b.j = c.k GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Hash Join
+      Hash Cond: (b.j = a.i)
+      ->  Partial HashAggregate
+        Group Key: b.j
+        ->  Hash Join
+              Hash Cond: (b.j = c.k)
+              ->  Seq Scan on b
+              ->  Hash
+                ->  Seq Scan on c
+      ->  Hash
+        ->  Seq Scan on a
+
+Whether the Agg node is created out of base relation or out of join, it's
+added to a separate RelOptInfo that we call "grouped relation". Grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. Join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c2fc568dc8..e42266f220 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -62,6 +62,7 @@ typedef struct pushdown_safety_info
 
 /* These parameters are set by GUC */
 bool        enable_geqo = false;    /* just in case GUC doesn't set it */
+bool        enable_agg_pushdown;
 int            geqo_threshold;
 int            min_parallel_table_scan_size;
 int            min_parallel_index_scan_size;
@@ -75,6 +76,7 @@ join_search_hook_type join_search_hook = NULL;
 
 static void set_base_rel_consider_startup(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
                          Index rti, RangeTblEntry *rte);
@@ -126,6 +128,9 @@ static void set_result_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                 RangeTblEntry *rte);
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                    RangeTblEntry *rte);
+static void add_grouped_path(PlannerInfo *root, RelOptInfo *rel,
+                             Path *subpath, AggStrategy aggstrategy,
+                             RelAggInfo *agg_info);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                       pushdown_safety_info *safetyInfo);
@@ -188,6 +193,12 @@ make_one_rel(PlannerInfo *root, List *joinlist)
      */
     set_base_rel_sizes(root);
 
+    /*
+     * Now that the sizes are known, we can estimate the sizes of the grouped
+     * relations.
+     */
+    setup_base_grouped_rels(root);
+
     /*
      * We should now have size estimates for every actual table involved in
      * the query, and we also know which if any have been deleted from the
@@ -328,6 +339,54 @@ set_base_rel_sizes(PlannerInfo *root)
     }
 }
 
+/*
+ * setup_base_grouped_rels
+ *      For each "plain" relation build a grouped relation if aggregate pushdown
+ *    is possible and if this relation is suitable for partial aggregation.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+    Index        rti;
+
+    /* If there are no grouping expressions, no aggregate push-down. */
+    if (!root->grouped_var_list)
+        return;
+
+
+    for (rti = 1; rti < root->simple_rel_array_size; rti++)
+    {
+        RelOptInfo *brel = root->simple_rel_array[rti];
+        RelOptInfo *rel_grouped;
+        RelAggInfo *agg_info;
+
+        /* there may be empty slots corresponding to non-baserel RTEs */
+        if (brel == NULL)
+            continue;
+
+        Assert(brel->relid == rti); /* sanity check on array */
+
+        /* ignore RTEs that are "other rels" */
+        if (brel->reloptkind != RELOPT_BASEREL)
+            continue;
+
+        /*
+         * The aggregate push-down feature only makes sense if there are
+         * multiple base rels in the query.
+         */
+        if (!bms_nonempty_difference(root->all_baserels, brel->relids))
+            continue;
+
+        rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
+        /* Couldn't any aggregate be pushed down to this relation? */
+        if (!rel_grouped)
+            continue;
+
+        /* Make the relation available for joining. */
+        add_grouped_rel(root, rel_grouped, agg_info);
+    }
+}
+
 /*
  * set_base_rel_pathlists
  *      Finds all paths available for scanning each base-relation entry.
@@ -769,6 +828,8 @@ static void
 set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
     Relids        required_outer;
+    RelOptInfo *rel_grouped;
+    RelAggInfo *agg_info;
 
     /*
      * We don't support pushing join clauses into the quals of a seqscan, but
@@ -789,6 +850,14 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
     /* Consider TID scans */
     create_tidscan_paths(root, rel);
+
+    /* Add paths to the grouped relation if one exists. */
+    rel_grouped = find_grouped_rel(root, rel->relids, &agg_info);
+    if (!rel_grouped)
+        return;
+
+    generate_grouping_paths(root, rel_grouped, rel, agg_info);
+    set_cheapest(rel_grouped);
 }
 
 /*
@@ -3257,6 +3326,87 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
     }
 }
 
+/*
+ * generate_grouping_paths
+ *         Create partially aggregated paths and add them to grouped relation.
+ *
+ * "rel_plain" is base or join relation whose paths are not grouped.
+ */
+void
+generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+                        RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+    ListCell   *lc;
+
+    if (IS_DUMMY_REL(rel_plain))
+    {
+        mark_dummy_rel(rel_grouped);
+        return;
+    }
+
+    foreach(lc, rel_plain->pathlist)
+    {
+        Path       *path = (Path *) lfirst(lc);
+
+        /*
+         * Since the path originates from the non-grouped relation which is
+         * not aware of the aggregate push-down, we must ensure that it
+         * provides the correct input for aggregation.
+         */
+        path = (Path *) create_projection_path(root, rel_grouped, path,
+                                               agg_info->agg_input);
+
+        /*
+         * add_grouped_path() will check whether the path has suitable
+         * pathkeys.
+         */
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info);
+
+        /*
+         * Repeated creation of hash table (for new parameter values) should
+         * be possible, does not sound like a good idea in terms of
+         * efficiency.
+         */
+        if (path->param_info == NULL)
+            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info);
+    }
+
+    /* Could not generate any grouped paths? */
+    if (rel_grouped->pathlist == NIL)
+        mark_dummy_rel(rel_grouped);
+}
+
+/*
+ * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
+ */
+static void
+add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+                 AggStrategy aggstrategy, RelAggInfo *agg_info)
+{
+    Path       *agg_path;
+
+
+    if (aggstrategy == AGG_HASHED)
+        agg_path = (Path *) create_agg_hashed_path(root, rel, subpath,
+                                                   agg_info);
+    else if (aggstrategy == AGG_SORTED)
+        agg_path = (Path *) create_agg_sorted_path(root, rel, subpath,
+                                                   agg_info);
+    else
+        elog(ERROR, "unexpected strategy %d", aggstrategy);
+
+    /*
+     * Bail out if we failed to create a suitable aggregated path. This can
+     * happen e.g. then the path does not support hashing (for AGG_HASHED),
+     * or when the input path is not sorted.
+     */
+    if (agg_path == NULL)
+        return;
+
+    /* Add the grouped path to the list of grouped base paths. */
+    add_path(rel, (Path *) agg_path);
+}
+
 /*
  * make_rel_from_joinlist
  *      Build access paths using a "joinlist" to guide the join path search.
@@ -3419,6 +3569,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
          */
         foreach(lc, root->join_rel_level[lev])
         {
+            RelOptInfo *rel_grouped;
+
             rel = (RelOptInfo *) lfirst(lc);
 
             /* Create paths for partitionwise joins. */
@@ -3435,6 +3587,11 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
             /* Find and save the cheapest paths for this rel */
             set_cheapest(rel);
 
+            /* The same for grouped relation if one exists. */
+            rel_grouped = find_grouped_rel(root, rel->relids, NULL);
+            if (rel_grouped)
+                set_cheapest(rel_grouped);
+
 #ifdef OPTIMIZER_DEBUG
             debug_print_rel(root, rel);
 #endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 29ae32d960..261041b466 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -6014,11 +6014,11 @@ set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target)
     foreach(lc, target->exprs)
     {
         Node       *node = (Node *) lfirst(lc);
+        int32        item_width;
 
         if (IsA(node, Var))
         {
             Var           *var = (Var *) node;
-            int32        item_width;
 
             /* We should not see any upper-level Vars here */
             Assert(var->varlevelsup == 0);
@@ -6050,6 +6050,20 @@ set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target)
             Assert(item_width > 0);
             tuple_width += item_width;
         }
+        else if (IsA(node, Aggref))
+        {
+            /*
+             * If the target is evaluated by AggPath, it'll care of cost
+             * estimate. If the target is above AggPath (typically target of a
+             * join relation that contains grouped relation), the cost of
+             * Aggref should not be accounted for again.
+             *
+             * On the other hand, width is always needed.
+             */
+            item_width = get_typavgwidth(exprType(node), exprTypmod(node));
+            Assert(item_width > 0);
+            tuple_width += item_width;
+        }
         else
         {
             /*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7d7e6facdf..ed7d4eb2f7 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -3149,6 +3149,136 @@ is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist)
     return false;
 }
 
+/*
+ * translate_expression_to_rels
+ *        If the appropriate equivalence classes exist, replace vars in
+ *        gvi->gvexpr with vars whose varno is equal to relid. Return NULL if
+ *        translation is not possible or needed.
+ *
+ * Note: Currently we only translate Var expressions. This is subject to
+ * change as the aggregate push-down feature gets enhanced.
+ */
+GroupedVarInfo *
+translate_expression_to_rel(PlannerInfo *root, GroupedVarInfo *gvi,
+                            Index relid)
+{
+    Var           *var;
+    ListCell   *l1;
+    bool        found_orig = false;
+    Var           *var_translated = NULL;
+    GroupedVarInfo *result;
+
+    /* Can't do anything w/o equivalence classes. */
+    if (root->eq_classes == NIL)
+        return NULL;
+
+    var = castNode(Var, gvi->gvexpr);
+
+    /*
+     * Do we need to translate the var?
+     */
+    if (var->varno == relid)
+        return NULL;
+
+    /*
+     * Find the replacement var.
+     */
+    foreach(l1, root->eq_classes)
+    {
+        EquivalenceClass *ec = lfirst_node(EquivalenceClass, l1);
+        ListCell   *l2;
+
+        /* TODO Check if any other EC kind should be ignored. */
+        if (ec->ec_has_volatile || ec->ec_below_outer_join || ec->ec_broken)
+            continue;
+
+        /* Single-element EC can hardly help in translations. */
+        if (list_length(ec->ec_members) == 1)
+            continue;
+
+        /*
+         * Collect all vars of this EC and their varnos.
+         *
+         * ec->ec_relids does not help because we're only interested in a
+         * subset of EC members.
+         */
+        foreach(l2, ec->ec_members)
+        {
+            EquivalenceMember *em = lfirst_node(EquivalenceMember, l2);
+            Var           *ec_var;
+
+            /*
+             * The grouping expressions derived here are used to evaluate
+             * possibility to push aggregation down to RELOPT_BASEREL or
+             * RELOPT_JOINREL relations, and to construct reltargets for the
+             * grouped rels. We're not interested at the moment whether the
+             * relations do have children.
+             */
+            if (em->em_is_child)
+                continue;
+
+            if (!IsA(em->em_expr, Var))
+                continue;
+
+            ec_var = castNode(Var, em->em_expr);
+            if (equal(ec_var, var))
+                found_orig = true;
+            else if (ec_var->varno == relid)
+                var_translated = ec_var;
+
+            if (found_orig && var_translated)
+            {
+                /*
+                 * The replacement Var must have the same data type, otherwise
+                 * the values are not guaranteed to be grouped in the same way
+                 * as values of the original Var.
+                 */
+                if (ec_var->vartype != var->vartype)
+                    return NULL;
+
+                break;
+            }
+        }
+
+        if (found_orig)
+        {
+            /*
+             * The same expression probably does not exist in multiple ECs.
+             */
+            if (var_translated == NULL)
+            {
+                /*
+                 * Failed to translate the expression.
+                 */
+                return NULL;
+            }
+            else
+            {
+                /* Success. */
+                break;
+            }
+        }
+        else
+        {
+            /*
+             * Vars of the requested relid can be in the next ECs too.
+             */
+            var_translated = NULL;
+        }
+    }
+
+    if (!found_orig)
+        return NULL;
+
+    result = makeNode(GroupedVarInfo);
+    memcpy(result, gvi, sizeof(GroupedVarInfo));
+
+    result->gv_eval_at = bms_make_singleton(relid);
+    result->gvexpr = (Expr *) var_translated;
+
+    return result;
+}
+
 /*
  * is_redundant_with_indexclauses
  *        Test whether rinfo is redundant with any clause in the IndexClause
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9a5930ce86..32a1bf6376 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -21,6 +21,7 @@
 #include "optimizer/paths.h"
 #include "partitioning/partbounds.h"
 #include "utils/memutils.h"
+#include "utils/selfuncs.h"
 
 
 static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +36,10 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
 static bool restriction_is_constant_false(List *restrictlist,
                                           RelOptInfo *joinrel,
                                           bool only_pushed_down);
+static RelOptInfo *make_join_rel_common(PlannerInfo *root, RelOptInfo *rel1,
+                                        RelOptInfo *rel2,
+                                        RelAggInfo *agg_info,
+                                        RelOptInfo *rel_agg_input);
 static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
                                         RelOptInfo *rel2, RelOptInfo *joinrel,
                                         SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -669,21 +674,20 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
     return true;
 }
 
-
 /*
- * make_join_rel
- *       Find or create a join RelOptInfo that represents the join of
- *       the two given rels, and add to it path information for paths
- *       created with the two rels as outer and inner rel.
- *       (The join rel may already contain paths generated from other
- *       pairs of rels that add up to the same set of base rels.)
+ * make_join_rel_common
+ *     The workhorse of make_join_rel().
+ *
+ *    'agg_info' contains the reltarget of grouped relation and everything we
+ *    need to aggregate the join result. If NULL, then the join relation should
+ *    not be grouped.
  *
- * NB: will return NULL if attempted join is not valid.  This can happen
- * when working with outer joins, or with IN or EXISTS clauses that have been
- * turned into joins.
+ *    'rel_agg_input' describes the AggPath input relation if the join output
+ *    should be aggregated. If NULL is passed, do not aggregate the join output.
  */
-RelOptInfo *
-make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
+static RelOptInfo *
+make_join_rel_common(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
+                     RelAggInfo *agg_info, RelOptInfo *rel_agg_input)
 {
     Relids        joinrelids;
     SpecialJoinInfo *sjinfo;
@@ -744,7 +748,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
      * goes with this particular joining.
      */
     joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-                             &restrictlist);
+                             &restrictlist, agg_info);
 
     /*
      * If we've already proven this join is empty, we needn't consider any
@@ -757,14 +761,173 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
     }
 
     /* Add paths to the join relation. */
-    populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-                                restrictlist);
+    if (rel_agg_input == NULL)
+    {
+        /*
+         * Simply join the input relations, whether both are plain or one of
+         * them is grouped.
+         */
+        populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+                                    restrictlist);
+    }
+    else
+    {
+        /* The join relation is grouped. */
+        Assert(agg_info != NULL);
+
+        /*
+         * Apply partial aggregation to the paths of rel_agg_input and add the
+         * resulting paths to joinrel.
+         */
+        generate_grouping_paths(root, joinrel, rel_agg_input, agg_info);
+    }
 
     bms_free(joinrelids);
 
     return joinrel;
 }
 
+/*
+ * make_join_rel_combined
+ *     Join grouped relation to non-grouped one.
+ */
+static void
+make_join_rel_combined(PlannerInfo *root, RelOptInfo *rel1,
+                       RelOptInfo *rel2,
+                       RelAggInfo *agg_info)
+{
+    RelOptInfo *rel1_grouped;
+    RelOptInfo *rel2_grouped;
+    bool        rel1_grouped_useful = false;
+    bool        rel2_grouped_useful = false;
+
+    /* Retrieve the grouped relations. */
+    rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+    rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+    /*
+     * Dummy rel may indicate a join relation that is able to generate grouped
+     * paths as such (i.e. it has valid agg_info), but for which the path
+     * actually could not be created (e.g. only AGG_HASHED strategy was
+     * possible but work_mem was not sufficient for hash table).
+     */
+    rel1_grouped_useful = rel1_grouped != NULL && !IS_DUMMY_REL(rel1_grouped);
+    rel2_grouped_useful = rel2_grouped != NULL && !IS_DUMMY_REL(rel2_grouped);
+
+    /* Nothing to do if there's no grouped relation. */
+    if (!rel1_grouped_useful && !rel2_grouped_useful)
+        return;
+
+    if (rel1_grouped_useful)
+        make_join_rel_common(root, rel1_grouped, rel2, agg_info, NULL);
+
+    if (rel2_grouped_useful)
+        make_join_rel_common(root, rel1, rel2_grouped, agg_info, NULL);
+
+    /*
+     * Join of two grouped relations is currently not supported. In such a
+     * case, grouping of one side would change the occurrence of the other
+     * side's aggregate transient states on the input of the final
+     * aggregation. This can be handled by adjusting the transient states, but
+     * it's not worth the effort because it's hard to find a use case for this
+     * kind of join.
+     *
+     * XXX If the join of two grouped rels is implemented someday, note that
+     * both rels can have aggregates, so it'd be hard to join grouped rel to
+     * non-grouped here: 1) such a "mixed join" would require a special
+     * target, 2) both AGGSPLIT_FINAL_DESERIAL and AGGSPLIT_SIMPLE aggregates
+     * could appear in the target of the final aggregation node, originating
+     * from the grouped and the non-grouped input rel respectively.
+     */
+}
+
+/*
+ * make_join_rel
+ *       Find or create a join RelOptInfo that represents the join of
+ *       the two given rels, and add to it path information for paths
+ *       created with the two rels as outer and inner rel.
+ *       (The join rel may already contain paths generated from other
+ *       pairs of rels that add up to the same set of base rels.)
+ *
+ *       In addition to creating an ordinary join relation, try to create a
+ *       grouped one. There are two strategies to achieve that: join a grouped
+ *       relation to plain one, or join two plain relations and apply partial
+ *       aggregation to the result.
+ *
+ * NB: will return NULL if attempted join is not valid.  This can happen when
+ * working with outer joins, or with IN or EXISTS clauses that have been
+ * turned into joins. Besides that, NULL is also returned if caller is
+ * interested in a grouped relation but it could not be created.
+ *
+ * Only the plain relation is returned; if grouped relation exists, it can be
+ * retrieved using find_grouped_rel().
+ */
+RelOptInfo *
+make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
+{
+    Relids        joinrelids;
+    RelAggInfo *agg_info = NULL;
+    RelOptInfo *joinrel,
+               *joinrel_plain;
+
+    /* 1) form the plain join. */
+    joinrel = make_join_rel_common(root, rel1, rel2, NULL, NULL);
+    joinrel_plain = joinrel;
+
+    if (joinrel_plain == NULL)
+        return joinrel_plain;
+
+    /*
+     * We're done if there are no grouping expressions nor aggregates.
+     */
+    if (root->grouped_var_list == NIL)
+        return joinrel_plain;
+
+    joinrelids = bms_union(rel1->relids, rel2->relids);
+    joinrel = find_grouped_rel(root, joinrelids, &agg_info);
+
+    if (joinrel != NULL)
+    {
+        /*
+         * If the same grouped joinrel was already formed, just with the base
+         * rels divided between rel1 and rel2 in a different way, the matching
+         * agg_info should already be there.
+         */
+        Assert(agg_info != NULL);
+    }
+    else
+    {
+        /*
+         * agg_info must be created from scratch.
+         */
+        agg_info = create_rel_agg_info(root, joinrel_plain);
+
+        /* Cannot we build grouped join? */
+        if (agg_info == NULL)
+            return joinrel_plain;
+
+        /*
+         * The number of aggregate input rows is simply the number of rows of
+         * the non-grouped relation, which should have been estimated by now.
+         */
+        agg_info->input_rows = joinrel_plain->rows;
+    }
+
+    /*
+     * 2) join two plain rels and aggregate the join paths. Aggregate
+     * push-down only makes sense if the join is not the top-level one.
+     */
+    if (bms_nonempty_difference(root->all_baserels, joinrelids))
+        make_join_rel_common(root, rel1, rel2, agg_info, joinrel_plain);
+
+    /*
+     * 3) combine plain and grouped relations.
+     */
+    make_join_rel_combined(root, rel1, rel2, agg_info);
+
+    return joinrel_plain;
+}
+
 /*
  * populate_joinrel_with_paths
  *      Add paths to the given joinrel for given pair of joining relations. The
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d60398f1c6..32f2bd32aa 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/nbtree.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
@@ -48,6 +49,8 @@ typedef struct PostponedQual
 } PostponedQual;
 
 
+static void create_aggregate_grouped_var_infos(PlannerInfo *root);
+static void create_grouping_expr_grouped_var_infos(PlannerInfo *root);
 static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
                                        Index rtindex);
 static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -270,6 +273,293 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
     }
 }
 
+/*
+ * Add GroupedVarInfo to grouped_var_list for each aggregate as well as for
+ * each possible grouping expression.
+ *
+ * root->group_pathkeys must be setup before this function is called.
+ */
+extern void
+setup_aggregate_pushdown(PlannerInfo *root)
+{
+    ListCell   *lc;
+
+    /*
+     * Isn't user interested in the aggregate push-down feature?
+     */
+    if (!enable_agg_pushdown)
+        return;
+
+    /* The feature can only be applied to grouped aggregation. */
+    if (!root->parse->groupClause)
+        return;
+
+    /*
+     * Grouping sets require multiple different groupings but the base
+     * relation can only generate one.
+     */
+    if (root->parse->groupingSets)
+        return;
+
+    /*
+     * SRF is not allowed in the aggregate argument and we don't even want it
+     * in the GROUP BY clause, so forbid it in general. It needs to be
+     * analyzed if evaluation of a GROUP BY clause containing SRF below the
+     * query targetlist would be correct. Currently it does not seem to be an
+     * important use case.
+     */
+    if (root->parse->hasTargetSRFs)
+        return;
+
+    /* Create GroupedVarInfo per (distinct) aggregate. */
+    create_aggregate_grouped_var_infos(root);
+
+    /* Isn't there any aggregate to be pushed down? */
+    if (root->grouped_var_list == NIL)
+        return;
+
+    /* Create GroupedVarInfo per grouping expression. */
+    create_grouping_expr_grouped_var_infos(root);
+
+    /* Isn't there any useful grouping expression for aggregate push-down? */
+    if (root->grouped_var_list == NIL)
+        return;
+
+    /*
+     * Now that we know that grouping can be pushed down, search for the
+     * maximum sortgroupref. The base relations may need it if extra grouping
+     * expressions get added to them.
+     */
+    Assert(root->max_sortgroupref == 0);
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+        if (te->ressortgroupref > root->max_sortgroupref)
+            root->max_sortgroupref = te->ressortgroupref;
+    }
+}
+
+/*
+ * Create GroupedVarInfo for each distinct aggregate.
+ *
+ * If any aggregate is not suitable, set root->grouped_var_list to NIL and
+ * return.
+ */
+static void
+create_aggregate_grouped_var_infos(PlannerInfo *root)
+{
+    List       *tlist_exprs;
+    ListCell   *lc;
+
+    Assert(root->grouped_var_list == NIL);
+
+    tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+                                  PVC_INCLUDE_AGGREGATES |
+                                  PVC_RECURSE_WINDOWFUNCS);
+
+    /*
+     * Although GroupingFunc is related to root->parse->groupingSets, this
+     * field does not necessarily reflect its presence.
+     */
+    foreach(lc, tlist_exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+
+        if (IsA(expr, GroupingFunc))
+            return;
+    }
+
+    /*
+     * Aggregates within the HAVING clause need to be processed in the same
+     * way as those in the main targetlist.
+     *
+     * Note that the contained aggregates will be pushed down, but the
+     * containing HAVING clause must be ignored until the aggregation is
+     * finalized.
+     */
+    if (root->parse->havingQual != NULL)
+    {
+        List       *having_exprs;
+
+        having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+                                       PVC_INCLUDE_AGGREGATES);
+        if (having_exprs != NIL)
+            tlist_exprs = list_concat(tlist_exprs, having_exprs);
+    }
+
+    if (tlist_exprs == NIL)
+        return;
+
+    foreach(lc, tlist_exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+        Aggref       *aggref;
+        ListCell   *lc2;
+        GroupedVarInfo *gvi;
+        bool        exists;
+
+        /*
+         * tlist_exprs may also contain Vars, but we only need Aggrefs.
+         */
+        if (IsA(expr, Var))
+            continue;
+
+        aggref = castNode(Aggref, expr);
+
+        /* TODO Think if (some of) these can be handled. */
+        if (aggref->aggvariadic ||
+            aggref->aggdirectargs || aggref->aggorder ||
+            aggref->aggdistinct)
+        {
+            /*
+             * Aggregation push-down is not useful if at least one aggregate
+             * cannot be evaluated below the top-level join.
+             *
+             * XXX Is it worth freeing the GroupedVarInfos and their subtrees?
+             */
+            root->grouped_var_list = NIL;
+            break;
+        }
+
+        /* Does GroupedVarInfo for this aggregate already exist? */
+        exists = false;
+        foreach(lc2, root->grouped_var_list)
+        {
+            gvi = lfirst_node(GroupedVarInfo, lc2);
+
+            if (equal(expr, gvi->gvexpr))
+            {
+                exists = true;
+                break;
+            }
+        }
+
+        /* Construct a new GroupedVarInfo if does not exist yet. */
+        if (!exists)
+        {
+            Relids        relids;
+
+            gvi = makeNode(GroupedVarInfo);
+            gvi->gvexpr = (Expr *) copyObject(aggref);
+
+            /* Find out where the aggregate should be evaluated. */
+            relids = pull_varnos(root, (Node *) aggref);
+            if (!bms_is_empty(relids))
+                gvi->gv_eval_at = relids;
+            else
+                gvi->gv_eval_at = NULL;
+
+            root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+        }
+    }
+
+    list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupedVarInfo for each expression usable as grouping key.
+ *
+ * In addition to the expressions of the query targetlist, group_pathkeys is
+ * also considered the source of grouping expressions. That increases the
+ * chance to get the relation output grouped.
+ */
+static void
+create_grouping_expr_grouped_var_infos(PlannerInfo *root)
+{
+    ListCell   *l1,
+               *l2;
+    List       *exprs = NIL;
+    List       *sortgrouprefs = NIL;
+
+    /*
+     * Make sure GroupedVarInfo exists for each expression usable as grouping
+     * key.
+     */
+    foreach(l1, root->parse->groupClause)
+    {
+        SortGroupClause *sgClause;
+        TargetEntry *te;
+        Index        sortgroupref;
+        TypeCacheEntry *tce;
+        Oid            equalimageproc;
+
+        sgClause = lfirst_node(SortGroupClause, l1);
+        te = get_sortgroupclause_tle(sgClause, root->processed_tlist);
+        sortgroupref = te->ressortgroupref;
+
+        Assert(sortgroupref > 0);
+
+        /*
+         * Non-zero sortgroupref does not necessarily imply grouping
+         * expression: data can also be sorted by aggregate.
+         */
+        if (IsA(te->expr, Aggref))
+            continue;
+
+        /*
+         * The aggregate push-down feature currently supports only plain Vars
+         * as grouping expressions.
+         */
+        if (!IsA(te->expr, Var))
+        {
+            root->grouped_var_list = NIL;
+            return;
+        }
+
+        /*
+         * Aggregate push-down is only possible if equality of grouping keys
+         * per the equality operator implies bitwise equality. Otherwise, if
+         * we put keys of different byte images into the same group, we lose
+         * some information that may be needed to evaluate join clauses above
+         * the pushed-down aggregate node, or the WHERE clause.
+         *
+         * For example, the NUMERIC data type is not supported because values
+         * that fall into the same group according to the equality operator
+         * (e.g. 0 and 0.0) can have different scale.
+         */
+        tce = lookup_type_cache(exprType((Node *) te->expr),
+                                TYPECACHE_BTREE_OPFAMILY);
+        if (!OidIsValid(tce->btree_opf) ||
+            !OidIsValid(tce->btree_opintype))
+            goto fail;
+
+        equalimageproc = get_opfamily_proc(tce->btree_opf,
+                                           tce->btree_opintype,
+                                           tce->btree_opintype,
+                                           BTEQUALIMAGE_PROC);
+        if (!OidIsValid(equalimageproc) ||
+            !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+                                               tce->typcollation,
+                                               ObjectIdGetDatum(tce->btree_opintype))))
+            goto fail;
+
+        exprs = lappend(exprs, te->expr);
+        sortgrouprefs = lappend_int(sortgrouprefs, sortgroupref);
+    }
+
+    /*
+     * Construct GroupedVarInfo for each expression.
+     */
+    forboth(l1, exprs, l2, sortgrouprefs)
+    {
+        Var           *var = lfirst_node(Var, l1);
+        int            sortgroupref = lfirst_int(l2);
+        GroupedVarInfo *gvi = makeNode(GroupedVarInfo);
+
+        gvi->gvexpr = (Expr *) copyObject(var);
+        gvi->sortgroupref = sortgroupref;
+
+        /* Find out where the expression should be evaluated. */
+        gvi->gv_eval_at = bms_make_singleton(var->varno);
+
+        root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+    }
+    return;
+
+fail:
+    root->grouped_var_list = NIL;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b4bfbe6e32..1dc2ac4bbe 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -66,6 +66,7 @@ query_planner(PlannerInfo *root,
      * here.
      */
     root->join_rel_list = makeNode(RelInfoList);
+    root->agg_info_list = makeNode(RelInfoList);
     root->join_rel_level = NULL;
     root->join_cur_level = 0;
     root->canon_pathkeys = NIL;
@@ -76,6 +77,7 @@ query_planner(PlannerInfo *root,
     root->placeholder_list = NIL;
     root->placeholder_array = NULL;
     root->placeholder_array_size = 0;
+    root->grouped_var_list = NIL;
     root->fkey_list = NIL;
     root->initial_rels = NIL;
 
@@ -254,6 +256,16 @@ query_planner(PlannerInfo *root,
      */
     extract_restriction_or_clauses(root);
 
+    /*
+     * If the query result can be grouped, check if any grouping can be
+     * performed below the top-level join. If so, setup
+     * root->grouped_var_list.
+     *
+     * The base relations should be fully initialized now, so that we have
+     * enough info to decide whether grouping is possible.
+     */
+    setup_aggregate_pushdown(root);
+
     /*
      * Now expand appendrels by adding "otherrels" for their children.  We
      * delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6ba7589f3..f19e218309 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -637,6 +637,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
     root->processed_tlist = NIL;
+    root->max_sortgroupref = 0;
     root->update_colnos = NIL;
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
@@ -3877,11 +3878,11 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
         bool        force_rel_creation;
 
         /*
-         * If we're doing partitionwise aggregation at this level, force
-         * creation of a partially_grouped_rel so we can add partitionwise
-         * paths to it.
+         * If we're doing partitionwise aggregation at this level or if
+         * aggregate push-down succeeded to create some paths, force creation
+         * of a partially_grouped_rel so we can add the related paths to it.
          */
-        force_rel_creation = (patype == PARTITIONWISE_AGGREGATE_PARTIAL);
+        force_rel_creation = patype == PARTITIONWISE_AGGREGATE_PARTIAL;
 
         partially_grouped_rel =
             create_partial_grouping_paths(root,
@@ -3914,10 +3915,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
     /* Gather any partially grouped partial paths. */
     if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
-    {
         gather_grouping_paths(root, partially_grouped_rel);
+
+    /*
+     * The non-partial paths can come either from the Gather above or from
+     * aggregate push-down.
+     */
+    if (partially_grouped_rel && partially_grouped_rel->pathlist)
         set_cheapest(partially_grouped_rel);
-    }
 
     /*
      * Estimate number of groups.
@@ -6899,6 +6904,19 @@ create_partial_grouping_paths(PlannerInfo *root,
     bool        can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
     bool        can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
 
+    /*
+     * The output relation could have been already created due to aggregate
+     * push-down.
+     */
+    partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+
+    /*
+     * If the relation already exists, it must have been created by aggregate
+     * pushdown. We can't check how exactly it got created, but we can at
+     * least check that aggregate pushdown is enabled.
+     */
+    Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
+
     /*
      * Consider whether we should generate partially aggregated non-partial
      * paths.  We can only do this if we have a non-partial path, and only if
@@ -6921,20 +6939,30 @@ create_partial_grouping_paths(PlannerInfo *root,
     /*
      * If we can't partially aggregate partial paths, and we can't partially
      * aggregate non-partial paths, then don't bother creating the new
-     * RelOptInfo at all, unless the caller specified force_rel_creation.
+     * RelOptInfo at all, unless the caller specified force_rel_creation. However
      */
     if (cheapest_total_path == NULL &&
         cheapest_partial_path == NULL &&
         !force_rel_creation)
-        return NULL;
+    {
+        /*
+         * If partially_grouped_rel exists, it should contain paths generated
+         * by the aggregate push-down feature, so the caller is interested in
+         * it.
+         */
+        return partially_grouped_rel;
+    }
 
     /*
      * Build a new upper relation to represent the result of partially
-     * aggregating the rows from the input relation.
-     */
-    partially_grouped_rel = fetch_upper_rel(root,
-                                            UPPERREL_PARTIAL_GROUP_AGG,
-                                            grouped_rel->relids);
+     * aggregating the rows from the input relation. The relation may already
+     * exist due to aggregate pushdown, in which case we don't need to create
+     * it.
+     */
+    if (partially_grouped_rel == NULL)
+        partially_grouped_rel = fetch_upper_rel(root,
+                                                UPPERREL_PARTIAL_GROUP_AGG,
+                                                grouped_rel->relids);
     partially_grouped_rel->consider_parallel =
         grouped_rel->consider_parallel;
     partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -6948,10 +6976,19 @@ create_partial_grouping_paths(PlannerInfo *root,
      * emit the same tlist as regular aggregate paths, because (1) we must
      * include Vars and Aggrefs needed in HAVING, which might not appear in
      * the result tlist, and (2) the Aggrefs must be set in partial mode.
-     */
-    partially_grouped_rel->reltarget =
-        make_partial_grouping_target(root, grouped_rel->reltarget,
-                                     extra->havingQual);
+     *
+     * If the target was already created for the sake of aggregate push-down,
+     * it should be compatible with what we'd create here.
+     *
+     * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
+     * push-down generated no paths), it created an empty target. Should we
+     * change the convention and have it assign NULL to reltarget instead?  Or
+     * should we introduce a function like is_pathtarget_empty()?
+     */
+    if (partially_grouped_rel->reltarget->exprs == NIL)
+        partially_grouped_rel->reltarget =
+            make_partial_grouping_target(root, grouped_rel->reltarget,
+                                         extra->havingQual);
 
     if (!extra->partial_costs_set)
     {
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ed9c1e6187..48ab092449 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2920,6 +2920,39 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
         /* No referent found for Var */
         elog(ERROR, "variable not found in subplan target lists");
     }
+    if (IsA(node, Aggref))
+    {
+        Aggref       *aggref = castNode(Aggref, node);
+
+        /*
+         * The upper plan targetlist can contain Aggref whose value has
+         * already been evaluated by the subplan. However this can only happen
+         * with specific value of aggsplit.
+         */
+        if (aggref->aggsplit == AGGSPLIT_INITIAL_SERIAL)
+        {
+            /* See if the Aggref has bubbled up from a lower plan node */
+            if (context->outer_itlist && context->outer_itlist->has_non_vars)
+            {
+                newvar = search_indexed_tlist_for_non_var((Expr *) node,
+                                                          context->outer_itlist,
+                                                          OUTER_VAR);
+                if (newvar)
+                    return (Node *) newvar;
+            }
+            if (context->inner_itlist && context->inner_itlist->has_non_vars)
+            {
+                newvar = search_indexed_tlist_for_non_var((Expr *) node,
+                                                          context->inner_itlist,
+                                                          INNER_VAR);
+                if (newvar)
+                    return (Node *) newvar;
+            }
+        }
+
+        /* No referent found for Aggref */
+        elog(ERROR, "Aggref not found in subplan target lists");
+    }
     if (IsA(node, PlaceHolderVar))
     {
         PlaceHolderVar *phv = (PlaceHolderVar *) node;
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index 2d31ad6bed..cb97bc4b65 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -64,6 +64,10 @@ static int    find_compatible_trans(PlannerInfo *root, Aggref *newagg,
                                   Datum initValue, bool initValueIsNull,
                                   List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
+static void get_agg_clause_costs_trans(PlannerInfo *root, AggSplit aggsplit,
+                                       AggTransInfo *transinfo, AggClauseCosts *costs);
+static void get_agg_clause_costs_agginfo(PlannerInfo *root, AggSplit aggsplit,
+                                         AggInfo *agginfo, AggClauseCosts *costs);
 
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
@@ -546,132 +550,176 @@ get_agg_clause_costs(PlannerInfo *root, AggSplit aggsplit, AggClauseCosts *costs
     {
         AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
 
-        /*
-         * Add the appropriate component function execution costs to
-         * appropriate totals.
-         */
-        if (DO_AGGSPLIT_COMBINE(aggsplit))
-        {
-            /* charge for combining previously aggregated states */
-            add_function_cost(root, transinfo->combinefn_oid, NULL,
-                              &costs->transCost);
-        }
-        else
-            add_function_cost(root, transinfo->transfn_oid, NULL,
-                              &costs->transCost);
-        if (DO_AGGSPLIT_DESERIALIZE(aggsplit) &&
-            OidIsValid(transinfo->deserialfn_oid))
-            add_function_cost(root, transinfo->deserialfn_oid, NULL,
-                              &costs->transCost);
-        if (DO_AGGSPLIT_SERIALIZE(aggsplit) &&
-            OidIsValid(transinfo->serialfn_oid))
-            add_function_cost(root, transinfo->serialfn_oid, NULL,
-                              &costs->finalCost);
+        get_agg_clause_costs_trans(root, aggsplit, transinfo, costs);
+    }
 
-        /*
-         * These costs are incurred only by the initial aggregate node, so we
-         * mustn't include them again at upper levels.
-         */
-        if (!DO_AGGSPLIT_COMBINE(aggsplit))
-        {
-            /* add the input expressions' cost to per-input-row costs */
-            QualCost    argcosts;
+    foreach(lc, root->agginfos)
+    {
+        AggInfo    *agginfo = (AggInfo *) lfirst(lc);
 
-            cost_qual_eval_node(&argcosts, (Node *) transinfo->args, root);
-            costs->transCost.startup += argcosts.startup;
-            costs->transCost.per_tuple += argcosts.per_tuple;
+        get_agg_clause_costs_agginfo(root, aggsplit, agginfo, costs);
 
-            /*
-             * Add any filter's cost to per-input-row costs.
-             *
-             * XXX Ideally we should reduce input expression costs according
-             * to filter selectivity, but it's not clear it's worth the
-             * trouble.
-             */
-            if (transinfo->aggfilter)
-            {
-                cost_qual_eval_node(&argcosts, (Node *) transinfo->aggfilter,
-                                    root);
-                costs->transCost.startup += argcosts.startup;
-                costs->transCost.per_tuple += argcosts.per_tuple;
-            }
-        }
+    }
+}
+
+/*
+ * Like get_agg_clause_costs(), but only consider aggregates passed in the
+ * 'aggrefs' list.
+ */
+void
+get_agg_clause_costs_some(PlannerInfo *root, AggSplit aggsplit, List *aggrefs,
+                          AggClauseCosts *costs)
+{
+    ListCell    *lc;
+
+    foreach(lc, aggrefs)
+    {
+        Aggref    *aggref    = lfirst_node(Aggref, lc);
+        AggTransInfo *aggtrans = (AggTransInfo *) list_nth(root->aggtransinfos,
+                                                           aggref->aggtransno);
+        AggInfo    *agginfo = list_nth(root->agginfos, aggref->aggno);
+
+
+        get_agg_clause_costs_trans(root, aggsplit, aggtrans, costs);
+        get_agg_clause_costs_agginfo(root, aggsplit, agginfo, costs);
+    }
+}
+
+/*
+ * Sub-routine of get_agg_clause_costs(), to process a single AggTransInfo.
+ */
+static void
+get_agg_clause_costs_trans(PlannerInfo *root, AggSplit aggsplit,
+                           AggTransInfo *transinfo, AggClauseCosts *costs)
+{
+    /*
+     * Add the appropriate component function execution costs to appropriate
+     * totals.
+     */
+    if (DO_AGGSPLIT_COMBINE(aggsplit))
+    {
+        /* charge for combining previously aggregated states */
+        add_function_cost(root, transinfo->combinefn_oid, NULL,
+                          &costs->transCost);
+    }
+    else
+        add_function_cost(root, transinfo->transfn_oid, NULL,
+                          &costs->transCost);
+    if (DO_AGGSPLIT_DESERIALIZE(aggsplit) &&
+        OidIsValid(transinfo->deserialfn_oid))
+        add_function_cost(root, transinfo->deserialfn_oid, NULL,
+                          &costs->transCost);
+    if (DO_AGGSPLIT_SERIALIZE(aggsplit) &&
+        OidIsValid(transinfo->serialfn_oid))
+        add_function_cost(root, transinfo->serialfn_oid, NULL,
+                          &costs->finalCost);
+
+    /*
+     * These costs are incurred only by the initial aggregate node, so we
+     * mustn't include them again at upper levels.
+     */
+    if (!DO_AGGSPLIT_COMBINE(aggsplit))
+    {
+        /* add the input expressions' cost to per-input-row costs */
+        QualCost    argcosts;
+
+        cost_qual_eval_node(&argcosts, (Node *) transinfo->args, root);
+        costs->transCost.startup += argcosts.startup;
+        costs->transCost.per_tuple += argcosts.per_tuple;
 
         /*
-         * If the transition type is pass-by-value then it doesn't add
-         * anything to the required size of the hashtable.  If it is
-         * pass-by-reference then we have to add the estimated size of the
-         * value itself, plus palloc overhead.
+         * Add any filter's cost to per-input-row costs.
+         *
+         * XXX Ideally we should reduce input expression costs according to
+         * filter selectivity, but it's not clear it's worth the trouble.
          */
-        if (!transinfo->transtypeByVal)
+        if (transinfo->aggfilter)
         {
-            int32        avgwidth;
+            cost_qual_eval_node(&argcosts, (Node *) transinfo->aggfilter,
+                                root);
+            costs->transCost.startup += argcosts.startup;
+            costs->transCost.per_tuple += argcosts.per_tuple;
+        }
+    }
 
-            /* Use average width if aggregate definition gave one */
-            if (transinfo->aggtransspace > 0)
-                avgwidth = transinfo->aggtransspace;
-            else if (transinfo->transfn_oid == F_ARRAY_APPEND)
-            {
-                /*
-                 * If the transition function is array_append(), it'll use an
-                 * expanded array as transvalue, which will occupy at least
-                 * ALLOCSET_SMALL_INITSIZE and possibly more.  Use that as the
-                 * estimate for lack of a better idea.
-                 */
-                avgwidth = ALLOCSET_SMALL_INITSIZE;
-            }
-            else
-            {
-                avgwidth = get_typavgwidth(transinfo->aggtranstype, transinfo->aggtranstypmod);
-            }
+    /*
+     * If the transition type is pass-by-value then it doesn't add anything to
+     * the required size of the hashtable.  If it is pass-by-reference then we
+     * have to add the estimated size of the value itself, plus palloc
+     * overhead.
+     */
+    if (!transinfo->transtypeByVal)
+    {
+        int32        avgwidth;
 
-            avgwidth = MAXALIGN(avgwidth);
-            costs->transitionSpace += avgwidth + 2 * sizeof(void *);
-        }
-        else if (transinfo->aggtranstype == INTERNALOID)
+        /* Use average width if aggregate definition gave one */
+        if (transinfo->aggtransspace > 0)
+            avgwidth = transinfo->aggtransspace;
+        else if (transinfo->transfn_oid == F_ARRAY_APPEND)
         {
             /*
-             * INTERNAL transition type is a special case: although INTERNAL
-             * is pass-by-value, it's almost certainly being used as a pointer
-             * to some large data structure.  The aggregate definition can
-             * provide an estimate of the size.  If it doesn't, then we assume
-             * ALLOCSET_DEFAULT_INITSIZE, which is a good guess if the data is
-             * being kept in a private memory context, as is done by
-             * array_agg() for instance.
+             * If the transition function is array_append(), it'll use an
+             * expanded array as transvalue, which will occupy at least
+             * ALLOCSET_SMALL_INITSIZE and possibly more.  Use that as the
+             * estimate for lack of a better idea.
              */
-            if (transinfo->aggtransspace > 0)
-                costs->transitionSpace += transinfo->aggtransspace;
-            else
-                costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
+            avgwidth = ALLOCSET_SMALL_INITSIZE;
+        }
+        else
+        {
+            avgwidth = get_typavgwidth(transinfo->aggtranstype, transinfo->aggtranstypmod);
         }
-    }
 
-    foreach(lc, root->agginfos)
+        avgwidth = MAXALIGN(avgwidth);
+        costs->transitionSpace += avgwidth + 2 * sizeof(void *);
+    }
+    else if (transinfo->aggtranstype == INTERNALOID)
     {
-        AggInfo    *agginfo = lfirst_node(AggInfo, lc);
-        Aggref       *aggref = linitial_node(Aggref, agginfo->aggrefs);
-
         /*
-         * Add the appropriate component function execution costs to
-         * appropriate totals.
+         * INTERNAL transition type is a special case: although INTERNAL is
+         * pass-by-value, it's almost certainly being used as a pointer to
+         * some large data structure.  The aggregate definition can provide an
+         * estimate of the size.  If it doesn't, then we assume
+         * ALLOCSET_DEFAULT_INITSIZE, which is a good guess if the data is
+         * being kept in a private memory context, as is done by array_agg()
+         * for instance.
          */
-        if (!DO_AGGSPLIT_SKIPFINAL(aggsplit) &&
-            OidIsValid(agginfo->finalfn_oid))
-            add_function_cost(root, agginfo->finalfn_oid, NULL,
-                              &costs->finalCost);
+        if (transinfo->aggtransspace > 0)
+            costs->transitionSpace += transinfo->aggtransspace;
+        else
+            costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
+    }
+}
 
-        /*
-         * If there are direct arguments, treat their evaluation cost like the
-         * cost of the finalfn.
-         */
-        if (aggref->aggdirectargs)
-        {
-            QualCost    argcosts;
+/*
+ * Sub-routine of get_agg_clause_costs(), to process a single AggInfo.
+ */
+static void
+get_agg_clause_costs_agginfo(PlannerInfo *root, AggSplit aggsplit,
+                             AggInfo *agginfo, AggClauseCosts *costs)
+{
+    Aggref       *aggref = linitial_node(Aggref, agginfo->aggrefs);
 
-            cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
-                                root);
-            costs->finalCost.startup += argcosts.startup;
-            costs->finalCost.per_tuple += argcosts.per_tuple;
-        }
+    /*
+     * Add the appropriate component function execution costs to appropriate
+     * totals.
+     */
+    if (!DO_AGGSPLIT_SKIPFINAL(aggsplit) &&
+        OidIsValid(agginfo->finalfn_oid))
+        add_function_cost(root, agginfo->finalfn_oid, NULL,
+                          &costs->finalCost);
+
+    /*
+     * If there are direct arguments, treat their evaluation cost like the
+     * cost of the finalfn.
+     */
+    if (aggref->aggdirectargs)
+    {
+        QualCost    argcosts;
+
+        cost_qual_eval_node(&argcosts, (Node *) aggref->aggdirectargs,
+                            root);
+        costs->finalCost.startup += argcosts.startup;
+        costs->finalCost.per_tuple += argcosts.per_tuple;
     }
 }
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index cfb314e11d..023d0417b3 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1009,6 +1009,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
     subroot->processed_tlist = NIL;
+    root->max_sortgroupref = 0;
     subroot->update_colnos = NIL;
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 4478036bb6..ac04142768 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2670,8 +2670,7 @@ create_projection_path(PlannerInfo *root,
     pathnode->path.pathtype = T_Result;
     pathnode->path.parent = rel;
     pathnode->path.pathtarget = target;
-    /* For now, assume we are above any joins, so no parameterization */
-    pathnode->path.param_info = NULL;
+    pathnode->path.param_info = subpath->param_info;
     pathnode->path.parallel_aware = false;
     pathnode->path.parallel_safe = rel->consider_parallel &&
         subpath->parallel_safe &&
@@ -3163,6 +3162,129 @@ create_agg_path(PlannerInfo *root,
     return pathnode;
 }
 
+/*
+ * create_agg_sorted_path
+ *        Creates a pathnode performing sorted aggregation/grouping
+ *
+ * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
+ *
+ * NULL is returned if sorting of subpath output is not suitable.
+ */
+AggPath *
+create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
+                       RelAggInfo *agg_info)
+{
+    List       *agg_exprs;
+    AggSplit    aggsplit;
+    AggClauseCosts agg_costs;
+    PathTarget *target;
+    double        dNumGroups;
+    AggPath    *result = NULL;
+
+    aggsplit = AGGSPLIT_INITIAL_SERIAL;
+    agg_exprs = agg_info->agg_exprs;
+    target = agg_info->target;
+
+    /* group_pathkeys are necessary to evaluate the sorting. */
+    if (agg_info->group_pathkeys == NIL)
+        return NULL;
+
+    /*
+     * The input path must be sorted in a specific way, but if it's not sorted
+     * at all, it's not useful for AGG_SORTED.
+     */
+    if (subpath->pathkeys == NIL)
+        return NULL;
+
+    /* Are the grouping clauses suitable for sorted aggregation? */
+    if (!grouping_is_sortable(agg_info->group_clauses))
+        return NULL;
+
+    /*
+     * Is the input path sorted enough for this grouping? TODO Consider using
+     * incremental sort if the sorting is "almost sufficient".
+     */
+    if (!pathkeys_contained_in(agg_info->group_pathkeys, subpath->pathkeys))
+        return NULL;
+
+    MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+    get_agg_clause_costs_some(root, aggsplit, agg_exprs, &agg_costs);
+
+    Assert(agg_info->group_exprs != NIL);
+    dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
+                                     subpath->rows, NULL, NULL);
+
+    /*
+     * qual is NIL because the HAVING clause cannot be evaluated until the
+     * final value of the aggregate is known.
+     */
+    result = create_agg_path(root, rel, subpath, target,
+                             AGG_SORTED, aggsplit,
+                             agg_info->group_clauses,
+                             NIL,    /* qual for HAVING clause */
+                             &agg_costs,
+                             dNumGroups);
+
+    /* The agg path should require no fewer parameters than the plain one. */
+    result->path.param_info = subpath->param_info;
+
+    return result;
+}
+
+/*
+ * Apply AGG_HASHED aggregation to subpath.
+ */
+AggPath *
+create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
+                       Path *subpath, RelAggInfo *agg_info)
+{
+    bool        can_hash;
+    List       *agg_exprs;
+    AggSplit    aggsplit;
+    AggClauseCosts agg_costs;
+    PathTarget *target;
+    double        dNumGroups;
+    Query       *parse = root->parse;
+    AggPath    *result = NULL;
+
+    /* Do not try to create hash table for each parameter value. */
+    Assert(subpath->param_info == NULL);
+
+    aggsplit = AGGSPLIT_INITIAL_SERIAL;
+    agg_exprs = agg_info->agg_exprs;
+    target = agg_info->target;
+
+    MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+    get_agg_clause_costs_some(root, aggsplit, agg_exprs, &agg_costs);
+
+    can_hash = (parse->groupClause != NIL &&
+                parse->groupingSets == NIL &&
+                root->numOrderedAggs == 0 &&
+                grouping_is_hashable(parse->groupClause));
+
+    if (can_hash)
+    {
+        Assert(agg_info->group_exprs != NIL);
+        dNumGroups = estimate_num_groups(root, agg_info->group_exprs,
+                                         subpath->rows, NULL, NULL);
+
+        /*
+         * qual is NIL because the HAVING clause cannot be evaluated until the
+         * final value of the aggregate is known.
+         */
+        result = create_agg_path(root, rel, subpath,
+                                 target,
+                                 AGG_HASHED,
+                                 aggsplit,
+                                 agg_info->group_clauses,
+                                 NIL, /* qual for HAVING clause */
+                                 &agg_costs,
+                                 dNumGroups);
+    }
+
+    return result;
+}
+
 /*
  * create_groupingsets_path
  *      Creates a pathnode that represents performing GROUPING SETS aggregation
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f8ccd1a5db..67f5bbe59f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,18 +18,24 @@
 
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
+#include "catalog/pg_class_d.h"
+#include "catalog/pg_constraint.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/inherit.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "utils/hsearch.h"
+#include "utils/selfuncs.h"
 #include "utils/lsyscache.h"
 
 
@@ -77,6 +83,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
                                        RelOptInfo *childrel,
                                        int nappinfos,
                                        AppendRelInfo **appinfos);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+                                  PathTarget *target, PathTarget *agg_input,
+                                  List *gvis, List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
 
 
 /*
@@ -375,6 +386,110 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
     return rel;
 }
 
+/*
+ * build_simple_grouped_rel
+ *      Construct a new RelOptInfo for a grouped base relation out of an
+ *      existing non-grouped relation. On success, pointer to the corresponding
+ *      RelAggInfo is stored in *agg_info_p in addition to returning the grouped
+ *      relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+                         RelAggInfo **agg_info_p)
+{
+    RangeTblEntry *rte;
+    RelOptInfo *rel_plain,
+               *rel_grouped;
+    RelAggInfo *agg_info;
+
+    /* Isn't there any grouping expression to be pushed down? */
+    if (root->grouped_var_list == NIL)
+        return NULL;
+
+    rel_plain = root->simple_rel_array[relid];
+
+    /* Caller should only pass rti that represents base relation. */
+    Assert(rel_plain != NULL);
+
+    /*
+     * Not all RTE kinds are supported when grouping is considered.
+     *
+     * TODO Consider relaxing some of these restrictions.
+     */
+    rte = root->simple_rte_array[rel_plain->relid];
+    if (rte->rtekind != RTE_RELATION ||
+        rte->relkind == RELKIND_FOREIGN_TABLE ||
+        rte->tablesample != NULL)
+        return NULL;
+
+    /*
+     * Grouped append relation is not supported yet.
+     */
+    if (rte->inh)
+        return NULL;
+
+    /*
+     * Currently we do not support child relations ("other rels").
+     */
+    if (rel_plain->reloptkind != RELOPT_BASEREL)
+        return NULL;
+
+    /*
+     * Prepare the information we need for aggregation of the rel contents.
+     */
+    agg_info = create_rel_agg_info(root, rel_plain);
+    if (agg_info == NULL)
+        return NULL;
+
+    /*
+     * TODO Consider if 1) a flat copy is o.k., 2) it's safer in terms of
+     * adding new fields to RelOptInfo) to copy everything and then reset some
+     * fields, or to zero the structure and copy individual fields.
+     */
+    rel_grouped = makeNode(RelOptInfo);
+    memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+    /*
+     * Note on consider_startup: while the AGG_HASHED strategy needs the whole
+     * relation, AGG_SORTED does not. Therefore we do not force
+     * consider_startup to false.
+     */
+
+    /*
+     * Set the appropriate target for grouped paths.
+     *
+     * reltarget should match the target of partially aggregated paths.
+     */
+    rel_grouped->reltarget = agg_info->target;
+
+    /*
+     * Grouped paths must not be mixed with the plain ones.
+     */
+    rel_grouped->pathlist = NIL;
+    rel_grouped->partial_pathlist = NIL;
+    rel_grouped->cheapest_startup_path = NULL;
+    rel_grouped->cheapest_total_path = NULL;
+    rel_grouped->cheapest_unique_path = NULL;
+    rel_grouped->cheapest_parameterized_paths = NIL;
+
+    /*
+     * The number of aggregation input rows is simply the number of rows of
+     * the non-grouped relation, which should have been estimated by now.
+     */
+    agg_info->input_rows = rel_plain->rows;
+
+    /*
+     * The number of output rows is supposedly different (lower) due to
+     * grouping.
+     */
+    rel_grouped->rows = estimate_num_groups(root, agg_info->group_exprs,
+                                            agg_info->input_rows, NULL,
+                                            NULL);
+
+    *agg_info_p = agg_info;
+    return rel_grouped;
+}
+
 /*
  * find_base_rel
  *      Find a base or other relation entry, which must already exist.
@@ -423,16 +538,20 @@ build_rel_hash(RelInfoList *list)
     /* Insert all the already-existing joinrels */
     foreach(l, list->items)
     {
-        RelOptInfo       *rel = lfirst_node(RelOptInfo, l);
+        void       *item = lfirst(l);
         RelInfoEntry *hentry;
         bool        found;
+        Relids        relids;
+
+        Assert(IsA(item, RelOptInfo));
+        relids = ((RelOptInfo *) item)->relids;
 
         hentry = (RelInfoEntry *) hash_search(hashtab,
-                                              &rel->relids,
+                                              &relids,
                                               HASH_ENTER,
                                               &found);
         Assert(!found);
-        hentry->data = rel;
+        hentry->data = item;
     }
 
     list->hash = hashtab;
@@ -481,9 +600,17 @@ find_rel_info(RelInfoList *list, Relids relids)
 
         foreach(l, list->items)
         {
-            RelOptInfo   *item = lfirst_node(RelOptInfo, l);
+            void       *item = lfirst(l);
+            Relids        item_relids = NULL;
 
-            if (bms_equal(item->relids, relids))
+            Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+            if (IsA(item, RelOptInfo))
+                item_relids = ((RelOptInfo *) item)->relids;
+            else if (IsA(item, RelAggInfo))
+                item_relids = ((RelAggInfo *) item)->relids;
+
+            if (bms_equal(item_relids, relids))
                 return item;
         }
     }
@@ -508,23 +635,31 @@ find_join_rel(PlannerInfo *root, Relids relids)
  *        hashtable if there is one.
  */
 static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
 {
+    Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
     /* GEQO requires us to append the new joinrel to the end of the list! */
-    list->items = lappend(list->items, rel);
+    list->items = lappend(list->items, data);
 
     /* store it into the auxiliary hashtable if there is one. */
     if (list->hash)
     {
+        Relids        relids;
         RelInfoEntry *hentry;
         bool        found;
 
+        if (IsA(data, RelOptInfo))
+            relids = ((RelOptInfo *) data)->relids;
+        else if (IsA(data, RelAggInfo))
+            relids = ((RelAggInfo *) data)->relids;
+
         hentry = (RelInfoEntry *) hash_search(list->hash,
-                                              &rel->relids,
+                                              &relids,
                                               HASH_ENTER,
                                               &found);
         Assert(!found);
-        hentry->data = rel;
+        hentry->data = data;
     }
 }
 
@@ -539,6 +674,63 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
     add_rel_info(root->join_rel_list, joinrel);
 }
 
+/*
+ * add_grouped_rel
+ *        Add grouped base or join relation to the list of grouped relations in
+ *        the given PlannerInfo. Also add the corresponding RelAggInfo to
+ *        agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+    add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+    add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ *      Returns grouped relation entry (base or join relation) corresponding to
+ *      'relids' or NULL if none exists.
+ *
+ * If agg_info_p is a valid pointer, then pointer to RelAggInfo that
+ * corresponds to the relation returned is assigned to *agg_info_p.
+ *
+ * The call fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG, ...) should
+ * return the same relation if it exists, however the behavior is different if
+ * the relation is not there. find_grouped_rel() should be used in
+ * query_planner() and subroutines.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+    RelOptInfo *rel;
+
+    rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+                                       relids);
+    if (rel == NULL)
+    {
+        if (agg_info_p)
+            *agg_info_p = NULL;
+
+        return NULL;
+    }
+
+    /* Is caller interested in RelAggInfo? */
+    if (agg_info_p)
+    {
+        RelAggInfo *agg_info;
+
+        agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+        /* The relation exists, so the agg_info should be there too. */
+        Assert(agg_info != NULL);
+
+        *agg_info_p = agg_info;
+    }
+
+    return rel;
+}
+
 /*
  * set_foreign_rel_properties
  *        Set up foreign-join fields if outer and inner relation are foreign
@@ -601,6 +793,7 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
  * 'restrictlist_ptr': result variable.  If not NULL, *restrictlist_ptr
  *        receives the list of RestrictInfo nodes that apply to this
  *        particular pair of joinable relations.
+ * 'agg_info' indicates that grouped join relation should be created.
  *
  * restrictlist_ptr makes the routine's API a little grotty, but it saves
  * duplicated calculation of the restrictlist...
@@ -611,10 +804,12 @@ build_join_rel(PlannerInfo *root,
                RelOptInfo *outer_rel,
                RelOptInfo *inner_rel,
                SpecialJoinInfo *sjinfo,
-               List **restrictlist_ptr)
+               List **restrictlist_ptr,
+               RelAggInfo *agg_info)
 {
     RelOptInfo *joinrel;
     List       *restrictlist;
+    bool        grouped = agg_info != NULL;
 
     /* This function should be used only for join between parents. */
     Assert(!IS_OTHER_REL(outer_rel) && !IS_OTHER_REL(inner_rel));
@@ -622,7 +817,8 @@ build_join_rel(PlannerInfo *root,
     /*
      * See if we already have a joinrel for this set of base rels.
      */
-    joinrel = find_join_rel(root, joinrelids);
+    joinrel = !grouped ? find_join_rel(root, joinrelids) :
+        find_grouped_rel(root, joinrelids, NULL);
 
     if (joinrel)
     {
@@ -721,9 +917,21 @@ build_join_rel(PlannerInfo *root,
      * and inner rels we first try to build it from.  But the contents should
      * be the same regardless.
      */
-    build_joinrel_tlist(root, joinrel, outer_rel);
-    build_joinrel_tlist(root, joinrel, inner_rel);
-    add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
+    if (!grouped)
+    {
+        joinrel->reltarget = create_empty_pathtarget();
+        build_joinrel_tlist(root, joinrel, outer_rel);
+        build_joinrel_tlist(root, joinrel, inner_rel);
+        add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
+    }
+    else
+    {
+        /*
+         * The target for grouped join should already have its cost and width
+         * computed, see create_rel_agg_info().
+         */
+        joinrel->reltarget = agg_info->target;
+    }
 
     /*
      * add_placeholders_to_joinrel also took care of adding the ph_lateral
@@ -755,49 +963,75 @@ build_join_rel(PlannerInfo *root,
     joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
     /* Store the partition information. */
-    build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
-                                 sjinfo->jointype);
+    if (!grouped)
+        build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
+                                     restrictlist, sjinfo->jointype);
 
-    /*
-     * Set estimates of the joinrel's size.
-     */
-    set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
-                               sjinfo, restrictlist);
+    if (!grouped)
+    {
+        /*
+         * Set estimates of the joinrel's size.
+         */
+        set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
+                                   sjinfo, restrictlist);
 
-    /*
-     * Set the consider_parallel flag if this joinrel could potentially be
-     * scanned within a parallel worker.  If this flag is false for either
-     * inner_rel or outer_rel, then it must be false for the joinrel also.
-     * Even if both are true, there might be parallel-restricted expressions
-     * in the targetlist or quals.
-     *
-     * Note that if there are more than two rels in this relation, they could
-     * be divided between inner_rel and outer_rel in any arbitrary way.  We
-     * assume this doesn't matter, because we should hit all the same baserels
-     * and joinclauses while building up to this joinrel no matter which we
-     * take; therefore, we should make the same decision here however we get
-     * here.
-     */
-    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
-        is_parallel_safe(root, (Node *) restrictlist) &&
-        is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
-        joinrel->consider_parallel = true;
+        /*
+         * Set the consider_parallel flag if this joinrel could potentially be
+         * scanned within a parallel worker.  If this flag is false for either
+         * inner_rel or outer_rel, then it must be false for the joinrel also.
+         * Even if both are true, there might be parallel-restricted
+         * expressions in the targetlist or quals.
+         *
+         * Note that if there are more than two rels in this relation, they
+         * could be divided between inner_rel and outer_rel in any arbitrary
+         * way.  We assume this doesn't matter, because we should hit all the
+         * same baserels and joinclauses while building up to this joinrel no
+         * matter which we take; therefore, we should make the same decision
+         * here however we get here.
+         */
+        if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
+            is_parallel_safe(root, (Node *) restrictlist) &&
+            is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
+            joinrel->consider_parallel = true;
+    }
+    else
+    {
+        /*
+         * Grouping essentially changes the number of rows.
+         *
+         * XXX We do not distinguish whether two plain rels are joined and the
+         * result is aggregated, or the aggregation has been already applied
+         * to one of the input rels. Is this worth extra effort, e.g.
+         * maintaining a separate RelOptInfo for each case (one difficulty
+         * that would introduce is construction of AppendPath)?
+         */
+        joinrel->rows = estimate_num_groups(root, agg_info->group_exprs,
+                                            agg_info->input_rows, NULL, NULL);
+    }
 
     /* Add the joinrel to the PlannerInfo. */
-    add_join_rel(root, joinrel);
+    if (!grouped)
+        add_join_rel(root, joinrel);
+    else
+        add_grouped_rel(root, joinrel, agg_info);
 
     /*
-     * Also, if dynamic-programming join search is active, add the new joinrel
-     * to the appropriate sublist.  Note: you might think the Assert on number
-     * of members should be for equality, but some of the level 1 rels might
-     * have been joinrels already, so we can only assert <=.
+     * Also, if dynamic-programming join search is active, add the new
+     * joinrelset to the appropriate sublist.  Note: you might think the
+     * Assert on number of members should be for equality, but some of the
+     * level 1 rels might have been joinrels already, so we can only assert
+     * <=.
+     *
+     * Do noting for grouped relation as it's stored aside from
+     * join_rel_level.
      */
-    if (root->join_rel_level)
+    if (root->join_rel_level && !grouped)
     {
         Assert(root->join_cur_level > 0);
-        Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+        Assert(root->join_cur_level <= bms_num_members(joinrelids));
         root->join_rel_level[root->join_cur_level] =
-            lappend(root->join_rel_level[root->join_cur_level], joinrel);
+            lappend(root->join_rel_level[root->join_cur_level],
+                    joinrel);
     }
 
     return joinrel;
@@ -2085,3 +2319,673 @@ build_child_join_reltarget(PlannerInfo *root,
     childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
     childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * Check if the relation can produce grouped paths and return the information
+ * it'll need for it. The passed relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+    List       *gvis;
+    List       *aggregates = NIL;
+    bool        found_other_rel_agg;
+    ListCell   *lc;
+    RelAggInfo *result;
+    PathTarget *agg_input;
+    PathTarget *target = NULL;
+    List       *grp_exprs_extra = NIL;
+    List       *group_clauses_final;
+    int            i;
+    bool        pk_found, pk_missing;
+
+    /*
+     * The function shouldn't have been called if there's no opportunity for
+     * aggregate push-down.
+     */
+    Assert(root->grouped_var_list != NIL);
+
+    /*
+     * The current implementation of aggregate push-down cannot handle
+     * PlaceHolderVar (PHV).
+     *
+     * If we knew that the PHV should be evaluated in this target (and of
+     * course, if its expression matched some Aggref argument), we'd just let
+     * init_grouping_targets add that Aggref. On the other hand, if we knew
+     * that the PHV is evaluated below the current rel, we could ignore it
+     * because the referencing Aggref would take care of propagation of the
+     * value to upper joins.
+     *
+     * The problem is that the same PHV can be evaluated in the target of the
+     * current rel or in that of lower rel --- depending on the input paths.
+     * For example, consider rel->relids = {A, B, C} and if ph_eval_at = {B,
+     * C}. Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+     * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+     */
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Expr       *expr = lfirst(lc);
+
+        if (IsA(expr, PlaceHolderVar))
+            return NULL;
+    }
+
+    if (IS_SIMPLE_REL(rel))
+    {
+        RangeTblEntry *rte = root->simple_rte_array[rel->relid];;
+
+        /*
+         * rtekind != RTE_RELATION case is not supported yet.
+         */
+        if (rte->rtekind != RTE_RELATION)
+            return NULL;
+    }
+
+    /* Caller should only pass base relations or joins. */
+    Assert(rel->reloptkind == RELOPT_BASEREL ||
+           rel->reloptkind == RELOPT_JOINREL);
+
+    /*
+     * If any outer join can set the attribute value to NULL, the Agg plan
+     * would receive different input at the base rel level.
+     *
+     * XXX For RELOPT_JOINREL, do not return if all the joins that can set any
+     * entry of the grouped target (do we need to postpone this check until
+     * the grouped target is available, and init_grouping_targets take care?)
+     * of this rel to NULL are provably below rel. (It's ok if rel is one of
+     * these joins.)
+     */
+    if (bms_overlap(rel->relids, root->nullable_baserels))
+        return NULL;
+
+    /*
+     * Use equivalence classes to generate additional grouping expressions for
+     * the current rel. Without these we might not be able to apply
+     * aggregation to the relation result set.
+     *
+     * It's important that create_grouping_expr_grouped_var_infos has
+     * processed the explicit grouping columns by now. If the grouping clause
+     * contains multiple expressions belonging to the same EC, the original
+     * (i.e. not derived) one should be preferred when we build grouping
+     * target for a relation. Otherwise we have a problem when trying to match
+     * target entries to grouping clauses during plan creation, see
+     * get_grouping_expression().
+     */
+    gvis = list_copy(root->grouped_var_list);
+    foreach(lc, root->grouped_var_list)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+        int            relid = -1;
+
+        /* Only interested in grouping expressions. */
+        if (IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+        {
+            GroupedVarInfo *gvi_trans;
+
+            gvi_trans = translate_expression_to_rel(root, gvi, relid);
+            if (gvi_trans != NULL)
+                gvis = lappend(gvis, gvi_trans);
+        }
+    }
+
+    /*
+     * Check if some aggregates or grouping expressions can be evaluated in
+     * this relation's target, and collect all vars referenced by these
+     * aggregates / grouping expressions;
+     */
+    found_other_rel_agg = false;
+    foreach(lc, gvis)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+
+        /*
+         * The subset includes gv_eval_at uninitialized, which includes
+         * Aggref.aggstar.
+         */
+        if (bms_is_subset(gvi->gv_eval_at, rel->relids))
+        {
+            /*
+             * init_grouping_targets will handle plain Var grouping
+             * expressions because it needs to look them up in
+             * grouped_var_list anyway.
+             */
+            if (IsA(gvi->gvexpr, Var))
+                continue;
+
+            /*
+             * Currently, GroupedVarInfo only handles Vars and Aggrefs.
+             */
+            Assert(IsA(gvi->gvexpr, Aggref));
+
+            gvi->agg_partial = (Aggref *) copyObject(gvi->gvexpr);
+            mark_partial_aggref(gvi->agg_partial, AGGSPLIT_INITIAL_SERIAL);
+
+            /*
+             * Accept the aggregate.
+             */
+            aggregates = lappend(aggregates, gvi);
+        }
+        else if (IsA(gvi->gvexpr, Aggref))
+        {
+            /*
+             * Remember that there is at least one aggregate expression that
+             * needs something else than this rel.
+             */
+            found_other_rel_agg = true;
+
+            /*
+             * This condition effectively terminates creation of the
+             * RelAggInfo, so there's no reason to check the next
+             * GroupedVarInfo.
+             */
+            break;
+        }
+    }
+
+    /*
+     * Grouping makes little sense w/o aggregate function and w/o grouping
+     * expressions.
+     */
+    if (aggregates == NIL)
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    /*
+     * Give up if some other aggregate(s) need relations other than the
+     * current one.
+     *
+     * If the aggregate needs the current rel plus anything else, then the
+     * problem is that grouping of the current relation could make some input
+     * variables unavailable for the "higher aggregate", and it'd also
+     * decrease the number of input rows the "higher aggregate" receives.
+     *
+     * If the aggregate does not even need the current rel, then neither the
+     * current rel nor anything else should be grouped because we do not
+     * support join of two grouped relations.
+     */
+    if (found_other_rel_agg)
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    /*
+     * Create target for grouped paths as well as one for the input paths of
+     * the aggregation paths.
+     */
+    target = create_empty_pathtarget();
+    agg_input = create_empty_pathtarget();
+
+    /*
+     * Cannot suitable targets for the aggregation push-down be derived?
+     */
+    if (!init_grouping_targets(root, rel, target, agg_input, gvis,
+                               &grp_exprs_extra))
+    {
+        list_free(gvis);
+        return NULL;
+    }
+
+    list_free(gvis);
+
+    /*
+     * Aggregation push-down makes no sense w/o grouping expressions.
+     */
+    if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+        return NULL;
+
+    group_clauses_final = root->parse->groupClause;
+
+    /*
+     * If the aggregation target should have extra grouping expressions (in
+     * order to emit input vars for join conditions), add them now. This step
+     * includes assignment of tleSortGroupRef's which we can generate now.
+     */
+    if (list_length(grp_exprs_extra) > 0)
+    {
+        Index        sortgroupref;
+
+        /*
+         * We'll have to add some clauses, but query group clause must be
+         * preserved.
+         */
+        group_clauses_final = list_copy(group_clauses_final);
+
+        /*
+         * Always start at root->max_sortgroupref. The extra grouping
+         * expressions aren't used during the final aggregation, so the
+         * sortgroupref values don't need to be unique across the query. Thus
+         * we don't have to increase root->max_sortgroupref, which makes
+         * recognition of the extra grouping expressions pretty easy.
+         */
+        sortgroupref = root->max_sortgroupref;
+
+        /*
+         * Generate the SortGroupClause's and add the expressions to the
+         * target.
+         */
+        foreach(lc, grp_exprs_extra)
+        {
+            Var           *var = lfirst_node(Var, lc);
+            SortGroupClause *cl = makeNode(SortGroupClause);
+
+            /*
+             * Initialize the SortGroupClause.
+             *
+             * As the final aggregation will not use this grouping expression,
+             * we don't care whether sortop is < or >. The value of
+             * nulls_first should not matter for the same reason.
+             */
+            cl->tleSortGroupRef = ++sortgroupref;
+            get_sort_group_operators(var->vartype,
+                                     false, true, false,
+                                     &cl->sortop, &cl->eqop, NULL,
+                                     &cl->hashable);
+            group_clauses_final = lappend(group_clauses_final, cl);
+            add_column_to_pathtarget(target, (Expr *) var,
+                                     cl->tleSortGroupRef);
+
+            /*
+             * The aggregation input target must emit this var too.
+             */
+            add_column_to_pathtarget(agg_input, (Expr *) var,
+                                     cl->tleSortGroupRef);
+        }
+    }
+
+    /*
+     * Add aggregates to the grouping target.
+     */
+    foreach(lc, aggregates)
+    {
+        GroupedVarInfo *gvi;
+
+        gvi = lfirst_node(GroupedVarInfo, lc);
+        add_column_to_pathtarget(target, (Expr *) gvi->agg_partial,
+                                 gvi->sortgroupref);
+    }
+
+    /*
+     * Build a list of grouping expressions and a list of the corresponding
+     * SortGroupClauses.
+     */
+    i = 0;
+    result = makeNode(RelAggInfo);
+    pk_missing = false;
+    foreach(lc, target->exprs)
+    {
+        Index        sortgroupref = 0;
+        SortGroupClause *cl;
+        Expr       *texpr;
+        ListCell    *lc2;
+
+        texpr = (Expr *) lfirst(lc);
+
+        if (IsA(texpr, Aggref))
+        {
+            /*
+             * Once we see Aggref, no grouping expressions should follow.
+             */
+            break;
+        }
+
+        /*
+         * Find the clause by sortgroupref.
+         */
+        sortgroupref = target->sortgrouprefs[i++];
+
+        /*
+         * Besides being an aggregate, the target expression should have no
+         * other reason to be there than being a column of a relation
+         * functionally dependent on the GROUP BY clause. So it's not actually
+         * a grouping column.
+         */
+        if (sortgroupref == 0)
+            continue;
+
+        /*
+         * group_clause_final contains the "local" clauses, so this search
+         * should succeed.
+         */
+        cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+        result->group_clauses = list_append_unique(result->group_clauses,
+                                                   cl);
+
+        /*
+         * Add only unique clauses because of joins (both sides of a join can
+         * point at the same grouping clause). XXX Is it worth adding a bool
+         * argument indicating that we're dealing with join right now?
+         */
+        result->group_exprs = list_append_unique(result->group_exprs,
+                                                 texpr);
+
+        /*
+         * Try to find PathKey for the expression, but don't if we already saw
+         * an expression w/o the PathKey.
+         */
+        if (pk_missing)
+            continue;
+
+        pk_found = false;
+        foreach(lc2, root->group_pathkeys)
+        {
+            PathKey        *pkey = lfirst_node(PathKey, lc2);
+            EquivalenceClass *ec = pkey->pk_eclass;
+            ListCell    *lc3;
+
+            foreach(lc3, ec->ec_members)
+            {
+                EquivalenceMember    *em = lfirst_node(EquivalenceMember, lc3);
+
+                if (equal(texpr, em->em_expr))
+                {
+                    result->group_pathkeys = lappend(result->group_pathkeys,
+                                                     pkey);
+                    pk_found = true;
+                    break;
+                }
+            }
+            if (pk_found)
+                break;
+        }
+
+        /*
+         * If no PathKey was found, the expression was probably generated out
+         * of grp_exprs_extra. If we don't have a single PathKey,
+         * group_pathkeys is not useful, so clear it.
+         */
+        if (!pk_found)
+        {
+            list_free(result->group_pathkeys);
+            result->group_pathkeys = NIL;
+            /*
+             * Do not spend cycles looking for the PathKey for other
+             * expressions.
+             */
+            pk_missing = true;
+        }
+    }
+
+    /*
+     * Since neither target nor agg_input is supposed to be identical to the
+     * source reltarget, compute the width and cost again.
+     *
+     * target does not yet contain aggregates, but these will be accounted by
+     * AggPath.
+     */
+    set_pathtarget_cost_width(root, target);
+    set_pathtarget_cost_width(root, agg_input);
+
+    result->relids = bms_copy(rel->relids);
+    result->target = target;
+    result->agg_input = agg_input;
+
+    /* Finally collect the aggregates. */
+    while (lc != NULL)
+    {
+        Aggref       *aggref = lfirst_node(Aggref, lc);
+
+        /*
+         * Partial aggregation is what the grouped paths should do.
+         */
+        result->agg_exprs = lappend(result->agg_exprs, aggref);
+        lc = lnext(target->exprs, lc);
+    }
+
+    /* The "input_rows" field should be set by caller. */
+    return result;
+}
+
+/*
+ * Initialize target for grouped paths (target) as well as a target for paths
+ * that generate input for aggregation (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * gvis a list of GroupedVarInfo's possibly useful for rel.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+                      PathTarget *target, PathTarget *agg_input,
+                      List *gvis, List **group_exprs_extra_p)
+{
+    ListCell   *lc;
+    List       *possibly_dependent = NIL;
+    Var           *tvar;
+
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Index        sortgroupref;
+
+        /*
+         * Given that PlaceHolderVar currently prevents us from doing
+         * aggregation push-down, the source target cannot contain anything
+         * more complex than a Var.
+         */
+        tvar = lfirst_node(Var, lc);
+
+        sortgroupref = get_expression_sortgroupref((Expr *) tvar, gvis);
+        if (sortgroupref > 0)
+        {
+            /*
+             * If the target expression can be used as the grouping key, we
+             * don't have to worry whether it can be emitted by the AggPath
+             * pushed down to relation / join.
+             */
+            add_column_to_pathtarget(target, (Expr *) tvar, sortgroupref);
+
+            /*
+             * As for agg_input, add the original expression but set
+             * sortgroupref in addition.
+             */
+            add_column_to_pathtarget(agg_input, (Expr *) tvar, sortgroupref);
+        }
+        else
+        {
+            if (is_var_needed_by_join(root, tvar, rel))
+            {
+                /*
+                 * The variable is needed for a join, however it's neither in
+                 * the GROUP BY clause nor can it be derived from it using EC.
+                 * (Otherwise it would have to be added to the targets above.)
+                 * We need to construct special SortGroupClause for that
+                 * variable.
+                 *
+                 * Note that its tleSortGroupRef needs to be unique within
+                 * agg_input, so we need to postpone creation of the
+                 * SortGroupClause's until we're done with the iteration of
+                 * rel->reltarget->exprs. Also it makes sense for the caller
+                 * to do some more check before it starts to create those
+                 * SortGroupClause's.
+                 */
+                *group_exprs_extra_p = lappend(*group_exprs_extra_p, tvar);
+            }
+            else if (is_var_in_aggref_only(root, tvar))
+            {
+                /*
+                 * Another reason we might need this variable is that some
+                 * aggregate pushed down to this relation references it. In
+                 * such a case, add that var to agg_input, but not to
+                 * "target". However, if the aggregate is not the only reason
+                 * for the var to be in the target, some more checks need to
+                 * be performed below.
+                 */
+                add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+            }
+            else
+            {
+                /*
+                 * The Var can be functionally dependent on another expression
+                 * of the target, but we cannot check until the other
+                 * expressions are in the target.
+                 */
+                possibly_dependent = lappend(possibly_dependent, tvar);
+            }
+        }
+    }
+
+    /*
+     * Now we can check whether the expression is functionally dependent on
+     * another one.
+     */
+    foreach(lc, possibly_dependent)
+    {
+        List       *deps = NIL;
+        RangeTblEntry *rte;
+
+        tvar = lfirst_node(Var, lc);
+        rte = root->simple_rte_array[tvar->varno];
+
+        /*
+         * Check if the Var can be in the grouping key even though it's not
+         * mentioned by the GROUP BY clause (and could not be derived using
+         * ECs).
+         */
+        if (check_functional_grouping(rte->relid, tvar->varno,
+                                      tvar->varlevelsup,
+                                      target->exprs, &deps))
+        {
+            /*
+             * The var shouldn't be actually used for grouping key evaluation
+             * (instead, the one this depends on will be), so sortgroupref
+             * should not be important.
+             */
+            add_new_column_to_pathtarget(target, (Expr *) tvar);
+            add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+        }
+        else
+        {
+            /*
+             * As long as the query is semantically correct, arriving here
+             * means that the var is referenced by a generic grouping
+             * expression but not referenced by any join.
+             *
+             * If the aggregate push-down will support generic grouping
+             * expression sin the future, create_rel_agg_info() will have to
+             * add this variable to "agg_input" target and also add the whole
+             * generic expression to "target".
+             */
+            return false;
+        }
+    }
+
+    return true;
+}
+
+/*
+ * Check whether given variable appears in Aggref(s) which we consider usable
+ * at relation / join level, and only in the Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+    ListCell   *lc;
+    bool        found = false;
+
+    foreach(lc, root->grouped_var_list)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+        ListCell   *lc2;
+        List       *vars;
+
+        if (!IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        if (!bms_is_member(var->varno, gvi->gv_eval_at))
+            continue;
+
+        /*
+         * XXX Consider some sort of caching.
+         */
+        vars = pull_var_clause((Node *) gvi->gvexpr, PVC_RECURSE_AGGREGATES);
+        foreach(lc2, vars)
+        {
+            Var           *v = lfirst_node(Var, lc2);
+
+            if (equal(v, var))
+            {
+                found = true;
+                break;
+            }
+
+        }
+        list_free(vars);
+
+        if (found)
+            break;
+    }
+
+    /* No aggregate references the Var? */
+    if (!found)
+        return false;
+
+    /* Does the Var appear in the target outside aggregates? */
+    found = false;
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *te = lfirst_node(TargetEntry, lc);
+
+        if (IsA(te->expr, Aggref))
+            continue;
+
+        if (equal(te->expr, var))
+            return false;
+
+    }
+
+    /* The Var is in aggregate(s) and only there. */
+    return true;
+}
+
+/*
+ * Check if given variable is needed by joins above the current rel?
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation "b" for the
+ * following query:
+ *
+ *    SELECT a.i, avg(b.y)
+ *    FROM a JOIN b ON b.j = a.i
+ *    GROUP BY a.i;
+ *
+ * If we aggregate the "b" relation alone, the column "b.j" needs to be used
+ * as the grouping key because otherwise it cannot find its way to the input
+ * of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+    Relids        relids_no_top;
+    int            ndx;
+    RelOptInfo *baserel;
+
+    /*
+     * The relids we're not interested in do include 0, which is the top-level
+     * targetlist. The only reason for relids to contain 0 should be that
+     * arg_var is referenced either by aggregate or by grouping expression,
+     * but right now we're interested in the *other* reasons. (As soon
+     * aggregation is pushed down, the aggregates in the query targetlist no
+     * longer need direct reference to arg_var anyway.)
+     */
+
+    relids_no_top = bms_copy(rel->relids);
+    bms_add_member(relids_no_top, 0);
+
+    baserel = find_base_rel(root, var->varno);
+    ndx = var->varattno - baserel->min_attr;
+    if (bms_nonempty_difference(baserel->attr_needed[ndx],
+                                relids_no_top))
+        return true;
+
+    return false;
+}
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index c672b338c0..443e9fb42c 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -820,6 +820,37 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
     }
 }
 
+/*
+ * Return sortgroupref if expr can be used as the grouping expression in an
+ * AggPath at relation or join level, or 0 if it can't.
+ *
+ * gvis a list of a list of GroupedVarInfo's available for the query,
+ * including those derived using equivalence classes.
+ */
+Index
+get_expression_sortgroupref(Expr *expr, List *gvis)
+{
+    ListCell   *lc;
+
+    foreach(lc, gvis)
+    {
+        GroupedVarInfo *gvi = lfirst_node(GroupedVarInfo, lc);
+
+        if (IsA(gvi->gvexpr, Aggref))
+            continue;
+
+        if (equal(gvi->gvexpr, expr))
+        {
+            Assert(gvi->sortgroupref > 0);
+
+            return gvi->sortgroupref;
+        }
+    }
+
+    /* The expression cannot be used as grouping key. */
+    return 0;
+}
+
 /*
  * split_pathtarget_at_srfs
  *        Split given PathTarget into multiple levels to position SRFs safely
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 68328b1402..b10fd25ae6 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -945,6 +945,16 @@ struct config_bool ConfigureNamesBool[] =
         false,
         NULL, NULL, NULL
     },
+    {
+        {"enable_agg_pushdown", PGC_USERSET, QUERY_TUNING_METHOD,
+            gettext_noop("Enables aggregate push-down."),
+            NULL,
+            GUC_EXPLAIN
+        },
+        &enable_agg_pushdown,
+        false,
+        NULL, NULL, NULL
+    },
     {
         {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
             gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5afdeb04de..14ea2c96b8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -388,6 +388,7 @@
 #enable_seqscan = on
 #enable_sort = on
 #enable_tidscan = on
+#enable_agg_pushdown = on
 
 # - Planner Cost Constants -
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 9790c058f9..d22efbb196 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -376,6 +376,9 @@ struct PlannerInfo
     /* list of PlaceHolderInfos */
     List       *placeholder_list;
 
+    /* List of GroupedVarInfos. */
+    List       *grouped_var_list;
+
     /* array of PlaceHolderInfos indexed by phid */
     struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
     /* allocated size of array */
@@ -416,6 +419,12 @@ struct PlannerInfo
      */
     RelInfoList       upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
+    /*
+     * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+     * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+     */
+    struct RelInfoList *agg_info_list;
+
     /* Result tlists chosen by grouping_planner for upper-stage processing */
     struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
 
@@ -430,6 +439,12 @@ struct PlannerInfo
      */
     List       *processed_tlist;
 
+    /*
+     * The maximum ressortgroupref among target entries in processed_list.
+     * Useful when adding extra grouping expressions for partial aggregation.
+     */
+    int            max_sortgroupref;
+
     /*
      * For UPDATE, this list contains the target table's attribute numbers to
      * which the first N entries of processed_tlist are to be assigned.  (Any
@@ -1032,6 +1047,64 @@ typedef struct RelOptInfo
     ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
      (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
 
+/*
+ * RelAggInfo
+ *        Information needed to create grouped paths for base rels and joins.
+ *
+ * "relids" is the set of base-relation identifiers, just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to joinin grouped path to a non-grouped one.  This target
+ * can contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ *
+ * "rel_grouped" is the relation containing the partially aggregated paths.
+ */
+typedef struct RelAggInfo
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    Relids        relids;            /* Base rels contained in this grouped rel. */
+
+    struct PathTarget *target;    /* Target for grouped paths. */
+
+    struct PathTarget *agg_input;    /* pathtarget of paths that generate input
+                                     * for aggregation paths. */
+
+    double        input_rows;
+
+    List       *group_clauses;
+    List       *group_exprs;
+    List       *group_pathkeys;
+
+    List       *agg_exprs;        /* Aggref expressions. */
+
+    RelOptInfo *rel_grouped;    /* Grouped relation. */
+} RelAggInfo;
+
 /*
  * IndexOptInfo
  *        Per-index information for planning/optimization
@@ -2898,6 +2971,29 @@ typedef struct PlaceHolderInfo
     int32        ph_width;
 } PlaceHolderInfo;
 
+/*
+ * GroupedVarInfo exists for each expression that can be used as an aggregate
+ * or grouping expression evaluated below a join.
+ *
+ * TODO Rename, perhaps to GroupedTargetEntry? (Also rename the variables of
+ * this type.)
+ */
+typedef struct GroupedVarInfo
+{
+    pg_node_attr(no_copy_equal, no_read)
+
+    NodeTag        type;
+
+    Expr       *gvexpr;            /* the represented expression. */
+    Aggref       *agg_partial;    /* if gvexpr is aggregate, agg_partial is the
+                                 * corresponding partial aggregate */
+    Index        sortgroupref;    /* If gvexpr is a grouping expression, this is
+                                 * the tleSortGroupRef of the corresponding
+                                 * SortGroupClause. */
+    Relids        gv_eval_at;        /* lowest level we can evaluate the expression
+                                 * at or NULL if it can happen anywhere. */
+} GroupedVarInfo;
+
 /*
  * This struct describes one potentially index-optimizable MIN/MAX aggregate
  * function.  MinMaxAggPath contains a list of these, and if we accept that
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index cbe0607e85..723ef7343a 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -54,5 +54,6 @@ extern Query *inline_set_returning_function(PlannerInfo *root,
                                             RangeTblEntry *rte);
 
 extern Bitmapset *pull_paramids(Expr *expr);
-
+extern GroupedVarInfo *translate_expression_to_rel(PlannerInfo *root,
+                                                   GroupedVarInfo *gvi, Index relid);
 #endif                            /* CLAUSES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 02305ef902..7f917675b5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -230,6 +230,14 @@ extern AggPath *create_agg_path(PlannerInfo *root,
                                 List *qual,
                                 const AggClauseCosts *aggcosts,
                                 double numGroups);
+extern AggPath *create_agg_sorted_path(PlannerInfo *root,
+                                       RelOptInfo *rel,
+                                       Path *subpath,
+                                       RelAggInfo *agg_info);
+extern AggPath *create_agg_hashed_path(PlannerInfo *root,
+                                       RelOptInfo *rel,
+                                       Path *subpath,
+                                       RelAggInfo *agg_info);
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
                                                   RelOptInfo *rel,
                                                   Path *subpath,
@@ -303,14 +311,21 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
 extern void expand_planner_arrays(PlannerInfo *root, int add_size);
 extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
                                     RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+                                            RelAggInfo **agg_info_p);
 extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
 extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+                            RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+                                    RelAggInfo **agg_info_p);
 extern RelOptInfo *build_join_rel(PlannerInfo *root,
                                   Relids joinrelids,
                                   RelOptInfo *outer_rel,
                                   RelOptInfo *inner_rel,
                                   SpecialJoinInfo *sjinfo,
-                                  List **restrictlist_ptr);
+                                  List **restrictlist_ptr,
+                                  RelAggInfo *agg_info);
 extern Relids min_join_parameterization(PlannerInfo *root,
                                         Relids joinrelids,
                                         RelOptInfo *outer_rel,
@@ -336,5 +351,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
                                         RelOptInfo *outer_rel, RelOptInfo *inner_rel,
                                         RelOptInfo *parent_joinrel, List *restrictlist,
                                         SpecialJoinInfo *sjinfo, JoinType jointype);
-
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
 #endif                            /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 65a3c35611..d61f24df07 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
  * allpaths.c
  */
 extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_agg_pushdown;
 extern PGDLLIMPORT int geqo_threshold;
 extern PGDLLIMPORT int min_parallel_table_scan_size;
 extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -56,6 +57,11 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                   bool override_rows);
 extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                          bool override_rows);
+extern void generate_grouping_paths(PlannerInfo *root,
+                                    RelOptInfo *rel_grouped,
+                                    RelOptInfo *rel_plain,
+                                    RelAggInfo *agg_info);
+
 extern int    compute_parallel_worker(RelOptInfo *rel, double heap_pages,
                                     double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 95ecefdade..0e552065cf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
                                    Relids where_needed);
+extern void setup_aggregate_pushdown(PlannerInfo *root);
 extern void find_lateral_references(PlannerInfo *root);
 extern void create_lateral_join_info(PlannerInfo *root);
 extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 452b92ad55..107aae5ff6 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -46,6 +46,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
  */
 extern void get_agg_clause_costs(PlannerInfo *root, AggSplit aggsplit,
                                  AggClauseCosts *costs);
+extern void get_agg_clause_costs_some(PlannerInfo *root, AggSplit aggsplit,
+                                      List *aggrefs, AggClauseCosts *costs);
 extern void preprocess_aggrefs(PlannerInfo *root, Node *clause);
 
 /*
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index ca64309c32..7903ba8f3b 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -49,8 +49,10 @@ extern void split_pathtarget_at_srfs(PlannerInfo *root,
                                      PathTarget *target, PathTarget *input_target,
                                      List **targets, List **targets_contain_srfs);
 
+/* TODO Find the best location for this one. */
+extern Index get_expression_sortgroupref(Expr *expr, List *gvis);
+
 /* Convenience macro to get a PathTarget with valid cost/width fields */
 #define create_pathtarget(root, tlist) \
     set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
-
 #endif                            /* TLIST_H */
diff --git a/src/test/regress/expected/agg_pushdown.out b/src/test/regress/expected/agg_pushdown.out
new file mode 100644
index 0000000000..03a5ccf571
--- /dev/null
+++ b/src/test/regress/expected/agg_pushdown.out
@@ -0,0 +1,216 @@
+CREATE TABLE agg_pushdown_parent (
+    i int primary key,
+    x int);
+CREATE TABLE agg_pushdown_child1 (
+    j int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (j, parent));
+CREATE INDEX ON agg_pushdown_child1(parent);
+CREATE TABLE agg_pushdown_child2 (
+    k int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (k, parent));;
+INSERT INTO agg_pushdown_parent(i, x)
+SELECT n, n
+FROM generate_series(0, 7) AS s(n);
+INSERT INTO agg_pushdown_child1(j, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+INSERT INTO agg_pushdown_child2(k, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+ANALYZE;
+SET enable_agg_pushdown TO on;
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+--
+-- In addition, check that functionally dependent column "c.x" can be
+-- referenced by SELECT although GROUP BY references "p.i".
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(10 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.i = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Seq Scan on agg_pushdown_child1 c1
+(11 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Merge Join
+         Merge Cond: (p.i = c1.parent)
+         ->  Sort
+               Sort Key: p.i
+               ->  Seq Scan on agg_pushdown_parent p
+         ->  Sort
+               Sort Key: c1.parent
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+(12 rows)
+
+-- Restore the default values.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+-- Scan index on agg_pushdown_child1(parent) column and aggregate the result
+-- using AGG_SORTED strategy.
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                         QUERY PLAN                                          
+---------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Nested Loop
+         ->  Partial GroupAggregate
+               Group Key: c1.parent
+               ->  Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+         ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+               Index Cond: (i = c1.parent)
+(8 rows)
+
+SET enable_seqscan TO on;
+-- Join "c1" to "p.x" column, i.e. one that is not in the GROUP BY clause. The
+-- planner should still use "c1.parent" as grouping expression for partial
+-- aggregation, although it's not in the same equivalence class as the GROUP
+-- BY expression ("p.i"). The reason to use "c1.parent" for partial
+-- aggregation is that this is the only way for "c1" to provide the join
+-- expression with input data.
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.x GROUP BY p.i;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.x = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Seq Scan on agg_pushdown_child1 c1
+(11 rows)
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Nested Loop
+                           ->  Seq Scan on agg_pushdown_child1 c1
+                           ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+                                 Index Cond: ((k = c1.j) AND (parent = c1.parent))
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(13 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Hash Join
+               Hash Cond: (p.i = c1.parent)
+               ->  Seq Scan on agg_pushdown_parent p
+               ->  Hash
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Hash Join
+                                 Hash Cond: ((c1.parent = c2.parent) AND (c1.j = c2.k))
+                                 ->  Seq Scan on agg_pushdown_child1 c1
+                                 ->  Hash
+                                       ->  Seq Scan on agg_pushdown_child2 c2
+(15 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                            QUERY PLAN                                             
+---------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Merge Join
+         Merge Cond: (c1.parent = p.i)
+         ->  Sort
+               Sort Key: c1.parent
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Merge Join
+                           Merge Cond: ((c1.j = c2.k) AND (c1.parent = c2.parent))
+                           ->  Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                           ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+         ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(13 rows)
+
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 001c6e7eb9..5921495beb 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -111,6 +111,7 @@ select count(*) = 0 as ok from pg_stat_wal_receiver;
 select name, setting from pg_settings where name like 'enable%';
               name              | setting 
 --------------------------------+---------
+ enable_agg_pushdown            | off
  enable_async_append            | on
  enable_bitmapscan              | on
  enable_gathermerge             | on
@@ -132,7 +133,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(21 rows)
+(22 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index f99e99373a..74e3d0a806 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -99,6 +99,8 @@ test: select_parallel
 test: write_parallel
 test: vacuum_parallel
 
+test: agg_pushdown
+
 # no relation related tests can be put in this group
 test: publication subscription
 
diff --git a/src/test/regress/sql/agg_pushdown.sql b/src/test/regress/sql/agg_pushdown.sql
new file mode 100644
index 0000000000..0a4614592b
--- /dev/null
+++ b/src/test/regress/sql/agg_pushdown.sql
@@ -0,0 +1,115 @@
+CREATE TABLE agg_pushdown_parent (
+    i int primary key,
+    x int);
+
+CREATE TABLE agg_pushdown_child1 (
+    j int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (j, parent));
+
+CREATE INDEX ON agg_pushdown_child1(parent);
+
+CREATE TABLE agg_pushdown_child2 (
+    k int,
+    parent int references agg_pushdown_parent,
+    v double precision,
+    PRIMARY KEY (k, parent));;
+
+INSERT INTO agg_pushdown_parent(i, x)
+SELECT n, n
+FROM generate_series(0, 7) AS s(n);
+
+INSERT INTO agg_pushdown_child1(j, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+
+INSERT INTO agg_pushdown_child2(k, parent, v)
+SELECT 128 * i + n, i, random()
+FROM generate_series(0, 127) AS s(n), agg_pushdown_parent;
+
+ANALYZE;
+
+SET enable_agg_pushdown TO on;
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+--
+-- In addition, check that functionally dependent column "c.x" can be
+-- referenced by SELECT although GROUP BY references "p.i".
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Restore the default values.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+
+-- Scan index on agg_pushdown_child1(parent) column and aggregate the result
+-- using AGG_SORTED strategy.
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+SET enable_seqscan TO on;
+
+-- Join "c1" to "p.x" column, i.e. one that is not in the GROUP BY clause. The
+-- planner should still use "c1.parent" as grouping expression for partial
+-- aggregation, although it's not in the same equivalence class as the GROUP
+-- BY expression ("p.i"). The reason to use "c1.parent" for partial
+-- aggregation is that this is the only way for "c1" to provide the join
+-- expression with input data.
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.x GROUP BY p.i;
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
-- 
2.31.1

From 11e3ccc4054517f3415d727e35ec5be2f0ff0de2 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 4 Jan 2023 14:41:39 +0100
Subject: [PATCH 3/3] Use also partial paths as the input for grouped paths.

---
 src/backend/optimizer/path/allpaths.c      |  44 +++++-
 src/backend/optimizer/util/relnode.c       |  46 +++---
 src/test/regress/expected/agg_pushdown.out | 156 +++++++++++++++++++++
 src/test/regress/sql/agg_pushdown.sql      |  65 +++++++++
 4 files changed, 281 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e42266f220..9c6571ffe9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -130,7 +130,7 @@ static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                    RangeTblEntry *rte);
 static void add_grouped_path(PlannerInfo *root, RelOptInfo *rel,
                              Path *subpath, AggStrategy aggstrategy,
-                             RelAggInfo *agg_info);
+                             RelAggInfo *agg_info, bool partial);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                       pushdown_safety_info *safetyInfo);
@@ -3337,6 +3337,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
                         RelOptInfo *rel_plain, RelAggInfo *agg_info)
 {
     ListCell   *lc;
+    Path       *path;
 
     if (IS_DUMMY_REL(rel_plain))
     {
@@ -3346,7 +3347,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
 
     foreach(lc, rel_plain->pathlist)
     {
-        Path       *path = (Path *) lfirst(lc);
+        path = (Path *) lfirst(lc);
 
         /*
          * Since the path originates from the non-grouped relation which is
@@ -3360,7 +3361,8 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
          * add_grouped_path() will check whether the path has suitable
          * pathkeys.
          */
-        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info);
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info,
+                         false);
 
         /*
          * Repeated creation of hash table (for new parameter values) should
@@ -3368,12 +3370,38 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
          * efficiency.
          */
         if (path->param_info == NULL)
-            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info);
+            add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info,
+                             false);
     }
 
     /* Could not generate any grouped paths? */
     if (rel_grouped->pathlist == NIL)
+    {
         mark_dummy_rel(rel_grouped);
+        return;
+    }
+
+    /*
+     * Almost the same for partial paths.
+     *
+     * The difference is that parameterized paths are never created, see
+     * add_partial_path() for explanation.
+     */
+    foreach(lc, rel_plain->partial_pathlist)
+    {
+        path = (Path *) lfirst(lc);
+
+        if (path->param_info != NULL)
+            continue;
+
+        path = (Path *) create_projection_path(root, rel_grouped, path,
+                                               agg_info->agg_input);
+
+        add_grouped_path(root, rel_grouped, path, AGG_SORTED, agg_info,
+                         true);
+        add_grouped_path(root, rel_grouped, path, AGG_HASHED, agg_info,
+                         true);
+    }
 }
 
 /*
@@ -3381,7 +3409,8 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
  */
 static void
 add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
-                 AggStrategy aggstrategy, RelAggInfo *agg_info)
+                 AggStrategy aggstrategy, RelAggInfo *agg_info,
+                 bool partial)
 {
     Path       *agg_path;
 
@@ -3404,7 +3433,10 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
         return;
 
     /* Add the grouped path to the list of grouped base paths. */
-    add_path(rel, (Path *) agg_path);
+    if (!partial)
+        add_path(rel, (Path *) agg_path);
+    else
+        add_partial_path(rel, (Path *) agg_path);
 }
 
 /*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 67f5bbe59f..8762477ccc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -967,33 +967,12 @@ build_join_rel(PlannerInfo *root,
         build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
                                      restrictlist, sjinfo->jointype);
 
+    /*
+     * Set estimates of the joinrel's size.
+     */
     if (!grouped)
-    {
-        /*
-         * Set estimates of the joinrel's size.
-         */
         set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
                                    sjinfo, restrictlist);
-
-        /*
-         * Set the consider_parallel flag if this joinrel could potentially be
-         * scanned within a parallel worker.  If this flag is false for either
-         * inner_rel or outer_rel, then it must be false for the joinrel also.
-         * Even if both are true, there might be parallel-restricted
-         * expressions in the targetlist or quals.
-         *
-         * Note that if there are more than two rels in this relation, they
-         * could be divided between inner_rel and outer_rel in any arbitrary
-         * way.  We assume this doesn't matter, because we should hit all the
-         * same baserels and joinclauses while building up to this joinrel no
-         * matter which we take; therefore, we should make the same decision
-         * here however we get here.
-         */
-        if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
-            is_parallel_safe(root, (Node *) restrictlist) &&
-            is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
-            joinrel->consider_parallel = true;
-    }
     else
     {
         /*
@@ -1009,6 +988,25 @@ build_join_rel(PlannerInfo *root,
                                             agg_info->input_rows, NULL, NULL);
     }
 
+    /*
+     * Set the consider_parallel flag if this joinrel could potentially be
+     * scanned within a parallel worker.  If this flag is false for either
+     * inner_rel or outer_rel, then it must be false for the joinrel also.
+     * Even if both are true, there might be parallel-restricted expressions
+     * in the targetlist or quals.
+     *
+     * Note that if there are more than two rels in this relation, they could
+     * be divided between inner_rel and outer_rel in any arbitrary way.  We
+     * assume this doesn't matter, because we should hit all the same baserels
+     * and joinclauses while building up to this joinrel no matter which we
+     * take; therefore, we should make the same decision here however we get
+     * here.
+     */
+    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
+        is_parallel_safe(root, (Node *) restrictlist) &&
+        is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
+        joinrel->consider_parallel = true;
+
     /* Add the joinrel to the PlannerInfo. */
     if (!grouped)
         add_join_rel(root, joinrel);
diff --git a/src/test/regress/expected/agg_pushdown.out b/src/test/regress/expected/agg_pushdown.out
index 03a5ccf571..66d36d122e 100644
--- a/src/test/regress/expected/agg_pushdown.out
+++ b/src/test/regress/expected/agg_pushdown.out
@@ -214,3 +214,159 @@ c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
          ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
 (13 rows)
 
+-- Most of the tests above with parallel query processing enforced.
+SET min_parallel_index_scan_size = 0;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+-- Partially aggregate a single relation.
+--
+-- Nestloop join.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Nested Loop
+               ->  Partial GroupAggregate
+                     Group Key: c1.parent
+                     ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                     Index Cond: (i = c1.parent)
+(10 rows)
+
+-- Hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Sort
+         Sort Key: p.i
+         ->  Gather
+               Workers Planned: 1
+               ->  Parallel Hash Join
+                     Hash Cond: (c1.parent = p.i)
+                     ->  Partial GroupAggregate
+                           Group Key: c1.parent
+                           ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+                     ->  Parallel Hash
+                           ->  Parallel Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(13 rows)
+
+-- Merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Merge Join
+               Merge Cond: (c1.parent = p.i)
+               ->  Partial GroupAggregate
+                     Group Key: c1.parent
+                     ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1 c1
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(10 rows)
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Sort
+               Sort Key: p.i
+               ->  Nested Loop
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Nested Loop
+                                 ->  Parallel Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                                 ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+                                       Index Cond: ((k = c1.j) AND (parent = c1.parent))
+                     ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+                           Index Cond: (i = c1.parent)
+(15 rows)
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                       QUERY PLAN

 

+------------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 1
+         ->  Sort
+               Sort Key: p.i
+               ->  Parallel Hash Join
+                     Hash Cond: (c1.parent = p.i)
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Parallel Hash Join
+                                 Hash Cond: ((c1.parent = c2.parent) AND (c1.j = c2.k))
+                                 ->  Parallel Index Scan using agg_pushdown_child1_parent_idx on agg_pushdown_child1
c1
+                                 ->  Parallel Hash
+                                       ->  Parallel Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2
c2
+                     ->  Parallel Hash
+                           ->  Parallel Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(17 rows)
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+                                                    QUERY PLAN                                                    
+------------------------------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Group Key: p.i
+   ->  Gather Merge
+         Workers Planned: 2
+         ->  Merge Join
+               Merge Cond: (c1.parent = p.i)
+               ->  Sort
+                     Sort Key: c1.parent
+                     ->  Partial HashAggregate
+                           Group Key: c1.parent
+                           ->  Merge Join
+                                 Merge Cond: ((c1.j = c2.k) AND (c1.parent = c2.parent))
+                                 ->  Parallel Index Scan using agg_pushdown_child1_pkey on agg_pushdown_child1 c1
+                                 ->  Index Scan using agg_pushdown_child2_pkey on agg_pushdown_child2 c2
+               ->  Index Only Scan using agg_pushdown_parent_pkey on agg_pushdown_parent p
+(15 rows)
+
diff --git a/src/test/regress/sql/agg_pushdown.sql b/src/test/regress/sql/agg_pushdown.sql
index 0a4614592b..49ba6dd67c 100644
--- a/src/test/regress/sql/agg_pushdown.sql
+++ b/src/test/regress/sql/agg_pushdown.sql
@@ -113,3 +113,68 @@ EXPLAIN (COSTS off)
 SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
 agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
 c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- Most of the tests above with parallel query processing enforced.
+SET min_parallel_index_scan_size = 0;
+SET min_parallel_table_scan_size = 0;
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+
+-- Partially aggregate a single relation.
+--
+-- Nestloop join.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+EXPLAIN (COSTS off)
+SELECT p.x, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+-- Merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1
+AS c1 ON c1.parent = p.i GROUP BY p.i;
+
+SET enable_nestloop TO on;
+SET enable_hashjoin TO on;
+
+-- Perform nestloop join between agg_pushdown_child1 and agg_pushdown_child2
+-- and aggregate the result.
+SET enable_nestloop TO on;
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for hash join.
+SET enable_nestloop TO off;
+SET enable_hashjoin TO on;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
+
+-- The same for merge join.
+SET enable_hashjoin TO off;
+SET enable_mergejoin TO on;
+SET enable_seqscan TO off;
+
+EXPLAIN (COSTS off)
+SELECT p.i, avg(c1.v + c2.v) FROM agg_pushdown_parent AS p JOIN
+agg_pushdown_child1 AS c1 ON c1.parent = p.i JOIN agg_pushdown_child2 AS c2 ON
+c2.parent = p.i WHERE c1.j = c2.k GROUP BY p.i;
-- 
2.31.1


Re: WIP: Aggregation push-down - take2

From
"Gregory Stark (as CFM)"
Date:
On Thu, 5 Jan 2023 at 02:59, Antonin Houska <ah@cybertec.at> wrote:
>
> vignesh C <vignesh21@gmail.com> wrote:
>
> > The patch does not apply on top of HEAD as in [1], please post a rebased patch:

And again...

Setting this to Waiting on Author for the moment.

Do you think this patch is likely to be ready for this release or the
next one? Is there specific feedback you're looking for?

patching file src/backend/optimizer/util/relnode.c
Hunk #1 FAILED at 18.
Hunk #2 succeeded at 85 (offset 8 lines).
Hunk #3 succeeded at 405 with fuzz 1 (offset 25 lines).
Hunk #4 succeeded at 595 (offset 63 lines).
Hunk #5 succeeded at 657 (offset 63 lines).
Hunk #6 succeeded at 692 (offset 63 lines).
Hunk #7 succeeded at 731 (offset 63 lines).
Hunk #8 succeeded at 849 (offset 62 lines).
Hunk #9 succeeded at 860 (offset 62 lines).
Hunk #10 succeeded at 873 (offset 62 lines).
Hunk #11 FAILED at 911.
Hunk #12 FAILED at 945.
Hunk #13 succeeded at 2585 (offset 310 lines).
3 out of 13 hunks FAILED -- saving rejects to file
src/backend/optimizer/util/relnode.c.rej
patching file src/backend/optimizer/util/tlist.c
patching file src/backend/utils/misc/guc_tables.c
Hunk #1 succeeded at 946 (offset 1 line).
patching file src/backend/utils/misc/postgresql.conf.sample
Hunk #1 succeeded at 390 (offset 2 lines).
patching file src/include/nodes/pathnodes.h
Hunk #1 succeeded at 386 (offset 10 lines).
Hunk #2 succeeded at 429 (offset 10 lines).
Hunk #3 succeeded at 477 (offset 38 lines).
Hunk #4 succeeded at 1084 (offset 37 lines).
Hunk #5 succeeded at 3117 (offset 146 lines).
patching file src/include/optimizer/clauses.h
patching file src/include/optimizer/pathnode.h
Hunk #2 FAILED at 311.
Hunk #3 FAILED at 344.
2 out of 3 hunks FAILED -- saving rejects to file
src/include/optimizer/pathnode.h.rej




-- 
Gregory Stark
As Commitfest Manager



Re: WIP: Aggregation push-down - take2

From
"Gregory Stark (as CFM)"
Date:
It looks like in November 2022 Tomas Vondra said:

> I did a quick initial review of the v20 patch series.
> I plan to do a
more thorough review over the next couple days, if time permits.
> In
general I think the patch is in pretty good shape.

Following which Antonin Houska updated the patch responding to his
review comments.

Since then this patch has demonstrated the unfortunate "please rebase
thx" followed by the author rebasing and getting no feedback until
"please rebase again thx"...

So while the patch doesn't currently apply it seems like it really
should be either Needs Review or Ready for Commit.

That said, I suspect this patch has missed the boat for this CF.
Hopefully it will get more attention next release.

I'll move it to the next CF but set it to Needs Review even though it
needs a rebase.

-- 
Gregory Stark
As Commitfest Manager



Re: WIP: Aggregation push-down - take2

From
Peter Smith
Date:
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 9+ months.

Since there seems not much interest, I have changed the status to
"Returned with Feedback" [1]. Feel free to propose a stronger use case
for the patch and add an entry for the same.

======
[1] https://commitfest.postgresql.org/46/3764/

Kind Regards,
Peter Smith.