Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAM2+6=Vwyx+doCm425Oqfw5z=zxYuQ1gvpUzsc+8i2cQH58FkA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Fri, Mar 2, 2018 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 1, 2018 at 5:34 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> Attached new patchset after rebasing my changes over these changes and on
> latest HEAD.

+        * We have already created a Gather or Gather Merge path atop cheapest
+        * partial path. Thus the partial path referenced by the
Gather node needs
+        * to be preserved as adding new partial paths in same rel may
delete this
+        * referenced path. To do this we need to clear the
partial_pathlist from
+        * the partially_grouped_rel as we may add partial paths again
while doing
+        * partitionwise aggregation. Keeping older partial path intact seems
+        * reasonable too as it might possible that the final path
chosen which is
+        * using it wins, but the underneath partial path is not the
cheapest one.

This isn't a good design.  You shouldn't create a Gather or Gather
Merge node until all partial paths have been added.  I mean, the point
is to put a Gather node on top of the cheapest path, not the path that
is currently the cheapest but might not actually be the cheapest once
we've added them all.

To be honest, I didn't know that we should not generated Gather or Gather Merge until we have all possible partial paths in place. I realize it recently while debugging one issue reported by Rajkumar off-list. While working on that fix, what I have observed is
- I have cheapest partial path with cost say 10, a Gather on it increased cost to 11.
- Later when I add a partial path it has a cost say 9 but a gather on it resulted is total cost to 12.
This means, the first Gather path is the cheapest one but not the underneath partial path and unfortunately that got removed when my partial path is added into the partial_pathlist.

Due to this, I thought it is better to have both paths valid and to avoid deleting earlier cheapest partial_path, I chose to reset the partially_grouped_rel->partial_pathlist.

But, yes per comment in generate_gather_paths() and as you said, we should add Gather or Gather Merge only after we have done with all partial path creation. Sorry for not knowing this before.
 

+add_gather_or_gather_merge(PlannerInfo *root,

Please stop picking generic function names for functions that have
very specific purposes.  I don't really think that you need this to be
a separate function at all, but it it is certainly NOT a
general-purpose function for adding a Gather or Gather Merge node.

OK. Got it now.
 

+        /*
+         * Collect statistics about aggregates for estimating costs of
+         * performing aggregation in parallel or in partial, if not already
+         * done. We use same cost for all the children as they will be same
+         * anyways.
+         */

If it only needs to be done once, do we really have to have it inside
the loop?  I see that you're using the varno-translated
partial_target->exprs and target->exprs, but if the specific varnos
don't matter, why not just use the untranslated version of the targets
before entering the loop?  And if the specific varnos do matter, then
presumably you need to do it every time.

Yes. It can be pulled outside a loop.
 

This is not a full review, but I'm out of time for right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: PATCH: psql tab completion for SELECT
Next
From: Jeevan Chalke
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping