Re: upper planner path-ification - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: upper planner path-ification
Msg-id CANP8+jKeGV0oF2SaOR3HyiE_KjcwF6GWT3N4nDVZNuUyG6BKbQ@mail.gmail.com
In response to Re: upper planner path-ification  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: upper planner path-ification  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 18 May 2015 at 14:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, May 17, 2015 at 12:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Rather than adding tlists per se to Paths, I've been vaguely toying with
>> a notion of identifying all the "interesting" subexpressions in a query
>> (expensive functions, aggregates, etc), giving them indexes 1..n, and then
>> marking Paths with bitmapsets showing which interesting subexpressions
>> they can produce values for.  This would make things like "does this Path
>> compute all the needed aggregates" much cheaper to deal with than a raw
>> tlist representation would do.  But maybe that's still not the best way.

> I don't know, but it seems like this might be pulling in the opposite
> direction from your previously-stated desire to get subquery_planner
> to output Paths rather than Plans as soon as possible.
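
A minimal sketch of the bitmapset idea quoted above. It is written in Python for brevity rather than as planner C code, and a plain integer stands in for a Bitmapset. Everything here is hypothetical: the `INTERESTING` list, `bms_of`, and the `Path` dataclass are illustrations only. The subset check is named after PostgreSQL's `bms_is_subset` but is just a model of it. The point is that "does this Path compute all the needed aggregates" becomes a single mask test instead of a walk over a tlist.

```python
from dataclasses import dataclass

# Hypothetical registry: each "interesting" subexpression (aggregate,
# expensive function call, ...) gets a small integer index 1..n.
INTERESTING = ["count(*)", "sum(b)", "expensive_fn(c)"]
INDEX = {expr: i + 1 for i, expr in enumerate(INTERESTING)}


def bms_of(exprs):
    """Build a bitmapset (modeled as an int bit vector) from expression names."""
    bits = 0
    for e in exprs:
        bits |= 1 << INDEX[e]
    return bits


def bms_is_subset(a, b):
    """True if every member of a is also a member of b."""
    return a & ~b == 0


@dataclass
class Path:
    name: str
    provides: int  # bitmapset of interesting subexpressions this Path can emit


needed = bms_of(["count(*)", "sum(b)"])
agg_path = Path("HashAgg", bms_of(["count(*)", "sum(b)"]))
scan_path = Path("SeqScan", bms_of([]))

for p in (agg_path, scan_path):
    # One bitwise test answers "does this Path produce everything we need?"
    print(p.name, bms_is_subset(needed, p.provides))
```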

Sorry, I didn't mean to suggest that that necessarily had to happen right
away.

What we do need right away, though, is *some* design for distinguishing
Paths for the different possible upper-level steps.  I won't cry if we
change it around later, but we have to have something to start with.

So for the moment, let's assume that we still rigidly follow the sequence
of upper-level steps currently embodied in grouping_planner.  (I'm not
sure if it even makes sense to consider other orderings of those
processing steps, but in any case we don't need to allow it on day zero.)
Then, make a dummy RelOptInfo corresponding to the result of each step,
and insert links to those in new fields in PlannerInfo.  (We create these
*before* starting scan/join planning, so that FDWs, custom scans, etc, can
inject paths into these RelOptInfos if they want, so as to represent cases
like remote aggregation.)  Then just use add_path with the appropriate
target RelOptInfo when producing different ways to do grouping etc.

This is a bit ad-hoc but it would be a place to start.

Comments?
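
A rough Python model of the proposal quoted above, assuming the fixed step order of grouping_planner and a much-simplified add_path. The step names in `UPPER_STEPS`, the `upper_rels` dictionary, `make_upper_rels`, and the cost-only dominance rule are all hypothetical. The real add_path also weighs pathkeys and parameterization and compares costs fuzzily.

```python
from dataclasses import dataclass, field

# Hypothetical names for the upper-level steps, in a simplified version of
# the fixed order grouping_planner applies them in.
UPPER_STEPS = ["grouping", "window", "distinct", "ordered", "final"]


@dataclass
class Path:
    desc: str
    startup_cost: float
    total_cost: float


@dataclass
class RelOptInfo:
    name: str
    pathlist: list = field(default_factory=list)


@dataclass
class PlannerInfo:
    upper_rels: dict = field(default_factory=dict)


def add_path(rel, new_path):
    """Keep new_path unless an existing path is at least as cheap on both
    startup and total cost; drop existing paths that new_path dominates.
    (Far simpler than the real add_path.)"""
    def dominates(a, b):
        return a.startup_cost <= b.startup_cost and a.total_cost <= b.total_cost

    if any(dominates(old, new_path) for old in rel.pathlist):
        return
    rel.pathlist = [old for old in rel.pathlist if not dominates(new_path, old)]
    rel.pathlist.append(new_path)


def make_upper_rels(root):
    # One dummy rel per upper step, created *before* scan/join planning.
    for step in UPPER_STEPS:
        root.upper_rels[step] = RelOptInfo(step)


root = PlannerInfo()
make_upper_rels(root)

# An FDW-like hook injects a remote-aggregation path into the grouping rel.
add_path(root.upper_rels["grouping"], Path("ForeignScan (remote agg)", 5.0, 50.0))

# Later, the planner adds local ways of doing the same grouping step.
add_path(root.upper_rels["grouping"], Path("HashAgg", 10.0, 40.0))
add_path(root.upper_rels["grouping"], Path("Sort + GroupAgg", 100.0, 120.0))

print([p.desc for p in root.upper_rels["grouping"].pathlist])
```

Because the upper rels exist before scan/join planning, an extension can contribute a path to the grouping step up front. Locally generated paths then compete with it through the same add_path call: here the remote path survives on startup cost, HashAgg on total cost, and the sort-based path is discarded.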

My thinking was to push aggregation down to the lowest possible level in the plan, hopefully onto a single relation. That way we can generate paths for the current grouping_planner options as well as others, such as these:

* Push aggregation down below a join (which might then affect join planning)
* Allow parallel queries to follow a scan, aggregate, collect-from-slaves, aggregate-again strategy (hence the need for double aggregation semantics)
* Allow a lookaside to a Mat View rather than doing a scan-aggregate (assume for now these are maintained correctly)
* Allow a lookaside to an alternate datastore/mechanism via CustomScan (again assuming these are maintained correctly)

all of which need to be costed against each other and the current strategies (aggregate last).

The above proposal sounds like it will do that, but I'm not completely sure.
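
The "double aggregation semantics" in the second bullet can be illustrated with avg: each worker folds its share of rows into a partial state, and the leader combines those states before finalizing. This is a minimal Python sketch; the function names and the (sum, count) state layout are assumptions for illustration, not PostgreSQL's aggregate API.

```python
from typing import Iterable, Tuple

# Partial-aggregate state for avg(x): (running sum, row count).
State = Tuple[float, int]


def avg_transition(state: State, value: float) -> State:
    """Per-row step, run independently in each worker over its share of the scan."""
    s, n = state
    return (s + value, n + 1)


def avg_combine(a: State, b: State) -> State:
    """Merge two partial states: the extra 'second aggregation' step."""
    return (a[0] + b[0], a[1] + b[1])


def avg_final(state: State) -> float:
    """Turn a (possibly combined) state into the final result."""
    s, n = state
    return s / n if n else float("nan")


def partial_aggregate(rows: Iterable[float]) -> State:
    state: State = (0.0, 0)
    for v in rows:
        state = avg_transition(state, v)
    return state


rows = [3.0, 5.0, 10.0, 2.0, 7.0, 9.0]
# Pretend the scan was split among three workers.
worker_chunks = [rows[0:2], rows[2:4], rows[4:6]]

partials = [partial_aggregate(chunk) for chunk in worker_chunks]
combined: State = (0.0, 0)
for p in partials:
    combined = avg_combine(combined, p)

two_phase = avg_final(combined)
one_phase = avg_final(partial_aggregate(rows))
print(two_phase, one_phase)
```

The combine step is what a single-pass aggregate lacks; an aggregate whose partial states cannot be merged could not use this strategy.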

I'm assuming the O(N^2) Mat View planning problem can be solved in part by recognizing that many MVs are just a single table plus aggregates, and that we'd have few enough MVs in play that the search would not be a problem in practice.

I'm also aware that LIMIT is still very badly optimized, so I'm hoping this helps there as well.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
