Re: upper planner path-ification - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: upper planner path-ification
Date
Msg-id CAFjFpRc_9y1Tx=fq+82TCyJE4ToRuM+1+oonEkutKDfvMpHAnw@mail.gmail.com
Whole thread Raw
In response to upper planner path-ification  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Thu, May 14, 2015 at 7:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Hi,

I've been pulling over Tom's occasional remarks about redoing
grouping_planner - and maybe further layers of the planner - to work
with Paths instead of Plans.  I've had difficulty locating all of the
relevant threads.  Here's everything I've found so far, which I'm
pretty sure is not everything:

http://www.postgresql.org/message-id/17400.1311716272@sss.pgh.pa.us
http://www.postgresql.org/message-id/2614.1375730493@sss.pgh.pa.us
http://www.postgresql.org/message-id/22721.1385048662@sss.pgh.pa.us
http://www.postgresql.org/message-id/BANLkTinDjjFHNOzESG2J2U4GOkqLu69Zqg@mail.gmail.com
http://www.postgresql.org/message-id/8479.1418420018@sss.pgh.pa.us

I think there are two separate problems here.  First, there's the
problem that grouping_planner() is complicated.  It's doing cost
comparisons, but all in ad-hoc fashion rather than using a consistent
mechanic the way add_path() does.  Generally, we can plan an aggregate
using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
for min or max, an index scan that just grabs the highest or lowest
value in lieu of a full table scan.  Instead of generating a plan for
each of these possibilities, we'd like to generate paths for each one,
and then pick one to turn into a plan.  AIUI, the hope is that this
would simplify the cost calculations, and also make it easier to
inject other paths, such as a path where an FDW performs the
aggregation step on the remote server.

Second, there's the problem that we might like to order aggregates
with respect to joins.  If we have something like SELECT DISTINCT ON
(foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
(a) if foo.x has many duplicates, it will be better to DISTINCT-ify
foo and then join to bar afterwards but (b) if foo.x = bar.x is highly
selective, it will be better to join to bar first and then
DISTINCT-ify the result.  Currently, aggregation is always last;
that's not great.  Hitoshi Harada's proposed strategy of essentially
figuring out where the aggregation steps can go and then re-planning
for each one is also not great, because each join problem will be a
subtree of the one we use for the aggregate-last strategy, and thus
we're wasting effort by planning the same subtrees multiple times.
Instead, we might imagine letting grouping planner fish out the best
paths for the joinrels that represent possible aggregation points,
generate aggregation paths for each of those, and then work out what
additional rels need to be joined afterwards.  That sounds hard, but
not as hard as doing something sensible with what we have today.


Another futuristic avenue for optimization would be commuting Append and Join when joining two similarly partitioned tables.
 
I'm inclined to think that it would be useful to solve the first
problem even if we didn't solve the second one right away (but that
might be wrong).  As a preparatory step, I'm thinking it would be
sensible to split grouping_planner() into an outer function that would
handle the addition of Limit and LockRows nodes and maybe planning of
set operations, and an inner function that would handle GROUP BY,
DISTINCT, and possibly window function planning.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fix token exceeding NAMELEN
Next
From: Alvaro Herrera
Date:
Subject: Re: best place for "rtree" strategy numbers