Re: WIP: Upper planner pathification - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WIP: Upper planner pathification
Date
Msg-id 10131.1456844527@sss.pgh.pa.us
In response to Re: WIP: Upper planner pathification  (Greg Stark <stark@mit.edu>)
Responses Re: WIP: Upper planner pathification  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Greg Stark <stark@mit.edu> writes:
> On Tue, Mar 1, 2016 at 2:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are a couple of
>> regression test cases that change plans for the better, but it's sort of
>> accidental.  Those cases look like
>> 
>> select d.* from d left join (select * from b group by b.id, b.c_id) s
>> on d.a = s.id;
>> 
>> and what happens in HEAD is that the subquery chooses a hashagg plan
>> and then the upper query decides a mergejoin would be a good idea ...
>> so it has to sort the output of the hashagg.  With the patch, what
>> comes back from the subquery is a Path for the hashagg and a Path
>> for doing the GROUP BY with Sort/Uniq.  The second path is more expensive,
>> but it survives the add_path tournament because it can produce sorted
>> output.  Then the outer level discovers that it can use that to do its
>> mergejoin without a separate sort step, and that way is cheaper overall.

> This doesn't sound accidental at all. It sounds like a perfect example
> of exactly the benefits of this approach.

Well, my point is that no such path would have been generated if the
subquery hadn't had an internal reason to consider sorting on b.id.
The "accidental" part of this is that the subquery's GROUP BY key
matches what the outer query needs as a mergejoin key.
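The tournament described above can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL's add_path(). SimplePath, simple_add_path and cheapest_mergejoin_input are hypothetical names, and a single sorted_on_key flag stands in for real pathkeys. The real add_path() also weighs startup cost, parameterization and more. The costs used below (100 for the hashagg, 120 for Sort/Unique, 50 for an explicit sort) are made-up numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATHS 8

/* Hypothetical, much-simplified stand-in for a planner Path: a total
 * cost plus a flag saying whether the output is already sorted on the
 * key the outer mergejoin will need (a stand-in for real pathkeys). */
typedef struct SimplePath
{
    const char *name;
    double      total_cost;
    bool        sorted_on_key;
} SimplePath;

/* a dominates b if a is no more expensive and its ordering is at least
 * as useful (sorted beats unsorted, never the reverse). */
static bool
path_dominates(const SimplePath *a, const SimplePath *b)
{
    return a->total_cost <= b->total_cost &&
           (a->sorted_on_key || !b->sorted_on_key);
}

/* Simplified "add_path tournament": keep only paths that no other path
 * dominates.  Returns the new number of paths in list. */
static size_t
simple_add_path(SimplePath *list, size_t n, SimplePath new_path)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (path_dominates(&list[i], &new_path))
            return n;               /* new path loses outright */

    for (size_t i = 0; i < n; i++)
        if (!path_dominates(&new_path, &list[i]))
            list[kept++] = list[i]; /* old path survives */

    assert(kept < MAX_PATHS);       /* caller's array must have room */
    list[kept++] = new_path;
    return kept;
}

/* Outer level: a mergejoin input costs the path itself plus an explicit
 * sort step if the path's output is not already ordered. */
static const SimplePath *
cheapest_mergejoin_input(const SimplePath *list, size_t n, double sort_cost)
{
    const SimplePath *best = NULL;
    double      best_cost = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        double c = list[i].total_cost +
                   (list[i].sorted_on_key ? 0.0 : sort_cost);

        if (best == NULL || c < best_cost)
        {
            best = &list[i];
            best_cost = c;
        }
    }
    return best;
}
```

Feeding in a hashagg path (100, unsorted) and then a Sort/Unique path (120, sorted), both survive: the second is dearer but offers an ordering the first lacks. The outer level then prefers the sorted path at 120 over 100 + 50 = 150. The sorted candidate exists only because the subquery had its own reason to consider sorting on its GROUP BY key, which is the "accidental" part.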


> (Actually the first hunk in the patch kind of surprised me. Do we dump
> node trees with -> notation currently? I thought they normally all
> looked like S-expressions.)

I chose in 19a541143 to not make PathTarget be a subclass of Node,
so that's kind of forced --- we can't print it by recursing to
_outNode().  We could change that but I'm not sure it would be an
improvement.  The restarget fields are embedded in RelOptInfo, not
sub-nodes of it, so pretending that they're independent nodes seems
a bit phony in its own way.  I'm not wedded to that reasoning though;
if people are more concerned about what pprint() output looks like,
we can change it.  Or we could make restarget actually be a subnode,
at the cost of one more palloc per RelOptInfo.
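A toy C model makes the trade-off concrete. It compares two layouts: PathTarget embedded by value inside RelOptInfo, and PathTarget as a separately allocated, tagged subnode. Every name here (ToyRelOptInfoA/B, ToyPathTargetA/B, toy_palloc, the tag enum) is hypothetical. The model is only loosely patterned on PostgreSQL's convention that every Node begins with a NodeTag field. It sketches the idea, not the backend's actual structs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter standing in for palloc() bookkeeping. */
static int toy_alloc_count = 0;

static void *
toy_palloc(size_t size)
{
    toy_alloc_count++;
    return calloc(1, size);     /* zeroed memory, like palloc0 */
}

/* Tagged structs start with this, so generic code can dispatch on them. */
typedef enum { T_ToyPathTarget = 1, T_ToyRelOptInfo } ToyNodeTag;

/* Layout A: the target is a plain struct embedded by value (no tag). */
typedef struct { double cost; int width; } ToyPathTargetA;
typedef struct
{
    ToyNodeTag     type;
    double         rows;
    ToyPathTargetA reltarget;   /* part of the parent, not a node */
} ToyRelOptInfoA;

/* Layout B: the target is its own tagged node, reached via a pointer. */
typedef struct { ToyNodeTag type; double cost; int width; } ToyPathTargetB;
typedef struct
{
    ToyNodeTag      type;
    double          rows;
    ToyPathTargetB *reltarget;  /* separately allocated subnode */
} ToyRelOptInfoB;

static ToyRelOptInfoA *
make_rel_a(void)
{
    ToyRelOptInfoA *rel = toy_palloc(sizeof(*rel)); /* one allocation covers both */

    assert(rel != NULL);
    rel->type = T_ToyRelOptInfo;
    return rel;
}

static ToyRelOptInfoB *
make_rel_b(void)
{
    ToyRelOptInfoB *rel = toy_palloc(sizeof(*rel));

    assert(rel != NULL);
    rel->type = T_ToyRelOptInfo;
    rel->reltarget = toy_palloc(sizeof(*rel->reltarget)); /* the extra allocation */
    assert(rel->reltarget != NULL);
    rel->reltarget->type = T_ToyPathTarget;
    return rel;
}

/* Generic code can only dispatch on structs whose first field is a tag. */
static ToyNodeTag
toy_node_tag(const void *node)
{
    return *(const ToyNodeTag *) node;
}
```

In layout A, a generic printer has no tag to dispatch on for the embedded target, so the parent's output routine must format those fields inline. Layout B gives the target a dispatchable tag at the price of a second allocation per relation.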
        regards, tom lane


