Re: make Gather node projection-capable - Mailing list pgsql-hackers

From Robert Haas
Subject Re: make Gather node projection-capable
Msg-id CA+Tgmobet41kbVZ9pPO+q7cjvw9PpsgiL2jTaY_GFJ8JQJETQQ@mail.gmail.com
In response to Re: make Gather node projection-capable  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, Oct 25, 2015 at 11:59 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 22 October 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote:
>> If we make Gather projection-capable,
>> we can just end up with Gather->PartialSeqScan.
>
> Is there a reason not to do projection in the Gather node? I don't see one.

I don't see one either.  There may be some work that needs to be done
to get the projection to happen in the Gather node in all of the cases
where we want it to happen in the Gather node, but that's not an
argument against having the capability.

>> > That said, I don't understand Tom's comment either.  Surely the planner
>> > is going to choose to do the projection in the innermost node possible,
>> > so that the children nodes are going to do the projections most of the
>> > time.  But if for whatever reason this fails to happen, wouldn't it make
>> > more sense to do it at Gather than having to put a Result on top?
>>
>> The planner doesn't seem to choose to do projection in the innermost
>> node possible.  The final tlist only gets projected at the top of the
>> join tree.  Beneath that, it seems like we project in order to avoid
>> carrying Vars through nodes where that would be a needless expense,
>> but that's just dropping columns, not computing anything.  That having
>> been said, I don't think that takes anything away from your chain of
>> reasoning here, and I agree with your conclusion.  There seems to be
>> little reason to force a Result node atop a Gather node when we don't
>> do that for most other node types.
>
> Presumably this is a performance issue then? If we are calculating something
> *after* a join which increases rows then the query will be slower than need
> be.

I don't think there will be a performance issue in most cases because
in most cases the node immediately beneath the Gather node will be
able to do projection, which in most cases is in fact better, because
then the work gets done in the workers.  However, there may be some
cases where it is useful.  After having mulled it over, I think it's
likely that the reason why we didn't introduce a separate node for
projection is that you generally want to project to remove unnecessary
columns at the earliest stage that doesn't lose performance.  So if we
didn't have projection capabilities built into the individual nodes,
then you'd end up with things like Aggregate -> Project -> Join ->
Project -> Scan, which would start to get silly, and likely
inefficient.
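
To make the plan shapes concrete, here is a minimal toy sketch in Python. It is not PostgreSQL code: PlanNode, apply_tlist, and shape are hypothetical names invented for illustration, and only the node names come from this thread. It shows the choice under discussion: a projection-capable node takes the target list itself, while any other node needs a Result stacked on top.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical toy model of a plan tree; only the node names come from the thread.
@dataclass
class PlanNode:
    name: str
    can_project: bool
    child: Optional["PlanNode"] = None
    tlist: List[str] = field(default_factory=list)


def apply_tlist(node: PlanNode, tlist: List[str]) -> PlanNode:
    """Attach a target list to a plan, adding a Result node only if needed."""
    if node.can_project:
        # The node can evaluate the expressions itself, so no extra node is added.
        node.tlist = list(tlist)
        return node
    # Otherwise a separate Result node has to sit on top and do the projection.
    return PlanNode("Result", True, child=node, tlist=list(tlist))


def shape(node: Optional[PlanNode]) -> str:
    """Render a single-child plan chain from top to bottom."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.child
    return "->".join(names)


def gather_plan(can_project: bool) -> PlanNode:
    """Build a Gather over a PartialSeqScan, with or without projection ability."""
    return PlanNode("Gather", can_project, child=PlanNode("PartialSeqScan", True))


if __name__ == "__main__":
    print(shape(apply_tlist(gather_plan(False), ["a + 1"])))
    print(shape(apply_tlist(gather_plan(True), ["a + 1"])))
```

In this toy model the only difference between the two plans is the can_project flag on Gather, which is the capability being proposed; whether a given projection is better done in Gather or in the node beneath it is a separate planner decision that the sketch does not model.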

> I agree the rule should be to project as early as possible.

Cool.

I'm not sure Tom was really disagreeing with the idea of making Gather
projection-capable ... it seems like he may have just been saying that
there wasn't as much of a rule as I was alleging.  Which is fine: we
can decide what is best here, and I still think this is it.  Barring
further objections, I'm going to commit this, because (1) the status
quo is definitely weird because Gather is abusing the projection stuff
to come up with an extra slot, so doing nothing seems unappealing, and
(2) I need to make other changes that touch the same areas of the
code, and I want to get this stuff done quickly so that we get a
user-visible feature people can test without writing C code in the
near future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


