Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+TgmoYhHGOh6ktTJWMW6R9Fi8qhZzHBAf+Cfe5TdXWnnZ92cQ@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Sep 30, 2015 at 11:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> - I don't believe that shm_toc *toc has any business being part of a
>> generic PlanState node.  At most, it should be part of an individual
>> type of PlanState, like a GatherState or PartialSeqScanState.  But
>> really, I don't see why we need it there at all.
>
> We need it for getting parallelheapscan descriptor in case of
> partial sequence scan node, it doesn't seem like a good idea
> to retrieve it in begining, as we need to dig into plan tree to
> get the node_id for getting the value of parallelheapscan descriptor
> from toc.
>
> Now, I think we can surely keep it in PartialSeqScanState or any
> other node state which might need it later, but I felt this is quite
> generic and we might need to fetch node specific information from toc
> going forward.

It's true that the PartialSeqScanState will need a way to get at the
toc, but I don't think that means we should stash it in the PlanState.
I've taken that part out for now.

>> - I think that a Gather node should inherit from Plan, not Scan.  A
>> Gather node really shouldn't have a scanrelid.  Now, admittedly, if
>> the only thing under the Gather is a Partial Seq Scan, it wouldn't be
>> totally bonkers to think of the Gather as scanning the same relation
>> that the Partial Seq Scan is scanning.  But in any more complex case,
>> like where it's scanning a join, you're out of luck.  You'll have to
>> set scanrelid == 0, I suppose, but then, for example, ExecScanReScan
>> is not going to work.  In fact, as far as I can see, the only way
>> nodeGather.c is actually using any of the generic scan stuff is by
>> calling ExecInitScanTupleSlot, which is all of one line of code.
>> ExecEndGather fetches node->ss.ss_currentRelation but then does
>> nothing with it.  So I think this is just a holdover from early
>> version of this patch where what's now Gather and PartialSeqScan were
>> a single node, and I think we should rip it out.
>
> makes sense and I think GatherState should also be inherit from PlanState
> instead of ScanState which I have changed in patch attached.

You missed a number of things while doing this - I cleaned them up.

>> - Also, I think this stuff about physical tlists in
>> create_gather_plan() is bogus.  use_physical_tlist is ignorant of the
>> possibility that the RelOptInfo passed to it might be anything other
>> than a baserel, and I think it won't be happy if it gets a joinrel.
>> Moreover, I think our plan here is that, at least for now, the
>> Gather's tlist will always match the tlist of its child.  If that's
>> so, there's no point to this: it will end up with the same tlist
>> either way.  If any projection is needed, it should be done by the
>> Gather node's child, not the Gather node itself.
>
> Yes, Gather node itself doesn't need to do projection, but it
> needs the projection info to store the same in Slot after fetching
> the tuple from tuple queue.  Now this is not required for Gather
> node itself, but it might be required for any node on top of
> Gather node.
>
> Here, I think one thing we could do is that use the subplan's target
> list as currently is being done for quals.  The only risk is what if
> Gating node is added on top of partialseqscan (subplan), but I have checked
> that is safe, because Gating plan uses the same target list as it's child.
> Also I don't think we need to process any quals at Gather node, so I will
> make that as Null, I will do this change in next version unless you see
> any problem with it.
>
> Yet another idea is during set_plan_refs(), we can assign leftchild's
> target list to parent in case of Gather node (right now it's done in
> reverse way which needs to be changed.)
>
> What is your preference?

I made it work like other nodes that inherit their left child's target list.

I made a few other changes as well:

- I wrote documentation for the GUCs.  This probably needs to be
expanded once we get the whole feature in, but it's something.

- I added a new single_copy option to the gather.  A single-copy
gather never tries to execute the plan itself, unless it can't get any
workers.  This is very handy for testing, since it lets you stick a
Gather node on top of an arbitrary plan and, if everything's working,
it should work just as if the Gather node weren't there.  I did a bit
of minor fiddling with the contents of the GatherState to make this
work.  It's also useful in real life, since somebody can stick a
single-copy Gather node into a plan someplace and run everything below
that in a worker.

- I fixed a bug in ExecGather - you were testing whether
node->pei->pcxt is NULL, which seg faults on the first time through.
The correct thing is to node->pei.

- Assorted cosmetic changes.

- I again left out the early-executor-stop stuff, preferring to leave
that for a separate commit.

That done, I have committed this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!
Next
From: Tomas Vondra
Date:
Subject: Re: PATCH: index-only scans with partial indexes