Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [DESIGN] ParallelAppend
Date	August 25, 2015 03:53:58
Msg-id	CAA4eK1LNt6wQBCxKsMj_QC+GahBuwyKWsQn6UL3nWVQ2savzwg@mail.gmail.com
In response to	Re: [DESIGN] ParallelAppend (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List	pgsql-hackers

On Tue, Aug 25, 2015 at 6:19 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > On Fri, Aug 21, 2015 at 7:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > It could be possible, but let me summarize what I thought would be required
> > for above use case. For Parallel Append, we need to push multiple
> > planned statements in contrast to one planned statement as is done for
> > current patch and then one or more parallel workers needs to work on each
> > planned statement. So if we know in advance how many planned statements
> > are we passing down (which we should), then using ParallelWorkerNumber
> > (ParallelWorkerNumber % num_planned_statements or some other similar
> > way), workers can find the planned statement on which they need to work
> > and similarly information for PartialSeqScan (which currently is parallel heap
> > scan descriptor information).
> >
> My problem is that we have no identifier to point a particular element on
> the TOC segment even if PARALLEL_KEY_PLANNEDSTMT or PARALLEL_KEY_SCAN can
> have multiple items.
> Please assume a situation when ExecPartialSeqScan() has to lookup
> a particular item on TOC but multiple PartialSeqScan nodes can exist.
>
> Currently, it does:
> pscan = shm_toc_lookup(node->ss.ps.toc, PARALLEL_KEY_SCAN);
>
> However, ExecPartialSeqScan() cannot know which is the index of mine,
> or it is not reasonable to pay attention on other node in this level.
> Even if PARALLEL_KEY_SCAN has multiple items, PartialSeqScan node also
> needs to have identifier.
>

Yes, that's right, and I think we can find that out. Basically, we need to
know the number of the planned statement on which the current worker is
working, and we have to determine that anyway before the worker can start
its work. One way, as I explained above, is to use ParallelWorkerNumber
(ParallelWorkerNumber % num_planned_statements); we might need some more
sophisticated way to find it out, but we definitely need to know it before
the worker starts execution. Once we know it, we can use it to find the
PARALLEL_KEY_SCAN entry (or whatever key) for this worker, as the position
of the PARALLEL_KEY_SCAN entry will be the same as that of the planned
statement for that worker.
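
To make the modulo idea concrete, here is a minimal sketch of the mapping.
It is only an illustration: planned_stmt_index_for_worker() is a
hypothetical helper, not part of any patch, and its worker_number argument
stands in for ParallelWorkerNumber.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): pick the planned statement a
 * worker should execute. worker_number stands in for ParallelWorkerNumber,
 * and num_planned_statements is the count pushed down by the master.
 *
 * The same index would then be used to find the matching scan descriptor,
 * assuming the descriptors are stored in the same order as the planned
 * statements.
 */
static int
planned_stmt_index_for_worker(int worker_number, int num_planned_statements)
{
    assert(worker_number >= 0);
    assert(num_planned_statements > 0);

    return worker_number % num_planned_statements;
}
```

With three planned statements and five workers, workers 0 through 4 would
map to statements 0, 1, 2, 0 and 1, so the first two statements get two
workers each.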

> > > I think KaiGai's correct,
> > > and I pointed out the same problem to you before. The parallel key
> > > for the Partial Seq Scan needs to be allocated on the fly and carried
> > > in the node, or we'll never be able to push multiple things below the
> > > funnel.
> >
> > Okay, immediately I don't see what is the best way to achieve this but
> > let us discuss this separately on Parallel Seq Scan thread and let me
> > know if you have something specific in your mind. I will also give this
> > a more thought.
> >
> I want to have 'node_id' in the Plan node, then unique identifier is
> assigned on the field prior to serialization. It is a property of the
> Plan node, so we can reproduce this identifier on the background worker
> side using stringToNode(), then ExecPartialSeqScan can pull out a proper
> field from the TOC segment by this node_id.

Okay, this can also work, but why introduce an identifier in the plan node
if it can work without one?
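
For comparison, the node_id approach could look roughly like the sketch
below. It deliberately uses a toy table rather than the real shm_toc API,
and every name in it (ToyToc, toy_toc_insert, toy_toc_lookup, toy_scan_key,
TOY_KEY_SCAN_BASE) is made up for illustration. The assumption is that each
scan node's entry is stored under a key derived from its node_id.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical base value for per-node scan keys; not a real PostgreSQL key. */
#define TOY_KEY_SCAN_BASE UINT64_C(0xE000000000000000)
#define TOY_TOC_MAX_ENTRIES 8

/* Toy stand-in for a table of contents: (key, address) pairs. */
typedef struct ToyTocEntry
{
    uint64_t    key;
    void       *address;
} ToyTocEntry;

typedef struct ToyToc
{
    size_t      nentries;
    ToyTocEntry entries[TOY_TOC_MAX_ENTRIES];
} ToyToc;

/* Derive a per-node key from the (assumed) node_id field of the plan node. */
static uint64_t
toy_scan_key(int node_id)
{
    return TOY_KEY_SCAN_BASE + (uint64_t) node_id;
}

/* Register an entry; returns 1 on success, 0 if the table is full. */
static int
toy_toc_insert(ToyToc *toc, uint64_t key, void *address)
{
    if (toc->nentries >= TOY_TOC_MAX_ENTRIES)
        return 0;
    toc->entries[toc->nentries].key = key;
    toc->entries[toc->nentries].address = address;
    toc->nentries++;
    return 1;
}

/* Find the entry for a key; returns NULL if no such key was registered. */
static void *
toy_toc_lookup(const ToyToc *toc, uint64_t key)
{
    size_t      i;

    for (i = 0; i < toc->nentries; i++)
    {
        if (toc->entries[i].key == key)
            return toc->entries[i].address;
    }
    return NULL;
}
```

Under this scheme each scan node would look up its own entry with
toy_toc_lookup(toc, toy_scan_key(node_id)) and would not need to know
anything about the other nodes below the funnel.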

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
