Re: Parallel Seq Scan - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: Parallel Seq Scan
Msg-id: CAA4eK1K=P4cqpv_CU-0DNd93wv1KBKgvJ5rds3ZCDsQRpJPyrA@mail.gmail.com
In response to: Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Mon, Dec 8, 2014 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Dec 6, 2014 at 7:07 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > For my 2c, I'd like to see it support exactly what the SeqScan node
> > supports and then also what Foreign Scan supports.  That would mean we'd
> > then be able to push filtering down to the workers which would be great.
> > Even better would be figuring out how to parallelize an Append node
> > (perhaps only possible when the nodes underneath are all SeqScan or
> > ForeignScan nodes) since that would allow us to then parallelize the
> > work across multiple tables and remote servers.
>
> I don't see how we can support the stuff ForeignScan does; presumably
> any parallelism there is up to the FDW to implement, using whatever
> in-core tools we provide.  I do agree that parallelizing Append nodes
> is useful; but let's get one thing done first before we start trying
> to do thing #2.
>
> > I'm not entirely following this.  How can the worker be responsible for
> > its own "plan" when the information passed to it (per the above
> > paragraph..) is pretty minimal?  In general, I don't think we need to
> > have specifics like "this worker is going to do exactly X" because we
> > will eventually need some communication to happen between the worker and
> > the master process where the worker can ask for more work because it's
> > finished what it was tasked with and the master will need to give it
> > another chunk of work to do.  I don't think we want exactly what each
> > worker process will do to be fully formed at the outset because, even
> > with the best information available, given concurrent load on the
> > system, it's not going to be perfect and we'll end up starving workers.
> > The plan, as formed by the master, should be more along the lines of
> > "this is what I'm gonna have my workers do" along w/ how many workers,
> > etc, and then it goes and does it.  Perhaps for an 'explain analyze' we
> > return information about which workers actually *did* what, but that's a
> > whole different discussion.
>
> I agree with this.  For a first version, I think it's OK to start a
> worker up for a particular sequential scan and have it help with that
> sequential scan until the scan is completed, and then exit.  It should
> not, as the present version of the patch does, assign a fixed block
> range to each worker; instead, workers should allocate a block or
> chunk of blocks to work on until no blocks remain.  That way, even if
> every worker but one gets stuck, the rest of the scan can still
> finish.
>

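If I understand correctly, that amounts to workers claiming blocks from
a shared counter instead of being handed a fixed range up front.  A
rough sketch of such an allocator is below (the struct and function
names are placeholders on my side, not code from the patch):

#include "port/atomics.h"
#include "storage/block.h"

/* Placeholder shared state, living in the parallel shared memory. */
typedef struct ParallelScanShared
{
    pg_atomic_uint32 next_block;    /* next block number to hand out */
    uint32           nblocks;       /* total blocks in the relation */
} ParallelScanShared;

/*
 * The master would initialize this before launching workers:
 *     pg_atomic_init_u32(&shared->next_block, 0);
 *     shared->nblocks = RelationGetNumberOfBlocks(rel);
 *
 * Each worker then claims one block at a time.  claim_next_block
 * returns false once every block has been handed out, so even if all
 * the other workers get stuck, any worker that keeps calling it can
 * finish the scan.
 */
static bool
claim_next_block(ParallelScanShared *shared, BlockNumber *blkno)
{
    uint32      b = pg_atomic_fetch_add_u32(&shared->next_block, 1);

    if (b >= shared->nblocks)
        return false;
    *blkno = (BlockNumber) b;
    return true;
}

Handing out a small chunk of blocks at a time (adding a chunk size
rather than 1) would reduce contention on the counter, at some cost in
allocation granularity.
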
I will check on this point and see if it is feasible to do something
along those lines.  Currently we set the scan limits at Executor
initialization time, and during the ExecutorRun phase we use
heap_getnext to fetch the tuples accordingly; allocating blocks
dynamically instead means that at ExecutorRun time we would need to
keep resetting the scan limit to whichever page or pages the worker
should scan next.  I still have to check whether there is any problem
with such an idea; roughly, the worker-side loop would then look like
the sketch below.  Do you have a different idea in mind?
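
(Again only a sketch: it assumes the claim_next_block() from above,
that a heap_setscanlimits()-style call can be re-applied after
heap_rescan(), and process_tuple() just stands in for the usual qual
checking and projection.)

BlockNumber blkno;

while (claim_next_block(shared, &blkno))
{
    HeapTuple   tuple;

    /* Reset the scan, then aim it at just the claimed block ... */
    heap_rescan(scan, NULL);
    heap_setscanlimits(scan, blkno, 1);

    /* ... and fetch that block's tuples as usual. */
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
        process_tuple(tuple);
}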


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
