
From Amit Kapila
Subject Re: Parallel Seq Scan
Msg-id CAA4eK1KjhZ_LrhsvicbeV46sD4M+DUMun9HmqWYwWKJZ3dnjng@mail.gmail.com
In response to Re: Parallel Seq Scan  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Parallel Seq Scan  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> On 2015-02-11 15:49:17 -0500, Robert Haas wrote:
>
> A query whose runtime is dominated by a sequential scan (+ attached
> filter) is certainly going to require a bigger prefetch size than one
> that does other expensive stuff.
>
> Imagine parallelizing
> SELECT * FROM largetable WHERE col = low_cardinality_value;
> and
> SELECT *
> FROM largetable JOIN gigantic_table ON (index_nestloop_condition)
> WHERE col = high_cardinality_value;
>
> The first query will be a simple sequential scan, and disk reads on largetable
> will be the major cost of executing it.  In contrast the second query
> might very well sensibly be planned as a parallel sequential scan with
> the nested loop executing in the same worker. But the cost of the
> sequential scan itself will likely be completely drowned out by the
> nestloop execution - index probes are expensive/unpredictable.
>
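To put rough numbers on that trade-off, here is a toy standalone C sketch
(nothing from the patch; the cost figures and the formula in
choose_chunk_pages are invented purely to show the direction of the effect):
a scan dominated by I/O gets a large chunk/prefetch size, while a scan whose
output feeds expensive per-row work gets a small one.

/* Toy illustration (not PostgreSQL code): pick how many pages a worker
 * should claim at once from an estimate of how expensive each returned
 * tuple is to process.  The cost numbers and the formula are invented
 * purely to show the shape of the trade-off described above. */
#include <stdio.h>

/* Cheap per-tuple work (plain filter) => grab big ranges so the scan
 * stays sequential; expensive per-tuple work (index nestloop probes)
 * => small ranges are fine because I/O is no longer the bottleneck. */
static int choose_chunk_pages(double per_tuple_cost, double per_page_io_cost)
{
    double ratio = per_page_io_cost / (per_tuple_cost + per_page_io_cost);

    return (int) (1 + ratio * 1024);    /* 1 .. ~1024 pages */
}

int main(void)
{
    printf("filter-only scan:   %d pages per chunk\n",
           choose_chunk_pages(0.01, 1.0));
    printf("nestloop-dominated: %d pages per chunk\n",
           choose_chunk_pages(5.0, 1.0));
    return 0;
}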

I think the work/task given to each worker should be as granular
as possible to make it more predictable.
I think the better way to parallelize such work (a join query) is for
the first worker to do the sequential scan and filtering on the large
table and then pass the result to the next worker, which performs the
join with gigantic_table.
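
To make that pipeline shape concrete, below is a minimal standalone sketch
in plain C with pthreads (it does not use PostgreSQL's background worker or
shm_mq machinery; names such as scan_worker, join_worker and row_t are made
up for illustration): one worker performs the scan plus filter and hands
qualifying rows over a bounded queue to a second worker that does the
expensive per-row join work.

/* Standalone sketch (not PostgreSQL code): one worker scans and filters a
 * "large table", a second worker consumes the filtered rows and performs
 * the join-side work.  All names and numbers are hypothetical. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define TABLE_ROWS 100000
#define QUEUE_CAP  256

typedef struct { int key; int payload; } row_t;

static row_t queue[QUEUE_CAP];
static int   q_head, q_tail, q_count;
static bool  producer_done;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  q_not_full  = PTHREAD_COND_INITIALIZER;

/* Worker 1: sequential scan + filter, analogous to "WHERE col = value". */
static void *scan_worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < TABLE_ROWS; i++)
    {
        if (i % 10 != 0)            /* the filter: keep ~10% of rows */
            continue;
        row_t r = { .key = i, .payload = i * 2 };

        pthread_mutex_lock(&q_lock);
        while (q_count == QUEUE_CAP)
            pthread_cond_wait(&q_not_full, &q_lock);
        queue[q_tail] = r;
        q_tail = (q_tail + 1) % QUEUE_CAP;
        q_count++;
        pthread_cond_signal(&q_not_empty);
        pthread_mutex_unlock(&q_lock);
    }
    pthread_mutex_lock(&q_lock);
    producer_done = true;
    pthread_cond_broadcast(&q_not_empty);
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Worker 2: consumes filtered rows and does the expensive per-row work
 * (standing in for the index probes into gigantic_table). */
static void *join_worker(void *arg)
{
    long matches = 0;
    (void) arg;
    for (;;)
    {
        row_t r;
        pthread_mutex_lock(&q_lock);
        while (q_count == 0 && !producer_done)
            pthread_cond_wait(&q_not_empty, &q_lock);
        if (q_count == 0 && producer_done)
        {
            pthread_mutex_unlock(&q_lock);
            break;
        }
        r = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_count--;
        pthread_cond_signal(&q_not_full);
        pthread_mutex_unlock(&q_lock);

        if (r.key % 3 == 0)         /* pretend the index probe matched */
            matches++;
    }
    printf("join worker saw %ld matches\n", matches);
    return NULL;
}

int main(void)
{
    pthread_t scan, join;
    pthread_create(&scan, NULL, scan_worker, NULL);
    pthread_create(&join, NULL, join_worker, NULL);
    pthread_join(scan, NULL);
    pthread_join(join, NULL);
    return 0;
}

The point of this shape is that the cheap, sequential part of the work stays
isolated in one worker while the unpredictable per-row cost lives entirely
in the other, which keeps each worker's task granular and predictable.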

> >
> > I think it makes sense to think of a set of tasks in which workers can
> > assist.  So you have a query tree which is just one query tree, with no
> > copies of the nodes, and then there are certain places in that query
> > tree where a worker can jump in and assist that node.  To do that, it
> > will have a copy of the node, but that doesn't mean that all of the
> > stuff inside the node becomes shared data at the code level, because
> > that would be stupid.
>
> My only "problem" with that description is that I think workers will
> have to work on more than one node - it'll be entire subtrees of the
> executor tree.
>

There could be some cases where it would be beneficial for a worker
to process a sub-tree, but I think there will be more cases where
it will just work on part of a node and send the result back to either
the master backend or another worker for further processing.
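
As a rough standalone illustration of that division of labour (plain C with
pthreads and a shared atomic counter, not the actual executor code;
TOTAL_BLOCKS, CHUNK_BLOCKS and the per-block row count are made up), each
worker below claims the next range of blocks of a single scan node, produces
a partial result for just its part, and the leader combines what the workers
hand back.

/* Standalone sketch (not PostgreSQL code): several workers each claim a
 * range of "blocks" from a shared counter, scan only that part of the
 * node's work, and hand their partial result back to the leader, which
 * combines them.  Names and numbers are illustrative only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define TOTAL_BLOCKS 4096
#define CHUNK_BLOCKS 64
#define NWORKERS     4

static atomic_int next_block;           /* shared scan position */

typedef struct { long partial_rows; } worker_result;

/* Each worker repeatedly claims the next chunk of blocks until none remain. */
static void *scan_part_of_node(void *arg)
{
    worker_result *res = arg;

    for (;;)
    {
        int start = atomic_fetch_add(&next_block, CHUNK_BLOCKS);
        int end;

        if (start >= TOTAL_BLOCKS)
            break;
        end = start + CHUNK_BLOCKS;
        if (end > TOTAL_BLOCKS)
            end = TOTAL_BLOCKS;
        /* "Scan" the claimed blocks: here we just pretend each block
         * yields 100 rows that pass the filter. */
        res->partial_rows += (end - start) * 100L;
    }
    return NULL;
}

int main(void)
{
    pthread_t      workers[NWORKERS];
    worker_result  results[NWORKERS] = {{0}};
    long           total = 0;

    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&workers[i], NULL, scan_part_of_node, &results[i]);

    /* The leader gathers each worker's partial result and combines them,
     * much as the master backend would consume tuples sent back to it. */
    for (int i = 0; i < NWORKERS; i++)
    {
        pthread_join(workers[i], NULL);
        total += results[i].partial_rows;
    }
    printf("combined result: %ld rows\n", total);
    return 0;
}

Whether the combining side is the master backend or another worker, the
shape is the same: each worker only ever owns a small, well-defined slice
of one node's work.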

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
