
From Amit Kapila
Subject Re: Parallel Seq Scan
Msg-id CAA4eK1KjhZ_LrhsvicbeV46sD4M+DUMun9HmqWYwWKJZ3dnjng@mail.gmail.com
In response to Re: Parallel Seq Scan  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Parallel Seq Scan  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> On 2015-02-11 15:49:17 -0500, Robert Haas wrote:
>
> A query whose runtime is dominated by a sequential scan (+ attached
> filter) is certainly going to require a bigger prefetch size than one
> that does other expensive stuff.
>
> Imagine parallelizing
> SELECT * FROM largetable WHERE col = low_cardinality_value;
> and
> SELECT *
> FROM largetable JOIN gigantic_table ON (index_nestloop_condition)
> WHERE col = high_cardinality_value;
>
> The first query will be a simple sequential scan, and disk reads on largetable
> will be the major cost of executing it.  In contrast the second query
> might very well sensibly be planned as a parallel sequential scan with
> the nested loop executing in the same worker. But the cost of the
> sequential scan itself will likely be completely drowned out by the
> nestloop execution - index probes are expensive/unpredictable.
>
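To put rough numbers on that trade-off, here is a toy standalone C sketch
(nothing from the patch; the cost figures and the formula in
choose_chunk_pages are invented purely to show the direction of the effect):
a scan dominated by I/O gets a large chunk/prefetch size, while a scan whose
output feeds expensive per-row work gets a small one.

/* Toy illustration (not PostgreSQL code): pick how many pages a worker
 * should claim at once from an estimate of how expensive each returned
 * tuple is to process.  The cost numbers and the formula are invented
 * purely to show the shape of the trade-off described above. */
#include <stdio.h>

/* Cheap per-tuple work (plain filter) => grab big ranges so the scan
 * stays sequential; expensive per-tuple work (index nestloop probes)
 * => small ranges are fine because I/O is no longer the bottleneck. */
static int choose_chunk_pages(double per_tuple_cost, double per_page_io_cost)
{
    double ratio = per_page_io_cost / (per_tuple_cost + per_page_io_cost);

    return (int) (1 + ratio * 1024);    /* 1 .. ~1024 pages */
}

int main(void)
{
    printf("filter-only scan:   %d pages per chunk\n",
           choose_chunk_pages(0.01, 1.0));
    printf("nestloop-dominated: %d pages per chunk\n",
           choose_chunk_pages(5.0, 1.0));
    return 0;
}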

I think the work/task given to each worker should be as granular
as possible to make it more predictable.
I think the better way to parallelize such work (a join query) is for
the first worker to do the sequential scan and filtering on the large
table and then pass the result to the next worker, which performs the
join with gigantic_table.
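
To make that pipeline shape concrete, below is a minimal standalone sketch
in plain C with pthreads (it does not use PostgreSQL's background worker or
shm_mq machinery; names such as scan_worker, join_worker and row_t are made
up for illustration): one worker performs the scan plus filter and hands
qualifying rows over a bounded queue to a second worker that does the
expensive per-row join work.

/* Standalone sketch (not PostgreSQL code): one worker scans and filters a
 * "large table", a second worker consumes the filtered rows and performs
 * the join-side work.  All names and numbers are hypothetical. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define TABLE_ROWS 100000
#define QUEUE_CAP  256

typedef struct { int key; int payload; } row_t;

static row_t queue[QUEUE_CAP];
static int   q_head, q_tail, q_count;
static bool  producer_done;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  q_not_full  = PTHREAD_COND_INITIALIZER;

/* Worker 1: sequential scan + filter, analogous to "WHERE col = value". */
static void *scan_worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < TABLE_ROWS; i++)
    {
        if (i % 10 != 0)            /* the filter: keep ~10% of rows */
            continue;
        row_t r = { .key = i, .payload = i * 2 };

        pthread_mutex_lock(&q_lock);
        while (q_count == QUEUE_CAP)
            pthread_cond_wait(&q_not_full, &q_lock);
        queue[q_tail] = r;
        q_tail = (q_tail + 1) % QUEUE_CAP;
        q_count++;
        pthread_cond_signal(&q_not_empty);
        pthread_mutex_unlock(&q_lock);
    }
    pthread_mutex_lock(&q_lock);
    producer_done = true;
    pthread_cond_broadcast(&q_not_empty);
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Worker 2: consumes filtered rows and does the expensive per-row work
 * (standing in for the index probes into gigantic_table). */
static void *join_worker(void *arg)
{
    long matches = 0;
    (void) arg;
    for (;;)
    {
        row_t r;
        pthread_mutex_lock(&q_lock);
        while (q_count == 0 && !producer_done)
            pthread_cond_wait(&q_not_empty, &q_lock);
        if (q_count == 0 && producer_done)
        {
            pthread_mutex_unlock(&q_lock);
            break;
        }
        r = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_count--;
        pthread_cond_signal(&q_not_full);
        pthread_mutex_unlock(&q_lock);

        if (r.key % 3 == 0)         /* pretend the index probe matched */
            matches++;
    }
    printf("join worker saw %ld matches\n", matches);
    return NULL;
}

int main(void)
{
    pthread_t scan, join;
    pthread_create(&scan, NULL, scan_worker, NULL);
    pthread_create(&join, NULL, join_worker, NULL);
    pthread_join(scan, NULL);
    pthread_join(join, NULL);
    return 0;
}

The point of this shape is that the cheap, sequential part of the work stays
isolated in one worker while the unpredictable per-row cost lives entirely
in the other, which keeps each worker's task granular and predictable.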

> >
> > I think it makes sense to think of a set of tasks in which workers can
> > assist.  So you have a query tree which is just one query tree, with no
> > copies of the nodes, and then there are certain places in that query
> > tree where a worker can jump in and assist that node.  To do that, it
> > will have a copy of the node, but that doesn't mean that all of the
> > stuff inside the node becomes shared data at the code level, because
> > that would be stupid.
>
> My only "problem" with that description is that I think workers will
> have to work on more than one node - it'll be entire subtrees of the
> executor tree.
>

There could be some cases where it would be beneficial for a worker
to process a sub-tree, but I think there will be more cases where
it will just work on part of a node and send the result back to either
the master backend or another worker for further processing.
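
As a rough standalone illustration of that division of labour (plain C with
pthreads and a shared atomic counter, not the actual executor code;
TOTAL_BLOCKS, CHUNK_BLOCKS and the per-block row count are made up), each
worker below claims the next range of blocks of a single scan node, produces
a partial result for just its part, and the leader combines what the workers
hand back.

/* Standalone sketch (not PostgreSQL code): several workers each claim a
 * range of "blocks" from a shared counter, scan only that part of the
 * node's work, and hand their partial result back to the leader, which
 * combines them.  Names and numbers are illustrative only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define TOTAL_BLOCKS 4096
#define CHUNK_BLOCKS 64
#define NWORKERS     4

static atomic_int next_block;           /* shared scan position */

typedef struct { long partial_rows; } worker_result;

/* Each worker repeatedly claims the next chunk of blocks until none remain. */
static void *scan_part_of_node(void *arg)
{
    worker_result *res = arg;

    for (;;)
    {
        int start = atomic_fetch_add(&next_block, CHUNK_BLOCKS);
        int end;

        if (start >= TOTAL_BLOCKS)
            break;
        end = start + CHUNK_BLOCKS;
        if (end > TOTAL_BLOCKS)
            end = TOTAL_BLOCKS;
        /* "Scan" the claimed blocks: here we just pretend each block
         * yields 100 rows that pass the filter. */
        res->partial_rows += (end - start) * 100L;
    }
    return NULL;
}

int main(void)
{
    pthread_t      workers[NWORKERS];
    worker_result  results[NWORKERS] = {{0}};
    long           total = 0;

    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&workers[i], NULL, scan_part_of_node, &results[i]);

    /* The leader gathers each worker's partial result and combines them,
     * much as the master backend would consume tuples sent back to it. */
    for (int i = 0; i < NWORKERS; i++)
    {
        pthread_join(workers[i], NULL);
        total += results[i].partial_rows;
    }
    printf("combined result: %ld rows\n", total);
    return 0;
}

Whether the combining side is the master backend or another worker, the
shape is the same: each worker only ever owns a small, well-defined slice
of one node's work.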

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
