Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel Seq Scan
Date
Msg-id 20150218131400.GB16383@alap3.anarazel.de
In response to Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
> On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> > A query whose runtime is dominated by a sequential scan (+ attached
> > filter) is certainly going to require a bigger prefetch size than one
> > that does other expensive stuff.
> >
> > Imagine parallelizing
> > SELECT * FROM largetable WHERE col = low_cardinality_value;
> > and
> > SELECT *
> > FROM largetable JOIN gigantic_table ON (index_nestloop_condition)
> > WHERE col = high_cardinality_value;
> >
> > The first query will be a simple sequential and disk reads on largetable
> > will be the major cost of executing it.  In contrast the second query
> > might very well sensibly be planned as a parallel sequential scan with
> > the nested loop executing in the same worker. But the cost of the
> > sequential scan itself will likely be completely drowned out by the
> > nestloop execution - index probes are expensive/unpredictable.

> I think the work/task given to each worker should be as granular
> as possible to make it more predictable.
> I think a better way to parallelize such work (a join query) is for the
> first worker to do the sequential scan and filtering on the large table
> and then pass the results to the next worker to join with gigantic_table.

I'm pretty sure that'll result in rather horrible performance. IPC is
expensive; you want to do as little of it as possible.

> > >
> > > I think it makes sense to think of a set of tasks in which workers can
> > > assist.  So you have a query tree which is just one query tree, with no
> > > copies of the nodes, and then there are certain places in that query
> > > tree where a worker can jump in and assist that node.  To do that, it
> > > will have a copy of the node, but that doesn't mean that all of the
> > > stuff inside the node becomes shared data at the code level, because
> > > that would be stupid.
> >
> > My only "problem" with that description is that I think workers will
> > have to work on more than one node - it'll be entire subtrees of the
> > executor tree.

> There could be some cases where it could be beneficial for a worker
> to process a sub-tree, but I think there will be more cases where
> it will just work on part of a node and send the result back to either
> the master backend or another worker for further processing.

I think many parallelism projects start out that way, and then notice
that it doesn't parallelize very efficiently.

The most extreme, but common, example is aggregation over large amounts
of data - unless you want to ship huge amounts of data between processes
to parallelize it, you have to do the sequential scan and the
pre-aggregate step (that e.g. computes count() and sum() to implement an
avg over all the workers) inside one worker.
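
To make that concrete, here is a rough sketch of how an avg() over
largetable could be decomposed (illustrative SQL only, not actual planner
output; "col" is just a placeholder column), assuming each worker scans
only its own share of the table:

    -- each worker pre-aggregates its portion of the scan locally,
    -- shipping a single row of partial state instead of every tuple
    SELECT count(col) AS partial_count, sum(col) AS partial_sum
    FROM largetable;  -- restricted to this worker's chunk of blocks

    -- the master then combines the per-worker partials:
    --   avg = sum(partial_sum) / sum(partial_count)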

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


