Re: parallel.c is not marked as test covered - Mailing list pgsql-hackers

From Robert Haas
Subject Re: parallel.c is not marked as test covered
Date
Msg-id CA+TgmoZfTM0H9mQm2T0hQ7pORstfyJvfrELeBvZpqo7Uv8t9Tg@mail.gmail.com
In response to Re: parallel.c is not marked as test covered  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: parallel.c is not marked as test covered  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: parallel.c is not marked as test covered  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> although I fear we
>> might be getting to a level of tinkering with parallel query that
>> starts to look more like feature development.
>
> Personally, I'm +1 for such tinkering if it makes the feature either more
> controllable or more understandable.  After reading the comments at the
> head of nodeGather.c, though, I don't think that single_copy is either
> understandable or useful, and merely renaming it won't help.  Apparently,
> it runs code in the worker, except when it doesn't, and even when it does,
> it's absolutely guaranteed to be a performance loss because the leader is
> doing nothing.  What in the world is the point?

The single_copy flag allows a Gather node to have a child plan which
is not intrinsically parallel.  For example, consider these two plans:

Gather
-> Parallel Seq Scan

Gather
-> Seq Scan

The first plan is safe regardless of the setting of the single-copy
flag.  If the plan is executed in every worker, the results in
aggregate across all workers will add up to the results of a
non-parallel sequential scan of the table.  The second plan is safe
only if the # of workers is 1 and the single-copy flag is set.  If
either of those things is not true, then more than one process might
try to execute the sequential scan, and the result will be that you'll
get N copies of the output, where N = (# of parallel workers) +
(leader also participates ? 1 : 0).
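To make that bookkeeping concrete, here is a toy model in plain Python (not PostgreSQL code; the table, worker count, and round-robin claiming are invented for illustration) of why a parallel-aware scan composes across workers while a plain scan duplicates its output:

```python
from itertools import count

TABLE = list(range(8))      # pretend each element is one heap block
NUM_WORKERS = 3

# Parallel-aware scan: workers claim blocks from a shared counter, so each
# block is scanned by exactly one process and the per-worker outputs add up
# to exactly one scan of the table.
shared = count()
aware = [[] for _ in range(NUM_WORKERS)]
done = False
while not done:
    for w in range(NUM_WORKERS):
        block = next(shared)
        if block >= len(TABLE):
            done = True
            break
        aware[w].append(TABLE[block])

# Non-parallel-aware scan: every worker reads the whole table, so N
# participating processes produce N copies of the output.
naive = [list(TABLE) for _ in range(NUM_WORKERS)]

print(sum(len(r) for r in aware))   # one copy of the table
print(sum(len(r) for r in naive))   # NUM_WORKERS copies
```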

For force_parallel_mode = {on, regress}, the single-copy behavior is
essential.  We can run all of those plans inside a worker, but only
because we know that the leader won't also try to run those same
plans.

But it might be useful in other cases too.  For example, imagine a
plan like this:

Join
-> Join
  -> Join
    -> Join
      -> Gather (single copy)
        -> Join
          -> Join
            -> Join
              -> Join
                -> Scan (not parallel aware)

This is pipeline parallelism.  Instead of having one process do all of
the joins, you can have a worker do some subset of them and then send
the outputs back to the leader which can do the rest and return the
results to the client.  This is actually kind of hard to get right -
according to the literature I've read on parallel query - because you
can get pipeline stalls that erase most or all of the benefit, but
it's a possible area to explore.
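A minimal sketch of that shape, using a Python thread and a bounded queue to stand in for a worker and its tuple queue (the row data, buffer size, and tag names are all mine, purely illustrative, not PostgreSQL code):

```python
import threading
import queue

rows = [(i, i * 10) for i in range(5)]   # output of the joins below the Gather
q = queue.Queue(maxsize=2)               # bounded buffer, like a shared tuple queue
SENTINEL = None

def worker():
    # The worker runs the lower joins and ships each tuple up to the leader.
    for r in rows:
        q.put(r + ('lower-joins',))
    q.put(SENTINEL)

t = threading.Thread(target=worker)
t.start()

result = []
while True:
    item = q.get()       # a pipeline stall: the leader blocks if the worker lags
    if item is SENTINEL:
        break
    result.append(item + ('upper-joins',))   # leader runs the joins above the Gather
t.join()

print(len(result))
```

The `maxsize=2` buffer is where the stall risk lives: if either side outpaces the other, it blocks on the queue and the pipeline's benefit shrinks.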

Actually, though, the behavior I really want the single_copy flag to
embody is not so much "only one process runs this" but "leader does
not participate unless there are no workers", which is the same thing
only when the budgeted number of workers is one.  This is useful
because of plans like this:

Finalize HashAggregate
-> Gather
  -> Partial HashAggregate
    -> Hash Join
      -> Parallel Seq Scan on large_table
      -> Hash
        -> Seq Scan on another_large_table

Unless the # of groups is very small, the leader actually won't
perform very much of the parallel-seq-scan on large_table, because
it'll be too busy aggregating the results from the other workers.
However, if it ever reaches a point where the Gather can't read a
tuple from one of the workers immediately, which is almost certain to
occur right at the beginning of execution, it's going to go build a
copy of the hash table so that it can "help" with the hash join.  By
the time it finishes, the workers will have done the same and be
feeding it results, and it will likely get little use out of the copy
that it built itself.  But it will still have gone to the effort of
building it.
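Some back-of-the-envelope arithmetic (every number here is invented, just to show the shape of the problem) for how much of the leader's time that redundant build can eat:

```python
workers = 2
build_ticks = 100   # startup cost: time to build one copy of the hash table
probe_ticks = 120   # total probe work left once the hash tables exist

# Leader and workers all spend build_ticks building their private copies in
# parallel; the probe work is then split across workers + leader.
leader_useful = probe_ticks / (workers + 1)
leader_wasted = build_ticks / (build_ticks + leader_useful)
print(round(leader_wasted, 2))   # most of the leader's time went into a
                                 # hash table it barely used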

For 10.0, Thomas Munro has already done a bunch of work, and will be
doing more work, so that we can build a shared hash table, rather than
one copy per worker.  That's going to be better when the table is
large anyway, so maybe this particular case won't matter so much.  But
in general when a partial path has a substantial startup cost, it may
be better for the leader not to get involved.  In a case like this,
it's hard to see how the leader's involvement can ever hurt:

Finalize HashAggregate
-> Gather
  -> Partial HashAggregate
    -> Nested Loop
      -> Parallel Seq Scan on large_table
      -> Index Scan on some_other_table

Even if the leader processes only one or two pages of
large_table, there's no real harm done unless, I suppose, the combine
function is fabulously expensive, which seems unlikely.  The lack of
harm stems directly from the fact that there's no startup cost here.
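The same toy arithmetic as in the hash-join case, with the startup cost set to zero (numbers again invented), shows why the nested-loop plan is benign:

```python
workers = 2
startup_ticks = 0   # nested loop: nothing to build before probing begins
scan_ticks = 120    # total parallel-seq-scan work, split among processes

leader_useful = scan_ticks / (workers + 1)
leader_wasted = startup_ticks / (startup_ticks + leader_useful)
print(leader_wasted)   # every page the leader claims is pure gain
```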

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


