Re: parallel.c is not marked as test covered - Mailing list pgsql-hackers
From | David G. Johnston |
---|---|
Subject | Re: parallel.c is not marked as test covered |
Date | |
Msg-id | CAKFQuwZKXM48mW5pjWJhm3NBjqRRYjSq9VeJpZbssztKxTKg1Q@mail.gmail.com Whole thread Raw |
In response to | Re: parallel.c is not marked as test covered (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: parallel.c is not marked as test covered
|
List | pgsql-hackers |
On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> although I fear we
>> might be getting to a level of tinkering with parallel query that
>> starts to look more like feature development.
>
> Personally, I'm +1 for such tinkering if it makes the feature either more
> controllable or more understandable. After reading the comments at the
> head of nodeGather.c, though, I don't think that single_copy is either
> understandable or useful, and merely renaming it won't help. Apparently,
> it runs code in the worker, except when it doesn't, and even when it does,
> it's absolutely guaranteed to be a performance loss because the leader is
> doing nothing. What in the world is the point?
The single_copy flag allows a Gather node to have a child plan which
is not intrinsically parallel. For example, consider these two plans:
Gather
-> Parallel Seq Scan
Gather
-> Seq Scan
The first plan is safe regardless of the setting of the single-copy
flag. If the plan is executed in every worker, the results in
aggregate across all workers will add up to the results of a
non-parallel sequential scan of the table. The second plan is safe
only if the # of workers is 1 and the single-copy flag is set. If
either of those things is not true, then more than one process might
try to execute the sequential scan, and the result will be that you'll
get N copies of the output, where N = (# of parallel workers) +
(leader also participates ? 1 : 0).
For force_parallel_mode = {on, regress}, the single-copy behavior is
essential. We can run all of those plans inside a worker, but only
because we know that the leader won't also try to run those same
plans.
The entire theory here looks whacked - and seems to fall into the "GUCs controlling results" bucket of undesirable things.
Is this GUC enabled by a compile time directive, or otherwise protected from misuse in production?
I'm having trouble sounding smart here about what is bothering me but basically the parallel infrastructure (i.e., additional workers) shouldn't even be used for "Seq Scan" and a "Seq Scan" under a Gather should behave no differently than a "Parallel Seq Scan" under a Gather where all work is done by the leader because no workers were available to help.
At worse this behavior should be an implementation artifact of force_parallel_mode={on,regress}; at best the Gather node would simply have this intelligence on, always, so as not to silently generate bogus results in a mis-configured or buggy setup.
[...]
Actually, though, the behavior I really want the single_copy flag to
embody is not so much "only one process runs this" but "leader does
not participate unless there are no workers", which is the same thing
only when the budgeted number of workers is one.
This sounds an awful lot like a planner hint; especially since it defaults to off.
This is useful
because of plans like this:
Finalize HashAggregate
-> Gather
-> Partial HashAggregate
-> Hash Join
-> Parallel Seq Scan on large_table
-> Hash
-> Seq Scan on another_large_table
Unless the # of groups is very small, the leader actually won't
perform very much of the parallel-seq-scan on large_table, because
it'll be too busy aggregating the results from the other workers.
However, if it ever reaches a point where the Gather can't read a
tuple from one of the workers immediately, which is almost certain to
occur right at the beginning of execution, it's going to go build a
copy of the hash table so that it can "help" with the hash join. By
the time it finishes, the workers will have done the same and be
feeding it results, and it will likely get little use out of the copy
that it built itself. But it will still have gone to the effort of
building it.
For 10.0, Thomas Munro has already done a bunch of work, and will be
doing more work, so that we can build a shared hash table, rather than
one copy per worker. That's going to be better when the table is
large anyway, so maybe this particular case won't matter so much. But
in general when a partial path has a substantial startup cost, it may
be better for the leader not to get involved.
So have the Gather node understand this and act accordingly.
This is also quite different than the "we'll get wrong results" problem described above but which this GUC also attempts to solve.
I'm inclined to believe three things:
1) We need a test mode whereby we guarantee at least one worker is used for processing.
2) Gather needs to be inherently smart enough to accept data from a non-parallel source.
3) Gather needs to use its knowledge (hopefully it has some) of partial plan startup costs and worker availability to decide whether it wants to participate in both hunting and gathering. It should make this decision once at startup and live with it for the duration.
The first option seems logical but doesn't actually address either of the two scenarios described as motivation for the existing GUC.
The second and third options are presented to address the two scenarios via an alternative, non-GUC, solution.
As for "we must have exactly one worker and it must perform all of the hunting, the leader shall only gather": Peter's comment seems to drive this particular use case, and it does seem to be test oriented, to prove certain capabilities are functioning correctly in parallel mode. That said I'm not positive that "and the leader does nothing" is a mandatory aspect of this need. If not having a setting like: <min_parallel_degree^D^D^D^D^D^Dworkers_per_gather> may be sufficient. I guess a <parallel_leader_gather_only=on> option could be added to control the leader's participation orthogonally.
David J.
pgsql-hackers by date: