Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] why not parallel seq scan for slow functions
Date
Msg-id CA+TgmoahVf2vrAqPtxcZ44qzGCDgNpfViizqUmLebOkpqikJ4Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] why not parallel seq scan for slow functions
List pgsql-hackers
On Wed, Jan 31, 2018 at 11:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I can see what you have in mind, but I think we still need to change
> the parallel safety flag of the path if  *_target is not parallel safe
> either inside apply_projection_to_path or may be outside where it is
> called.

Hmm.  You have a point.

That's not the only problem, though.  With these changes, we need to
check every place where  apply_projection_to_path is used to see
whether we're losing the ability to push down the target list below
Gather (Merge) in some situations.  If we are, then we need to see if
we can supply the correct target list to the code that generates the
partial path, before Gather (Merge) is added.

There are the following call sites:

* grouping_planner.
* create_ordered_paths (x2).
* adjust_path_for_srfs.
* build_minmax_path.
* recurse_set_operations (x2).

The cases in recurse_set_operations() don't matter, because the path
whose target list we're adjusting is known not to be a Gather path.
In the first call, it's definitely a Subquery Scan, and in the second
case it'll be a path implementing some set operation.
grouping_planner() is the core thing the patch is trying to fix, and
as far as I can tell the logic changes there are adequate.  Also, the
second case in create_ordered_paths() is fixed: the patch changes
things so that it inserts a projection path before Gather Merge if
that's safe to do so.

The other cases aren't so clear.  In the case of the first call within
create_ordered_paths, there's no loss in the !is_sorted case because
apply_projection_to_path will be getting called on a Sort path.  But
when is_sorted is true, the current code can push a target list into a
Gather or Gather Merge that was created with some other target list,
and with the patch it can't.  I'm not quite sure what sort of query
would trigger that problem, but it seems like there's something to
worry about there.  Similarly I can't see any obvious reason why this
isn't a problem for adjust_path_for_srfs and build_minmax_path as
well, although I haven't tried to construct queries that hit those
cases, either.

Now, we could just give up on this approach and leave that code in
apply_projection_to_path, but what's bothering me is that, presumably,
any place where that code is actually getting used has the same
problem that we're trying to fix in grouping_planner: the tlist evals
costs are not being factored into the decision as to which path we
should choose, which might make a good parallel path lose to an
inferior non-parallel path.  It would be best to fix that throughout
the code base rather than only fixing the more common paths -- if we
can do so with a reasonable amount of work.

Here's a new version which (a) restores the code that you pointed out,
(b) always passes the target to generate_gather_paths() instead of
treating NULL as a special case, and (c) reworks some of the comments
you added.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: MCV lists for highly skewed distributions
Next
From: Robert Haas
Date:
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()