Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5
Date
Msg-id 5C7E2CAB.2040001@lab.ntt.co.jp
Whole thread Raw
In response to Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5
List pgsql-hackers
(2019/03/02 1:10), Robert Haas wrote:
> On Fri, Mar 1, 2019 at 5:47 AM Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> Robert, I CCed you because you are the author of that commit.  Before
>> that commit ("Rewrite the code that applies scan/join targets to
>> paths."), apply_scanjoin_target_to_paths() had a boolean parameter named
>> modify_in_place, and used apply_projection_to_path(), not
>> create_projection_path(), to adjust scan/join paths when modify_in_place
>> was true, which allowed us to save cycles at plan creation time by
>> avoiding creating projection paths, which I think would be a good thing,
>> but that commit removed that.  Why?
>
> One of the goals of the commit was to properly account for the cost of
> computing the target list.  Before parallel query and partition-wise
> join, it didn't really matter what the cost of computing the target
> list was, because every path was going to have to do the same work, so
> it was just a constant factor getting added to every path.  However,
> parallel query and partition-wise join mean that some paths can
> compute the final target list more cheaply than others, and that turns
> out to be important for things like PostGIS.  One of the complaints
> that provoked that change was that PostGIS was picking non-parallel
> plans even when a parallel plan was substantially superior, because it
> wasn't accounting for the fact that in the parallel plan, the cost of
> computing the target-list could be shared across all the workers
> rather than paid entirely by the leader.
>
> In order to accomplish this goal of properly accounting for the cost
> of computing the target list, we need to create a new path, not just
> jam the target list into an already-costed path.  Note that we did
> some performance optimization around the same time to minimize the
> performance hit here (see d7c19e62a8e0a634eb6b29f8f1111d944e57081f,
> and I think there may have been something else as well although I
> can't find it right now).

apply_projection_to_path() not only jams the given tlist into the 
existing path but updates its tlist eval costs appropriately except for 
the cases of Gather and GatherMerge:

     /*
      * If the path happens to be a Gather or GatherMerge path, we'd like to
      * arrange for the subpath to return the required target list so that
      * workers can help project.  But if there is something that is not
      * parallel-safe in the target expressions, then we can't.
      */
     if ((IsA(path, GatherPath) ||IsA(path, GatherMergePath)) &&
         is_parallel_safe(root, (Node *) target->exprs))
     {
         /*
          * We always use create_projection_path here, even if the 
subpath is
          * projection-capable, so as to avoid modifying the subpath in 
place.
          * It seems unlikely at present that there could be any other
          * references to the subpath, but better safe than sorry.
          *
-->      * Note that we don't change the parallel path's cost estimates; it
-->      * might be appropriate to do so, to reflect the fact that the 
bulk of
-->      * the target evaluation will happen in workers.
          */
         if (IsA(path, GatherPath))
         {
             GatherPath *gpath = (GatherPath *) path;

             gpath->subpath = (Path *)
                 create_projection_path(root,
                                        gpath->subpath->parent,
                                        gpath->subpath,
                                        target);
         }
         else
         {
             GatherMergePath *gmpath = (GatherMergePath *) path;

             gmpath->subpath = (Path *)
                 create_projection_path(root,
                                        gmpath->subpath->parent,
                                        gmpath->subpath,
                                        target);
         }
     }

>> The real reason for this question is: I noticed that projection paths
>> put on foreign paths will make it hard for FDWs to detect whether there
>> is an already-well-enough-sorted remote path in the pathlist for the
>> final scan/join relation as an input relation to GetForeignUpperPaths()
>> for the UPPERREL_ORDERED step (the IsA(path, ForeignPath) test would not
>> work well enough to detect remote paths!), so I'm wondering whether we
>> could revive that parameter like the attached, to avoid the overhead at
>> plan creation time and to make the FDW work easy.  Maybe I'm missing
>> something, though.
>
> I think this would be a bad idea, for the reasons explained above.  I
> also think that it's probably the wrong direction on principle.  I
> think the way we account for target lists is still pretty crude and
> needs to be thought of as more of a real planning step and less as
> something that we can just ignore when it's inconvenient for some
> reason.

I'm not sure I 100% agree with you, but I also think we need to give 
more thought to the tlist-eval-cost adjustment.

> I think the FDW just needs to look through the projection
> path and see what's underneath, but also take the projection path's
> target list into account when decide whether more can be pushed down.

OK, I'll go with that.

Thanks for the explanation!

Best regards,
Etsuro Fujita



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: Re: \describe*
Next
From: David Steele
Date:
Subject: Re: Re: [PATCH] pgbench tap tests fail if the path contains a perlspecial character