Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 |
Date | |
Msg-id | 5C7E2CAB.2040001@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5
|
List | pgsql-hackers |
(2019/03/02 1:10), Robert Haas wrote: > On Fri, Mar 1, 2019 at 5:47 AM Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> Robert, I CCed you because you are the author of that commit. Before >> that commit ("Rewrite the code that applies scan/join targets to >> paths."), apply_scanjoin_target_to_paths() had a boolean parameter named >> modify_in_place, and used apply_projection_to_path(), not >> create_projection_path(), to adjust scan/join paths when modify_in_place >> was true, which allowed us to save cycles at plan creation time by >> avoiding creating projection paths, which I think would be a good thing, >> but that commit removed that. Why? > > One of the goals of the commit was to properly account for the cost of > computing the target list. Before parallel query and partition-wise > join, it didn't really matter what the cost of computing the target > list was, because every path was going to have to do the same work, so > it was just a constant factor getting added to every path. However, > parallel query and partition-wise join mean that some paths can > compute the final target list more cheaply than others, and that turns > out to be important for things like PostGIS. One of the complaints > that provoked that change was that PostGIS was picking non-parallel > plans even when a parallel plan was substantially superior, because it > wasn't accounting for the fact that in the parallel plan, the cost of > computing the target-list could be shared across all the workers > rather than paid entirely by the leader. > > In order to accomplish this goal of properly accounting for the cost > of computing the target list, we need to create a new path, not just > jam the target list into an already-costed path. Note that we did > some performance optimization around the same time to minimize the > performance hit here (see d7c19e62a8e0a634eb6b29f8f1111d944e57081f, > and I think there may have been something else as well although I > can't find it right now). apply_projection_to_path() not only jams the given tlist into the existing path but updates its tlist eval costs appropriately except for the cases of Gather and GatherMerge: /* * If the path happens to be a Gather or GatherMerge path, we'd like to * arrange for the subpath to return the required target list so that * workers can help project. But if there is something that is not * parallel-safe in the target expressions, then we can't. */ if ((IsA(path, GatherPath) ||IsA(path, GatherMergePath)) && is_parallel_safe(root, (Node *) target->exprs)) { /* * We always use create_projection_path here, even if the subpath is * projection-capable, so as to avoid modifying the subpath in place. * It seems unlikely at present that there could be any other * references to the subpath, but better safe than sorry. * --> * Note that we don't change the parallel path's cost estimates; it --> * might be appropriate to do so, to reflect the fact that the bulk of --> * the target evaluation will happen in workers. */ if (IsA(path, GatherPath)) { GatherPath *gpath = (GatherPath *) path; gpath->subpath = (Path *) create_projection_path(root, gpath->subpath->parent, gpath->subpath, target); } else { GatherMergePath *gmpath = (GatherMergePath *) path; gmpath->subpath = (Path *) create_projection_path(root, gmpath->subpath->parent, gmpath->subpath, target); } } >> The real reason for this question is: I noticed that projection paths >> put on foreign paths will make it hard for FDWs to detect whether there >> is an already-well-enough-sorted remote path in the pathlist for the >> final scan/join relation as an input relation to GetForeignUpperPaths() >> for the UPPERREL_ORDERED step (the IsA(path, ForeignPath) test would not >> work well enough to detect remote paths!), so I'm wondering whether we >> could revive that parameter like the attached, to avoid the overhead at >> plan creation time and to make the FDW work easy. Maybe I'm missing >> something, though. > > I think this would be a bad idea, for the reasons explained above. I > also think that it's probably the wrong direction on principle. I > think the way we account for target lists is still pretty crude and > needs to be thought of as more of a real planning step and less as > something that we can just ignore when it's inconvenient for some > reason. I'm not sure I 100% agree with you, but I also think we need to give more thought to the tlist-eval-cost adjustment. > I think the FDW just needs to look through the projection > path and see what's underneath, but also take the projection path's > target list into account when decide whether more can be pushed down. OK, I'll go with that. Thanks for the explanation! Best regards, Etsuro Fujita
pgsql-hackers by date: