Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] why not parallel seq scan for slow functions |
Date | |
Msg-id | CAFjFpRd=7nrMg5eyU7RtVz_eDAJ9xjk5FnVBR7xdLVA5jaLjtA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] why not parallel seq scan for slow functions (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] why not parallel seq scan for slow functions
(Amit Kapila <amit.kapila16@gmail.com>)
|
List | pgsql-hackers |
On Mon, Feb 19, 2018 at 9:35 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Sat, Feb 17, 2018 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat >>>> <ashutosh.bapat@enterprisedb.com> wrote: >>>>> I happened to look at the patch for something else. But here are some >>>>> comments. If any of those have been already discussed, please feel >>>>> free to ignore. I have gone through the thread cursorily, so I might >>>>> have missed something. >>>>> >>>>> In grouping_planner() we call query_planner() first which builds the >>>>> join tree and creates paths, then calculate the target for scan/join >>>>> rel which is applied on the topmost scan join rel. I am wondering >>>>> whether we can reverse this order to calculate the target list of >>>>> scan/join relation and pass it to standard_join_search() (or the hook) >>>>> through query_planner(). >>>>> >>>> >>>> I think the reason for doing in that order is that we can't compute >>>> target's width till after query_planner(). See the below note in >>>> code: >>>> >>>> /* >>>> * Convert the query's result tlist into PathTarget format. >>>> * >>>> * Note: it's desirable to not do this till after query_planner(), >>>> * because the target width estimates can use per-Var width numbers >>>> * that were obtained within query_planner(). >>>> */ >>>> final_target = create_pathtarget(root, tlist); >>>> >>>> Now, I think we can try to juggle the code in a way that the width can >>>> be computed later, but the rest can be done earlier. >>> >>> /* Convenience macro to get a PathTarget with valid cost/width fields */ >>> #define create_pathtarget(root, tlist) \ >>> set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist)) >>> create_pathtarget already works that way. We will need to split it. >>> >>> Create the Pathtarget without widths. Apply width estimates once we >>> know the width of Vars somewhere here in query_planner() >>> /* >>> * We should now have size estimates for every actual table involved in >>> * the query, and we also know which if any have been deleted from the >>> * query by join removal; so we can compute total_table_pages. >>> * >>> * Note that appendrels are not double-counted here, even though we don't >>> * bother to distinguish RelOptInfos for appendrel parents, because the >>> * parents will still have size zero. >>> * >>> The next step is building the join tree. Set the pathtarget before that. >>> >>>> However, I think >>>> that will be somewhat major change >>> >>> I agree. >>> >>>> still won't address all kind of >>>> cases (like for ordered paths) unless we can try to get all kind of >>>> targets pushed down in the call stack. >>> >>> I didn't understand that. >>> >> >> The places where we use a target different than the target which is >> pushed via query planner can cause a similar problem (ex. see the >> usage of adjust_paths_for_srfs) because the cost of that target >> wouldn't be taken into consideration for Gather's costing. >> > > Right. Right now apply_projection_to_path() or adjust_paths_for_srfs() > do not take into consideration the type of path whose cost is being > adjusted for the new targetlist. That works for most of the paths but > not all the paths like custom, FDW or parallel paths. The idea I am > proposing is to compute final targetlist before query planner so that > it's available when we create paths for the topmost scan/join > relation. That way, any path creation logic can then take advantage of > this list to compute costs, not just parallel paths. In fact, we should do this not just for scan/join relations, but all the upper relations as well. Upper relations too create parallel paths, whose costs need to be adjusted after their targetlists are updated by adjust_paths_for_srfs(). Similar adjustments are needed for any FDWs, custom paths which cost targetlists differently. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: