Home > mailing lists

Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: [HACKERS] why not parallel seq scan for slow functions
Date	February 19, 2018 07:26:35
Msg-id	CAFjFpRd=7nrMg5eyU7RtVz_eDAJ9xjk5FnVBR7xdLVA5jaLjtA@mail.gmail.com Whole thread
In response to	Re: [HACKERS] why not parallel seq scan for slow functions (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses	Re: [HACKERS] why not parallel seq scan for slow functions
List	pgsql-hackers

Tree view

On Mon, Feb 19, 2018 at 9:35 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Sat, Feb 17, 2018 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>> I happened to look at the patch for something else. But here are some
>>>>> comments. If any of those have been already discussed, please feel
>>>>> free to ignore. I have gone through the thread cursorily, so I might
>>>>> have missed something.
>>>>>
>>>>> In grouping_planner() we call query_planner() first which builds the
>>>>> join tree and creates paths, then calculate the target for scan/join
>>>>> rel which is applied on the topmost scan join rel. I am wondering
>>>>> whether we can reverse this order to calculate the target list of
>>>>> scan/join relation and pass it to standard_join_search() (or the hook)
>>>>> through query_planner().
>>>>>
>>>>
>>>> I think the reason for doing in that order is that we can't compute
>>>> target's width till after query_planner().  See the below note in
>>>> code:
>>>>
>>>> /*
>>>> * Convert the query's result tlist into PathTarget format.
>>>> *
>>>> * Note: it's desirable to not do this till after query_planner(),
>>>> * because the target width estimates can use per-Var width numbers
>>>> * that were obtained within query_planner().
>>>> */
>>>> final_target = create_pathtarget(root, tlist);
>>>>
>>>> Now, I think we can try to juggle the code in a way that the width can
>>>> be computed later, but the rest can be done earlier.
>>>
>>> /* Convenience macro to get a PathTarget with valid cost/width fields */
>>> #define create_pathtarget(root, tlist) \
>>>     set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
>>> create_pathtarget already works that way. We will need to split it.
>>>
>>> Create the Pathtarget without widths. Apply width estimates once we
>>> know the width of Vars somewhere here in query_planner()
>>>     /*
>>>      * We should now have size estimates for every actual table involved in
>>>      * the query, and we also know which if any have been deleted from the
>>>      * query by join removal; so we can compute total_table_pages.
>>>      *
>>>      * Note that appendrels are not double-counted here, even though we don't
>>>      * bother to distinguish RelOptInfos for appendrel parents, because the
>>>      * parents will still have size zero.
>>>      *
>>> The next step is building the join tree. Set the pathtarget before that.
>>>
>>>> However, I think
>>>> that will be somewhat major change
>>>
>>> I agree.
>>>
>>>>  still won't address all kind of
>>>> cases (like for ordered paths) unless we can try to get all kind of
>>>> targets pushed down in the call stack.
>>>
>>> I didn't understand that.
>>>
>>
>> The places where we use a target different than the target which is
>> pushed via query planner can cause a similar problem (ex. see the
>> usage of adjust_paths_for_srfs) because the cost of that target
>> wouldn't be taken into consideration for Gather's costing.
>>
>
> Right. Right now apply_projection_to_path() or adjust_paths_for_srfs()
> do not take into consideration the type of path whose cost is being
> adjusted for the new targetlist. That works for most of the paths but
> not all the paths like custom, FDW or parallel paths. The idea I am
> proposing is to compute final targetlist before query planner so that
> it's available when we create paths for the topmost scan/join
> relation. That way, any path creation logic can then take advantage of
> this list to compute costs, not just parallel paths.

In fact, we should do this not just for scan/join relations, but all
the upper relations as well. Upper relations too create parallel
paths, whose costs need to be adjusted after their targetlists are
updated by adjust_paths_for_srfs(). Similar adjustments are needed for
any FDWs, custom paths which cost targetlists differently.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

From: Amit Langote
Date: 19 February 2018, 07:19:10
Subject: Re: non-bulk inserts and tuple routing

From: Amit Langote
Date: 19 February 2018, 07:59:28
Subject: Re: [HACKERS] path toward faster partition pruning

Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

Previous

Next