Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | CAApHDvpdD1jytGoyJxtGkGJCzWUtnmfgQpeSHdf_vYXs5W01NQ@mail.gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
|
List | pgsql-hackers |
On Mon, 16 Jan 2023 at 06:52, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: > Hence, input_rel->pathlist returns null for select distinct b,a from ab > where a < 10; even if index is created on a. > > In order to tackle this, I have added index_pathkeys in indexpath node > itself. I don't think we should touch this. It could significantly increase the number of indexes that we consider when generating paths on base relations and therefore *significantly* increase the number of paths we consider during the join search. What I had in mind was just making better use of existing paths to see if we can find a cheaper way to perform the DISTINCT. That'll only possibly increase the number of paths for the distinct upper relation which really only increases the number of paths which are considered in create_ordered_paths(). That's unlikely to cause much of a slowdown in the planner. > Although I started this patch from master, I merged changes to window sort > optimizations. I'm seeing these two things as separate patches. I don't think there's any need to add further complexity to the patch that tries to reduce the number of sorts for WindowAggs. I think you'd better start a new thread for this. > Also, I am assuming distinct pathkeys can be changed without any issues. > As changes are limited to modification in distinct path only, As far as I see it, you shouldn't be touching the distinct_pathkeys. Those are set in such a way as to minimise the likelihood of an additional sort for the ORDER BY. If you've fiddled with that, then I imagine this is why the plan below has an additional Incremental Sort that didn't exist before. I've not looked at your patch, but all I imagine you need to do for it is to invent a function in pathkeys.c which is along the lines of what pathkeys_count_contained_in() does, but returns a List of pathkeys which are in keys1 but not in keys2 and NIL if keys2 has a pathkey that does not exist as a pathkey in keys1. In create_final_distinct_paths(), you can then perform an incremental sort on any input_path which has a non-empty return list and in create_incremental_sort_path(), you'll pass presorted_keys as the number of pathkeys in the path, and the required pathkeys the input_path->pathkeys + the pathkeys returned from the new function. As an optimization, you might want to consider that the distinct_pathkeys list might be long and that the new function, if you code the lookup as a nested loop, might be slow. You might want to consider hashing the distinct_pathkeys once in create_final_distinct_paths(), then for each input_path, perform a series of hash lookups to see which of the input_path->pathkeys are in the hash table. That might require adding two functions to pathkeys.c, one to build the hash table and then another to probe it and return the remaining pathkeys list. I'd go and make sure it all works as we expect before going to the trouble of trying to optimize this. A simple nested loop lookup will allow us to review that this works as we expect. > EXPLAIN (COSTS OFF) > SELECT DISTINCT > empno, > enroll_date, > depname, > sum(salary) OVER (PARTITION BY depname order by empno) depsalary, > min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary > FROM empsalary > ORDER BY depname, empno; > QUERY PLAN > ----------------------------------------------------------------------------------------------------- > Incremental Sort > Sort Key: depname, empno > Presorted Key: depname > -> Unique > -> Incremental Sort > Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?)) > Presorted Key: depname > -> WindowAgg > -> Incremental Sort > Sort Key: depname, empno > Presorted Key: depname > -> WindowAgg > -> Sort > Sort Key: depname, enroll_date > -> Seq Scan on empsalary > (15 rows) David
pgsql-hackers by date: