Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers

From David Rowley
Subject Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date
Msg-id CAApHDvpdD1jytGoyJxtGkGJCzWUtnmfgQpeSHdf_vYXs5W01NQ@mail.gmail.com
Whole thread Raw
In response to Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Responses Re: Todo: Teach planner to evaluate multiple windows in the optimal order
List pgsql-hackers
On Mon, 16 Jan 2023 at 06:52, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
> Hence, input_rel->pathlist returns null for select distinct b,a from ab
> where a < 10; even if index is created on a.
>
> In order to tackle this, I have added index_pathkeys in indexpath node
> itself.

I don't think we should touch this.  It could significantly increase
the number of indexes that we consider when generating paths on base
relations and therefore *significantly* increase the number of paths
we consider during the join search.  What I had in mind was just
making better use of existing paths to see if we can find a cheaper
way to perform the DISTINCT.  That'll only possibly increase the
number of paths for the distinct upper relation which really only
increases the number of paths which are considered in
create_ordered_paths(). That's unlikely to cause much of a slowdown in
the planner.

> Although I started this patch from master, I merged changes to window sort
> optimizations.

I'm seeing these two things as separate patches. I don't think there's
any need to add further complexity to the patch that tries to reduce
the number of sorts for WindowAggs. I think you'd better start a new
thread for this.

> Also, I am assuming distinct pathkeys can be changed without any issues.
> As changes are limited to modification in distinct path only,

As far as I see it, you shouldn't be touching the distinct_pathkeys.
Those are set in such a way as to minimise the likelihood of an
additional sort for the ORDER BY.  If you've fiddled with that, then I
imagine this is why the plan below has an additional Incremental Sort
that didn't exist before.

I've not looked at your patch, but all I imagine you need to do for it
is to invent a function in pathkeys.c which is along the lines of what
pathkeys_count_contained_in() does, but returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1. In
create_final_distinct_paths(), you can then perform an incremental
sort on any input_path which has a non-empty return list and in
create_incremental_sort_path(), you'll pass presorted_keys as the
number of pathkeys in the path, and the required pathkeys the
input_path->pathkeys + the pathkeys returned from the new function.

As an optimization, you might want to consider that the
distinct_pathkeys list might be long and that the new function, if you
code the lookup as a nested loop, might be slow. You might want to
consider hashing the distinct_pathkeys once in
create_final_distinct_paths(), then for each input_path, perform a
series of hash lookups to see which of the input_path->pathkeys are in
the hash table. That might require adding two functions to pathkeys.c,
one to build the hash table and then another to probe it and return
the remaining pathkeys list.  I'd go and make sure it all works as we
expect before going to the trouble of trying to optimize this. A
simple nested loop lookup will allow us to review that this works as
we expect.

> EXPLAIN (COSTS OFF)
> SELECT DISTINCT
>         empno,
>         enroll_date,
>         depname,
>         sum(salary) OVER (PARTITION BY depname order by empno) depsalary,
>         min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary
> FROM empsalary
> ORDER BY depname, empno;
>                                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>   Incremental Sort
>     Sort Key: depname, empno
>     Presorted Key: depname
>     ->  Unique
>           ->  Incremental Sort
>                 Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?))
>                 Presorted Key: depname
>                 ->  WindowAgg
>                       ->  Incremental Sort
>                             Sort Key: depname, empno
>                             Presorted Key: depname
>                             ->  WindowAgg
>                                   ->  Sort
>                                         Sort Key: depname, enroll_date
>                                         ->  Seq Scan on empsalary
> (15 rows)

David



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Support logical replication of DDLs
Next
From: Peter Smith
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply