Home > mailing lists

[PATCH] Teach planner to further optimize sort in distinct - Mailing list pgsql-hackers

From	Ankit Kumar Pandey
Subject	[PATCH] Teach planner to further optimize sort in distinct
Date	January 17, 2023 19:27:54
Msg-id	da9425ae-8ff7-33d9-23b3-2a3eb605e106@gmail.com Whole thread Raw
Responses	Re: [PATCH] Teach planner to further optimize sort in distinct
List	pgsql-hackers

Tree view

Hi, this is extension of `teach planner to evaluate multiple windows in 
the optimal order` work applied to distinct operation.

Based on discussions before 

(https://www.postgresql.org/message-id/flat/CAApHDvr7rSCVXzGfVa1L9pLpkKj6-s8NynK8o%2B98X9sKjejnQQ%40mail.gmail.com#e01327a3053d9281c40f281ef7105b42)

,

 > All I imagine you need to do for it
 > is to invent a function in pathkeys.c which is along the lines of what
 > pathkeys_count_contained_in() does, but returns a List of pathkeys
 > which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
 > that does not exist as a pathkey in keys1. In
 > create_final_distinct_paths(), you can then perform an incremental
 > sort on any input_path which has a non-empty return list and in
 > create_incremental_sort_path(), you'll pass presorted_keys as the
 > number of pathkeys in the path, and the required pathkeys the
 > input_path->pathkeys + the pathkeys returned from the new function.


There is bit confusion in wording here:

"returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1."

You mean extract common keys without ordering right?

Example: keys1 = (a,b,c), keys2 = (b,a)

returns (a,b)

and

keys1 = (a,b,c), keys = (d)

returns = ()

which translates to

needed_pathkeys = (a,b,c) = key2

input_pathkeys = (b,a) key1

returns (b,a) = common_keys

new needed_pathkeys = unique(common_keys + old needed_pathkeys)

=> new needed_pathkeys = (b,a,c)

The new needed_pathkeys matches input_pathkeys.

This is what I implemented in the patch.


The patched version yields the following plans:

set enable_hashagg=0;
set enable_seqscan=0;

explain (costs off) select distinct relname,relkind,count(*) over 
(partition by
relkind) from pg_Class;
                        QUERY PLAN
---------------------------------------------------------
  Unique
    ->  Incremental Sort
          Sort Key: relkind, relname, (count(*) OVER (?))
          Presorted Key: relkind
          ->  WindowAgg
                ->  Sort
                      Sort Key: relkind
                      ->  Seq Scan on pg_class
(8 rows)

explain (costs off) select distinct a, b, count(*) over (partition by b, 
a) from abcd;
                        QUERY PLAN
--------------------------------------------------------
  Unique
    ->  Incremental Sort
          Sort Key: b, a, (count(*) OVER (?))
          Presorted Key: b, a
          ->  WindowAgg
                ->  Incremental Sort
                      Sort Key: b, a
                      Presorted Key: b
                      ->  Index Scan using b_idx on abcd
(9 rows)

explain (costs off) select distinct a, b, count(*) over (partition by c, 
d) from abcd;
                        QUERY PLAN
--------------------------------------------------------
  Unique
    ->  Sort
          Sort Key: a, b, (count(*) OVER (?))
          ->  WindowAgg
                ->  Incremental Sort
                      Sort Key: c, d
                      Presorted Key: c
                      ->  Index Scan using c_idx on abcd
(8 rows)


Issue with index path still remains as pathkeys get purged by 
truncate_useless_pathkeys

and hence are not available in create_final_distinct_paths for the above 
optimizations.


I have attached a patch for the reference.


Thanks,

Ankit

Attachment

distinct_sort_optimizations.patch

pgsql-hackers by date:

From: Peter Eisentraut
Date: 17 January 2023, 19:23:49
Subject: Re: constify arguments of copy_file() and copydir()

From: Isaac Morland
Date: 17 January 2023, 19:29:04
Subject: Re: Remove source code display from \df+?

[PATCH] Teach planner to further optimize sort in distinct - Mailing list pgsql-hackers

Attachment

Previous

Next