[PATCH] Teach planner to further optimize sort in distinct - Mailing list pgsql-hackers
From | Ankit Kumar Pandey |
---|---|
Subject | [PATCH] Teach planner to further optimize sort in distinct |
Date | |
Msg-id | da9425ae-8ff7-33d9-23b3-2a3eb605e106@gmail.com Whole thread Raw |
Responses |
Re: [PATCH] Teach planner to further optimize sort in distinct
|
List | pgsql-hackers |
Hi, this is extension of `teach planner to evaluate multiple windows in the optimal order` work applied to distinct operation. Based on discussions before (https://www.postgresql.org/message-id/flat/CAApHDvr7rSCVXzGfVa1L9pLpkKj6-s8NynK8o%2B98X9sKjejnQQ%40mail.gmail.com#e01327a3053d9281c40f281ef7105b42) , > All I imagine you need to do for it > is to invent a function in pathkeys.c which is along the lines of what > pathkeys_count_contained_in() does, but returns a List of pathkeys > which are in keys1 but not in keys2 and NIL if keys2 has a pathkey > that does not exist as a pathkey in keys1. In > create_final_distinct_paths(), you can then perform an incremental > sort on any input_path which has a non-empty return list and in > create_incremental_sort_path(), you'll pass presorted_keys as the > number of pathkeys in the path, and the required pathkeys the > input_path->pathkeys + the pathkeys returned from the new function. There is bit confusion in wording here: "returns a List of pathkeys which are in keys1 but not in keys2 and NIL if keys2 has a pathkey that does not exist as a pathkey in keys1." You mean extract common keys without ordering right? Example: keys1 = (a,b,c), keys2 = (b,a) returns (a,b) and keys1 = (a,b,c), keys = (d) returns = () which translates to needed_pathkeys = (a,b,c) = key2 input_pathkeys = (b,a) key1 returns (b,a) = common_keys new needed_pathkeys = unique(common_keys + old needed_pathkeys) => new needed_pathkeys = (b,a,c) The new needed_pathkeys matches input_pathkeys. This is what I implemented in the patch. The patched version yields the following plans: set enable_hashagg=0; set enable_seqscan=0; explain (costs off) select distinct relname,relkind,count(*) over (partition by relkind) from pg_Class; QUERY PLAN --------------------------------------------------------- Unique -> Incremental Sort Sort Key: relkind, relname, (count(*) OVER (?)) Presorted Key: relkind -> WindowAgg -> Sort Sort Key: relkind -> Seq Scan on pg_class (8 rows) explain (costs off) select distinct a, b, count(*) over (partition by b, a) from abcd; QUERY PLAN -------------------------------------------------------- Unique -> Incremental Sort Sort Key: b, a, (count(*) OVER (?)) Presorted Key: b, a -> WindowAgg -> Incremental Sort Sort Key: b, a Presorted Key: b -> Index Scan using b_idx on abcd (9 rows) explain (costs off) select distinct a, b, count(*) over (partition by c, d) from abcd; QUERY PLAN -------------------------------------------------------- Unique -> Sort Sort Key: a, b, (count(*) OVER (?)) -> WindowAgg -> Incremental Sort Sort Key: c, d Presorted Key: c -> Index Scan using c_idx on abcd (8 rows) Issue with index path still remains as pathkeys get purged by truncate_useless_pathkeys and hence are not available in create_final_distinct_paths for the above optimizations. I have attached a patch for the reference. Thanks, Ankit
Attachment
pgsql-hackers by date: