Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | Ankit Kumar Pandey |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | 4303947c-c313-ec43-d442-80c4c8e92490@gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
|
List | pgsql-hackers |
> On 13/01/23 07:48, David Rowley wrote: > It would just care if the > pathkeys were present and return a list of pathkeys not contained so > that an incremental sort could be done only on the returned list and a > Unique on an empty returned list. In create_final_distinct_paths, presorted keys are determined from input_rel->pathlist & needed_pathkeys. Problem with input_rel->pathlist is that, for index node, useful_pathkeys is stored in input_rel->pathlist but this useful_pathkeys is determined from truncate_useless_pathkeys(index_pathkeys) which removes index_keys if ordering is different. Hence, input_rel->pathlist returns null for select distinct b,a from ab where a < 10; even if index is created on a. In order to tackle this, I have added index_pathkeys in indexpath node itself. Although I started this patch from master, I merged changes to window sort optimizations. In patched version: set enable_hashagg=0; set enable_seqscan=0; explain select distinct relname,relkind,count(*) over (partition by relkind) from pg_Class; QUERY PLAN ------------------------------------------------------------------------------------------------------- Unique (cost=10000000039.49..10000000063.73 rows=415 width=73) -> Incremental Sort (cost=10000000039.49..10000000060.62 rows=415 width=73) Sort Key: relkind, relname, (count(*) OVER (?)) Presorted Key: relkind -> WindowAgg (cost=10000000034.20..10000000041.46 rows=415 width=73) -> Sort (cost=10000000034.20..10000000035.23 rows=415 width=65) Sort Key: relkind -> Seq Scan on pg_class (cost=10000000000.00..10000000016.15 rows=415 width=65) (8 rows) explain select distinct b,a from ab where a < 10; QUERY PLAN ---------------------------------------------------------------------------------- Unique (cost=0.71..60.05 rows=611 width=8) -> Incremental Sort (cost=0.71..55.55 rows=900 width=8) Sort Key: a, b Presorted Key: a -> Index Scan using ab_a_idx on ab (cost=0.29..28.04 rows=900 width=8) Index Cond: (a < 10) (6 rows) explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b; QUERY PLAN ----------------------------------------------------------------------------------------------------- Unique (cost=10000021174.63..10000038095.75 rows=60 width=16) -> Incremental Sort (cost=10000021174.63..10000036745.75 rows=180000 width=16) Sort Key: a, b, (count(*) OVER (?)) Presorted Key: a, b -> WindowAgg (cost=10000020948.87..10000024098.87 rows=180000 width=16) -> Sort (cost=10000020948.87..10000021398.87 rows=180000 width=8) Sort Key: a, b -> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=8) (8 rows) explain select distinct a, b, count(*) over (partition by a,b,c) from abcd; QUERY PLAN ------------------------------------------------------------------------------------------------------ Unique (cost=10000021580.47..10000036629.31 rows=60 width=20) -> Incremental Sort (cost=10000021580.47..10000035279.31 rows=180000 width=20) Sort Key: a, b, c, (count(*) OVER (?)) Presorted Key: a, b, c -> WindowAgg (cost=10000021561.37..10000025611.37 rows=180000 width=20) -> Sort (cost=10000021561.37..10000022011.37 rows=180000 width=12) Sort Key: a, b, c -> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=12) (8 rows) explain select distinct a, b, count(*) over (partition by b,a, c) from abcd; QUERY PLAN --------------------------------------------------------------------------------------------------- Unique (cost=2041.88..36764.90 rows=60 width=20) -> Incremental Sort (cost=2041.88..35414.90 rows=180000 width=20) Sort Key: b, a, c, (count(*) OVER (?)) Presorted Key: b, a, c -> WindowAgg (cost=1989.94..25746.96 rows=180000 width=20) -> Incremental Sort (cost=1989.94..22146.96 rows=180000 width=12) Sort Key: b, a, c Presorted Key: b -> Index Scan using b_idx on abcd (cost=0.29..7174.62 rows=180000 width=12) (9 rows) In master: explain select distinct relname,relkind,count(*) over (partition by relkind) from pg_Class; QUERY PLAN ------------------------------------------------------------------------------------------------------- Unique (cost=10000000059.50..10000000063.65 rows=415 width=73) -> Sort (cost=10000000059.50..10000000060.54 rows=415 width=73) Sort Key: relname, relkind, (count(*) OVER (?)) -> WindowAgg (cost=10000000034.20..10000000041.46 rows=415 width=73) -> Sort (cost=10000000034.20..10000000035.23 rows=415 width=65) Sort Key: relkind -> Seq Scan on pg_class (cost=10000000000.00..10000000016.15 rows=415 width=65) (7 rows) explain select distinct b,a from ab where a < 10; QUERY PLAN ---------------------------------------------------------------------------------- Unique (cost=72.20..78.95 rows=611 width=8) -> Sort (cost=72.20..74.45 rows=900 width=8) Sort Key: b, a -> Index Scan using ab_a_idx on ab (cost=0.29..28.04 rows=900 width=8) Index Cond: (a < 10) (5 rows) explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b; QUERY PLAN ----------------------------------------------------------------------------------------------------- Unique (cost=10000023704.77..10000041084.40 rows=60 width=16) -> Incremental Sort (cost=10000023704.77..10000039734.40 rows=180000 width=16) Sort Key: a, b, (count(*) OVER (?)) Presorted Key: a -> WindowAgg (cost=10000020948.87..10000024098.87 rows=180000 width=16) -> Sort (cost=10000020948.87..10000021398.87 rows=180000 width=8) Sort Key: a -> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=8) (8 rows) explain select distinct a, b, count(*) over (partition by b,a, c) from abcd; QUERY PLAN --------------------------------------------------------------------------------------------------- Unique (cost=45151.33..46951.33 rows=60 width=20) -> Sort (cost=45151.33..45601.33 rows=180000 width=20) Sort Key: a, b, (count(*) OVER (?)) -> WindowAgg (cost=1989.94..25746.96 rows=180000 width=20) -> Incremental Sort (cost=1989.94..22146.96 rows=180000 width=12) Sort Key: b, a, c Presorted Key: b -> Index Scan using b_idx on abcd (cost=0.29..7174.62 rows=180000 width=12) (8 rows) Note: Composite keys are also handled. create index xy_idx on xyz(x,y); explain select distinct x,z,y from xyz; QUERY PLAN ---------------------------------------------------------------------------------- Unique (cost=0.86..55.97 rows=60 width=12) -> Incremental Sort (cost=0.86..51.47 rows=600 width=12) Sort Key: x, y, z Presorted Key: x, y -> Index Scan using xy_idx on xyz (cost=0.15..32.80 rows=600 width=12) (5 rows) There are some cases where different kind of scan happens explain select distinct x,y from xyz where y < 10; QUERY PLAN ----------------------------------------------------------------------------------- Unique (cost=47.59..51.64 rows=60 width=8) -> Sort (cost=47.59..48.94 rows=540 width=8) Sort Key: x, y -> Bitmap Heap Scan on xyz (cost=12.34..23.09 rows=540 width=8) Recheck Cond: (y < 10) -> Bitmap Index Scan on y_idx (cost=0.00..12.20 rows=540 width=0) Index Cond: (y < 10) (7 rows) As code only checks from IndexPath (at the moment), other scan paths are not covered. Is it okay to cover these in same way as I did for IndexPath? (with no limitation on this behaviour on certain path types?) Also, I am assuming distinct pathkeys can be changed without any issues. As changes are limited to modification in distinct path only, I don't see this affecting other nodes. Test cases are green, with a couple of failures in window functions (one which I had added) and one very weird: EXPLAIN (COSTS OFF) SELECT DISTINCT empno, enroll_date, depname, sum(salary) OVER (PARTITION BY depname order by empno) depsalary, min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary FROM empsalary ORDER BY depname, empno; QUERY PLAN ----------------------------------------------------------------------------------------------------- Incremental Sort Sort Key: depname, empno Presorted Key: depname -> Unique -> Incremental Sort Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?)) Presorted Key: depname -> WindowAgg -> Incremental Sort Sort Key: depname, empno Presorted Key: depname -> WindowAgg -> Sort Sort Key: depname, enroll_date -> Seq Scan on empsalary (15 rows) In above query plan, unique used to come after Incremental sort in the master. Pending: 1. Consider if Pathkey.pk_strategy and pk_nulls_first need to be compared too, this is pending as I have to look these up and understand them. 2. Test cases (failures and new cases) 3. Improve comments Patch v8 attached. Please let me know any review comments, will address these in followup patch with pending items. Thanks, Ankit
Attachment
pgsql-hackers by date: