Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | CAApHDvpEYcQ7d-c6Lje26gLi4dXOM3qQ9nrTe1Z241tT8nBiCg@mail.gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
List | pgsql-hackers |
On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: > > On 05/01/23 12:53, David Rowley wrote: > >> > >> We *can* reuse Sorts where a more strict or equivalent sort order is > >> available. The question is how do we get the final WindowClause to do > >> something slightly more strict to save having to do anything for the > >> ORDER BY. One way you might think would be to adjust the > >> WindowClause's orderClause to add the additional clauses, but that > >> cannot be done because that would cause are_peers() in nodeWindowAgg.c > >> to not count some rows as peers when they maybe should be given a less > >> strict orderClause in the WindowClause. > > I attempted this in attached patch. I had a quick look at this and it's going to need some additional code to ensure there are no WindowFuncs in the ORDER BY clause. We can't sort on those before we evaluate them. Right now you get: postgres=# explain select *,row_number() over (order by oid) rn1 from pg_class order by oid,rn1; ERROR: could not find pathkey item to sort I also don't think there's any point in adding the additional pathkeys when the input path is already presorted. Have a look at: postgres=# set enable_seqscan=0; SET postgres=# explain select *,row_number() over (order by oid) rn1 from pg_class order by oid,relname; QUERY PLAN ---------------------------------------------------------------------------------------------------- WindowAgg (cost=0.43..85.44 rows=412 width=281) -> Incremental Sort (cost=0.43..79.26 rows=412 width=273) Sort Key: oid, relname Presorted Key: oid -> Index Scan using pg_class_oid_index on pg_class (cost=0.27..60.72 rows=412 width=273) (5 rows) It would be better to leave this case alone and just do the incremental sort afterwards. You also don't seem to be considering the fact that the query might have a DISTINCT clause. That's evaluated between window functions and the order by. It would be fairly useless to do a more strict sort when the sort order is going to be obliterated by a Hash Aggregate. Perhaps we can just not do this when the query has a DISTINCT clause. On the other hand, there are also a few reasons why we shouldn't do this. I mentioned the WindowClause runConditions earlier here. The patched version produces: postgres=# explain (analyze, costs off) select * from (select oid,relname,row_number() over (order by oid) rn1 from pg_class order by oid,relname) where rn1 < 10; QUERY PLAN ------------------------------------------------------------------------------ WindowAgg (actual time=0.488..0.497 rows=9 loops=1) Run Condition: (row_number() OVER (?) < 10) -> Sort (actual time=0.466..0.468 rows=10 loops=1) Sort Key: pg_class.oid, pg_class.relname Sort Method: quicksort Memory: 67kB -> Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1) Planning Time: 0.214 ms Execution Time: 0.581 ms (8 rows) Whereas master produces: postgres=# explain (analyze, costs off) select * from (select oid,relname,row_number() over (order by oid) rn1 from pg_class order by oid,relname) where rn1 < 10; QUERY PLAN ---------------------------------------------------------------------------------------- Incremental Sort (actual time=0.506..0.508 rows=9 loops=1) Sort Key: pg_class.oid, pg_class.relname Presorted Key: pg_class.oid Full-sort Groups: 1 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB -> WindowAgg (actual time=0.475..0.483 rows=9 loops=1) Run Condition: (row_number() OVER (?) < 10) -> Sort (actual time=0.461..0.461 rows=10 loops=1) Sort Key: pg_class.oid Sort Method: quicksort Memory: 67kB -> Seq Scan on pg_class (actual time=0.022..0.178 rows=420 loops=1) Planning Time: 0.245 ms Execution Time: 0.594 ms (12 rows) (slightly bad example since oid is unique but...) It's not too clear to me that the patched version is a better plan. The bottom level sort, which sorts 420 rows has a more complex comparison to do. Imagine the 2nd column is a long text string. That would make the sort much more expensive. The top-level sort has far fewer rows to sort due to the runCondition filtering out anything that does not match rn1 < 10. The same can be said for a query with a LIMIT clause. I think the patch should also be using pathkeys_contained_in() and Lists of pathkeys rather than concatenating lists of SortGroupClauses together. That should allow things to work correctly when a given pathkey has become redundant due to either duplication or a Const in the Eclass. Also, since I seem to be only be able to think of these cases properly by actually trying them, I ended up with the attached patch. I opted to not do the optimisation when there are runConditions or a LIMIT clause. Doing it when there are runConditions really should be a cost-based decision, but we've about no hope of having any idea about how many rows will match the runCondition. For the LIMIT case, it's also difficult as it would be hard to get an idea of how many times the additional sort columns would need their comparison function called. That's only required in a tie-break when the leading columns are all equal. The attached patch has no tests added. It's going to need some of those. These tests should go directly after the tests added in a14a58329 and likely use the same table for consistency. David
Attachment
pgsql-hackers by date: