Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | Ankit Kumar Pandey |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | e428fc75-3c56-7cfa-1ec8-5f57e96b1ced@gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
|
List | pgsql-hackers |
On 07/01/23 17:28, David Rowley wrote: > Your email client seems to be adding additional vertical space to your > emails. I've removed the additional newlines in the quotes. Are you > able to fix the client so it does not do that? I have adjusted my mail client, hope it is better now? > On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: > > > > On 07/01/23 09:58, David Rowley wrote: > > > You also don't seem to be considering the fact that the query might > > > have a DISTINCT clause. > > > > Major reason for this was that I am not exactly aware of what distinct > > clause means (especially in > > > > context of window functions) and how it is different from other > > sortClauses (partition, order by, group). > > I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT. > If you look in the grouping_planner() function, you'll see that > create_distinct_paths() is called between create_window_paths() and > create_ordered_paths(). Yes just saw this and got what you meant. > > Yes, this is a fair point. Multiple sort is actually beneficial in cases > > like this, perhaps limits clause and runCondition should be no op too? > > I'm not sure what you mean by "no op". Do you mean not apply the optimization? Yes, no op = no optimization. Sorry I didn't mention it before. > > > I think the patch should also be using pathkeys_contained_in() and > > > Lists of pathkeys rather than concatenating lists of SortGroupClauses > > > together. That should allow things to work correctly when a given > > > pathkey has become redundant due to either duplication or a Const in > > > the Eclass. > > > > Make sense, I actually duplicated that logic from > > make_pathkeys_for_window. We should make this changes there as well because > > if we have SELECT rank() OVER (PARTITION BY a ORDER BY a) > > (weird example but you get the idea), it leads to duplicates in > > window_sortclauses. > > It won't lead to duplicate pathkeys. Look in > make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does. > Notice that it checks if the pathkey already exists in the list before > appending. Okay I see this, pathkey_is_redundant is much more smarter as well. Replacing list_concat_copy with list_concat_unique in make_pathkeys_for_window won't be of much benefit. > > Agree with runConditions part but for limit clause, row reduction happens > > at the last, so whether we use patched version or master version, > > none of sorts would benefit/degrade from that, right? > > Maybe you're right. Just be aware that a sort done for a query with an > ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to > store the top-n tuples and that can significantly reduce the memory > required for sorting perhaps resulting in the sort fitting in memory > rather than spilling out to disk. > > You might want to test this by having the leading sort column as an > INT, and then the 2nd one as a long text column of maybe around two > kilobytes. Make all the leading column values the same so that the > comparison for the text column is always performed. Make the LIMIT > small compared to the total number of rows, that should test the worse > case. Check the performance with and without the limitCount != NULL > part of the patch that disables the optimization for LIMIT. I checked this. For limit <<< total number of rows, top-n sort was performed but when I changed limit to higher value (or no limit), quick sort was performed. Top-n sort was twice as fast. Also, tested (first) patch version vs master, top-n sort was twice as fast there as well (outputs mentioned below). Current patch version (with limit excluded for optimizations) explain (analyze ,costs off) select count(*) over (order by id) from tt order by id, name limit 1; QUERY PLAN --------------------------------------------------------------------------------------------------- Limit (actual time=1.718..1.719 rows=1 loops=1) -> Incremental Sort (actual time=1.717..1.717 rows=1 loops=1) Sort Key: id, name Presorted Key: id Full-sort Groups: 1 Sort Method: top-N heapsort Average Memory: 25kB Peak Memory: 25kB -> WindowAgg (actual time=0.028..0.036 rows=6 loops=1) -> Sort (actual time=0.017..0.018 rows=6 loops=1) Sort Key: id Sort Method: quicksort Memory: 25kB -> Seq Scan on tt (actual time=0.011..0.012 rows=6 loops=1) Planning Time: 0.069 ms Execution Time: 1.799 ms Earlier patch(which included limit clause for optimizations) explain (analyze ,costs off) select count(*) over (order by id) from tt order by id, name limit 1; QUERY PLAN ---------------------------------------------------------------------------- Limit (actual time=3.766..3.767 rows=1 loops=1) -> WindowAgg (actual time=3.764..3.765 rows=1 loops=1) -> Sort (actual time=3.749..3.750 rows=6 loops=1) Sort Key: id, name Sort Method: quicksort Memory: 25kB -> Seq Scan on tt (actual time=0.011..0.013 rows=6 loops=1) Planning Time: 0.068 ms Execution Time: 3.881 ms I am just wondering though, why can we not do top-N sort in optimized version if we include limit clause? Is top-N sort is limited to non strict sorting or cases last operation before limit is sort? . > > Is it okay > > to add comments in test cases? I don't see it much on existing cases > > so kind of reluctant to add but it makes intentions much more clear. > > I think tests should always have a comment to state what they're > testing. Not many people seem to do that, unfortunately. The problem > with not stating what the test is testing is that if, for example, the > test is checking that the EXPLAIN output is showing a Sort, what if at > some point in the future someone adjusts some costing code and the > plan changes to an Index Scan. If there's no comment to state that > we're looking for a Sort plan, then the author of the patch that's > adjusting the costs might just think it's ok to change the expected > plan to an Index Scan. I've seen this problem occur even when the > comments *do* exist. There's just about no hope of such a test > continuing to do what it's meant to if the comments don't exist. Thanks for clarifying this out, I will freely add comments if that helps to explain things better. -- Regards, Ankit Kumar Pandey
pgsql-hackers by date: