Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | Ankit Kumar Pandey |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | b1df021b-e6c0-f26a-1d5a-bd07f765db62@gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
|
List | pgsql-hackers |
> On 09/01/23 03:48, David Rowley wrote: > On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: >> I have addressed all points 1-6 in the attached patch. > A few more things: > 1. You're still using the 'i' variable in the foreach loop. foreach_current_index() will work. > 2. I think the "index" variable needs a better name. sort_pushdown_idx maybe. Done these (1 & 2) > 3. I don't think you need to set "index" on every loop. Why not just set it to foreach_current_index(l) - 1; break; Consider this query EXPLAIN (COSTS OFF) SELECT empno, depname, min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary, sum(salary) OVER (PARTITION BY depname) depsalary, count(*) OVER (ORDER BY enroll_date DESC) c FROM empsalary ORDER BY depname, empno, enroll_date; Here, W1 = min(salary) OVER (PARTITION BY depname ORDER BY empno) W2 = sum(salary) OVER (PARTITION BY depname) W3 = count(*) OVER (ORDER BY enroll_date DESC) (1,2,3 are winref). activeWindows = [W3, W1, W2] If we iterate from reverse and break at first occurrence, we will break at W2 and add extra keys there, but what we want it to add keys at W1 so that it gets spilled to W2 (as existing logic is designed to carry over sorted cols first to last). > 4. You're still setting orderby_pathkeys in the foreach loop. That's already been set above and it won't have changed. > 5. I don't think there's any need to check pathkeys_contained_in() in the foreach loop anymore. With #3 the index will be -1 if the optimisation cannot apply. You could likely also get rid of try_sort_pushdown too and just make the condition "if (sort_pushdown_idx == foreach_current_index(l))". Done this. Added pathkeys_contained_in as an assert, hope that's okay. > I'm a little unsure why there's still the is_sorted check there. Shouldn't that always be false now that you're looping until the pathkeys don't match in the foreach_reverse loop? Removing is_sorted causes issue if there is matching pathkey which is presorted eg this case -- Do not perform sort pushdown if column is presorted CREATE INDEX depname_idx ON empsalary(depname); SET enable_seqscan=0; EXPLAIN (COSTS OFF) SELECT empno, min(salary) OVER (PARTITION BY depname) depminsalary FROM empsalary ORDER BY depname, empno; We can move this to if (try_sort_pushdown) block but it looks to me bit ugly. Nevertheless, it make sense to have it here, sort_pushdown_index should point to exact window function which needs to be modified. Having extra check (for is_sorted) in 2nd foreach loop adds ambiguity if we don't add it in first check. foreach_reverse(l, activeWindows) { WindowClause *wc = lfirst_node(WindowClause, l); orderby_pathkeys = make_pathkeys_for_sortclauses(root,root->parse->sortClause,root->processed_tlist); window_pathkeys = make_pathkeys_for_window(root,wc,root->processed_tlist); is_sorted = pathkeys_count_contained_in(window_pathkeys,path->pathkeys,&presorted_keys); has_runcondition |= (wc->runCondition != NIL); if (!pathkeys_contained_in(window_pathkeys, orderby_pathkeys) || has_runcondition) break; if(!is_sorted) sort_pushdown_idx = foreach_current_index(l); } Tests passes on this so logically it is ok. > Correct me if I'm wrong as I've not tested, but I think the new code > in the foreach loop can just become: > > if (sort_pushdown_idx == foreach_current_index(l)) > { > Assert(!is_sorted); > window_pathkeys = orderby_pathkeys; > is_sorted = pathkeys_count_contained_in(window_pathkeys, > path->pathkeys, &presorted_keys); > } Depending on where we have is_sorted (as mentioned above) it looks a lot like you mentioned. Also, we can add Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys)) >> I have one doubt regarding runCondition, do we only need to ensure >> that window function which has subset sort clause of main query should >> not have runCondition or none of the window functions should not contain >> runCondition? I have gone with later case but wanted to clarify. > Actually, maybe it's ok just to check the top-level WindowClause for > runConditions. It's only that one that'll filter rows. That probably > simplifies the code quite a bit. Lower-level runConditions only serve > to halt the evaluation of WindowFuncs when the runCondition is no > longer met. Okay, then this approach makes sense. > hmm, perhaps the Limit would have to be put between the WindowAgg and > Sort for it to work. Maybe that's more complexity than it's worth. Yes, not specific to this change. It is more around allowing top-N sort in window functions (in general). Once we have it there, then this could be taken care of. I have attached patch which fixes 1 & 2 and rearrange is_sorted. Point #3 needs to be resolved (and perhaps another way to handle is_sorted) Thanks, -- Regards, Ankit Kumar Pandey
Attachment
pgsql-hackers by date: