Allow WindowFuncs prosupport function to use more optimal WindowClause options - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Allow WindowFuncs prosupport function to use more optimal WindowClause options |
Date | |
Msg-id | CAApHDvohAKEtTXxq7Pc-ic2dKT8oZfbRKeEJP64M0B6+S88z+A@mail.gmail.com Whole thread Raw |
Responses |
Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options
(Vik Fearing <vik@postgresfriends.org>)
|
List | pgsql-hackers |
Over on [1], Erwin mentions that row_number() over (ORDER BY ... ROWS UNBOUNDED PRECEDING) is substantially faster than the default RANGE UNBOUNDED PRECEDING WindowClause options. The difference between these options are that nodeWindowAgg.c checks for peer rows for RANGE but not for ROWS. That would make a difference for many of our built-in window and aggregate functions, but row_number() does not really care. To quantify the performance difference, take the following example: create table ab (a int, b int) ; insert into ab select x,y from generate_series(1,1000)x,generate_series(1,1000)y; create index on ab(a,b); -- range unbounded (the default) explain (analyze, costs off, timing off) select a,b from (select a,b,row_number() over (partition by a order by a range unbounded preceding) rn from ab) ab where rn <= 1; QUERY PLAN --------------------------------------------------------------------------------------- Subquery Scan on ab (actual rows=1000 loops=1) -> WindowAgg (actual rows=1000 loops=1) Run Condition: (row_number() OVER (?) <= 1) -> Index Only Scan using ab_a_b_idx on ab ab_1 (actual rows=1000000 loops=1) Heap Fetches: 1000000 Planning Time: 0.091 ms Execution Time: 336.172 ms (7 rows) If that were switched to use ROWS UNBOUNDED PRECEDING then the executor would not have to check for peer rows in the window frame. explain (analyze, costs off, timing off) select a,b from (select a,b,row_number() over (partition by a order by a rows unbounded preceding) rn from ab) ab where rn <= 1; QUERY PLAN --------------------------------------------------------------------------------------- Subquery Scan on ab (actual rows=1000 loops=1) -> WindowAgg (actual rows=1000 loops=1) Run Condition: (row_number() OVER (?) <= 1) -> Index Only Scan using ab_a_b_idx on ab ab_1 (actual rows=1000000 loops=1) Heap Fetches: 0 Planning Time: 0.178 ms Execution Time: 75.101 ms (7 rows) Time: 75.673 ms (447% faster) You can see that this executes quite a bit more quickly. As Erwin pointed out to me (off-list), this along with the monotonic window function optimisation that was added in PG15 the performance of this gets close to DISTINCT ON. explain (analyze, costs off, timing off) select distinct on (a) a,b from ab order by a,b; QUERY PLAN ---------------------------------------------------------------------------- Unique (actual rows=1000 loops=1) -> Index Only Scan using ab_a_b_idx on ab (actual rows=1000000 loops=1) Heap Fetches: 0 Planning Time: 0.071 ms Execution Time: 77.988 ms (5 rows) I've not really done any analysis into which other window functions can use this optimisation. The attached only adds support to row_number()'s support function and only converts exactly "RANGE UNBOUNDED PRECEDING AND CURRENT ROW" into "ROW UNBOUNDED PRECEDING AND CURRENT ROW". That might need to be relaxed a little, but I've done no analysis to find that out. Erwin mentioned to me that he's not currently in a position to produce a patch for this, so here's the patch. I'm hoping the existence of this might coax Erwin into doing some analysis into what other window functions we can support and what other frame options can be optimised. [1] https://postgr.es/m/CAGHENJ7LBBszxS+SkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA@mail.gmail.com
Attachment
pgsql-hackers by date: