Home > mailing lists

Allow WindowFuncs prosupport function to use more optimal WindowClause options - Mailing list pgsql-hackers

From	David Rowley
Subject	Allow WindowFuncs prosupport function to use more optimal WindowClause options
Date	October 12, 2022 05:40:42
Msg-id	CAApHDvohAKEtTXxq7Pc-ic2dKT8oZfbRKeEJP64M0B6+S88z+A@mail.gmail.com Whole thread Raw
Responses	Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options (Vik Fearing <vik@postgresfriends.org>)
List	pgsql-hackers

Tree view

Over on [1], Erwin mentions that row_number() over (ORDER BY ... ROWS
UNBOUNDED PRECEDING) is substantially faster than the default RANGE
UNBOUNDED PRECEDING WindowClause options.  The difference between
these options are that nodeWindowAgg.c checks for peer rows for RANGE
but not for ROWS.  That would make a difference for many of our
built-in window and aggregate functions, but row_number() does not
really care.

To quantify the performance difference, take the following example:

create table ab (a int, b int) ;
insert into ab
select x,y from generate_series(1,1000)x,generate_series(1,1000)y;
create index on ab(a,b);

--  range unbounded (the default)
explain (analyze, costs off, timing off)
select a,b from (select a,b,row_number() over (partition by a order by
a range unbounded preceding) rn from ab) ab where rn <= 1;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Subquery Scan on ab (actual rows=1000 loops=1)
   ->  WindowAgg (actual rows=1000 loops=1)
         Run Condition: (row_number() OVER (?) <= 1)
         ->  Index Only Scan using ab_a_b_idx on ab ab_1 (actual
rows=1000000 loops=1)
               Heap Fetches: 1000000
 Planning Time: 0.091 ms
 Execution Time: 336.172 ms
(7 rows)

If that were switched to use ROWS UNBOUNDED PRECEDING then the
executor would not have to check for peer rows in the window frame.

explain (analyze, costs off, timing off)
select a,b from (select a,b,row_number() over (partition by a order by
a rows unbounded preceding) rn from ab) ab where rn <= 1;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Subquery Scan on ab (actual rows=1000 loops=1)
   ->  WindowAgg (actual rows=1000 loops=1)
         Run Condition: (row_number() OVER (?) <= 1)
         ->  Index Only Scan using ab_a_b_idx on ab ab_1 (actual
rows=1000000 loops=1)
               Heap Fetches: 0
 Planning Time: 0.178 ms
 Execution Time: 75.101 ms
(7 rows)

Time: 75.673 ms (447% faster)

You can see that this executes quite a bit more quickly. As Erwin
pointed out to me (off-list), this along with the monotonic window
function optimisation that was added in PG15 the performance of this
gets close to DISTINCT ON.

explain (analyze, costs off, timing off)
select distinct on (a) a,b from ab order by a,b;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Unique (actual rows=1000 loops=1)
   ->  Index Only Scan using ab_a_b_idx on ab (actual rows=1000000 loops=1)
         Heap Fetches: 0
 Planning Time: 0.071 ms
 Execution Time: 77.988 ms
(5 rows)

I've not really done any analysis into which other window functions
can use this optimisation. The attached only adds support to
row_number()'s support function and only converts exactly "RANGE
UNBOUNDED PRECEDING AND CURRENT ROW" into "ROW UNBOUNDED PRECEDING AND
CURRENT ROW".  That might need to be relaxed a little, but I've done
no analysis to find that out.   Erwin mentioned to me that he's not
currently in a position to produce a patch for this, so here's the
patch. I'm hoping the existence of this might coax Erwin into doing
some analysis into what other window functions we can support and what
other frame options can be optimised.

[1] https://postgr.es/m/CAGHENJ7LBBszxS+SkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA@mail.gmail.com

Attachment

optimize_row_numbers_frameoptions_when_possible.patch

pgsql-hackers by date:

From: Richard Guo
Date: 12 October 2022, 05:20:18
Subject: Re: Remove an unnecessary LSN calculation while validating WAL page header

From: Etsuro Fujita
Date: 12 October 2022, 05:56:17
Subject: Re: Fast COPY FROM based on batch insert

Allow WindowFuncs prosupport function to use more optimal WindowClause options - Mailing list pgsql-hackers

Attachment

Previous

Next