Thread: Default framing option RANGE adds cost for no gain to some window functions

Default framing option RANGE adds cost for no gain to some window functions

From
Erwin Brandstetter
Date:
The manual states:

> The default framing option is RANGE UNBOUNDED PRECEDING, ...

This seems to be mandated by the SQL standard.

However, unless I am missing something, there are window functions where RANGE mode makes no sense on principle, and the result is identical to ROWS mode. Among those, the most popular window function of all: row_number(). These expressions do the same:

row_number() OVER (ORDER BY a)

row_number() OVER (ORDER BY a ROWS UNBOUNDED PRECEDING)

Unfortunately, the first one is substantially more expensive. Hardly anybody seems to be aware of that. I consistently see a performance penalty of around 20 % (or more). Demo for Postgres 15 with a couple of variants:


This looks like a performance. Either RANGE mode should only apply to window functions where it actually makes a difference. Or it should be optimized internally to use the faster code path where there is no effective difference.

(In an ideal world, the default mode would be ROWS to begin with, as this is more intuitive. But too late now.)

Regards
Erwin Brandstetter

Re: Default framing option RANGE adds cost for no gain to some window functions

From
David Rowley
Date:
On Mon, 10 Oct 2022 at 12:45, Erwin Brandstetter <brsaweda@gmail.com> wrote:
> However, unless I am missing something, there are window functions where RANGE mode makes no sense on principle, and
theresult is identical to ROWS mode. Among those, the most popular window function of all: row_number(). These
expressionsdo the same:
 
>
> row_number() OVER (ORDER BY a)
>
> row_number() OVER (ORDER BY a ROWS UNBOUNDED PRECEDING)
>
> Unfortunately, the first one is substantially more expensive. Hardly anybody seems to be aware of that. I
consistentlysee a performance penalty of around 20 % (or more). Demo for Postgres 15 with a couple of variants:
 

You might be onto something there. However, a lack of an optimisation
is not a bug, so this is not the correct place to discuss.

If you were keen to come up with a patch, you could look at what was
done in [1] and perhaps invent a new Node type that can be given to
the support function so that the support function can be called to ask
if the window function cares about the ROWS / RANGE option.  Then in
the planner, perhaps just after select_active_windows() is called,
call the support function for each set of window functions in each
WindowClause to see if the window function cares about this option.

This perhaps should be done more generically than just asking the
support function about ROWS vs RANGE. Maybe you can just ask the
support function if the frameOptions can be optimised for this window
function, then if every WindowFunc in the WindowClause agrees on what
those optimised frameOptions are, then you can change the
WindowClause.frameOptions to the optimised set. If the given
WindowFunc does not have a support function or the support function
does not understand the new Node type, then you'll need to leave the
WindowClause.frameOptions alone.

If you're keen to do this, then you should start a thread on the
-hackers list mentioning what you'd like to do and how you plan to go
about doing it.  That's a good place to get feedback before you get
too deep into writing a patch.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9d9c02ccd1a



Re: Default framing option RANGE adds cost for no gain to some window functions

From
Erwin Brandstetter
Date:

On Mon, 10 Oct 2022 at 06:21, David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 10 Oct 2022 at 12:45, Erwin Brandstetter <brsaweda@gmail.com> wrote:
> However, unless I am missing something, there are window functions where RANGE mode makes no sense on principle, and the result is identical to ROWS mode. Among those, the most popular window function of all: row_number(). These expressions do the same:
>
> row_number() OVER (ORDER BY a)
>
> row_number() OVER (ORDER BY a ROWS UNBOUNDED PRECEDING)
>
> Unfortunately, the first one is substantially more expensive. Hardly anybody seems to be aware of that. I consistently see a performance penalty of around 20 % (or more). Demo for Postgres 15 with a couple of variants:

You might be onto something there. However, a lack of an optimisation
is not a bug, so this is not the correct place to discuss.

If you were keen to come up with a patch, you could look at what was
done in [1] and perhaps invent a new Node type that can be given to
the support function so that the support function can be called to ask
if the window function cares about the ROWS / RANGE option.  Then in
the planner, perhaps just after select_active_windows() is called,
call the support function for each set of window functions in each
WindowClause to see if the window function cares about this option.

This perhaps should be done more generically than just asking the
support function about ROWS vs RANGE. Maybe you can just ask the
support function if the frameOptions can be optimised for this window
function, then if every WindowFunc in the WindowClause agrees on what
those optimised frameOptions are, then you can change the
WindowClause.frameOptions to the optimised set. If the given
WindowFunc does not have a support function or the support function
does not understand the new Node type, then you'll need to leave the
WindowClause.frameOptions alone.

If you're keen to do this, then you should start a thread on the
-hackers list mentioning what you'd like to do and how you plan to go
about doing it.  That's a good place to get feedback before you get
too deep into writing a patch.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9d9c02ccd1a