Thread: Window Function "Run Conditions"

Window Function "Run Conditions"

From

David Rowley

Date:

01 July 2021, 09:11:21

It seems like a few too many years of an SQL standard without any
standardised way to LIMIT the number of records in a result set caused
various applications to adopt some strange ways to get this behaviour.
Over here in the PostgreSQL world, we just type LIMIT n; at the end of
our queries.  I believe Oracle people did a few tricks with a special
column named "rownum". Another set of people needed SQL that would
work over multiple DBMSes and used something like:

SELECT * FROM (SELECT ... row_number() over (order by ...) rn) a WHERE rn <= 10;

I believe it's fairly common to do paging this way on commerce sites.

The problem with PostgreSQL here is that neither the planner nor
executor knows that once we get to row_number 11 that we may as well
stop.  The number will never go back down in this partition.

I'd like to make this better for PostgreSQL 15. I've attached a WIP
patch to do so.

How this works is that I've added prosupport functions for each of
row_number(), rank() and dense_rank().  When doing qual pushdown, if
we happen to hit a windowing function, instead of rejecting the
pushdown, we see if there's a prosupport function and if there is, ask
it if this qual can be used to allow us to stop emitting tuples from
the Window node by making use of this qual.  I've called these "run
conditions".  Basically, keep running while this remains true. Stop
when it's not.

We can't always use the qual directly. For example, if someone does.

SELECT * FROM (SELECT ... row_number() over (order by ...) rn) a WHERE rn = 10;

then if we use the rn = 10 qual, we'd think we could stop right away.
Instead, I've made the prosupport function handle this by generating a
rn <= 10 qual so that we can stop once we get to 11.  In this case we
cannot completely pushdown the qual. It needs to remain in place to
filter out rn values 1-9.

Row_number(), rank() and dense_rank() are all monotonically increasing
functions.  But we're not limited to just those.  COUNT(*) works too
providing the frame bounds guarantee that the function is either
monotonically increasing or decreasing.

COUNT(*) OVER (ORDER BY .. ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING) is monotonically decreasing, whereas the standard bound
options would make it monotonically increasing.

The same could be done for MIN() and MAX(). I just don't think that's
worth doing. It seems unlikely that would get enough use.

Anyway. I'd like to work on this more during the PG15 cycle.  I
believe the attached patch makes this work ok. There are just a few
things to iron out.

1) Unsure of the API to the prosupport function.  I wonder if the
prosupport function should just be able to say if the function is
either monotonically increasing or decreasing or neither then have
core code build a qual.  That would make the job of building new
functions easier, but massively reduce the flexibility of the feature.
I'm just not sure it needs to do more in the future.

2) Unsure if what I've got to make EXPLAIN show the run condition is
the right way to do it. Because I don't want nodeWindow.c to have to
re-evaluate the window function to determine of the run condition is
no longer met, I've coded the qual to reference the varno in the
window node's targetlist.  That qual is no good for EXPLAIN so had to
include another set of quals that include the WindowFunc reference. I
saw that Index Only Scans have a similar means to make EXPLAIN work,
so I just followed that.

David

Attachment

v1-0001-Allow-some-window-functions-to-finish-execution-e.patch

Re: Window Function "Run Conditions"

From

Pavel Stehule

Date:

01 July 2021, 09:17:22

čt 1. 7. 2021 v 11:11 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:

It seems like a few too many years of an SQL standard without any
standardised way to LIMIT the number of records in a result set caused
various applications to adopt some strange ways to get this behaviour.
Over here in the PostgreSQL world, we just type LIMIT n; at the end of
our queries. I believe Oracle people did a few tricks with a special
column named "rownum". Another set of people needed SQL that would
work over multiple DBMSes and used something like:

SELECT * FROM (SELECT ... row_number() over (order by ...) rn) a WHERE rn <= 10;

I believe it's fairly common to do paging this way on commerce sites.

The problem with PostgreSQL here is that neither the planner nor
executor knows that once we get to row_number 11 that we may as well
stop. The number will never go back down in this partition.

I'd like to make this better for PostgreSQL 15. I've attached a WIP
patch to do so.

How this works is that I've added prosupport functions for each of
row_number(), rank() and dense_rank(). When doing qual pushdown, if
we happen to hit a windowing function, instead of rejecting the
pushdown, we see if there's a prosupport function and if there is, ask
it if this qual can be used to allow us to stop emitting tuples from
the Window node by making use of this qual. I've called these "run
conditions". Basically, keep running while this remains true. Stop
when it's not.

We can't always use the qual directly. For example, if someone does.

SELECT * FROM (SELECT ... row_number() over (order by ...) rn) a WHERE rn = 10;

then if we use the rn = 10 qual, we'd think we could stop right away.
Instead, I've made the prosupport function handle this by generating a
rn <= 10 qual so that we can stop once we get to 11. In this case we
cannot completely pushdown the qual. It needs to remain in place to
filter out rn values 1-9.

Row_number(), rank() and dense_rank() are all monotonically increasing
functions. But we're not limited to just those. COUNT(*) works too
providing the frame bounds guarantee that the function is either
monotonically increasing or decreasing.

COUNT(*) OVER (ORDER BY .. ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING) is monotonically decreasing, whereas the standard bound
options would make it monotonically increasing.

The same could be done for MIN() and MAX(). I just don't think that's
worth doing. It seems unlikely that would get enough use.

Anyway. I'd like to work on this more during the PG15 cycle. I
believe the attached patch makes this work ok. There are just a few
things to iron out.

1) Unsure of the API to the prosupport function. I wonder if the
prosupport function should just be able to say if the function is
either monotonically increasing or decreasing or neither then have
core code build a qual. That would make the job of building new
functions easier, but massively reduce the flexibility of the feature.
I'm just not sure it needs to do more in the future.

2) Unsure if what I've got to make EXPLAIN show the run condition is
the right way to do it. Because I don't want nodeWindow.c to have to
re-evaluate the window function to determine of the run condition is
no longer met, I've coded the qual to reference the varno in the
window node's targetlist. That qual is no good for EXPLAIN so had to
include another set of quals that include the WindowFunc reference. I
saw that Index Only Scans have a similar means to make EXPLAIN work,
so I just followed that.

this can be very nice feature

Pavel

David

Re: Window Function "Run Conditions"

From

David Rowley

Date:

16 August 2021, 10:28:24

On Thu, 1 Jul 2021 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
> 1) Unsure of the API to the prosupport function.  I wonder if the
> prosupport function should just be able to say if the function is
> either monotonically increasing or decreasing or neither then have
> core code build a qual.  That would make the job of building new
> functions easier, but massively reduce the flexibility of the feature.
> I'm just not sure it needs to do more in the future.

I looked at this patch again today and ended up changing the API that
I'd done for the prosupport functions.  These just now set a new
"monotonic" field in the (also newly renamed)
SupportRequestWFuncMonotonic struct. This can be set to one of the
values from the newly added MonotonicFunction enum, namely:
MONOTONICFUNC_NONE, MONOTONICFUNC_INCREASING, MONOTONICFUNC_DECREASING
or MONOTONICFUNC_BOTH.

I also added handling for a few more cases that are perhaps rare but
could be done with just a few lines of code. For example; COUNT(*)
OVER() is MONOTONICFUNC_BOTH as it can neither increase nor decrease
for a given window partition. I think technically all of the standard
set of aggregate functions could have a prosupport function to handle
that case. Min() and Max() could go a little further, but I'm not sure
if adding handling for that would be worth it, and if someone does
think that it is worth it, then I'd rather do that as a separate
patch.

I put the MonotonicFunction enum in plannodes.h. There's nothing
specific about window functions or support functions. It could, for
example, be reused again for something else such as monotonic
set-returning functions.

One thing which I'm still not sure about is where
find_window_run_conditions() should be located. Currently, it's in
allpaths.c but that does not really feel like the right place to me.
We do have planagg.c in src/backend/optimizer/plan, maybe we need
planwindow.c?

David

Attachment

v2_teach_planner_and_executor_about_monotonic_wfuncs.patch

Re: Window Function "Run Conditions"

From

Zhihong Yu

Date:

16 August 2021, 15:57:12

On Mon, Aug 16, 2021 at 3:28 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 1 Jul 2021 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
> 1) Unsure of the API to the prosupport function. I wonder if the
> prosupport function should just be able to say if the function is
> either monotonically increasing or decreasing or neither then have
> core code build a qual. That would make the job of building new
> functions easier, but massively reduce the flexibility of the feature.
> I'm just not sure it needs to do more in the future.

I looked at this patch again today and ended up changing the API that
I'd done for the prosupport functions. These just now set a new
"monotonic" field in the (also newly renamed)
SupportRequestWFuncMonotonic struct. This can be set to one of the
values from the newly added MonotonicFunction enum, namely:
MONOTONICFUNC_NONE, MONOTONICFUNC_INCREASING, MONOTONICFUNC_DECREASING
or MONOTONICFUNC_BOTH.

I also added handling for a few more cases that are perhaps rare but
could be done with just a few lines of code. For example; COUNT(*)
OVER() is MONOTONICFUNC_BOTH as it can neither increase nor decrease
for a given window partition. I think technically all of the standard
set of aggregate functions could have a prosupport function to handle
that case. Min() and Max() could go a little further, but I'm not sure
if adding handling for that would be worth it, and if someone does
think that it is worth it, then I'd rather do that as a separate
patch.

I put the MonotonicFunction enum in plannodes.h. There's nothing
specific about window functions or support functions. It could, for
example, be reused again for something else such as monotonic
set-returning functions.

One thing which I'm still not sure about is where
find_window_run_conditions() should be located. Currently, it's in
allpaths.c but that does not really feel like the right place to me.
We do have planagg.c in src/backend/optimizer/plan, maybe we need
planwindow.c?

David

Hi,

+ if ((res->monotonic & MONOTONICFUNC_INCREASING) == MONOTONICFUNC_INCREASING)

The above can be simplified as 'if (res->monotonic & MONOTONICFUNC_INCREASING) '

Cheers

Re: Window Function "Run Conditions"

From

David Rowley

Date:

18 August 2021, 10:39:36

On Tue, 17 Aug 2021 at 03:51, Zhihong Yu <zyu@yugabyte.com> wrote:
> +               if ((res->monotonic & MONOTONICFUNC_INCREASING) == MONOTONICFUNC_INCREASING)
>
> The above can be simplified as 'if (res->monotonic & MONOTONICFUNC_INCREASING) '

True.  I've attached an updated patch.

David

Attachment

v3_teach_planner_and_executor_about_monotonic_wfuncs.patch

Re: Window Function "Run Conditions"

From

Andy Fan

Date:

18 August 2021, 12:20:45

Hi David:

Thanks for the patch.

On Wed, Aug 18, 2021 at 6:40 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 17 Aug 2021 at 03:51, Zhihong Yu <zyu@yugabyte.com> wrote:
> > +               if ((res->monotonic & MONOTONICFUNC_INCREASING) == MONOTONICFUNC_INCREASING)
> >
> > The above can be simplified as 'if (res->monotonic & MONOTONICFUNC_INCREASING) '
>
> True.  I've attached an updated patch.
>
> David

Looks like we need to narrow down the situation where we can apply
this optimization.

SELECT * FROM
  (SELECT empno,
          salary,
                  count(*) over (order by empno desc) as c      ,
          dense_rank() OVER (ORDER BY salary DESC) dr

   FROM empsalary) emp
WHERE dr = 1;

In the current master, the result is:

 empno | salary | c | dr

-------+--------+---+----

     8 |   6000 | 4 |  1

(1 row)

In the patched version， the result is:

 empno | salary | c | dr

-------+--------+---+----

     8 |   6000 | 1 |  1

(1 row)

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: Window Function "Run Conditions"

From

David Rowley

Date:

19 August 2021, 06:35:27

On Thu, 19 Aug 2021 at 00:20, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> In the current master, the result is:
>
>  empno | salary | c | dr
> -------+--------+---+----
>      8 |   6000 | 4 |  1

> In the patched version， the result is:
>
>  empno | salary | c | dr
> -------+--------+---+----
>      8 |   6000 | 1 |  1

Thanks for taking it for a spin.

That's a bit unfortunate.  I don't immediately see how to fix it other
than to restrict the optimisation to only apply when there's a single
WindowClause. It might be possible to relax it further and only apply
if it's the final window clause to be evaluated, but in those cases,
the savings are likely to be much less anyway as some previous
WindowAgg will have exhausted all rows from its subplan.  Likely
restricting it to only working if there's 1 WindowClause would be fine
as for the people using row_number() for a top-N type query, there's
most likely only going to be 1 WindowClause.

Anyway, I'll take a few more days to think about it before posting a fix.

David

Re: Window Function "Run Conditions"

From

Andy Fan

Date:

26 August 2021, 02:54:05

On Thu, Aug 19, 2021 at 2:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 19 Aug 2021 at 00:20, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > In the current master, the result is:
> >
> >  empno | salary | c | dr
> > -------+--------+---+----
> >      8 |   6000 | 4 |  1
>
> > In the patched version， the result is:
> >
> >  empno | salary | c | dr
> > -------+--------+---+----
> >      8 |   6000 | 1 |  1
>
> Thanks for taking it for a spin.
>
> That's a bit unfortunate.  I don't immediately see how to fix it other
> than to restrict the optimisation to only apply when there's a single
> WindowClause. It might be possible to relax it further and only apply
> if it's the final window clause to be evaluated, but in those cases,
> the savings are likely to be much less anyway as some previous
> WindowAgg will have exhausted all rows from its subplan.

I am trying to hack the select_active_windows function to make
sure the WindowClause with Run Condition clause to be the last one
to evaluate (we also need to consider more than 1 window func has
run condition), at that time, the run condition clause is ready already.

However there are two troubles in this direction: a).  This may conflict
with "the windows that need the same sorting are adjacent in the list."
b). "when two or more windows are order-equivalent then all peer rows
must be presented in the same order in all of them. .. (See General Rule 4 of
<window clause> in SQL2008 - SQL2016.)"

In summary, I am not sure if it is correct to change the execution Order
of WindowAgg freely.

>   Likely
> restricting it to only working if there's 1 WindowClause would be fine
> as for the people using row_number() for a top-N type query, there's
> most likely only going to be 1 WindowClause.
>

This sounds practical.  And I suggest the following small changes.
(just check the partitionClause before the prosupport)

@@ -2133,20 +2133,22 @@ find_window_run_conditions(Query *subquery,
RangeTblEntry *rte, Index rti,

        *keep_original = true;

-       prosupport = get_func_support(wfunc->winfnoid);
-
-       /* Check if there's a support function for 'wfunc' */
-       if (!OidIsValid(prosupport))
-               return false;
-
        /*
         * Currently the WindowAgg node just stop when the run condition is no
         * longer true.  If there is a PARTITION BY clause then we cannot just
         * stop as other partitions still need to be processed.
         */
+
+       /* Check this first since window function with a partition
clause is common*/
        if (wclause->partitionClause != NIL)
                return false;

+       prosupport = get_func_support(wfunc->winfnoid);
+
+       /* Check if there's a support function for 'wfunc' */
+       if (!OidIsValid(prosupport))
+               return false;
+
        /* get the Expr from the other side of the OpExpr */
        if (wfunc_left)
                otherexpr = lsecond(opexpr->args);



--
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: Window Function "Run Conditions"

From

Greg Stark

Date:

15 March 2022, 21:23:40

This looks like an awesome addition.

I have one technical questions...

Is it possible to actually transform the row_number case into a LIMIT
clause or make the planner support for this case equivalent to it (in
which case we can replace the LIMIT clause planning to transform into
a window function)?

The reason I ask is because the Limit plan node is actually quite a
bit more optimized than the general window function plan node. It
calculates cost estimates based on the limit and can support Top-N
sort nodes.

But the bigger question is whether this patch is ready for a committer
to look at? Were you able to resolve Andy Fan's bug report? Did you
resolve the two questions in the original email?

Re: Window Function "Run Conditions"

From

Corey Huinker

Date:

17 March 2022, 04:04:03

On Tue, Mar 15, 2022 at 5:24 PM Greg Stark <stark@mit.edu> wrote:

This looks like an awesome addition.

I have one technical questions...

Is it possible to actually transform the row_number case into a LIMIT
clause or make the planner support for this case equivalent to it (in
which case we can replace the LIMIT clause planning to transform into
a window function)?

The reason I ask is because the Limit plan node is actually quite a
bit more optimized than the general window function plan node. It
calculates cost estimates based on the limit and can support Top-N
sort nodes.

But the bigger question is whether this patch is ready for a committer
to look at? Were you able to resolve Andy Fan's bug report? Did you
resolve the two questions in the original email?

+1 to all this

It seems like this effort would aid in implementing what some other databases implement via the QUALIFY clause, which is to window functions what HAVING is to aggregate functions.

example: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause

Re: Window Function "Run Conditions"

From

Andres Freund

Date:

22 March 2022, 01:07:18

Hi,

On 2021-08-19 18:35:27 +1200, David Rowley wrote:
> Anyway, I'll take a few more days to think about it before posting a fix.

The patch in the CF entry doesn't apply: http://cfbot.cputube.org/patch_37_3234.log

The quoted message was ~6 months ago. I think this CF entry should be marked
as returned-with-feedback?

- Andres

Re: Window Function "Run Conditions"

From

David Rowley

Date:

22 March 2022, 22:24:13

On Thu, 26 Aug 2021 at 14:54, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> On Thu, Aug 19, 2021 at 2:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > On Thu, 19 Aug 2021 at 00:20, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > > In the current master, the result is:
> > >
> > >  empno | salary | c | dr
> > > -------+--------+---+----
> > >      8 |   6000 | 4 |  1
> >
> > > In the patched version， the result is:
> > >
> > >  empno | salary | c | dr
> > > -------+--------+---+----
> > >      8 |   6000 | 1 |  1
> >
> > Thanks for taking it for a spin.
> >
> > That's a bit unfortunate.  I don't immediately see how to fix it other
> > than to restrict the optimisation to only apply when there's a single
> > WindowClause. It might be possible to relax it further and only apply
> > if it's the final window clause to be evaluated, but in those cases,
> > the savings are likely to be much less anyway as some previous
> > WindowAgg will have exhausted all rows from its subplan.
>
> I am trying to hack the select_active_windows function to make
> sure the WindowClause with Run Condition clause to be the last one
> to evaluate (we also need to consider more than 1 window func has
> run condition), at that time, the run condition clause is ready already.
>
> However there are two troubles in this direction: a).  This may conflict
> with "the windows that need the same sorting are adjacent in the list."
> b). "when two or more windows are order-equivalent then all peer rows
> must be presented in the same order in all of them. .. (See General Rule 4 of
> <window clause> in SQL2008 - SQL2016.)"
>
> In summary, I am not sure if it is correct to change the execution Order
> of WindowAgg freely.

Thanks for looking at that.

My current thoughts are that it just feels a little too risky to
adjust the comparison function that sorts the window clauses to pay
attention to the run-condition.

We would need to ensure that there's just a single window function
with a run condition as it wouldn't be valid for there to be multiple.
It would be easy enough to ensure we only push quals into just 1
window clause, but that and meddling with the evaluation order has
trade-offs.  To do that properly, we'd likely want to consider the
costs when deciding which window clause would benefit from having
quals pushed the most.  Plus, if we start messing with the evaluation
order then we'd likely really want some sort of costing to check if
pushing a qual and adjusting the evaluation order is actually cheaper
than not pushing the qual and keeping the clauses in the order that
requires the minimum number of sorts.   The planner is not really
geared up for costing things like that properly, that's why we just
assume the order with the least sorts is best. In reality that's often
not going to be true as an index may exist and we might want to
evaluate a clause first if we could get rid of a sort and index scan
instead. Sorting the window clauses based on their SortGroupClause is
just the best we can do for now at that stage in planning.

I think it's safer to just disable the optimisation when there are
multiple window clauses.  Multiple matching clauses are merged
already, so it's perfectly valid to have multiple window functions,
it's just they must share the same window clause.  I don't think
that's terrible as with the major use case that I have in mind for
this, the window function is only added to limit the number of rows.
In most cases I can imagine, there'd be no reason to have an
additional window function with different frame options.

I've attached an updated patch.

Attachment

v2-0001-Teach-planner-and-executor-about-monotonic-window.patch

Re: Window Function "Run Conditions"

From

David Rowley

Date:

22 March 2022, 22:35:53

On Wed, 16 Mar 2022 at 10:24, Greg Stark <stark@mit.edu> wrote:
>
> This looks like an awesome addition.

Thanks

> I have one technical questions...
>
> Is it possible to actually transform the row_number case into a LIMIT
> clause or make the planner support for this case equivalent to it (in
> which case we can replace the LIMIT clause planning to transform into
> a window function)?

Currently, I have only coded it to support monotonically increasing
and decreasing functions. Putting a <= <const> type condition on a
row_number() function with no PARTITION BY clause I think is logically
the same as a LIMIT clause, but that's not the case for rank() and
dense_rank().  There may be multiple peer rows with the same rank in
those cases. We'd have no way to know what the LIMIT should be set to.
I don't really want to just do this for row_number().

> The reason I ask is because the Limit plan node is actually quite a
> bit more optimized than the general window function plan node. It
> calculates cost estimates based on the limit and can support Top-N
> sort nodes.

This is true.  There's perhaps no reason why an additional property
could not be added to allow the prosupport function to optionally set
*exactly* the maximum number of rows that could match the condition.
e.g. for select * from (select *,row_number() over (order by c) rn
from ..) w where rn <= 10; that could be set to 10, and if we used
rank() instead of row_number(), it could just be left unset.

I think this is probably worth thinking about at some future date. I
don't really want to make it part of this effort. I also don't think
I'm doing anything here that would need to be undone to make that
work.

David

Re: Window Function "Run Conditions"

From

David Rowley

Date:

22 March 2022, 22:39:13

On Thu, 17 Mar 2022 at 17:04, Corey Huinker <corey.huinker@gmail.com> wrote:
> It seems like this effort would aid in implementing what some other databases implement via the QUALIFY clause, which
isto window functions what HAVING is to aggregate functions.
 
> example: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause

Isn't that just syntactic sugar?  You could get the same from adding a
subquery where a WHERE clause to filter rows evaluated after the
window clause.

David

Re: Window Function "Run Conditions"

From

Zhihong Yu

Date:

22 March 2022, 23:54:08

On Tue, Mar 22, 2022 at 3:24 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 26 Aug 2021 at 14:54, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> On Thu, Aug 19, 2021 at 2:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > On Thu, 19 Aug 2021 at 00:20, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > > In the current master, the result is:
> > >
> > > empno | salary | c | dr
> > > -------+--------+---+----
> > > 8 | 6000 | 4 | 1
> >
> > > In the patched version， the result is:
> > >
> > > empno | salary | c | dr
> > > -------+--------+---+----
> > > 8 | 6000 | 1 | 1
> >
> > Thanks for taking it for a spin.
> >
> > That's a bit unfortunate. I don't immediately see how to fix it other
> > than to restrict the optimisation to only apply when there's a single
> > WindowClause. It might be possible to relax it further and only apply
> > if it's the final window clause to be evaluated, but in those cases,
> > the savings are likely to be much less anyway as some previous
> > WindowAgg will have exhausted all rows from its subplan.
>
> I am trying to hack the select_active_windows function to make
> sure the WindowClause with Run Condition clause to be the last one
> to evaluate (we also need to consider more than 1 window func has
> run condition), at that time, the run condition clause is ready already.
>
> However there are two troubles in this direction: a). This may conflict
> with "the windows that need the same sorting are adjacent in the list."
> b). "when two or more windows are order-equivalent then all peer rows
> must be presented in the same order in all of them. .. (See General Rule 4 of
> <window clause> in SQL2008 - SQL2016.)"
>
> In summary, I am not sure if it is correct to change the execution Order
> of WindowAgg freely.

Thanks for looking at that.

My current thoughts are that it just feels a little too risky to
adjust the comparison function that sorts the window clauses to pay
attention to the run-condition.

We would need to ensure that there's just a single window function
with a run condition as it wouldn't be valid for there to be multiple.
It would be easy enough to ensure we only push quals into just 1
window clause, but that and meddling with the evaluation order has
trade-offs. To do that properly, we'd likely want to consider the
costs when deciding which window clause would benefit from having
quals pushed the most. Plus, if we start messing with the evaluation
order then we'd likely really want some sort of costing to check if
pushing a qual and adjusting the evaluation order is actually cheaper
than not pushing the qual and keeping the clauses in the order that
requires the minimum number of sorts. The planner is not really
geared up for costing things like that properly, that's why we just
assume the order with the least sorts is best. In reality that's often
not going to be true as an index may exist and we might want to
evaluate a clause first if we could get rid of a sort and index scan
instead. Sorting the window clauses based on their SortGroupClause is
just the best we can do for now at that stage in planning.

I think it's safer to just disable the optimisation when there are
multiple window clauses. Multiple matching clauses are merged
already, so it's perfectly valid to have multiple window functions,
it's just they must share the same window clause. I don't think
that's terrible as with the major use case that I have in mind for
this, the window function is only added to limit the number of rows.
In most cases I can imagine, there'd be no reason to have an
additional window function with different frame options.

I've attached an updated patch.

Hi,

The following code seems to be common between if / else blocks (w.r.t. wfunc_left) of find_window_run_conditions().

+ op = get_opfamily_member(opinfo->opfamily_id,
+ opinfo->oplefttype,
+ opinfo->oprighttype,
+ newstrategy);
+
+ newopexpr = (OpExpr *) make_opclause(op,
+ opexpr->opresulttype,
+ opexpr->opretset,
+ otherexpr,
+ (Expr *) wfunc,
+ opexpr->opcollid,
+ opexpr->inputcollid);
+ newopexpr->opfuncid = get_opcode(op);
+
+ *keep_original = true;

+ runopexpr = newopexpr;

It would be nice if this code can be shared.

+ WindowClause *wclause = (WindowClause *)
+ list_nth(subquery->windowClause,
+ wfunc->winref - 1);

The code would be more readable if list_nth() is indented.

+ /* Check the left side of the OpExpr */

It seems the code for checking left / right is the same. It would be better to extract and reuse the code.

Cheers

Re: Window Function "Run Conditions"

From

David Rowley

Date:

23 March 2022, 03:23:02

 On Wed, 23 Mar 2022 at 12:50, Zhihong Yu <zyu@yugabyte.com> wrote:
> The following code seems to be common between if / else blocks (w.r.t. wfunc_left) of find_window_run_conditions().

> It would be nice if this code can be shared.

I remember thinking about that and thinking that I didn't want to
overcomplicate the if conditions for the strategy tests. I'd thought
these would have become:

if ((wfunc_left && (strategy == BTLessStrategyNumber ||
    strategy == BTLessEqualStrategyNumber)) ||
     (!wfunc_left && (strategy == BTGreaterStrategyNumber ||
      strategy == BTGreaterEqualStrategyNumber)))

which I didn't think was very readable. That caused me to keep it separate.

On reflection, we can just leave the strategy checks as they are, then
add the additional code for checking wfunc_left when checking the
res->monotonic, i.e:

if ((wfunc_left && (res->monotonic & MONOTONICFUNC_INCREASING)) ||
   (!wfunc_left && (res->monotonic & MONOTONICFUNC_DECREASING)))

I think that's more readable than doubling up the strategy checks, so
I've done it that way in the attached.

>
> +           WindowClause *wclause = (WindowClause *)
> +           list_nth(subquery->windowClause,
> +                    wfunc->winref - 1);
>
> The code would be more readable if list_nth() is indented.

That's just the way pgindent put it.

> +   /* Check the left side of the OpExpr */
>
> It seems the code for checking left / right is the same. It would be better to extract and reuse the code.

I've moved some of that code into find_window_run_conditions() which
removes about 10 lines of code.

Updated patch attached. Thanks for looking.

David

Attachment

v3-0001-Teach-planner-and-executor-about-monotonic-window.patch

Re: Window Function "Run Conditions"

From

David Rowley

Date:

23 March 2022, 03:30:27

On Wed, 23 Mar 2022 at 11:24, David Rowley <dgrowleyml@gmail.com> wrote:
> I think it's safer to just disable the optimisation when there are
> multiple window clauses.  Multiple matching clauses are merged
> already, so it's perfectly valid to have multiple window functions,
> it's just they must share the same window clause.  I don't think
> that's terrible as with the major use case that I have in mind for
> this, the window function is only added to limit the number of rows.
> In most cases I can imagine, there'd be no reason to have an
> additional window function with different frame options.

I've not looked into the feasibility of it, but I had a thought that
we could just accumulate all the run-conditions in a new field in the
PlannerInfo then just tag them onto the top-level WindowAgg when
building the plan.

I'm just not sure it would be any more useful than what the v3 patch
is currently doing as intermediate WindowAggs would still need to
process all rows.  I think it would only save the window function
evaluation of the top-level WindowAgg for rows that don't match the
run-condition.  All the supported window functions are quite cheap, so
it's not a huge saving. I'd bet there would be example cases where it
would be measurable though.

David

Re: Window Function "Run Conditions"

From

"David G. Johnston"

Date:

23 March 2022, 05:09:27

On Tue, Mar 22, 2022 at 3:39 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 17 Mar 2022 at 17:04, Corey Huinker <corey.huinker@gmail.com> wrote:
> It seems like this effort would aid in implementing what some other databases implement via the QUALIFY clause, which is to window functions what HAVING is to aggregate functions.
> example: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause

Isn't that just syntactic sugar? You could get the same from adding a
subquery where a WHERE clause to filter rows evaluated after the
window clause.

I'd like some of that syntactic sugar please. It goes nicely with my HAVING syntactic coffee.

David J.

Re: Window Function "Run Conditions"

From

David Rowley

Date:

29 March 2022, 02:11:52

On Wed, 23 Mar 2022 at 16:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 23 Mar 2022 at 11:24, David Rowley <dgrowleyml@gmail.com> wrote:
> > I think it's safer to just disable the optimisation when there are
> > multiple window clauses.  Multiple matching clauses are merged
> > already, so it's perfectly valid to have multiple window functions,
> > it's just they must share the same window clause.  I don't think
> > that's terrible as with the major use case that I have in mind for
> > this, the window function is only added to limit the number of rows.
> > In most cases I can imagine, there'd be no reason to have an
> > additional window function with different frame options.
>
> I've not looked into the feasibility of it, but I had a thought that
> we could just accumulate all the run-conditions in a new field in the
> PlannerInfo then just tag them onto the top-level WindowAgg when
> building the plan.
>
> I'm just not sure it would be any more useful than what the v3 patch
> is currently doing as intermediate WindowAggs would still need to
> process all rows.  I think it would only save the window function
> evaluation of the top-level WindowAgg for rows that don't match the
> run-condition.  All the supported window functions are quite cheap, so
> it's not a huge saving. I'd bet there would be example cases where it
> would be measurable though.

Another way of doing this that seems better is to make it so only the
top-level WindowAgg will stop processing when the run condition
becomes false.  Any intermediate WindowAggs must continue processing
tuples, but may skip evaluation of their WindowFuncs.

Doing things this way also allows us to handle cases where there is a
PARTITION BY clause, however, in this case, the top-level WindowAgg
must not stop processing and return NULL, instead, it can just act as
if it were an intermediate WindowAgg and just stop evaluating
WindowFuncs.  The top-level WindowAgg must continue processing the
tuples so that the other partitions are also processed.

I made the v4 patch do things this way and tested the performance of
it vs current master.  Test 1 and 2 have PARTITION BY clauses. There's
a small performance increase from not evaluating the row_number()
function once rn <= 2 is no longer true.

Test 3 shows the same speedup as the original patch where we just stop
processing any further tuples when the run condition is no longer true
and there is no PARTITION BY clause.

Setup:
create table xy (x int, y int);
insert into xy select x,y from generate_series(1,1000)x,
generate_Series(1,1000)y;
create index on xy(x,y);
vacuum analyze xy;

Test 1:

explain analyze select * from (select x,y,row_number() over (partition
by x order by y) rn from xy) as xy where rn <= 2;

Master:

Execution Time: 359.553 ms
Execution Time: 354.235 ms
Execution Time: 357.646 ms

v4 patch:

Execution Time: 346.641 ms
Execution Time: 337.131 ms
Execution Time: 336.531 ms

(5% faster)

Test 2:

explain analyze select * from (select x,y,row_number() over (partition
by x order by y) rn from xy) as xy where rn = 1;

Master:

Execution Time: 359.046 ms
Execution Time: 357.601 ms
Execution Time: 357.977 ms

v4 patch:

Execution Time: 336.540 ms
Execution Time: 337.024 ms
Execution Time: 342.706 ms

(5.7% faster)

Test 3:

explain analyze select * from (select x,y,row_number() over (order by
x,y) rn from xy) as xy where rn <= 2;

Master:

Execution Time: 362.322 ms
Execution Time: 348.812 ms
Execution Time: 349.471 ms

v4 patch:

Execution Time: 0.060 ms
Execution Time: 0.037 ms
Execution Time: 0.037 ms

(~8000x faster)

One thing which I'm not sure about with the patch is how I'm handling
the evaluation of the runcondition in nodeWindowAgg.c.  Instead of
having ExecQual() evaluate an OpExpr such as "row_number() over (...)
<= 10", I'm replacing the WindowFunc with the Var in the targetlist
that corresponds to the given WindowFunc.  This saves having to double
evaluate the WindowFunc. Instead, the value of the Var can be taken
directly from the slot.  I don't know of anywhere else we do things
quite like that.  The runcondition is slightly similar to HAVING
clauses, but HAVING clauses don't work this way.  Maybe they would
have if slots had existed back then. Or maybe it's a bad idea to set a
precedent that the targetlist Vars must be evaluated already.  Does
anyone have any thoughts on this part?

v4 patch attached.

David

Attachment

v4-0001-Teach-planner-and-executor-about-monotonic-window.patch

Re: Window Function "Run Conditions"

From

Andres Freund

Date:

29 March 2022, 22:16:15

Hi,

On 2022-03-29 15:11:52 +1300, David Rowley wrote:
> One thing which I'm not sure about with the patch is how I'm handling
> the evaluation of the runcondition in nodeWindowAgg.c.  Instead of
> having ExecQual() evaluate an OpExpr such as "row_number() over (...)
> <= 10", I'm replacing the WindowFunc with the Var in the targetlist
> that corresponds to the given WindowFunc.  This saves having to double
> evaluate the WindowFunc. Instead, the value of the Var can be taken
> directly from the slot.  I don't know of anywhere else we do things
> quite like that.  The runcondition is slightly similar to HAVING
> clauses, but HAVING clauses don't work this way.

Don't HAVING clauses actually work pretty similar? Yes, they don't have a Var,
but for expression evaluation purposes an Aggref is nearly the same as a plain
Var:

        EEO_CASE(EEOP_INNER_VAR)
        {
            int         attnum = op->d.var.attnum;

            /*
             * Since we already extracted all referenced columns from the
             * tuple with a FETCHSOME step, we can just grab the value
             * directly out of the slot's decomposed-data arrays.  But let's
             * have an Assert to check that that did happen.
             */
            Assert(attnum >= 0 && attnum < innerslot->tts_nvalid);
            *op->resvalue = innerslot->tts_values[attnum];
            *op->resnull = innerslot->tts_isnull[attnum];

            EEO_NEXT();
        }
vs
        EEO_CASE(EEOP_AGGREF)
        {
            /*
             * Returns a Datum whose value is the precomputed aggregate value
             * found in the given expression context.
             */
            int         aggno = op->d.aggref.aggno;

            Assert(econtext->ecxt_aggvalues != NULL);

            *op->resvalue = econtext->ecxt_aggvalues[aggno];
            *op->resnull = econtext->ecxt_aggnulls[aggno];

            EEO_NEXT();
        }

specifically we don't re-evaluate expressions?

This is afaics slightly cheaper than referencing a variable in a slot.

Greetings,

Andres Freund

Re: Window Function "Run Conditions"

From

David Rowley

Date:

05 April 2022, 00:04:18

Thanks for having a look at this.

On Wed, 30 Mar 2022 at 11:16, Andres Freund <andres@anarazel.de> wrote:
> On 2022-03-29 15:11:52 +1300, David Rowley wrote:
> > One thing which I'm not sure about with the patch is how I'm handling
> > the evaluation of the runcondition in nodeWindowAgg.c.  Instead of
> > having ExecQual() evaluate an OpExpr such as "row_number() over (...)
> > <= 10", I'm replacing the WindowFunc with the Var in the targetlist
> > that corresponds to the given WindowFunc.  This saves having to double
> > evaluate the WindowFunc. Instead, the value of the Var can be taken
> > directly from the slot.  I don't know of anywhere else we do things
> > quite like that.  The runcondition is slightly similar to HAVING
> > clauses, but HAVING clauses don't work this way.
>
> Don't HAVING clauses actually work pretty similar? Yes, they don't have a Var,
> but for expression evaluation purposes an Aggref is nearly the same as a plain
> Var:
>
>         EEO_CASE(EEOP_INNER_VAR)
>         {
>             int         attnum = op->d.var.attnum;
>
>             /*
>              * Since we already extracted all referenced columns from the
>              * tuple with a FETCHSOME step, we can just grab the value
>              * directly out of the slot's decomposed-data arrays.  But let's
>              * have an Assert to check that that did happen.
>              */
>             Assert(attnum >= 0 && attnum < innerslot->tts_nvalid);
>             *op->resvalue = innerslot->tts_values[attnum];
>             *op->resnull = innerslot->tts_isnull[attnum];
>
>             EEO_NEXT();
>         }
> vs
>         EEO_CASE(EEOP_AGGREF)
>         {
>             /*
>              * Returns a Datum whose value is the precomputed aggregate value
>              * found in the given expression context.
>              */
>             int         aggno = op->d.aggref.aggno;
>
>             Assert(econtext->ecxt_aggvalues != NULL);
>
>             *op->resvalue = econtext->ecxt_aggvalues[aggno];
>             *op->resnull = econtext->ecxt_aggnulls[aggno];
>
>             EEO_NEXT();
>         }
>
> specifically we don't re-evaluate expressions?

Thanks for highlighting the similarities. I'm feeling better about the
choice now.

I've made another pass over the patch and updated a few comments and
made a small code change to delay the initialisation of a variable.

I'm pretty happy with this now. If anyone wants to have a look at
this, can they do so or let me know they're going to within the next
24 hours.  Otherwise I plan to move into commit mode with it.

> This is afaics slightly cheaper than referencing a variable in a slot.

I guess you must mean cheaper because it means there will be no
EEOP_*_FETCHSOME step?  Otherwise it seems a fairly similar amount of
work.

David

Attachment

v5-0001-Teach-planner-and-executor-about-monotonic-window.patch

Re: Window Function "Run Conditions"

From

Andy Fan

Date:

05 April 2022, 02:35:31

I'm pretty happy with this now. If anyone wants to have a look at
this, can they do so or let me know they're going to within the next
24 hours. Otherwise I plan to move into commit mode with it.

I just came to the office today to double check this patch. I probably can

finish it very soon. But if you are willing to commit it sooner, I am totally

fine with it.

Best Regards

Andy Fan

Re: Window Function "Run Conditions"

From

Andres Freund

Date:

05 April 2022, 03:09:57

On 2022-04-05 12:04:18 +1200, David Rowley wrote:
> > This is afaics slightly cheaper than referencing a variable in a slot.
> 
> I guess you must mean cheaper because it means there will be no
> EEOP_*_FETCHSOME step?  Otherwise it seems a fairly similar amount of
> work.

That, and slightly fewer indirections for accessing values IIRC.

Re: Window Function "Run Conditions"

From

Andy Fan

Date:

05 April 2022, 07:38:44

Hi David:

I just came to the office today to double check this patch. I probably can
finish it very soon.

I would share my current review result first and more review is still in progress.

There is a lot of amazing stuff there but I'd save the simple +1 and just share

something I'm not fully understand now. I just focused on the execution part and

only 1 WindowAgg node situation right now.

1. We can do more on PASSTHROUGH, we just bypass the window function

currently, but IIUC we can ignore all of the following tuples in current partition

once we go into this mode. patch 0001 shows what I mean.

--- without patch 0001, we need 1653 ms for the below query, with the patch 0001,

--- we need 629ms. This is not a serious performance comparison since I

--- build software with -O0 and --enable_cassert. but it can show some improvement.

postgres=# explain analyze select * from (select x,y,row_number() over (partition
by x order by y) rn from xy) as xy where rn < 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on xy (cost=0.42..55980.43 rows=5000 width=16) (actual time=0.072..1653.631 rows=1000 loops=1)
Filter: (xy.rn = 1)
Rows Removed by Filter: 999000
-> WindowAgg (cost=0.42..43480.43 rows=1000000 width=16) (actual time=0.069..1494.553 rows=1000000 loops=1)
Run Condition: (row_number() OVER (?) < 2)
-> Index Only Scan using xy_x_y_idx on xy xy_1 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.047..330.283 rows=1000000 loops=1)
Heap Fetches: 0
Planning Time: 0.240 ms
Execution Time: 1653.913 ms
(9 rows)

postgres=# explain analyze select * from (select x,y,row_number() over (partition
by x order by y) rn from xy) as xy where rn < 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on xy (cost=0.42..55980.43 rows=5000 width=16) (actual time=0.103..629.428 rows=1000 loops=1)
Filter: (xy.rn < 2)
Rows Removed by Filter: 1000
-> WindowAgg (cost=0.42..43480.43 rows=1000000 width=16) (actual time=0.101..628.821 rows=2000 loops=1)
Run Condition: (row_number() OVER (?) < 2)
-> Index Only Scan using xy_x_y_idx on xy xy_1 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.063..281.715 rows=1000000 loops=1)
Heap Fetches: 0
Planning Time: 1.119 ms
Execution Time: 629.781 ms
(9 rows)

Time: 633.241 ms

2. the "Rows Removed by Filter: 1000" is strange to me for the above example.

Subquery Scan on xy (cost=0.42..55980.43 rows=5000 width=16) (actual time=0.103..629.428 rows=1000 loops=1)
Filter: (xy.rn < 2)
Rows Removed by Filter: 1000

The root cause is even ExecQual(winstate->runcondition, econtext) return false, we
still return the slot to the upper node. A simple hack can avoid it.

3. With the changes in 2, I think we can avoid the subquery node totally for the above query.

4. If all the above are correct, looks the enum WindowAggStatus addition is not a

must since we can do what WINDOWAGG_PASSTHROUGH does just when we find it is, like
patch 3 shows. (I leave WINDOWAGG_DONE only, but it can be replaced with
previous all_done field).

Finally, Thanks for the patch, it is a good material to study the knowledge

in this area.

Best Regards

Andy Fan

Attachment

Re: Window Function "Run Conditions"

From

Andy Fan

Date:

05 April 2022, 07:40:29

The root cause is even ExecQual(winstate->runcondition, econtext) return false, we
still return the slot to the upper node. A simple hack can avoid it.

Forget to say 0002 shows what I mean.

Best Regards

Andy Fan

Re: Window Function "Run Conditions"

From

David Rowley

Date:

05 April 2022, 11:49:15

On Tue, 5 Apr 2022 at 19:38, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> 1. We can do more on PASSTHROUGH, we just bypass the window function
> currently,  but IIUC we can ignore all of the following tuples in current partition
> once we go into this mode.  patch 0001 shows what I mean.

Yeah, there is more performance to be had than even what you've done
there.  There's no reason really for spool_tuples() to do
tuplestore_puttupleslot() when we're not in run mode.

The attached should give slightly more performance.  I'm unsure if
there's more that can be done for window aggregates, i.e.
eval_windowaggregates()

I'll consider the idea about doing all the filtering in
nodeWindowAgg.c. For now I made find_window_run_conditions() keep the
qual so that it's still filtered in the subquery level when there is a
PARTITION BY clause. Probably the best way would be to make
nodeWindowAgg.c just loop with a for(;;) loop. I'll need to give it
more thought. I'll do that in the morning.

David

Attachment

windowagg_hacks.patch

Re: Window Function "Run Conditions"

From

Andy Fan

Date:

05 April 2022, 12:59:10

On Tue, Apr 5, 2022 at 7:49 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 5 Apr 2022 at 19:38, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> 1. We can do more on PASSTHROUGH, we just bypass the window function
> currently, but IIUC we can ignore all of the following tuples in current partition
> once we go into this mode. patch 0001 shows what I mean.

Yeah, there is more performance to be had than even what you've done
there. There's no reason really for spool_tuples() to do
tuplestore_puttupleslot() when we're not in run mode.

Yeah, this is a great idea.

The attached should give slightly more performance. I'm unsure if
there's more that can be done for window aggregates, i.e.
eval_windowaggregates()

I'll consider the idea about doing all the filtering in
nodeWindowAgg.c. For now I made find_window_run_conditions() keep the
qual so that it's still filtered in the subquery level when there is a
PARTITION BY clause. Probably the best way would be to make

nodeWindowAgg.c just loop with a for(;;) loop. I'll need to give it
more thought. I'll do that in the morning.

I just finished the planner part review and thought about the multi activeWindows

cases, I think passthrough mode should be still needed but just for multi

activeWindow cases, In the passthrough mode, we can not discard the tuples

in the same partition. Just that PARTITION BY clause should not be the requirement

for passthrough mode and we can do such optimization. We can discuss

more after your final decision.

And I would suggest the below fastpath for this feature.

@@ -2535,7 +2535,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
* if it happens to reference a window function. If so then
* it might be useful to use for the WindowAgg's runCondition.
*/
- if (check_and_push_window_quals(subquery, rte, rti, clause))
+ if (!subquery->hasWindowFuncs || check_and_push_window_quals(subquery, rte, rti, clause))
{
/*
* It's not a suitable window run condition qual or it is,

Best Regards

Andy Fan

Re: Window Function "Run Conditions"

From

David Rowley

Date:

07 April 2022, 02:36:37

On Wed, 6 Apr 2022 at 00:59, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> On Tue, Apr 5, 2022 at 7:49 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> Yeah, there is more performance to be had than even what you've done
>> there.  There's no reason really for spool_tuples() to do
>> tuplestore_puttupleslot() when we're not in run mode.
>
>
> Yeah, this is a great idea.

I've attached an updated patch that does most of what you mentioned.
To make this work I had to add another state to the WindowAggStatus.
This new state is what the top-level WindowAgg will move into when
there's a PARTITION BY clause and the run condition becomes false.
The new state is named WINDOWAGG_PASSTHROUGH_STRICT, which does all
that WINDOWAGG_PASSTHROUGH does plus skips tuplestoring tuples during
the spool.  We must still spool those tuples when we're not the
top-level WindowAgg so that we can send those out to any calling
WindowAgg nodes. They'll need those so they return the correct result.

This means that for intermediate WindowAgg nodes, when the
runcondition becomes false, we only skip evaluation of WindowFuncs.
WindowAgg nodes above us cannot reference these, so there's no need to
evaluate them, plus, if there's a run condition then these tuples will
be filtered out in the final WindowAgg node.

For the top-level WindowAgg node, when the run condition becomes false
we can save quite a bit more work. If there's no PARTITION BY clause,
then we're done. Just return NULL.  When there is a PARTITION BY
clause we move into WINDOWAGG_PASSTHROUGH_STRICT which allows us to
skip both the evaluation of WindowFuncs and also allows us to consume
tuples from our outer plan until we get a tuple belonging to another
partition.  No need to tuplestore these tuples as they're being
filtered out.

Since intermediate WindowAggs cannot filter tuples, all the filtering
must occur in the top-level WindowAgg.  This cannot be done by way of
the run condition as the run condition is special as when it becomes
false, we don't check again to see if it's become true.  A sort node
between the WindowAggs can change the tuple order (i.e previously
monotonic values may no longer be monotonic) so it's only valid to
evaluate the run condition that's meant for the WindowAgg node it was
intended for.  To filter out the tuples that don't match the run
condition from intermediate WindowAggs in the top-level WindowAgg,
what I've done is introduced quals for WindowAgg nodes.  This means
that we can now see Filter in EXPLAIN For WindowAgg and "Rows Removed
by Filter".

Why didn't I just do the filtering in the outer query like was
happening before?  The problem is that when we push the quals down
into the subquery, we don't yet have knowledge of which order that the
WindowAggs will be evaluated in.  Only run conditions from
intermediate WindowAggs will ever make it into the Filter, and we
don't know which one the top-level WindowAgg will be until later in
planning. To do the filtering in the outer query we'd need to push
quals back out the subquery again. It seems to me to be easier and
better to filter them out lower down in the plan.

Since the top-level WindowAgg node can now filter tuples, the executor
node had to be given a for(;;) loop so that it goes around again for
another tuple after it filters a tuple out.

I've also updated the commit message which I think I've made quite
clear about what we optimise and how it's done.

> And I would suggest the below fastpath for this feature.
> -                               if (check_and_push_window_quals(subquery, rte, rti, clause))
> +                               if (!subquery->hasWindowFuncs || check_and_push_window_quals(subquery, rte, rti,
clause))

Good idea. Thanks!

David

Attachment

v6-0001-Teach-planner-and-executor-about-monotonic-window.patch

Re: Window Function "Run Conditions"

From

Zhihong Yu

Date:

07 April 2022, 03:45:56

On Wed, Apr 6, 2022 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 6 Apr 2022 at 00:59, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> On Tue, Apr 5, 2022 at 7:49 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> Yeah, there is more performance to be had than even what you've done
>> there. There's no reason really for spool_tuples() to do
>> tuplestore_puttupleslot() when we're not in run mode.
>
>
> Yeah, this is a great idea.

I've attached an updated patch that does most of what you mentioned.
To make this work I had to add another state to the WindowAggStatus.
This new state is what the top-level WindowAgg will move into when
there's a PARTITION BY clause and the run condition becomes false.
The new state is named WINDOWAGG_PASSTHROUGH_STRICT, which does all
that WINDOWAGG_PASSTHROUGH does plus skips tuplestoring tuples during
the spool. We must still spool those tuples when we're not the
top-level WindowAgg so that we can send those out to any calling
WindowAgg nodes. They'll need those so they return the correct result.

This means that for intermediate WindowAgg nodes, when the
runcondition becomes false, we only skip evaluation of WindowFuncs.
WindowAgg nodes above us cannot reference these, so there's no need to
evaluate them, plus, if there's a run condition then these tuples will
be filtered out in the final WindowAgg node.

For the top-level WindowAgg node, when the run condition becomes false
we can save quite a bit more work. If there's no PARTITION BY clause,
then we're done. Just return NULL. When there is a PARTITION BY
clause we move into WINDOWAGG_PASSTHROUGH_STRICT which allows us to
skip both the evaluation of WindowFuncs and also allows us to consume
tuples from our outer plan until we get a tuple belonging to another
partition. No need to tuplestore these tuples as they're being
filtered out.

Since intermediate WindowAggs cannot filter tuples, all the filtering
must occur in the top-level WindowAgg. This cannot be done by way of
the run condition as the run condition is special as when it becomes
false, we don't check again to see if it's become true. A sort node
between the WindowAggs can change the tuple order (i.e previously
monotonic values may no longer be monotonic) so it's only valid to
evaluate the run condition that's meant for the WindowAgg node it was
intended for. To filter out the tuples that don't match the run
condition from intermediate WindowAggs in the top-level WindowAgg,
what I've done is introduced quals for WindowAgg nodes. This means
that we can now see Filter in EXPLAIN For WindowAgg and "Rows Removed
by Filter".

Why didn't I just do the filtering in the outer query like was
happening before? The problem is that when we push the quals down
into the subquery, we don't yet have knowledge of which order that the
WindowAggs will be evaluated in. Only run conditions from
intermediate WindowAggs will ever make it into the Filter, and we
don't know which one the top-level WindowAgg will be until later in
planning. To do the filtering in the outer query we'd need to push
quals back out the subquery again. It seems to me to be easier and
better to filter them out lower down in the plan.

Since the top-level WindowAgg node can now filter tuples, the executor
node had to be given a for(;;) loop so that it goes around again for
another tuple after it filters a tuple out.

I've also updated the commit message which I think I've made quite
clear about what we optimise and how it's done.

> And I would suggest the below fastpath for this feature.
> - if (check_and_push_window_quals(subquery, rte, rti, clause))
> + if (!subquery->hasWindowFuncs || check_and_push_window_quals(subquery, rte, rti, clause))

Good idea. Thanks!

David

Hi,

+ * We must keep the original qual in place if there is a
+ * PARTITION BY clause as the top-level WindowAgg remains in
+ * pass-through mode and does nothing to filter out unwanted
+ * tuples.
+ */

+ *keep_original = false;

The comment talks about keeping original qual but the assignment uses the value false.

Maybe the comment can be rephrased so that it matches the assignment.

Cheers

Re: Window Function "Run Conditions"

From

David Rowley

Date:

07 April 2022, 07:01:13

On Thu, 7 Apr 2022 at 15:41, Zhihong Yu <zyu@yugabyte.com> wrote:
> +                * We must keep the original qual in place if there is a
> +                * PARTITION BY clause as the top-level WindowAgg remains in
> +                * pass-through mode and does nothing to filter out unwanted
> +                * tuples.
> +                */
> +               *keep_original = false;
>
> The comment talks about keeping original qual but the assignment uses the value false.
> Maybe the comment can be rephrased so that it matches the assignment.

Thanks. I've just removed that comment locally now. You're right, it
was out of date.

David

Re: Window Function "Run Conditions"

From

David Rowley

Date:

07 April 2022, 14:11:06

On Thu, 7 Apr 2022 at 19:01, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 7 Apr 2022 at 15:41, Zhihong Yu <zyu@yugabyte.com> wrote:
> > +                * We must keep the original qual in place if there is a
> > +                * PARTITION BY clause as the top-level WindowAgg remains in
> > +                * pass-through mode and does nothing to filter out unwanted
> > +                * tuples.
> > +                */
> > +               *keep_original = false;
> >
> > The comment talks about keeping original qual but the assignment uses the value false.
> > Maybe the comment can be rephrased so that it matches the assignment.
>
> Thanks. I've just removed that comment locally now. You're right, it
> was out of date.

I've attached the updated patch with the fixed comment and a few other
comments reworded slightly.

I've also pgindented the patch.

Barring any objection, I'm planning to push this one in around 10 hours time.

David

Attachment

v7-0001-Teach-planner-and-executor-about-monotonic-window.patch

Re: Window Function "Run Conditions"

From

Zhihong Yu

Date:

07 April 2022, 22:08:10

On Thu, Apr 7, 2022 at 7:11 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 7 Apr 2022 at 19:01, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 7 Apr 2022 at 15:41, Zhihong Yu <zyu@yugabyte.com> wrote:
> > + * We must keep the original qual in place if there is a
> > + * PARTITION BY clause as the top-level WindowAgg remains in
> > + * pass-through mode and does nothing to filter out unwanted
> > + * tuples.
> > + */
> > + *keep_original = false;
> >
> > The comment talks about keeping original qual but the assignment uses the value false.
> > Maybe the comment can be rephrased so that it matches the assignment.
>
> Thanks. I've just removed that comment locally now. You're right, it
> was out of date.

I've attached the updated patch with the fixed comment and a few other
comments reworded slightly.

I've also pgindented the patch.

Barring any objection, I'm planning to push this one in around 10 hours time.

David

Hi,

+ WINDOWAGG_PASSTHROUGH_STRICT /* Pass-through plus don't store new

+ * tuples during spool */

I think the comment in code is illustrative:

+ * STRICT pass-through mode is required for the top window
+ * when there is a PARTITION BY clause. Otherwise we must
+ * ensure we store tuples that don't match the

+ * runcondition so they're available to WindowAggs above.

If you think the above is too long where WINDOWAGG_PASSTHROUGH_STRICT is defined, maybe point to the longer version so that people can find that more easily.

Cheers

Re: Window Function "Run Conditions"

From

David Rowley

Date:

07 April 2022, 22:40:37

On Fri, 8 Apr 2022 at 02:11, David Rowley <dgrowleyml@gmail.com> wrote:
> Barring any objection, I'm planning to push this one in around 10 hours time.

Pushed. 9d9c02ccd

Thank you all for the reviews.

David