Re: Bug in row_number() optimization - Mailing list pgsql-hackers

From David Rowley
Subject Re: Bug in row_number() optimization
Date
Msg-id CAApHDvrv_1KBjGWsvoFO1dHke57c8bkQUVwSHevkivrf2KjaYg@mail.gmail.com
Whole thread Raw
In response to Re: Bug in row_number() optimization  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Bug in row_number() optimization
List pgsql-hackers
On Fri, 25 Nov 2022 at 16:00, Richard Guo <guofenglinux@gmail.com> wrote:
> Verified the problem is fixed with this patch.  I'm not familiar with
> the WindowAgg execution codes.  As far as I understand, this patch works
> because we set ecxt_aggnulls to true, making it a NULL value.  And the
> top-level WindowAgg node's "Filter" is strict so that it can filter out
> all the tuples that don't match the intermediate WindowAgg node's run
> condition.  So I find the comments about "WindowAggs above us cannot
> reference the result of another WindowAgg" confusing.  But maybe I'm
> missing something.

There are two different pass-through modes that the WindowAgg can move
into when it detects that the run condition is no longer true:

1) WINDOWAGG_PASSTHROUGH and
2) WINDOWAGG_PASSTHROUGH_STRICT

#2 is used when the WindowAgg is the top-level one in this query
level. Remember we'll need multiple WindowAgg nodes when there are
multiple different windows to evaluate.  The reason that we need #1 is
that if there are multiple WindowAggs, then the top-level one (or just
any WindowAgg above it) might need all the rows from a lower-level
WindowAgg.  For example:

SELECT * FROM (SELECT row_number() over(order by id) rn, sum(qty) over
(order by date) qty from t) t where rn <= 10;

if the "order by id" window is evaluated first, we can't stop
outputting rows when rn <= 10 is no longer true as the "order by date"
window might need those.  In this case, once rn <= 10 is no longer
true, the WindowAgg for that window would go into
WINDOWAGG_PASSTHROUGH. This means we can stop window func evaluation
on any additional rows.  The final query will never see rn==11, so we
don't need to generate that.

The problem is that once the "order by id" window stops evaluating the
window funcs, if the window result is byref, then we leave junk in the
aggregate output columns.  Since we continue to output rows from that
WindowAgg for the top-level "order by date" window, we don't want to
form tuples with free'd memory.

Since nothing in the query will ever seen rn==11 and beyond, there's
no need to put anything in that part of the output tuple. We can just
make it an SQL NULL.

What I mean by "WindowAggs above us cannot reference the result of
another WindowAgg" is that the evaluation of sum(qty) over (order by
date) can't see the "rn" column. SQL does not allow it. If it did,
that would have to look something like:

SELECT * FROM (SELECT SUM(row_number() over (order by id)) over (order
by date) qty from t); -- not valid SQL

WINDOWAGG_PASSTHROUGH_STRICT not only does not evaluate window funcs,
it also does not even bother to store tuples in the tuple store. In
this case there's no higher-level WindowAgg that will need these
tuples, so we can just read our sub-node until we find the next
partition, or stop when there's no PARTITION BY clause.

Just thinking of the patch a bit more, what I wrote ends up
continually zeroing the values and marking the columns as NULL. Likely
we can just do this once when we do: winstate->status =
WINDOWAGG_PASSTHROUGH;  I'll test that out and make sure it works.

David



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Patch: Global Unique Index
Next
From: Dilip Kumar
Date:
Subject: Re: Add sub-transaction overflow status in pg_stat_activity