Re: Bug in row_number() optimization - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Bug in row_number() optimization |
Date | |
Msg-id | CAApHDvrv_1KBjGWsvoFO1dHke57c8bkQUVwSHevkivrf2KjaYg@mail.gmail.com Whole thread Raw |
In response to | Re: Bug in row_number() optimization (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Bug in row_number() optimization
|
List | pgsql-hackers |
On Fri, 25 Nov 2022 at 16:00, Richard Guo <guofenglinux@gmail.com> wrote: > Verified the problem is fixed with this patch. I'm not familiar with > the WindowAgg execution codes. As far as I understand, this patch works > because we set ecxt_aggnulls to true, making it a NULL value. And the > top-level WindowAgg node's "Filter" is strict so that it can filter out > all the tuples that don't match the intermediate WindowAgg node's run > condition. So I find the comments about "WindowAggs above us cannot > reference the result of another WindowAgg" confusing. But maybe I'm > missing something. There are two different pass-through modes that the WindowAgg can move into when it detects that the run condition is no longer true: 1) WINDOWAGG_PASSTHROUGH and 2) WINDOWAGG_PASSTHROUGH_STRICT #2 is used when the WindowAgg is the top-level one in this query level. Remember we'll need multiple WindowAgg nodes when there are multiple different windows to evaluate. The reason that we need #1 is that if there are multiple WindowAggs, then the top-level one (or just any WindowAgg above it) might need all the rows from a lower-level WindowAgg. For example: SELECT * FROM (SELECT row_number() over(order by id) rn, sum(qty) over (order by date) qty from t) t where rn <= 10; if the "order by id" window is evaluated first, we can't stop outputting rows when rn <= 10 is no longer true as the "order by date" window might need those. In this case, once rn <= 10 is no longer true, the WindowAgg for that window would go into WINDOWAGG_PASSTHROUGH. This means we can stop window func evaluation on any additional rows. The final query will never see rn==11, so we don't need to generate that. The problem is that once the "order by id" window stops evaluating the window funcs, if the window result is byref, then we leave junk in the aggregate output columns. Since we continue to output rows from that WindowAgg for the top-level "order by date" window, we don't want to form tuples with free'd memory. Since nothing in the query will ever seen rn==11 and beyond, there's no need to put anything in that part of the output tuple. We can just make it an SQL NULL. What I mean by "WindowAggs above us cannot reference the result of another WindowAgg" is that the evaluation of sum(qty) over (order by date) can't see the "rn" column. SQL does not allow it. If it did, that would have to look something like: SELECT * FROM (SELECT SUM(row_number() over (order by id)) over (order by date) qty from t); -- not valid SQL WINDOWAGG_PASSTHROUGH_STRICT not only does not evaluate window funcs, it also does not even bother to store tuples in the tuple store. In this case there's no higher-level WindowAgg that will need these tuples, so we can just read our sub-node until we find the next partition, or stop when there's no PARTITION BY clause. Just thinking of the patch a bit more, what I wrote ends up continually zeroing the values and marking the columns as NULL. Likely we can just do this once when we do: winstate->status = WINDOWAGG_PASSTHROUGH; I'll test that out and make sure it works. David
pgsql-hackers by date: