Thread: BUG #17502: View based on window functions returns wrong results when queried
BUG #17502: View based on window functions returns wrong results when queried
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17502 Logged by: Daniel Farkaš Email address: daniel.farkas@datoris.com PostgreSQL version: 10.10 Operating system: Linux Description: Hey, Please be gentle, I've never been in contact with Postgres developers. In short, I've created a view, which has rather sketchy window functions, but it gives me results I need. When I do select * on it, it gives me what I expect. One of the columns has five distinct values. But when I do group by on that column, it gives me only one of the values. When I drop the view and create materialized view, all is good, I get all five values. My guess is that some parts of the inner select are affecting outer, view's select, which is not something I would expect. My current Postgres is PostgreSQL 10.10 on x86_64-pc-linux-musl, compiled by gcc (Alpine 8.3.0) 8.3.0, 64-bit. If you think this is worth investigating further, I will try composing a simpler example, and test it in a more recent Postgres version. Maybe it's a known limitation I'm not aware of. Let me know what you think. Daniel
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Magnus Hagander
Date:
On Sun, May 29, 2022 at 4:20 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 17502
Logged by: Daniel Farkaš
Email address: daniel.farkas@datoris.com
PostgreSQL version: 10.10
Operating system: Linux
Description:
Hey,
Please be gentle, I've never been in contact with Postgres developers.
In short, I've created a view, which has rather sketchy window functions,
but it gives me results I need.
When I do select * on it, it gives me what I expect. One of the columns has
five distinct values.
But when I do group by on that column, it gives me only one of the values.
When I drop the view and create materialized view, all is good, I get all
five values.
My guess is that some parts of the inner select are affecting outer, view's
select, which is not something I would expect.
My current Postgres is PostgreSQL 10.10 on x86_64-pc-linux-musl, compiled by
gcc (Alpine 8.3.0) 8.3.0, 64-bit.
If you think this is worth investigating further, I will try composing a
simpler example, and test it in a more recent Postgres version.
Maybe it's a known limitation I'm not aware of.
Let me know what you think.
Please see if you can reproduce this on a current version of PostgreSQL 10, which is 10.21. Version 10.10 is lacking more than two and a half years worth of bugfixes.
If you can then yes, try to put together a simpler example, because it certainly does not sound like correct behavior.
//Magnus
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Daniel Farkaš
Date:
Hey,
SELECT metric_name FROM analytics_view;
Thanks for the response.
I was able to replicate in 10.21.
Here is how:
SELECT version();
PostgreSQL 10.21 on x86_64-pc-linux-musl, compiled by gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219, 64-bit
CREATE TABLE analytics_table (dimension_1 VARCHAR, dimension_2 VARCHAR, metric_1 VARCHAR, metric_2 VARCHAR);
INSERT INTO analytics_table VALUES ('a1', 'b1', 'c1', 'd1'), ('a1', 'b2', 'c2', 'd2');
SELECT * FROM analytics_table;
dimension_1|dimension_2|metric_1|metric_2|
-----------+-----------+--------+--------+
a1 |b1 |c1 |d1 |
a1 |b2 |c2 |d2 |
INSERT INTO analytics_table VALUES ('a1', 'b1', 'c1', 'd1'), ('a1', 'b2', 'c2', 'd2');
SELECT * FROM analytics_table;
dimension_1|dimension_2|metric_1|metric_2|
-----------+-----------+--------+--------+
a1 |b1 |c1 |d1 |
a1 |b2 |c2 |d2 |
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1, dimension_2) AS rownum,
row_number() OVER(partition by dimension_1, dimension_2) AS metricnum
FROM analytics_table ORDER BY dimension_1, dimension_2;
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1, dimension_2) AS rownum,
row_number() OVER(partition by dimension_1, dimension_2) AS metricnum
FROM analytics_table ORDER BY dimension_1, dimension_2;
dimension_1|dimension_2|rownum|metricnum|
-----------+-----------+------+------+
a1 |b1 | 1| 1|
a1 |b1 | 1| 2|
a1 |b2 | 2| 1|
a1 |b2 | 2| 2|
-----------+-----------+------+------+
a1 |b1 | 1| 1|
a1 |b1 | 1| 2|
a1 |b2 | 2| 1|
a1 |b2 | 2| 2|
CREATE VIEW analytics_view AS
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1, dimension_2) AS rownum,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN 'metric_1' ELSE 'metric_2' END AS metric_name,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN metric_1 ELSE metric_2 END AS metric_value
FROM analytics_table ORDER BY dimension_1, dimension_2;
SELECT * FROM analytics_view;
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1, dimension_2) AS rownum,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN 'metric_1' ELSE 'metric_2' END AS metric_name,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN metric_1 ELSE metric_2 END AS metric_value
FROM analytics_table ORDER BY dimension_1, dimension_2;
SELECT * FROM analytics_view;
dimension_1|dimension_2|rownum|metric_name|metric_value|
-----------+-----------+------+-----------+------------+
a1 |b1 | 1|metric_1 |c1 |
a1 |b1 | 1|metric_2 |d1 |
a1 |b2 | 2|metric_1 |c2 |
a1 |b2 | 2|metric_2 |d2 |
-----------+-----------+------+-----------+------------+
a1 |b1 | 1|metric_1 |c1 |
a1 |b1 | 1|metric_2 |d1 |
a1 |b2 | 2|metric_1 |c2 |
a1 |b2 | 2|metric_2 |d2 |
SELECT metric_name FROM analytics_view;
metric_name|
-----------+
metric_1 |
metric_1 |
metric_1 |
metric_1 |
CREATE MATERIALIZED VIEW analytics_materialized_view AS
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1, dimension_2) AS rownum,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN 'metric_1' ELSE 'metric_2' END AS metric_name,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN metric_1 ELSE metric_2 END AS metric_value
FROM analytics_table ORDER BY dimension_1, dimension_2;
-----------+
metric_1 |
metric_1 |
metric_1 |
metric_1 |
CREATE MATERIALIZED VIEW analytics_materialized_view AS
SELECT
dimension_1,
dimension_2,
row_number() OVER(partition by generate_series(1,2) order by dimension_1, dimension_2) AS rownum,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN 'metric_1' ELSE 'metric_2' END AS metric_name,
CASE WHEN row_number() OVER(partition by dimension_1, dimension_2) = 1 THEN metric_1 ELSE metric_2 END AS metric_value
FROM analytics_table ORDER BY dimension_1, dimension_2;
SELECT * FROM analytics_materialized_view;
dimension_1|dimension_2|rownum|metric_name|metric_value|
-----------+-----------+------+-----------+------------+
a1 |b1 | 1|metric_1 |c1 |
a1 |b1 | 1|metric_2 |d1 |
a1 |b2 | 2|metric_1 |c2 |
a1 |b2 | 2|metric_2 |d2 |
-----------+-----------+------+-----------+------------+
a1 |b1 | 1|metric_1 |c1 |
a1 |b1 | 1|metric_2 |d1 |
a1 |b2 | 2|metric_1 |c2 |
a1 |b2 | 2|metric_2 |d2 |
SELECT metric_name FROM analytics_materialized_view;
metric_name|
-----------+
metric_1 |
metric_2 |
metric_1 |
metric_2 |
-----------+
metric_1 |
metric_2 |
metric_1 |
metric_2 |
For some analytics purposes I needed to transform one wide table with multiple metrics into a metric_name/metric_value pairs stored as separate rows. That's why I did all this. I guess the reason and the method are not important, the fact that the view gives different results does look like a bug.
Cheers,
Daniel Farkas
Datoris
On Sun, May 29, 2022 at 4:22 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sun, May 29, 2022 at 4:20 PM PG Bug reporting form <noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 17502
Logged by: Daniel Farkaš
Email address: daniel.farkas@datoris.com
PostgreSQL version: 10.10
Operating system: Linux
Description:
Hey,
Please be gentle, I've never been in contact with Postgres developers.
In short, I've created a view, which has rather sketchy window functions,
but it gives me results I need.
When I do select * on it, it gives me what I expect. One of the columns has
five distinct values.
But when I do group by on that column, it gives me only one of the values.
When I drop the view and create materialized view, all is good, I get all
five values.
My guess is that some parts of the inner select are affecting outer, view's
select, which is not something I would expect.
My current Postgres is PostgreSQL 10.10 on x86_64-pc-linux-musl, compiled by
gcc (Alpine 8.3.0) 8.3.0, 64-bit.
If you think this is worth investigating further, I will try composing a
simpler example, and test it in a more recent Postgres version.
Maybe it's a known limitation I'm not aware of.
Let me know what you think.Please see if you can reproduce this on a current version of PostgreSQL 10, which is 10.21. Version 10.10 is lacking more than two and a half years worth of bugfixes.If you can then yes, try to put together a simpler example, because it certainly does not sound like correct behavior.//Magnus
Re: BUG #17502: View based on window functions returns wrong results when queried
From
David Rowley
Date:
On Mon, 30 May 2022 at 06:58, Daniel Farkaš <daniel.farkas@datoris.com> wrote: > SELECT metric_name FROM analytics_view; > metric_name| > -----------+ > metric_1 | > metric_1 | > metric_1 | > metric_1 | This is certainly a bug. Thanks for reporting it. The problem seems to be down to the fact that remove_unused_subquery_outputs() does not check if the to-be-removed target entry references WindowClauses which contain set-returning functions. We only seem to check if the target entry itself is an SRF, per: /* * If it contains a set-returning function, we can't remove it since * that could change the number of rows returned by the subquery. */ if (subquery->hasTargetSRFs && expression_returns_set(texpr)) continue; This ensures queries such as the following don't have SRF columns removed: postgres=# explain verbose select a from (select a,generate_series(1,2) as b from t) t; QUERY PLAN ---------------------------------------------------------------------------- Subquery Scan on t (cost=0.00..131.13 rows=5100 width=4) Output: t.a -> ProjectSet (cost=0.00..80.13 rows=5100 width=8) Output: t_1.a, generate_series(1, 2) -> Seq Scan on public.t t_1 (cost=0.00..35.50 rows=2550 width=4) Output: t_1.a (6 rows) I'm a little bit uncertain if the correct fix is to have expression_returns_set() look deeper into WindowFuncs to check if the WindowClause that the function belongs to has any SRFs in the PARTITION BY / ORDER BY clause. Unfortunately, doing that means having to pass the PlannerInfo to expression_returns_set(). I don't quite see how that could be made to work in the back branches. The other fix would be to make remove_unused_subquery_outputs() pull out all WindowFuncs from the texpr and check if any of the WindowClauses have SRFs in the PARTITION BY / ORDER BY clause. I'll need to look a bit deeper into the usages of expression_returns_set() to know which of the fixes is correct. There might be some other bugs lurking due to expression_returns_set() not checking WindowClauses for SRFs. David
Re: BUG #17502: View based on window functions returns wrong results when queried
From
David Rowley
Date:
On Mon, 30 May 2022 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote: > I'm a little bit uncertain if the correct fix is to have > expression_returns_set() look deeper into WindowFuncs to check if the > WindowClause that the function belongs to has any SRFs in the > PARTITION BY / ORDER BY clause. Unfortunately, doing that means > having to pass the PlannerInfo to expression_returns_set(). I don't > quite see how that could be made to work in the back branches. There does seem to be other weird anomalies as a result of expression_returns_set() not looking for SRFs in the WindowFunc PARTITION BY / ORDER BY clause. Looking at the usage of expression_returns_set() in make_sort_input_target(), I'd have expected the following 2 queries to act in the same way: setup: create table ab (a int, b int); insert into ab values(1,1),(1,2),(2,1),(2,2); postgres=# select distinct on (a) a,b, row_number() over (order by generate_Series(1,2)) from ab order by a,b; a | b | row_number ---+---+------------ 1 | 1 | 1 2 | 1 | 7 (2 rows) postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab order by a,b; a | b | generate_series ---+---+----------------- 1 | 1 | 1 1 | 1 | 2 2 | 1 | 1 2 | 1 | 2 (4 rows) The fact that those are different is starting to make me think that expression_returns_set() should be looking into WindowFuncs to see if the WindowClause has SRFs rather than adding special code to remove_unused_subquery_outputs() in order to fix the bug being reported on this thread. However, I'm not sure the latter of the above two queries result makes any sense given that our documentation [1] claims: "PostgreSQL's behavior for a set-returning function in a query's select list is almost exactly the same as if the set-returning function had been written in a LATERAL FROM-clause item instead. For example," the "almost" is later defined as: "It would be exactly the same, except that in this specific example, the planner could choose to put g on the outside of the nested-loop join, since g has no actual lateral dependency on tab." which only indicates that the order of the results may differ, not the actual results themselves. You can see the LATERAL version only returns 2 rows: postgres=# select distinct on (a) a, b, g.s from ab, lateral generate_Series(1,2) g(s) order by a,b; a | b | s ---+---+--- 1 | 1 | 1 2 | 1 | 1 (2 rows) The following is also pretty strange. Why should adding the SRF column to the ORDER BY change the number of output rows? postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab order by a,b,3 desc; a | b | generate_series ---+---+----------------- 1 | 1 | 2 2 | 1 | 2 (2 rows) postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab order by a,b; a | b | generate_series ---+---+----------------- 1 | 1 | 1 1 | 1 | 2 2 | 1 | 1 2 | 1 | 2 (4 rows) The LATERAL version of this will return 2 rows regardless of what the ORDER BY says: postgres=# select distinct on (a) a, b, g.s from ab, lateral generate_Series(1,2) g(s) order by a,b,g.s DESC; a | b | s ---+---+--- 1 | 1 | 2 2 | 1 | 2 (2 rows) postgres=# select distinct on (a) a, b, g.s from ab, lateral generate_Series(1,2) g(s) order by a,b; a | b | s ---+---+--- 1 | 1 | 1 2 | 1 | 1 (2 rows) In addition to what the documentation claims about SRFs in the target list being the same as LATERAL, maybe the following should give us some guidance on if the non-lateral SRF queries above should return 2 or 4 rows: postgres=# select distinct generate_Series(0,1)/2; ?column? ---------- 0 (1 row) In the above, the ProjectSet occurs before the distinctification. That would translate into the 2-row version of the DISTINCT ON query above being correct. I think that roughly translates into changes being required in both make_sort_input_target() and expression_returns_set(). Does anyone have any other opinions? David [1] https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes: > The problem seems to be down to the fact that > remove_unused_subquery_outputs() does not check if the to-be-removed > target entry references WindowClauses which contain set-returning > functions. I was sort of wondering why we allow SRFs in this context in the first place. The results don't seem terribly well-defined to me. In particular, a WindowFunc invocation is not supposed to change the number of rows in the query result, and yet this one is doing so. regards, tom lane
Re: BUG #17502: View based on window functions returns wrong results when queried
From
David Rowley
Date:
On Mon, 30 May 2022 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > The problem seems to be down to the fact that > > remove_unused_subquery_outputs() does not check if the to-be-removed > > target entry references WindowClauses which contain set-returning > > functions. > > I was sort of wondering why we allow SRFs in this context in the > first place. The results don't seem terribly well-defined to me. > In particular, a WindowFunc invocation is not supposed to change the > number of rows in the query result, and yet this one is doing so. That would certainly be an easier fix for the reported problem. Do you think it would fly to add such a restriction in the backbranches? David
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes: > On Mon, 30 May 2022 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I was sort of wondering why we allow SRFs in this context in the >> first place. The results don't seem terribly well-defined to me. > Do you think it would fly to add such a restriction in the backbranches? We could leave it alone in the back branches on the grounds that if you like the results you get, we shouldn't break it in a minor release. regards, tom lane
Re: BUG #17502: View based on window functions returns wrong results when queried
From
David Rowley
Date:
On Mon, 30 May 2022 at 14:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > On Mon, 30 May 2022 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I was sort of wondering why we allow SRFs in this context in the > >> first place. The results don't seem terribly well-defined to me. > > > Do you think it would fly to add such a restriction in the backbranches? > > We could leave it alone in the back branches on the grounds that if > you like the results you get, we shouldn't break it in a minor > release. I struggle to see how anyone would like their result correctness to depend on whether remove_unused_subquery_outputs() is able or unable to remove a column from the subquery. David
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes: > On Mon, 30 May 2022 at 14:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We could leave it alone in the back branches on the grounds that if >> you like the results you get, we shouldn't break it in a minor >> release. > I struggle to see how anyone would like their result correctness to > depend on whether remove_unused_subquery_outputs() is able or unable > to remove a column from the subquery. Indeed, it's unlikely that anybody would like these particular results. But perhaps somebody is running an application that does this and happens to not trip over any obviously-wrong case; for example, if one always selects all columns from this view, the issue is not apparent. I'm not necessarily against adding the prohibition in the back branches. However, if this has been wrong since 10.x (if not further back) then it seems like few people are tripping over the inconsistency. regards, tom lane
Re: BUG #17502: View based on window functions returns wrong results when queried
From
David Rowley
Date:
On Mon, 30 May 2022 at 15:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not necessarily against adding the prohibition in the back > branches. However, if this has been wrong since 10.x (if not > further back) then it seems like few people are tripping over > the inconsistency. Yeah, perhaps. But we have received a report from 1 person and I think it seems strange to adopt a policy that we require multiple bug reports for the same issue before we consider fixing. FWIW, just to demonstrate what a fix could look like, I've attached a patch. I'm not planning to commit it if you really think we shouldn't be fixing it. The patch also does nothing for the weirdness that I described in the DISTINCT ON case earlier in this thread. David
Attachment
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Richard Guo
Date:
On Mon, May 30, 2022 at 9:12 AM David Rowley <dgrowleyml@gmail.com> wrote:
The following is also pretty strange. Why should adding the SRF column
to the ORDER BY change the number of output rows?
postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b,3 desc;
a | b | generate_series
---+---+-----------------
1 | 1 | 2
2 | 1 | 2
(2 rows)
postgres=# select distinct on (a) a,b, generate_Series(1,2) from ab
order by a,b;
a | b | generate_series
---+---+-----------------
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)
This is indeed weird. This seems depends on at which plan level we add
ProjectSetPath. For the first query we insert ProjectSetPath at the
scan/join level, because the SRF is included in the ORDER BY, which
makes the SRF appear in the scan/join target. For the second query the
SRF is postponed to after the Sort.
Is this a bug we should fix?
Thanks
Richard
ProjectSetPath. For the first query we insert ProjectSetPath at the
scan/join level, because the SRF is included in the ORDER BY, which
makes the SRF appear in the scan/join target. For the second query the
SRF is postponed to after the Sort.
Is this a bug we should fix?
Thanks
Richard
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Ilya Anfimov
Date:
On Sun, May 29, 2022 at 10:18:59PM -0400, Tom Lane wrote: > David Rowley <dgrowleyml@gmail.com> writes: > > The problem seems to be down to the fact that > > remove_unused_subquery_outputs() does not check if the to-be-removed > > target entry references WindowClauses which contain set-returning > > functions. > > I was sort of wondering why we allow SRFs in this context in the > first place. The results don't seem terribly well-defined to me. > In particular, a WindowFunc invocation is not supposed to change the > number of rows in the query result, and yet this one is doing so. I would like to note, that some SRFs may return only one row in specific cases, like dblink() with the specific query. It may used much more often, than real multirow SRFs in PARTITION BY. While converting it to a LATERAL JOIN is not a lot of work -- breaking it in previous versions may affect a lot of people.
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes: > On Mon, May 30, 2022 at 9:12 AM David Rowley <dgrowleyml@gmail.com> wrote: >> The following is also pretty strange. Why should adding the SRF column >> to the ORDER BY change the number of output rows? > Is this a bug we should fix? This bug seems to have slipped off the radar screen, but it's still a bug. I continue to believe that the best fix is to disallow SRFs in window definitions, and present the trivial patch to do so. As discussed, I'm comfortable with leaving this alone in the back branches. regards, tom lane diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index ca14f06308..a13f28615b 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -2570,9 +2570,6 @@ check_srf_call_placement(ParseState *pstate, Node *last_srf, int location) break; case EXPR_KIND_WINDOW_PARTITION: case EXPR_KIND_WINDOW_ORDER: - /* okay, these are effectively GROUP BY/ORDER BY */ - pstate->p_hasTargetSRFs = true; - break; case EXPR_KIND_WINDOW_FRAME_RANGE: case EXPR_KIND_WINDOW_FRAME_ROWS: case EXPR_KIND_WINDOW_FRAME_GROUPS: diff --git a/src/test/regress/expected/tsrf.out b/src/test/regress/expected/tsrf.out index d47b5f6ec5..600652581e 100644 --- a/src/test/regress/expected/tsrf.out +++ b/src/test/regress/expected/tsrf.out @@ -265,7 +265,21 @@ ERROR: window function calls cannot contain set-returning function calls LINE 1: SELECT min(generate_series(1, 3)) OVER() FROM few; ^ HINT: You might be able to move the set-returning function into a LATERAL FROM item. --- SRFs are normally computed after window functions +--- ... nor in window definitions +SELECT sum(id) OVER (PARTITION BY generate_series(1, 3)) FROM few; +ERROR: set-returning functions are not allowed in window definitions +LINE 1: SELECT sum(id) OVER (PARTITION BY generate_series(1, 3)) FRO... + ^ +SELECT sum(id) OVER (ORDER BY generate_series(1, 3)) FROM few; +ERROR: set-returning functions are not allowed in window definitions +LINE 1: SELECT sum(id) OVER (ORDER BY generate_series(1, 3)) FROM fe... + ^ +SELECT sum(id) OVER (ROWS BETWEEN UNBOUNDED PRECEDING + AND generate_series(1, 3) FOLLOWING) FROM few; +ERROR: set-returning functions are not allowed in window definitions +LINE 2: AND generate_series(1, 3) FOLLOWING) FR... + ^ +-- SRFs are computed after window functions SELECT id,lag(id) OVER(), count(*) OVER(), generate_series(1,3) FROM few; id | lag | count | generate_series ----+-----+-------+----------------- @@ -280,15 +294,6 @@ SELECT id,lag(id) OVER(), count(*) OVER(), generate_series(1,3) FROM few; 3 | 2 | 3 | 3 (9 rows) --- unless referencing SRFs -SELECT SUM(count(*)) OVER(PARTITION BY generate_series(1,3) ORDER BY generate_series(1,3)), generate_series(1,3) g FROMfew GROUP BY g; - sum | g ------+--- - 3 | 1 - 3 | 2 - 3 | 3 -(3 rows) - -- sorting + grouping SELECT few.dataa, count(*), min(id), max(id), generate_series(1,3) FROM few GROUP BY few.dataa ORDER BY 5, 1; dataa | count | min | max | generate_series diff --git a/src/test/regress/sql/tsrf.sql b/src/test/regress/sql/tsrf.sql index 7c22529a0d..8442ba9e74 100644 --- a/src/test/regress/sql/tsrf.sql +++ b/src/test/regress/sql/tsrf.sql @@ -82,10 +82,14 @@ SELECT sum((3 = ANY(SELECT lag(x) over(order by x) -- SRFs are not allowed in window function arguments, either SELECT min(generate_series(1, 3)) OVER() FROM few; --- SRFs are normally computed after window functions +--- ... nor in window definitions +SELECT sum(id) OVER (PARTITION BY generate_series(1, 3)) FROM few; +SELECT sum(id) OVER (ORDER BY generate_series(1, 3)) FROM few; +SELECT sum(id) OVER (ROWS BETWEEN UNBOUNDED PRECEDING + AND generate_series(1, 3) FOLLOWING) FROM few; + +-- SRFs are computed after window functions SELECT id,lag(id) OVER(), count(*) OVER(), generate_series(1,3) FROM few; --- unless referencing SRFs -SELECT SUM(count(*)) OVER(PARTITION BY generate_series(1,3) ORDER BY generate_series(1,3)), generate_series(1,3) g FROMfew GROUP BY g; -- sorting + grouping SELECT few.dataa, count(*), min(id), max(id), generate_series(1,3) FROM few GROUP BY few.dataa ORDER BY 5, 1;
Re: BUG #17502: View based on window functions returns wrong results when queried
From
Richard Guo
Date:
On Tue, Jan 31, 2023 at 6:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This bug seems to have slipped off the radar screen, but it's still
a bug. I continue to believe that the best fix is to disallow SRFs
in window definitions, and present the trivial patch to do so.
This patch fixes the original issue reported by Daniel. Another issue
discussed in this thread is the weirdness when there are SRFs in the
targetlist of a query with DISTINCT ON clause, as described by David.
create table ab (a int, b int);
insert into ab values(1,1),(1,2),(2,1),(2,2);
# select distinct on (a) a, b, generate_series(1,2) g from ab order by a, b;
a | b | g
---+---+---
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)
# select distinct on (a) a, b, g from ab, lateral generate_series(1,2) as g order by a, b;
a | b | g
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)
According to the doc these two queries are supposed to be 'almost
exactly the same'. So it's weird to see they give different number of
output rows.
Should we also fix this issue? If so it seems we need some changes
about postponing SRFs in make_sort_input_target().
Thanks
Richard
discussed in this thread is the weirdness when there are SRFs in the
targetlist of a query with DISTINCT ON clause, as described by David.
create table ab (a int, b int);
insert into ab values(1,1),(1,2),(2,1),(2,2);
# select distinct on (a) a, b, generate_series(1,2) g from ab order by a, b;
a | b | g
---+---+---
1 | 1 | 1
1 | 1 | 2
2 | 1 | 1
2 | 1 | 2
(4 rows)
# select distinct on (a) a, b, g from ab, lateral generate_series(1,2) as g order by a, b;
a | b | g
---+---+---
1 | 1 | 1
2 | 1 | 1
(2 rows)
According to the doc these two queries are supposed to be 'almost
exactly the same'. So it's weird to see they give different number of
output rows.
Should we also fix this issue? If so it seems we need some changes
about postponing SRFs in make_sort_input_target().
Thanks
Richard
Re: BUG #17502: View based on window functions returns wrong results when queried
From
David Rowley
Date:
On Tue, 31 Jan 2023 at 11:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > This bug seems to have slipped off the radar screen, but it's still > a bug. I continue to believe that the best fix is to disallow SRFs > in window definitions, and present the trivial patch to do so. I don't really object to doing this, but I'm not convinced that someone else won't eventually come along complaining about us changing this. As far as I'm aware, the behaviour is not wrong today provided you always use the column so that remove_unused_subquery_outputs() does not get rid of it. If we disallow this, then maybe it's worth also giving ourselves a bit more flexibility to make it work correctly in the future should someone come along and convince us we shouldn't have disallowed this. Adjusting expression_returns_set() to pass it the relevant PlannerInfo for the Node might be a good idea. If we change our minds in the future then we could easily then modify expression_returns_set_walker() to have it check if the given WindowFunc's WindowClause has SRFs so that we don't remove the unused WindowFunc subquery columns in remove_unused_subquery_outputs(). I'd have probably gone with this as a master-only fix, but is quite likely you're thinking of something that I'm not. Certainly, the consistency of when SRFs are evaluated is pretty terrible so I understand the desire to disallow the WindowClause SRFs so we have one less thing to get wrong. David