Thread: BUG #17495: Regression in 15beta1 when filtering subquery including row_number window function

The following bug has been logged on the website:

Bug reference:      17495
Logged by:          Jeremy Evans
Email address:      jeremyevans0@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system:   OpenBSD 7.1/amd64
Description:

From testing with 15beta1, I think there is a regression in filtering a
subquery using the result of the row_number window function.  Here is the
code:

CREATE TEMPORARY TABLE artists (id integer PRIMARY KEY, name text);
CREATE TEMPORARY TABLE albums (id integer PRIMARY KEY, name text, artist_id
integer REFERENCES artists);
CREATE TEMPORARY TABLE tags (id integer  PRIMARY KEY, name text);
CREATE TEMPORARY TABLE albums_tags (album_id integer REFERENCES albums,
tag_id integer REFERENCES tags);

INSERT INTO artists (id, name) VALUES (1, 'Ar'), (2, 'Ar2');
INSERT INTO albums (id, name, artist_id) VALUES (1, 'Al', 1), (2, 'Al2',
2);
INSERT INTO tags (id, name) VALUES (1, 'T'), (2, 'U'), (3, 'V'), (4,
'T2');
INSERT INTO albums_tags (album_id, tag_id) VALUES (1, 1), (1, 2), (1, 3),
(2, 4);

SELECT albums.artist_id AS b, tags.id AS c, row_number() OVER (PARTITION BY
albums.artist_id ORDER BY tags.name) AS x_sequel_row_number_x FROM tags
INNER JOIN albums_tags ON (albums_tags.tag_id = tags.id) INNER JOIN albums
ON (albums.id = albums_tags.album_id);

-- Regression in 15beta1, includes (1, 3) even though x_sequel_row_number_x
= 3 for that row
SELECT b, c FROM (SELECT albums.artist_id AS b, tags.id AS c, row_number()
OVER (PARTITION BY albums.artist_id ORDER BY tags.name) AS
x_sequel_row_number_x FROM tags INNER JOIN albums_tags ON
(albums_tags.tag_id = tags.id) INNER JOIN albums ON (albums.id =
albums_tags.album_id)) AS t1 WHERE (x_sequel_row_number_x <= 2);

Results From PostgreSQL 8.4-14:

 b | c | x_sequel_row_number_x 
---+---+-----------------------
 1 | 1 |                     1
 1 | 2 |                     2
 1 | 3 |                     3
 2 | 4 |                     1
(4 rows)

 b | c 
---+---
 1 | 1
 1 | 2
 2 | 4
(3 rows)

Results on PostgreSQL 15beta1:

 b | c | x_sequel_row_number_x 
---+---+-----------------------
 1 | 1 |                     1
 1 | 2 |                     2
 1 | 3 |                     3
 2 | 4 |                     1
(4 rows)

 b | c 
---+---
 1 | 1
 1 | 2
 1 | 3
 2 | 4
(4 rows)

This was discovered by the tests for Sequel, a Ruby database access
library.

Thanks,
Jeremy


On Tue, May 24, 2022 at 04:17:35PM +0000, PG Bug reporting form wrote:
> From testing with 15beta1, I think there is a regression in filtering a
> subquery using the result of the row_number window function.  Here is the
> code:
>
> This was discovered by the tests for Sequel, a Ruby database access
> library.

Inconsistency reproduced here, thanks for the detailed report and the
test case!  I have added an open item:
https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items
--
Michael

Attachment
On Wed, 25 May 2022 at 12:51, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 24, 2022 at 04:17:35PM +0000, PG Bug reporting form wrote:
> > From testing with 15beta1, I think there is a regression in filtering a
> > subquery using the result of the row_number window function.  Here is the
> > code:
> >
> > This was discovered by the tests for Sequel, a Ruby database access
> > library.
>
> Inconsistency reproduced here, thanks for the detailed report and the
> test case!  I have added an open item:
> https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items

Thanks for adding. I've put myself as the owner of this and will look
at it soon.

Jeremy, Thanks for the report.

David




On Wed, May 25, 2022 at 10:17 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 25 May 2022 at 12:51, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 24, 2022 at 04:17:35PM +0000, PG Bug reporting form wrote:
> > From testing with 15beta1, I think there is a regression in filtering a
> > subquery using the result of the row_number window function.  Here is the
> > code:
> >
> > This was discovered by the tests for Sequel, a Ruby database access
> > library.
>
> Inconsistency reproduced here, thanks for the detailed report and the
> test case!  I have added an open item:
> https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items

Thanks for adding. I've put myself as the owner of this and will look
at it soon.

This should be introduced in by 9d9c02cc. The rel of the subquery loses
its baserestrictinfo, which is (x_sequel_row_number_x <= 2), when
generating SubqueryScan access paths for it.  As a result, the
WindowFunc target entry is removed as an unused targetlist item.

Thanks
Richard

On Wed, May 25, 2022 at 10:35 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, May 25, 2022 at 10:17 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 25 May 2022 at 12:51, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 24, 2022 at 04:17:35PM +0000, PG Bug reporting form wrote:
> > From testing with 15beta1, I think there is a regression in filtering a
> > subquery using the result of the row_number window function.  Here is the
> > code:
> >
> > This was discovered by the tests for Sequel, a Ruby database access
> > library.
>
> Inconsistency reproduced here, thanks for the detailed report and the
> test case!  I have added an open item:
> https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items

Thanks for adding. I've put myself as the owner of this and will look
at it soon.

This should be introduced in by 9d9c02cc. The rel of the subquery loses
its baserestrictinfo, which is (x_sequel_row_number_x <= 2), when
generating SubqueryScan access paths for it.  As a result, the
WindowFunc target entry is removed as an unused targetlist item.

To be more detailed, set_subquery_pathlist believes the restriction
clause of the subquery's rel is useful to use for the WindowAgg's run
condition, and the run condition will handle all of the required
filtering. So the restriction clause is removed from the
rel->baserestrictinfo. Then when we try to remove subquery's unused
targetlist items, we find the WindowFunc target entry is not needed
either in the reltarget, or in the restriction clauses. So we proceed to
remove it from the subquery's targetList. And that makes
grouping_planner think there are no active Windows in the subquery. As a
result, no WindowAgg node would be generated in the final plan.

Thanks
Richard 
On Wed, 25 May 2022 at 14:35, Richard Guo <guofenglinux@gmail.com> wrote:
> This should be introduced in by 9d9c02cc. The rel of the subquery loses
> its baserestrictinfo, which is (x_sequel_row_number_x <= 2), when
> generating SubqueryScan access paths for it.  As a result, the
> WindowFunc target entry is removed as an unused targetlist item.

Thanks for looking into that.

FWIW, a simplified test case is:

explain select x from (select x,row_number() over (order by x) rn from
generate_Series(1,100) x) gs where gs.rn < 10;

I think the correct fix is to change remove_unused_subquery_outputs()
so it also checks the WindowClause runConditions in addition to the
rel targetlist and baserestrictinfo.

I played around with the attached (rather horrible) patch. The problem
I have with the patch is that we can't simply call pull_varattnos() on
the WindowClause.runCondition. This is because the runCondition
OpExprs contain WindowFunc references rather than Vars which reference
one of those in the targetlist.  To make this work I had to trawl
through the targetlist of the subquery and look for an entry which is
equal to the runCondition's WindowFunc reference.  This is pretty
horrible as we've no way to know that we've found the right one.

The following query shows the problem:

explain verbose select x from (select x,row_number() over (order by x)
as dummy, row_number() over (order by x) rn from
generate_Series(1,100) x) gs where gs.rn < 10;

Looking at the output of the WindowAgg shows that the "dummy" column
was kept and "rn" was replaced with the NULL Const.  This does not
seem to lead to any actual problems, but it does set off some alarm
bells.

  ->  WindowAgg  (cost=6.82..8.57 rows=100 width=20)
        Output: x.x, row_number() OVER (?), NULL::bigint


Maybe a better fix is to add a new Bitmapset field to WindowClause and
have find_window_run_conditions() record the attno in that field when
it appends the new runCondition to the runCondition field.
remove_unused_subquery_outputs() can just bms_add_members that field
to attrs_used.  This just means having to add a field to WindowClause
post-beta.  Is that going to be a problem?

David

Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> Maybe a better fix is to add a new Bitmapset field to WindowClause and
> have find_window_run_conditions() record the attno in that field when
> it appends the new runCondition to the runCondition field.
> remove_unused_subquery_outputs() can just bms_add_members that field
> to attrs_used.  This just means having to add a field to WindowClause
> post-beta.  Is that going to be a problem?

It'd mean a forced initdb, which is not great, but unless 0ff20288e
gets reverted there'd be no additional impact on beta testers.

A bigger problem with what you describe is that I don't really think
the planner should be mucking with the input parse tree like that.
Can't we retain this information somewhere else instead, in storage
associated with the PlannerInfo?

            regards, tom lane




On Thu, May 26, 2022 at 6:33 AM David Rowley <dgrowleyml@gmail.com> wrote:
I think the correct fix is to change remove_unused_subquery_outputs()
so it also checks the WindowClause runConditions in addition to the
rel targetlist and baserestrictinfo.

Yes, concur with that. Yesterday I played around with a similar patch
which leverages find_window_functions() to locate all the WindowFunc
nodes in runCondition expressions and then check if each WindowFunc
target entry of the subquery is contained there.
 
Maybe a better fix is to add a new Bitmapset field to WindowClause and
have find_window_run_conditions() record the attno in that field when
it appends the new runCondition to the runCondition field.
remove_unused_subquery_outputs() can just bms_add_members that field
to attrs_used.  This just means having to add a field to WindowClause
post-beta.  Is that going to be a problem?

This is even better if we can keep the attnos somewhere for the window
quals that are pushed down to runConditions. Like Tom proposed, maybe
PlannerInfo is a more proper place.

Thanks
Richard
On Thu, 26 May 2022 at 10:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Maybe a better fix is to add a new Bitmapset field to WindowClause and
> > have find_window_run_conditions() record the attno in that field when
> > it appends the new runCondition to the runCondition field.
> > remove_unused_subquery_outputs() can just bms_add_members that field
> > to attrs_used.  This just means having to add a field to WindowClause
> > post-beta.  Is that going to be a problem?
>
> It'd mean a forced initdb, which is not great, but unless 0ff20288e
> gets reverted there'd be no additional impact on beta testers.

I'm partially surprised that we'd consider rolling that back to what
it was before that commit if it was reverted. I didn't know that was
the policy. Likely it's less painful for hackers to deal with than all
beta testers.

> A bigger problem with what you describe is that I don't really think
> the planner should be mucking with the input parse tree like that.
> Can't we retain this information somewhere else instead, in storage
> associated with the PlannerInfo?

Yeah, I'm definitely onboard with not messing around with the parse
when we don't have to. I just can't quite see how this could be done
for this case.

The problem is that the qual pushdown stuff all happens in
set_subquery_pathlist() before we call subquery_planner() for the
subquery.  We don't yet have a PlannerInfo made for the subquery when
we call check_and_push_window_quals().  We don't really have any other
means to communicate to subquery_planner() what the run conditions are
for the given Query object. Plus, we *do* really need to know what the
runConditions are before we call subquery_planner() so that those
conditions can be properly tagged onto WindowAggPaths. I don't really
think it would be right to pluck those from the PlannerInfo when we
later call create_plan(). That wouldn't leave us any opportunity to do
any costing related stuff with run conditions if we decide to do that
later.

The best way I can think to do that is to have some supplementary
struct that we can pass to subquery_planner() like QueryInfo that
contains additional info we've learned about the query. Having
something like that might help us get away from other places where we
modify the parse, such as adding inheritance RTEs for partitioned
tables. However, that does not feel like a topic for PG15.

David




On Thu, May 26, 2022 at 11:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 26 May 2022 at 10:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Maybe a better fix is to add a new Bitmapset field to WindowClause and
> > have find_window_run_conditions() record the attno in that field when
> > it appends the new runCondition to the runCondition field.
> > remove_unused_subquery_outputs() can just bms_add_members that field
> > to attrs_used.  This just means having to add a field to WindowClause
> > post-beta.  Is that going to be a problem?
>
> It'd mean a forced initdb, which is not great, but unless 0ff20288e
> gets reverted there'd be no additional impact on beta testers.

I'm partially surprised that we'd consider rolling that back to what
it was before that commit if it was reverted. I didn't know that was
the policy. Likely it's less painful for hackers to deal with than all
beta testers.

> A bigger problem with what you describe is that I don't really think
> the planner should be mucking with the input parse tree like that.
> Can't we retain this information somewhere else instead, in storage
> associated with the PlannerInfo?

Yeah, I'm definitely onboard with not messing around with the parse
when we don't have to. I just can't quite see how this could be done
for this case.

The problem is that the qual pushdown stuff all happens in
set_subquery_pathlist() before we call subquery_planner() for the
subquery.  We don't yet have a PlannerInfo made for the subquery when
we call check_and_push_window_quals().  We don't really have any other
means to communicate to subquery_planner() what the run conditions are
for the given Query object. Plus, we *do* really need to know what the
runConditions are before we call subquery_planner() so that those
conditions can be properly tagged onto WindowAggPaths. I don't really
think it would be right to pluck those from the PlannerInfo when we
later call create_plan(). That wouldn't leave us any opportunity to do
any costing related stuff with run conditions if we decide to do that
later.

The remove_unused_subquery_outputs() also happens before we call
subquery_planner() for the subquery. Cann't we just store the attnos
used in window quals that are pushed down to runConditions in the
PlannerInfo of the upper query?

Thanks
Richard 

On Thu, May 26, 2022 at 12:13 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, May 26, 2022 at 11:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
The problem is that the qual pushdown stuff all happens in
set_subquery_pathlist() before we call subquery_planner() for the
subquery.  We don't yet have a PlannerInfo made for the subquery when
we call check_and_push_window_quals().  We don't really have any other
means to communicate to subquery_planner() what the run conditions are
for the given Query object. Plus, we *do* really need to know what the
runConditions are before we call subquery_planner() so that those
conditions can be properly tagged onto WindowAggPaths. I don't really
think it would be right to pluck those from the PlannerInfo when we
later call create_plan(). That wouldn't leave us any opportunity to do
any costing related stuff with run conditions if we decide to do that
later.

The remove_unused_subquery_outputs() also happens before we call
subquery_planner() for the subquery. Cann't we just store the attnos
used in window quals that are pushed down to runConditions in the
PlannerInfo of the upper query?

Ah, your point is to move runConditions also out of WindowClause and
store them in PlannerInfo, right? That's indeed not an easy job.

Thanks
Richard 
On Thu, 26 May 2022 at 16:34, Richard Guo <guofenglinux@gmail.com> wrote:
> On Thu, May 26, 2022 at 12:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>
>>
>> On Thu, May 26, 2022 at 11:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> The problem is that the qual pushdown stuff all happens in
>>> set_subquery_pathlist() before we call subquery_planner() for the
>>> subquery.  We don't yet have a PlannerInfo made for the subquery when
>>> we call check_and_push_window_quals().  We don't really have any other
>>> means to communicate to subquery_planner() what the run conditions are
>>> for the given Query object. Plus, we *do* really need to know what the
>>> runConditions are before we call subquery_planner() so that those
>>> conditions can be properly tagged onto WindowAggPaths. I don't really
>>> think it would be right to pluck those from the PlannerInfo when we
>>> later call create_plan(). That wouldn't leave us any opportunity to do
>>> any costing related stuff with run conditions if we decide to do that
>>> later.
>>
>>
>> The remove_unused_subquery_outputs() also happens before we call
>> subquery_planner() for the subquery. Cann't we just store the attnos
>> used in window quals that are pushed down to runConditions in the
>> PlannerInfo of the upper query?
>
>
> Ah, your point is to move runConditions also out of WindowClause and
> store them in PlannerInfo, right? That's indeed not an easy job.

Yeah, I was aiming for a way to have it so the planner didn't
vandalise WindowClause.

I've attached a patch which fixes the bug and does not require any
further modifications to WindowClause.

David

Attachment

On Thu, May 26, 2022 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 26 May 2022 at 16:34, Richard Guo <guofenglinux@gmail.com> wrote:
> On Thu, May 26, 2022 at 12:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>
>>
>> On Thu, May 26, 2022 at 11:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> The problem is that the qual pushdown stuff all happens in
>>> set_subquery_pathlist() before we call subquery_planner() for the
>>> subquery.  We don't yet have a PlannerInfo made for the subquery when
>>> we call check_and_push_window_quals().  We don't really have any other
>>> means to communicate to subquery_planner() what the run conditions are
>>> for the given Query object. Plus, we *do* really need to know what the
>>> runConditions are before we call subquery_planner() so that those
>>> conditions can be properly tagged onto WindowAggPaths. I don't really
>>> think it would be right to pluck those from the PlannerInfo when we
>>> later call create_plan(). That wouldn't leave us any opportunity to do
>>> any costing related stuff with run conditions if we decide to do that
>>> later.
>>
>>
>> The remove_unused_subquery_outputs() also happens before we call
>> subquery_planner() for the subquery. Cann't we just store the attnos
>> used in window quals that are pushed down to runConditions in the
>> PlannerInfo of the upper query?
>
>
> Ah, your point is to move runConditions also out of WindowClause and
> store them in PlannerInfo, right? That's indeed not an easy job.

Yeah, I was aiming for a way to have it so the planner didn't
vandalise WindowClause.

I've attached a patch which fixes the bug and does not require any
further modifications to WindowClause.

The patch looks sane to me, expect that line 122 in the patch introduces
a whitespace-only line.

BTW, surprised that param 'attno' in find_window_run_conditions() is not
used before.

So after this fix, do we plan to further move runConditions out of
WindowClause so that we don't need to modify the input parse tree?

Thanks
Richard

On Thu, May 26, 2022 at 4:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, May 26, 2022 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
Yeah, I was aiming for a way to have it so the planner didn't
vandalise WindowClause.

I've attached a patch which fixes the bug and does not require any
further modifications to WindowClause.

The patch looks sane to me, expect that line 122 in the patch introduces
a whitespace-only line.

Another review comment:

- * extra_used_attrs can passed as non-NULL to mark any columns that we should
+ * extra_used_attrs can be passed as non-NULL to mark any columns that we should

Thanks
Richard 
On Thu, 26 May 2022 at 20:44, Richard Guo <guofenglinux@gmail.com> wrote:
> Another review comment:
>
> - * extra_used_attrs can passed as non-NULL to mark any columns that we should
> + * extra_used_attrs can be passed as non-NULL to mark any columns that we should

Thanks for the reviews.

Since the latest patch didn't require changing the WindowClause
struct, I saw no reason to delay fixing the bug. I've now pushed the
patch after doing a bit more work on the comments.

I also added the missing initialization of runCondition in
transformWindowDefinitions().

I am keen to devise some way of not having this feature scribble
anything on the WindowClause struct.  I just think we need to come up
with some supplementary data structure that's not "Query" to write
these sorts of things down on. I'm thinking that's PG16 material now.

David