Wrong results with subquery pullup and grouping sets - Mailing list pgsql-hackers

From Richard Guo
Subject Wrong results with subquery pullup and grouping sets
Date
Msg-id CAMbWs4-meSahaanKskpBn0KKxdHAXC1_EJCVWHxEodqirrGJnw@mail.gmail.com
Whole thread Raw
Responses Re: Wrong results with subquery pullup and grouping sets
Re: Wrong results with subquery pullup and grouping sets
List pgsql-hackers
While working on the expansion of virtual generated columns, Dean
encountered $subject in [1], which can be reproduced using the query
below.

create table t (a int);
insert into t values (1);

# select a, b
  from (select a, a as b from t) ss
  group by grouping sets(a, b)
  having b = 1;
 a | b
---+---
 1 |
(1 row)

Note that the having clause filters out the wrong row.

In 90947674f, we fixed a similar issue by wrapping subquery outputs
that are non-var expressions in PlaceHolderVars.  This prevents
const-simplification from merging them into the surrounding expression
after subquery pullup and resulting in something that won't match the
grouping set expression in setrefs.c.

It seems that we still have loose ends with that fix.

This issue illustrates that if the subquery's target list contains two
or more identical Var expressions, we can also fail to match the Var
expression to the expected grouping set expression.  This is not
related to const-simplification, but rather to how we match
expressions to lower target items in setrefs.c.

For sort/group expressions, we use ressortgroupref matching, which
works well.  For other expressions, we primarily rely on comparing the
expressions to determine if they are the same.  This is why, in the
query above, the Var 'b' in the HAVING clause matches the first target
entry of the subquery, rather than the expected second one.

You might wonder why this issue wasn't fixed by the commit that
introduced a dummy RTE representing the output of the grouping step.
I think that commit prevents one expression that matches the grouping
item from being const-folded or preprocessed, but it doesn't prevent
setrefs.c from matching the expression to some other identical ones.

It seems to me that simple Var expressions in a subquery's target list
also need to retain their separate identity in order to match grouping
set columns after subquery pullup, not just non-var expressions.
Alternatively, is there a way to teach setrefs.c to match an
expression to the expected one when there are multiple identical
expressions?

[1] https://postgr.es/m/CAEZATCWsKqCtZ=ud26-gGV3zHt-hjS4OKG43GCBhSaYUWyfKiw@mail.gmail.com

Thanks
Richard



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: Improve CRC32C performance on SSE4.2
Next
From: Michael Paquier
Date:
Subject: Add regression test checking combinations of (object,backend_type,context) in pg_stat_io