Thread: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

PG Bug reporting form

Date:

19 August 2020, 13:31:49

The following bug has been logged on the website:

Bug reference:      16585
Logged by:          Paul Sivash
Email address:      pavelsivash@gmail.com
PostgreSQL version: 12.4
Operating system:   x86_64-pc-linux-gnu
Description:

Hello! There is a problem with filtering COALESCE field which has constant
from nested subselect with GROUPING SETS as first element. 

Example: 

WITH table1 AS (
     SELECT 2 AS city_id, 5 AS cnt
     UNION ALL
     SELECT 2 AS city_id, 1 AS cnt
     UNION ALL
     SELECT 3 AS city_id, 2 AS cnt
     UNION ALL
     SELECT 3 AS city_id, 7 AS cnt
    ), 
 
fin AS (
    SELECT 
        coalesce(country_id, city_id) AS location_id,
        total
    FROM (
        SELECT
            1 as country_id,
            city_id,
            sum(cnt) as total
        FROM table1 
        GROUP BY GROUPING SETS (1,2)
        ) base
    )
    
SELECT * 
FROM fin 
WHERE location_id = 1;

As you can see in the end I want to keep only rows with location_id = 1 but
the result gives me all available rows. This happens because Postgres sees
that I filter COALESCE field which has "country_id" as first element and
"country_id" is previously set as constant - 1. But the thing is that using
GROUPING SETS turns "country_id" to NULL in some rows and this behaviour is
wrong. 

When I change final filter to "location_id = 2" it returns 0 rows for the
same reason. 

Thank you in advance!

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

David Rowley

Date:

19 August 2020, 14:34:19

On Wed, 19 Aug 2020 at 23:12, PG Bug reporting form
<noreply@postgresql.org> wrote:
> As you can see in the end I want to keep only rows with location_id = 1 but
> the result gives me all available rows.

hmm yeah, certainly a bug.  On a very quick look, it looks like the
CTE inlining code is to blame as it works ok if the fin CTE is
materialized (as it would have been before 608b167f9). i.e:

WITH table1 AS (
     SELECT 2 AS city_id, 5 AS cnt
     UNION ALL
     SELECT 2 AS city_id, 1 AS cnt
     UNION ALL
     SELECT 3 AS city_id, 2 AS cnt
     UNION ALL
     SELECT 3 AS city_id, 7 AS cnt
        ),

fin AS MATERIALIZED (
        SELECT
            coalesce(country_id, city_id) AS location_id,
            total
        FROM (
                SELECT
                        1 as country_id,
                        city_id,
                        sum(cnt) as total
                FROM table1
                GROUP BY GROUPING SETS (1,2)
                ) base
        )
SELECT *
FROM fin
WHERE location_id = 1;

I see with the materialized version the CTE has a qual. This is the
qual that appears to go missing in the non-materialized version:

 CTE Scan on fin  (cost=0.28..0.39 rows=1 width=12)
   Filter: (location_id = 1)

David

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andrew Gierth

Date:

19 August 2020, 16:21:24

>>>>> "David" == David Rowley <dgrowleyml@gmail.com> writes:

 David> hmm yeah, certainly a bug. On a very quick look, it looks like
 David> the CTE inlining code

Nope. You can tell it's not that because rewriting it with no CTEs at
all does not eliminate the bug (and this way, it reproduces right back
to 9.5, oops):

select *
  from (select coalesce(country_id, city_id) AS location_id,
               total
          from (select 1 as country_id,
                       city_id,
                       sum(cnt) as total
                  from (values (2,5),(2,1),(3,2),(3,7)) as table1(city_id,cnt)
                 group by grouping sets (1,2)) base) fin
 where location_id=1;
 location_id | total 
-------------+-------
           1 |    15
           2 |     6
           3 |     9
(3 rows)

The problem here is that something is assuming that the country_id is
still constant 1 despite its participation in grouping sets rendering it
sometimes null.

Using a materialized CTE avoids the bug (at least partially) by hiding
the constant projection from the optimizer.

Most likely, that constant column needs to either be treated as not
constant, or something should be replacing it with a PHV - I'd have to
dig into the code a bit to see what's actually going wrong.

-- 
Andrew (irc:RhodiumToad)

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Tom Lane

Date:

21 August 2020, 00:25:27

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> The problem here is that something is assuming that the country_id is
> still constant 1 despite its participation in grouping sets rendering it
> sometimes null.

Yeah.  Your version of the query is initially simplified, by one level
of subquery pullup, into

select coalesce(country_id, city_id) AS location_id,
       total
from (select 1 as country_id,
             city_id,
             sum(cnt) as total
      from (values (2,5),(2,1),(3,2),(3,7)) as table1(city_id,cnt)
      group by grouping sets (1,2)) base
where coalesce(country_id, city_id) = 1;

We can't pull up the remaining subquery because it has GROUP BY.
But what we will try to do instead is to push down the outer
WHERE clause into the subquery (cf. set_subquery_pathlist and
subroutines).  That code sees no reason not to do so, so
it converts this into

select coalesce(country_id, city_id) AS location_id,
       total
from (select 1 as country_id,
             city_id,
             sum(cnt) as total
      from (values (2,5),(2,1),(3,2),(3,7)) as table1(city_id,cnt)
      group by grouping sets (1,2)
      having coalesce(1, city_id) = 1
     ) base;

and then const-folding proves the HAVING to be constant-true.

> Most likely, that constant column needs to either be treated as not
> constant, or something should be replacing it with a PHV - I'd have to
> dig into the code a bit to see what's actually going wrong.

PHVs don't save us here because those are only added when pulling up
a subquery, which is not what's happening.

As a stopgap measure, I think what we have to do is teach
check_output_expressions that subquery output columns are
unsafe to reference if they are not listed in all grouping
sets (do I have that condition right?).

The scheme I've been thinking about for clarifying the nullability
semantics of Vars might eventually provide a nicer answer for this,
but we haven't got it today.

            regards, tom lane

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andrew Gierth

Date:

21 August 2020, 00:51:32

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> As a stopgap measure, I think what we have to do is teach
 Tom> check_output_expressions that subquery output columns are unsafe
 Tom> to reference if they are not listed in all grouping sets (do I
 Tom> have that condition right?).

Unless I'm missing something, it should be safe to reference output
columns that are not mentioned in any grouping set, or which are
mentioned in all grouping sets (after all expansions); but unsafe to
reference columns mentioned in some grouping sets but not others (since
these will be forced to null in the output for the sets in which they
don't appear).

-- 
Andrew (irc:RhodiumToad)

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andy Fan

Date:

21 August 2020, 04:19:57

The scheme I've been thinking about for clarifying the nullability
semantics of Vars might eventually provide a nicer answer for this,
but we haven't got it today.

regards, tom lane

With this direction, how about maintaining a RelOptInfo->notnullattrs (BitmapSet *)

attrs, which might be accessed in a more efficient way? We should set it to nullattrs

if it exists in groupSet clause. I introduced this attribute in UniqueKey patch [1]

(patch 0001), but it is not maintained in baserelonly now. I think we can expand it

for other types of rel as well.

However both implementations can avoid the issue here, but still make it impossible

to push down an qual even if it references all the exprs in the groupset, am I right?

[1] https://www.postgresql.org/message-id/CAKU4AWrT%3Do0dEmdFCH3W6B%3DBeUS1%3DOU5N86FfAmKFUahSP48SA%40mail.gmail.com

Best Regards

Andy Fan

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andy Fan

Date:

21 August 2020, 05:08:42

On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> As a stopgap measure, I think what we have to do is teach
Tom> check_output_expressions that subquery output columns are unsafe
Tom> to reference if they are not listed in all grouping sets (do I
Tom> have that condition right?).

Unless I'm missing something, it should be safe to reference output
columns that are not mentioned in any grouping set,

I think such columns usually are aggregation expr, If we want to push down

a qual which reference to an aggregation expr, we have to push down

to having cause, However I am not sure such pushing down really helps.

or which are
mentioned in all grouping sets (after all expansions);

Best Regards

Andy Fan

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Tom Lane

Date:

21 August 2020, 05:44:12

Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk>
> wrote:
>> Unless I'm missing something, it should be safe to reference output
>> columns that are not mentioned in any grouping set,

> I think such columns usually are aggregation expr,  If we want to push down
> a qual which reference to an aggregation expr,  we have to push down
> to having cause, However I am not sure such pushing down really helps.

Well, they can either be aggregates, or functions of the grouping
columns.  You're right that there's not much we can do (today) with
restrictions on aggregate outputs, but there can be value in pushing
down restrictions on the other sort.

As an example, consider the regression database's tenk1 table, and
for argument's sake add

regression=# create index on tenk1 (abs(hundred));
CREATE INDEX

Then we can get

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by 1,2) ss where a
=42; 
                                     QUERY PLAN
------------------------------------------------------------------------------------
 HashAggregate  (cost=225.98..227.18 rows=96 width=20)
   Group Key: tenk1.hundred, tenk1.ten
   ->  Bitmap Heap Scan on tenk1  (cost=5.06..225.23 rows=100 width=8)
         Recheck Cond: (abs(hundred) = 42)
         ->  Bitmap Index Scan on tenk1_abs_idx  (cost=0.00..5.04 rows=100 width=0)
               Index Cond: (abs(hundred) = 42)
(6 rows)

which is a lot cheaper than the pure seqscan you get with no pushed-down
condition.

One thing that I find curious is that if I alter this example to use
grouping sets, say

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by grouping sets
(1,2))ss where a = 42; 
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=495.00..546.65 rows=2 width=20)
   Hash Key: tenk1.hundred
   Hash Key: tenk1.ten
   Filter: (abs(tenk1.hundred) = 42)
   ->  Seq Scan on tenk1  (cost=0.00..445.00 rows=10000 width=8)
(5 rows)

i.e. it's not seeing the abs() condition as pushable below the
aggregation.  I'm not quite sure if that's a necessary restriction
or a missed optimization.

            regards, tom lane

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andy Fan

Date:

21 August 2020, 06:50:10

On Fri, Aug 21, 2020 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk>
> wrote:
>> Unless I'm missing something, it should be safe to reference output
>> columns that are not mentioned in any grouping set,

> I think such columns usually are aggregation expr, If we want to push down
> a qual which reference to an aggregation expr, we have to push down
> to having cause, However I am not sure such pushing down really helps.

Well, they can either be aggregates, or functions of the grouping
columns. You're right that there's not much we can do (today) with
restrictions on aggregate outputs, but there can be value in pushing
down restrictions on the other sort.

As an example, consider the regression database's tenk1 table, and
for argument's sake add

regression=# create index on tenk1 (abs(hundred));
CREATE INDEX

Then we can get

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by 1,2) ss where a = 42;
QUERY PLAN
------------------------------------------------------------------------------------
HashAggregate (cost=225.98..227.18 rows=96 width=20)
Group Key: tenk1.hundred, tenk1.ten
-> Bitmap Heap Scan on tenk1 (cost=5.06..225.23 rows=100 width=8)
Recheck Cond: (abs(hundred) = 42)
-> Bitmap Index Scan on tenk1_abs_idx (cost=0.00..5.04 rows=100 width=0)
Index Cond: (abs(hundred) = 42)
(6 rows)

which is a lot cheaper than the pure seqscan you get with no pushed-down
condition.

One thing that I find curious is that if I alter this example to use
grouping sets, say

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by grouping sets (1,2)) ss where a = 42;
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=495.00..546.65 rows=2 width=20)
Hash Key: tenk1.hundred
Hash Key: tenk1.ten
Filter: (abs(tenk1.hundred) = 42)
-> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=8)
(5 rows)

i.e. it's not seeing the abs() condition as pushable below the
aggregation. I'm not quite sure if that's a necessary restriction
or a missed optimization.

regards, tom lane

Both of the queries can push down the qual "a = 42" to the subquery->havingQual

since we have group-by clause, this method unify the process for aggregation call

and non-aggregation expr. . so it become to

select .. from (select .. from tenk1 group .. having (abs(hundred) = 2);

later in the subquery_planner, we will try to pull the having clause to where clause.

then the Q2 failed to do so.

* In some cases we may want to transfer a HAVING clause into WHERE. We
* cannot do so if the HAVING clause contains aggregates (obviously) or
* volatile functions (since a HAVING clause is supposed to be executed
* only once per group). We also can't do this if there are any nonempty
* grouping sets; moving such a clause into WHERE would potentially change
* the results, if any referenced column isn't present in all the grouping
* sets. (If there are only empty grouping sets, then the HAVING clause
* must be degenerate as discussed below.)

I'm still trying to understand the comment, though.

Best Regards

Andy Fan

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andy Fan

Date:

21 August 2020, 07:15:06

On Fri, Aug 21, 2020 at 11:50 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Aug 21, 2020 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk>
> wrote:
>> Unless I'm missing something, it should be safe to reference output
>> columns that are not mentioned in any grouping set,

> I think such columns usually are aggregation expr, If we want to push down
> a qual which reference to an aggregation expr, we have to push down
> to having cause, However I am not sure such pushing down really helps.

Well, they can either be aggregates, or functions of the grouping
columns. You're right that there's not much we can do (today) with
restrictions on aggregate outputs, but there can be value in pushing
down restrictions on the other sort.

As an example, consider the regression database's tenk1 table, and
for argument's sake add

regression=# create index on tenk1 (abs(hundred));
CREATE INDEX

Then we can get

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by 1,2) ss where a = 42;
QUERY PLAN
------------------------------------------------------------------------------------
HashAggregate (cost=225.98..227.18 rows=96 width=20)
Group Key: tenk1.hundred, tenk1.ten
-> Bitmap Heap Scan on tenk1 (cost=5.06..225.23 rows=100 width=8)
Recheck Cond: (abs(hundred) = 42)
-> Bitmap Index Scan on tenk1_abs_idx (cost=0.00..5.04 rows=100 width=0)
Index Cond: (abs(hundred) = 42)
(6 rows)

which is a lot cheaper than the pure seqscan you get with no pushed-down
condition.

One thing that I find curious is that if I alter this example to use
grouping sets, say

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by grouping sets (1,2)) ss where a = 42;
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=495.00..546.65 rows=2 width=20)
Hash Key: tenk1.hundred
Hash Key: tenk1.ten
Filter: (abs(tenk1.hundred) = 42)
-> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=8)
(5 rows)

i.e. it's not seeing the abs() condition as pushable below the
aggregation. I'm not quite sure if that's a necessary restriction
or a missed optimization.

regards, tom lane

Both of the queries can push down the qual "a = 42" to the subquery->havingQual
since we have group-by clause, this method unify the process for aggregation call
and non-aggregation expr. . so it become to

select .. from (select .. from tenk1 group .. having (abs(hundred) = 2);

later in the subquery_planner, we will try to pull the having clause to where clause.
then the Q2 failed to do so.

/*
* In some cases we may want to transfer a HAVING clause into WHERE. We
* cannot do so if the HAVING clause contains aggregates (obviously) or
* volatile functions (since a HAVING clause is supposed to be executed
* only once per group). We also can't do this if there are any nonempty
* grouping sets; moving such a clause into WHERE would potentially change
* the results, if any referenced column isn't present in all the grouping
* sets. (If there are only empty grouping sets, then the HAVING clause
* must be degenerate as discussed below.)
*/

I'm still trying to understand the comment, though.

This should be a correct behavior, we should not push down in the Q2 case. Here is an example:

regression=# create table tgs(a int, b int);
CREATE TABLE
regression=# insert into tgs values(1, 1), (1, 2), (2, 2);
INSERT 0 3
regression=# select * from (select a, b, count(*) from tgs group by grouping sets((a), (b))) t where b = 1;
a | b | count
---+---+-------
| 1 | 1
(1 row)

regression=# select * from (select a, b, count(*) from tgs group by grouping sets((a), (b)) having b = 1) t;
a | b | count
---+---+-------
| 1 | 1
(1 row)

regression=# select * from (select a, b, count(*) from tgs where b = 1 group by grouping sets((a), (b)) ) t;
a | b | count
---+---+-------
1 | | 1
| 1 | 1

(2 rows)

At the same time, our optimizer is smart enough to handle the below case (only 1 set in group sets, which equals

group by).

regression=# explain select * from (select a, b, count(*) from tgs group by grouping sets((a, b)) ) t where b = 1;
QUERY PLAN
-----------------------------------------------------------------
GroupAggregate (cost=38.44..38.66 rows=11 width=16)
Group Key: tgs.a, tgs.b
-> Sort (cost=38.44..38.47 rows=11 width=8)
Sort Key: tgs.a
-> Seq Scan on tgs (cost=0.00..38.25 rows=11 width=8)
Filter: (b = 1)
(6 rows)

Best Regards

Andy Fan

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Andrew Gierth

Date:

21 August 2020, 16:04:50

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> One thing that I find curious is that if I alter this example to use
 Tom> grouping sets, say

 Tom> regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by grouping
sets(1,2)) ss where a = 42;
 
 Tom>                            QUERY PLAN                            
 Tom> -----------------------------------------------------------------
 Tom>  HashAggregate  (cost=495.00..546.65 rows=2 width=20)
 Tom>    Hash Key: tenk1.hundred
 Tom>    Hash Key: tenk1.ten
 Tom>    Filter: (abs(tenk1.hundred) = 42)
 Tom>    -> Seq Scan on tenk1  (cost=0.00..445.00 rows=10000 width=8)
 Tom> (5 rows)

 Tom> i.e. it's not seeing the abs() condition as pushable below the
 Tom> aggregation. I'm not quite sure if that's a necessary restriction
 Tom> or a missed optimization.

subquery_planner isn't transferring HAVING clauses to WHERE if that
would cross a nontrivial GROUPING SETS. It could in theory do so by
inspecting whether the referenced columns are in all grouping sets or
none, but currently the planner doesn't have any reason to compute that
intersection and it would add quite a bit of complexity to that specific
point in the code. (Without grouping sets, a HAVING clause is movable to
WHERE if it's non-volatile and has no aggregations, since that implies
it must evaluate to the same value for each row in any group.)

In this example, pushing the condition below the aggregate would be
wrong anyway, no?

-- 
Andrew (irc:RhodiumToad)

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

From

Tom Lane

Date:

21 August 2020, 23:49:07

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> subquery_planner isn't transferring HAVING clauses to WHERE if that
> would cross a nontrivial GROUPING SETS. It could in theory do so by
> inspecting whether the referenced columns are in all grouping sets or
> none, but currently the planner doesn't have any reason to compute that
> intersection and it would add quite a bit of complexity to that specific
> point in the code.

Hm.  I see that computing that set is not really trivial.  I'd supposed
that we probably had code to do it somewhere, but if we don't, I'm
disinclined to add it for this.  So that leads to the conclusion that we
should just shut off push-down in this situation, as per attached quick
hack (no test case) patch.

> In this example, pushing the condition below the aggregate would be
> wrong anyway, no?

Agreed.  I hadn't thought hard enough about the semantics, but if
"hundred" goes to null in a particular grouping set, so should
"abs(hundred)".

            regards, tom lane

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6da0dcd61c..763e348d52 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3182,6 +3182,15 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
  * volatile qual could succeed for some SRF output rows and fail for others,
  * a behavior that cannot occur if it's evaluated before SRF expansion.
  *
+ * 6. If the subquery has nonempty grouping sets, we cannot push down any
+ * quals.  The concern here is that a qual referencing a "constant" grouping
+ * column could get constant-folded, which would be improper because the value
+ * is potentially nullable by grouping-set expansion.  This restriction could
+ * be removed if we had a parsetree representation that shows that such
+ * grouping columns are not really constant.  (There are other ideas that
+ * could be used to relax this restriction, but that's the approach most
+ * likely to get taken in the future.)
+ *
  * In addition, we make several checks on the subquery's output columns to see
  * if it is safe to reference them in pushed-down quals.  If output column k
  * is found to be unsafe to reference, we set safetyInfo->unsafeColumns[k]
@@ -3226,6 +3235,10 @@ subquery_is_pushdown_safe(Query *subquery, Query *topquery,
     if (subquery->limitOffset != NULL || subquery->limitCount != NULL)
         return false;

+    /* Check point 6 */
+    if (subquery->groupClause && subquery->groupingSets)
+        return false;
+
     /* Check points 3, 4, and 5 */
     if (subquery->distinctClause ||
         subquery->hasWindowFuncs ||