Thread: BUG #17985: Inconsistent results of SELECT comparing two CASE WHEN clause

BUG #17985: Inconsistent results of SELECT comparing two CASE WHEN clause

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17985
Logged by:          Zuming Jiang
Email address:      zuming.jiang@inf.ethz.ch
PostgreSQL version: 16beta1
Operating system:   Ubuntu 20.04
Description:

My fuzzer finds a correctness bug in Postgres, which makes Postgres return
inconsistent results. This bug can be reproduced even after applying the
fixing patches for
https://www.postgresql.org/message-id/flat/17976-4b638b525e9a983b%40postgresql.org
and
https://www.postgresql.org/message-id/flat/17978-12f3d93a55297266%40postgresql.org

--- Set up database ---
create table t0 (c2 text);
create table t2 (c10 text);
create table t5 (vkey int4, pkey int4, c27 text, c28 text, c29 text, c30
text);
insert into t0 values ('');
insert into t2 values ('');
insert into t5 values (1, 2, 'a', 'a', 'a', 'a'), (0, 1, '', '', 'a',
'L');
---

The fuzzer generates Test case 1:

--- Test case 1 ---
select * from t5
where (t5.pkey >= t5.vkey) <> (t5.c30 = (
    select
        t5.c29 as c_0
      from
        (t2 as ref_0
          inner join t0 as ref_1
          on (ref_0.c10 = ref_1.c2))
      where ((case when (((ref_0.c10 like 'z~%')
                        and (not (ref_0.c10 like 'z~%')))
                        and ((ref_0.c10 like 'z~%') is not null)) then
t5.c28 else t5.c28 end)
           = (case when (((ref_1.c2 not like '_%%')
                        and (not (ref_1.c2 not like '_%%')))
                        and ((ref_1.c2 not like '_%%') is not null)) then
t5.c29 else t5.c27 end))
      order by c_0 desc limit 1));
---

Because the then branch and else branch of the CASE WHEN expression '((case
when (((ref_0.c10 like 'z~%') and (not (ref_0.c10 like 'z~%'))) and
((ref_0.c10 like 'z~%') is not null)) then t5.c28 else t5.c28 end)' are the
same (both are t5.c28), I simplify this CASE WHEN expression by replacing it
with t5.c28, and get Test case 2:

--- Test case 2 ---
select * from t5
where (t5.pkey >= t5.vkey) <> (t5.c30 = (
    select
        t5.c29 as c_0
      from
        (t2 as ref_0
          inner join t0 as ref_1
          on (ref_0.c10 = ref_1.c2))
      where (t5.c28
           = (case when (((ref_1.c2 not like '_%%')
                        and (not (ref_1.c2 not like '_%%')))
                        and ((ref_1.c2 not like '_%%') is not null)) then
t5.c29 else t5.c27 end))
      order by c_0 desc limit 1));
---

--- Expected behavior ---
Test case 1 and Test case 2 return the same results.

--- Actual behavior ---
Test case 1 returns 0 rows, while Test case 2 returns 1 row.

Output of Test case 1:
 vkey | pkey | c27 | c28 | c29 | c30 
------+------+-----+-----+-----+-----
(0 rows)

Output of Test case 2:
 vkey | pkey | c27 | c28 | c29 | c30 
------+------+-----+-----+-----+-----
    0 |    1 |     |     | a   | L
(1 row)

--- Postgres version ---
Github commit: efeb12ef0bfef0b5aa966a56bb4dbb1f936bda0c
Version: PostgreSQL 16beta1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

--- Platform information ---
Platform: Ubuntu 20.04
Kernel: Linux 5.4.0-147-generic


PG Bug reporting form <noreply@postgresql.org> writes:
> My fuzzer finds a correctness bug in Postgres, which makes Postgres return
> inconsistent results.

Huh.  This is a very ancient bug, dating seemingly to commit d24d75ff1.
When rescanning a hash join, we'll skip rebuilding the hash table if
the inner subplan contains no updated Param values.  However, the
inner hash key expressions can themselves contain Param references,
which evidently are not caught by that test.  A change in the value
of such a Param necessitates rebuilding the hash table, and this
example shows that we're not doing that.

One way to fix it is as attached.  I wonder though if this isn't
telling us that there's a bug in the planner's assignment of
allParams bits for Hash nodes.  The dangerous Param is present
in the Hash node's hashkeys field, so why isn't the existing
chgParam test adequate?

            regards, tom lane

diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..41217d189d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -169,6 +169,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "miscadmin.h"
+#include "optimizer/clauses.h"
 #include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/sharedtuplestore.h"
@@ -1413,13 +1414,14 @@ ExecReScanHashJoin(HashJoinState *node)
      * In a multi-batch join, we currently have to do rescans the hard way,
      * primarily because batch temp files may have already been released. But
      * if it's a single-batch join, and there is no parameter change for the
-     * inner subnode, then we can just re-use the existing hash table without
-     * rebuilding it.
+     * inner subnode nor any PARAM_EXEC Params in the inner hash keys, then we
+     * can just re-use the existing hash table without rebuilding it.
      */
     if (node->hj_HashTable != NULL)
     {
         if (node->hj_HashTable->nbatch == 1 &&
-            innerPlan->chgParam == NULL)
+            innerPlan->chgParam == NULL &&
+            !contain_any_exec_param((Node *) ((Hash *) innerPlan->plan)->hashkeys))
         {
             /*
              * Okay to reuse the hash table; needn't rescan inner, either.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 7f453b04f8..bee69014d4 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -103,6 +103,7 @@ static bool contain_volatile_functions_not_nextval_walker(Node *node, void *cont
 static bool max_parallel_hazard_walker(Node *node,
                                        max_parallel_hazard_context *context);
 static bool contain_nonstrict_functions_walker(Node *node, void *context);
+static bool contain_any_exec_param_walker(Node *node, void *arg);
 static bool contain_exec_param_walker(Node *node, List *param_ids);
 static bool contain_context_dependent_node(Node *clause);
 static bool contain_context_dependent_node_walker(Node *node, int *flags);
@@ -1043,9 +1044,37 @@ contain_nonstrict_functions_walker(Node *node, void *context)
  *****************************************************************************/

 /*
- * contain_exec_param
+ * contain_any_exec_param
  *      Recursively search for PARAM_EXEC Params within a clause.
  *
+ * Returns true if the clause contains any PARAM_EXEC Param.
+ * Does not descend into subqueries!
+ */
+bool
+contain_any_exec_param(Node *clause)
+{
+    return contain_any_exec_param_walker(clause, NULL);
+}
+
+static bool
+contain_any_exec_param_walker(Node *node, void *arg)
+{
+    if (node == NULL)
+        return false;
+    if (IsA(node, Param))
+    {
+        Param       *p = (Param *) node;
+
+        if (p->paramkind == PARAM_EXEC)
+            return true;
+    }
+    return expression_tree_walker(node, contain_any_exec_param_walker, arg);
+}
+
+/*
+ * contain_exec_param
+ *      Recursively search for specific PARAM_EXEC Params within a clause.
+ *
  * Returns true if the clause contains any PARAM_EXEC Param with a paramid
  * appearing in the given list of Param IDs.  Does not descend into
  * subqueries!
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index cbe0607e85..058f9c0de9 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -35,6 +35,7 @@ extern bool contain_subplans(Node *clause);
 extern char max_parallel_hazard(Query *parse);
 extern bool is_parallel_safe(PlannerInfo *root, Node *node);
 extern bool contain_nonstrict_functions(Node *clause);
+extern bool contain_any_exec_param(Node *clause);
 extern bool contain_exec_param(Node *clause, List *param_ids);
 extern bool contain_leaked_vars(Node *clause);

diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 4faf010f8c..262fa71ed8 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -1128,3 +1128,39 @@ WHERE
 (1 row)

 ROLLBACK;
+-- Verify that we behave sanely when the inner hash keys contain parameters
+-- (that is, outer or lateral references).  This situation has to defeat
+-- re-use of the inner hash table across rescans.
+begin;
+set local enable_hashjoin = on;
+explain (costs off)
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+                        QUERY PLAN
+-----------------------------------------------------------
+ Nested Loop
+   ->  Seq Scan on int8_tbl i8
+   ->  Sort
+         Sort Key: t1.fivethous, i4.f1
+         ->  Hash Join
+               Hash Cond: (t1.fivethous = (i4.f1 + i8.q2))
+               ->  Seq Scan on tenk1 t1
+               ->  Hash
+                     ->  Seq Scan on int4_tbl i4
+(9 rows)
+
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+ q2  | fivethous | f1
+-----+-----------+----
+ 456 |       456 |  0
+ 456 |       456 |  0
+ 123 |       123 |  0
+ 123 |       123 |  0
+(4 rows)
+
+rollback;
diff --git a/src/test/regress/sql/join_hash.sql b/src/test/regress/sql/join_hash.sql
index e73f645e9e..6b0688ab0a 100644
--- a/src/test/regress/sql/join_hash.sql
+++ b/src/test/regress/sql/join_hash.sql
@@ -604,3 +604,22 @@ WHERE
     AND hjtest_1.a <> hjtest_2.b;

 ROLLBACK;
+
+-- Verify that we behave sanely when the inner hash keys contain parameters
+-- (that is, outer or lateral references).  This situation has to defeat
+-- re-use of the inner hash table across rescans.
+begin;
+set local enable_hashjoin = on;
+
+explain (costs off)
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+
+rollback;

I wrote:
> One way to fix it is as attached.  I wonder though if this isn't
> telling us that there's a bug in the planner's assignment of
> allParams bits for Hash nodes.  The dangerous Param is present
> in the Hash node's hashkeys field, so why isn't the existing
> chgParam test adequate?

Yeah, so looking at that, the planner is just ignoring Hash.hashkeys,
apparently figuring that incorporating Param IDs from HashJoin.hashclauses
into the allParams for the parent HashJoin is sufficient.  But as we
see here, it isn't.  The attached seems like a better fix.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 5f12b2ef9b..da2d8abe01 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2648,6 +2648,11 @@ finalize_plan(PlannerInfo *root, Plan *plan,
                               &context);
             break;

+        case T_Hash:
+            finalize_primnode((Node *) ((Hash *) plan)->hashkeys,
+                              &context);
+            break;
+
         case T_Limit:
             finalize_primnode(((Limit *) plan)->limitOffset,
                               &context);
@@ -2748,7 +2753,6 @@ finalize_plan(PlannerInfo *root, Plan *plan,
             break;

         case T_ProjectSet:
-        case T_Hash:
         case T_Material:
         case T_Sort:
         case T_IncrementalSort:
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 4faf010f8c..262fa71ed8 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -1128,3 +1128,39 @@ WHERE
 (1 row)

 ROLLBACK;
+-- Verify that we behave sanely when the inner hash keys contain parameters
+-- (that is, outer or lateral references).  This situation has to defeat
+-- re-use of the inner hash table across rescans.
+begin;
+set local enable_hashjoin = on;
+explain (costs off)
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+                        QUERY PLAN
+-----------------------------------------------------------
+ Nested Loop
+   ->  Seq Scan on int8_tbl i8
+   ->  Sort
+         Sort Key: t1.fivethous, i4.f1
+         ->  Hash Join
+               Hash Cond: (t1.fivethous = (i4.f1 + i8.q2))
+               ->  Seq Scan on tenk1 t1
+               ->  Hash
+                     ->  Seq Scan on int4_tbl i4
+(9 rows)
+
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+ q2  | fivethous | f1
+-----+-----------+----
+ 456 |       456 |  0
+ 456 |       456 |  0
+ 123 |       123 |  0
+ 123 |       123 |  0
+(4 rows)
+
+rollback;
diff --git a/src/test/regress/sql/join_hash.sql b/src/test/regress/sql/join_hash.sql
index e73f645e9e..6b0688ab0a 100644
--- a/src/test/regress/sql/join_hash.sql
+++ b/src/test/regress/sql/join_hash.sql
@@ -604,3 +604,22 @@ WHERE
     AND hjtest_1.a <> hjtest_2.b;

 ROLLBACK;
+
+-- Verify that we behave sanely when the inner hash keys contain parameters
+-- (that is, outer or lateral references).  This situation has to defeat
+-- re-use of the inner hash table across rescans.
+begin;
+set local enable_hashjoin = on;
+
+explain (costs off)
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+
+select i8.q2, ss.* from
+int8_tbl i8,
+lateral (select t1.fivethous, i4.f1 from tenk1 t1 join int4_tbl i4
+         on t1.fivethous = i4.f1+i8.q2 order by 1,2) ss;
+
+rollback;