Hi Richard
> + if (OidIsValid(inputcollid))
> + {
> + List *vars;
> +
> + /*
> + * We use PVC_RECURSE_PLACEHOLDERS because PlaceHolderVars may have
> + * been introduced by pull_up_subqueries, and we need to look through
> + * them to find the underlying Vars. We don't need to consider
> + * Aggrefs because clauses containing aggregates are already excluded
> + * from HAVING-to-WHERE pushdown by the contain_agg_clause check.
> + * Likewise, WindowFuncs are ignored since they cannot appear in a
> + * HAVING clause.
> + */
> + vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
> +
> + foreach_node(Var, var, vars)
> + {
> + if (var->varno == *group_rtindex &&
> + OidIsValid(var->varcollid) &&
> + var->varcollid != inputcollid &&
> + !get_collation_isdeterministic(var->varcollid))
> + {
> + list_free(vars);
> + return true;
> + }
> + }
> +
> + list_free(vars);
> + }
> +
> + return expression_tree_walker(node, having_collation_conflict_walker,
> + group_rtindex);
> +}
This might be overthinking, but I wonder if calling pull_var_clause() at each walker step could introduce some overhead due to repeated subtree scans
,Should we add a test (SELECT x, count(*) FROM test3ci GROUP BY x HAVING max(x) = 'abc' COLLATE case_sensitive;)
Thanks
As briefly discussed on Discord, when a GROUP BY clause uses a
nondeterministic collation, the planner's optimization of moving
HAVING clauses to WHERE can produce incorrect results if the HAVING
clause applies a stricter collation.
CREATE TABLE t (x TEXT COLLATE case_insensitive);
INSERT INTO t VALUES ('a'), ('A');
SELECT x, count(*) FROM t GROUP BY x HAVING x = 'a' COLLATE "C";
This returns count=1, but should return count=2.
The attached draft patch fixes this for HEAD by leveraging GROUP Vars
(Vars referencing RTE_GROUP) to detect collation conflicts on a
per-clause basis, so only unsafe clauses are kept in HAVING while safe
ones are still pushed. Please see the commit message for more
details.
For versions prior to v18, we do not have GROUP Vars. I wonder if we
can take a conservative approach: skipping the HAVING-to-WHERE
pushdown optimization entirely if any GROUP BY expression uses a
nondeterministic collation.
Thoughts and reviews are welcome.
- Richard