Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations - Mailing list pgsql-hackers

From wenhui qiu
Subject Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations
Date
Msg-id CAGjGUA+zH9co3x-vGTDUp2h9sS0NQ9XPF9ZWL_D-aP1KE9BpKg@mail.gmail.com
Whole thread
In response to Fix HAVING-to-WHERE pushdown with nondeterministic collations  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations
List pgsql-hackers
Hi Richard
> + if (OidIsValid(inputcollid))
> + {
> + List   *vars;
> +
> + /*
> + * We use PVC_RECURSE_PLACEHOLDERS because PlaceHolderVars may have
> + * been introduced by pull_up_subqueries, and we need to look through
> + * them to find the underlying Vars.  We don't need to consider
> + * Aggrefs because clauses containing aggregates are already excluded
> + * from HAVING-to-WHERE pushdown by the contain_agg_clause check.
> + * Likewise, WindowFuncs are ignored since they cannot appear in a
> + * HAVING clause.
> + */
> + vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
> +
> + foreach_node(Var, var, vars)
> + {
> + if (var->varno == *group_rtindex &&
> + OidIsValid(var->varcollid) &&
> + var->varcollid != inputcollid &&
> + !get_collation_isdeterministic(var->varcollid))
> + {
> + list_free(vars);
> + return true;
> + }
> + }
> +
> + list_free(vars);
> + }
> +
> + return expression_tree_walker(node, having_collation_conflict_walker,
> +  group_rtindex);
> +}
This might be overthinking, but I wonder if calling pull_var_clause() at each walker step could introduce some overhead due to repeated subtree scans
,Should we add a test (SELECT x, count(*) FROM test3ci GROUP BY x HAVING max(x) = 'abc' COLLATE case_sensitive;)

Thanks

On Tue, Mar 31, 2026 at 11:41 AM Richard Guo <guofenglinux@gmail.com> wrote:
As briefly discussed on Discord, when a GROUP BY clause uses a
nondeterministic collation, the planner's optimization of moving
HAVING clauses to WHERE can produce incorrect results if the HAVING
clause applies a stricter collation.

CREATE TABLE t (x TEXT COLLATE case_insensitive);
INSERT INTO t VALUES ('a'), ('A');

SELECT x, count(*) FROM t GROUP BY x HAVING x = 'a' COLLATE "C";

This returns count=1, but should return count=2.

The attached draft patch fixes this for HEAD by leveraging GROUP Vars
(Vars referencing RTE_GROUP) to detect collation conflicts on a
per-clause basis, so only unsafe clauses are kept in HAVING while safe
ones are still pushed.  Please see the commit message for more
details.

For versions prior to v18, we do not have GROUP Vars.  I wonder if we
can take a conservative approach: skipping the HAVING-to-WHERE
pushdown optimization entirely if any GROUP BY expression uses a
nondeterministic collation.

Thoughts and reviews are welcome.

- Richard

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Skipping schema changes in publication
Next
From: Raghav Mittal
Date:
Subject: Proposal: Track last-used timestamp for index usage