Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations
Date
Msg-id CAMbWs49bYv4Z=Kpc36y9VjM_6yVQxe8fwKhHizn3o3Ktbhu4hQ@mail.gmail.com
Whole thread
In response to Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations
List pgsql-hackers
On Thu, Apr 2, 2026 at 7:11 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Apr 1, 2026 at 11:19 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> > > + vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
> > > +
> > > + foreach_node(Var, var, vars)
> > > + {
> > > + if (var->varno == *group_rtindex &&
> > > + OidIsValid(var->varcollid) &&
> > > + var->varcollid != inputcollid &&
> > > + !get_collation_isdeterministic(var->varcollid))
> > > + {
> > > + list_free(vars);
> > > + return true;
> > > + }
> > > + }
> > > +
> > > + list_free(vars);
>
> > This might be overthinking, but I wonder if calling pull_var_clause() at each walker step could introduce some
overheaddue to repeated subtree scans 

> That's a good point, but I doubt that it'd be an issue in practice.
> HAVING clauses are typically very small expressions.  Even in unusual
> queries, the clause size is bounded by what a human writes, which is
> negligible compared to the work the planner does elsewhere.

I was about to push the v2 patch, but I just can't shake off the
concern Wenhui Qiu raised about the repeated subtree scan.  I still
don't have a concrete real-world case where a query has a large enough
HAVING clause for it to matter, but let's just be paranoid.

I think we can fix it easily.  The current walker calls
pull_var_clause() at every collation-aware node, which re-walks the
subtree.  The fix is to flip it inside out: walk top-down, push
inputcollids onto a LIFO stack, and at each GROUP Var check against
the stack.  This way, we only need to walk the expression tree once.
Attached v3 does this.

v3 also fixes the RowCompareExpr case.  Unlike the node types covered
by exprInputCollation(), RowCompareExpr carries per-column
inputcollids[] rather than a single inputcollid, so we need to descend
into each (largs[i], rargs[i]) pair with the matching collation pushed
onto the stack.  Without this, a HAVING clause like:

  HAVING ROW(x, 1) < ROW('ABC' COLLATE case_sensitive, 1)

over a case_insensitive group would give wrong results.

- Richard

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Fix EINTR retry condition in pg_flush_data sync_file_range path.
Next
From: Chao Li
Date:
Subject: Re: Fix race condition in XLogLogicalInfo and ProcSignal initialization.