Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty - Mailing list pgsql-hackers

From Tom Lane
Subject Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty
Date
Msg-id 13639.1300124961@sss.pgh.pa.us
Whole thread Raw
In response to Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> In any case, I am sure that that what this describes is not what our
> current code does :-(, and that we can't get anywhere close to this with
> the existing infrastructure.  So at this point I'm thinking that the
> safest approach is to rip out the result-collation caching fields and
> perform collation assignment in a parsing post-pass.  That will allow us
> to revise the collation assignment algorithm without further catversion
> bumps.

After further examination of the patch, I'm feeling that removing the
result-collation fields isn't going to be very practical after all.
The problem is that there are now exprCollation() calls all over the
place, many of them in performance-critical places, and there is no
way that exprCollation() will be a cheap operation if we remove the
result-collation fields.  I had thought that we could avoid this if
we still stored collation in a few critical places such as TargetEntry.
But that only appears to take care of some of the call sites, and anyway
it'd be a little weird to store collation there when we don't store
column type there.  For better or worse, the system is designed around
the assumption that expression trees are self-identifying as to output
type and typmod, so I think we've now got to include result collation
in that too.

This implies further bloat in expression trees of course.  By my count,
the following node types also need to store input collation, so will now
have 2 collation fields not
one:AggrefWindowFuncFuncExprOpExprDistinctExprNullIfExprScalarArrayOpExprRowCompareExprMinMaxExpr

(No, wait, DistinctExpr and RowCompareExpr always yield boolean so
they don't need result collation fields, just the input collation.)

There are also several node types that the existing patch neglected to
add collation fields to, but really need to have it AFAICS; one pretty
obvious example being CoerceToDomain.  Actually I think we have to have
a result collation field in every node type that can possibly deliver a
result of a collatable type.

I'm still going to switch over to the post-pass collation assignment
method, because otherwise we're going to need to add collation state
fields as well to all of these node types ...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Avoiding repeated ON COMMIT truncation for temporary tables
Next
From: Magnus Hagander
Date:
Subject: Re: initdb -A ident, with params