Thread: FuncExpr.collid/OpExpr.collid unworkably serving double duty

FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

09 March 2011, 17:49:42

So I was moving some error checks around and all of a sudden the
regression tests blew up on me, with lots of errors about how type X
didn't support collations (which indeed it didn't).  After some
investigation I realized what should have been apparent much earlier:
the collations patch is trying to use one field for two different
purposes.  In particular, collid in FuncExpr and related nodes is
used in both of these ways:

* as the collation to apply during execution of the function;

* as the collation of the function's result.

The trouble is that these usages are only compatible if both the
arguments and result of the function are of collatable types.  In
particular, the change I made was causing CREATE VIEW to fail on
examples like this:

regression=# create view vv as select 'z'::text < 'y'::text as b;
ERROR:  collations are not supported by type boolean

because the OpExpr node must have nonzero collid to do a textual
comparison, but then exprCollation() claims that the result of the
expression has that collation too, which it does not because it's
only bool.

Aside from confusing code that tries to impute a collation to the
result of an expression, this will confuse the collation assignment code
itself: select_common_collation will mistakenly believe that
non-collatable input arguments should affect its results.  I'm not sure
how you managed to avoid such failures in the committed patch (if indeed
you did avoid them and they're not just lurking in un-regression-tested
cases); but in any case it seems far too fragile to keep it this way.

There are basically two things we could do about this:

1. Add two fields not one to nodes representing function/operator calls.

2. Change exprCollation() to do a type_is_collatable check on the
node result type before believing that the collid field is relevant.

Of course the syscache lookup implied by type_is_collatable will mean
that exprCollation is orders of magnitude slower than it is now.  So
this is a pretty straightforward space vs speed tradeoff.  I'm inclined
to think that choice #1 is the lesser evil, because I'm afraid that this
patch has already added an undesirable number of new cache lookups to
the basic expression parsing paths.

Thoughts?
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Martijn van Oosterhout

Date:

10 March 2011, 04:20:46

On Wed, Mar 09, 2011 at 04:49:28PM -0500, Tom Lane wrote:
> So I was moving some error checks around and all of a sudden the
> regression tests blew up on me, with lots of errors about how type X
> didn't support collations (which indeed it didn't).  After some
> investigation I realized what should have been apparent much earlier:
> the collations patch is trying to use one field for two different
> purposes.  In particular, collid in FuncExpr and related nodes is
> used in both of these ways:
>
> * as the collation to apply during execution of the function;
>
> * as the collation of the function's result.

Ouch, that is painful.

Looking back at my first attempt I see I made the same error, though I
had noted that it had an unusual failure mode, namely that:

func( a COLLATE x ) COLLATE y

would determine that "x" was the collation to use for func, not "y",
and that "y" may be ignored. A bit of a corner case, but someone was
bound to try it.

I think I avoided the particular failure mode you found, because the
GetCollation on the FuncExpr didn't return the collation calculated for
the node, but the the collation derived from the collations of any
arguments that had the same type and the return value. So operators
like = and < automatically got NONE because none of their arguments are
booleans.

> regression=# create view vv as select 'z'::text < 'y'::text as b;
> ERROR:  collations are not supported by type boolean

I'm my original idea, any data type was collatable, since I considered
ASC and DESC, NULLS FIRST/LAST to be collations every datatype had.
Thus the above wasn't an error. As long as the collation was a
collation appropriate for booleans it worked.

> There are basically two things we could do about this:
>
> 1. Add two fields not one to nodes representing function/operator calls.
>
> 2. Change exprCollation() to do a type_is_collatable check on the
> node result type before believing that the collid field is relevant.

It might be worthwhile adding an extra field, but I think I didn't do
it because you only need the information exactly once, while descending
the parse tree in parse_expr. But for clarity the extra field is a
definite win.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

10 March 2011, 11:34:11

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Wed, Mar 09, 2011 at 04:49:28PM -0500, Tom Lane wrote:
>> There are basically two things we could do about this:
>> 
>> 1. Add two fields not one to nodes representing function/operator calls.
>> 
>> 2. Change exprCollation() to do a type_is_collatable check on the
>> node result type before believing that the collid field is relevant.

> It might be worthwhile adding an extra field, but I think I didn't do
> it because you only need the information exactly once, while descending
> the parse tree in parse_expr. But for clarity the extra field is a
> definite win.

Hmm.  That suggests a third solution: revert the addition of *all* the
collid fields except the ones that represent collation-to-apply-during-
function-execution.  (So they'd still be there in FuncExpr/OpExpr, but
not most other places.)  Then we'd have to dig down more deeply in the
expression tree during select_common_collation, but we'd save space
and avoid confusion over the meaning of the fields.

I suspect this is probably not a good idea because of the added cost in
select_common_collation: aside from probably needing more syscache
lookups, there's a potential for worse-than-linear cost behavior if we
have to repeatedly dig through a deep expression tree to find out
collations.  We had a similar case in the past [ checks archives ... see
http://archives.postgresql.org/pgsql-performance/2005-06/msg00075.php
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=ba4200246
] so I'm hesitant to go down that road again.  Still, I'll throw it out
for comment.
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Martijn van Oosterhout

Date:

10 March 2011, 17:18:50

On Thu, Mar 10, 2011 at 10:34:00AM -0500, Tom Lane wrote:
> Hmm.  That suggests a third solution: revert the addition of *all* the
> collid fields except the ones that represent collation-to-apply-during-
> function-execution.  (So they'd still be there in FuncExpr/OpExpr, but
> not most other places.)  Then we'd have to dig down more deeply in the
> expression tree during select_common_collation, but we'd save space
> and avoid confusion over the meaning of the fields.

Yeah, it occurred to me if you made each collate clause translate to a
collate node that changes the collation, a bit like casts, then the
parse nodes don't need to know about collation at all.

> I suspect this is probably not a good idea because of the added cost in
> select_common_collation: aside from probably needing more syscache
> lookups, there's a potential for worse-than-linear cost behavior if we
> have to repeatedly dig through a deep expression tree to find out
> collations.  We had a similar case in the past [ checks archives ... see
> http://archives.postgresql.org/pgsql-performance/2005-06/msg00075.php
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=ba4200246
> ] so I'm hesitant to go down that road again.  Still, I'll throw it out
> for comment.

Two things can make a difference here:

- If you knew which operators/functions cared about the collation, the cost could be manageable. We don't so...

- ISTM that in theory any algorithm that is defined by recursion at each node, should be calculatable via a single pass
ofthe tree by something like parse_expr. That's essentially what the variables are doing in the Expr nodes, though
whetheryou need one or two is ofcourse another question. 

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

10 March 2011, 18:08:55

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Mar 10, 2011 at 10:34:00AM -0500, Tom Lane wrote:
>> I suspect this is probably not a good idea because of the added cost in
>> select_common_collation: aside from probably needing more syscache
>> lookups, there's a potential for worse-than-linear cost behavior if we
>> have to repeatedly dig through a deep expression tree to find out
>> collations.

> Two things can make a difference here:

> - If you knew which operators/functions cared about the collation, the
>   cost could be manageable. We don't so...

Yeah, the possibility of skipping select_common_collation altogether for
most operators is pretty attractive.  Maybe we'll get to that before
we're done, but I don't want to assume it'll be done for 9.1.

> - ISTM that in theory any algorithm that is defined by recursion at
>   each node, should be calculatable via a single pass of the tree by
>   something like parse_expr. That's essentially what the variables are
>   doing in the Expr nodes, though whether you need one or two is
>   ofcourse another question.

We could do that if we were willing to go back and fill in the collation
fields after the whole expression tree is built.  If you want to fill in
at the time the FuncExpr/OpExpr is first built, then you will get O(N^2)
behavior from repeated calculations in a deep tree if you don't cache
the results for the lower levels.  Which is what the output-collation
fields would do for us.

A post-pass is not out of the question, but it's enough unlike
everything else the parser does that I'm not too thrilled about it.

Also, there's the issue that started the whole discussion, which is that
sometimes we *do* need to know, post-parse-analysis, what the result
collation of an expression tree is.  See CREATE VIEW.  If that's the
*only* thing that ever needed to know it, I wouldn't mind accepting a
double calculation of the collation for CREATE VIEW ... but somehow it
doesn't seem real likely that no other uses for the information will
emerge, and some of them might be more performance-critical.
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

10 March 2011, 18:17:02

I wrote:
> A post-pass is not out of the question, but it's enough unlike
> everything else the parser does that I'm not too thrilled about it.

On the other hand ... one thing that's been bothering me is that
select_common_collation assumes that "explicit" collation derivation
doesn't bubble up in the tree, ie a COLLATE is only a forcing function
for the immediate parent expression node.  It's not at all clear to me
that that's a correct reading of the spec.  If it's not, the only way
we could make it work correctly in the current design is to keep
*two* additional fields, both the collation OID and an explicit/implicit
derivation flag.  Which would be well past the level of annoying.
But in a post-pass implementation it would be no great trouble to do
either one, and we'd not be looking at a forced initdb to change our
minds either.

Maybe a post-pass, with only collation-to-apply fields actually stored
in the tree, is the way to go.

Comments?
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Martijn van Oosterhout

Date:

10 March 2011, 18:31:58

On Thu, Mar 10, 2011 at 05:16:52PM -0500, Tom Lane wrote:
> On the other hand ... one thing that's been bothering me is that
> select_common_collation assumes that "explicit" collation derivation
> doesn't bubble up in the tree, ie a COLLATE is only a forcing function
> for the immediate parent expression node.  It's not at all clear to me
> that that's a correct reading of the spec.  If it's not, the only way
> we could make it work correctly in the current design is to keep
> *two* additional fields, both the collation OID and an explicit/implicit
> derivation flag.  Which would be well past the level of annoying.
> But in a post-pass implementation it would be no great trouble to do
> either one, and we'd not be looking at a forced initdb to change our
> minds either.

I beleive the current interpretation, that is the COLLATE only applies
to levels above, is the correct interpretation. COLLATE binds tightly,
so

A op B COLLATE C  parses as  A op (B COLLATE C)

which is why it works. It's actually the only way that makes sense,
otherwise it becomes completely impossible to specify different
collations for a function and its return value.

For example in your example with a view:

CREATE VIEW foo AS func(x COLLATE A) COLLATE B;

B is the collation for the output column, A is the collation for the
function.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

10 March 2011, 18:51:44

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Mar 10, 2011 at 05:16:52PM -0500, Tom Lane wrote:
>> On the other hand ... one thing that's been bothering me is that
>> select_common_collation assumes that "explicit" collation derivation
>> doesn't bubble up in the tree, ie a COLLATE is only a forcing function
>> for the immediate parent expression node.  It's not at all clear to me
>> that that's a correct reading of the spec.

> I beleive the current interpretation, that is the COLLATE only applies
> to levels above, is the correct interpretation. COLLATE binds tightly,
> so

> A op B COLLATE C  parses as  A op (B COLLATE C)

> which is why it works.

No, that's not what I'm on about.  Consider
(((A COLLATE X) || B) || (C COLLATE Y)) < (D COLLATE Z)

(I've spelled out the parenthesization in full for clarity, but most
of these parens could be omitted.)  Is this expression legal, or
should the "<" operator be throwing an error for conflicting
explicitly-derived collations?  Our code as it stands will take it,
because no individual operator sees more than one COLLATE among its
arguments.  But I'm not sure this is right.  The only text I can find
in SQL2008 that seems to bear on the point is in 4.2.2:
Anything that has a declared type can, if that type is acharacter string type, be associated with a collation
applicabletoits character set; this is known as a declared typecollation. Every declared type that is a character
stringtypehas a collation derivation, this being either none, implicit, orexplicit. The collation derivation of a
declaredtype with adeclared type collation that is explicitly or implicitlyspecified by a <data type> is implicit. If
thecollationderivation of a declared type that has a declared type collationis not implicit, then it is explicit. The
collationderivationof an expression of character string type that has no declaredtype collation is none.

As I read this, the collation attached to any Var clause is implicit
(because it came from the Var's data type), and the collation attached
to a CollateClause is presumably explicit, but where does it say what
happens at higher levels in the expression tree?  It's at least arguable
that the result collation of an expression is explicit if its input
collation was explicit.  The fact that the default in case of doubt
apparently is supposed to be "explicit" doesn't give any aid or comfort
to your position either.  If explicitness comes only from the immediate
use of COLLATE, why don't they say that?  This is worded to make one
think that most cases will have explicit derivation, not only COLLATE.

I wonder if anyone can check the behavior of nested collate clauses in
DB2 or some other probably-spec-conforming database.
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Martijn van Oosterhout

Date:

11 March 2011, 17:20:37

On Thu, Mar 10, 2011 at 05:51:31PM -0500, Tom Lane wrote:
> No, that's not what I'm on about.  Consider
>
>     (((A COLLATE X) || B) || (C COLLATE Y)) < (D COLLATE Z)
>
> (I've spelled out the parenthesization in full for clarity, but most
> of these parens could be omitted.)  Is this expression legal, or
> should the "<" operator be throwing an error for conflicting
> explicitly-derived collations?  Our code as it stands will take it,
> because no individual operator sees more than one COLLATE among its
> arguments.  But I'm not sure this is right.  The only text I can find
> in SQL2008 that seems to bear on the point is in 4.2.2:

The rules are essentially as described here:

http://msdn.microsoft.com/en-us/library/ms179886.aspx

So:

(A COLLATE X)     => collation X
((A COLLATE X) || B)   => collation X
(((A COLLATE X) || B) || (C COLLATE Y))  => error

If we aren't erroring on this then we're doing it wrong. The whole
point of going through the parse tree and assigning a collation to each
node is to catch these things.

> As I read this, the collation attached to any Var clause is implicit
> (because it came from the Var's data type), and the collation attached
> to a CollateClause is presumably explicit, but where does it say what
> happens at higher levels in the expression tree?  It's at least arguable
> that the result collation of an expression is explicit if its input
> collation was explicit.  The fact that the default in case of doubt
> apparently is supposed to be "explicit" doesn't give any aid or comfort
> to your position either.  If explicitness comes only from the immediate
> use of COLLATE, why don't they say that?  This is worded to make one
> think that most cases will have explicit derivation, not only COLLATE.

See 9.3 "Data types of results of aggregations" clause (ii). It
contains essentially the rules outlined by the Transact-SQL page above.
   The collation derivation and declared type collation of the result are   determined as follows.   Case:   1) If some
datatype in DTS has an explicit collation derivation and   declared type collation   EC1, then every data type in DTS
thathas an explicit collation   derivation shall have a declared   type collation that is EC1. The collation derivation
isexplicit and   the collation is EC1.   2) If every data type in DTS has an implicit collation derivation, then
Case:  A) If every data type in DTS has the same declared type collation IC1,   then the collation   derivation is
implicitand the declared type collation is IC1.   B) Otherwise, the collation derivation is none.   3) Otherwise, the
collationderivation is none. 

In my implementation I needed to expand this to the general set of
operators postgresql supported and relaxed this to only consider
arguments to the function/operator that had the same type as the
resulting type of the function/operator, since that's the only thing
that makes sense.

A concatination then requires its arguments to be compatible. A substr
has the collation of its sole string argument.

I hope this helps,

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

11 March 2011, 17:56:41

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Mar 10, 2011 at 05:51:31PM -0500, Tom Lane wrote:
>> No, that's not what I'm on about.  Consider
>> 
>> (((A COLLATE X) || B) || (C COLLATE Y)) < (D COLLATE Z)

> The rules are essentially as described here:
> http://msdn.microsoft.com/en-us/library/ms179886.aspx

Hmm ... that's an interesting document, but I'm not at all sure that
it intends to describe the same rules that are in the SQL standard.
In particular I don't believe that their notion of coercible-default
matches the standard.

> In my implementation I needed to expand this to the general set of
> operators postgresql supported and relaxed this to only consider
> arguments to the function/operator that had the same type as the
> resulting type of the function/operator, since that's the only thing
> that makes sense.

Yeah, that corresponds to Transact-SQL's distinction between functions
that take a string and produce a string, versus those that produce a
string without any string inputs.  But I don't see anything justifying
such a distinction in the text of the standard.

Also note that the TSQL doc explicitly points out that collation labels
can be carried up through changes of character string types, so I think
you're wrong to say that collation is only carried through functions that
produce exactly the same type as their input.  I'd say collation labels
propagate through any function that has both collatable input(s) and a
collatable result type.

In any case, I am sure that that what this describes is not what our
current code does :-(, and that we can't get anywhere close to this with
the existing infrastructure.  So at this point I'm thinking that the
safest approach is to rip out the result-collation caching fields and
perform collation assignment in a parsing post-pass.  That will allow us
to revise the collation assignment algorithm without further catversion
bumps.
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Martijn van Oosterhout

Date:

12 March 2011, 11:23:41

On Fri, Mar 11, 2011 at 04:56:31PM -0500, Tom Lane wrote:
> > In my implementation I needed to expand this to the general set of
> > operators postgresql supported and relaxed this to only consider
> > arguments to the function/operator that had the same type as the
> > resulting type of the function/operator, since that's the only thing
> > that makes sense.
>
> Yeah, that corresponds to Transact-SQL's distinction between functions
> that take a string and produce a string, versus those that produce a
> string without any string inputs.  But I don't see anything justifying
> such a distinction in the text of the standard.

I remember being bugged by the fact that the standard indeed said
nothing (that I could find) about the results of functions that
accepted something other than strings.

> Also note that the TSQL doc explicitly points out that collation labels
> can be carried up through changes of character string types, so I think
> you're wrong to say that collation is only carried through functions that
> produce exactly the same type as their input.  I'd say collation labels
> propagate through any function that has both collatable input(s) and a
> collatable result type.

Yes, this is the most logical and useful interpretation. Collations for
all the string types are essentially compatable so can be treated as
equivalent for this purpose.

(My bit about same type at input is due to my idea of extending
collation to more than just strings. I don't beleive the current patch
supports that anyway. I intended to treat all string types as
equivalent.

> In any case, I am sure that that what this describes is not what our
> current code does :-(, and that we can't get anywhere close to this with
> the existing infrastructure.  So at this point I'm thinking that the
> safest approach is to rip out the result-collation caching fields and
> perform collation assignment in a parsing post-pass.  That will allow us
> to revise the collation assignment algorithm without further catversion
> bumps.

Are you sure? I've grabbed the patch and an looking at it. It looks to
me like it is parse_expr properly combining the collations of the
subnodes, just like it is doing the type determination.

Hmm, but select_common_collation looks a bit dodgy indeed. For example,
whether a collation is explicit or not depends on the type of the
subnode. That's not right at all. During the parse phase you must not
only track the collation but also the state (default, implicit,
explicit, error, none). Otherwise you indeed get the issue where you
don't throw errors at the right time.

I had a collatestate object:

+       Oid             colloid;                /* OID of collation */
+       CollateType     colltype;               /* Type of Collation */

Where CollateType is IMPLICIT, EXPLICIT, NONE or DEFAULT.

And then I had the function below to handle the combining. This I think
handles the collation rules correctly, but I don't think it can be
shoehorned into the current patch.

Hope this helps with your review.

Have a nice day,

+ /* This logic is dictated by the SQL standard */
+ CollateState *
+ CombineCollateState( CollateState *arg1, CollateState *arg2 )
+ {
+       /* If either argument is coerable (the default), return the other */
+       if( !arg1 || arg1->colltype == COLLATE_COERCIBLE )
+               return arg2;
+       if( !arg2 || arg2->colltype == COLLATE_COERCIBLE )
+               return arg1;
+       /* If both are explicit, they must be the same */
+       if( arg1->colltype == COLLATE_EXPLICIT && arg2->colltype == COLLATE_EXPLICIT )
+       {
+               if( arg1->colloid != arg2->colloid )
+                       ereport( ERROR, (errmsg("Conflicting COLLATE clauses")));
+               /* Both same, doesn't matter which we return */
+               return arg1;
+       }
+       /* Otherwise, explicit overrides anything */
+       if( arg1->colltype == COLLATE_EXPLICIT )
+               return arg1;
+       if( arg2->colltype == COLLATE_EXPLICIT )
+               return arg2;
+       /* COLLATE_NONE can only be overridden by EXPLICIT */
+       if( arg1->colltype == COLLATE_NONE )
+               return arg1;
+       if( arg2->colltype == COLLATE_NONE )
+               return arg2;
+
+       /* We only get here if both are implicit */
+       /* Same locale, return that implicitly */
+       if( arg1->colloid == arg2->colloid )
+               return arg1;
+
+       /* Conflicting implicit collation, return NONE */
+       {
+               CollateState *result = makeNode(CollateState);
+               result->colltype = COLLATE_NONE;
+               result->colloid = InvalidOid;
+               return result;
+       }
+ }
+

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

14 March 2011, 14:49:33

I wrote:
> In any case, I am sure that that what this describes is not what our
> current code does :-(, and that we can't get anywhere close to this with
> the existing infrastructure.  So at this point I'm thinking that the
> safest approach is to rip out the result-collation caching fields and
> perform collation assignment in a parsing post-pass.  That will allow us
> to revise the collation assignment algorithm without further catversion
> bumps.

After further examination of the patch, I'm feeling that removing the
result-collation fields isn't going to be very practical after all.
The problem is that there are now exprCollation() calls all over the
place, many of them in performance-critical places, and there is no
way that exprCollation() will be a cheap operation if we remove the
result-collation fields.  I had thought that we could avoid this if
we still stored collation in a few critical places such as TargetEntry.
But that only appears to take care of some of the call sites, and anyway
it'd be a little weird to store collation there when we don't store
column type there.  For better or worse, the system is designed around
the assumption that expression trees are self-identifying as to output
type and typmod, so I think we've now got to include result collation
in that too.

This implies further bloat in expression trees of course.  By my count,
the following node types also need to store input collation, so will now
have 2 collation fields not
one:AggrefWindowFuncFuncExprOpExprDistinctExprNullIfExprScalarArrayOpExprRowCompareExprMinMaxExpr

(No, wait, DistinctExpr and RowCompareExpr always yield boolean so
they don't need result collation fields, just the input collation.)

There are also several node types that the existing patch neglected to
add collation fields to, but really need to have it AFAICS; one pretty
obvious example being CoerceToDomain.  Actually I think we have to have
a result collation field in every node type that can possibly deliver a
result of a collatable type.

I'm still going to switch over to the post-pass collation assignment
method, because otherwise we're going to need to add collation state
fields as well to all of these node types ...
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Peter Eisentraut

Date:

15 March 2011, 17:01:34

On tor, 2011-03-10 at 17:16 -0500, Tom Lane wrote:
> On the other hand ... one thing that's been bothering me is that
> select_common_collation assumes that "explicit" collation derivation
> doesn't bubble up in the tree, ie a COLLATE is only a forcing function
> for the immediate parent expression node.  It's not at all clear to me
> that that's a correct reading of the spec.  If it's not, the only way
> we could make it work correctly in the current design is to keep
> *two* additional fields, both the collation OID and an
> explicit/implicit
> derivation flag.  Which would be well past the level of annoying.

That's correct.  I didn't finish implementing that yet because I didn't
have a good solution for the annoyance bit.  The patch I proposed early
on that would have grouped type+typmod+collation into a separate Node
would have provided a simple solution but did not go through.  My
assumption was that this issue was not critical to the core feature and
could be solved later.

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Tom Lane

Date:

15 March 2011, 17:16:15

Peter Eisentraut <peter_e@gmx.net> writes:
> On tor, 2011-03-10 at 17:16 -0500, Tom Lane wrote:
>> On the other hand ... one thing that's been bothering me is that
>> select_common_collation assumes that "explicit" collation derivation
>> doesn't bubble up in the tree, ie a COLLATE is only a forcing function
>> for the immediate parent expression node.  It's not at all clear to me
>> that that's a correct reading of the spec.  If it's not, the only way
>> we could make it work correctly in the current design is to keep
>> *two* additional fields, both the collation OID and an
>> explicit/implicit
>> derivation flag.  Which would be well past the level of annoying.

> That's correct.  I didn't finish implementing that yet because I didn't
> have a good solution for the annoyance bit.  The patch I proposed early
> on that would have grouped type+typmod+collation into a separate Node
> would have provided a simple solution but did not go through.  My
> assumption was that this issue was not critical to the core feature and
> could be solved later.

Adding a separate Node for those fields would hardly meet the core
objection, which is the extra space needed for the storage --- in fact,
it would make that problem considerably worse.  But what did you think
of the idea of setting collations during a post-pass, so that the
collation derivation values need only be local storage during that
one recursive routine?
        regards, tom lane

Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty

From

Peter Eisentraut

Date:

15 March 2011, 17:22:54

On tis, 2011-03-15 at 16:16 -0400, Tom Lane wrote:
> But what did you think of the idea of setting collations during a
> post-pass, so that the collation derivation values need only be local
> storage during that one recursive routine?

That sounds reasonable.  We do need the collation value in the
expression tree permanently, but not the collation derivation, as far as
I can tell.