Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty
Date
Msg-id 20110312152311.GA4380@svana.org
Whole thread Raw
In response to Re: FuncExpr.collid/OpExpr.collid unworkably serving double duty  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Mar 11, 2011 at 04:56:31PM -0500, Tom Lane wrote:
> > In my implementation I needed to expand this to the general set of
> > operators postgresql supported and relaxed this to only consider
> > arguments to the function/operator that had the same type as the
> > resulting type of the function/operator, since that's the only thing
> > that makes sense.
>
> Yeah, that corresponds to Transact-SQL's distinction between functions
> that take a string and produce a string, versus those that produce a
> string without any string inputs.  But I don't see anything justifying
> such a distinction in the text of the standard.

I remember being bugged by the fact that the standard indeed said
nothing (that I could find) about the results of functions that
accepted something other than strings.

> Also note that the TSQL doc explicitly points out that collation labels
> can be carried up through changes of character string types, so I think
> you're wrong to say that collation is only carried through functions that
> produce exactly the same type as their input.  I'd say collation labels
> propagate through any function that has both collatable input(s) and a
> collatable result type.

Yes, this is the most logical and useful interpretation. Collations for
all the string types are essentially compatable so can be treated as
equivalent for this purpose.

(My bit about same type at input is due to my idea of extending
collation to more than just strings. I don't beleive the current patch
supports that anyway. I intended to treat all string types as
equivalent.

> In any case, I am sure that that what this describes is not what our
> current code does :-(, and that we can't get anywhere close to this with
> the existing infrastructure.  So at this point I'm thinking that the
> safest approach is to rip out the result-collation caching fields and
> perform collation assignment in a parsing post-pass.  That will allow us
> to revise the collation assignment algorithm without further catversion
> bumps.

Are you sure? I've grabbed the patch and an looking at it. It looks to
me like it is parse_expr properly combining the collations of the
subnodes, just like it is doing the type determination.

Hmm, but select_common_collation looks a bit dodgy indeed. For example,
whether a collation is explicit or not depends on the type of the
subnode. That's not right at all. During the parse phase you must not
only track the collation but also the state (default, implicit,
explicit, error, none). Otherwise you indeed get the issue where you
don't throw errors at the right time.

I had a collatestate object:

+       Oid             colloid;                /* OID of collation */
+       CollateType     colltype;               /* Type of Collation */

Where CollateType is IMPLICIT, EXPLICIT, NONE or DEFAULT.

And then I had the function below to handle the combining. This I think
handles the collation rules correctly, but I don't think it can be
shoehorned into the current patch.

Hope this helps with your review.

Have a nice day,

+ /* This logic is dictated by the SQL standard */
+ CollateState *
+ CombineCollateState( CollateState *arg1, CollateState *arg2 )
+ {
+       /* If either argument is coerable (the default), return the other */
+       if( !arg1 || arg1->colltype == COLLATE_COERCIBLE )
+               return arg2;
+       if( !arg2 || arg2->colltype == COLLATE_COERCIBLE )
+               return arg1;
+       /* If both are explicit, they must be the same */
+       if( arg1->colltype == COLLATE_EXPLICIT && arg2->colltype == COLLATE_EXPLICIT )
+       {
+               if( arg1->colloid != arg2->colloid )
+                       ereport( ERROR, (errmsg("Conflicting COLLATE clauses")));
+               /* Both same, doesn't matter which we return */
+               return arg1;
+       }
+       /* Otherwise, explicit overrides anything */
+       if( arg1->colltype == COLLATE_EXPLICIT )
+               return arg1;
+       if( arg2->colltype == COLLATE_EXPLICIT )
+               return arg2;
+       /* COLLATE_NONE can only be overridden by EXPLICIT */
+       if( arg1->colltype == COLLATE_NONE )
+               return arg1;
+       if( arg2->colltype == COLLATE_NONE )
+               return arg2;
+
+       /* We only get here if both are implicit */
+       /* Same locale, return that implicitly */
+       if( arg1->colloid == arg2->colloid )
+               return arg1;
+
+       /* Conflicting implicit collation, return NONE */
+       {
+               CollateState *result = makeNode(CollateState);
+               result->colltype = COLLATE_NONE;
+               result->colloid = InvalidOid;
+               return result;
+       }
+ }
+

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Macros for time magic values
Next
From: Greg Stark
Date:
Subject: Re: template0 database comment