Re: Collations versus user-defined functions - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Collations versus user-defined functions
Date
Msg-id 20110312183701.GC4380@svana.org
Whole thread Raw
In response to Collations versus user-defined functions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Collations versus user-defined functions
List pgsql-hackers
On Sat, Mar 12, 2011 at 12:17:11PM -0500, Tom Lane wrote:
> I've thought of another area that AFAICT the current patch fails to
> address at all: what should happen in user-defined functions?

<snip>

The POLA suggests that the collation derivation of the original query
should not be affected by the implementation of a function. In the case
of SQL functions this means that the expansion of the function must
not change the results. This would mean introducing a CollateNode above
the SQL function after expansion, though you may be able to acheive
this by doing the collation derivation prior to expansion of the SQL
function, but I don't know if that's feasable.

(Note the introduced collate node would need to remember the collation state.)

Similarly, inside the function the parameters should be considered to
be IMPLICIT collation, to avoid strange errors depending on how its
called.

This means you can't make a set_collation function, but that doesn't
seem like a loss to me.

>     select my_lt('foo', 'bar' collate "de_DE");
>     select my_lt('foo', 'bar' collate "fr_FR");
>
> I think it's at least arguably desirable that the results of the two
> calls respond to the collation clauses, but it does not look to me
> like that will happen: plpgsql isn't doing anything to propagate
> its call-site collation value into expressions it evaluates, and
> if it did, it'd still get the wrong answer on the second call because it
> would have cached an expression plan tree containing the collation info
> from the first call.

I think you need to consider the collation to be a variation of the
type. plpgsql makes new plans for each type when dealing with any
parameters, this should fit right in.

SQL would need a recollate-label node like suggested above.

For other languages you just need to provide the info, what they do
with it is not your problem.

> What do we want to do about this?  Making it work the way it seems like
> it ought to will require a rather substantial investment of effort.
> It looks to me like the least invasive answer would be to have plpgsql
> cache different plan trees depending on the collation it receives for
> its parameters, but that's still a whole lot of work.  Does the SQL
> standard have anything to say on the matter, or is there a precedent in
> the behavior of TSQL or other DBMSes?

I can't help you with other DBs, google isn't finding me anything. But
the plpgsql problem should be done already right, given it already
handles cached plans for different types.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: memory-related bugs
Next
From: Greg Stark
Date:
Subject: Re: Collations versus user-defined functions