Re: [19] Proposal: function markers to indicate collation/ctype sensitivity - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: [19] Proposal: function markers to indicate collation/ctype sensitivity
Msg-id 42d4ddef-2a39-4c13-bbce-dd1026c1259f@eisentraut.org
In response to Re: [19] Proposal: function markers to indicate collation/ctype sensitivity  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: [19] Proposal: function markers to indicate collation/ctype sensitivity
List pgsql-hackers
On 05.06.25 21:56, Jeff Davis wrote:
> On Thu, 2025-06-05 at 10:12 +0200, Peter Eisentraut wrote:
>> The reason we don't do it at parse time is that we don't have the
>> information which functions care about collations, which is exactly
>> what you are proposing here to add.
> 
> Currently, we have:
> 
>     create table c(x text collate "C", y text collate "en_US");
>     insert into c values ('x', 'y');
>     select x < y from c; -- fails (runtime check)
>     select x || y from c; -- succeeds
> 
> Surely, "<" would be marked as ordering-sensitive, and we could move
> the error to parse-time.
> 
> But what about UDFs? If we assume that all UDFs are ordering-sensitive
> unless marked otherwise, then a user-defined version of "||" that
> previously worked would now start failing, until they add the ordering-
> insensitive mark.

I think no matter how we slice it, there is going to be some case that 
will be degraded until some update is applied.  I would be content to 
accept this particular variant, because it doesn't seem very realistic. 
Why would a user define their own concatenation function?  There already 
is one.  Unless your concatenation function does something special, in 
which case you should probably think about this collations topic.  More
generally, I think there are only so many operations you can do on
character strings without considering the collation/ctype/etc.  These
are essentially all the operations that you can do without looking at
the characters, like length(), ||, repeat().
Everything beyond that looks at the characters and needs to take 
collation/ctype/etc. into account.
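
To make the distinction concrete, here is a rough sketch that reuses the
table from the quoted example.  The function name my_concat is made up
for illustration, the "en_US" collation is assumed to exist on the
platform, and the pass/fail comments describe the behavior I would
expect from the current runtime check, not output from an actual run.

```sql
-- Setup taken from the quoted example above.
CREATE TABLE c (x text COLLATE "C", y text COLLATE "en_US");
INSERT INTO c VALUES ('x', 'y');

-- Hypothetical user-defined concatenation function (my_concat is an
-- invented name).  It never looks at the characters themselves.
CREATE FUNCTION my_concat(a text, b text) RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT a || b $$;

-- Operations that do not inspect the characters: expected to succeed
-- even though x and y carry conflicting implicit collations.
SELECT x || y FROM c;
SELECT length(x || y) FROM c;
SELECT repeat(x, 2) || y FROM c;
SELECT my_concat(x, y) FROM c;

-- Operations that do inspect the characters: expected to fail at
-- run time today, because no collation can be determined.
SELECT x < y FROM c;
SELECT upper(x || y) FROM c;
```

Under the "sensitive unless marked otherwise" assumption, the
my_concat() call is the one that would start failing at parse time
until the function is marked as insensitive.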

> We'd need some kind of migration path where we could retain the runtime
> checks and disable the parse time checks until people have a chance to
> add the right marks to their UDFs. Migration paths like that are not
> great because they take several releases to work out, and we're never
> quite sure when to finally remove the deprecated behavior.

Perhaps pg_dump can apply some properties during upgrades?

> If we make the opposite assumption, that none are ordering-sensitive
> unless we mark them so, that would allow properly-marked functions to
> fail at parse time, and the rest to fail at runtime. But this
> assumption doesn't work as well for recording dependencies, because
> we'd miss the dependencies for UDFs that aren't properly marked.

That feels like the worst of both worlds.