Thread: citext function overloads for text parameters
Hi hackers.
The following works well of course:
test=# select strpos('Aa'::citext, 'a');
strpos
--------
1
However, if I pass a typed text parameter for the substring, I get case-sensitive behavior instead:
test=# select strpos('Aa'::citext, 'a'::text);
strpos
--------
2
This seems like surprising behavior - my expectation was that the first parameter being citext would be enough to trigger case-insensitive behavior. The same may be happening with other string functions (e.g. regexp_matches). This is causing some difficulties in a real scenario where SQL and parameters are getting generated by an O/RM, and changing them isn't trivial.
Do the above seem like problematic behavior like it does to me, or is it the expected behavior?
Shay
2018-05-06 8:26 GMT+02:00 Shay Rojansky <roji@roji.org>:
Hi hackers.The following works well of course:test=# select strpos('Aa'::citext, 'a');strpos--------1However, if I pass a typed text parameter for the substring, I get case-sensitive behavior instead:test=# select strpos('Aa'::citext, 'a'::text);strpos--------2This seems like surprising behavior - my expectation was that the first parameter being citext would be enough to trigger case-insensitive behavior. The same may be happening with other string functions (e.g. regexp_matches). This is causing some difficulties in a real scenario where SQL and parameters are getting generated by an O/RM, and changing them isn't trivial.Do the above seem like problematic behavior like it does to me, or is it the expected behavior?
This is expected - it is side effect of PostgreSQL implementation of function overloading and type conversions
after installation citext, you will have more instances of function strpos
strpos(citext, citext)
strpos(text, text)
the call strpos('aa'::citext, 'a') is effective strpos('aa'::citext, 'a'::unknown) and that strpos(citext, citext) can be used in this case.
strpos('aa'::citext, 'a'::text) is ambiguous (both functions can be used with necessary conversion - cast citext<->text is available), and usually it fails with related error message - but there is a exception - the text type is PREFERRED - what means, so strpost(text, text) is selected.
PostgreSQL type system is very generic and works almost well, but sometimes there can be unwanted effects when some functions are overloaded. In this case is better to implement own instance of unique function and use only it.
some like
create or replace function strpos_ci(text, text) returns int as $$ select strpos($1::citext, $2::citext) $$ language sql;
create or replace function strpos_ci(citext, citext) returns int as $$ select strpos($1, $1) $$ language sql;
Regards
Pavel
Shay
Thanks for your answer Pavel.
This is expected - it is side effect of PostgreSQL implementation of function overloading and type conversionsafter installation citext, you will have more instances of function strposstrpos(citext, citext)strpos(text, text)
Do you think it would be appropriate to simply add an strpos(citext, text) overload to the extension to make sure this behaves more as expected? If so I can try to submit a patch at some point.
2018-05-06 14:55 GMT+02:00 Shay Rojansky <roji@roji.org>:
Thanks for your answer Pavel.This is expected - it is side effect of PostgreSQL implementation of function overloading and type conversionsafter installation citext, you will have more instances of function strposstrpos(citext, citext)strpos(text, text)Do you think it would be appropriate to simply add an strpos(citext, text) overload to the extension to make sure this behaves more as expected? If so I can try to submit a patch at some point.
It has sense in this case. But I have not a idea if there can be some more broken.
regards
Pavel
Shay Rojansky <roji@roji.org> writes: > Do you think it would be appropriate to simply add an strpos(citext, text) > overload to the extension to make sure this behaves more as expected? If so > I can try to submit a patch at some point. To me, if there's both a citext and a text parameter, then it's simply unclear which behavior is wanted; I do not think there's a principled argument that it should be case-insensitive. Thus, adding a function to reverse the current interpretation would amount to breaking some currently-working apps to favor others that currently don't work. Doesn't really sound like a net gain. regards, tom lane
> Do you think it would be appropriate to simply add an strpos(citext, text)
> overload to the extension to make sure this behaves more as expected? If so
> I can try to submit a patch at some point.
To me, if there's both a citext and a text parameter, then it's simply
unclear which behavior is wanted; I do not think there's a principled
argument that it should be case-insensitive. Thus, adding a function
to reverse the current interpretation would amount to breaking some
currently-working apps to favor others that currently don't work.
Doesn't really sound like a net gain.
Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.
My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive; at least there seems to be more value in things being this way. But if that doesn't sound like a convincing/principled argument this can be dropped (and I do agree that the breakage may not be worth it).
On Sunday, May 6, 2018, Shay Rojansky <roji@roji.org> wrote:
Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive;
Comparison requires both things to be the same type. The rules for implicitly converting one type to another prefer the core type text over the extension type citext.
IOW, there is no such operator =(citext,text) and thus "citext = text comparison" is technically invalid.
At this point we're sorta stuck with our choice, and while individual databases can implement their own functions and operators there is value in doing things the way the system provides to minimize future confusion and bugs.
David J.
Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive;Comparison requires both things to be the same type. The rules for implicitly converting one type to another prefer the core type text over the extension type citext.IOW, there is no such operator =(citext,text) and thus "citext = text comparison" is technically invalid.At this point we're sorta stuck with our choice, and while individual databases can implement their own functions and operators there is value in doing things the way the system provides to minimize future confusion and bugs.
OK, thanks for everyone's input.
Hello hanckers,
We use this simple function to workaround citext=text behavior.
create extension citext;
CREATE FUNCTION citext_eq( citext, text )
RETURNS bool
AS 'citext'
LANGUAGE C IMMUTABLE STRICT;
CREATE OPERATOR = (
LEFTARG = CITEXT,
RIGHTARG = TEXT,
COMMUTATOR = =,
NEGATOR = <>,
PROCEDURE = citext_eq,
RESTRICT = eqsel,
JOIN = eqjoinsel,
HASHES,
MERGES
);
select 'xexe'::text = 'Xexe'::citext
select 'xexe' = 'Xexe'::citext
select 'xexe'::citext = 'Xexe'
select 'xexe'::citext = 'Xexe'::text
select 'xexe'::citext = 1234::text
CREATE or replace FUNCTION ttt(t text)
RETURNS bool
AS
$$select $1 = 'Ttt'::citext $$
LANGUAGE sql;
select ttt('ttt')
We used this and other strange cases like TO_CHAR for type text for our own BI system with user defined calculations .
On Mon, May 7, 2018 at 12:09 PM, Shay Rojansky <roji@roji.org> wrote:
Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive;Comparison requires both things to be the same type. The rules for implicitly converting one type to another prefer the core type text over the extension type citext.IOW, there is no such operator =(citext,text) and thus "citext = text comparison" is technically invalid.At this point we're sorta stuck with our choice, and while individual databases can implement their own functions and operators there is value in doing things the way the system provides to minimize future confusion and bugs.OK, thanks for everyone's input.
--
--Regards, Sergey Mirvoda