Thread: citext function overloads for text parameters

citext function overloads for text parameters

From
Shay Rojansky
Date:
Hi hackers.

The following works well of course:

test=# select strpos('Aa'::citext, 'a');
 strpos 
--------
      1

However, if I pass a typed text parameter for the substring, I get case-sensitive behavior instead:

test=# select strpos('Aa'::citext, 'a'::text);
 strpos 
--------
      2

This seems like surprising behavior - my expectation was that the first parameter being citext would be enough to trigger case-insensitive behavior. The same may be happening with other string functions (e.g. regexp_matches). This is causing some difficulties in a real scenario where SQL and parameters are getting generated by an O/RM, and changing them isn't trivial.

Do the above seem like problematic behavior like it does to me, or is it the expected behavior?

Shay

Re: citext function overloads for text parameters

From
Pavel Stehule
Date:


2018-05-06 8:26 GMT+02:00 Shay Rojansky <roji@roji.org>:
Hi hackers.

The following works well of course:

test=# select strpos('Aa'::citext, 'a');
 strpos 
--------
      1

However, if I pass a typed text parameter for the substring, I get case-sensitive behavior instead:

test=# select strpos('Aa'::citext, 'a'::text);
 strpos 
--------
      2

This seems like surprising behavior - my expectation was that the first parameter being citext would be enough to trigger case-insensitive behavior. The same may be happening with other string functions (e.g. regexp_matches). This is causing some difficulties in a real scenario where SQL and parameters are getting generated by an O/RM, and changing them isn't trivial.

Do the above seem like problematic behavior like it does to me, or is it the expected behavior?

This is expected - it is side effect of PostgreSQL implementation of function overloading and type conversions

after installation citext, you will have more instances of function strpos

strpos(citext, citext)
strpos(text, text)

the call strpos('aa'::citext, 'a') is effective strpos('aa'::citext, 'a'::unknown) and that strpos(citext, citext) can be used in this case.

strpos('aa'::citext, 'a'::text) is ambiguous (both functions can be used with necessary conversion - cast citext<->text is available), and usually it fails with related error message - but there is a exception - the text type is PREFERRED - what means, so strpost(text, text) is selected.

PostgreSQL type system is very generic and works almost well, but sometimes there can be unwanted effects when some functions are overloaded. In this case is better to implement own instance of unique function and use only it.

some like

create or replace function strpos_ci(text, text) returns int as $$ select strpos($1::citext, $2::citext) $$ language sql;
create or replace function strpos_ci(citext, citext) returns int as $$ select strpos($1, $1) $$ language sql;

Regards

Pavel


 

Shay

Re: citext function overloads for text parameters

From
Shay Rojansky
Date:
Thanks for your answer Pavel.

This is expected - it is side effect of PostgreSQL implementation of function overloading and type conversions

after installation citext, you will have more instances of function strpos

strpos(citext, citext)
strpos(text, text)

Do you think it would be appropriate to simply add an strpos(citext, text) overload to the extension to make sure this behaves more as expected? If so I can try to submit a patch at some point.

Re: citext function overloads for text parameters

From
Pavel Stehule
Date:


2018-05-06 14:55 GMT+02:00 Shay Rojansky <roji@roji.org>:
Thanks for your answer Pavel.

This is expected - it is side effect of PostgreSQL implementation of function overloading and type conversions

after installation citext, you will have more instances of function strpos

strpos(citext, citext)
strpos(text, text)

Do you think it would be appropriate to simply add an strpos(citext, text) overload to the extension to make sure this behaves more as expected? If so I can try to submit a patch at some point.

It has sense in this case. But I have not a idea if there can be some more broken.

regards

Pavel

Re: citext function overloads for text parameters

From
Tom Lane
Date:
Shay Rojansky <roji@roji.org> writes:
> Do you think it would be appropriate to simply add an strpos(citext, text)
> overload to the extension to make sure this behaves more as expected? If so
> I can try to submit a patch at some point.

To me, if there's both a citext and a text parameter, then it's simply
unclear which behavior is wanted; I do not think there's a principled
argument that it should be case-insensitive.  Thus, adding a function
to reverse the current interpretation would amount to breaking some
currently-working apps to favor others that currently don't work.
Doesn't really sound like a net gain.

            regards, tom lane


Re: citext function overloads for text parameters

From
Shay Rojansky
Date:
> Do you think it would be appropriate to simply add an strpos(citext, text)
> overload to the extension to make sure this behaves more as expected? If so
> I can try to submit a patch at some point.

To me, if there's both a citext and a text parameter, then it's simply
unclear which behavior is wanted; I do not think there's a principled
argument that it should be case-insensitive.  Thus, adding a function
to reverse the current interpretation would amount to breaking some
currently-working apps to favor others that currently don't work.
Doesn't really sound like a net gain.

Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.

My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive; at least there seems to be more value in things being this way. But if that doesn't sound like a convincing/principled argument this can be dropped (and I do agree that the breakage may not be worth it).

citext function overloads for text parameters

From
"David G. Johnston"
Date:
On Sunday, May 6, 2018, Shay Rojansky <roji@roji.org> wrote:

Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.

My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive;

Comparison requires both things to be the same type.  The rules for implicitly converting one type to another prefer the core type text over the extension type citext.

IOW, there is no such operator =(citext,text) and thus "citext = text comparison" is technically invalid.

At this point we're sorta stuck with our choice, and while individual databases can implement their own functions and operators there is value in doing things the way the system provides to minimize future confusion and bugs.

David J.

Re: citext function overloads for text parameters

From
Shay Rojansky
Date:

Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.

My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive;

Comparison requires both things to be the same type.  The rules for implicitly converting one type to another prefer the core type text over the extension type citext.

IOW, there is no such operator =(citext,text) and thus "citext = text comparison" is technically invalid.

At this point we're sorta stuck with our choice, and while individual databases can implement their own functions and operators there is value in doing things the way the system provides to minimize future confusion and bugs.

OK, thanks for everyone's input. 

Re: citext function overloads for text parameters

From
Sergey Mirvoda
Date:
Hello hanckers,
We use this simple function to workaround citext=text behavior.
create extension citext
;

CREATE FUNCTION citext_eq( citext, text )
RETURNS bool
AS 'citext'
LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR = (
LEFTARG = CITEXT,
RIGHTARG = TEXT,
COMMUTATOR = =,
NEGATOR = <>,
PROCEDURE = citext_eq,
RESTRICT = eqsel,
JOIN = eqjoinsel,
HASHES,
MERGES
);

select 'xexe'::text = 'Xexe'::citext
select 'xexe' = 'Xexe'::citext
select 'xexe'::citext = 'Xexe'
select 'xexe'::citext = 'Xexe'::text
select 'xexe'::citext = 1234::text


CREATE or replace FUNCTION ttt(t text)
RETURNS bool
AS
$$select $1 = 'Ttt'::citext $$
LANGUAGE sql;

select ttt('ttt')


But in general, it is wrong to compare values with different types.
We used this and other strange cases like TO_CHAR for type text  for our own BI system with user defined calculations .

 

On Mon, May 7, 2018 at 12:09 PM, Shay Rojansky <roji@roji.org> wrote:

Thanks for the input. It's worth noting that the equality operator currently works in the same way: citext = text comparison is (surprisingly for me) case-sensitive.

My expectation was that since citext is supposed to be a case-insensitive *type*, all comparison operations involving it should be case-insensitive;

Comparison requires both things to be the same type.  The rules for implicitly converting one type to another prefer the core type text over the extension type citext.

IOW, there is no such operator =(citext,text) and thus "citext = text comparison" is technically invalid.

At this point we're sorta stuck with our choice, and while individual databases can implement their own functions and operators there is value in doing things the way the system provides to minimize future confusion and bugs.

OK, thanks for everyone's input. 




--
--Regards, Sergey Mirvoda