Home > mailing lists

Re: PATCH: CITEXT 2.0 v4 - Mailing list pgsql-hackers

From	David E. Wheeler
Subject	Re: PATCH: CITEXT 2.0 v4
Date	July 22, 2008 21:17:10
Msg-id	65071C7E-0CE0-489C-B294-D1C6DAD49D16@kineticode.com Whole thread Raw
In response to	Re: PATCH: CITEXT 2.0 v4 (Michael Paesold <mpaesold@gmx.at>)
List	pgsql-hackers

Tree view

On Jul 18, 2008, at 01:39, Michael Paesold wrote:

> Calling regex functions with the case-insensitivity option would be  
> great. It should also be possible to rewrite replace() into  
> regexp_replace() by first escaping the regex meta characters.
>
> Actually re-implementing those functions in a case insensitive way  
> would still be an option, but of course some amount of work. The  
> question is, how much use case there is.

I've figured out how to make all the functions work using SQL function  
workarounds, converting things and re-dispatching to the text versions  
as appropriate. They work quite well, and can be converted to C later  
if that becomes a requirement.

Meanwhile, on the question of whether or not regular expression and  
LIKE comparisons *should* match case-insensitively, I have a couple  
more observations:

* Thinking about how a true case-insensitive collation would work, I'm  
quite certain that it would match case-insensitively. Anything else  
would just be unexpected, because in a case-insensitive collation,  
lowercase characters are, in practice, identical to uppercase  
characters. As far as matching is concerned, there is no difference  
between them. So the matching operators and functions against CITEXT  
should follow that assumption.

* I tried a few matches on MySQL, where the collation is case- 
insensitive by default, and it confirms my impression:

mysql> select 'Foo' regexp 'o$';
+-------------------+
| 'Foo' regexp 'o$' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' regexp 'O$';
+-------------------+
| 'Foo' regexp 'O$' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%o';
+-----------------+
| 'Foo' like '%o' |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%O';
+-----------------+
| 'Foo' like '%O' |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.00 sec)

I'll grant that MySQL may not be the best model for how things should  
work, but it's something, at least. Anyone else got access to another  
database with case-insensitive collations to see how LIKE and regular  
expressions work?

Thanks,

David

pgsql-hackers by date:

From: Tom Lane
Date: 22 July 2008, 21:14:09
Subject: Re: Postgres-R: primary key patches

From: Tom Lane
Date: 22 July 2008, 21:18:37
Subject: Re: Plans for 8.4

Re: PATCH: CITEXT 2.0 v4 - Mailing list pgsql-hackers

Previous

Next