Re: PATCH: CITEXT 2.0 v4 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0 v4
Date
Msg-id 65071C7E-0CE0-489C-B294-D1C6DAD49D16@kineticode.com
Whole thread Raw
In response to Re: PATCH: CITEXT 2.0 v4  (Michael Paesold <mpaesold@gmx.at>)
List pgsql-hackers
On Jul 18, 2008, at 01:39, Michael Paesold wrote:

> Calling regex functions with the case-insensitivity option would be  
> great. It should also be possible to rewrite replace() into  
> regexp_replace() by first escaping the regex meta characters.
>
> Actually re-implementing those functions in a case insensitive way  
> would still be an option, but of course some amount of work. The  
> question is, how much use case there is.

I've figured out how to make all the functions work using SQL function  
workarounds, converting things and re-dispatching to the text versions  
as appropriate. They work quite well, and can be converted to C later  
if that becomes a requirement.

Meanwhile, on the question of whether or not regular expression and  
LIKE comparisons *should* match case-insensitively, I have a couple  
more observations:

* Thinking about how a true case-insensitive collation would work, I'm  
quite certain that it would match case-insensitively. Anything else  
would just be unexpected, because in a case-insensitive collation,  
lowercase characters are, in practice, identical to uppercase  
characters. As far as matching is concerned, there is no difference  
between them. So the matching operators and functions against CITEXT  
should follow that assumption.

* I tried a few matches on MySQL, where the collation is case- 
insensitive by default, and it confirms my impression:

mysql> select 'Foo' regexp 'o$';
+-------------------+
| 'Foo' regexp 'o$' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' regexp 'O$';
+-------------------+
| 'Foo' regexp 'O$' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%o';
+-----------------+
| 'Foo' like '%o' |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%O';
+-----------------+
| 'Foo' like '%O' |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.00 sec)

I'll grant that MySQL may not be the best model for how things should  
work, but it's something, at least. Anyone else got access to another  
database with case-insensitive collations to see how LIKE and regular  
expressions work?

Thanks,

David


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Postgres-R: primary key patches
Next
From: Tom Lane
Date:
Subject: Re: Plans for 8.4