Re: PATCH: CITEXT 2.0 - Mailing list pgsql-hackers

From	Zdenek Kotala
Subject	Re: PATCH: CITEXT 2.0
Date	July 7, 2008 16:50:06
Msg-id	487272AF.5070002@sun.com
In response to	Re: PATCH: CITEXT 2.0 ("David E. Wheeler" <david@kineticode.com>)
Responses	Re: PATCH: CITEXT 2.0
List	pgsql-hackers

David E. Wheeler wrote:
> On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
> 
>> My question is: why? Shouldn't they all use the same function for 
>> comparison? I'm happy to dupe this implementation for citext, but I 
>> don't understand it. Should not all comparisons be executed consistently?
> 
> Let me try to answer my own question by citing this comment:
> 
>     /*
>      * Since we only care about equality or not-equality, we can avoid all the
>      * expense of strcoll() here, and just do bitwise comparison.
>      */
> 
> So, the upshot is that the = and <> operators are not locale-aware, yes? 
> They just do byte comparisons. Is that really the way it should be? I 
> mean, could there not be strings that are equivalent but have different 
> bytes?

Correct. The problem is complex: bytewise comparison works correctly only for
normalized strings. But Postgres currently assumes that all UTF-8 strings are
normalized.
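
To illustrate the point about normalization, here is a minimal sketch in
Python (not PostgreSQL code). The helper names bytes_equal and
normalized_equal are hypothetical and exist only for this example. It shows
two canonically equivalent spellings of the same character that differ
bytewise and become equal only after both are brought to the same
normalization form (NFC is chosen here as an assumption).

```python
import unicodedata

# Two canonically equivalent spellings of the same character.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def bytes_equal(a: str, b: str) -> bool:
    """Hypothetical helper: bitwise equality on the UTF-8 encoding."""
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str) -> bool:
    """Hypothetical helper: normalize both sides to NFC, then compare bytes."""
    return bytes_equal(
        unicodedata.normalize("NFC", a),
        unicodedata.normalize("NFC", b),
    )


if __name__ == "__main__":
    # The raw byte sequences differ, so a bitwise check reports inequality.
    print(bytes_equal(precomposed, decomposed))
    # After normalization both strings have the same byte sequence.
    print(normalized_equal(precomposed, decomposed))
```

If all stored strings are already in one normalization form, the bitwise
check alone gives the same answer as the normalized one, which is the
assumption described above.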

If you need to implement the <, <=, >= and > operators, you need to use
strcoll(), which takes care of locale collation.

See the Unicode Collation Algorithm: http://www.unicode.org/reports/tr10/
    Zdenek




-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql
