Re: PATCH: CITEXT 2.0 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0
Date
Msg-id 8E2D49F2-E366-4504-9428-0AB6F35468FA@kineticode.com
In response to Re: PATCH: CITEXT 2.0  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List pgsql-hackers
On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:

>> So, the upshot is that the = and <> operators are not locale-aware,  
>> yes? They just do byte comparisons. Is that really the way it  
>> should be? I mean, could there not be strings that are equivalent  
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized  
> strings. But postgres now assumes that all utf8 strings are normalized.

I see. So binary equivalence is okay, in that case.
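
To make the "equivalent but different bytes" case concrete, here is a minimal sketch in Python (not PostgreSQL code). The helper names bytes_equal and nfc_equal are hypothetical, and the UTF-8 byte comparison only stands in for a memcmp-style "=". It suggests that byte equality is a safe test only when both strings are already in the same normalization form:

```python
import unicodedata

# Two spellings of the same character: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
composed = "\u00e9"
decomposed = "e\u0301"


def bytes_equal(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings byte for byte (stand-in for a memcmp-style '=')."""
    return a.encode("utf-8") == b.encode("utf-8")


def nfc_equal(a: str, b: str) -> bool:
    """Normalize both strings to NFC first, then compare the bytes."""
    return bytes_equal(
        unicodedata.normalize("NFC", a),
        unicodedata.normalize("NFC", b),
    )


if __name__ == "__main__":
    print(bytes_equal(composed, decomposed))  # different byte sequences
    print(nfc_equal(composed, decomposed))    # same string once normalized
```

If the input is normalized before it is stored, the plain byte comparison and the normalized comparison agree, which is the assumption described above.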

> If you need to implement < <= >= > operators you need to use strcoll  
> which takes care of locale collation.

Which varstr_cmp() does, I guess. It's what textlt uses, for example.
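
As a rough illustration of the difference between byte ordering and strcoll-based ordering, here is a sketch using Python's locale.strcoll, which wraps the C library's collation routine. It is not varstr_cmp() itself; the function names sort_bytewise and sort_collated are hypothetical, and the result under any locale other than "C" depends on what the platform has installed:

```python
import functools
import locale


def sort_bytewise(words):
    """Order strings by their UTF-8 bytes, ignoring the locale entirely."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def sort_collated(words, locale_name=""):
    """Order strings with strcoll under the given LC_COLLATE locale.

    locale_name="" means "use the environment's locale"; the previous
    setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    words = ["b", "A", "a", "B"]
    print(sort_bytewise(words))
    # Under the "C" locale, strcoll ordering matches byte ordering.
    print(sort_collated(words, "C"))
    # Under the environment's locale the order may differ (platform-dependent).
    print(sort_collated(words))
```

Under the "C" locale the two orderings coincide, so any difference only shows up once a language-specific collation locale is active.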

> See unicode collation algorithm http://www.unicode.org/reports/tr10/

Wow, that looks like a fun read.

Best,

David


