Re: Unicode Normalization - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: Unicode Normalization
Msg-id 233B7C57-2096-4C9E-9704-14D1EF2164B4@kineticode.com
In response to Re: Unicode Normalization  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Sep 24, 2009, at 8:59 AM, Andrew Dunstan wrote:

>> That might be nice, but I'd be wary of a geometric multiplication  
>> of text types. We already have TEXT and CITEXT; what if we had your  
>> NTEXT (normalized text) but I wanted it to also be case-insensitive?
>
> Actually, I don't think it's necessarily a good idea at all. If a  
> user inputs a perfectly valid piece of UTF8 text, we should be able  
> to give it back to them exactly, whether or not it's in normalized  
> form. The normalized forms are useful for certain comparison  
> purposes, but they don't affect the validity of the text. CITEXT  
> doesn't mangle what is stored, just how it's compared.

Right, I don't think there's a need for a normalized TEXT type.
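
To make the distinction concrete, here is a minimal sketch in Python (not PostgreSQL code) of comparing by normalized form while leaving the stored value untouched. The helper name nfc_equal and the choice of NFC are assumptions made for illustration; nothing in this thread settles on a particular form or API.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings by their NFC forms without altering either one."""
    # nfc_equal is a hypothetical helper; NFC is an assumed choice of form.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

stored = decomposed      # keep exactly what the user entered

print(stored == composed)           # False: the code point sequences differ
print(nfc_equal(stored, composed))  # True: equal once both are normalized
print(len(stored))                  # 2: the stored value was not rewritten
```

This is the same split CITEXT makes for case: the comparison changes, the stored text does not.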

Best,

David

