Re: Unicode Normalization - Mailing list pgsql-hackers

From	David E. Wheeler
Subject	Re: Unicode Normalization
Date	September 24, 2009 12:36:51
Msg-id	9BD6C83B-018E-4263-9EC8-33344FEDF655@kineticode.com
In response to	Unicode Normalization ("David E. Wheeler" <david@kineticode.com>)
Responses	Re: Unicode Normalization
List	pgsql-hackers

On Sep 24, 2009, at 6:24 AM, pg@thetdh.com wrote:

> In a context using normalization, wouldn't you typically want to  
> store a normalized-text type that could perhaps (depending on  
> locale) take advantage of simpler, more-efficient comparison  
> functions?

That might be nice, but I'd be wary of a combinatorial explosion of
text types. We already have TEXT and CITEXT; what if we had your NTEXT
(normalized text) but I also wanted it to be case-insensitive?
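
To illustrate the concern, here is a minimal Python sketch (not Postgres code) in which normalization and case-insensitivity are independent options on one key function rather than separate types. The name normalize_key and its case_insensitive flag are hypothetical, chosen only for this example, and NFC is assumed as the normalization form.

```python
import unicodedata

def normalize_key(s: str, *, case_insensitive: bool = False) -> str:
    """Hypothetical helper: build a comparison key from a string.

    Normalization is always applied; case folding is an optional,
    orthogonal step, so no extra "type" is needed per combination.
    """
    key = unicodedata.normalize("NFC", s)
    if case_insensitive:
        # Case folding can denormalize some strings, so normalize again.
        key = unicodedata.normalize("NFC", key.casefold())
    return key

upper_decomposed = "CAFE\u0301"   # "E" followed by a combining acute accent
lower_composed = "caf\u00e9"      # precomposed "é"

same_ignoring_case = (
    normalize_key(upper_decomposed, case_insensitive=True)
    == normalize_key(lower_composed, case_insensitive=True)
)
same_with_case = normalize_key(upper_decomposed) == normalize_key(lower_composed)
```

With separate types, each new property (normalized, case-insensitive, and so on) doubles the number of types; with composable functions it only adds one option.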

> Whether you're doing INSERT/UPDATE, or importing a flat text file,  
> if you canonicalize characters and substrings of identical meaning  
> when trivial distinctions of encoding are irrelevant, you're better  
> off later.  User-invocable normalization functions by themselves  
> don't make much sense.

Well, they make sense because there's nothing else right now. It's an  
easy way to get some support in, and besides, it's mandated by the SQL  
standard.
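
As a rough sketch of what a user-invocable normalization function buys you, the following Python example uses the standard unicodedata module (not any Postgres function) to compare two encodings of the same visible string. The helper name nfc is made up for the example, and NFC is assumed as the target form.

```python
import unicodedata

def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", s)

# The same visible text, encoded two different ways.
composed = "caf\u00e9"      # precomposed U+00E9
decomposed = "cafe\u0301"   # "e" plus combining acute accent U+0301

# A plain code-point comparison treats them as different strings.
raw_equal = composed == decomposed

# Normalizing both sides first makes them compare equal.
normalized_equal = nfc(composed) == nfc(decomposed)
```

Calling such a function explicitly at INSERT/UPDATE time, or in a comparison, is the manual equivalent of what a dedicated normalized-text type would do implicitly.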

> (If Postgres now supports binary- or mixed-binary-and-text flat  
> files, perhaps for restore purposes, the same thing applies.)

Don't follow this bit.

Best,

David
