Re: [WIP] collation support revisited (phase 1) - Mailing list pgsql-hackers

From Zdenek Kotala
Subject Re: [WIP] collation support revisited (phase 1)
Date
Msg-id 4885EF87.4020608@sun.com
In response to Re: [WIP] collation support revisited (phase 1)  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: [WIP] collation support revisited (phase 1)  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Martijn van Oosterhout wrote:
> On Sat, Jul 12, 2008 at 10:02:24AM +0200, Zdenek Kotala wrote:
>> Background:
>> We specify the encoding in the initdb phase. ANSI specifies repertoire, charset, 
>> encoding and collation. If I understand it correctly, a charset is a 
>> subset of a repertoire and specifies the list of allowed characters for a 
>> language->collation. An encoding is a mapping of a character set to a binary format. 
>> For example, for the Czech alphabet (charset) we have six different 8-bit 
>> encodings, while on the other side UTF8 covers multiple charsets.
> 
> Oh, so you're thinking of a charset as a sort of check constraint. If
> your locale is Turkish and you have a column marked charset ASCII then
> storing lower('HI') results in an error.

Yeah. If you use the strcoll() function, it may fail (errno set to EINVAL) when
an argument contains a character outside the domain of the collating sequence.
See
http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html
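
To make the check-constraint reading of a charset quoted above concrete, here is a
minimal sketch in Python. It is only an illustration, not PostgreSQL code: the
helpers turkish_lower() and check_charset() are hypothetical names, and using a
Python codec name ("ascii") to stand in for a charset is a simplification that
blurs the charset/encoding distinction.

```python
# Hypothetical helpers for illustration only; nothing here is PostgreSQL API.

# Turkish case rules: 'I' lowercases to dotless 'ı' (U+0131), 'İ' (U+0130) to 'i'.
TURKISH_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text: str) -> str:
    """Lowercase text using the two Turkish-specific mappings, then the default rules."""
    return text.translate(TURKISH_LOWER).lower()


def check_charset(text: str, charset: str = "ascii") -> str:
    """Return text unchanged, or raise ValueError if a character is outside charset.

    Assumption: the charset is approximated by a Python codec of the same name.
    """
    try:
        text.encode(charset)
    except UnicodeEncodeError as exc:
        bad = text[exc.start]
        raise ValueError(f"character {bad!r} is not in charset {charset}") from exc
    return text
```

With these definitions, check_charset("hi") passes, while
check_charset(turkish_lower("HI")) raises ValueError because the dotless 'ı' is
not an ASCII character.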

> A collation must be defined over all possible characters, it can't
> depend on the character set. That doesn't mean sorting in en_US must do
> something meaningful with Japanese characters, it does mean it can't
> throw an error (the usual procedure is to sort on Unicode code point).

A collation cannot be defined over arbitrary characters. There is no ordering
relation between Latin and Chinese characters. A collation makes sense only when
you are able to define the <, = and > operators.

If you need to compare Japanese and Latin characters, then ANSI specifies a default 
collation for each repertoire. I think it is usually a bitwise comparison.
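
As a rough illustration of what a bitwise default collation gives you, the
following Python sketch sorts mixed Latin and Japanese strings by their encoded
bytes. The function name bitwise_key() and the sample words are made up for the
example, and it assumes UTF-8, where byte order happens to match Unicode code
point order; other encodings would order the same strings differently.

```python
def bitwise_key(text: str) -> bytes:
    """Sort key for a bitwise comparison: the raw UTF-8 bytes of the string."""
    return text.encode("utf-8")


# Sample data (made up): Latin and Japanese strings in one list.
words = ["zebra", "日本", "apple", "あい", "Zoo"]

# Bitwise ordering: defined for every string, but not linguistically meaningful.
by_bytes = sorted(words, key=bitwise_key)

# Python compares str values by code point, which agrees with UTF-8 byte order.
by_code_point = sorted(words)

print(by_bytes)
```

The ordering is total, so no comparison ever raises an error, but it carries no
language knowledge: uppercase 'Z' sorts before lowercase 'a', and the Japanese
strings simply land after all Latin ones.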

    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql


