Re: sorting chinese characters - Mailing list pgsql-sql

From Ian Barwick
Subject Re: sorting chinese characters
Msg-id 200304252010.11373.barwick@gmx.net
In response to sorting chinese characters  ("prabahar" <prabahar@questech.co.in>)
Responses Re: sorting chinese characters  ("prabahar" <prabahar@questech.co.in>)
List pgsql-sql
On Friday 25 April 2003 10:22, prabahar wrote:
> Hi, I have a requirement where I have to sort a field which has euc-jp
> characters in it. When I sort them we find that Japanese Hiragana
> Characters are sorted properly. But Chinese characters are not sorted
> properly.

Can you define "properly"? What is it you want to sort?

> Can anyone give some suggestions on how to fix it? I have set the
> LC_ALL=ja_JP in the profile.

Unfortunately, with Japanese "Chinese" characters there is no algorithmically
determinable sort order. You will need some kind of lookup table containing
hiragana (and possibly katakana) if you want to sort in phonetic dictionary
order, as there is a "many to many" relationship between characters /
combinations of characters and their pronunciation(s).

If the data you are dealing with represents names, you don't have a chance
unless the data comes with the pronunciation in a separate field (which
is why Japanese forms usually have space for both characters and
pronunciation).
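
As a rough illustration of the separate-field approach, here is a minimal
Python sketch. The Person type, the sample names and the sort_by_reading
helper are all made up for the example. It assumes the reading is supplied in
hiragana, and that plain code-point order of hiragana is a close enough
approximation of dictionary order for the data at hand.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str     # the name as written (kanji)
    reading: str  # pronunciation in hiragana, supplied with the data


# Hypothetical sample data. Note that the same written name can carry
# different readings, which is why the reading cannot be derived from it.
people = [
    Person("山田", "やまだ"),
    Person("東", "ひがし"),
    Person("鈴木", "すずき"),
    Person("東", "あずま"),
    Person("田中", "たなか"),
]


def sort_by_reading(rows):
    """Sort on the reading field only; the written form is ignored."""
    return sorted(rows, key=lambda p: p.reading)


for person in sort_by_reading(people):
    print(person.name, person.reading)
```

The two entries written 東 end up far apart, because only the reading decides
the position. The same idea applies in the database: keep the reading in its
own column and order by that column instead of the kanji column.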

It should be possible using a lookup table to determine sort order of a given
set of characters based on their structure (radical / stroke count), but this
method of sorting is archaic and generally not used.
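
For completeness, a structure-based sort could be sketched as below. The
RADICAL_STROKE table and the radical_stroke_key helper are hypothetical; the
table is hand-built for five characters only, whereas a real one would have to
cover every character in the data and would come from a character dictionary.

```python
# Hypothetical lookup table: character -> (Kangxi radical number,
# total stroke count). Only a handful of entries, for illustration.
RADICAL_STROKE = {
    "山": (46, 3),
    "木": (75, 4),
    "林": (75, 8),
    "森": (75, 12),
    "田": (102, 5),
}


def radical_stroke_key(text):
    """Build a sort key from the (radical, strokes) pair of each character.

    Raises KeyError for any character missing from the table.
    """
    return [RADICAL_STROKE[ch] for ch in text]


chars = ["森", "田", "山", "林", "木"]
print(sorted(chars, key=radical_stroke_key))
```

Characters sharing a radical (木, 林, 森) are grouped together and ordered by
stroke count within the group.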


Ian Barwick
barwick@gmx.net


