Thread: sorting chinese characters

sorting chinese characters

From
"prabahar"
Date:
Hi, I have a requirement to sort a field that contains EUC-JP
characters. When I sort them, the Japanese hiragana characters
are sorted properly, but the Chinese characters are not. Can anyone
give some suggestions on how to fix this? I have set LC_ALL=ja_JP in the
profile.

Thanks in advance for your replies,
Prabahar
_____________________________________________________________________
Any Opinions, explicit or implied, are solely those of the author and do not necessarily
represent those of Questech. This e-mail may contain confidential and/or privileged
information. If you are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any unauthorized copying,
disclosure or distribution of the material in this e-mail is strictly forbidden.
_____________________________________________________________________



Re: sorting chinese characters

From
Ian Barwick
Date:
On Friday 25 April 2003 10:22, prabahar wrote:
> Hi, I have a requirement where I have to sort a field which has euc-jp
> characters in it. When i sort them we find that Japanese Hiragana
> Characters are sorted properly. But Chinese characters are not sorted
> properly.

Can you define "properly"? What is it you want to sort?

> Can any one give some sujestions how to fix it? I have set the
> LC_ALL=ja_JP in the profile.

Unfortunately with Japanese "Chinese" characters there is no algorithmically
determinable sort order. You will need some kind of lookup table containing
hiragana (and possibly katakana) if you want to sort in phonetic dictionary
order, as there is a "many to many" relationship between characters /
combinations of characters and their pronunciation(s).
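
As a rough illustration of the lookup-table idea, here is a minimal Python
sketch. The READINGS table and the reading_key helper are hypothetical names
made up for this example, and the entries are only sample data; a real table
would have to be supplied with the data. Plain string comparison of hiragana
is used as a crude stand-in for proper kana collation.

```python
# Hypothetical lookup table: written form -> hiragana reading.
# Sample entries only; real readings must come from the data source.
READINGS = {
    "山田": "やまだ",
    "田中": "たなか",
    "鈴木": "すずき",
    "佐藤": "さとう",
}


def reading_key(name):
    # Fall back to the written form when no reading is known.
    return READINGS.get(name, name)


names = ["山田", "田中", "鈴木", "佐藤"]

# Sorting on the characters themselves follows code point order,
# which has nothing to do with pronunciation.
by_code_point = sorted(names)

# Sorting on the looked-up reading gives phonetic order.
by_reading = sorted(names, key=reading_key)
```

Here by_reading comes out as 佐藤, 鈴木, 田中, 山田 (さとう, すずき, たなか,
やまだ), while by_code_point is ordered differently.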

If the data you are dealing with represents names, you don't have a chance
unless the data comes with the pronunciation in a separate field (which
is why Japanese forms usually have space for both characters and
pronunciation).

It should be possible using a lookup table to determine sort order of a given
set of characters based on their structure (radical / stroke count), but this
method of sorting is archaic and generally not used.


Ian Barwick
barwick@gmx.net



Re: sorting chinese characters

From
Ian Barwick
Date:
On Saturday 26 April 2003 03:55, prabahar wrote:
> Thanks for your reply. Let me explain a bit more about my problem. I have a
> member master table with the columns member_name_hiragana and
> member_name_chinese (member_name_hiragana holds member names in hiragana
> characters and
> member_name_chinese holds member names in Chinese characters).
> When I execute the query "select * from member_master order by
> member_name_hiragana", the results are in a proper sort order. But when I
> execute "select * from member_master order by member_name_chinese", the
> results are not sorted properly. I am not sure what the sort order should
> be, but my clients say that it is not a proper sort. How can I fix this?

This sounds like "member_name_hiragana" is the sort key you need so I would
stick to this unless you have some specific requirement to sort by Chinese
character, which is unusual and as I said non-trivial. It might be worth using
"member_name_chinese" as a secondary sort key to produce reproducible
results in those cases where "member_name_hiragana" are the same
but "member_name_chinese" are different; the JIS character sets contain
characters in a _crude_ order which may be useful for this case.
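
The primary/secondary key suggestion above might look like the following
sketch. It uses Python's sqlite3 module with an in-memory database purely as
a stand-in for PostgreSQL, and the rows are invented sample data. Note that
SQLite's default collation compares UTF-8 bytes, so the tie-break order of
the Chinese characters here is Unicode code point order, not the JIS order
you would get from an EUC-JP database.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member_master ("
    " member_name_hiragana TEXT, member_name_chinese TEXT)"
)

# Invented sample rows; two members share the reading わたなべ.
rows = [
    ("わたなべ", "渡邊"),
    ("さとう", "佐藤"),
    ("わたなべ", "渡辺"),
    ("たなか", "田中"),
]
conn.executemany("INSERT INTO member_master VALUES (?, ?)", rows)

# Primary key: the reading. Secondary key: the written form, used only
# to make the order reproducible when the readings are identical.
result = conn.execute(
    "SELECT member_name_hiragana, member_name_chinese"
    " FROM member_master"
    " ORDER BY member_name_hiragana, member_name_chinese"
).fetchall()
conn.close()
```

The two わたなべ rows always come out in the same relative order, whatever
order they were inserted in.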


HTH

Ian Barwick
barwick@gmx.net



Re: sorting chinese characters

From
"prabahar"
Date:
Thanks for your reply. Let me explain a bit more about my problem. I have a
member master table with the columns member_name_hiragana and member_name_chinese
(member_name_hiragana holds member names in hiragana characters and
member_name_chinese holds member names in Chinese characters).
When I execute the query "select * from member_master order by
member_name_hiragana", the results are in a proper sort order. But when I
execute "select * from member_master order by member_name_chinese", the
results are not sorted properly. I am not sure what the sort order should be,
but my clients say that it is not a proper sort. How can I fix this?

Thanks again,
Prabahar







Re: sorting chinese characters

From
Tatsuo Ishii
Date:
> Thanks for your reply. Let me explain a bit more about my problem. I have a
> member master table with the columns member_name_hiragana and member_name_chinese
> (member_name_hiragana holds member names in hiragana characters and
> member_name_chinese holds member names in Chinese characters).
> When I execute the query "select * from member_master order by
> member_name_hiragana", the results are in a proper sort order. But when I
> execute "select * from member_master order by member_name_chinese", the
> results are not sorted properly. I am not sure what the sort order should be,
> but my clients say that it is not a proper sort. How can I fix this?

There are many possible reasons why your customers are not satisfied
(it's even possible that they do not understand what "Chinese" characters
are. Do they understand Japanese?). Anyway, can you please provide more
detailed info:

The OS PostgreSQL is running on
PostgreSQL version
configure options
initdb options
database encoding
"Chinese characters" not sorted in proper order (hex dump is preferred)
--
Tatsuo Ishii
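
One possible way to produce the hex dump requested in the list above is
sketched here in Python. The hex_dump helper is a hypothetical name, the two
names are sample data, and the sketch assumes the values are held as EUC-JP
(Python's "euc_jp" codec).

```python
def hex_dump(text, encoding="euc_jp"):
    """Return the bytes of text in the given encoding as spaced hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


# Sample values; in practice these would be the column values that
# appear to sort in the wrong order.
samples = ["山田", "田中"]
dumps = {name: hex_dump(name) for name in samples}
```

Each character in these samples takes two bytes in EUC-JP, so every entry in
dumps holds four hex pairs. Comparing the dumps byte by byte shows the order
a plain byte-wise sort would produce.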