Thread: Multibyte (Japanese Character) Sorting

Multibyte (Japanese Character) Sorting

From: Mhor Gonzales

Hi there,

I'm having a problem sorting multibyte characters.

I am using EUC-JP as my database encoding because we need to support
Japanese (hiragana, katakana, kanji) text, since our clients are Japanese.

I have a table named "user_info" with the following fields:

first_name character(60) NOT NULL
last_name character(60) NOT NULL

We force all entries to double-byte characters, so all data stored in
the table is double-byte. The problem is the sorting: when you use
ORDER BY last_name ASC, the list is not sorted properly. Please
help me fix this problem. Thank you in advance.

--
==================================================
Morgan Gonzales - 1st BU (MSI) - Tsukiden Software

There are two kinds of people in this world.
One says to God, thy will be done,
and the other to whom God says, thy will be done.

Re: Multibyte (Japanese Character) Sorting

From: Tatsuo Ishii

> Hi there,
>
> I'm having a problem sorting multibyte characters.
>
> I am using EUC-JP as my database encoding because we need to support
> Japanese (hiragana, katakana, kanji) text, since our clients are Japanese.
>
> I have a table named "user_info" with the following fields:
>
> first_name character(60) NOT NULL
> last_name character(60) NOT NULL
>
> We force all entries to double-byte characters, so all data stored in
> the table is double-byte. The problem is the sorting: when you use
> ORDER BY last_name ASC, the list is not sorted properly. Please
> help me fix this problem. Thank you in advance.

I'm not sure why you think it is "not sorted properly", but my wild guess
is that your OS's locale data is broken. Use the C locale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
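
For reference, with the C locale strings are compared byte by byte, so
the order you get follows the EUC-JP byte values rather than any
dictionary order. A minimal sketch of that ordering, done outside the
database in Python. The function name c_locale_sort and the sample
names are made up for illustration; they are not from the original
table.

```python
def c_locale_sort(names, encoding="euc_jp"):
    """Sort strings by their encoded bytes, as a byte-wise comparison would.

    Assumption: every string can be encoded in the given encoding.
    """
    return sorted(names, key=lambda s: s.encode(encoding))


if __name__ == "__main__":
    # Hypothetical sample data: hiragana, katakana, kanji and
    # double-byte (fullwidth) Latin letters.
    names = ["やまだ", "あべ", "サトウ", "田中", "ａｂｃ"]
    for name in c_locale_sort(names):
        # Show the bytes that actually drive the comparison.
        print(name, name.encode("euc_jp").hex())
```

In EUC-JP the double-byte Latin letters come first, then hiragana, then
katakana, then kanji, so that is the order to expect from ORDER BY under
the C locale. Kanji are ordered by code value, not by reading.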

Re: Multibyte (Japanese Character) Sorting

From: Tatsuo Ishii

I have taken a look at the screenshot. Yes, the sort order seems
pretty ridiculous. I tested similar data on my Linux box and saw
nothing strange in the result. Do you have an index on the field? What
platform is PostgreSQL running on? Do you see the same problem
using psql? Can you give me the pg_dump data if possible?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Thank you for your reply. But I believe our LOCALE was already set to C
> (since this is the default setting).
>
> I've attached the result of my query using "ORDER BY <field> ASC". This
> field contains double-byte characters for both English and Japanese text.
> I think the problem is that it sorts by length first and then by
> ASCII code value.
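
One way to check the "length first, then code value" guess against the
attached result is to sort the same names both ways and see which order
matches. A rough sketch in Python, assuming the names can be exported as
text. The function names and sample values here are hypothetical.

```python
def sort_bytewise(names, encoding="euc_jp"):
    """Plain byte-wise order, as expected from the C locale."""
    return sorted(names, key=lambda s: s.encode(encoding))


def sort_length_then_bytes(names, encoding="euc_jp"):
    """Order by character count first, then by encoded byte values."""
    return sorted(names, key=lambda s: (len(s), s.encode(encoding)))


if __name__ == "__main__":
    # Hypothetical sample: double-byte Latin and hiragana names.
    sample = ["たなか", "Ｌｅｅ", "あべ", "やまもと"]
    print("byte-wise:       ", sort_bytewise(sample))
    print("length then byte:", sort_length_then_bytes(sample))
```

If the query result matches the second list rather than the first, the
guess about length holds for that data. If it matches neither, something
else is going on.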