Thread: sorting Chinese varchar field

sorting Chinese varchar field

From
jian chen
Date:
Hi,

I installed postgres 8.0 for windows on my win xp (Simplified Chinese
version). The encoding is unicode. When I set pgsql client encoding to
gb18030, I could insert Chinese text from the command line to
postgres.

However, I could not get the sort order of Chinese varchar field to
work properly.

What I tried are as follows:

1)  installed postgres for windows and used the "C" locale.
2)  installed postgres for windows and used the "Chinese, PRC" locale.

Again, in both cases, the backend encoding is unicode.

The other interesting thing I observed was, when setting to the "C"
locale, the following sql worked fine:
select * from user where  name = 'xxxxx';
xxxxx is a Chinese text
However, if I set the locale to "Chinese, PRC" during installation,
the above select did not get the any matching rows, where it should
have got. In this case, the following worked fine:
select * from user where name ilike 'xxxxx%';

Could anyone let me know the best practice for using postgres to store
Chinese text? (This should be the same problem I guess, for using
postgres to store other languages than English.)

Thanks a lot,

Jian

Re: sorting Chinese varchar field

From
"Magnus Hagander"
Date:
>Hi,
>
>I installed postgres 8.0 for windows on my win xp (Simplified Chinese
>version). The encoding is unicode. When I set pgsql client encoding to
>gb18030, I could insert Chinese text from the command line to
>postgres.
>
>However, I could not get the sort order of Chinese varchar field to
>work properly.
>
>What I tried are as follows:
>
>1)  installed postgres for windows and used the "C" locale.
>2)  installed postgres for windows and used the "Chinese, PRC" locale.
>
>Again, in both cases, the backend encoding is unicode.

Sorting is not currently supported in the UNICODE/UTF-8 encoding on
Win32. Sorry. You need to pick a specific non-unicode encoding/locale
combination. (Or run the server on a platform that supports it, if that
is an option)

//Magnus

Re: sorting Chinese varchar field

From
Tatsuo Ishii
Date:
> Hi,
>
> I installed postgres 8.0 for windows on my win xp (Simplified Chinese
> version). The encoding is unicode. When I set pgsql client encoding to
> gb18030, I could insert Chinese text from the command line to
> postgres.
>
> However, I could not get the sort order of Chinese varchar field to
> work properly.
>
> What I tried are as follows:
>
> 1)  installed postgres for windows and used the "C" locale.
> 2)  installed postgres for windows and used the "Chinese, PRC" locale.
>
> Again, in both cases, the backend encoding is unicode.
>
> The other interesting thing I observed was, when setting to the "C"
> locale, the following sql worked fine:
> select * from user where  name = 'xxxxx';
> xxxxx is a Chinese text
> However, if I set the locale to "Chinese, PRC" during installation,
> the above select did not get the any matching rows, where it should
> have got. In this case, the following worked fine:
> select * from user where name ilike 'xxxxx%';
>
> Could anyone let me know the best practice for using postgres to store
> Chinese text? (This should be the same problem I guess, for using
> postgres to store other languages than English.)

I would suggest to use UNICODE/C locale combo. On most systems the
locale database for multibyte encodings are broken as far as I
know. For the sorting problem, probably you could get the right sort
order by using convert. i.e.

SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030);

If above does not work, you cannot get the right sort order even if
you use GB18030 anyway.
--
Tatsuo Ishii

Re: sorting Chinese varchar field

From
jian chen
Date:
Great, that works out fine!

So, the SQL I tested with is:
select * from mytable order by convert(name, 'utf8', 'gb18030');

It produces the correct output.

Thanks Tatsuo!

Jian

On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii
<t-ishii@sra.co.jp> wrote:
> > Hi,
> >
> > I installed postgres 8.0 for windows on my win xp (Simplified Chinese
> > version). The encoding is unicode. When I set pgsql client encoding to
> > gb18030, I could insert Chinese text from the command line to
> > postgres.
> >
> > However, I could not get the sort order of Chinese varchar field to
> > work properly.
> >
> > What I tried are as follows:
> >
> > 1)  installed postgres for windows and used the "C" locale.
> > 2)  installed postgres for windows and used the "Chinese, PRC" locale.
> >
> > Again, in both cases, the backend encoding is unicode.
> >
> > The other interesting thing I observed was, when setting to the "C"
> > locale, the following sql worked fine:
> > select * from user where  name = 'xxxxx';
> > xxxxx is a Chinese text
> > However, if I set the locale to "Chinese, PRC" during installation,
> > the above select did not get the any matching rows, where it should
> > have got. In this case, the following worked fine:
> > select * from user where name ilike 'xxxxx%';
> >
> > Could anyone let me know the best practice for using postgres to store
> > Chinese text? (This should be the same problem I guess, for using
> > postgres to store other languages than English.)
>
> I would suggest to use UNICODE/C locale combo. On most systems the
> locale database for multibyte encodings are broken as far as I
> know. For the sorting problem, probably you could get the right sort
> order by using convert. i.e.
>
> SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030);
>
> If above does not work, you cannot get the right sort order even if
> you use GB18030 anyway.
> --
> Tatsuo Ishii
>

Re: sorting Chinese varchar field

From
Tatsuo Ishii
Date:
> Great, that works out fine!
>
> So, the SQL I tested with is:
> select * from mytable order by convert(name, 'utf8', 'gb18030');

Sorry, what I wanted to say was:

SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character using utf_8_to_gb_18030);

Of course your example is fine too (actually they are internally
identical).
--
Tatsuo Ishii

> It produces the correct output.
>
> Thanks Tatsuo!
>
> Jian
>
> On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii
> <t-ishii@sra.co.jp> wrote:
> > > Hi,
> > >
> > > I installed postgres 8.0 for windows on my win xp (Simplified Chinese
> > > version). The encoding is unicode. When I set pgsql client encoding to
> > > gb18030, I could insert Chinese text from the command line to
> > > postgres.
> > >
> > > However, I could not get the sort order of Chinese varchar field to
> > > work properly.
> > >
> > > What I tried are as follows:
> > >
> > > 1)  installed postgres for windows and used the "C" locale.
> > > 2)  installed postgres for windows and used the "Chinese, PRC" locale.
> > >
> > > Again, in both cases, the backend encoding is unicode.
> > >
> > > The other interesting thing I observed was, when setting to the "C"
> > > locale, the following sql worked fine:
> > > select * from user where  name = 'xxxxx';
> > > xxxxx is a Chinese text
> > > However, if I set the locale to "Chinese, PRC" during installation,
> > > the above select did not get the any matching rows, where it should
> > > have got. In this case, the following worked fine:
> > > select * from user where name ilike 'xxxxx%';
> > >
> > > Could anyone let me know the best practice for using postgres to store
> > > Chinese text? (This should be the same problem I guess, for using
> > > postgres to store other languages than English.)
> >
> > I would suggest to use UNICODE/C locale combo. On most systems the
> > locale database for multibyte encodings are broken as far as I
> > know. For the sorting problem, probably you could get the right sort
> > order by using convert. i.e.
> >
> > SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030);
> >
> > If above does not work, you cannot get the right sort order even if
> > you use GB18030 anyway.
> > --
> > Tatsuo Ishii
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>