Thread: sorting Chinese varchar field
Hi, I installed postgres 8.0 for windows on my win xp (Simplified Chinese version). The encoding is unicode. When I set pgsql client encoding to gb18030, I could insert Chinese text from the command line to postgres. However, I could not get the sort order of Chinese varchar field to work properly. What I tried are as follows: 1) installed postgres for windows and used the "C" locale. 2) installed postgres for windows and used the "Chinese, PRC" locale. Again, in both cases, the backend encoding is unicode. The other interesting thing I observed was, when setting to the "C" locale, the following sql worked fine: select * from user where name = 'xxxxx'; xxxxx is a Chinese text However, if I set the locale to "Chinese, PRC" during installation, the above select did not get the any matching rows, where it should have got. In this case, the following worked fine: select * from user where name ilike 'xxxxx%'; Could anyone let me know the best practice for using postgres to store Chinese text? (This should be the same problem I guess, for using postgres to store other languages than English.) Thanks a lot, Jian
>Hi, > >I installed postgres 8.0 for windows on my win xp (Simplified Chinese >version). The encoding is unicode. When I set pgsql client encoding to >gb18030, I could insert Chinese text from the command line to >postgres. > >However, I could not get the sort order of Chinese varchar field to >work properly. > >What I tried are as follows: > >1) installed postgres for windows and used the "C" locale. >2) installed postgres for windows and used the "Chinese, PRC" locale. > >Again, in both cases, the backend encoding is unicode. Sorting is not currently supported in the UNICODE/UTF-8 encoding on Win32. Sorry. You need to pick a specific non-unicode encoding/locale combination. (Or run the server on a platform that supports it, if that is an option) //Magnus
> Hi, > > I installed postgres 8.0 for windows on my win xp (Simplified Chinese > version). The encoding is unicode. When I set pgsql client encoding to > gb18030, I could insert Chinese text from the command line to > postgres. > > However, I could not get the sort order of Chinese varchar field to > work properly. > > What I tried are as follows: > > 1) installed postgres for windows and used the "C" locale. > 2) installed postgres for windows and used the "Chinese, PRC" locale. > > Again, in both cases, the backend encoding is unicode. > > The other interesting thing I observed was, when setting to the "C" > locale, the following sql worked fine: > select * from user where name = 'xxxxx'; > xxxxx is a Chinese text > However, if I set the locale to "Chinese, PRC" during installation, > the above select did not get the any matching rows, where it should > have got. In this case, the following worked fine: > select * from user where name ilike 'xxxxx%'; > > Could anyone let me know the best practice for using postgres to store > Chinese text? (This should be the same problem I guess, for using > postgres to store other languages than English.) I would suggest to use UNICODE/C locale combo. On most systems the locale database for multibyte encodings are broken as far as I know. For the sorting problem, probably you could get the right sort order by using convert. i.e. SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030); If above does not work, you cannot get the right sort order even if you use GB18030 anyway. -- Tatsuo Ishii
Great, that works out fine! So, the SQL I tested with is: select * from mytable order by convert(name, 'utf8', 'gb18030'); It produces the correct output. Thanks Tatsuo! Jian On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii <t-ishii@sra.co.jp> wrote: > > Hi, > > > > I installed postgres 8.0 for windows on my win xp (Simplified Chinese > > version). The encoding is unicode. When I set pgsql client encoding to > > gb18030, I could insert Chinese text from the command line to > > postgres. > > > > However, I could not get the sort order of Chinese varchar field to > > work properly. > > > > What I tried are as follows: > > > > 1) installed postgres for windows and used the "C" locale. > > 2) installed postgres for windows and used the "Chinese, PRC" locale. > > > > Again, in both cases, the backend encoding is unicode. > > > > The other interesting thing I observed was, when setting to the "C" > > locale, the following sql worked fine: > > select * from user where name = 'xxxxx'; > > xxxxx is a Chinese text > > However, if I set the locale to "Chinese, PRC" during installation, > > the above select did not get the any matching rows, where it should > > have got. In this case, the following worked fine: > > select * from user where name ilike 'xxxxx%'; > > > > Could anyone let me know the best practice for using postgres to store > > Chinese text? (This should be the same problem I guess, for using > > postgres to store other languages than English.) > > I would suggest to use UNICODE/C locale combo. On most systems the > locale database for multibyte encodings are broken as far as I > know. For the sorting problem, probably you could get the right sort > order by using convert. i.e. > > SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030); > > If above does not work, you cannot get the right sort order even if > you use GB18030 anyway. > -- > Tatsuo Ishii >
> Great, that works out fine! > > So, the SQL I tested with is: > select * from mytable order by convert(name, 'utf8', 'gb18030'); Sorry, what I wanted to say was: SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character using utf_8_to_gb_18030); Of course your example is fine too (actually they are internally identical). -- Tatsuo Ishii > It produces the correct output. > > Thanks Tatsuo! > > Jian > > On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii > <t-ishii@sra.co.jp> wrote: > > > Hi, > > > > > > I installed postgres 8.0 for windows on my win xp (Simplified Chinese > > > version). The encoding is unicode. When I set pgsql client encoding to > > > gb18030, I could insert Chinese text from the command line to > > > postgres. > > > > > > However, I could not get the sort order of Chinese varchar field to > > > work properly. > > > > > > What I tried are as follows: > > > > > > 1) installed postgres for windows and used the "C" locale. > > > 2) installed postgres for windows and used the "Chinese, PRC" locale. > > > > > > Again, in both cases, the backend encoding is unicode. > > > > > > The other interesting thing I observed was, when setting to the "C" > > > locale, the following sql worked fine: > > > select * from user where name = 'xxxxx'; > > > xxxxx is a Chinese text > > > However, if I set the locale to "Chinese, PRC" during installation, > > > the above select did not get the any matching rows, where it should > > > have got. In this case, the following worked fine: > > > select * from user where name ilike 'xxxxx%'; > > > > > > Could anyone let me know the best practice for using postgres to store > > > Chinese text? (This should be the same problem I guess, for using > > > postgres to store other languages than English.) > > > > I would suggest to use UNICODE/C locale combo. On most systems the > > locale database for multibyte encodings are broken as far as I > > know. For the sorting problem, probably you could get the right sort > > order by using convert. i.e. > > > > SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030); > > > > If above does not work, you cannot get the right sort order even if > > you use GB18030 anyway. > > -- > > Tatsuo Ishii > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >