Thread: PostgreSQL
Hello, all!
I have a good question for the PostgreSQL FAQ.
How do I use string functions (like UPPER()/LOWER()) on non-Latin strings?
Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (such as Cyrillic)?
How do I do a case-insensitive search on a text field that contains non-Latin characters?
Thanks for your answers!
Best regards
Eugeny
Not sure. I thought it would work.

Eugeny Balakhonov wrote:
> How to use string functions (like UPPER()/LOWER()) for non-latin strings?
> Why UPPER() function doesn't work with my UNICODE PostgreSQL database which contains non-latin characters (like cyrillic)?
> How to make case insensitive search by text field which contains non-latin characters?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
I confirm this behaviour: Cyrillic words are not changed by the lower()/upper() functions, nor matched by ILIKE.

I am using:

=> SELECT version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.2
(1 row)

Nothing special was done during database creation (no encoding selected).
On Mon, 11 Aug 2003, Bruce Momjian wrote:
> Not sure. I thought it would work.

No, it doesn't work. Several people have already complained about poor Unicode support. I recall that Tatsuo commented out some piece of code. I have a small page, http://www.sai.msu.su/~megera/postgres/utf8.html, about my experience with UTF-8 and Cyrillic.

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Well, I have no mention of this problem in the TODO list, so I would like to get a good description of why it isn't working.

Looking at the code, I see upper() is defined in oracle_compat.c (you would think it would be more standard), and it calls toupper(), so it probably works on single-byte encodings, but not on multi-byte ones. Is this correct? Is there a way to do a multi-byte toupper()? Perhaps by converting to wide characters and calling towupper()?

-- 
  Bruce Momjian
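The distinction raised above, mapping case one byte at a time versus mapping whole characters, can be illustrated with a minimal Python sketch. This is an analogy only and not the code in oracle_compat.c: `bytes.upper()` stands in for a per-byte toupper() (it touches ASCII letters only, so it does not model a locale-aware toupper() on a single-byte encoding), and decoding to `str` stands in for converting to wide characters before calling towupper(). The function names `upper_bytewise` and `upper_decoded` are hypothetical.

```python
def upper_bytewise(data: bytes) -> bytes:
    """Uppercase each byte on its own; only ASCII a-z are changed."""
    return data.upper()


def upper_decoded(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to characters, uppercase them, then re-encode."""
    return data.decode(encoding).upper().encode(encoding)


# A Cyrillic word stored as UTF-8: every letter occupies two bytes,
# and none of those bytes is an ASCII letter.
word = "привет".encode("utf-8")

print(upper_bytewise(word).decode("utf-8"))  # unchanged: привет
print(upper_decoded(word).decode("utf-8"))   # ПРИВЕТ
```

The per-byte version leaves the word untouched because no individual byte of a multi-byte character is itself a letter, which matches the behaviour reported in this thread.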
I think that if Postgres were completely UTF-8 compatible, and were so in its default configuration, we'd do a lot better against 'the others' and take more of Oracle's market.

Bruce Momjian wrote:
> Well, I have no mention of this problem in the TODO list, so I would
> like to get a good description of why it isn't working.
Added to TODO:

	* Fix upper()/lower() to work for multibyte encodings

Alexander Litvinov wrote:
> I confirm this behaviour: Cyrillic words are not changed by lower()/upper()
> functions, nor matched by ILIKE.

-- 
  Bruce Momjian