Thread: PostgreSQL

PostgreSQL

From
Eugeny Balakhonov
Date:
Hello, all!
 
I have a good question for the PostgreSQL FAQ.
 
How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
How do I make a case-insensitive search on a text field that contains non-Latin characters?
 
Thanks for your answers!
 
Best regards
Eugeny

Re: PostgreSQL

From
Bruce Momjian
Date:
Not sure.  I thought it would work.

---------------------------------------------------------------------------

Eugeny Balakhonov wrote:
> Hello, all!
>
> I have a good question for the PostgreSQL FAQ.
>
> How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
> Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
> How do I make a case-insensitive search on a text field that contains non-Latin characters?
>
> Thanks for your answers!
>
> Best regards
> Eugeny

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: PostgreSQL

From
Alexander Litvinov
Date:

I can confirm this behaviour: Cyrillic words are not changed by the
lower()/upper() functions, nor are they matched by ILIKE.

I am using:
=> SELECT version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.2
(1 row)

Nothing special was done during database creation (no encoding selected).

> Not sure.  I thought it would work.

> > How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
> > Why doesn't the UPPER() function work with my UNICODE PostgreSQL database,
> > which contains non-Latin characters (like Cyrillic)? How do I make a
> > case-insensitive search on a text field that contains non-Latin characters?


Re: PostgreSQL

From
Oleg Bartunov
Date:
On Mon, 11 Aug 2003, Bruce Momjian wrote:

>
> Not sure.  I thought it would work.
>

No, it doesn't work. Several people have already complained about poor
Unicode support. I recall Tatsuo commenting out some piece of code.
I have a small page, http://www.sai.msu.su/~megera/postgres/utf8.html,
about my experience with UTF-8 and Cyrillic.



> ---------------------------------------------------------------------------
>
> Eugeny Balakhonov wrote:
> > Hello, all!
> >
> > I have a good question for the PostgreSQL FAQ.
> >
> > How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
> > Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
> > How do I make a case-insensitive search on a text field that contains non-Latin characters?
> >
> > Thanks for your answers!
> >
> > Best regards
> > Eugeny
>
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: PostgreSQL

From
Bruce Momjian
Date:
Well, I have no mention of this problem in the TODO list, so I would
like to get a good description of why it isn't working.

Looking at the code, I see upper() is defined in oracle_compat.c (you
would think it would be more standard), and it calls toupper(), so it
probably works on single-byte encodings, but not multi-byte ones.  Is
this correct?  Is there a way to do a multi-byte toupper?  Perhaps by
converting to wide characters and calling towupper()?
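A minimal C sketch of the two approaches described above, assuming UTF-8 text and, for the wide-character path, a UTF-8 LC_CTYPE locale set by the caller. `upper_bytewise` mimics calling toupper() on each byte; `upper_wide` is a hypothetical helper (not PostgreSQL code) that converts to wide characters with mbstowcs(), applies towupper(), and converts back with wcstombs().

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-at-a-time upper-casing, the approach described above for upper().
 * In the "C" locale toupper() only changes 'a'..'z', so the two-byte UTF-8
 * sequences that encode Cyrillic letters pass through unchanged. */
static void upper_bytewise(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}

/* Hypothetical helper: upper-case a multibyte string via wide characters.
 * Assumes the caller has already set LC_CTYPE to a UTF-8 locale.
 * Returns a malloc'd string, or NULL on invalid input or allocation failure. */
static char *upper_wide(const char *s)
{
    assert(s != NULL);
    size_t n = mbstowcs(NULL, s, 0);           /* count of wide characters */
    if (n == (size_t) -1)
        return NULL;                           /* invalid multibyte sequence */

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);                     /* convert, including the NUL */
    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t) towupper((wint_t) w[i]);

    size_t len = wcstombs(NULL, w, 0);         /* bytes needed for the result */
    char *out = (len == (size_t) -1) ? NULL : malloc(len + 1);
    if (out != NULL)
        wcstombs(out, w, len + 1);             /* convert back, with the NUL */
    free(w);
    return out;
}
```

For example, upper_bytewise() on "abc привет" in the C locale yields "ABC привет", while upper_wide() after setlocale(LC_CTYPE, "C.UTF-8") should yield "ABC ПРИВЕТ", provided that locale is installed and defines Cyrillic case mappings (the locale name is an assumption; availability varies by system).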

---------------------------------------------------------------------------

Oleg Bartunov wrote:
> On Mon, 11 Aug 2003, Bruce Momjian wrote:
>
> >
> > Not sure.  I thought it would work.
> >
>
> No, it doesn't work. Several people have already complained about poor
> Unicode support. I recall Tatsuo commenting out some piece of code.
> I have a small page, http://www.sai.msu.su/~megera/postgres/utf8.html,
> about my experience with UTF-8 and Cyrillic.
>
>
>
> > ---------------------------------------------------------------------------
> >
> > Eugeny Balakhonov wrote:
> > > Hello, all!
> > >
> > > I have a good question for the PostgreSQL FAQ.
> > >
> > > How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
> > > Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
> > > How do I make a case-insensitive search on a text field that contains non-Latin characters?
> > >
> > > Thanks for your answers!
> > >
> > > Best regards
> > > Eugeny
> >
> >
>
>     Regards,
>         Oleg
>


Re: PostgreSQL

From
Dennis Gearon
Date:
I think that if Postgres were fully UTF-8 compatible, with UTF-8 as the default configuration, we'd do a lot better
against 'the others' and take more of Oracle's market.

Bruce Momjian wrote:
> Well, I have no mention of this problem in the TODO list, so I would
> like to get a good description of why it isn't working.
>
> Looking at the code, I see upper() is defined in oracle_compat.c (you
> would think it would be more standard), and it calls toupper(), so it
> probably works on single-byte encodings, but not multi-byte ones.  Is
> this correct?  Is there a way to do a multi-byte toupper?  Perhaps by
> converting to wide characters and calling towupper()?
>
> ---------------------------------------------------------------------------
>
> Oleg Bartunov wrote:
>
>>On Mon, 11 Aug 2003, Bruce Momjian wrote:
>>
>>
>>>Not sure.  I thought it would work.
>>>
>>
>>No, it doesn't work. Several people have already complained about poor
>>Unicode support. I recall Tatsuo commenting out some piece of code.
>>I have a small page, http://www.sai.msu.su/~megera/postgres/utf8.html,
>>about my experience with UTF-8 and Cyrillic.
>>
>>
>>
>>
>>>---------------------------------------------------------------------------
>>>
>>>Eugeny Balakhonov wrote:
>>>
>>>>Hello, all!
>>>>
>>>>I have a good question for the PostgreSQL FAQ.
>>>>
>>>>How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
>>>>Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
>>>>How do I make a case-insensitive search on a text field that contains non-Latin characters?
>>>>
>>>>Thanks for your answers!
>>>>
>>>>Best regards
>>>>Eugeny
>>>
>>>
>>    Regards,
>>        Oleg
>>
>
>


Re: PostgreSQL

From
Bruce Momjian
Date:
Added to TODO:

    * Fix upper()/lower() to work for multibyte encodings
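The original question also asked about case-insensitive search. As a minimal C sketch, assuming a UTF-8 LC_CTYPE locale, lower-casing both strings through wide characters makes an ILIKE '%...%'-style substring test work for Cyrillic. `to_lower_wide` and `contains_ci` are hypothetical helpers for illustration only; this is not how PostgreSQL implements ILIKE.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: convert a multibyte string to a malloc'd, lower-cased
 * wide string. Assumes LC_CTYPE is a UTF-8 locale (see setlocale()).
 * Returns NULL on an invalid multibyte sequence or allocation failure. */
static wchar_t *to_lower_wide(const char *s)
{
    size_t n = mbstowcs(NULL, s, 0);           /* count of wide characters */
    if (n == (size_t) -1)
        return NULL;
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);                     /* convert, including the NUL */
    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t) towlower((wint_t) w[i]);
    return w;
}

/* Case-insensitive substring test, roughly what ILIKE '%needle%' is meant
 * to do. Returns false if either string cannot be converted. */
static bool contains_ci(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    wchar_t *hw = to_lower_wide(haystack);
    wchar_t *nw = to_lower_wide(needle);
    bool found = (hw != NULL && nw != NULL && wcsstr(hw, nw) != NULL);
    free(hw);                                  /* free(NULL) is a no-op */
    free(nw);
    return found;
}
```

Mapping one character at a time with towlower()/towupper() is a simplification: it cannot express one-to-many case mappings such as German ß, so a production implementation would need more than this.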


---------------------------------------------------------------------------

Alexander Litvinov wrote:
>
> I can confirm this behaviour: Cyrillic words are not changed by the
> lower()/upper() functions, nor are they matched by ILIKE.
>
> I am using:
> => SELECT version();
>                             version
> ---------------------------------------------------------------
>  PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.2
> (1 row)
>
> Nothing special was done during database creation (no encoding selected).
>
> > Not sure.  I thought it would work.
>
> > > How do I use string functions (like UPPER()/LOWER()) with non-Latin strings?
> > > Why doesn't the UPPER() function work with my UNICODE PostgreSQL database,
> > > which contains non-Latin characters (like Cyrillic)? How do I make a
> > > case-insensitive search on a text field that contains non-Latin characters?
>
