Thread: confused with encodings
Tatsuo,

recently I tried to understand why I can't get sorting to work properly
with Cyrillic characters in a UTF8 database. I figured out the reason for
my confusion: I thought I could specify different encodings for different
databases and that these encodings would be used in text operations
(sort, upper, lower), not just for conversion. But actually, the only
encoding that matters for text operations is the one specified with the
'initdb' command! Is that true?

If so, it's a big issue :)

After I created separate storage for Unicode (initdb -E utf8) and
restarted the postmaster, I got success with 'order by', but the
upper() and lower() functions still fail.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
> Tatsuo,
>
> recently I tried to understand why I can't get sorting to work properly
> with Cyrillic characters in a UTF8 database. I figured out the reason for
> my confusion: I thought I could specify different encodings for different
> databases and that these encodings would be used in text operations
> (sort, upper, lower), not just for conversion.
> But actually, the only encoding that matters for text operations is the
> one specified with the 'initdb' command! Is that true?
>
> If so, it's a big issue :)
>
> After I created separate storage for Unicode (initdb -E utf8) and
> restarted the postmaster, I got success with 'order by', but the
> upper() and lower() functions still fail.

[I assume you enable the locale support.]

Don't ask me. These are locale support problems.
--
Tatsuo Ishii
On Mon, 16 Jun 2003, Oleg Bartunov wrote:

> I thought I could specify different encodings for different databases
> and that these encodings would be used in text operations
> (sort, upper, lower), not just for conversion.

An encoding does not imply any sort order. UTF-8 can be used to store
strings in many languages, each having a different sort order (and other
properties). It's the locale that determines these things.

It would be nice to be able to set the locale per database, or even per
column.

--
/Dennis
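As a rough illustration of the point above that the locale, not the encoding, decides sort order, the following C sketch sorts the same strings either byte-wise or with strcoll() under a caller-chosen LC_COLLATE locale. The helper names (sort_bytewise, sort_in_locale) are made up for this example, and any locale name passed in, such as ru_RU.UTF-8, is an assumption about what is installed on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte order, independent of any locale. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* qsort comparator: order defined by the current LC_COLLATE locale. */
static int cmp_locale(const void *a, const void *b)
{
    return strcoll(*(const char *const *) a, *(const char *const *) b);
}

/* Hypothetical helper: sort by raw byte values. */
void sort_bytewise(const char **items, size_t n)
{
    assert(items != NULL);
    qsort(items, n, sizeof items[0], cmp_bytes);
}

/*
 * Hypothetical helper: sort using the collation rules of locale_name.
 * Returns 0 on success, or -1 if the locale is not available
 * (the array is left untouched in that case).
 */
int sort_in_locale(const char **items, size_t n, const char *locale_name)
{
    assert(items != NULL && locale_name != NULL);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(items, n, sizeof items[0], cmp_locale);
    return 0;
}
```

With the "C" locale both helpers produce the same byte order. Under a language-specific UTF-8 locale the strcoll() order may differ, provided the C library's locale data for it is sound.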
On Tue, 17 Jun 2003, Tatsuo Ishii wrote:

> > After I created separate storage for Unicode (initdb -E utf8) and
> > restarted the postmaster, I got success with 'order by', but the
> > upper() and lower() functions still fail.
>
> [I assume you enable the locale support.]

Isn't it enabled by default?

> Don't ask me. These are locale support problems.

Sorry, I just wanted to understand where I got confused.
You're right, UTF-8 locale support in glibc is broken. I tested a simple
C program with glibc 2.2.5 and 2.3.1 on a Linux system, and the toupper
and tolower functions are broken.

By the way, did you try the libutf8 library?
http://www.haible.de/bruno/packages-libutf8.html

Regards,
Oleg
> > [I assume you enable the locale support.]
>
> Isn't it enabled by default?

It can be turned off by using the --no-locale option with initdb.

> You're right, UTF-8 locale support in glibc is broken. I tested a simple
> C program with glibc 2.2.5 and 2.3.1 on a Linux system, and the toupper
> and tolower functions are broken.
>
> By the way, did you try the libutf8 library?
> http://www.haible.de/bruno/packages-libutf8.html

No. BTW, upper() will never work even if glibc works fine with UTF-8. See
the code fragment below (utils/adt/oracle_compat.c):

    char *ptr;
      :
      :
    while (m-- > 0)
    {
        *ptr = toupper((unsigned char) *ptr);
        ptr++;
    }

Apparently this is not multibyte aware...
--
Tatsuo Ishii
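The loop quoted above calls toupper() on one byte at a time, so it cannot handle a character that is encoded as several bytes. A minimal sketch of a multibyte-aware alternative, assuming the caller has already selected a working locale with setlocale(LC_CTYPE, ...), is to convert to wide characters, apply towupper(), and convert back. The function name upper_multibyte is hypothetical, and this is not the PostgreSQL code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: return a newly allocated upper-cased copy of src,
 * interpreting src in the multibyte encoding of the current LC_CTYPE
 * locale.  Returns NULL on allocation failure or if src is not valid in
 * that encoding.  The caller frees the result.
 */
char *upper_multibyte(const char *src)
{
    assert(src != NULL);

    /* A string never has more wide characters than bytes. */
    size_t len = strlen(src);
    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    size_t nwide = mbstowcs(wide, src, len + 1);
    if (nwide == (size_t) -1)
    {
        free(wide);             /* invalid byte sequence for this locale */
        return NULL;
    }

    /* Case-map whole characters instead of single bytes. */
    for (size_t i = 0; i < nwide; i++)
        wide[i] = (wchar_t) towupper((wint_t) wide[i]);

    /* Each wide character needs at most MB_CUR_MAX bytes. */
    size_t cap = nwide * MB_CUR_MAX + 1;
    char *dst = malloc(cap);
    if (dst != NULL && wcstombs(dst, wide, cap) == (size_t) -1)
    {
        free(dst);
        dst = NULL;
    }
    free(wide);
    return dst;
}
```

Whether non-ASCII input such as Cyrillic text is actually upper-cased depends on the locale data in the C library, which is exactly what was reported as broken for glibc 2.2.5 and 2.3.1 earlier in this thread.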
On Tue, 17 Jun 2003, Tatsuo Ishii wrote:

> > > [I assume you enable the locale support.]
> >
> > Isn't it enabled by default?
>
> It can be turned off by using the --no-locale option with initdb.

What's the benefit of this for the non-ASCII world?

> BTW, upper() will never work even if glibc works fine with UTF-8. See
> the code fragment below (utils/adt/oracle_compat.c):
>
>     char *ptr;
>       :
>       :
>     while (m-- > 0)
>     {
>         *ptr = toupper((unsigned char) *ptr);
>         ptr++;
>     }
>
> Apparently this is not multibyte aware...

I see. I hope someone is looking into making PostgreSQL Unicode
compatible.

Regards,
Oleg
Oleg Bartunov writes:

> I thought I could specify different encodings for different databases
> and that these encodings would be used in text operations (sort, upper,
> lower), not just for conversion. But actually, the only encoding that
> matters for text operations is the one specified with the 'initdb'
> command! Is that true?

Absolutely not, but you may find that in order to allow LC_COLLATE and
LC_CTYPE operations (namely sort, upper, lower) in UTF8, you need a
locale that supports that, namely the xx_XX.utf8 kind. So realistically,
you are kind of stuck with one encoding for the entire cluster.

--
Peter Eisentraut peter_e@gmx.net
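One way to check whether a given locale name is of the UTF-8 kind described above is to select it and ask the C library for its codeset. This is only a sketch for POSIX systems: locale_uses_utf8 is a hypothetical helper, and names such as ru_RU.utf8 are assumptions about what is installed.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if locale_name selects a UTF-8 codeset,
 * 0 if it selects some other codeset, and -1 if the locale is not
 * available.  The previous LC_CTYPE setting is restored before returning.
 */
int locale_uses_utf8(const char *locale_name)
{
    assert(locale_name != NULL);

    /* Copy the current setting; setlocale() may reuse its buffer. */
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    if (current != NULL)
    {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    int result = -1;
    if (setlocale(LC_CTYPE, locale_name) != NULL)
    {
        /* nl_langinfo(CODESET) names the character encoding in use. */
        const char *codeset = nl_langinfo(CODESET);
        result = (strcmp(codeset, "UTF-8") == 0 ||
                  strcmp(codeset, "UTF8") == 0 ||
                  strcmp(codeset, "utf8") == 0);
    }

    if (saved != NULL)
    {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return result;
}
```

A result of 1 only says that the locale claims UTF-8 as its codeset. It does not guarantee that collation or case mapping for a particular language behaves correctly.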