Thread: invalid multibyte character for locale
L.S.

I have a database created on:

db=# select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 8.0.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

The initdb was done using no-locale and unicode as default encoding, the
particular database itself is indeed encoded as UNICODE.

Due to a buggy glibc, the following patch was applied to this install in order
to avoid a crash on things like 'upper(<string>)':

--- oracle_compat.c_orig Mon Dec 6 22:14:11 2004
+++ oracle_compat.c Mon Dec 6 22:14:24 2004
@@ -43,7 +43,7 @@
  * We assume if we have these two functions, we have their friends too, and
  * can use the wide-character method.
  */
-#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER)
+#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER) && FALSE
 #define USE_WIDE_UPPER_LOWER
 #endif

The database on this machine was dumped and then restored on another, which
has a more recent installation of Slack on it:

db=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 8.0.1 on i586-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3
(1 row)

Again, the initdb on this machine was done using no-locale and unicode as
default encoding, the particular database obviously is also encoded as
UNICODE.

On the second machine, I'm now getting the following:

db=# select 'JÜTERBOG';
 ?column?
----------
 JÜTERBOG
(1 row)

db=# select lower('JÜTERBOG');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database
encoding.

As far as I can tell, this didn't happen with v8.0.0, but I'm afraid I can't
be totally sure about that. Obviously, the error doesn't occur on the first
machine due to the hack needed for the buggy glibc.

I'd appreciate a pointer as to what is causing this. It 'shouldn't' be the
hack nor the dump/restore cycle, but.......?

TIA.

--
Best,

Frank.
Apparently your hack does not kill #define USE_WIDE_UPPER_LOWER.

BTW, the current code for upper/lower etc. seems to be broken. The exact
problem you have is happening in Japanese encodings (EUC_JP) too.
PostgreSQL should not use wide-character method if LC_CTYPE is C.
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> BTW, the current code for upper/lower etc. seems to be broken. The
> exact problem you have is happening in Japanese encodings (EUC_JP)
> too. PostgreSQL should not use wide-character method if LC_CTYPE is C.

Yeah, we came to that same conclusion a few days ago in another thread.
I am planning to install the fix but didn't get to it yet.

regards, tom lane
Hi Tatsuo / Tom,

[TI]
> Apparently your hack does not kill #define USE_WIDE_UPPER_LOWER.

Mmm, I think it does, but mind you, the hack was applied to the first machine
only (since that was the one with the 'original' buggy glibc causing a
postmaster crash when using upper() and stuff), while it was the second one
producing the error.

This second machine didn't seem to have problems using upper() in earlier
versions, but it looks like it does now. Using the hack on the second machine
obviously solves the problem there as well, I agree ;)

[TI]
> BTW, the current code for upper/lower etc. seems to be broken.
> PostgreSQL should not use wide-character method if LC_CTYPE is C.

[TL]
> Yeah, we came to that same conclusion a few days ago in another thread.
> I am planning to install the fix <cut>

Great, no rush, it's an easily avoided issue ;)

--
Best,

Frank.
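For readers following along: the rule stated in the thread ("should not use
wide-character method if LC_CTYPE is C") can be sketched roughly as below.
This is only an illustration under stated assumptions, not the fix that went
into PostgreSQL. The function names lc_ctype_is_c, ascii_lower_inplace and
lower_if_c_locale are hypothetical and do not come from oracle_compat.c. The
idea is that under the C locale only the ASCII letters A-Z are folded, and
every byte of a multibyte character is passed through unchanged, so no
multibyte-to-wide conversion is attempted at all.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when the active LC_CTYPE locale is "C" or "POSIX". */
bool lc_ctype_is_c(void)
{
    /* Passing NULL only queries the current setting; it changes nothing. */
    const char *name = setlocale(LC_CTYPE, NULL);

    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/*
 * Lowercase ASCII letters in place. Bytes outside A-Z, including every
 * byte of a multibyte UTF-8 or EUC_JP sequence, are left untouched.
 * Assumes an ASCII-compatible execution character set.
 */
void ascii_lower_inplace(char *s)
{
    for (; *s != '\0'; s++)
    {
        if (*s >= 'A' && *s <= 'Z')
            *s = (char) (*s - 'A' + 'a');
    }
}

/*
 * Hypothetical guard: use the byte-wise path when LC_CTYPE is C and report
 * whether it was taken. A caller would fall back to a wide-character
 * conversion only when this returns false.
 */
bool lower_if_c_locale(char *s)
{
    if (!lc_ctype_is_c())
        return false;

    ascii_lower_inplace(s);
    return true;
}
```

With this sketch, a UTF-8 encoded 'JÜTERBOG' comes back with its ASCII
letters lowercased and the two bytes of 'Ü' unchanged, which is the most a
server running with LC_CTYPE = C can do without locale data.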