Thread: invalid multibyte character for locale

invalid multibyte character for locale

From: Frank van Vugt
Date: 28 February 2005, 23:08:30

L.S.

I have a database created on:

db=# select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 8.0.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)


The initdb was done using --no-locale and UNICODE as the default encoding;
this particular database is indeed encoded as UNICODE.


Due to a buggy glibc, the following patch was applied to this install in order
to avoid a crash on things like 'upper(<string>)':

--- oracle_compat.c_orig        Mon Dec  6 22:14:11 2004
+++ oracle_compat.c     Mon Dec  6 22:14:24 2004
@@ -43,7 +43,7 @@
  * We assume if we have these two functions, we have their friends too, and
  * can use the wide-character method.
  */
-#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER)
+#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER) && FALSE
 #define USE_WIDE_UPPER_LOWER
 #endif

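A note on why the hack works: in a #if expression, any identifier that is not
defined as a macro evaluates to 0, so appending "&& FALSE" switches the
condition off whether or not FALSE happens to be defined (as long as it is not
defined to a nonzero value). The stand-alone sketch below fakes the two HAVE_*
configure results and adds a hypothetical wide_upper_lower_enabled() probe to
show the effect; it is an illustration, not code from oracle_compat.c.

```c
/* Pretend configure detected both functions (stand-ins for the real
 * configure results). */
#define HAVE_WCSTOMBS 1
#define HAVE_TOWLOWER 1

/* FALSE is deliberately not defined here. Inside #if, an identifier that
 * is not a macro is replaced by 0, so "&& FALSE" makes the whole
 * condition false and USE_WIDE_UPPER_LOWER is never defined. */
#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER) && FALSE
#define USE_WIDE_UPPER_LOWER
#endif

/* Hypothetical probe: returns 1 if the wide-character path would be
 * compiled in, 0 otherwise. */
int wide_upper_lower_enabled(void)
{
#ifdef USE_WIDE_UPPER_LOWER
    return 1;
#else
    return 0;
#endif
}
```

If FALSE is defined as 0 by a project header, the result is the same.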

The database on this machine was dumped and then restored on another, which
has a more recent Slackware installation on it:


db=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 8.0.1 on i586-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3
(1 row)


Again, the initdb on this machine was done using --no-locale and UNICODE as
the default encoding, and this particular database is obviously also encoded
as UNICODE.



On the second machine, I'm now getting the following:

db=# select 'JÜTERBOG';
 ?column?
----------
 JÜTERBOG
(1 row)

db=# select lower('JÜTERBOG');
ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the database
encoding.
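A likely mechanism, stated here as an assumption rather than a confirmed
diagnosis: the wide-character path converts the text with mbstowcs() under the
server's LC_CTYPE, and this error is raised when that conversion fails. With
glibc, the C locale typically treats bytes above 0x7F as invalid multibyte
input, which would explain why a plain select of the string works while
lower() on it does not. The hypothetical helper converts_in_current_locale()
(not a PostgreSQL function) isolates just that check:

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale can
 * convert the NUL-terminated multibyte string s to wide characters,
 * 0 if mbstowcs() rejects it. */
int converts_in_current_locale(const char *s)
{
    /* With a NULL destination, mbstowcs() only counts the wide characters
     * it would produce and returns (size_t) -1 on an invalid sequence. */
    return mbstowcs(NULL, s, 0) != (size_t) -1;
}
```

A program starts in the C locale, where a pure-ASCII string such as
"JUTERBOG" converts fine. On a typical glibc system, the UTF-8 bytes of
'JÜTERBOG' ("J\xC3\x9CTERBOG") would make the helper return 0, the same
condition behind the error above.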



As far as I can tell, this didn't happen with v8.0.0, but I'm afraid I can't
be totally sure about that. Obviously, the error doesn't occur on the first
machine due to the hack needed for the buggy glibc.


I'd appreciate a pointer as to what is causing this. It 'shouldn't' be the
hack or the dump/restore cycle, but.......?


TIA.



--
Best,

Frank.

Re: invalid multibyte character for locale

From: Tatsuo Ishii
Date: 28 February 2005, 23:39:25

Apparently your hack does not kill #define USE_WIDE_UPPER_LOWER.

BTW, the current code for upper/lower etc. seems to be broken. The
exact problem you are having happens with Japanese encodings (EUC_JP)
too. PostgreSQL should not use the wide-character method if LC_CTYPE is C.
--
Tatsuo Ishii
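Tatsuo's point, that the wide-character method should be skipped when LC_CTYPE
is C, can be sketched as follows. This is not the actual PostgreSQL fix:
ctype_is_c() and sketch_lower() are hypothetical names. In the C locale the
sketch lower-cases byte by byte with tolower(), which folds ASCII letters and
leaves non-ASCII bytes (such as the UTF-8 encoding of 'Ü') untouched instead of
raising an error; otherwise it takes the mbstowcs()/towlower()/wcstombs() route.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

/* Hypothetical helper: is the current LC_CTYPE the C/POSIX locale? */
static int ctype_is_c(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    return cur != NULL && (strcmp(cur, "C") == 0 || strcmp(cur, "POSIX") == 0);
}

/* Hypothetical sketch: return a newly allocated lower-cased copy of src,
 * or NULL on allocation or conversion failure. Caller frees the result. */
char *sketch_lower(const char *src)
{
    size_t len = strlen(src);

    if (ctype_is_c()) {
        /* C locale: byte-wise folding, copying the terminating NUL too. */
        char *out = malloc(len + 1);
        if (out == NULL)
            return NULL;
        for (size_t i = 0; i <= len; i++)
            out[i] = (char) tolower((unsigned char) src[i]);
        return out;
    }

    /* Other locales: convert to wide characters, fold, convert back. */
    size_t wlen = mbstowcs(NULL, src, 0);
    if (wlen == (size_t) -1)
        return NULL;                 /* invalid multibyte input for locale */
    wchar_t *w = malloc((wlen + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, src, wlen + 1);
    for (size_t i = 0; i < wlen; i++)
        w[i] = (wchar_t) towlower((wint_t) w[i]);

    size_t blen = wcstombs(NULL, w, 0);
    char *out = (blen == (size_t) -1) ? NULL : malloc(blen + 1);
    if (out != NULL)
        wcstombs(out, w, blen + 1);
    free(w);
    return out;
}
```

With the C locale active, sketch_lower("J\xC3\x9CTERBOG") returns
"j\xC3\x9Cterbog", i.e. 'jÜterbog': the ASCII letters are folded and the 'Ü'
bytes pass through unchanged.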


Re: invalid multibyte character for locale

From: Tom Lane
Date: 01 March 2005, 01:01:57

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> BTW, the current code for upper/lower etc. seems to be broken. The
> exact problem you are having happens with Japanese encodings (EUC_JP)
> too. PostgreSQL should not use the wide-character method if LC_CTYPE is C.

Yeah, we came to that same conclusion a few days ago in another thread.
I am planning to install the fix but didn't get to it yet.

            regards, tom lane

Re: invalid multibyte character for locale

From: Frank van Vugt
Date: 01 March 2005, 12:01:08

Hi Tatsuo / Tom,

[TI]
> Apparently your hack does not kill #define USE_WIDE_UPPER_LOWER.

Mmm, I think it does, but mind you, the hack was applied to the first machine
only (since that was the one with the 'original' buggy glibc causing a
postmaster crash when using upper() and stuff), while it was the second one
producing the error. This second machine didn't seem to have problems using
upper() in earlier versions, but it looks like it does now.

Using the hack on the second machine obviously solves the problem there as
well, I agree ;)

[TI]
> BTW, the current code for upper/lower etc. seems to be broken.
> PostgreSQL should not use the wide-character method if LC_CTYPE is C.

[TL]
> Yeah, we came to that same conclusion a few days ago in another thread.
> I am planning to install the fix <cut>

Great, no rush, it's an easily avoided issue ;)

--
Best,

Frank.