Thread: Strange query results with invalid multibyte character
Hi, I have a strange problem. I have 2 database, both created with: CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default; Running the queries below on the first database is OK and do what I expect. If I create a backup of the first datase and restore that backup in the second database the following happens: The first query (see below) returns 17 records, all containing 'Boonk%'. The second query (see below)returns 'ERROR: invalid multibyte character for locale' Query 1: select lastname from salesordercustomer where lower(lastname) like 'boonk%' Query 2: select lastname from salesordercustomer where upper(lastname) like 'BOONK%' Both database are running *exactly* the same version of Debian Etch AMD64 and *exactly* the same version of PostgreSQL (8.1.4for AMD64) .... Can anyone explain this to me? TIA Joost
Hi Joost. (hoe gaat het ermee?) I would like to test this too. Could you please provide the data you are inserting into the database. Regards, Gevik On Wed, 2006-09-27 at 13:13 +0200, Joost Kraaijeveld wrote: > Hi, > > I have a strange problem. > > I have 2 database, both created with: > CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default; > > Running the queries below on the first database is OK and do what I expect. > > If I create a backup of the first datase and restore that backup in the second database the following happens: > The first query (see below) returns 17 records, all containing 'Boonk%'. > The second query (see below)returns 'ERROR: invalid multibyte character for locale' > > Query 1: > select lastname from salesordercustomer where lower(lastname) like 'boonk%' > > Query 2: > select lastname from salesordercustomer where upper(lastname) like 'BOONK%' > > Both database are running *exactly* the same version of Debian Etch AMD64 and *exactly* the same version of PostgreSQL(8.1.4 for AMD64) .... > > Can anyone explain this to me? > > TIA > > Joost > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > -- Regards, Gevik Babakhani
"Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes: > I have 2 database, both created with: > CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default; But have they got the same locale settings (lc_collate, lc_ctype)? regards, tom lane
Hi Tom, On Wed, 2006-09-27 at 11:00 -0400, Tom Lane wrote: > "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes: > > I have 2 database, both created with: > > CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default; > > But have they got the same locale settings (lc_collate, lc_ctype)? According to the postgresql.conf of the *working* database the locales are: lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' According to the other obe: lc_messages = 'en_US' lc_monetary = 'en_US' lc_numeric = 'en_US' lc_time = 'en_US' Could this be the difference? Is there any documentation available somewhere, on how these locale settings work and interact (in combination with the server and/or client platform???), besides the explanation in the PostgreSQL manual (http://www.postgresql.org/docs/8.1/interactive/charset.html#LOCALE) (which is to terse for me to understand)? TIA -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes: > On Wed, 2006-09-27 at 11:00 -0400, Tom Lane wrote: >> But have they got the same locale settings (lc_collate, lc_ctype)? > According to the postgresql.conf of the *working* database the locales > are: > lc_messages = 'C' > lc_monetary = 'C' > lc_numeric = 'C' > lc_time = 'C' > According to the other obe: > lc_messages = 'en_US' > lc_monetary = 'en_US' > lc_numeric = 'en_US' > lc_time = 'en_US' You did not show us the settings that actually count, but based on this I'm guessing they are en_US also. What you need to find out next is what character set encoding that locale implies on your machine. I'm betting it's not utf8 though :-( regards, tom lane
On Wed, 2006-09-27 at 12:10 -0400, Tom Lane wrote: > Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes: > > On Wed, 2006-09-27 at 11:00 -0400, Tom Lane wrote: > >> But have they got the same locale settings (lc_collate, lc_ctype)? > > > According to the postgresql.conf of the *working* database the locales > > are: > > lc_messages = 'C' > > lc_monetary = 'C' > > lc_numeric = 'C' > > lc_time = 'C' > > > According to the other obe: > > lc_messages = 'en_US' > > lc_monetary = 'en_US' > > lc_numeric = 'en_US' > > lc_time = 'en_US' > > You did not show us the settings that actually count, but based on this > I'm guessing they are en_US also Ah, sorry: "show lc_collate" and "show lc_ctype" confirm your guess. > What you need to find out next is > what character set encoding that locale implies on your machine. I'm > betting it's not utf8 though :-( I am not sure what you mean by that but maybe this helps: the output of "locale" on the working machine is: LANG= LANGUAGE=en_US:en_GB LC_CTYPE="POSIX" LC_NUMERIC="POSIX" LC_TIME="POSIX" LC_COLLATE="POSIX" LC_MONETARY="POSIX" LC_MESSAGES="POSIX" LC_PAPER="POSIX" LC_NAME="POSIX" LC_ADDRESS="POSIX" LC_TELEPHONE="POSIX" LC_MEASUREMENT="POSIX" LC_IDENTIFICATION="POSIX" LC_ALL= The output of the same command on the non-working machine: LANG=en_US LANGUAGE=en_NL:en_US:en_GB:en LC_CTYPE="en_US" LC_NUMERIC="en_US" LC_TIME="en_US" LC_COLLATE="en_US" LC_MONETARY="en_US" LC_MESSAGES="en_US" LC_PAPER="en_US" LC_NAME="en_US" LC_ADDRESS="en_US" LC_TELEPHONE="en_US" LC_MEASUREMENT="en_US" LC_IDENTIFICATION="en_US" LC_ALL= If this is not what you mean, could you help me in the right direction? -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes: > On Wed, 2006-09-27 at 12:10 -0400, Tom Lane wrote: >> What you need to find out next is >> what character set encoding that locale implies on your machine. I'm >> betting it's not utf8 though :-( > If this is not what you mean, could you help me in the right direction? Try this command: LANG=en_US locale charmap regards, tom lane
Hi Tom, Thanks for putting up with the questions. On Wed, 2006-09-27 at 14:58 -0400, Tom Lane wrote: > LANG=en_US locale charmap > I have done this on both machines: The working machine says: ISO-8859-1 The not working machine says: ISO-8859-1 I still do not understand what is happening and maybe more important, how I can solve this (without re-installing the OS / database). A better understanding of how the server OS, database, codepages, client OS, charmaps etc work would be nice.... -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes: > On Wed, 2006-09-27 at 14:58 -0400, Tom Lane wrote: >> LANG=en_US locale charmap > The working machine says: ISO-8859-1 > The not working machine says: ISO-8859-1 OK, so the problem is that you have a locale that expects ISO-8859-1 encoding, but the database is set to UTF8 encoding, and that means it may contain characters that the locale can't deal with. Locale-dependent operations ... like upper() ... will fail when that happens. If we had a more reliable way of telling what encoding the locale expects, we'd probably forbid creation of databases with incompatible encodings. > I still do not understand what is happening and maybe more important, > how I can solve this (without re-installing the OS / database). If you want to use UTF8 then you're stuck with a re-initdb I'm afraid. regards, tom lane