Thread: accented characters migraine
It seems to me this ought to be simple and clearly documented but I've spent hours researching and experimenting to no avail. PROBLEM: Entering accented characters in psql often results in the error: invalid byte sequence for encoding "UTF8" ENVIRONMENT: Client OS: Windows XP Keyboard: United States-International Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West Europe) Server OS: Ubuntu Server client app: psql 8.2.4 Server db app: PostgreSQL 8.2.4 pg settings: client_encoding: UTF8 lc_collate: en_US.UTF-8 lc_ctype: en_US.UTF-8 server_encoding UTF8 initdb defaulted to UTF-8, which I need because I want ORDER BY to sort alphabetically, not by hex code. When I try to insert a string with an accented character, I generally get the above error. Simple example: template1=# \d sorttest id | integer test | text template1=# insert into sorttest (test) values ('ã'); ERROR: invalid byte sequence for encoding "UTF8": 0xe32729 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". The accented character (a-tilde) is entered from the Windows keyboard with the ~a sequence and displays properly in psql. The problem is that the server rejects it. Observations: 1) The Unicode hex value of a-tilde is C3 A3 but the error message says the invalid sequence is E3 27 29. I don't know what the first byte means but the second and third are the quote and right parenthesis characters following the a-tilde in my insert statement. 2) At various times, data entry as above has started working in a session but I can't figure out what I did to make it happen. 3) I tried entering the character in hex, as I understand it: insert into sorttest (test) values (E'\xc3\xa3'); This avoids the error but the string value then displays as the 2 seemingly irrelevant characters ã (A-tilde, British pound) It looks like I'm caught in some interaction between putty, psql and pg. The real problem is much more grave than just manual data entry-- I'm trying to migrate a large existing database from another pg server with: pg_dumpall -h nnn.nnn.nnn.nnn | psql This throws errors each time the COPY commands encounter an accented character in the dump. Any ideas? Is this just a bonehead mistake on my part? John
Putty is showing ISO-8858-1 which is Latin. I believe both client and server must be UTF-8. -----Original Message----- From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of John Gunther Sent: Friday, October 12, 2007 11:59 AM To: pgsql-novice@postgresql.org Subject: [NOVICE] accented characters migraine It seems to me this ought to be simple and clearly documented but I've spent hours researching and experimenting to no avail. PROBLEM: Entering accented characters in psql often results in the error: invalid byte sequence for encoding "UTF8" ENVIRONMENT: Client OS: Windows XP Keyboard: United States-International Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West Europe) Server OS: Ubuntu Server client app: psql 8.2.4 Server db app: PostgreSQL 8.2.4 pg settings: client_encoding: UTF8 lc_collate: en_US.UTF-8 lc_ctype: en_US.UTF-8 server_encoding UTF8 initdb defaulted to UTF-8, which I need because I want ORDER BY to sort alphabetically, not by hex code. When I try to insert a string with an accented character, I generally get the above error. Simple example: template1=# \d sorttest id | integer test | text template1=# insert into sorttest (test) values ('ã'); ERROR: invalid byte sequence for encoding "UTF8": 0xe32729 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". The accented character (a-tilde) is entered from the Windows keyboard with the ~a sequence and displays properly in psql. The problem is that the server rejects it. Observations: 1) The Unicode hex value of a-tilde is C3 A3 but the error message says the invalid sequence is E3 27 29. I don't know what the first byte means but the second and third are the quote and right parenthesis characters following the a-tilde in my insert statement. 2) At various times, data entry as above has started working in a session but I can't figure out what I did to make it happen. 3) I tried entering the character in hex, as I understand it: insert into sorttest (test) values (E'\xc3\xa3'); This avoids the error but the string value then displays as the 2 seemingly irrelevant characters ã (A-tilde, British pound) It looks like I'm caught in some interaction between putty, psql and pg. The real problem is much more grave than just manual data entry-- I'm trying to migrate a large existing database from another pg server with: pg_dumpall -h nnn.nnn.nnn.nnn | psql This throws errors each time the COPY commands encounter an accented character in the dump. Any ideas? Is this just a bonehead mistake on my part? John ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
On Fri, 2007-10-12 at 11:19 -0500, Wright, George wrote: > Putty is showing ISO-8858-1 which is Latin. I believe both client and > server must be UTF-8. > > > > -----Original Message----- > ENVIRONMENT: > Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West > Europe) > pg settings: > client_encoding: UTF8 > lc_collate: en_US.UTF-8 > lc_ctype: en_US.UTF-8 > server_encoding UTF8 > > initdb defaulted to UTF-8, which I need because I want ORDER BY to sort > alphabetically, not by hex code. I think you need to set your client encoding to Latin. At the moment it thinks the client encoding is Unicode but you are feeding it Latin characters. The command you need is: set client_encoding to 'latin1'; Alternatively, set putty to output UTF-8, if it can do that. Sorting is done by the backend, and the database is UTF-8, so you don't need to change anything there. -- Oliver Elphick olly@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA ======================================== Do you want to know God? http://www.lfix.co.uk/knowing_god.html -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
John Gunther <postgresql@bucksvsbytes.com> writes: > PROBLEM: Entering accented characters in psql often results in the > error: invalid byte sequence for encoding "UTF8" > ENVIRONMENT: > Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West > Europe) > client_encoding: UTF8 Those two settings need to match up. PG can convert from latin1 to utf8, but only when you tell it the truth about what encoding the client is using. regards, tom lane
Seems to have done the trick this time. When I tried that earlier the only difference was that accented characters displayed as gray rectangles. It was boneheaded. Thanks. Wright, George wrote: > Putty is showing ISO-8858-1 which is Latin. I believe both client and server must be UTF-8. > > > > -----Original Message----- > From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of John Gunther > Sent: Friday, October 12, 2007 11:59 AM > To: pgsql-novice@postgresql.org > Subject: [NOVICE] accented characters migraine > > It seems to me this ought to be simple and clearly documented but I've > spent hours researching and experimenting to no avail. > > PROBLEM: Entering accented characters in psql often results in the > error: invalid byte sequence for encoding "UTF8" > > ENVIRONMENT: > Client OS: Windows XP > Keyboard: United States-International > Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West > Europe) > Server OS: Ubuntu > Server client app: psql 8.2.4 > Server db app: PostgreSQL 8.2.4 > pg settings: > client_encoding: UTF8 > lc_collate: en_US.UTF-8 > lc_ctype: en_US.UTF-8 > server_encoding UTF8 > > initdb defaulted to UTF-8, which I need because I want ORDER BY to sort > alphabetically, not by hex code. > > When I try to insert a string with an accented character, I generally > get the above error. Simple example: > template1=# \d sorttest > id | integer > test | text > > template1=# insert into sorttest (test) values ('ã'); > ERROR: invalid byte sequence for encoding "UTF8": 0xe32729 > HINT: This error can also happen if the byte sequence does not match > the encoding expected by the server, which is controlled by > "client_encoding". > > The accented character (a-tilde) is entered from the Windows keyboard > with the ~a sequence and displays properly in psql. The problem is that > the server rejects it. > Observations: > 1) The Unicode hex value of a-tilde is C3 A3 but the error message says > the invalid sequence is E3 27 29. I don't know what the first byte means > but the second and third are the quote and right parenthesis characters > following the a-tilde in my insert statement. > 2) At various times, data entry as above has started working in a > session but I can't figure out what I did to make it happen. > 3) I tried entering the character in hex, as I understand it: insert > into sorttest (test) values (E'\xc3\xa3'); > This avoids the error but the string value then displays as the 2 > seemingly irrelevant characters ã (A-tilde, British pound) > > It looks like I'm caught in some interaction between putty, psql and pg. > The real problem is much more grave than just manual data entry-- I'm > trying to migrate a large existing database from another pg server with: > pg_dumpall -h nnn.nnn.nnn.nnn | psql > This throws errors each time the COPY commands encounter an accented > character in the dump. > > Any ideas? Is this just a bonehead mistake on my part? > > John > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > > >
What's the most straightforward way to populate an array from a select statement? For example, using a fictional extension of SQL syntax, I'd like: update users set emails=ARRAY[select email from address where userid=25] where id=25; So if user 25 has emails john@domain.com, john@gmail.com, and john@yahoo.com in the address table, select emails from user where id=25; will return: emails ---------------------------------------------------------------- {john@domain.com, john@gmail.com, and john@yahoo.com} If there's no built in way to do this, I'd suggest it as a future feature.
am Mon, dem 18.02.2008, um 8:22:14 -0500 mailte John Gunther folgendes: > What's the most straightforward way to populate an array from a select > statement? For example, using a fictional extension of SQL syntax, I'd like: > > update users set emails=ARRAY[select email from address where userid=25] > where id=25; > > So if user 25 has emails john@domain.com, john@gmail.com, and > john@yahoo.com in the address table, > select emails from user where id=25; > will return: > emails > ---------------------------------------------------------------- > {john@domain.com, john@gmail.com, and john@yahoo.com} You can use array_to_string() and array(). Example: test=*# select * from mail_adr ; id | email ----+--------------- 2 | foo@bar 2 | bar@batz 2 | foobar@barfoo (3 rows) test=*# select array_to_string(array(select email from mail_Adr where id=2), ', '); array_to_string ---------------------------------- foo@bar, bar@batz, foobar@barfoo (1 row) Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Thanks, Andreas. That was easy. I thought I new all the value expression types, but following your clue, I've learned the last two, ARRAY() and ROW(), from Section 4.2. John A. Kretschmer wrote: <blockquote cite="mid:20080218134715.GC23570@a-kretschmer.de" type="cite"> am Mon, dem 18.02.2008, um 8:22:14 -0500 mailte John Gunther folgendes: What's the most straightforward way to populate an array from a select statement? For example, using a fictional extension of SQL syntax, I'd like: update users set emails=ARRAY[select email from address where userid=25] where id=25; So if user 25 has emails john@domain.com, john@gmail.com, and john@yahoo.com in the address table, select emails from user where id=25; will return: emails ---------------------------------------------------------------- {john@domain.com, john@gmail.com, and john@yahoo.com} You can use array_to_string() and array(). Example: test=*# select * from mail_adr ; id | email ----+--------------- 2 | foo@bar 2 | bar@batz 2 | foobar@barfoo (3 rows) test=*# select array_to_string(array(select email from mail_Adr where id=2), ', '); array_to_string ---------------------------------- foo@bar, bar@batz, foobar@barfoo (1 row) Andreas
Just out of curiously, why would you ever want to do that? (I am not saying your method is wrong, but SQL should be ample, no?)
On Mon, Feb 18, 2008 at 9:15 AM, John Gunther <postgresql@bucksvsbytes.com> wrote:
Thanks, Andreas. That was easy. I thought I new all the value expression types, but following your clue, I've learned the last two, ARRAY() and ROW(), from Section 4.2.
John
A. Kretschmer wrote:am Mon, dem 18.02.2008, um 8:22:14 -0500 mailte John Gunther folgendes:What's the most straightforward way to populate an array from a select statement? For example, using a fictional extension of SQL syntax, I'd like: update users set emails=ARRAY[select email from address where userid=25] where id=25; So if user 25 has emails john@domain.com, john@gmail.com, and john@yahoo.com in the address table, select emails from user where id=25; will return: emails ---------------------------------------------------------------- {john@domain.com, john@gmail.com, and john@yahoo.com}You can use array_to_string() and array(). Example: test=*# select * from mail_adr ;id | email ----+--------------- 2 | foo@bar 2 | bar@batz 2 | foobar@barfoo (3 rows) test=*# select array_to_string(array(select email from mail_Adr where id=2), ', '); array_to_string ----------------------------------foo@bar, bar@batz, foobar@barfoo (1 row) Andreas