Thread: accented characters migraine

accented characters migraine

From

John Gunther

Date:

12 October 2007, 16:13:33

It seems to me this ought to be simple and clearly documented but I've
spent hours researching and experimenting to no avail.

PROBLEM: Entering accented characters in psql often results in the
error: invalid byte sequence for encoding "UTF8"

ENVIRONMENT:
Client OS: Windows XP
Keyboard: United States-International
Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
Europe)
Server OS: Ubuntu
Server client app: psql 8.2.4
Server db app: PostgreSQL 8.2.4
pg settings:
client_encoding: UTF8
lc_collate: en_US.UTF-8
lc_ctype: en_US.UTF-8
server_encoding UTF8

initdb defaulted to UTF-8, which I need because I want ORDER BY to sort
alphabetically, not by hex code.

When I try to insert a string with an accented character, I generally
get the above error. Simple example:
template1=# \d sorttest
id     | integer
test   | text

template1=# insert into sorttest (test) values ('ã');
ERROR:  invalid byte sequence for encoding "UTF8": 0xe32729
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

The accented character (a-tilde) is entered from the Windows keyboard
with the ~a sequence and displays properly in psql. The problem is that
the server rejects it.
Observations:
1) The Unicode hex value of a-tilde is C3 A3 but the error message says
the invalid sequence is E3 27 29. I don't know what the first byte means
but the second and third are the quote and right parenthesis characters
following the a-tilde in my insert statement.
2) At various times, data entry as above has started working in a
session but I can't figure out what I did to make it happen.
3) I tried entering the character in hex, as I understand it: insert
into sorttest (test) values (E'\xc3\xa3');
This avoids the error but the string value then displays as the 2
seemingly irrelevant characters Ã£ (A-tilde, British pound)

It looks like I'm caught in some interaction between putty, psql and pg.
The real problem is much more grave than just manual data entry-- I'm
trying to migrate a large existing database from another pg server with:
pg_dumpall -h nnn.nnn.nnn.nnn | psql
This throws errors each time the COPY commands encounter an accented
character in the dump.

Any ideas? Is this just a bonehead mistake on my part?

John

Re: accented characters migraine

From

"Wright, George"

Date:

12 October 2007, 16:20:23

Putty is showing ISO-8858-1 which is Latin. I believe both client and server must be UTF-8.

-----Original Message-----
From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of John Gunther
Sent: Friday, October 12, 2007 11:59 AM
To: pgsql-novice@postgresql.org
Subject: [NOVICE] accented characters migraine

It seems to me this ought to be simple and clearly documented but I've
spent hours researching and experimenting to no avail.

PROBLEM: Entering accented characters in psql often results in the
error: invalid byte sequence for encoding "UTF8"

ENVIRONMENT:
Client OS: Windows XP
Keyboard: United States-International
Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
Europe)
Server OS: Ubuntu
Server client app: psql 8.2.4
Server db app: PostgreSQL 8.2.4
pg settings:
client_encoding: UTF8
lc_collate: en_US.UTF-8
lc_ctype: en_US.UTF-8
server_encoding UTF8

initdb defaulted to UTF-8, which I need because I want ORDER BY to sort
alphabetically, not by hex code.

When I try to insert a string with an accented character, I generally
get the above error. Simple example:
template1=# \d sorttest
id     | integer
test   | text

template1=# insert into sorttest (test) values ('ã');
ERROR:  invalid byte sequence for encoding "UTF8": 0xe32729
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

The accented character (a-tilde) is entered from the Windows keyboard
with the ~a sequence and displays properly in psql. The problem is that
the server rejects it.
Observations:
1) The Unicode hex value of a-tilde is C3 A3 but the error message says
the invalid sequence is E3 27 29. I don't know what the first byte means
but the second and third are the quote and right parenthesis characters
following the a-tilde in my insert statement.
2) At various times, data entry as above has started working in a
session but I can't figure out what I did to make it happen.
3) I tried entering the character in hex, as I understand it: insert
into sorttest (test) values (E'\xc3\xa3');
This avoids the error but the string value then displays as the 2
seemingly irrelevant characters Ã£ (A-tilde, British pound)

It looks like I'm caught in some interaction between putty, psql and pg.
The real problem is much more grave than just manual data entry-- I'm
trying to migrate a large existing database from another pg server with:
pg_dumpall -h nnn.nnn.nnn.nnn | psql
This throws errors each time the COPY commands encounter an accented
character in the dump.

Any ideas? Is this just a bonehead mistake on my part?

John

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: accented characters migraine

From

Oliver Elphick

Date:

12 October 2007, 16:46:27

On Fri, 2007-10-12 at 11:19 -0500, Wright, George wrote:
> Putty is showing ISO-8858-1 which is Latin. I believe both client and
> server must be UTF-8.
>
>
>
> -----Original Message-----
> ENVIRONMENT:
> Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
> Europe)
> pg settings:
> client_encoding: UTF8
> lc_collate: en_US.UTF-8
> lc_ctype: en_US.UTF-8
> server_encoding UTF8
>
> initdb defaulted to UTF-8, which I need because I want ORDER BY to sort
> alphabetically, not by hex code.

I think you need to set your client encoding to Latin.  At the moment it
thinks the client encoding is Unicode but you are feeding it Latin
characters.

The command you need is:

  set client_encoding to 'latin1';

Alternatively, set putty to output UTF-8, if it can do that.

Sorting is done by the backend, and the database is UTF-8, so you don't
need to change anything there.

--
Oliver Elphick                                          olly@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
                 ========================================
   Do you want to know God?   http://www.lfix.co.uk/knowing_god.html


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Re: accented characters migraine

From

Tom Lane

Date:

12 October 2007, 16:50:04

John Gunther <postgresql@bucksvsbytes.com> writes:
> PROBLEM: Entering accented characters in psql often results in the
> error: invalid byte sequence for encoding "UTF8"

> ENVIRONMENT:
> Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
> Europe)
> client_encoding: UTF8

Those two settings need to match up.  PG can convert from latin1 to
utf8, but only when you tell it the truth about what encoding the client
is using.

            regards, tom lane

Re: accented characters migraine

From

John Gunther

Date:

12 October 2007, 17:50:37

Seems to have done the trick this time. When I tried that earlier the
only difference was that accented characters displayed as gray
rectangles. It was boneheaded. Thanks.

Wright, George wrote:
> Putty is showing ISO-8858-1 which is Latin. I believe both client and server must be UTF-8.
>
>
>
> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of John Gunther
> Sent: Friday, October 12, 2007 11:59 AM
> To: pgsql-novice@postgresql.org
> Subject: [NOVICE] accented characters migraine
>
> It seems to me this ought to be simple and clearly documented but I've
> spent hours researching and experimenting to no avail.
>
> PROBLEM: Entering accented characters in psql often results in the
> error: invalid byte sequence for encoding "UTF8"
>
> ENVIRONMENT:
> Client OS: Windows XP
> Keyboard: United States-International
> Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
> Europe)
> Server OS: Ubuntu
> Server client app: psql 8.2.4
> Server db app: PostgreSQL 8.2.4
> pg settings:
> client_encoding: UTF8
> lc_collate: en_US.UTF-8
> lc_ctype: en_US.UTF-8
> server_encoding UTF8
>
> initdb defaulted to UTF-8, which I need because I want ORDER BY to sort
> alphabetically, not by hex code.
>
> When I try to insert a string with an accented character, I generally
> get the above error. Simple example:
> template1=# \d sorttest
> id     | integer
> test   | text
>
> template1=# insert into sorttest (test) values ('ã');
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe32729
> HINT:  This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".
>
> The accented character (a-tilde) is entered from the Windows keyboard
> with the ~a sequence and displays properly in psql. The problem is that
> the server rejects it.
> Observations:
> 1) The Unicode hex value of a-tilde is C3 A3 but the error message says
> the invalid sequence is E3 27 29. I don't know what the first byte means
> but the second and third are the quote and right parenthesis characters
> following the a-tilde in my insert statement.
> 2) At various times, data entry as above has started working in a
> session but I can't figure out what I did to make it happen.
> 3) I tried entering the character in hex, as I understand it: insert
> into sorttest (test) values (E'\xc3\xa3');
> This avoids the error but the string value then displays as the 2
> seemingly irrelevant characters Ã£ (A-tilde, British pound)
>
> It looks like I'm caught in some interaction between putty, psql and pg.
> The real problem is much more grave than just manual data entry-- I'm
> trying to migrate a large existing database from another pg server with:
> pg_dumpall -h nnn.nnn.nnn.nnn | psql
> This throws errors each time the COPY commands encounter an accented
> character in the dump.
>
> Any ideas? Is this just a bonehead mistake on my part?
>
> John
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
>

Populating an array from a select statement

From

John Gunther

Date:

18 February 2008, 12:33:27

What's the most straightforward way to populate an array from a select
statement? For example, using a fictional extension of SQL syntax, I'd like:

update users set emails=ARRAY[select email from address where userid=25]
where id=25;

So if user 25  has emails john@domain.com, john@gmail.com, and
john@yahoo.com in the address table,
select emails from user where id=25;
will return:
                                          emails
----------------------------------------------------------------
 {john@domain.com, john@gmail.com, and john@yahoo.com}


If there's no built in way to do this, I'd suggest it as a future feature.

Re: Populating an array from a select statement

From

"A. Kretschmer"

Date:

18 February 2008, 12:47:22

am  Mon, dem 18.02.2008, um  8:22:14 -0500 mailte John Gunther folgendes:
> What's the most straightforward way to populate an array from a select
> statement? For example, using a fictional extension of SQL syntax, I'd like:
>
> update users set emails=ARRAY[select email from address where userid=25]
> where id=25;
>
> So if user 25  has emails john@domain.com, john@gmail.com, and
> john@yahoo.com in the address table,
> select emails from user where id=25;
> will return:
>                                          emails
> ----------------------------------------------------------------
> {john@domain.com, john@gmail.com, and john@yahoo.com}

You can use array_to_string() and array().

Example:

test=*# select * from mail_adr ;
 id |     email
----+---------------
  2 | foo@bar
  2 | bar@batz
  2 | foobar@barfoo
(3 rows)

test=*# select array_to_string(array(select email from mail_Adr where id=2), ', ');
         array_to_string
----------------------------------
 foo@bar, bar@batz, foobar@barfoo
(1 row)


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Populating an array from a select statement

From

John Gunther

Date:

18 February 2008, 13:16:32

Thanks, Andreas. That was easy. I thought I new all the value
expression types, but following your clue, I've learned the last two,
ARRAY() and ROW(), from Section 4.2.

John

A. Kretschmer wrote:
<blockquote cite="mid:20080218134715.GC23570@a-kretschmer.de"
 type="cite">
  am  Mon, dem 18.02.2008, um  8:22:14 -0500 mailte John Gunther folgendes:


    What's the most straightforward way to populate an array from a select
statement? For example, using a fictional extension of SQL syntax, I'd like:

update users set emails=ARRAY[select email from address where userid=25]
where id=25;

So if user 25  has emails john@domain.com, john@gmail.com, and
john@yahoo.com in the address table,
select emails from user where id=25;
will return:
                                         emails
----------------------------------------------------------------
{john@domain.com, john@gmail.com, and john@yahoo.com}



You can use array_to_string() and array().

Example:

test=*# select * from mail_adr ;
 id |     email
----+---------------
  2 | foo@bar
  2 | bar@batz
  2 | foobar@barfoo
(3 rows)

test=*# select array_to_string(array(select email from mail_Adr where id=2), ', ');
         array_to_string
----------------------------------
 foo@bar, bar@batz, foobar@barfoo
(1 row)


Andreas

Re: Populating an array from a select statement

From

"Mag Gam"

Date:

27 February 2008, 19:22:48

Just out of curiously, why would you ever want to do that? (I am not saying your method is wrong, but SQL should be ample, no?)

On Mon, Feb 18, 2008 at 9:15 AM, John Gunther <postgresql@bucksvsbytes.com> wrote:

Thanks, Andreas. That was easy. I thought I new all the value expression types, but following your clue, I've learned the last two, ARRAY() and ROW(), from Section 4.2.

John

A. Kretschmer wrote:

am  Mon, dem 18.02.2008, um  8:22:14 -0500 mailte John Gunther folgendes:

What's the most straightforward way to populate an array from a select 
statement? For example, using a fictional extension of SQL syntax, I'd like:

update users set emails=ARRAY[select email from address where userid=25] 
where id=25;

So if user 25  has emails john@domain.com, john@gmail.com, and 
john@yahoo.com in the address table,
select emails from user where id=25;
will return:                                        emails
----------------------------------------------------------------
{john@domain.com, john@gmail.com, and john@yahoo.com}

You can use array_to_string() and array().

Example:

test=*# select * from mail_adr ;id |     email
----+--------------- 2 | foo@bar 2 | bar@batz 2 | foobar@barfoo
(3 rows)

test=*# select array_to_string(array(select email from mail_Adr where id=2), ', ');        array_to_string
----------------------------------foo@bar, bar@batz, foobar@barfoo
(1 row)


Andreas