Thread: Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

From
Zach Seaman
Date:
I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.

This is a similar question to this one, so I have encoded a database with LATIN-1 as suggested but can't copy a CSV file into a table within the database.

ERROR: invalid byte sequence for encoding "UTF8": 0xe17371

Googling doesn't get me anywhere and I am working with Spanish characters.


Thanks again all,

Zach Seaman

Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

From
Gurjeet Singh
Date:
On Wed, Feb 6, 2013 at 7:56 PM, Zach Seaman <znseaman@gmail.com> wrote:
> I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.
>
> This is a similar question to this one, so I have encoded a database with LATIN-1 as suggested but can't copy a CSV file into a table within the database.
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe17371
>
> Googling doesn't get me anywhere and I am working with Spanish characters.

I think the data in your CSV file should match the client_encoding parameter.

What is your client_encoding parameter set to?

show client_encoding;

--
Gurjeet Singh

http://gurjeet.singh.im/

Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

From
Jaime Casanova
Date:
On Wed, Feb 6, 2013 at 7:56 PM, Zach Seaman <znseaman@gmail.com> wrote:
> I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.
>
> This is a similar question to this one, so I have encoded a database with
> LATIN-1 as suggested but can't copy a CSV file into a table within the
> database.
>

well, that mail is from 2005... what version of postgres are you running at?

> ERROR: invalid byte sequence for encoding "UTF8": 0xe17371
>

run:

SET client_encoding TO UTF8;

before running the copy command, or maybe set to LATIN1

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566         Cell: +593 987171157


I think the problem may be that specific character translation.

The chart I typically use is here: http://www.utf8-chartable.de/unicode-utf8-table.pl

The 'valid' UTF-8 codes jump from
0x e0 bf bf (at the bottom of this page: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=3840 )
To: 0x e1 80 80 (at the top of this page: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4096 )

So - the problem may be that truly 0x e1 73 71 is not a valid UTF-8 character in the current iteration of PostgreSQL - or at all.

Just my thoughts.

Ken
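
The byte pattern described above can be checked by hand. The following is a minimal Python sketch, not part of the original message; the helper name is_continuation is made up for illustration. It assumes only the standard UTF-8 rule that a lead byte of the form 1110xxxx must be followed by two continuation bytes in the range 0x80-0xBF.

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF).
    return 0x80 <= byte <= 0xBF


# The three bytes reported in the error message.
seq = bytes([0xE1, 0x73, 0x71])

# 0xE1 has the pattern 1110xxxx, which announces a three-byte character,
# so the next two bytes would both have to be continuation bytes.
lead_is_three_byte = (seq[0] & 0xF0) == 0xE0
followers_ok = all(is_continuation(b) for b in seq[1:3])

# Python's own decoder rejects the sequence for the same reason.
try:
    seq.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(lead_is_three_byte, followers_ok, decoded_ok)
```

Here 0x73 and 0x71 are the plain ASCII letters "s" and "q", so they cannot continue a multi-byte character, which is consistent with the sequence being rejected as UTF-8.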
On 2/7/2013 7:03 AM, Jaime Casanova wrote:
> run:
>
> SET client_encoding TO UTF8;
>
> before running the copy command, or maybe set to LATIN1


I'm running PostgreSQL 9.1


On Thu, Feb 7, 2013 at 9:03 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> well, that mail is from 2005... what version of postgres are you running at?



--
Zach Seaman
GIS Expert, IRRI-México

Master of Regional & Community Planning
m 55.2247.1740 (México)
m 01.913.4860.832 (U.S.)

Ken Benson <ken@infowerks.com> writes:
> So - the problem may be that truly 0x e1 73 71 is not a valid UTF-8
> character in the current iteration of PostgreSQL - or at all.

Of course it isn't, which is why Postgres is complaining.  Presumably
what that data really is is three characters (looks like "ásq") in
LATIN1.  But Postgres is trying to interpret it in UTF8.  As mentioned
upthread, the solution is to adjust the client_encoding setting before
running the COPY command.

            regards, tom lane
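
To make the point above concrete, here is a small Python sketch (not from the thread) that reads the three offending bytes as LATIN1 and then re-encodes them as UTF-8. It only illustrates the kind of conversion one would expect between a LATIN1 client encoding and a UTF8 database; it is not PostgreSQL's actual code path.

```python
# The three bytes from the error message.
raw = bytes([0xE1, 0x73, 0x71])

# Interpreted as LATIN1, each byte is one character.
as_latin1 = raw.decode("latin-1")

# The same three characters encoded as UTF-8: the accented letter now takes two bytes.
as_utf8_bytes = as_latin1.encode("utf-8")

print(as_latin1)
print(as_utf8_bytes)
```

The accented letter is the single byte 0xE1 in LATIN1 but the two bytes 0xC3 0xA1 in UTF-8, which is why the file cannot be read under both encodings at once.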


I changed from LATIN1, set my database to UTF8, and my client_encoding is UTF8.


ERROR:  invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]

Is it a trial and error type problem now?



On Thu, Feb 7, 2013 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Presumably
> what that data really is is three characters (looks like "ásq") in
> LATIN1.  But Postgres is trying to interpret it in UTF8.  As mentioned
> upthread, the solution is to adjust the client_encoding setting before
> running the COPY command.


--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice



--
Zach Seaman

Keeping the names intact would be helpful. Whatever I change it to, I receive the same error because of the first entry.

I've encoded the csv using Notepad++ to UTF8 and still no luck.

I think "á" followed by the next 2 characters causes the problem. Is there a better encoding for special characters? Is this possible in WIN-1252?


On Thu, Feb 7, 2013 at 10:51 AM, Zach Seaman <znseaman@gmail.com> wrote:
> I changed from LATIN1, set my database to UTF8, and my client_encoding is UTF8.
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe17320
> ás[space]
>
> Is it a trial and error type problem now?




--
Zach Seaman

Zach Seaman <znseaman@gmail.com> writes:
> I changed from LATIN1, set my database to UTF8, and my client_encoding is
> UTF8.

> ERROR:  invalid byte sequence for encoding "UTF8": 0xe17320
> ás[space]

No, the client encoding needs to be LATIN1 to read this file.

            regards, tom lane
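
Rather than finding the right setting by trial and error, the file's bytes can be inspected before running COPY. The following Python sketch is only a heuristic and is not from the thread; the function name guess_client_encoding is hypothetical. It assumes the file is either UTF-8 or a single-byte Western encoding. Every byte sequence decodes under Latin-1, so "LATIN1" is a fallback guess and the file could just as well be WIN1252.

```python
def guess_client_encoding(data: bytes) -> str:
    """Suggest a PostgreSQL client_encoding value for the given file bytes."""
    try:
        data.decode("utf-8")
        return "UTF8"
    except UnicodeDecodeError:
        # Not valid UTF-8. Latin-1 accepts any byte, so this is a guess,
        # not a positive identification.
        return "LATIN1"


# The bytes from the second error message: 0xe1 0x73 0x20.
print(guess_client_encoding(b"\xe1s "))

# The same text once it has really been converted to UTF-8.
print(guess_client_encoding("\u00e1s ".encode("utf-8")))
```

The first call suggests LATIN1 and the second UTF8, which matches the advice above that this particular file has to be read with a LATIN1 client encoding.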


Ok, client encoding is back to LATIN1.

Do I have to sacrifice the readability of these names or is there a way to work around this invalid byte sequence problem?



On Thu, Feb 7, 2013 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> No, the client encoding needs to be LATIN1 to read this file.



--
Zach Seaman

On Thu, Feb 7, 2013 at 12:05 PM, Zach Seaman <znseaman@gmail.com> wrote:
>
> Keeping the names intact would be helpful. Whatever I change it to, I receive the same error because of the first entry.
>
> I've encoded the csv using Notepad++ to UTF8 and still no luck.
>
> I think "á" followed by the next 2 characters causes the problem. Is there a better encoding for special characters? Is this possible in WIN-1252?


Zach,
I've been bitten by this misunderstanding myself.   Changing the file
encoding in Notepad++  just changes a few bytes at the very beginning
of the file to indicate that it's supposed to be read as your new
encoding.  It does not automatically go through the file converting
characters like "à" from the single byte 224 (decimal) of LATIN1
encoding to the two-byte UTF-8 encoding of U+00E0.   Maybe some other text
editors support actually re-encoding the characters in the file for
you, I don't know.

Good luck,
-Mike Swierczek
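
For anyone who would rather convert the file than change client_encoding, the re-encoding described above can be done outside the editor. This is a minimal Python sketch, not from the thread; the function name and the file names are made up for the demonstration, and it assumes the source file really is LATIN1.

```python
import os
import tempfile


def reencode_latin1_to_utf8(src_path: str, dst_path: str) -> None:
    # Read the bytes as LATIN1 text, then write the same characters as UTF-8.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="latin-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration on a throwaway file containing the single LATIN1 byte 0xE0.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "names_latin1.csv")
    dst_file = os.path.join(tmp, "names_utf8.csv")
    with open(src_file, "wb") as f:
        f.write(b"nombre\n\xe0\n")
    reencode_latin1_to_utf8(src_file, dst_file)
    with open(dst_file, "rb") as f:
        converted = f.read()

print(converted)
```

After the conversion the byte 0xE0 has become the two bytes 0xC3 0xA0, so the resulting file should load with client_encoding set to UTF8.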