Thread: UTF characters compromising data import.

UTF characters compromising data import.

From
Gavin Beau Baumanis
Date:
Hi Everyone,

I am trying to import some data (provided to us from an external source) from a CSV file using "\copy ...."

But I get the following error message;
invalid byte sequence for encoding "UTF8": 0xfd
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I understand the error message - but what I don't know is what I need to set the encoding to - in order to import / use the data.

As always - thanks in advance for any help you might be able to provide.


Gavin "Beau" Baumanis


Re: UTF characters compromising data import.

From
Pavel Stehule
Date:
Hello

2011/2/8 Gavin Beau Baumanis <beau@palcare.com.au>:
> Hi Everyone,
>
> I am trying to import some data (provided to us from an external source) from a CSV file using "\copy ...."
>
> But I get the following error message;
> invalid byte sequence for encoding "UTF8": 0xfd
> HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
>
> I understand the error message - but what I don't know is what I need to set the encoding to - in order to import / use the data.
>

It is impossible to import data without knowing its encoding.

You can use some utilities that try to guess the encoding, for example:

http://linux.die.net/man/1/enca
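
To illustrate why the encoding has to be known, here is a minimal Python sketch (not part of the original reply, and not how enca itself works) that tries a few candidate encodings on the offending byte. The candidate list and the helper name `plausible_encodings` are assumptions chosen for illustration. Several single-byte encodings accept 0xfd without complaint, but each one maps it to a different character, so a successful decode alone does not prove the guess is right.

```python
# Hypothetical candidate list; extend it with whatever the data source might use.
CANDIDATES = ["utf-8", "latin-1", "iso8859-9", "cp1251"]


def plausible_encodings(raw, candidates=CANDIDATES):
    """Return (encoding, decoded_text) pairs for each candidate that decodes raw."""
    results = []
    for name in candidates:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            # This candidate cannot represent the byte sequence at all.
            pass
    return results


# The byte reported in the error message from the first post.
for name, text in plausible_encodings(b"\xfd"):
    print(f"{name}: U+{ord(text):04X} {text}")
```

UTF-8 is rejected outright, which matches the server's error, while the remaining candidates all "work" and disagree about the character. Looking at the decoded text in context (names, place names, currency symbols) is usually what settles it.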

Regards

Pavel Stehule


> As always - thanks in advance for any help you might be able to provide.
>
>
> Gavin "Beau" Baumanis
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: UTF characters compromising data import.

From
Jasen Betts
Date:
On 2011-02-08, Gavin Beau Baumanis <beau@palcare.com.au> wrote:

> I understand the error message - but what I don't know is what I
> need to set the encoding to - in order to import  / use the data.    

if you run it through  
 iconv --from-code=ASCII --to-code=UTF8 -c

it'll strip out all the non-ASCII symbols. Without knowing the
encoding it's impossible to assign any useful meaning to them.
This step may render your data useless, so it would be much better to
find out what the encoding should be.

perhaps you can figure it out by observation?
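
As a rough illustration of what that stripping step costs, the following Python sketch drops every non-ASCII byte in a similar spirit to the iconv command with -c. The sample row and the helper name `strip_non_ascii` are made up for this example; it is a sketch of the idea, not a byte-for-byte equivalent of iconv.

```python
def strip_non_ascii(raw):
    """Decode as ASCII and silently discard every byte that is not ASCII."""
    return raw.decode("ascii", errors="ignore")


# Hypothetical CSV row containing three non-ASCII bytes (0xf0, 0xf3, 0xde).
raw = b"Gar\xf0ar,J\xf3n \xdeorsson\n"

cleaned = strip_non_ascii(raw)
dropped = len(raw) - len(cleaned.encode("ascii"))

print(repr(cleaned))
print("bytes dropped:", dropped)
```

The import would now succeed, but the names in the row are silently mangled, which is why finding the real encoding is the better route.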

-- 
⚂⚃ 100% natural


Re: UTF characters compromising data import.

From
Gavin Beau Baumanis
Date:
Hi and thanks for the replies,

I have had some luck.
I did find the encoding used originally to create the text files I am trying to import.

I used the client_encoding setting to match it and then managed to import the data successfully.
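
For readers who would rather convert the file than change client_encoding, here is a minimal Python sketch of that alternative: once the source encoding is known, the file can be transcoded to UTF-8 before running \copy. The thread does not say which encoding the files used, so "latin-1" below is purely an assumption, as are the file names and the helper name `transcode_to_utf8`.

```python
import tempfile
from pathlib import Path


def transcode_to_utf8(src, dst, source_encoding):
    """Rewrite src as UTF-8 in dst, keeping line endings unchanged."""
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


# Demo on a throwaway file; "latin-1" is an assumed source encoding.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.csv"
    dst = Path(tmp) / "source_utf8.csv"
    src.write_bytes(b"id,name\n1,\xfdmir\n")
    transcode_to_utf8(src, dst, "latin-1")
    converted = dst.read_bytes()

print(converted)
```

The resulting file can be loaded with the default UTF8 client encoding. Setting client_encoding in the session, as done here, achieves the same thing without touching the file, since the server then performs the conversion itself.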

Gavin.




On 12/02/2011, at 8:15 PM, Jasen Betts wrote:

> On 2011-02-08, Gavin Beau Baumanis <beau@palcare.com.au> wrote:
>
>> I understand the error message - but what I don't know is what I
>> need to set the encoding to - in order to import  / use the data.
>
> if you run it through
>
>  iconv --from-code=ASCII --to-code=UTF8 -c
>
> it'll strip out all the non-ascii symbols,  without knowing the
> encoding it's impossible to assign any useful meaning to them.
> This step may render your data useless, it would be much better to
> find out what the encoding should be.
>
> perhaps you can figure it out by observation?
>
> --
> ⚂⚃ 100% natural
>