Thread: UTF characters compromising data import.
Hi Everyone,

I am trying to import some data (provided to us by an external source) from a CSV file using "\copy ...".

But I get the following error message:

invalid byte sequence for encoding "UTF8": 0xfd
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I understand the error message - but what I don't know is what I need to set the encoding to in order to import / use the data.

As always - thanks in advance for any help you might be able to provide.

Gavin "Beau" Baumanis
Hello

2011/2/8 Gavin Beau Baumanis <beau@palcare.com.au>:
> I understand the error message - but what I don't know is what I need to set the encoding to - in order to import / use the data.

It is impossible to import the data without knowing its encoding. You can use a utility that tries to detect the encoding, for example enca: http://linux.die.net/man/1/enca

Regards
Pavel Stehule
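As a rough illustration of how one might narrow down the encoding by hand, the sketch below tries a list of candidate encodings and reports which ones decode the bytes without error. The function name `candidate_encodings`, the sample bytes, and the candidate list are all made up for illustration; this is not what enca does internally. Note that single-byte encodings such as LATIN1 accept every byte value, so a successful decode only means "possible", not "correct"; the decoded text still has to be checked by eye.

```python
from typing import List, Sequence


def candidate_encodings(data: bytes, candidates: Sequence[str]) -> List[str]:
    """Return the candidate encodings under which `data` decodes without error."""
    usable = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence; rule it out.
            continue
        usable.append(name)
    return usable


# Hypothetical sample containing the 0xfd byte from the error message.
sample = b"Jir\xfd"
print(candidate_encodings(sample, ["ascii", "utf-8", "latin-1"]))
```

For real data, one would read the CSV file in binary mode and pass its contents (or the offending lines) to the function, then inspect the decoded text for each surviving candidate.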
On 2011-02-08, Gavin Beau Baumanis <beau@palcare.com.au> wrote:
> I understand the error message - but what I don't know is what I
> need to set the encoding to - in order to import / use the data.

If you run it through

iconv --from-code=ASCII --to-code=UTF8 -c

it will strip out all the non-ASCII symbols. Without knowing the encoding, it is impossible to assign any useful meaning to them. This step may render your data useless; it would be much better to find out what the encoding should be.

Perhaps you can figure it out by observation?

--
⚂⚃ 100% natural
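To make the data loss concrete, here is a small Python sketch of the same idea: decode as ASCII and silently drop every byte that does not fit. It is only an approximation of what iconv with -c does, and the function name and sample bytes are hypothetical.

```python
def strip_non_ascii(data: bytes) -> str:
    """Decode as ASCII, discarding any byte outside the ASCII range."""
    # errors="ignore" drops undecodable bytes instead of raising an error.
    return data.decode("ascii", errors="ignore")


# Hypothetical sample: the 0xfd byte simply disappears from the result.
print(strip_non_ascii(b"Jir\xfd, Praha"))
```

The import would then succeed, but every affected value would be missing characters, which is why finding the real encoding is the better route.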
Hi, and thanks for the replies.

I have had some luck. I found the encoding originally used to create the text files I am trying to import. I set the client_encoding environment variable to match and then managed to import the data successfully.

Gavin.

On 12/02/2011, at 8:15 PM, Jasen Betts wrote:
> This step may render your data useless, it would be much better to
> find out what the encoding should be.
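Once the source encoding is known, an alternative to changing the client encoding is to convert the file to UTF-8 before running \copy. The sketch below shows one way to do that in Python. The thread does not say which encoding the files actually used, so "latin-1" here is purely an assumption for the example, and the function names and file names are hypothetical.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    # decode() raises UnicodeDecodeError if the data does not match
    # source_encoding, which is a useful sanity check on the guess.
    return data.decode(source_encoding).encode("utf-8")


def transcode_file(src_path: str, dst_path: str, source_encoding: str) -> None:
    """Write a UTF-8 copy of src_path to dst_path."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(transcode_to_utf8(raw, source_encoding))


# Assumed encoding for illustration only: 0xfd becomes the two-byte
# UTF-8 sequence for U+00FD.
print(transcode_to_utf8(b"Jir\xfd", "latin-1"))
```

A call such as transcode_file("import.csv", "import_utf8.csv", "latin-1") would then produce a file that can be loaded with the client encoding left at UTF8.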