Re: Hex characters in COPY input - Mailing list pgsql-general

From Melvin Call
Subject Re: Hex characters in COPY input
Msg-id CADGQN57rsoRL-tX7w1se0ysu2Zywsf=OJ-4GVVNRdgLNBA3LSQ@mail.gmail.com
In response to Hex characters in COPY input  (Melvin Call <melvincall979@gmail.com>)
List pgsql-general
On 2/27/15, Adam Hooper <adam@adamhooper.com> wrote:
> On Thu, Feb 26, 2015 at 9:50 PM, Melvin Call <melvincall979@gmail.com>
> wrote:
>
>> So my question is, how do I sanitize the hex character in the middle of a
>> word to be able to copy in Montreal with an accented e? Or am I going about
>> this at the wrong point?
>
> Hi Melvin,
>
> This is not a Postgres problem, and it is not a regex problem. So yes,
> you're going about it at the wrong point: you're trying to modify a
> _character_ at a time, but you _should_ be trying to modify a _byte_
> at a time. Text replacement cannot do what you want it to do.
>
> If you're on Linux or Mac, iconv will work; for instance:
> `iconv --from-code=windows-1252 --to-code=utf-8 < input-file.txt > output-file.txt`
>
> Otherwise, you can use a text editor. Be sure to open the file with
> the correct encoding (such that é appears) and then save it as utf-8.
>
> Alternatively, you could tell Postgres to use your existing encoding
> -- judging from the \xe9, any of "windows-1252", "iso-8859-15" or
> "iso-8859-1" will be accurate. But I always prefer my data to be
> stored as "utf-8", and you should, too.
>
> Read up on character sets here:
> http://www.joelonsoftware.com/articles/Unicode.html
>
> Enjoy life,
> Adam
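Adam's point about bytes versus characters can be checked with a short sketch. Python is used here only for illustration (nothing in the thread uses it), and the helper names `to_utf8` and `convert_file` are hypothetical. The sketch shows that the single byte 0xE9 reads as "é" in windows-1252, but is not valid UTF-8 on its own. It also shows that the three encodings Adam lists agree on 0xE9 yet disagree on some other bytes, so the choice can matter for other characters in the file.

```python
# Sketch only: hypothetical helpers illustrating byte-level re-encoding.
# Python codec names for the three candidates mentioned in the thread:
# windows-1252, iso-8859-15, iso-8859-1.
LEGACY_CANDIDATES = ("cp1252", "iso8859_15", "latin_1")


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a single-byte legacy encoding to UTF-8.

    Strict decoding: raises UnicodeDecodeError on bytes that are undefined
    in the source encoding (cp1252 leaves a few bytes such as 0x81 unmapped).
    """
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src_path: str, dst_path: str, source_encoding: str = "cp1252") -> None:
    """Rough Python equivalent of
    iconv --from-code=windows-1252 --to-code=utf-8 < src > dst
    """
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8(raw, source_encoding))


if __name__ == "__main__":
    raw = b"Montr\xe9al"  # one byte per character, as in the COPY input

    # Read as UTF-8, 0xE9 starts a multi-byte sequence that the following
    # "a" does not continue, so decoding fails.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not UTF-8:", exc.reason, "at byte", exc.start)

    # All three candidate encodings map 0xE9 to "é" ...
    for enc in LEGACY_CANDIDATES:
        print(enc, raw.decode(enc))

    # ... but they differ elsewhere, e.g. on 0xA4 and 0x80.
    for byte in (b"\xa4", b"\x80"):
        print(byte, [repr(byte.decode(enc)) for enc in LEGACY_CANDIDATES])

    # In UTF-8 the single byte 0xE9 becomes the two-byte sequence C3 A9.
    print(to_utf8(raw))
```

This is why a character-level search-and-replace cannot help: the fix is to decode the bytes with the right source encoding and re-encode them, which is exactly what iconv (or the text editor's "save as utf-8") does.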


Thank you, Adam. I was able to make this work by adding the ENCODING 'latin1'
option to the COPY command, per Vic's suggestion and as you correctly pointed
out as well. However, iconv would probably do the trick too, now that I know
where the problem actually lies. Because the MySQL data is encoded in UTF8, I
failed to realize that I was not dealing with UTF8 here, but you saw what I
wasn't seeing. Your suggested reading is also most appreciated; maybe one of
these days I will actually make sense of this encoding issue.

Regards,
Melvin
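The root cause here was assuming the input file was UTF8. One way to check before loading is to try decoding the file as UTF-8 and report the first offending byte. The sketch below is hypothetical (the function names are invented for illustration, and Python is not part of the thread). Its suggestion is only a heuristic: any byte string decodes as latin-1, so "LATIN1" is a fallback rather than proof. If the data is really windows-1252 and contains bytes such as 0x80 (€), PostgreSQL's `WIN1252` encoding name would be the more accurate value for COPY's `ENCODING` option, e.g. `COPY ... FROM ... WITH (FORMAT csv, ENCODING 'LATIN1')`.

```python
# Sketch only: hypothetical pre-flight check before running COPY.
from typing import Optional, Tuple


def first_invalid_utf8(raw: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for the first invalid UTF-8 sequence,
    or None if the whole buffer decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start:exc.end]
    return None


def suggest_copy_encoding(raw: bytes) -> str:
    """Heuristic guess for COPY's ENCODING option (an assumption, not a rule):
    valid UTF-8 -> 'UTF8'; otherwise fall back to 'LATIN1', since every byte
    string decodes as latin-1. Check for windows-1252 bytes (0x80-0x9F) by hand."""
    return "UTF8" if first_invalid_utf8(raw) is None else "LATIN1"


if __name__ == "__main__":
    sample = b"id,city\n1,Montr\xe9al\n"  # hypothetical CSV line with a latin-1 é
    print(first_invalid_utf8(sample))
    print(suggest_copy_encoding(sample))
    # After converting to UTF-8, the same content passes the check.
    print(suggest_copy_encoding(sample.decode("latin_1").encode("utf-8")))
```

In practice you would read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `first_invalid_utf8`; the reported offset points straight at the byte COPY would reject.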

