Thread: unconvertable characters

unconvertable characters

From

Sim Zacks

Date:

16 July 2007, 13:24:00

My 8.0.1 database is using ISO_8859_8 encoding. When I select specific fields I get a warning:
WARNING:  ignoring unconvertible ISO_8859_8 character 0x00c2

I now want to upgrade my database to 8.2.4 and change the encoding to UTF-8.
When the restore is done, I get the following errors:
pg_restore: restoring data for table "manufacturers_old"
pg_restore: [archiver (db)] Error from TOC entry 4836; 0 9479397 TABLE DATA manufacturers postgres
pg_restore: [archiver (db)] COPY failed: ERROR:  character 0xc2 of encoding "ISO_8859_8" has no equivalent in "UTF8"
CONTEXT:  COPY manufacturers_old, line 331

And no data is put into the table.
Is there a function I can use to replace the unconvertible characters with blanks?
Such as:
update manufacturers set manufacturername=replace(manufacturername,0x00c2,'')
(That query doesn't work.)
Or is there another way to simply get rid of any characters that cannot be converted?

Thank You
Sim

Re: unconvertable characters

From

Michael Fuhr

Date:

16 July 2007, 14:19:39

On Mon, Jul 16, 2007 at 04:20:22PM +0300, Sim Zacks wrote:
> My 8.0.1 database is using ISO_8859_8 encoding. When I select specific
> fields I get a warning:
> WARNING:  ignoring unconvertible ISO_8859_8 character 0x00c2

Did any of the data originate on Windows?  Might the data be in
Windows-1255 or some encoding other than ISO-8859-8?  In Windows-1255
0xc2 represents <U+05B2 HEBREW POINT HATAF PATAH> -- does that
character seem correct in the context of the data?

http://en.wikipedia.org/wiki/Windows-1255
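
The byte-level claim above can be checked locally. This is a minimal sketch, assuming Python's built-in cp1255 and iso8859_8 codecs agree with the tables PostgreSQL uses for WIN1255 and ISO_8859_8 (an assumption, not something checked against the server):

```python
import unicodedata

raw = b"\xc2"  # the byte reported in the warning

# Windows-1255 assigns 0xC2 to a Hebrew vowel point.
as_win1255 = raw.decode("cp1255")
win1255_name = unicodedata.name(as_win1255)

# ISO-8859-8 leaves 0xC2 unassigned, so strict decoding fails.
try:
    raw.decode("iso8859_8")
    iso_8859_8_ok = True
except UnicodeDecodeError:
    iso_8859_8_ok = False

print(f"cp1255: U+{ord(as_win1255):04X} {win1255_name}")
print(f"iso8859_8 can decode 0xC2: {iso_8859_8_ok}")
```

If the Hebrew point makes no sense where it appears in the data, the data is probably not Windows-1255 either.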

> I now want to upgrade my database to 8.2.4 and change the encoding to UTF-8.
> When the restore is done, I get the following errors:
> pg_restore: restoring data for table "manufacturers_old"
> pg_restore: [archiver (db)] Error from TOC entry 4836; 0 9479397 TABLE DATA
> manufacturers postgres
> pg_restore: [archiver (db)] COPY failed: ERROR:  character 0xc2 of encoding
> "ISO_8859_8" has no equivalent in "UTF8"
> CONTEXT:  COPY manufacturers_old, line 331
>
> And no data is put into the table.
> Is there a function I can use to replace the unconvertible characters with
> blanks?

If the data is in an encoding other than ISO-8859-8 then you could
redirect the output of pg_restore to a file or pipe it through a
filter and change the "SET client_encoding" line to whatever the
encoding really is.  For example, if the data is Windows-1255 then
you'd use the following:

SET client_encoding TO win1255;
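
The "pipe it through a filter" step could look roughly like the sketch below. The function name filter_dump and the regular expression are illustrative assumptions, and the exact form of the SET line in a given dump may differ. It works on bytes, since the dump may not decode cleanly in any single encoding, and rewrites only the first client_encoding line it finds:

```python
import re

# Matches "SET client_encoding = 'X';" as well as "SET client_encoding TO X;".
_CLIENT_ENCODING = re.compile(
    rb"^(\s*SET\s+client_encoding\s*(?:=|TO)\s*)'?[A-Za-z0-9_-]+'?(\s*;)",
    re.IGNORECASE,
)


def filter_dump(src, dst, encoding="WIN1255"):
    """Copy binary stream src to dst, replacing the first declared client_encoding.

    Returns True if a SET client_encoding line was found and rewritten.
    """
    replacement = rb"\g<1>'" + encoding.encode("ascii") + rb"'\g<2>"
    replaced = False
    for line in src:
        if not replaced:
            line, count = _CLIENT_ENCODING.subn(replacement, line, count=1)
            replaced = count > 0
        dst.write(line)
    return replaced
```

As a filter it would be called with sys.stdin.buffer and sys.stdout.buffer, sitting between pg_restore (writing its script to standard output) and psql.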

Another possibility would be to use a command like iconv to convert
the data to UTF-8 and strip unconvertible characters; on many systems
you could do that with "iconv -f iso8859-8 -t utf-8 -c".  If you
convert to UTF-8 then you'd need to change client_encoding accordingly.
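
Where iconv is not available, the same idea can be sketched in Python. The helper name strip_unconvertible is hypothetical, and the sketch assumes Python's iso8859_8 codec rejects the same bytes that PostgreSQL does:

```python
def strip_unconvertible(raw: bytes, source_encoding: str = "iso8859_8") -> bytes:
    """Re-encode raw as UTF-8, silently dropping bytes the source codec cannot decode."""
    return raw.decode(source_encoding, errors="ignore").encode("utf-8")


# 0xE0 and 0xE1 are alef and bet in ISO-8859-8; 0xC2 is unassigned and gets dropped.
sample = b"\xe0\xc2\xe1"
cleaned = strip_unconvertible(sample)
print(cleaned.decode("utf-8"))
```

Like "iconv -c", this discards data without reporting what was lost, so it is worth running on a copy of the dump first.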

--
Michael Fuhr

Re: unconvertable characters

From

Sim Zacks

Date:

16 July 2007, 15:00:47

Michael,

I have been manually debugging and each symbol is different, though they each give the same error code. For example, in
one it was a pound sign, though when I did an update and put in the pound sign it worked.
Another time it was the degree symbol.
I'm going to look at iconv, as that sounds like the best possibility.

Sim
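
One possible explanation for the pound sign and degree symbol, not confirmed anywhere in this thread: both characters encode in UTF-8 as a 0xC2 lead byte followed by the same byte ISO-8859-8 uses for them. If UTF-8 text was stored in the ISO_8859_8 database, each such symbol would carry a stray 0xC2 in front of it. A minimal sketch of that observation:

```python
symbols = ("\u00a3", "\u00b0")  # pound sign, degree sign

# UTF-8 encodes each symbol as two bytes, the first being 0xC2.
utf8_bytes = {s: s.encode("utf-8") for s in symbols}

# Decoding only the second byte as ISO-8859-8 gives the symbol back,
# so the visible character looks right while 0xC2 stays hidden.
trailing_matches = all(
    encoded[1:].decode("iso8859_8") == s for s, encoded in utf8_bytes.items()
)

for s, encoded in utf8_bytes.items():
    print(f"U+{ord(s):04X} -> {encoded.hex(' ')}")
print(f"trailing byte decodes to the same symbol: {trailing_matches}")
```

That would also be consistent with retyping the symbol working, since a freshly typed pound sign is sent as the single byte 0xA3.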


Re: unconvertable characters

From

Sim Zacks

Date:

18 July 2007, 10:33:09

I fixed my data, but I did it manually. It seems like there were hidden characters, which may actually be the 0xc2
(which should not have been there). The data must have been pasted in somehow, but when I copied the value and pasted it
back in (or ran an update statement; I tried both), the same value, including the character that looked like it was
wrong, worked fine.
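
For a manual cleanup like this one, it can help to list the offending lines up front rather than discovering them one failed COPY at a time. A rough sketch, assuming the data has been dumped to a file and that Python's iso8859_8 codec flags the same bytes PostgreSQL does; find_undecodable is a hypothetical helper name:

```python
def find_undecodable(lines, encoding="iso8859_8"):
    """Return (line_number, byte_offsets) for each line with undecodable bytes.

    Works for single-byte encodings, where character index equals byte offset.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        # errors="replace" turns every undecodable byte into U+FFFD.
        decoded = line.decode(encoding, errors="replace")
        offsets = [i for i, ch in enumerate(decoded) if ch == "\ufffd"]
        if offsets:
            hits.append((number, offsets))
    return hits


sample_lines = [b"plain ascii\n", b"price \xc2\xa3 5\n"]
report = find_undecodable(sample_lines)
print(report)
```

Passing an open binary file instead of sample_lines gives line numbers that can be compared with the "COPY ..., line N" context in the error.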



