Re: BUG #3697: utf8 issue: can not reimport a table that was successfully exported. - Mailing list pgsql-bugs

From Marc Mamin
Subject Re: BUG #3697: utf8 issue: can not reimport a table that was successfully exported.
Date
Msg-id CA896D7906BF224F8A6D74A1B7E54AB301750C30@JENMAIL01.ad.intershop.net
In response to Re: BUG #3697: utf8 issue: can not reimport a table that was successfully exported.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Thank you for your quick response,

> if you don't quote backslashes in untrusted input you'll have problems
> far worse than this one

I do it now, but I have not been doing it since my db went live...
So I probably have some invalid characters in it.
Is this an issue that must be fixed before I can upgrade to 8.3?
Is there a recommended way to clean this data? (I know where to
search for it.)

Thanks,

Marc Mamin



-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, October 25, 2007 6:08 PM
To: Marc Mamin
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3697: utf8 issue: can not reimport a table that
was successfully exported.

"Marc Mamin" <m.mamin@intershop.de> writes:
> I didn't check if all characters are valid UTF8...

They aren't ...

> select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
> \xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
> \xE3\xEE\xF1\xF3\xE4
> xE4\xE6 \xCD\xC1 \xD0\xC1")');

In 8.3 that will throw an error:

utf8=# select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
utf8'# \xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
utf8'# \xE3\xEE\xF1\xF3\xE4
utf8'# xE4\xE6 \xCD\xC1 \xD0\xC1")');
WARNING:  nonstandard use of escape in a string literal
LINE 1: select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
                           ^
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR:  invalid byte sequence for encoding "UTF8": 0xe0f0f1
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
utf8=#

However, since this behavior isn't backwards-compatible, there's not
much appetite for back-patching it.
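
To illustrate why the bytes in the error message are rejected: 0xE0 opens
a three-byte UTF-8 sequence, so the two bytes after it would have to be
continuation bytes in the range 0x80-0xBF, and 0xF0 is not one. Here is a
minimal Python sketch of the same kind of check. The helper name
first_invalid_utf8 is hypothetical, and it relies on Python's strict
decoder, not on the server's own validation code.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending bytes) for the first invalid UTF-8
    sequence in data, or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset where decoding failed
        return exc.start, data[exc.start:exc.end]
    return None


# The first bytes of the rejected sequence, after a plain ASCII prefix
sample = b"Wind " + bytes([0xE0, 0xF0, 0xF1])
bad = first_invalid_utf8(sample)
ok = first_invalid_utf8("Wind".encode("utf-8"))
```

Run over values pulled from the suspect columns, a check like this could
help locate the rows that need cleaning before a reimport.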

I don't think this is a security issue --- if you don't quote
backslashes in untrusted input you'll have problems far worse than this
one.
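
A minimal sketch of what quoting backslashes means here, assuming the
server treats backslashes inside string literals as escapes, as in the
session shown earlier. The helper name quote_literal_legacy is
hypothetical, and it does not handle NUL characters. In real client code,
the driver's parameterized queries are usually a safer choice than
hand-built literals.

```python
def quote_literal_legacy(value: str) -> str:
    """Build an E'...' literal: double every backslash and single quote
    so the server sees them as plain characters, not as escapes."""
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "E'" + escaped + "'"


# Untrusted text containing the four characters \ x E 0, not the byte 0xE0
untrusted = r"Wind \xE0\xF0"
literal = quote_literal_legacy(untrusted)
```

With the backslashes doubled, the server stores the text \xE0 as written
and does not turn it into a raw byte that may be invalid in UTF8.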

            regards, tom lane
