Thread: how to ignore invalid byte sequence for encoding without using sql_ascii?

how to ignore invalid byte sequence for encoding without using sql_ascii?

From
"detrox@gmail.com"
Date:
I am importing the Wikipedia dump file into my PostgreSQL database using
maintenance/importDump.php. It fails with 'ERROR: invalid byte sequence
for encoding UTF-8'. Is there any way to make PostgreSQL simply ignore the
invalid characters (I mean drop the invalid ones), so that the
script keeps going instead of dying on this error?

I know that I can use SQL_ASCII or even modify importDump.php,
but those are not as easy to do as I thought.

Thanks for the help.


Re: how to ignore invalid byte sequence for encoding without using sql_ascii?

From
Martijn van Oosterhout
Date:
On Thu, Sep 27, 2007 at 02:28:27AM -0700, detrox@gmail.com wrote:
> I am importing the Wikipedia dump file into my PostgreSQL database using
> maintenance/importDump.php. It fails with 'ERROR: invalid byte sequence
> for encoding UTF-8'. Is there any way to make PostgreSQL simply ignore the
> invalid characters (I mean drop the invalid ones), so that the
> script keeps going instead of dying on this error?

No, postgres does not destroy data. If you want bits of your data
removed, you need to write your own tool to do it.
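
As a rough illustration of what such a tool might look like, here is a minimal Python sketch that drops every byte that is not part of a valid UTF-8 sequence. The function names and the chunk size are my own choices for illustration, not anything from importDump.php or PostgreSQL, and silently dropping bytes does lose data, so it is worth keeping the original file.

```python
import codecs


def strip_invalid_utf8(raw: bytes) -> bytes:
    """Return raw with all bytes that do not form valid UTF-8 removed."""
    # errors="ignore" tells the decoder to skip undecodable bytes.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def clean_stream(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, dropping invalid UTF-8 on the way.

    An incremental decoder is used so that a multi-byte character split
    across two chunks is not mistaken for an invalid sequence.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush the decoder; a truncated sequence at end of input is dropped.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))
```

For a large dump, one could call clean_stream(sys.stdin.buffer, sys.stdout.buffer) from a small script and pipe the dump through it before handing it to the importer.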

That said, are you sure that the data you're importing is UTF-8?
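
One way to check this is to look at where decoding fails before deleting anything. The sketch below is only an assumption about how one might do that; find_invalid_utf8 is a hypothetical helper, not part of any existing tool. It reports the byte offset and the offending bytes of the first few problems. Lone bytes such as 0xE9 often suggest that part of the text is really Latin-1 or Windows-1252 rather than damaged UTF-8.

```python
def find_invalid_utf8(raw: bytes, limit: int = 10):
    """Return up to `limit` (offset, offending_bytes) pairs for invalid UTF-8."""
    problems = []
    pos = 0
    while pos < len(raw) and len(problems) < limit:
        try:
            # Slicing copies the remainder; acceptable for a quick check.
            raw[pos:].decode("utf-8")
            break  # the rest of the input is valid
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice being decoded.
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, raw[start:end]))
            pos = end  # resume just past the bad bytes
    return problems
```

An empty result means the input decodes cleanly as UTF-8, in which case the error is more likely coming from how the data is sent to the server than from the dump itself.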

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: how to ignore invalid byte sequence for encoding without using sql_ascii?

From
"detrox yang"
Date:
Got it. Thanks very much.