Thread: bad unicode characters

bad unicode characters

From

Toby Tremayne

Date:

11 June 2003, 09:26:20

hi all,

        I've been trying to debug this for ages - I have a dump of a database
that I'm trying to restore to a new db created with "with encoding = 'unicode'"
and I get the following two errors:

ERROR:  copy: line 1298, Invalid UNICODE character sequence found (0xe46e67)
lost synchronization with server, resetting connection
ERROR:  copy: line 205, Invalid UNICODE character sequence found (0xed7427)
lost synchronization with server, resetting connection

Thing is I've tried desperately to locate the characters it's talking about
and I can't for the life of me.  I've tried line 1298 of the copy statement
that seems to be failing, I've tried line 1298 of the script, I've tried the
record in that copy statement that is noted as record 1298 - all of them look
fine to me in vi and shed.  I'm completely stumped - has anyone had this
problem before?  I'd appreciate any help offered - even if it's just how to
find these darn characters!

I'm using postgres 7.3.2 on SuSE 8.2 (linux 2.4.20-4GB-athlon)


cheers,
Toby
--

--------------------------------

  Life is poetry -
    write it in your own words

--------------------------------

Toby Tremayne
Code Poet and Zen Master of the Heavy Sleep
Senior Technical Consultant
Lyricist Software
www.lyricist.com.au
+61 416 048 090
ICQ: 13107913

Re: bad unicode characters

From

"Nigel J. Andrews"

Date:

11 June 2003, 09:32:00

On Wed, 11 Jun 2003, Toby Tremayne wrote:

> hi all,
>
>         I've been trying to debug this for ages - I have a dump of a database
> that I'm trying to restore to a new db created with "with encoding = 'unicode'"
> and I get the following two errors:
>
> ERROR:  copy: line 1298, Invalid UNICODE character sequence found (0xe46e67)
> lost synchronization with server, resetting connection
> ERROR:  copy: line 205, Invalid UNICODE character sequence found (0xed7427)
> lost synchronization with server, resetting connection
>
> Thing is I've tried desperately to locate the characters it's talking about
> and I can't for the life of me.  I've tried line 1298 of the copy statement
> that seems to be failing, I've tried line 1298 of the script, I've tried the
> record in that copy statement that is noted as record 1298 - all of them look
> fine to me in vi and shed.  I'm completely stumped - has anyone had this
> problem before?  I'd appreciate any help offered - even if it's just how to
> find these darn characters!
>
> I'm using postgres 7.3.2 on SuSE 8.2 (linux 2.4.20-4GB-athlon)

What encoding was your dump created in and is that client encoding being set
properly during the restore?

It's not likely that it has that wrong, but it's worth checking, as the only
other thing I can think of is that the conversion from the dump encoding to
unicode is broken.
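
For illustration, the two byte sequences quoted in the error messages can be
checked directly.  The sketch below (Python; the helper name describe is made
up for this example) tries each sequence as UTF-8 and then shows how it reads
as Latin-1.  That the data is Latin-1 is only a guess; nothing in the thread
confirms it.

```python
# Byte sequences copied from the two COPY error messages in this thread.
REPORTED = [bytes.fromhex("e46e67"), bytes.fromhex("ed7427")]


def describe(seq: bytes) -> str:
    """Report whether seq is valid UTF-8 and how it reads as Latin-1."""
    try:
        seq.decode("utf-8")
        utf8_status = "valid UTF-8"
    except UnicodeDecodeError as exc:
        utf8_status = f"invalid UTF-8 at byte {exc.start}"
    # Latin-1 maps every byte to a character, so this decode cannot fail.
    # Assumption: Latin-1 is only a candidate for the real source encoding.
    return f"{seq.hex()}: {utf8_status}; as Latin-1 reads {seq.decode('latin-1')!r}"


if __name__ == "__main__":
    for seq in REPORTED:
        print(describe(seq))
```

In both sequences the first byte (0xe4, 0xed) would have to start a three-byte
UTF-8 sequence, but it is followed by plain ASCII bytes, which is consistent
with single-byte text being fed to a database that expects UTF-8.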


--
Nigel Andrews

Re: bad unicode characters

From

Toby Tremayne

Date:

11 June 2003, 09:38:31

Hi Nigel,

    The database I dumped from originally was whatever the standard postgres
database encoding is - SQL_ASCII??

    A bit of background - the data comes from a Microsoft Access database which
was populated through a ColdFusion application.  Apparently ColdFusion 5
stored unicode characters incorrectly, so I suspect it's either that or
something bad pasted in from Word.  These characters were already in the data
before I dumped it out.  What I'm trying to do is convert the whole thing to
a proper unicode database, but to do that I need to weed out these bad
characters and I'm damned if I know how to even find them...
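
One possible approach to the conversion, sketched in Python under the
assumption that the stray bytes are Windows-1252 (plausible for text pasted
from Word, but not confirmed anywhere in this thread): keep lines that are
already valid UTF-8 and re-encode the rest.  The function names and file names
are placeholders, not part of any PostgreSQL tool.

```python
import sys


def to_utf8(line: bytes, fallback: str = "cp1252") -> bytes:
    """Return line as UTF-8, re-encoding from fallback if it is not valid UTF-8."""
    try:
        line.decode("utf-8")
        return line  # already valid (this includes pure ASCII)
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 was written as Windows-1252.
        # Bytes undefined in that code page become U+FFFD instead of raising.
        return line.decode(fallback, errors="replace").encode("utf-8")


def convert(src_path: str, dst_path: str) -> int:
    """Copy src to dst as UTF-8 and return the number of lines rewritten."""
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            fixed = to_utf8(line)
            changed += fixed != line
            dst.write(fixed)
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fix_dump.py dump.sql dump.utf8.sql
    print(convert(sys.argv[1], sys.argv[2]), "lines re-encoded")
```

If the fallback guess is wrong the output will still load, but with the wrong
characters, so it is worth checking a few of the rewritten lines by eye.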

cheers,
Toby



Re: bad unicode characters

From

Joseph Shraibman

Date:

11 June 2003, 20:48:38

Toby Tremayne wrote:
> Hi Nigel,
>
>     the database I dumped from originally was whatever the standard postgres
> database is - SQL_ASCII??
>
Do a \encoding SQL_ASCII in the dump file after each \connect

Re: bad unicode characters

From

Toby Tremayne

Date:

12 June 2003, 08:22:01

> Do a \encoding SQL_ASCII in the dump file after each \connect


tried this but it doesn't change the results at all... argh!

cheers,
Toby

Re: bad unicode characters

From

"Nigel J. Andrews"

Date:

12 June 2003, 08:34:17

On Thu, 12 Jun 2003, Toby Tremayne wrote:

>> Do a \encoding SQL_ASCII in the dump file after each \connect
>
>
> tried this but it doesn't change the results at all... argh!

I'm not familiar with shed, and I'm no more than a novice with vi, but I
presume you've tried searching for the hex values given in the error message.

If not, you could always try od, or I'm sure there are other binary editors out
there.  Doesn't emacs do binary editing?
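
The search can also be scripted.  A minimal sketch in Python, assuming the dump
is a plain-text file that can be read line by line; the function name
find_invalid_utf8 and the file name in the usage comment are made up for
illustration.  It reports the line number, the byte offset within the line, and
the offending bytes in hex, in the same style as the hex value in the COPY
error.

```python
import sys
from typing import Iterator, Tuple


def find_invalid_utf8(path: str, context: int = 3) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, byte_offset, hex) for the first bad sequence on each bad line."""
    with open(path, "rb") as dump:  # binary mode: inspect raw bytes, no decoding
        for lineno, line in enumerate(dump, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as exc:
                # Show the failing byte plus a couple of bytes after it.
                bad = line[exc.start:exc.start + context]
                yield lineno, exc.start, bad.hex()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_bad_bytes.py dump.sql
    for lineno, offset, hexbytes in find_invalid_utf8(sys.argv[1]):
        print(f"line {lineno}, byte {offset}: 0x{hexbytes}")
```

Line numbers here are counted from the start of the file, so they may differ
from the line numbers in the server's COPY error.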

--
Nigel J. Andrews