Re: Problem with restoring dump (may be tsearch-related) - Mailing list pgsql-general

From Markus Wollny
Subject Re: Problem with restoring dump (may be tsearch-related)
Msg-id 2266D0630E43BB4290742247C8910575014CE3C5@dozer.computec.de
In response to Problem with restoring dump (may be tsearch-related)  ("Markus Wollny" <Markus.Wollny@computec.de>)
Responses Re: Problem with restoring dump (may be tsearch-related)  (Oleg Bartunov <oleg@sai.msu.su>)
Re: Problem with restoring dump (may be tsearch-related)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi!

The &uuml; is literally in the file - we are parsing all of our editor's
input for optimal HTML output, so the German umlauts are represented as
&[v]uml; where [v] is the corresponding vowel. Now that you mention it, I
believe that all of the strings in these "parse error at or near"
messages are actually preceded by an HTML umlaut or the like:

Just a snippet from my first example:
psql:alldb1.sql:1122826: ERROR:  parser: parse error at or near "ußerst"
would be "äußerst" -> "&auml;ußerst"
psql:alldb1.sql:1122826: ERROR:  parser: parse error at or near "chst"
could be "höchst" -> "h&ouml;chst"
psql:alldb1.sql:1122826: ERROR:  parser: parse error at or near "mmern"
could be "kümmern" -> "k&uuml;mmern"
psql:alldb1.sql:1122827: ERROR:  parser: parse error at or near "ren"
could be "Türen" -> "T&uuml;ren"
psql:alldb1.sql:1122827: ERROR:  parser: parse error at or near "rfer"
could be "Dörfer" -> "D&ouml;rfer"
psql:alldb1.sql:1122827: ERROR:  parser: parse error at or near "ndig"
could be "hintergründig" -> "hintergr&uuml;ndig"
psql:alldb1.sql:1122828: ERROR:  parser: parse error at or near
"henvorteile"
could be "Höhenvorteile" -> "H&ouml;henvorteile"
psql:alldb1.sql:1122828: ERROR:  parser: parse error at or near "hten"
could be "blühten" -> "bl&uuml;hten"
psql:alldb1.sql:1122829: ERROR:  parser: parse error at or near
"berqueren"
could be "überqueren" -> "&uuml;berqueren"
psql:alldb1.sql:1122829: ERROR:  parser: parse error at or near "cken"
could be "Lücken" -> "L&uuml;cken"
psql:alldb1.sql:1122830: ERROR:  parser: parse error at or near "ck"
could be "zurück" -> "zur&uuml;ck"
psql:alldb1.sql:1122831: ERROR:  parser: parse error at or near "hrend"
could be "führend" -> "f&uuml;hrend"
psql:alldb1.sql:1122831: ERROR:  parser: parse error at or near "ude"
could be "Gebäude" -> "Geb&auml;ude"
psql:alldb1.sql:1122831: ERROR:  parser: parse error at or near "nnen"
could be "können" -> "k&ouml;nnen"
psql:alldb1.sql:1122831: ERROR:  parser: parse error at or near
"berzeugen"
could be "überzeugen" -> "&uuml;berzeugen"
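
To check this pattern against more of the log, a minimal Python sketch
like the following could list the word fragment that follows each HTML
entity in a piece of text. The function name and the regular expression
are illustrative assumptions; the sketch only shows that the reported
tokens are what remains after the semicolon that closes an entity such
as &uuml;, and says nothing about why the parser ends up seeing them.

```python
import re

# Matches a named HTML entity such as &uuml; or &ouml; and captures the
# word characters that directly follow its closing semicolon.
ENTITY_THEN_WORD = re.compile(r"&[A-Za-z]+;(\w+)")


def fragments_after_entities(text):
    """Return the word fragments that directly follow an HTML entity."""
    return ENTITY_THEN_WORD.findall(text)


if __name__ == "__main__":
    # Sample taken from the context Tom quoted for the "ckenmuskeln" error.
    sample = "Wie sich die R&uuml;ckenmuskeln anspannen"
    print(fragments_after_entities(sample))
```

If the fragments it returns for a dump line match the tokens in the
error messages for that line, that would support the observation that
the parser picks up right after the entity's semicolon.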

As txtidx actually just contains substrings and ignores the HTML-umlauts
(a slight disadvantage we are quite happy to live with), it only stores
those substrings before or after ampersand or semicolon anyway - which
shouldn't cause any problems whatsoever, so I think we might rule out
tsearch being the cause. But why would ordinary plain text cause these
parse errors?

What shall I do next in order to get down to the problem itself?
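
One possible next step, sketched here in Python, would be to look at the
raw bytes of a reported line so that nothing re-encodes or html-izes
them on the way to the screen. This assumes that the line numbers psql
reports count newline-terminated lines of the dump file; the helper
name raw_line is hypothetical, and the file name and line number in the
usage comment are simply taken from the error messages above.

```python
def raw_line(path, lineno):
    """Return the raw bytes of the 1-based line `lineno`, or None if absent."""
    with open(path, "rb") as dump:  # binary mode: no decoding is applied
        for current, line in enumerate(dump, start=1):
            if current == lineno:
                return line
    return None


# Example usage (file name and line number from the error messages):
#   print(repr(raw_line("alldb1.sql", 1122826)))
# repr() shows every non-ASCII or control byte as an escape sequence, so
# it is visible whether the umlaut is stored as an entity or as raw bytes.
```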

Regards,

    Markus

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, 5 September 2002 18:23
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Problem with restoring dump (may be
> tsearch-related)
>
>
> "Markus Wollny" <Markus.Wollny@computec.de> writes:
> > The entries are quite long, and I don't want to cause too much
> > traffic, so I don't dare to give you more than this one example:
>
> > Restore-attempt outputs e.g.:
> > psql:alldb1.sql:1434914: ERROR:  parser: parse error at or near
> > "ckenmuskeln"
>
> Hmm.  I see that string in the context
>
> > Wie sich die R&uuml;ckenmuskeln anspannen, wird im Bild aber nicht
>
> What exactly is the string that you've represented here as &uuml; ?
> Is that literally what's in the dump file, or has something helpfully
> html-ized some weird Unicode sequence?
>
> As far as I can tell, what must be happening is that the COPY data
> transfer has been terminated and the regular SQL parser is trying to
> make sense of the input starting at "ckenmuskeln anspannen,".  I'm
> wondering if something is misreading the &uuml; sequence as "\." ...
> which would probably be a character-set-encoding kind of problem.
>
>             regards, tom lane
>
