Thread: restore whoes

restore whoes

From
Simone Tellini
Date:
Hi there,

I was trying to restore a dump from pg 7.1.3 to the newly installed 7.2,
but I got these errors:

You are now connected as new user postgres.
ERROR:  copy: line 47, CopyReadAttribute: end of record marker corrupted
psql:backup.sql:94383: ERROR:  copy: line 47, CopyReadAttribute: end of record marker corrupted
psql:backup.sql:94383: lost synchronization with server, resetting connection
DEBUG:  pq_flush: send() failed: Broken pipe
FATAL 1:  Socket command type
 unknown
ERROR:  Relation "tipi" does not exist
psql:backup.sql:96823: ERROR:  Relation "tipi" does not exist
psql:backup.sql:96830: invalid command \.
ERROR:  parser: parse error at or near "0"
psql:backup.sql:96838: ERROR:  parser: parse error at or near "0"
psql:backup.sql:96950: invalid command \N
psql:backup.sql:96950: invalid command \N
psql:backup.sql:96951: invalid command \N
psql:backup.sql:96951: invalid command \N
psql:backup.sql:96952: invalid command \N
psql:backup.sql:96952: invalid command \N
psql:backup.sql:96953: invalid command \N
psql:backup.sql:96953: invalid command \N
psql:backup.sql:96954: invalid command \N
psql:backup.sql:96954: invalid command \N
psql:backup.sql:96955: invalid command \N
psql:backup.sql:96955: invalid command \N
psql:backup.sql:96956: invalid command \N
psql:backup.sql:96956: invalid command \N
psql:backup.sql:96957: invalid command \N
psql:backup.sql:96957: invalid command \N
psql:backup.sql:97055: invalid command \.
ERROR:  parser: parse error at or near "9223"
psql:backup.sql:97062: ERROR:  parser: parse error at or near "9223"
ERROR:  Relation "prodotti" does not exist
psql:backup.sql:97070: ERROR:  Relation "prodotti" does not exist
ERROR:  Relation "tipi" does not exist
psql:backup.sql:97078: ERROR:  Relation "tipi" does not exist
ERROR:  Relation "particolari" does not exist
psql:backup.sql:97086: ERROR:  Relation "particolari" does not exist
You are now connected to database template1 as user wiznet.
CREATE DATABASE

Don't know what happened exactly, but it seems it failed when restoring
a lot of text entries. Line 94383+47 (if this is how "line 47" should be
interpreted) looks like "^M\" in vim, without quotes - I think it is a
CR character.

Anyway, it seems it skipped a lot of backup lines then, as it completely
failed to recover a database following the records of that table: you
can see the error lines when it tried to create the tables in that db,
without having created it. Obviously it also skipped the rest of the
dump of the previous db.

It seems I'll stick to 7.1.3 for a while on the production machine
(hoping I won't need to restore the db there too... I'm a bit scared now)

--

Simone Tellini
E-mail: tellini@areabusiness.it
http://www.areabusiness.it


Re: restore whoes

From
Tom Lane
Date:
Simone Tellini <tellini@areabusiness.it> writes:
> I was trying to restore a dump from pg 7.1.3 to the newly installed 7.2,
> but I got these errors:

> You are now connected as new user postgres.
> ERROR:  copy: line 47, CopyReadAttribute: end of record marker corrupted
> psql:backup.sql:94383: ERROR:  copy: line 47, CopyReadAttribute: end of record marker corrupted
> psql:backup.sql:94383: lost synchronization with server, resetting connection

It's going to be difficult to do anything about this unless you show us
the failing data.  The segment of the dump file right around line 94383
would be interesting, as would the declaration of the table that's being
loaded there.  Also, what locale and/or multibyte settings are you using?

            regards, tom lane

Re: restore whoes

From
Tom Lane
Date:
Simone Tellini <tellini@areabusiness.it> writes:
> here's a bit of data: the \connect you can see is at line 94382

Okay, the problem with this file is fairly obvious after looking at it
with od -c:

00000d0 \n \r \n  \  c  o  n  n  e  c  t     -     p  o
00000e0  s  t  g  r  e  s \r \n  C  O  P  Y     "  i  n
00000f0  t  e  r  r  o  g  a  z  i  o  n  i  "        F
0000100  R  O  M     s  t  d  i  n  ; \r \n  9  7 \t  2
0000110  0  0  0  0  5  1  2 \t  B  a  n  c  a     I  n
0000120  t  e  s  a     e     C  e  n  t  r  o     C  o
0000130  n  t  a  b  i  l  e     C  o  m  i  t     d  i
0000140     P  a  r  m  a \t  4 \t  V  E  R  B  A  L  E
0000150     D  E  L     C  O  N  S  I  G  L  I  O     C
0000160  O  M  U  N  A  L  E    \r  \ \r \n  S  E  D  U
0000170  T  A     D  E  L     1  2     M  A  G  G  I  O

The newlines in the file have been translated to \r\n, which messes up
the backslash escaping.  It looks to me like the string data that was
originally dumped contained CR/LF newline sequences (\r\n), which
would have been emitted by COPY as \r\\n (ie, carriage return,
backslash, newline).  What's coming back in is carriage return,
backslash, carriage return, newline --- so the backslash uselessly
quotes the second carriage return and the newline is taken as the end
of the line of COPY data.  I see a result like so:

psql:tellini.sql:11: ERROR:  copy: line 2, pg_atoi: error in "SEDUTA DEL 12 MAGG
": can't parse "SEDUTA DEL 12 MAGGIO 2000
psql:tellini.sql:11: lost synchronization with server, resetting connection

I suppose you are working under Windows, or have copied the file into
Windows at some point.  You need to find a way to avoid corrupting the
data file with Windows newline conversion.

            regards, tom lane

Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Simone Tellini <tellini@areabusiness.it> writes:
> > here's a bit of data: the \connect you can see is at line 94382
>
> Okay, the problem with this file is fairly obvious after looking at it
> with od -c:
>
> 00000d0 \n \r \n  \  c  o  n  n  e  c  t     -     p  o
> 00000e0  s  t  g  r  e  s \r \n  C  O  P  Y     "  i  n
> 00000f0  t  e  r  r  o  g  a  z  i  o  n  i  "        F
> 0000100  R  O  M     s  t  d  i  n  ; \r \n  9  7 \t  2
> 0000110  0  0  0  0  5  1  2 \t  B  a  n  c  a     I  n
> 0000120  t  e  s  a     e     C  e  n  t  r  o     C  o
> 0000130  n  t  a  b  i  l  e     C  o  m  i  t     d  i
> 0000140     P  a  r  m  a \t  4 \t  V  E  R  B  A  L  E
> 0000150     D  E  L     C  O  N  S  I  G  L  I  O     C
> 0000160  O  M  U  N  A  L  E    \r  \ \r \n  S  E  D  U
> 0000170  T  A     D  E  L     1  2     M  A  G  G  I  O

I think we should throw a very specific error from COPY when it sees \r
ending a line.  Comments?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I think we should throw a very specific error from COPY when it sees \r
> ending a line.  Comments?

Certainly *not*, as that is a valid data pattern.  At present.
See my proposal on pghackers for fixing this properly.

            regards, tom lane

Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I think we should throw a very specific error from COPY when it sees \r
> > ending a line.  Comments?
>
> Certainly *not*, as that is a valid data pattern.  At present.
> See my proposal on pghackers for fixing this properly.

I was thinking of throwing a specific error when copy fails _and_ when
the line ends with a \r, rather than a generic COPY failure message ---
not sure how to do that, though.

I don't think I like the carriage-return -> \r solution anymore because
that would require people creating COPY files by hand, which I am sure
many do, to also escape carriage returns, which seems too MS-ish for me.
We would be hoisting stupid operating system principles on COPY file
creators.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I was thinking of throwing a specific error when copy fails _and_ when
> the line ends with a \r, rather than a generic COPY failure message ---
> not sure how to do that, though.

Doesn't seem practical, as the actual error may not occur till much
later, far away from the COPY code itself.

> I don't think I like the carriage-return -> \r solution anymore because
> that would require people creating COPY files by hand, which I am sure
> many do, to also escape carriage returns, which seems too MS-ish for me.

We have to change *something*, Bruce, because as things stand it's
completely ambiguous whether a \r is a legitimate data character or
something introduced by a newline conversion.  If you insist on 100%
backwards compatibility then we'll be fighting this problem forever.
(Or at least till Windows dies ... yeah right.)

Please note also that \ r (two characters) is *already* accepted as
meaning a carriage return.  Ditto \ 0 1 5.  So it's quite possible
that applications generating COPY data may be using one of these
representations and not be affected in the slightest.

            regards, tom lane

Re: restore whoes

From
Simone Tellini
Date:
On Sun, 10 Feb 2002 20:43:16 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

TL> 0000160  O  M  U  N  A  L  E    \r  \ \r \n  S  E  D  U
TL> 0000170  T  A     D  E  L     1  2     M  A  G  G  I  O
TL>
TL> The newlines in the file have been translated to \r\n, which messes up

oh, btw, the \r\n newlines are part of the content of the text field,
they were *in* the database and I expect them to stay there after I
restore the backup :-)

They were not converted after the dump had been generated, if you
suspect this.

--

Simone Tellini
E-mail: tellini@areabusiness.it
http://www.areabusiness.it


Re: restore whoes

From
Simone Tellini
Date:
On Sun, 10 Feb 2002 20:43:16 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

TL> I suppose you are working under Windows, or have copied the file into
TL> Windows at some point.  You need to find a way to avoid corrupting the
TL> data file with Windows newline conversion.

No, I was working under linux: I dumped the db with pg 7.1.3 and tried
to restore it under 7.2, on the same machine.

The dump has only been gzipped by my backup script (pg_dumpall | gzip)
and then decompressed (gzip -d), but this shouldn't have modified it at
all.

--

Simone Tellini
E-mail: tellini@areabusiness.it
http://www.areabusiness.it


Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I was thinking of throwing a specific error when copy fails _and_ when
> > the line ends with a \r, rather than a generic COPY failure message ---
> > not sure how to do that, though.
>
> Doesn't seem practical, as the actual error may not occur till much
> later, far away from the COPY code itself.

OK, how about if we have a global char * called "extra_elog_output"
which we set before doing a /utils/adt data conversion on the last
column of the first row _if_ it ends in a carriage return.  We clear it
once we successfully return.  In elog.c, we print any extra string that
may be defined for extra_elog_output.  I wonder if we could make use of
this other places.

Also, if we add a WITH TRIMCR option to COPY, we can suggest that in the
error message.  We can add ONLYCR for Mac files ending only in \r.

>
> > I don't think I like the carriage-return -> \r solution anymore because
> > that would require people creating COPY files by hand, which I am sure
> > many do, to also escape carriage returns, which seems too MS-ish for me.
>
> We have to change *something*, Bruce, because as things stand it's
> completely ambiguous whether a \r is a legitimate data character or
> something introduced by a newline conversion.  If you insist on 100%
> backwards compatibility then we'll be fighting this problem forever.
> (Or at least till Windows dies ... yeah right.)

I think we can live with it _if_ we report a proper error message and
suggest a solution.  In fact, in some ways, it may be better to report
and fail rather than silently try to fix it.

> Please note also that \ r (two characters) is *already* accepted as
> meaning a carriage return.  Ditto \ 0 1 5.  So it's quite possible
> that applications generating COPY data may be using one of these
> representations and not be affected in the slightest.

Yes, I know, and it is obvious to COPY coders they have to escape \n
because that is their end-of-line character, but it is not obvious why
they would escape \r.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Simone Tellini
Date:
On Mon, 11 Feb 2002 10:31:41 -0500 (EST)
Bruce Momjian <pgman@candle.pha.pa.us> wrote:

BM> I think we can live with it _if_ we report a proper error message and
BM> suggest a solution.  In fact, in some ways, it may be better to report
BM> and fail rather than silently try to fix it.

err... probably I didn't make it clear, but the problem is that you
can't restore a database if you've got a text field containing CRLF
terminated text!

You really can't say "we can live with it", can you? ;-)

--

Simone Tellini
E-mail: tellini@areabusiness.it
http://www.areabusiness.it


Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I think we can live with it _if_ we report a proper error message and
> suggest a solution.

Looking for a way to report an error message misses the point entirely,
IMHO (even if it weren't a horrid kluge).  I don't want to report an
error message; I want to read the file successfully.  If the file in
question is your dump from an already-erased previous version, being
told that you messed it up is no help.

Adding options to the COPY command is no help either, because the COPY
may be embedded in a multi-gigabyte dump file that you can't edit
conveniently.  (And what of pg_restore, or other applications that
generate COPY commands?)  My approach of using a GUC variable is better
because it can be set from outside the dump script --- in
postgresql.conf, if necessary.

            regards, tom lane

Re: restore woes

From
Naomi Walker
Date:
>
>
>Adding options to the COPY command is no help either, because the COPY
>may be embedded in a multi-gigabyte dump file that you can't edit
>conveniently.  (And what of pg_restore, or other applications that
>generate COPY commands?)  My approach of using a GUC variable is better
>because it can be set from outside the dump script --- in
>postgresql.conf, if necessary.
Speaking of COPY, is there a way to tell copy not to barf on errors (and
set how many errors are acceptable)?

Can I tell copy to commit every X rows?


--
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
602-604-3100  ext 242


Re: restore whoes

From
Tom Lane
Date:
Simone Tellini <tellini@areabusiness.it> writes:
> err... probably I didn't make it clear, but the problem is that you
> can't restore a database if you've got a text field containing CRLF
> terminated text!

Au contraire: it works fine ... so long as the COPY data hasn't been
munged between being emitted and being reloaded.

There's a thread in progress now on pghackers concerning whether we
can't make COPY data files proof against Microsoftish newline
translations, but until that happens, the rule is simple.  Don't pass
the data file through anything that thinks it's okay to reformat
newlines.

            regards, tom lane

Re: restore whoes

From
Simone Tellini
Date:
On Mon, 11 Feb 2002 11:56:28 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

TL> > err... probably I didn't make it clear, but the problem is that you
TL> > can't restore a database if you've got a text field containing CRLF
TL> > terminated text!
TL> Au contraire: it works fine ... so long as the COPY data hasn't been
TL> munged between being emitted and being reloaded.

no, it doesn't.

I'll repeat myself: I've dumped the database on 7.1.3 and tried to
restore it with 7.2 on the same Linux machine without modifying the dump.
It didn't work at all.

--

Simone Tellini
E-mail: tellini@areabusiness.it
http://www.areabusiness.it


Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I think we can live with it _if_ we report a proper error message and
> > suggest a solution.
>
> Looking for a way to report an error message misses the point entirely,
> IMHO (even if it weren't a horrid kluge).  I don't want to report an
> error message; I want to read the file successfully.  If the file in
> question is your dump from an already-erased previous version, being
> told that you messed it up is no help.
>
> Adding options to the COPY command is no help either, because the COPY
> may be embedded in a multi-gigabyte dump file that you can't edit
> conveniently.  (And what of pg_restore, or other applications that
> generate COPY commands?)  My approach of using a GUC variable is better
> because it can be set from outside the dump script --- in
> postgresql.conf, if necessary.

OK, GUC is OK if you prefer.  Report the error about the munged file,
and tell them what GUC parameter to set to load the file properly.  How
is that?  Of course, that GUC parameter is going to affect everyone
connecting, which seems like somewhat of a problem.  I guess they could
do:

    $ psql
    => SET COPYCR TO 'on';
    => \i copydumpfile

rather than modifying postgresql.conf.  This will not work for
pg_restore.

Another option is to just print the sed command that would strip off the
\r from the end of the line.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Stephan Szabo
Date:
On Mon, 11 Feb 2002, Simone Tellini wrote:

>
> On Mon, 11 Feb 2002 11:56:28 -0500
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> TL> > err... probably I didn't make it clear, but the problem is that you
> TL> > can't restore a database if you've got a text field containing CRLF
> TL> > terminated text!
> TL> Au contraire: it works fine ... so long as the COPY data hasn't been
> TL> munged between being emitted and being reloaded.
>
> no, it doesn't.
>
> I'll repeat myself: I've dumped the database on 7.1.3 and tried to
> restore it with 7.2 on the same Linux machine without modifying the dump.
> It didn't work at all.

I just did a test on a debian machine from 7.1.3 to 7.2 with CRLF included
text with no apparent problems.  There must be something more to it that
that.

create table a(a text);
insert into a values ('abc\015\012abc');
insert into a values ('abc\015\012');
insert into a values ('\015\012');
insert into a values ('\015\012abc');
insert into a values ('\015\012');
insert into a values ('\015');
insert into a values ('abc\015');

- dump and restore then worked properly for the database containing this
table including restoring the CRs and LFs afaics


Re: restore whoes

From
Simone Tellini
Date:
On Mon, 11 Feb 2002 10:35:39 -0800 (PST)
Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:

SS> I just did a test on a debian machine from 7.1.3 to 7.2 with CRLF included
SS> text with no apparent problems.  There must be something more to it that
SS> that.

hm...

I think I've found it: at the end of the data of the table there was
this line:

\.O

That O seems the cause of the mess. Any idea how it got there?
The file has been created with pg_dumpall | gzip > file.gz

--

Simone Tellini
E-mail: tellini@areabusiness.it
http://www.areabusiness.it


Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Another option is to just print the sed command that would strip off the
> > \r from the end of the line.
>
> You keep on assuming that a mechanical fix is possible when a file's
> been munged.  It is *NOT*.  There is no reliable way to tell whether a
> \r was part of the original data or was added by a newline converter.
> We will not do anyone a service by printing a message that falsely
> suggests they don't need to think carefully about how to reconstruct
> their data.
>
> The only thing worse than no error message is a misleading error
> message.

Well, if the COPY fails on the last value of the first row, and that
value has a trailing \r, we can be pretty sure.  I don't see how your
solution make it fool-proof, except by requiring every COPY file, old
and new, and created, to have CR as \r.  That seems too heavy-handed the
other way.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Another option is to just print the sed command that would strip off the
> \r from the end of the line.

You keep on assuming that a mechanical fix is possible when a file's
been munged.  It is *NOT*.  There is no reliable way to tell whether a
\r was part of the original data or was added by a newline converter.
We will not do anyone a service by printing a message that falsely
suggests they don't need to think carefully about how to reconstruct
their data.

The only thing worse than no error message is a misleading error
message.

            regards, tom lane

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, if the COPY fails on the last value of the first row, and that
> value has a trailing \r, we can be pretty sure.

In the examples given, the error didn't show up till later rows, in
fields where there was no \r anywhere.

> I don't see how your
> solution make it fool-proof, except by requiring every COPY file, old
> and new, and created, to have CR as \r.

There *is no* foolproof solution with the current data representation
(and your idea of throwing untrustworthy error messages is certainly not
foolproof).  What we need to do is migrate to a new data representation.
Fortunately, it's nearly there already, since \ r and \ n sequences are
already supported by COPY IN.

            regards, tom lane

Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Well, if the COPY fails on the last value of the first row, and that
> > value has a trailing \r, we can be pretty sure.
>
> In the examples given, the error didn't show up till later rows, in
> fields where there was no \r anywhere.

Hard to imagine why a failure would occur on anything but the first row.
The original poster had \.0, which was some other issue.  I don't
remember seeing those examples.

> > I don't see how your
> > solution make it fool-proof, except by requiring every COPY file, old
> > and new, and created, to have CR as \r.
>
> There *is no* foolproof solution with the current data representation
> (and your idea of throwing untrustworthy error messages is certainly not
> foolproof).  What we need to do is migrate to a new data representation.
> Fortunately, it's nearly there already, since \ r and \ n sequences are
> already supported by COPY IN.

We clearly should be outputting \r and \n from COPY by default.

Input is the issue.  We are foolproof it people don't munge there
original files. The question is how many people are munging them vs. how
many people are creating COPY files on their own (and need to add
mapping for \r), or have COPY files from older versions of PostgreSQL.
I think the latter group is much larger.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> In the examples given, the error didn't show up till later rows, in
> >> fields where there was no \r anywhere.
>
> > Hard to imagine why a failure would occur on anything but the first row.
>
> Simple: suppose the dumped data contains \ \n in the body of a field,
> which is the currently-accepted representation for a data newline.
> Microsoft munges this to \ \r \n, which will now be read by COPY IN as a
> backslashed \r (ie, a data \r) followed by a non-escaped newline.
> Ergo, that's the end of the current row.  No error is detected (since a
> datatype that could have contained a data \n is unlikely to reject a
> data \r).  Then parsing resumes for the next row at the next input
> character.  We're totally out of sync and will produce lord only knows
> what peculiar error message when, eventually, some datatype input
> converter doesn't like what it's given.  But that could be many rows
> later than where the problem was.
>
> Part of the problem here is that COPY IN will accept a "short" row (one
> with fewer fields than the table actually has columns).  So early
> termination of a line isn't recognizable as an error unless the last
> column that receives some data manages to recognize a formatting error.
> Personally I think it would be a good idea to change that --- but I
> suppose you'll complain that that might break existing applications,
> even though the "feature" has never been documented.

No, I will not complain at all.  We should not accept short rows,
period.  It is reasonable to expect a full row every time. I just think
forcing the escape of \r isn't reasonable.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I just think forcing the escape of \r isn't reasonable.

It's not ideal, certainly, but I think it's preferable to continuing
to get burned every time a pg_dump file comes within hailing distance
of Windoze.  We've seen these complaints over and over, and we'll
continue to see them again and again until we do something to become
proof against the problem.  There is no solution that doesn't involve
some amount of backwards-incompatibility.  So let's bite the bullet and
do it.  We can offer a GUC switch to provide the old COPY IN behavior
for people who need it --- but given the shape of the world today, the
default behavior needs to be Microsoft-proof.

            regards, tom lane

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> In the examples given, the error didn't show up till later rows, in
>> fields where there was no \r anywhere.

> Hard to imagine why a failure would occur on anything but the first row.

Simple: suppose the dumped data contains \ \n in the body of a field,
which is the currently-accepted representation for a data newline.
Microsoft munges this to \ \r \n, which will now be read by COPY IN as a
backslashed \r (ie, a data \r) followed by a non-escaped newline.
Ergo, that's the end of the current row.  No error is detected (since a
datatype that could have contained a data \n is unlikely to reject a
data \r).  Then parsing resumes for the next row at the next input
character.  We're totally out of sync and will produce lord only knows
what peculiar error message when, eventually, some datatype input
converter doesn't like what it's given.  But that could be many rows
later than where the problem was.

Part of the problem here is that COPY IN will accept a "short" row (one
with fewer fields than the table actually has columns).  So early
termination of a line isn't recognizable as an error unless the last
column that receives some data manages to recognize a formatting error.
Personally I think it would be a good idea to change that --- but I
suppose you'll complain that that might break existing applications,
even though the "feature" has never been documented.

            regards, tom lane

Re: restore whoes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I just think forcing the escape of \r isn't reasonable.
>
> It's not ideal, certainly, but I think it's preferable to continuing
> to get burned every time a pg_dump file comes within hailing distance
> of Windoze.  We've seen these complaints over and over, and we'll
> continue to see them again and again until we do something to become
> proof against the problem.  There is no solution that doesn't involve
> some amount of backwards-incompatibility.  So let's bite the bullet and
> do it.  We can offer a GUC switch to provide the old COPY IN behavior
> for people who need it --- but given the shape of the world today, the
> default behavior needs to be Microsoft-proof.

I am afraid we will fix the \r\n problem but cause many more problems
with old dumps and hand-created COPY files.

What if we add GUC param and keep the behavior unchanged?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: restore whoes

From
Peter Eisentraut
Date:
Tom Lane writes:

> Simple: suppose the dumped data contains \ \n in the body of a field,
> which is the currently-accepted representation for a data newline.
> Microsoft munges this to \ \r \n, which will now be read by COPY IN as a
> backslashed \r (ie, a data \r) followed by a non-escaped newline.

ISTM that this class of problem should be avoided by reading COPY input as
a text stream rather than a binary stream on all operating systems.
Doesn't that effectively guarantee that your program sees the end of the
line as "\n" only, no matter how it's actually stored?

--
Peter Eisentraut   peter_e@gmx.net


Re: restore whoes

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> ISTM that this class of problem should be avoided by reading COPY input as
> a text stream rather than a binary stream on all operating systems.
> Doesn't that effectively guarantee that your program sees the end of the
> line as "\n" only, no matter how it's actually stored?

(1) No, that just guarantees that we have no control over the breakage
or recovering from it.  Our real problem is that COPY's data
representation is not newline-translation-proof; opening up an
additional place where newline translation might or might not occur
cannot fix this problem, it can only make it worse.

(2) On Unix machines, "text mode" I/O does not do anything different
from binary mode anyway.  Our problem is with pg_dump data that has
temporarily left the Unix world and been munged in transit.  We want
to be able to deal with that whether or not the local libc has ever
heard of \r\n newlines.

            regards, tom lane

Re: restore whoes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> We can offer a GUC switch to provide the old COPY IN behavior
>> for people who need it --- but given the shape of the world today, the
>> default behavior needs to be Microsoft-proof.

> I am afraid we will fix the \r\n problem but cause many more problems
> with old dumps and hand-created COPY files.

> What if we add GUC param and keep the behavior unchanged?

Well, at that point it comes down to opinions.  I stand by my opinion
above, but I have no more numbers to prove it than you do.

However, if we make the appropriate code and documentation changes in
7.2.1, then (a) dumps from 7.2.1 or later will be no problem, and (b)
people will have nearly a full release cycle to update any existing
applications that generate the deprecated \r representation.  So I
don't think that it'll be a big problem for 7.3 to have a
non-backwards-compatible default behavior.

            regards, tom lane

Re: restore whoes

From
dalgoda@ix.netcom.com (Mike Castle)
Date:
In article <15618.1013463240@sss.pgh.pa.us>,
Tom Lane  <tgl@sss.pgh.pa.us> wrote:
>(2) On Unix machines, "text mode" I/O does not do anything different
>from binary mode anyway.  Our problem is with pg_dump data that has
>temporarily left the Unix world and been munged in transit.  We want
>to be able to deal with that whether or not the local libc has ever
>heard of \r\n newlines.

Are there many (any?) complaints about a similar problem when a file is
created/modified on a Mac system with it's own EOL characteristics?

mrc
--     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/   We are all of us living in the shadow of
Manhattan. -- Watchmen 
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

Re: restore whoes

From
"Chad R. Larson"
Date:
At 01:42 PM 2/11/2002 , Tom Lane wrote:
>It's not ideal, certainly, but I think it's preferable to continuing to
>get burned every time a pg_dump file comes within hailing distance of Windoze.

Perhaps something along this line would be useful.

         -crl
--
Chad R. Larson (CRL22)    chad@eldocomp.com
   Eldorado Computing, Inc.   602-604-3100
      5353 North 16th Street, Suite 400
        Phoenix, Arizona  85016-3228

Attachment