Thread: restore whoes
Hi there, I was trying to restore a dump from pg 7.1.3 to the newly installed 7.2, but I got these errors: You are now connected as new user postgres. ERROR: copy: line 47, CopyReadAttribute: end of record marker corrupted psql:backup.sql:94383: ERROR: copy: line 47, CopyReadAttribute: end of record marker corrupted psql:backup.sql:94383: lost synchronization with server, resetting connection DEBUG: pq_flush: send() failed: Broken pipe FATAL 1: Socket command type unknown ERROR: Relation "tipi" does not exist psql:backup.sql:96823: ERROR: Relation "tipi" does not exist psql:backup.sql:96830: invalid command \. ERROR: parser: parse error at or near "0" psql:backup.sql:96838: ERROR: parser: parse error at or near "0" psql:backup.sql:96950: invalid command \N psql:backup.sql:96950: invalid command \N psql:backup.sql:96951: invalid command \N psql:backup.sql:96951: invalid command \N psql:backup.sql:96952: invalid command \N psql:backup.sql:96952: invalid command \N psql:backup.sql:96953: invalid command \N psql:backup.sql:96953: invalid command \N psql:backup.sql:96954: invalid command \N psql:backup.sql:96954: invalid command \N psql:backup.sql:96955: invalid command \N psql:backup.sql:96955: invalid command \N psql:backup.sql:96956: invalid command \N psql:backup.sql:96956: invalid command \N psql:backup.sql:96957: invalid command \N psql:backup.sql:96957: invalid command \N psql:backup.sql:97055: invalid command \. ERROR: parser: parse error at or near "9223" psql:backup.sql:97062: ERROR: parser: parse error at or near "9223" ERROR: Relation "prodotti" does not exist psql:backup.sql:97070: ERROR: Relation "prodotti" does not exist ERROR: Relation "tipi" does not exist psql:backup.sql:97078: ERROR: Relation "tipi" does not exist ERROR: Relation "particolari" does not exist psql:backup.sql:97086: ERROR: Relation "particolari" does not exist You are now connected to database template1 as user wiznet. CREATE DATABASE Don't know what happened exactly, but it seems it failed when restoring a lot of text entries. Line 94383+47 (if this is how "line 47" should be interpreted) looks like "^M\" in vim, without quotes - I think it is a CR character. Anyway, it seems it skipped a lot of backup lines then, as it completely failed to recover a database following the records of that table: you can see the error lines when it tried to create the tables in that db, without having created it. Obviously it also skipped the rest of the dump of the previous db. It seems I'll stick to 7.1.3 for a while on the production machine (hoping I won't need to restore the db there too... I'm a bit scared now) -- Simone Tellini E-mail: tellini@areabusiness.it http://www.areabusiness.it
Simone Tellini <tellini@areabusiness.it> writes: > I was trying to restore a dump from pg 7.1.3 to the newly installed 7.2, > but I got these errors: > You are now connected as new user postgres. > ERROR: copy: line 47, CopyReadAttribute: end of record marker corrupted > psql:backup.sql:94383: ERROR: copy: line 47, CopyReadAttribute: end of record marker corrupted > psql:backup.sql:94383: lost synchronization with server, resetting connection It's going to be difficult to do anything about this unless you show us the failing data. The segment of the dump file right around line 94383 would be interesting, as would the declaration of the table that's being loaded there. Also, what locale and/or multibyte settings are you using? regards, tom lane
Simone Tellini <tellini@areabusiness.it> writes: > here's a bit of data: the \connect you can see is at line 94382 Okay, the problem with this file is fairly obvious after looking at it with od -c: 00000d0 \n \r \n \ c o n n e c t - p o 00000e0 s t g r e s \r \n C O P Y " i n 00000f0 t e r r o g a z i o n i " F 0000100 R O M s t d i n ; \r \n 9 7 \t 2 0000110 0 0 0 0 5 1 2 \t B a n c a I n 0000120 t e s a e C e n t r o C o 0000130 n t a b i l e C o m i t d i 0000140 P a r m a \t 4 \t V E R B A L E 0000150 D E L C O N S I G L I O C 0000160 O M U N A L E \r \ \r \n S E D U 0000170 T A D E L 1 2 M A G G I O The newlines in the file have been translated to \r\n, which messes up the backslash escaping. It looks to me like the string data that was originally dumped contained CR/LF newline sequences (\r\n), which would have been emitted by COPY as \r\\n (ie, carriage return, backslash, newline). What's coming back in is carriage return, backslash, carriage return, newline --- so the backslash uselessly quotes the second carriage return and the newline is taken as the end of the line of COPY data. I see a result like so: psql:tellini.sql:11: ERROR: copy: line 2, pg_atoi: error in "SEDUTA DEL 12 MAGG ": can't parse "SEDUTA DEL 12 MAGGIO 2000 psql:tellini.sql:11: lost synchronization with server, resetting connection I suppose you are working under Windows, or have copied the file into Windows at some point. You need to find a way to avoid corrupting the data file with Windows newline conversion. regards, tom lane
Tom Lane wrote: > Simone Tellini <tellini@areabusiness.it> writes: > > here's a bit of data: the \connect you can see is at line 94382 > > Okay, the problem with this file is fairly obvious after looking at it > with od -c: > > 00000d0 \n \r \n \ c o n n e c t - p o > 00000e0 s t g r e s \r \n C O P Y " i n > 00000f0 t e r r o g a z i o n i " F > 0000100 R O M s t d i n ; \r \n 9 7 \t 2 > 0000110 0 0 0 0 5 1 2 \t B a n c a I n > 0000120 t e s a e C e n t r o C o > 0000130 n t a b i l e C o m i t d i > 0000140 P a r m a \t 4 \t V E R B A L E > 0000150 D E L C O N S I G L I O C > 0000160 O M U N A L E \r \ \r \n S E D U > 0000170 T A D E L 1 2 M A G G I O I think we should throw a very specific error from COPY when it sees \r ending a line. Comments? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I think we should throw a very specific error from COPY when it sees \r > ending a line. Comments? Certainly *not*, as that is a valid data pattern. At present. See my proposal on pghackers for fixing this properly. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I think we should throw a very specific error from COPY when it sees \r > > ending a line. Comments? > > Certainly *not*, as that is a valid data pattern. At present. > See my proposal on pghackers for fixing this properly. I was thinking of throwing a specific error when copy fails _and_ when the line ends with a \r, rather than a generic COPY failure message --- not sure how to do that, though. I don't think I like the carriage-return -> \r solution anymore because that would require people creating COPY files by hand, which I am sure many do, to also escape carriage returns, which seems too MS-ish for me. We would be hoisting stupid operating system principles on COPY file creators. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I was thinking of throwing a specific error when copy fails _and_ when > the line ends with a \r, rather than a generic COPY failure message --- > not sure how to do that, though. Doesn't seem practical, as the actual error may not occur till much later, far away from the COPY code itself. > I don't think I like the carriage-return -> \r solution anymore because > that would require people creating COPY files by hand, which I am sure > many do, to also escape carriage returns, which seems too MS-ish for me. We have to change *something*, Bruce, because as things stand it's completely ambiguous whether a \r is a legitimate data character or something introduced by a newline conversion. If you insist on 100% backwards compatibility then we'll be fighting this problem forever. (Or at least till Windows dies ... yeah right.) Please note also that \ r (two characters) is *already* accepted as meaning a carriage return. Ditto \ 0 1 5. So it's quite possible that applications generating COPY data may be using one of these representations and not be affected in the slightest. regards, tom lane
On Sun, 10 Feb 2002 20:43:16 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: TL> 0000160 O M U N A L E \r \ \r \n S E D U TL> 0000170 T A D E L 1 2 M A G G I O TL> TL> The newlines in the file have been translated to \r\n, which messes up oh, btw, the \r\n newlines are part of the content of the text field, they were *in* the database and I expect them to stay there after I restore the backup :-) They were not converted after the dump had been generated, if you suspect this. -- Simone Tellini E-mail: tellini@areabusiness.it http://www.areabusiness.it
On Sun, 10 Feb 2002 20:43:16 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: TL> I suppose you are working under Windows, or have copied the file into TL> Windows at some point. You need to find a way to avoid corrupting the TL> data file with Windows newline conversion. No, I was working under linux: I dumped the db with pg 7.1.3 and tried to restore it under 7.2, on the same machine. The dump has only been gzipped by my backup script (pg_dumpall | gzip) and then decompressed (gzip -d), but this shouldn't have modified it at all. -- Simone Tellini E-mail: tellini@areabusiness.it http://www.areabusiness.it
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I was thinking of throwing a specific error when copy fails _and_ when > > the line ends with a \r, rather than a generic COPY failure message --- > > not sure how to do that, though. > > Doesn't seem practical, as the actual error may not occur till much > later, far away from the COPY code itself. OK, how about if we have a global char * called "extra_elog_output" which we set before doing a /utils/adt data conversion on the last column of the first row _if_ it ends in a carriage return. We clear it once we successfully return. In elog.c, we print any extra string that may be defined for extra_elog_output. I wonder if we could make use of this other places. Also, if we add a WITH TRIMCR option to COPY, we can suggest that in the error message. We can add ONLYCR for Mac files ending only in \r. > > > I don't think I like the carriage-return -> \r solution anymore because > > that would require people creating COPY files by hand, which I am sure > > many do, to also escape carriage returns, which seems too MS-ish for me. > > We have to change *something*, Bruce, because as things stand it's > completely ambiguous whether a \r is a legitimate data character or > something introduced by a newline conversion. If you insist on 100% > backwards compatibility then we'll be fighting this problem forever. > (Or at least till Windows dies ... yeah right.) I think we can live with it _if_ we report a proper error message and suggest a solution. In fact, in some ways, it may be better to report and fail rather than silently try to fix it. > Please note also that \ r (two characters) is *already* accepted as > meaning a carriage return. Ditto \ 0 1 5. So it's quite possible > that applications generating COPY data may be using one of these > representations and not be affected in the slightest. Yes, I know, and it is obvious to COPY coders they have to escape \n because that is their end-of-line character, but it is not obvious why they would escape \r. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Mon, 11 Feb 2002 10:31:41 -0500 (EST) Bruce Momjian <pgman@candle.pha.pa.us> wrote: BM> I think we can live with it _if_ we report a proper error message and BM> suggest a solution. In fact, in some ways, it may be better to report BM> and fail rather than silently try to fix it. err... probably I didn't make it clear, but the problem is that you can't restore a database if you've got a text field containing CRLF terminated text! You really can't say "we can live with it", can you? ;-) -- Simone Tellini E-mail: tellini@areabusiness.it http://www.areabusiness.it
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I think we can live with it _if_ we report a proper error message and > suggest a solution. Looking for a way to report an error message misses the point entirely, IMHO (even if it weren't a horrid kluge). I don't want to report an error message; I want to read the file successfully. If the file in question is your dump from an already-erased previous version, being told that you messed it up is no help. Adding options to the COPY command is no help either, because the COPY may be embedded in a multi-gigabyte dump file that you can't edit conveniently. (And what of pg_restore, or other applications that generate COPY commands?) My approach of using a GUC variable is better because it can be set from outside the dump script --- in postgresql.conf, if necessary. regards, tom lane
> > >Adding options to the COPY command is no help either, because the COPY >may be embedded in a multi-gigabyte dump file that you can't edit >conveniently. (And what of pg_restore, or other applications that >generate COPY commands?) My approach of using a GUC variable is better >because it can be set from outside the dump script --- in >postgresql.conf, if necessary. Speaking of COPY, is there a way to tell copy not to barf on errors (and set how many errors are acceptable)? Can I tell copy to commit every X rows? -- Naomi Walker Chief Information Officer Eldorado Computing, Inc. 602-604-3100 ext 242
Simone Tellini <tellini@areabusiness.it> writes: > err... probably I didn't make it clear, but the problem is that you > can't restore a database if you've got a text field containing CRLF > terminated text! Au contraire: it works fine ... so long as the COPY data hasn't been munged between being emitted and being reloaded. There's a thread in progress now on pghackers concerning whether we can't make COPY data files proof against Microsoftish newline translations, but until that happens, the rule is simple. Don't pass the data file through anything that thinks it's okay to reformat newlines. regards, tom lane
On Mon, 11 Feb 2002 11:56:28 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: TL> > err... probably I didn't make it clear, but the problem is that you TL> > can't restore a database if you've got a text field containing CRLF TL> > terminated text! TL> Au contraire: it works fine ... so long as the COPY data hasn't been TL> munged between being emitted and being reloaded. no, it doesn't. I'll repeat myself: I've dumped the database on 7.1.3 and tried to restore it with 7.2 on the same Linux machine without modifying the dump. It didn't work at all. -- Simone Tellini E-mail: tellini@areabusiness.it http://www.areabusiness.it
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I think we can live with it _if_ we report a proper error message and > > suggest a solution. > > Looking for a way to report an error message misses the point entirely, > IMHO (even if it weren't a horrid kluge). I don't want to report an > error message; I want to read the file successfully. If the file in > question is your dump from an already-erased previous version, being > told that you messed it up is no help. > > Adding options to the COPY command is no help either, because the COPY > may be embedded in a multi-gigabyte dump file that you can't edit > conveniently. (And what of pg_restore, or other applications that > generate COPY commands?) My approach of using a GUC variable is better > because it can be set from outside the dump script --- in > postgresql.conf, if necessary. OK, GUC is OK if you prefer. Report the error about the munged file, and tell them what GUC parameter to set to load the file properly. How is that? Of course, that GUC parameter is going to affect everyone connecting, which seems like somewhat of a problem. I guess they could do: $ psql => SET COPYCR TO 'on'; => \i copydumpfile rather than modifying postgresql.conf. This will not work for pg_restore. Another option is to just print the sed command that would strip off the \r from the end of the line. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Mon, 11 Feb 2002, Simone Tellini wrote: > > On Mon, 11 Feb 2002 11:56:28 -0500 > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > TL> > err... probably I didn't make it clear, but the problem is that you > TL> > can't restore a database if you've got a text field containing CRLF > TL> > terminated text! > TL> Au contraire: it works fine ... so long as the COPY data hasn't been > TL> munged between being emitted and being reloaded. > > no, it doesn't. > > I'll repeat myself: I've dumped the database on 7.1.3 and tried to > restore it with 7.2 on the same Linux machine without modifying the dump. > It didn't work at all. I just did a test on a debian machine from 7.1.3 to 7.2 with CRLF included text with no apparent problems. There must be something more to it that that. create table a(a text); insert into a values ('abc\015\012abc'); insert into a values ('abc\015\012'); insert into a values ('\015\012'); insert into a values ('\015\012abc'); insert into a values ('\015\012'); insert into a values ('\015'); insert into a values ('abc\015'); - dump and restore then worked properly for the database containing this table including restoring the CRs and LFs afaics
On Mon, 11 Feb 2002 10:35:39 -0800 (PST) Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: SS> I just did a test on a debian machine from 7.1.3 to 7.2 with CRLF included SS> text with no apparent problems. There must be something more to it that SS> that. hm... I think I've found it: at the end of the data of the table there was this line: \.O That O seems the cause of the mess. Any idea how it got there? The file has been created with pg_dumpall | gzip > file.gz -- Simone Tellini E-mail: tellini@areabusiness.it http://www.areabusiness.it
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Another option is to just print the sed command that would strip off the > > \r from the end of the line. > > You keep on assuming that a mechanical fix is possible when a file's > been munged. It is *NOT*. There is no reliable way to tell whether a > \r was part of the original data or was added by a newline converter. > We will not do anyone a service by printing a message that falsely > suggests they don't need to think carefully about how to reconstruct > their data. > > The only thing worse than no error message is a misleading error > message. Well, if the COPY fails on the last value of the first row, and that value has a trailing \r, we can be pretty sure. I don't see how your solution make it fool-proof, except by requiring every COPY file, old and new, and created, to have CR as \r. That seems too heavy-handed the other way. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Another option is to just print the sed command that would strip off the > \r from the end of the line. You keep on assuming that a mechanical fix is possible when a file's been munged. It is *NOT*. There is no reliable way to tell whether a \r was part of the original data or was added by a newline converter. We will not do anyone a service by printing a message that falsely suggests they don't need to think carefully about how to reconstruct their data. The only thing worse than no error message is a misleading error message. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Well, if the COPY fails on the last value of the first row, and that > value has a trailing \r, we can be pretty sure. In the examples given, the error didn't show up till later rows, in fields where there was no \r anywhere. > I don't see how your > solution make it fool-proof, except by requiring every COPY file, old > and new, and created, to have CR as \r. There *is no* foolproof solution with the current data representation (and your idea of throwing untrustworthy error messages is certainly not foolproof). What we need to do is migrate to a new data representation. Fortunately, it's nearly there already, since \ r and \ n sequences are already supported by COPY IN. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Well, if the COPY fails on the last value of the first row, and that > > value has a trailing \r, we can be pretty sure. > > In the examples given, the error didn't show up till later rows, in > fields where there was no \r anywhere. Hard to imagine why a failure would occur on anything but the first row. The original poster had \.0, which was some other issue. I don't remember seeing those examples. > > I don't see how your > > solution make it fool-proof, except by requiring every COPY file, old > > and new, and created, to have CR as \r. > > There *is no* foolproof solution with the current data representation > (and your idea of throwing untrustworthy error messages is certainly not > foolproof). What we need to do is migrate to a new data representation. > Fortunately, it's nearly there already, since \ r and \ n sequences are > already supported by COPY IN. We clearly should be outputting \r and \n from COPY by default. Input is the issue. We are foolproof it people don't munge there original files. The question is how many people are munging them vs. how many people are creating COPY files on their own (and need to add mapping for \r), or have COPY files from older versions of PostgreSQL. I think the latter group is much larger. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> In the examples given, the error didn't show up till later rows, in > >> fields where there was no \r anywhere. > > > Hard to imagine why a failure would occur on anything but the first row. > > Simple: suppose the dumped data contains \ \n in the body of a field, > which is the currently-accepted representation for a data newline. > Microsoft munges this to \ \r \n, which will now be read by COPY IN as a > backslashed \r (ie, a data \r) followed by a non-escaped newline. > Ergo, that's the end of the current row. No error is detected (since a > datatype that could have contained a data \n is unlikely to reject a > data \r). Then parsing resumes for the next row at the next input > character. We're totally out of sync and will produce lord only knows > what peculiar error message when, eventually, some datatype input > converter doesn't like what it's given. But that could be many rows > later than where the problem was. > > Part of the problem here is that COPY IN will accept a "short" row (one > with fewer fields than the table actually has columns). So early > termination of a line isn't recognizable as an error unless the last > column that receives some data manages to recognize a formatting error. > Personally I think it would be a good idea to change that --- but I > suppose you'll complain that that might break existing applications, > even though the "feature" has never been documented. No, I will not complain at all. We should not accept short rows, period. It is reasonable to expect a full row every time. I just think forcing the escape of \r isn't reasonable. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I just think forcing the escape of \r isn't reasonable. It's not ideal, certainly, but I think it's preferable to continuing to get burned every time a pg_dump file comes within hailing distance of Windoze. We've seen these complaints over and over, and we'll continue to see them again and again until we do something to become proof against the problem. There is no solution that doesn't involve some amount of backwards-incompatibility. So let's bite the bullet and do it. We can offer a GUC switch to provide the old COPY IN behavior for people who need it --- but given the shape of the world today, the default behavior needs to be Microsoft-proof. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> In the examples given, the error didn't show up till later rows, in >> fields where there was no \r anywhere. > Hard to imagine why a failure would occur on anything but the first row. Simple: suppose the dumped data contains \ \n in the body of a field, which is the currently-accepted representation for a data newline. Microsoft munges this to \ \r \n, which will now be read by COPY IN as a backslashed \r (ie, a data \r) followed by a non-escaped newline. Ergo, that's the end of the current row. No error is detected (since a datatype that could have contained a data \n is unlikely to reject a data \r). Then parsing resumes for the next row at the next input character. We're totally out of sync and will produce lord only knows what peculiar error message when, eventually, some datatype input converter doesn't like what it's given. But that could be many rows later than where the problem was. Part of the problem here is that COPY IN will accept a "short" row (one with fewer fields than the table actually has columns). So early termination of a line isn't recognizable as an error unless the last column that receives some data manages to recognize a formatting error. Personally I think it would be a good idea to change that --- but I suppose you'll complain that that might break existing applications, even though the "feature" has never been documented. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I just think forcing the escape of \r isn't reasonable. > > It's not ideal, certainly, but I think it's preferable to continuing > to get burned every time a pg_dump file comes within hailing distance > of Windoze. We've seen these complaints over and over, and we'll > continue to see them again and again until we do something to become > proof against the problem. There is no solution that doesn't involve > some amount of backwards-incompatibility. So let's bite the bullet and > do it. We can offer a GUC switch to provide the old COPY IN behavior > for people who need it --- but given the shape of the world today, the > default behavior needs to be Microsoft-proof. I am afraid we will fix the \r\n problem but cause many more problems with old dumps and hand-created COPY files. What if we add GUC param and keep the behavior unchanged? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tom Lane writes: > Simple: suppose the dumped data contains \ \n in the body of a field, > which is the currently-accepted representation for a data newline. > Microsoft munges this to \ \r \n, which will now be read by COPY IN as a > backslashed \r (ie, a data \r) followed by a non-escaped newline. ISTM that this class of problem should be avoided by reading COPY input as a text stream rather than a binary stream on all operating systems. Doesn't that effectively guarantee that your program sees the end of the line as "\n" only, no matter how it's actually stored? -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > ISTM that this class of problem should be avoided by reading COPY input as > a text stream rather than a binary stream on all operating systems. > Doesn't that effectively guarantee that your program sees the end of the > line as "\n" only, no matter how it's actually stored? (1) No, that just guarantees that we have no control over the breakage or recovering from it. Our real problem is that COPY's data representation is not newline-translation-proof; opening up an additional place where newline translation might or might not occur cannot fix this problem, it can only make it worse. (2) On Unix machines, "text mode" I/O does not do anything different from binary mode anyway. Our problem is with pg_dump data that has temporarily left the Unix world and been munged in transit. We want to be able to deal with that whether or not the local libc has ever heard of \r\n newlines. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> We can offer a GUC switch to provide the old COPY IN behavior >> for people who need it --- but given the shape of the world today, the >> default behavior needs to be Microsoft-proof. > I am afraid we will fix the \r\n problem but cause many more problems > with old dumps and hand-created COPY files. > What if we add GUC param and keep the behavior unchanged? Well, at that point it comes down to opinions. I stand by my opinion above, but I have no more numbers to prove it than you do. However, if we make the appropriate code and documentation changes in 7.2.1, then (a) dumps from 7.2.1 or later will be no problem, and (b) people will have nearly a full release cycle to update any existing applications that generate the deprecated \r representation. So I don't think that it'll be a big problem for 7.3 to have a non-backwards-compatible default behavior. regards, tom lane
In article <15618.1013463240@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: >(2) On Unix machines, "text mode" I/O does not do anything different >from binary mode anyway. Our problem is with pg_dump data that has >temporarily left the Unix world and been munged in transit. We want >to be able to deal with that whether or not the local libc has ever >heard of \r\n newlines. Are there many (any?) complaints about a similar problem when a file is created/modified on a Mac system with it's own EOL characteristics? mrc -- Mike Castle dalgoda@ix.netcom.com www.netcom.com/~dalgoda/ We are all of us living in the shadow of Manhattan. -- Watchmen fatal ("You are in a maze of twisty compiler features, all different"); -- gcc
At 01:42 PM 2/11/2002 , Tom Lane wrote: >It's not ideal, certainly, but I think it's preferable to continuing to >get burned every time a pg_dump file comes within hailing distance of Windoze. Perhaps something along this line would be useful. -crl -- Chad R. Larson (CRL22) chad@eldocomp.com Eldorado Computing, Inc. 602-604-3100 5353 North 16th Street, Suite 400 Phoenix, Arizona 85016-3228