Re: Win32 patch for COPY - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: Win32 patch for COPY |
Date | |
Msg-id | 200304182345.h3INjhc04871@candle.pha.pa.us Whole thread Raw |
In response to | Win32 patch for COPY (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-patches |
Here is a new version of the patch. I realized that if you were loading in a column that had only a single column, the tests wouldn't throw an error on a literal \r, so I added code to record the end-of-line type for the first line, and make sure the rest of the lines match. It is a little more code, but I think COPY is important enough to cleanly handle this and be 100% reliable. --------------------------------------------------------------------------- Bruce Momjian wrote: > Here is a patch to allow COPY FROM to accept line terminators of \r, \n, > and \r\n, and for COPY TO to output \r\n on Win32. > > CHANGES FROM PREVIOUS BEHAVIOR: > > o We used to allow a literal carriage return as a data value, > while this patch will assume it is a line terminator. > > This was not documented in the COPY manual page, and was not output as > part of COPY, but it was accepted, while in 7.4 it will not. You can > still supply carriage return as \r or backslash-carriage-return. > > One trick was to prevent silently ignoring carriage returns at the end > of a line in non-\r\n files. The solution was to create a has_crnl > variable that is set from the first copy line --- if it is false, a > literal carriage return found as a data value will throw an error, while > a newline without a preceeding carriage return also throws an error. > Backslash-literal still works fine. Literal carriage returns or line > feeds not at the end of a line will cause the next line to have the > incorrect number of fields which will throw an error. > > Even single-line COPY tables are properly checked when using > STDIN/STDOUT because the \. must also terminate consistenly. > > Another change is that Win32 will output COPY files as native \r\n, > rather than \n. Of course, this can be loaded into non-Win32 too. > > Should be be outputting \r for OS X? > > The good news is that copy.c is the only place where EOL still needs to > be dealt with. Other files are either open in text mode (meaning they > can handle any end-of-line format) or aren't edited/created by users. > There is no need to change psql \copy because those files are opened in > text mode. > > I also cleaned up the BinarySignature variable usage. > > I have tested with \n and \r\n PGEOL values. > > Docs updated. > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.42 diff -c -c -r1.42 copy.sgml *** doc/src/sgml/ref/copy.sgml 15 Apr 2003 13:25:08 -0000 1.42 --- doc/src/sgml/ref/copy.sgml 18 Apr 2003 23:42:48 -0000 *************** *** 289,295 **** otherwise be taken as row or column delimiters. In particular, the following characters <emphasis>must</> be preceded by a backslash if they appear as part of a column value: backslash itself, ! newline, and the current delimiter character. </para> <para> --- 289,295 ---- otherwise be taken as row or column delimiters. In particular, the following characters <emphasis>must</> be preceded by a backslash if they appear as part of a column value: backslash itself, ! newline, carriage return, and the current delimiter character. </para> <para> *************** *** 355,370 **** It is strongly recommended that applications generating COPY data convert data newlines and carriage returns to the <literal>\n</> and <literal>\r</> sequences respectively. At present it is ! possible to represent a data carriage return without any special quoting, ! and to represent a data newline by a backslash and newline. However, ! these representations will not be accepted by default in future releases. </para> <para> ! Note that the end of each row is marked by a Unix-style newline ! (<quote><literal>\n</></>). Presently, <command>COPY FROM</command> will not behave as ! desired if given a file containing DOS- or Mac-style newlines. ! This is expected to change in future releases. </para> </refsect2> --- 355,370 ---- It is strongly recommended that applications generating COPY data convert data newlines and carriage returns to the <literal>\n</> and <literal>\r</> sequences respectively. At present it is ! possible to represent a data carriage return by a backslash and carriage ! return, and to represent a data newline by a backslash and newline. ! However, these representations might not be accepted in future releases. </para> <para> ! <command>COPY TO</command> will terminate each row with a Unix-style ! newline (<quote><literal>\n</></>), or carriage return/newline ! ("\r\n") on MS Windows. <command>COPY FROM</command> can handle lines ! ending with newlines, carriage returns, or carriage return/newlines. </para> </refsect2> *************** *** 393,399 **** 12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte is a required part of the signature. (The signature is designed to allow easy identification of files that have been munged by a non-8-bit-clean ! transfer. This signature will be changed by newline-translation filters, dropped zero bytes, dropped high bits, or parity changes.) </para> </listitem> --- 393,399 ---- 12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte is a required part of the signature. (The signature is designed to allow easy identification of files that have been munged by a non-8-bit-clean ! transfer. This signature will be changed by end-of-line-translation filters, dropped zero bytes, dropped high bits, or parity changes.) </para> </listitem> Index: src/backend/commands/copy.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v retrieving revision 1.191 diff -c -c -r1.191 copy.c *** src/backend/commands/copy.c 4 Apr 2003 20:42:11 -0000 1.191 --- src/backend/commands/copy.c 18 Apr 2003 23:42:50 -0000 *************** *** 49,54 **** --- 49,60 ---- #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7')) #define OCTVALUE(c) ((c) - '0') + #ifndef WIN32 + #define PGEOL "\n" + #else + #define PGEOL "\r\n" + #endif + /* * Represents the type of data returned by CopyReadAttribute() */ *************** *** 70,82 **** static void CopyAttributeOut(FILE *fp, char *string, char *delim); static List *CopyGetAttnums(Relation rel, List *attnamelist); ! static const char BinarySignature[12] = "PGBCOPY\n\377\r\n\0"; /* * Static communication variables ... pretty grotty, but COPY has * never been reentrant... */ int copy_lineno = 0; /* exported for use by elog() -- dz */ static bool fe_eof; /* --- 76,97 ---- static void CopyAttributeOut(FILE *fp, char *string, char *delim); static List *CopyGetAttnums(Relation rel, List *attnamelist); ! /* The trailing null is part of the signature */ ! static const char BinarySignature[] = "PGBCOPY\n\377\r\n"; /* * Static communication variables ... pretty grotty, but COPY has * never been reentrant... */ int copy_lineno = 0; /* exported for use by elog() -- dz */ + static enum + { + EOL_UNKNOWN, + EOL_NL, + EOL_CR, + EOL_CRNL + } eol_type; + static bool fe_eof; /* *************** *** 504,510 **** else if (!is_from) { if (!binary) ! CopySendData("\\.\n", 3, fp); if (IsUnderPostmaster) pq_endcopyout(false); } --- 519,528 ---- else if (!is_from) { if (!binary) ! { ! CopySendString("\\.", fp); ! CopySendString(fp ? PGEOL : "\n", fp); ! } if (IsUnderPostmaster) pq_endcopyout(false); } *************** *** 589,595 **** int32 tmp; /* Signature */ ! CopySendData((char *) BinarySignature, 12, fp); /* Integer layout field */ tmp = 0x01020304; CopySendData(&tmp, sizeof(int32), fp); --- 607,613 ---- int32 tmp; /* Signature */ ! CopySendData((char *) BinarySignature, sizeof(BinarySignature), fp); /* Integer layout field */ tmp = 0x01020304; CopySendData(&tmp, sizeof(int32), fp); *************** *** 725,731 **** } if (!binary) ! CopySendChar('\n', fp); MemoryContextSwitchTo(oldcontext); } --- 743,749 ---- } if (!binary) ! CopySendString(fp ? PGEOL : "\n", fp); MemoryContextSwitchTo(oldcontext); } *************** *** 906,912 **** /* Signature */ CopyGetData(readSig, 12, fp); ! if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, 12) != 0) elog(ERROR, "COPY BINARY: file signature not recognized"); /* Integer layout field */ CopyGetData(&tmp, sizeof(int32), fp); --- 924,930 ---- /* Signature */ CopyGetData(readSig, 12, fp); ! if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, sizeof(BinarySignature)) != 0) elog(ERROR, "COPY BINARY: file signature not recognized"); /* Integer layout field */ CopyGetData(&tmp, sizeof(int32), fp); *************** *** 937,942 **** --- 955,961 ---- nulls = (char *) palloc(num_phys_attrs * sizeof(char)); copy_lineno = 0; + eol_type = EOL_UNKNOWN; fe_eof = false; /* Make room for a PARAM_EXEC value for domain constraint checks */ *************** *** 1350,1357 **** --- 1369,1412 ---- *result = END_OF_FILE; goto copy_eof; } + if (c == '\r') + { + if (eol_type == EOL_NL) + elog(ERROR, "CopyReadAttribute: Literal carriage return data value\n" + "found in input that has newline termination; use \\r"); + + /* Check for \r\n on first line, _and_ handle \r\n. */ + if (copy_lineno == 1 || eol_type == EOL_CRNL) + { + int c2 = CopyPeekChar(fp); + if (c2 == '\n') + { + CopyDonePeek(fp, c2, true); /* eat newline */ + eol_type = EOL_CRNL; + } + else + { + /* found \r, but no \n */ + if (eol_type == EOL_CRNL) + elog(ERROR, "CopyReadAttribute: Literal carriage return data value\n" + "found in input that has carriage return/newline termination; use \\r"); + /* if we got here, it is the first line and we didn't get \n, so put it back */ + CopyDonePeek(fp, c2, false); + eol_type = EOL_CR; + } + } + *result = END_OF_LINE; + break; + } if (c == '\n') { + if (eol_type == EOL_CRNL) + elog(ERROR, "CopyReadAttribute: Literal newline data value found in input\n" + "that has carriage return/newline termination; use \\n"); + if (eol_type == EOL_CR) + elog(ERROR, "CopyReadAttribute: Literal newline data value found in input\n" + "that has carriage return termination; use \\n"); + eol_type = EOL_NL; *result = END_OF_LINE; break; } *************** *** 1441,1451 **** c = '\v'; break; case '.': c = CopyGetChar(fp); ! if (c != '\n') ! elog(ERROR, "CopyReadAttribute: end of record marker corrupted"); *result = END_OF_FILE; goto copy_eof; } } appendStringInfoCharMacro(&attribute_buf, c); --- 1496,1519 ---- c = '\v'; break; case '.': + if (eol_type == EOL_CRNL) + { + c = CopyGetChar(fp); + if (c == '\n') + elog(ERROR, "CopyReadAttribute: end-of-copy termination does not match previous input"); + if (c != '\r') + elog(ERROR, "CopyReadAttribute: end-of-copy marker corrupt"); + } c = CopyGetChar(fp); ! if (c != '\r' && c != '\n') ! elog(ERROR, "CopyReadAttribute: end-of-copy marker corrupt"); ! if (((eol_type == EOL_NL || eol_type == EOL_CRNL) && c != '\n') || ! (eol_type == EOL_CR && c != '\r')) ! elog(ERROR, "CopyReadAttribute: end-of-copy termination does not match previous input"); *result = END_OF_FILE; goto copy_eof; + + /* Default, fall through with whatever character was just escaped. */ } } appendStringInfoCharMacro(&attribute_buf, c);
pgsql-patches by date: