Win32 patch for COPY - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Win32 patch for COPY |
Date | |
Msg-id | 200304180317.h3I3HK713174@candle.pha.pa.us |
Responses |
Re: Win32 patch for COPY
|
List | pgsql-patches |
Here is a patch to allow COPY FROM to accept line terminators of \r, \n, and \r\n, and for COPY TO to output \r\n on Win32.

CHANGES FROM PREVIOUS BEHAVIOR:

o We used to allow a literal carriage return as a data value, while this patch will assume it is a line terminator. This was not documented in the COPY manual page, and was not output as part of COPY, but it was accepted; in 7.4 it will not be. You can still supply carriage return as \r or backslash-carriage-return.

One trick was to prevent silently ignoring carriage returns at the end of a line in non-\r\n files. The solution was to create a has_crnl variable that is set from the first copy line --- if it is true, a literal carriage return found as a data value will throw an error, and a newline without a preceding carriage return also throws an error. Backslash-literal still works fine. Literal carriage returns or line feeds not at the end of a line will cause the next line to have the incorrect number of fields, which will throw an error. Even single-line COPY tables are properly checked when using STDIN/STDOUT because the \. must also terminate consistently.

Another change is that Win32 will output COPY files as native \r\n, rather than \n. Of course, this can be loaded into non-Win32 too. Should we be outputting \r for OS X?

The good news is that copy.c is the only place where EOL still needs to be dealt with. Other files are either opened in text mode (meaning they can handle any end-of-line format) or aren't edited/created by users. There is no need to change psql \copy because those files are opened in text mode.

I also cleaned up the BinarySignature variable usage.

I have tested with \n and \r\n PGEOL values. Docs updated.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.
| Newtown Square, Pennsylvania 19073

Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.42
diff -c -c -r1.42 copy.sgml
*** doc/src/sgml/ref/copy.sgml 15 Apr 2003 13:25:08 -0000 1.42
--- doc/src/sgml/ref/copy.sgml 18 Apr 2003 03:14:05 -0000
***************
*** 289,295 ****
      otherwise be taken as row or column delimiters. In particular, the
      following characters <emphasis>must</> be preceded by a backslash if
      they appear as part of a column value: backslash itself,
!     newline, and the current delimiter character.
     </para>

     <para>
--- 289,295 ----
      otherwise be taken as row or column delimiters. In particular, the
      following characters <emphasis>must</> be preceded by a backslash if
      they appear as part of a column value: backslash itself,
!     newline, carriage return, and the current delimiter character.
     </para>

     <para>
***************
*** 355,370 ****
      It is strongly recommended that applications generating COPY data convert
      data newlines and carriage returns to the <literal>\n</> and
      <literal>\r</> sequences respectively. At present it is
!     possible to represent a data carriage return without any special quoting,
!     and to represent a data newline by a backslash and newline. However,
!     these representations will not be accepted by default in future releases.
     </para>

     <para>
!     Note that the end of each row is marked by a Unix-style newline
!     (<quote><literal>\n</></>). Presently, <command>COPY FROM</command> will not behave as
!     desired if given a file containing DOS- or Mac-style newlines.
!     This is expected to change in future releases.
     </para>

    </refsect2>
--- 355,370 ----
      It is strongly recommended that applications generating COPY data convert
      data newlines and carriage returns to the <literal>\n</> and
      <literal>\r</> sequences respectively. At present it is
!     possible to represent a data carriage return by a backslash and carriage
!     return, and to represent a data newline by a backslash and newline.
!     However, these representations might not be accepted in future releases.
     </para>

     <para>
!     <command>COPY TO</command> will terminate each row with a Unix-style
!     newline (<quote><literal>\n</></>), or carriage return/newline
!     ("\r\n") on MS Windows. <command>COPY FROM</command> can handle lines
!     ending with newlines, carriage returns, or carriage return/newlines.
     </para>

    </refsect2>
***************
*** 393,399 ****
      12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
      is a required part of the signature. (The signature is designed to allow
      easy identification of files that have been munged by a non-8-bit-clean
!     transfer. This signature will be changed by newline-translation
      filters, dropped zero bytes, dropped high bits, or parity changes.)
     </para>
    </listitem>
--- 393,399 ----
      12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
      is a required part of the signature. (The signature is designed to allow
      easy identification of files that have been munged by a non-8-bit-clean
!     transfer. This signature will be changed by end-of-line-translation
      filters, dropped zero bytes, dropped high bits, or parity changes.)
     </para>
    </listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.191
diff -c -c -r1.191 copy.c
*** src/backend/commands/copy.c 4 Apr 2003 20:42:11 -0000 1.191
--- src/backend/commands/copy.c 18 Apr 2003 03:14:51 -0000
***************
*** 49,54 ****
--- 49,60 ----
  #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
  #define OCTVALUE(c) ((c) - '0')

+ #ifndef WIN32
+ #define PGEOL "\n"
+ #else
+ #define PGEOL "\r\n"
+ #endif
+
  /*
   * Represents the type of data returned by CopyReadAttribute()
   */
***************
*** 70,82 ****
  static void CopyAttributeOut(FILE *fp, char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);

! static const char BinarySignature[12] = "PGBCOPY\n\377\r\n\0";

  /*
   * Static communication variables ... pretty grotty, but COPY has
   * never been reentrant...
   */
  int copy_lineno = 0; /* exported for use by elog() -- dz */
  static bool fe_eof;

  /*
--- 76,91 ----
  static void CopyAttributeOut(FILE *fp, char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);

! /* The trailing null is part of the signature */
! static const char BinarySignature[] = "PGBCOPY\n\377\r\n";

  /*
   * Static communication variables ... pretty grotty, but COPY has
   * never been reentrant...
   */
  int copy_lineno = 0; /* exported for use by elog() -- dz */
+ bool has_crnl = false;
+
  static bool fe_eof;

  /*
***************
*** 504,510 ****
      else if (!is_from)
      {
          if (!binary)
!             CopySendData("\\.\n", 3, fp);
          if (IsUnderPostmaster)
              pq_endcopyout(false);
      }
--- 513,522 ----
      else if (!is_from)
      {
          if (!binary)
!         {
!             CopySendString("\\.", fp);
!             CopySendString(fp ? PGEOL : "\n", fp);
!         }
          if (IsUnderPostmaster)
              pq_endcopyout(false);
      }
***************
*** 589,595 ****
          int32 tmp;

          /* Signature */
!         CopySendData((char *) BinarySignature, 12, fp);
          /* Integer layout field */
          tmp = 0x01020304;
          CopySendData(&tmp, sizeof(int32), fp);
--- 601,607 ----
          int32 tmp;

          /* Signature */
!         CopySendData((char *) BinarySignature, sizeof(BinarySignature), fp);
          /* Integer layout field */
          tmp = 0x01020304;
          CopySendData(&tmp, sizeof(int32), fp);
***************
*** 725,731 ****
          }

          if (!binary)
!             CopySendChar('\n', fp);

          MemoryContextSwitchTo(oldcontext);
      }
--- 737,743 ----
          }

          if (!binary)
!             CopySendString(fp ? PGEOL : "\n", fp);

          MemoryContextSwitchTo(oldcontext);
      }
***************
*** 906,912 ****

          /* Signature */
          CopyGetData(readSig, 12, fp);
!         if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, 12) != 0)
              elog(ERROR, "COPY BINARY: file signature not recognized");
          /* Integer layout field */
          CopyGetData(&tmp, sizeof(int32), fp);
--- 918,924 ----

          /* Signature */
          CopyGetData(readSig, 12, fp);
!         if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, sizeof(BinarySignature)) != 0)
              elog(ERROR, "COPY BINARY: file signature not recognized");
          /* Integer layout field */
          CopyGetData(&tmp, sizeof(int32), fp);
***************
*** 937,942 ****
--- 949,955 ----
      nulls = (char *) palloc(num_phys_attrs * sizeof(char));

      copy_lineno = 0;
+     has_crnl = false;
      fe_eof = false;

      /* Make room for a PARAM_EXEC value for domain constraint checks */
***************
*** 1350,1357 ****
--- 1363,1403 ----
              *result = END_OF_FILE;
              goto copy_eof;
          }
+         if (c == '\r')
+         {
+             /*
+              * Do \r\n -> \n mapping only if first line has \r\n.
+              *
+              * This prevents us from silently discarding literal carriage
+              * return data values that just happen to be at the end of the line.
+              * Other literal carrige return and newline data values will just
+              * throw an error because the next line will have the incorrect number
+              * of data values.
+              */
+             if (copy_lineno == 1 || has_crnl)
+             {
+                 int c2 = CopyPeekChar(fp);
+                 if (c2 == '\n')
+                 {
+                     CopyDonePeek(fp, c2, true); /* eat newline */
+                     has_crnl = true;
+                 }
+                 else
+                 {
+                     if (has_crnl)
+                         elog(ERROR, "CopyReadAttribute: Literal carriage return data value\n"
+                              "found in file containing carriage return/newline termination, use \\r");
+                     CopyDonePeek(fp, c2, false);
+                 }
+             }
+             *result = END_OF_LINE;
+             break;
+         }
          if (c == '\n')
          {
+             if (has_crnl)
+                 elog(ERROR, "CopyReadAttribute: Literal newline data value found in file\n"
+                      "containing carriage return/newline termination, use \\n");
              *result = END_OF_LINE;
              break;
          }
***************
*** 1441,1451 ****
                      c = '\v';
                      break;
                  case '.':
                      c = CopyGetChar(fp);
!                     if (c != '\n')
!                         elog(ERROR, "CopyReadAttribute: end of record marker corrupted");
                      *result = END_OF_FILE;
                      goto copy_eof;
              }
          }
          appendStringInfoCharMacro(&attribute_buf, c);
--- 1487,1506 ----
                      c = '\v';
                      break;
                  case '.':
+                     if (has_crnl)
+                     {
+                         c = CopyGetChar(fp);
+                         if (c != '\r')
+                             elog(ERROR, "Literal carriage return end-of-copy value detected in file containing carriage\n"
+                                  "return/newline termination, use \\r");
+                     }
                      c = CopyGetChar(fp);
!                     if (c != '\n' && (c != '\r' || has_crnl))
!                         elog(ERROR, "CopyReadAttribute: end of record marker corrupt");
                      *result = END_OF_FILE;
                      goto copy_eof;
+
+                     /* Default, fall through with whatever character was just escaped. */
              }
          }
          appendStringInfoCharMacro(&attribute_buf, c);
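
For readers who want to try the end-of-line rule outside the backend, here is a rough standalone sketch of the has_crnl idea from the copy.c hunk above. It is not the patch code: it scans an in-memory buffer instead of calling CopyGetChar/CopyPeekChar, it ignores backslash escapes and field parsing, and the names count_copy_lines and EolStatus are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; copy.c uses elog(ERROR) instead. */
typedef enum
{
    EOL_OK = 0,       /* all terminators were consistent */
    EOL_BARE_CR = -1, /* \r without \n in input that uses \r\n */
    EOL_BARE_NL = -2  /* \n without a preceding \r in input that uses \r\n */
} EolStatus;

/*
 * Count terminated lines in buf[0..len).  As in the patch, only the first
 * line may switch the input into \r\n mode; once in that mode, a bare \r
 * or a bare \n is rejected.  A trailing unterminated fragment is not counted.
 */
EolStatus
count_copy_lines(const char *buf, size_t len, size_t *nlines)
{
    bool   has_crnl = false; /* set when the first line ends in \r\n */
    size_t lineno = 1;       /* 1-based number of the line being read */
    size_t count = 0;
    size_t i = 0;

    assert(buf != NULL && nlines != NULL);

    while (i < len)
    {
        char c = buf[i++];

        if (c == '\r')
        {
            bool next_is_nl = (i < len && buf[i] == '\n');

            if (next_is_nl && (lineno == 1 || has_crnl))
            {
                i++;             /* eat the newline of a \r\n pair */
                has_crnl = true;
            }
            else if (has_crnl)
                return EOL_BARE_CR;
            count++;
            lineno++;
        }
        else if (c == '\n')
        {
            if (has_crnl)
                return EOL_BARE_NL;
            count++;
            lineno++;
        }
    }
    *nlines = count;
    return EOL_OK;
}
```

Under these assumptions, a buffer terminated only with \n, only with \r, or only with \r\n is accepted, while a \r\n buffer containing a bare \r or bare \n is rejected, which mirrors the two elog(ERROR) cases added for has_crnl.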