Here is a new version of the patch. I realized that if you were loading
in a column that had only a single column, the tests wouldn't throw an
error on a literal \r, so I added code to record the end-of-line type
for the first line, and make sure the rest of the lines match.
It is a little more code, but I think COPY is important enough to
cleanly handle this and be 100% reliable.
---------------------------------------------------------------------------
Bruce Momjian wrote:
> Here is a patch to allow COPY FROM to accept line terminators of \r, \n,
> and \r\n, and for COPY TO to output \r\n on Win32.
>
> CHANGES FROM PREVIOUS BEHAVIOR:
>
> o We used to allow a literal carriage return as a data value,
> while this patch will assume it is a line terminator.
>
> This was not documented in the COPY manual page, and was not output as
> part of COPY, but it was accepted, while in 7.4 it will not. You can
> still supply carriage return as \r or backslash-carriage-return.
>
> One trick was to prevent silently ignoring carriage returns at the end
> of a line in non-\r\n files. The solution was to create a has_crnl
> variable that is set from the first copy line --- if it is false, a
> literal carriage return found as a data value will throw an error, while
> a newline without a preceeding carriage return also throws an error.
> Backslash-literal still works fine. Literal carriage returns or line
> feeds not at the end of a line will cause the next line to have the
> incorrect number of fields which will throw an error.
>
> Even single-line COPY tables are properly checked when using
> STDIN/STDOUT because the \. must also terminate consistenly.
>
> Another change is that Win32 will output COPY files as native \r\n,
> rather than \n. Of course, this can be loaded into non-Win32 too.
>
> Should be be outputting \r for OS X?
>
> The good news is that copy.c is the only place where EOL still needs to
> be dealt with. Other files are either open in text mode (meaning they
> can handle any end-of-line format) or aren't edited/created by users.
> There is no need to change psql \copy because those files are opened in
> text mode.
>
> I also cleaned up the BinarySignature variable usage.
>
> I have tested with \n and \r\n PGEOL values.
>
> Docs updated.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman@candle.pha.pa.us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.42
diff -c -c -r1.42 copy.sgml
*** doc/src/sgml/ref/copy.sgml 15 Apr 2003 13:25:08 -0000 1.42
--- doc/src/sgml/ref/copy.sgml 18 Apr 2003 23:42:48 -0000
***************
*** 289,295 ****
otherwise be taken as row or column delimiters. In particular, the
following characters <emphasis>must</> be preceded by a backslash if
they appear as part of a column value: backslash itself,
! newline, and the current delimiter character.
</para>
<para>
--- 289,295 ----
otherwise be taken as row or column delimiters. In particular, the
following characters <emphasis>must</> be preceded by a backslash if
they appear as part of a column value: backslash itself,
! newline, carriage return, and the current delimiter character.
</para>
<para>
***************
*** 355,370 ****
It is strongly recommended that applications generating COPY data convert
data newlines and carriage returns to the <literal>\n</> and
<literal>\r</> sequences respectively. At present it is
! possible to represent a data carriage return without any special quoting,
! and to represent a data newline by a backslash and newline. However,
! these representations will not be accepted by default in future releases.
</para>
<para>
! Note that the end of each row is marked by a Unix-style newline
! (<quote><literal>\n</></>). Presently, <command>COPY FROM</command> will not behave as
! desired if given a file containing DOS- or Mac-style newlines.
! This is expected to change in future releases.
</para>
</refsect2>
--- 355,370 ----
It is strongly recommended that applications generating COPY data convert
data newlines and carriage returns to the <literal>\n</> and
<literal>\r</> sequences respectively. At present it is
! possible to represent a data carriage return by a backslash and carriage
! return, and to represent a data newline by a backslash and newline.
! However, these representations might not be accepted in future releases.
</para>
<para>
! <command>COPY TO</command> will terminate each row with a Unix-style
! newline (<quote><literal>\n</></>), or carriage return/newline
! ("\r\n") on MS Windows. <command>COPY FROM</command> can handle lines
! ending with newlines, carriage returns, or carriage return/newlines.
</para>
</refsect2>
***************
*** 393,399 ****
12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
is a required part of the signature. (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
! transfer. This signature will be changed by newline-translation
filters, dropped zero bytes, dropped high bits, or parity changes.)
</para>
</listitem>
--- 393,399 ----
12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
is a required part of the signature. (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
! transfer. This signature will be changed by end-of-line-translation
filters, dropped zero bytes, dropped high bits, or parity changes.)
</para>
</listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.191
diff -c -c -r1.191 copy.c
*** src/backend/commands/copy.c 4 Apr 2003 20:42:11 -0000 1.191
--- src/backend/commands/copy.c 18 Apr 2003 23:42:50 -0000
***************
*** 49,54 ****
--- 49,60 ----
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
+ #ifndef WIN32
+ #define PGEOL "\n"
+ #else
+ #define PGEOL "\r\n"
+ #endif
+
/*
* Represents the type of data returned by CopyReadAttribute()
*/
***************
*** 70,82 ****
static void CopyAttributeOut(FILE *fp, char *string, char *delim);
static List *CopyGetAttnums(Relation rel, List *attnamelist);
! static const char BinarySignature[12] = "PGBCOPY\n\377\r\n\0";
/*
* Static communication variables ... pretty grotty, but COPY has
* never been reentrant...
*/
int copy_lineno = 0; /* exported for use by elog() -- dz */
static bool fe_eof;
/*
--- 76,97 ----
static void CopyAttributeOut(FILE *fp, char *string, char *delim);
static List *CopyGetAttnums(Relation rel, List *attnamelist);
! /* The trailing null is part of the signature */
! static const char BinarySignature[] = "PGBCOPY\n\377\r\n";
/*
* Static communication variables ... pretty grotty, but COPY has
* never been reentrant...
*/
int copy_lineno = 0; /* exported for use by elog() -- dz */
+ static enum
+ {
+ EOL_UNKNOWN,
+ EOL_NL,
+ EOL_CR,
+ EOL_CRNL
+ } eol_type;
+
static bool fe_eof;
/*
***************
*** 504,510 ****
else if (!is_from)
{
if (!binary)
! CopySendData("\\.\n", 3, fp);
if (IsUnderPostmaster)
pq_endcopyout(false);
}
--- 519,528 ----
else if (!is_from)
{
if (!binary)
! {
! CopySendString("\\.", fp);
! CopySendString(fp ? PGEOL : "\n", fp);
! }
if (IsUnderPostmaster)
pq_endcopyout(false);
}
***************
*** 589,595 ****
int32 tmp;
/* Signature */
! CopySendData((char *) BinarySignature, 12, fp);
/* Integer layout field */
tmp = 0x01020304;
CopySendData(&tmp, sizeof(int32), fp);
--- 607,613 ----
int32 tmp;
/* Signature */
! CopySendData((char *) BinarySignature, sizeof(BinarySignature), fp);
/* Integer layout field */
tmp = 0x01020304;
CopySendData(&tmp, sizeof(int32), fp);
***************
*** 725,731 ****
}
if (!binary)
! CopySendChar('\n', fp);
MemoryContextSwitchTo(oldcontext);
}
--- 743,749 ----
}
if (!binary)
! CopySendString(fp ? PGEOL : "\n", fp);
MemoryContextSwitchTo(oldcontext);
}
***************
*** 906,912 ****
/* Signature */
CopyGetData(readSig, 12, fp);
! if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, 12) != 0)
elog(ERROR, "COPY BINARY: file signature not recognized");
/* Integer layout field */
CopyGetData(&tmp, sizeof(int32), fp);
--- 924,930 ----
/* Signature */
CopyGetData(readSig, 12, fp);
! if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, sizeof(BinarySignature)) != 0)
elog(ERROR, "COPY BINARY: file signature not recognized");
/* Integer layout field */
CopyGetData(&tmp, sizeof(int32), fp);
***************
*** 937,942 ****
--- 955,961 ----
nulls = (char *) palloc(num_phys_attrs * sizeof(char));
copy_lineno = 0;
+ eol_type = EOL_UNKNOWN;
fe_eof = false;
/* Make room for a PARAM_EXEC value for domain constraint checks */
***************
*** 1350,1357 ****
--- 1369,1412 ----
*result = END_OF_FILE;
goto copy_eof;
}
+ if (c == '\r')
+ {
+ if (eol_type == EOL_NL)
+ elog(ERROR, "CopyReadAttribute: Literal carriage return data value\n"
+ "found in input that has newline termination; use \\r");
+
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (copy_lineno == 1 || eol_type == EOL_CRNL)
+ {
+ int c2 = CopyPeekChar(fp);
+ if (c2 == '\n')
+ {
+ CopyDonePeek(fp, c2, true); /* eat newline */
+ eol_type = EOL_CRNL;
+ }
+ else
+ {
+ /* found \r, but no \n */
+ if (eol_type == EOL_CRNL)
+ elog(ERROR, "CopyReadAttribute: Literal carriage return data value\n"
+ "found in input that has carriage return/newline termination; use \\r");
+ /* if we got here, it is the first line and we didn't get \n, so put it back */
+ CopyDonePeek(fp, c2, false);
+ eol_type = EOL_CR;
+ }
+ }
+ *result = END_OF_LINE;
+ break;
+ }
if (c == '\n')
{
+ if (eol_type == EOL_CRNL)
+ elog(ERROR, "CopyReadAttribute: Literal newline data value found in input\n"
+ "that has carriage return/newline termination; use \\n");
+ if (eol_type == EOL_CR)
+ elog(ERROR, "CopyReadAttribute: Literal newline data value found in input\n"
+ "that has carriage return termination; use \\n");
+ eol_type = EOL_NL;
*result = END_OF_LINE;
break;
}
***************
*** 1441,1451 ****
c = '\v';
break;
case '.':
c = CopyGetChar(fp);
! if (c != '\n')
! elog(ERROR, "CopyReadAttribute: end of record marker corrupted");
*result = END_OF_FILE;
goto copy_eof;
}
}
appendStringInfoCharMacro(&attribute_buf, c);
--- 1496,1519 ----
c = '\v';
break;
case '.':
+ if (eol_type == EOL_CRNL)
+ {
+ c = CopyGetChar(fp);
+ if (c == '\n')
+ elog(ERROR, "CopyReadAttribute: end-of-copy termination does not match previous input");
+ if (c != '\r')
+ elog(ERROR, "CopyReadAttribute: end-of-copy marker corrupt");
+ }
c = CopyGetChar(fp);
! if (c != '\r' && c != '\n')
! elog(ERROR, "CopyReadAttribute: end-of-copy marker corrupt");
! if (((eol_type == EOL_NL || eol_type == EOL_CRNL) && c != '\n') ||
! (eol_type == EOL_CR && c != '\r'))
! elog(ERROR, "CopyReadAttribute: end-of-copy termination does not match previous input");
*result = END_OF_FILE;
goto copy_eof;
+
+ /* Default, fall through with whatever character was just escaped. */
}
}
appendStringInfoCharMacro(&attribute_buf, c);