Win32 patch for COPY - Mailing list pgsql-patches

From Bruce Momjian
Subject Win32 patch for COPY
Msg-id 200304180317.h3I3HK713174@candle.pha.pa.us
List pgsql-patches
Here is a patch to allow COPY FROM to accept line terminators of \r, \n,
and \r\n, and for COPY TO to output \r\n on Win32.

CHANGES FROM PREVIOUS BEHAVIOR:

    o We used to allow a literal carriage return as a data value,
      while this patch will assume it is a line terminator.

This was not documented in the COPY manual page and was never produced
by COPY output, but it was accepted on input; in 7.4 it will not be.
You can still supply a carriage return as \r or as backslash-carriage-return.
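To illustrate the recommended spelling, here is a minimal sketch of how a program generating COPY text data might escape one column value so that a carriage return is written as the two-character sequence \r rather than as a literal CR. The function name copy_escape_value and its buffer convention are hypothetical, not part of copy.c; it assumes a single-byte delimiter and NUL-terminated input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: escape one column value for COPY text format.
 * Backslash, newline, carriage return and the delimiter are each written
 * with a preceding backslash (newline and CR as \n and \r), so no literal
 * CR or LF ever reaches the output.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t) -1 if out (of size outlen) is too small.
 */
size_t copy_escape_value(const char *in, char delim, char *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outlen > 0);
    for (; *in != '\0'; in++)
    {
        char c = *in;
        int  escape = 1;

        if (c == '\n')
            c = 'n';            /* data newline -> \n */
        else if (c == '\r')
            c = 'r';            /* data carriage return -> \r */
        else if (c != '\\' && c != delim)
            escape = 0;         /* ordinary byte, copied as-is */

        /* Need room for this byte (plus its backslash) and the final NUL. */
        if (n + (escape ? 2 : 1) >= outlen)
            return (size_t) -1;
        if (escape)
            out[n++] = '\\';
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

With a tab delimiter, the value a<CR>b becomes the four bytes a\rb, which reads back as a carriage return no matter which line terminator the file uses.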

The tricky part was avoiding silently discarding a literal carriage
return data value that happens to sit at the end of a line in a file
that does not use \r\n.  The solution was to create a has_crnl variable
that is set from the first copy line.  If it is true, a literal carriage
return found as a data value will throw an error, and so will a newline
without a preceding carriage return.  Backslash-escaped values still
work fine.  Literal carriage returns or line feeds not at the end of a
line will cause the next line to have the incorrect number of fields,
which will throw an error.

Even single-line COPY tables are properly checked when using
STDIN/STDOUT, because the \. end-of-copy marker must also be terminated
consistently.

Another change is that COPY TO a file on Win32 will write native \r\n
line endings rather than \n.  Such files can, of course, also be loaded
on non-Win32 platforms.

Should we be outputting \r for OS X?

The good news is that copy.c is the only place where EOL still needs to
be dealt with.  Other files are either opened in text mode (meaning they
can handle any end-of-line format) or aren't edited/created by users.
There is no need to change psql \copy because those files are opened in
text mode.

I also cleaned up the BinarySignature variable usage.
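The sizeof() change works because a string literal initializing an unsized char array gets an implicit terminating NUL, which supplies the twelfth signature byte that the old declaration spelled out as \0. A small standalone check of that reasoning; signature_matches() is a hypothetical helper, not the backend's code, and the test data simulate one of the corruptions (a dropped high bit) the signature is meant to catch.

```c
#include <assert.h>
#include <string.h>

/* As in the patch: 11 explicit bytes; the literal's implicit NUL is the 12th. */
static const char BinarySignature[] = "PGBCOPY\n\377\r\n";

/* Hypothetical helper: nonzero if buf begins with the full 12-byte signature. */
int signature_matches(const char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= sizeof(BinarySignature) &&
        memcmp(buf, BinarySignature, sizeof(BinarySignature)) == 0;
}
```

Because memcmp() compares sizeof(BinarySignature) bytes, the trailing zero byte is still checked. The read side of the patch still fetches the signature with a literal 12, which happens to equal that size.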

I have tested with \n and \r\n PGEOL values.

Docs updated.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.42
diff -c -c -r1.42 copy.sgml
*** doc/src/sgml/ref/copy.sgml    15 Apr 2003 13:25:08 -0000    1.42
--- doc/src/sgml/ref/copy.sgml    18 Apr 2003 03:14:05 -0000
***************
*** 289,295 ****
      otherwise be taken as row or column delimiters. In particular, the
      following characters <emphasis>must</> be preceded by a backslash if
      they appear as part of a column value: backslash itself,
!     newline, and the current delimiter character.
     </para>

     <para>
--- 289,295 ----
      otherwise be taken as row or column delimiters. In particular, the
      following characters <emphasis>must</> be preceded by a backslash if
      they appear as part of a column value: backslash itself,
!     newline, carriage return, and the current delimiter character.
     </para>

     <para>
***************
*** 355,370 ****
      It is strongly recommended that applications generating COPY data convert
      data newlines and carriage returns to the <literal>\n</> and
      <literal>\r</> sequences respectively.  At present it is
!     possible to represent a data carriage return without any special quoting,
!     and to represent a data newline by a backslash and newline.  However,
!     these representations will not be accepted by default in future releases.
     </para>

     <para>
!     Note that the end of each row is marked by a Unix-style newline
!     (<quote><literal>\n</></>).  Presently, <command>COPY FROM</command> will not behave as
!     desired if given a file containing DOS- or Mac-style newlines.
!     This is expected to change in future releases.
     </para>
    </refsect2>

--- 355,370 ----
      It is strongly recommended that applications generating COPY data convert
      data newlines and carriage returns to the <literal>\n</> and
      <literal>\r</> sequences respectively.  At present it is
!     possible to represent a data carriage return by a backslash and carriage
!     return, and to represent a data newline by a backslash and newline.
!     However, these representations might not be accepted in future releases.
     </para>

     <para>
!     <command>COPY TO</command> will terminate each row with a Unix-style
!     newline (<quote><literal>\n</></>), or carriage return/newline
!     (<quote><literal>\r\n</></>) on MS Windows.  <command>COPY FROM</command> can handle lines
!     ending with newlines, carriage returns, or carriage return/newlines.
     </para>
    </refsect2>

***************
*** 393,399 ****
  12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
  is a required part of the signature.  (The signature is designed to allow
  easy identification of files that have been munged by a non-8-bit-clean
! transfer.  This signature will be changed by newline-translation
  filters, dropped zero bytes, dropped high bits, or parity changes.)
         </para>
        </listitem>
--- 393,399 ----
  12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
  is a required part of the signature.  (The signature is designed to allow
  easy identification of files that have been munged by a non-8-bit-clean
! transfer.  This signature will be changed by end-of-line-translation
  filters, dropped zero bytes, dropped high bits, or parity changes.)
         </para>
        </listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.191
diff -c -c -r1.191 copy.c
*** src/backend/commands/copy.c    4 Apr 2003 20:42:11 -0000    1.191
--- src/backend/commands/copy.c    18 Apr 2003 03:14:51 -0000
***************
*** 49,54 ****
--- 49,60 ----
  #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
  #define OCTVALUE(c) ((c) - '0')

+ #ifndef WIN32
+ #define PGEOL    "\n"
+ #else
+ #define PGEOL    "\r\n"
+ #endif
+
  /*
   * Represents the type of data returned by CopyReadAttribute()
   */
***************
*** 70,82 ****
  static void CopyAttributeOut(FILE *fp, char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);

! static const char BinarySignature[12] = "PGBCOPY\n\377\r\n\0";

  /*
   * Static communication variables ... pretty grotty, but COPY has
   * never been reentrant...
   */
  int            copy_lineno = 0;    /* exported for use by elog() -- dz */
  static bool fe_eof;

  /*
--- 76,91 ----
  static void CopyAttributeOut(FILE *fp, char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);

! /* The trailing null is part of the signature */
! static const char BinarySignature[] = "PGBCOPY\n\377\r\n";

  /*
   * Static communication variables ... pretty grotty, but COPY has
   * never been reentrant...
   */
  int            copy_lineno = 0;    /* exported for use by elog() -- dz */
+ bool        has_crnl = false;
+
  static bool fe_eof;

  /*
***************
*** 504,510 ****
      else if (!is_from)
      {
          if (!binary)
!             CopySendData("\\.\n", 3, fp);
          if (IsUnderPostmaster)
              pq_endcopyout(false);
      }
--- 513,522 ----
      else if (!is_from)
      {
          if (!binary)
!         {
!             CopySendString("\\.", fp);
!             CopySendString(fp ? PGEOL : "\n", fp);
!         }
          if (IsUnderPostmaster)
              pq_endcopyout(false);
      }
***************
*** 589,595 ****
          int32        tmp;

          /* Signature */
!         CopySendData((char *) BinarySignature, 12, fp);
          /* Integer layout field */
          tmp = 0x01020304;
          CopySendData(&tmp, sizeof(int32), fp);
--- 601,607 ----
          int32        tmp;

          /* Signature */
!         CopySendData((char *) BinarySignature, sizeof(BinarySignature), fp);
          /* Integer layout field */
          tmp = 0x01020304;
          CopySendData(&tmp, sizeof(int32), fp);
***************
*** 725,731 ****
          }

          if (!binary)
!             CopySendChar('\n', fp);

          MemoryContextSwitchTo(oldcontext);
      }
--- 737,743 ----
          }

          if (!binary)
!             CopySendString(fp ? PGEOL : "\n", fp);

          MemoryContextSwitchTo(oldcontext);
      }
***************
*** 906,912 ****

          /* Signature */
          CopyGetData(readSig, 12, fp);
!         if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, 12) != 0)
              elog(ERROR, "COPY BINARY: file signature not recognized");
          /* Integer layout field */
          CopyGetData(&tmp, sizeof(int32), fp);
--- 918,924 ----

          /* Signature */
          CopyGetData(readSig, 12, fp);
!         if (CopyGetEof(fp) || memcmp(readSig, BinarySignature, sizeof(BinarySignature)) != 0)
              elog(ERROR, "COPY BINARY: file signature not recognized");
          /* Integer layout field */
          CopyGetData(&tmp, sizeof(int32), fp);
***************
*** 937,942 ****
--- 949,955 ----
      nulls = (char *) palloc(num_phys_attrs * sizeof(char));

      copy_lineno = 0;
+     has_crnl = false;
      fe_eof = false;

      /* Make room for a PARAM_EXEC value for domain constraint checks */
***************
*** 1350,1357 ****
--- 1363,1403 ----
              *result = END_OF_FILE;
              goto copy_eof;
          }
+         if (c == '\r')
+         {
+             /*
+              *    Do \r\n -> \n mapping only if first line has \r\n.
+              *
+              *    This prevents us from silently discarding literal carriage
+              *    return data values that just happen to be at the end of the line.
+              *    Other literal carriage return and newline data values will just
+              *    throw an error because the next line will have the incorrect number
+              *    of data values.
+              */
+             if (copy_lineno == 1 || has_crnl)
+             {
+                 int c2 = CopyPeekChar(fp);
+                 if (c2 == '\n')
+                 {
+                     CopyDonePeek(fp, c2, true);        /* eat newline */
+                     has_crnl = true;
+                 }
+                 else
+                 {
+                     if (has_crnl)
+                         elog(ERROR, "CopyReadAttribute: Literal carriage return data value\n"
+                                     "found in file containing carriage return/newline termination, use \\r");
+                     CopyDonePeek(fp, c2, false);
+                 }
+             }
+             *result = END_OF_LINE;
+             break;
+         }
          if (c == '\n')
          {
+             if (has_crnl)
+                 elog(ERROR, "CopyReadAttribute: Literal newline data value found in file\n"
+                             "containing carriage return/newline termination, use \\n");
              *result = END_OF_LINE;
              break;
          }
***************
*** 1441,1451 ****
                      c = '\v';
                      break;
                  case '.':
                      c = CopyGetChar(fp);
!                     if (c != '\n')
!                         elog(ERROR, "CopyReadAttribute: end of record marker corrupted");
                      *result = END_OF_FILE;
                      goto copy_eof;
              }
          }
          appendStringInfoCharMacro(&attribute_buf, c);
--- 1487,1506 ----
                      c = '\v';
                      break;
                  case '.':
+                     if (has_crnl)
+                     {
+                         c = CopyGetChar(fp);
+                         if (c != '\r')
+                             elog(ERROR, "Literal carriage return end-of-copy value detected in file containing carriage\n"
+                                     "return/newline termination, use \\r");
+                     }
                      c = CopyGetChar(fp);
!                     if (c != '\n' && (c != '\r' || has_crnl))
!                         elog(ERROR, "CopyReadAttribute: end of record marker corrupt");
                      *result = END_OF_FILE;
                      goto copy_eof;
+
+                 /* Default, fall through with whatever character was just escaped. */
              }
          }
          appendStringInfoCharMacro(&attribute_buf, c);
