Thread: Re: [BUGS] BUG #2114: (patch) COPY FROM ... end of copy marker corrupt

From: Bruce Momjian
Date:
Sorry for the delay in responding.  I have done research on your bug
report, and the problem seems even worse than you reported.  First, a
little background.  In non-CSV text mode, the backslash is the escape
character, so any character appearing after a backslash (all 255 of
them) is treated specially, e.g. \{delimiter}, \r, \n, and for our case
here '\.'.  (A literal backslash is \\).  In CSV mode, the quote is
special, but we don't have 255 special characters after a quote.  Only
a doubled double-quote, "", is special: it represents a literal
double-quote.

This behavior gives us problems for specifying the end-of-copy marker,
which is \. in both modes.  The big problem is that \. is also a valid
CSV data value (though not a valid non-CSV data value).  So, the
solution we came up with was to require \. to appear alone on a line in
CSV mode for it to be treated as end-of-copy.  Your idea of using quotes
worked, but it wasn't the right solution.  We need to enforce the
alone-on-a-line restriction.  Our code had:

        if (c == '\\' && cstate->line_buf.len == 0)

The problem with that is that, because of the input and _output_
buffering, cstate->line_buf.len could be zero even if we are not on the
first character of a line.  In fact, for a typical line, it is zero for
all characters on the line.  The proper solution is to introduce a
boolean, first_char_in_line, that we set as we enter the loop and clear
once we process a character.

Looking closer at the code, I see the reason for email comments like
"the copy code is nearing unmaintainability".  The CSV/non-CSV code was
already complex, but the buffering additions in 8.1 pushed it over the
edge.

I have restructured the line-reading code in copy.c by:

    o  merging the CSV/non-CSV functions into a single function
    o  using macros to centralize and clarify the buffering code
    o  updating comments
    o  renaming client_only_encoding to encoding_embeds_ascii
    o  adding a high-bit test to the encoding_embeds_ascii test for
       performance
    o  in CSV mode, allowing a backslash followed by a non-period to
       continue being processed as a data value

There should be no performance impact from this patch because it is
functionally equivalent.  If you apply the patch you will see copy.c is
much clearer in this area now and might suggest additional
optimizations.

I have also attached an 8.1-only patch to fix the CSV \. handling bug
with no code restructuring.

---------------------------------------------------------------------------

Ben Gould wrote:
>
> The following bug has been logged online:
>
> Bug reference:      2114
> Logged by:          Ben Gould
> Email address:      ben.gould@free.fr
> PostgreSQL version: 8.1.0
> Operating system:   Mac OS X 10.4.3
> Description:        (patch) COPY FROM ... end of copy marker corrupt
> Details:
>
> With a table like:
>
> CREATE TABLE test_table (
> foo text,
> bar text,
> baz text
> );
>
> Using this format for COPY FROM:
>
> COPY test_table FROM STDIN WITH CSV HEADER DELIMITER AS ',' NULL AS 'NULL'
> QUOTE AS '\"' ESCAPE AS '\"'
>
> Where the file was generated via:
>
> COPY test_table TO STDOUT WITH CSV HEADER DELIMITER AS ',' NULL AS 'NULL'
> QUOTE AS '\"' ESCAPE AS '\"' FORCE QUOTE foo, bar, baz;
>
> I needed this patch:
>
> <<<
> --- postgresql-8.1.0.original/src/backend/commands/copy.c       2005-12-13
> 13:18:16.000000000 +0100
> +++ postgresql-8.1.0/src/backend/commands/copy.c        2005-12-13
> 13:28:28.000000000 +0100
> @@ -2531,7 +2531,7 @@
>                 /*
>                  * In CSV mode, we only recognize \. at start of line
>                  */
> -               if (c == '\\' && cstate->line_buf.len == 0)
> +               if (c == '\\' && !in_quote && cstate->line_buf.len == 0)
>                 {
>                         char            c2;
> >>>
>
> Because of this error message:
>
> pg_endcopy warning: ERROR:  end-of-copy marker corrupt
>
> (We have quoted strings containing things like ..\..\.. in the CSV file
> which broke the copy from.)
>
> I was using DBD::Pg as the client library.
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.255
diff -c -c -r1.255 copy.c
*** src/backend/commands/copy.c    22 Nov 2005 18:17:08 -0000    1.255
--- src/backend/commands/copy.c    27 Dec 2005 02:10:18 -0000
***************
*** 76,94 ****

  /*
   * This struct contains all the state variables used throughout a COPY
!  * operation.  For simplicity, we use the same struct for all variants
!  * of COPY, even though some fields are used in only some cases.
   *
!  * A word about encoding considerations: encodings that are only supported on
!  * the client side are those where multibyte characters may have second or
!  * later bytes with the high bit not set.  When scanning data in such an
!  * encoding to look for a match to a single-byte (ie ASCII) character,
!  * we must use the full pg_encoding_mblen() machinery to skip over
!  * multibyte characters, else we might find a false match to a trailing
!  * byte.  In supported server encodings, there is no possibility of
!  * a false match, and it's faster to make useless comparisons to trailing
!  * bytes than it is to invoke pg_encoding_mblen() to skip over them.
!  * client_only_encoding is TRUE when we have to do it the hard way.
   */
  typedef struct CopyStateData
  {
--- 76,94 ----

  /*
   * This struct contains all the state variables used throughout a COPY
!  * operation. For simplicity, we use the same struct for all variants of COPY,
!  * even though some fields are used in only some cases.
   *
!  * Multi-byte encodings: all supported client-side encodings encode multi-byte
!  * characters by having the first byte's high bit set. Subsequent bytes of the
!  * character can have the high bit not set. When scanning data in such an
!  * encoding to look for a match to a single-byte (ie ASCII) character, we must
!  * use the full pg_encoding_mblen() machinery to skip over multibyte
!  * characters, else we might find a false match to a trailing byte. In
!  * supported server encodings, there is no possibility of a false match, and
!  * it's faster to make useless comparisons to trailing bytes than it is to
!  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
!  * when we have to do it the hard way.
   */
  typedef struct CopyStateData
  {
***************
*** 101,107 ****
      EolType        eol_type;        /* EOL type of input */
      int            client_encoding;    /* remote side's character encoding */
      bool        need_transcoding;        /* client encoding diff from server? */
!     bool        client_only_encoding;    /* encoding not valid on server? */

      /* parameters from the COPY command */
      Relation    rel;            /* relation to copy to or from */
--- 101,107 ----
      EolType        eol_type;        /* EOL type of input */
      int            client_encoding;    /* remote side's character encoding */
      bool        need_transcoding;        /* client encoding diff from server? */
!     bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */

      /* parameters from the COPY command */
      Relation    rel;            /* relation to copy to or from */
***************
*** 160,165 ****
--- 160,230 ----
  typedef CopyStateData *CopyState;


+ /*
+  * These macros centralize code used to process line_buf and raw_buf buffers.
+  * They are macros because they often do continue/break control and to avoid
+  * function call overhead in tight COPY loops.
+  *
+  * We must use "if (1)" because "do {} while(0)" overrides the continue/break
+  * processing.  See http://www.cit.gu.edu.au/~anthony/info/C/C.macros.
+  */
+
+ /*
+  * This keeps the character read at the top of the loop in the buffer
+  * even if there is more than one read-ahead.
+  */
+ #define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
+ if (1) \
+ { \
+     if (raw_buf_ptr + (extralen) >= copy_buf_len && !hit_eof) \
+     { \
+         raw_buf_ptr = prev_raw_ptr; /* undo fetch */ \
+         need_data = true; \
+         continue; \
+     } \
+ } else
+
+
+ /* This consumes the remainder of the buffer and breaks */
+ #define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
+ if (1) \
+ { \
+     if (raw_buf_ptr + (extralen) >= copy_buf_len && hit_eof) \
+     { \
+         if (extralen) \
+             raw_buf_ptr = copy_buf_len; /* consume the partial character */ \
+         /* backslash just before EOF, treat as data char */ \
+         result = true; \
+         break; \
+     } \
+ } else
+
+
+ /*
+  * Transfer any approved data to line_buf; must do this to be sure
+  * there is some room in raw_buf.
+  */
+ #define REFILL_LINEBUF \
+ if (1) \
+ { \
+     if (raw_buf_ptr > cstate->raw_buf_index) \
+     { \
+         appendBinaryStringInfo(&cstate->line_buf, \
+                              cstate->raw_buf + cstate->raw_buf_index, \
+                                raw_buf_ptr - cstate->raw_buf_index); \
+         cstate->raw_buf_index = raw_buf_ptr; \
+     } \
+ } else
+
+ /* Undo any read-ahead and jump out of the block. */
+ #define NO_END_OF_COPY_GOTO \
+ if (1) \
+ { \
+     raw_buf_ptr = prev_raw_ptr + 1; \
+     goto not_end_of_copy; \
+ } else
+
+
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";


***************
*** 169,175 ****
  static void CopyFrom(CopyState cstate);
  static bool CopyReadLine(CopyState cstate);
  static bool CopyReadLineText(CopyState cstate);
- static bool CopyReadLineCSV(CopyState cstate);
  static int CopyReadAttributesText(CopyState cstate, int maxfields,
                         char **fieldvals);
  static int CopyReadAttributesCSV(CopyState cstate, int maxfields,
--- 234,239 ----
***************
*** 940,946 ****
      /* Set up encoding conversion info */
      cstate->client_encoding = pg_get_client_encoding();
      cstate->need_transcoding = (cstate->client_encoding != GetDatabaseEncoding());
!     cstate->client_only_encoding = PG_ENCODING_IS_CLIENT_ONLY(cstate->client_encoding);

      cstate->copy_dest = COPY_FILE;        /* default */

--- 1004,1011 ----
      /* Set up encoding conversion info */
      cstate->client_encoding = pg_get_client_encoding();
      cstate->need_transcoding = (cstate->client_encoding != GetDatabaseEncoding());
!     /* See Multibyte encoding comment above */
!     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->client_encoding);

      cstate->copy_dest = COPY_FILE;        /* default */

***************
*** 1970,1979 ****
      cstate->line_buf_converted = false;

      /* Parse data and transfer into line_buf */
!     if (cstate->csv_mode)
!         result = CopyReadLineCSV(cstate);
!     else
!         result = CopyReadLineText(cstate);

      if (result)
      {
--- 2035,2041 ----
      cstate->line_buf_converted = false;

      /* Parse data and transfer into line_buf */
!     result = CopyReadLineText(cstate);

      if (result)
      {
***************
*** 2048,2089 ****
  }

  /*
!  * CopyReadLineText - inner loop of CopyReadLine for non-CSV mode
!  *
!  * If you need to change this, better look at CopyReadLineCSV too
   */
  static bool
  CopyReadLineText(CopyState cstate)
  {
-     bool        result;
      char       *copy_raw_buf;
      int            raw_buf_ptr;
      int            copy_buf_len;
!     bool        need_data;
!     bool        hit_eof;
!     char        s[2];

!     s[1] = 0;

!     /* set default status */
!     result = false;

      /*
       * The objective of this loop is to transfer the entire next input line
       * into line_buf.  Hence, we only care for detecting newlines (\r and/or
       * \n) and the end-of-copy marker (\.).
       *
!      * For backwards compatibility we allow backslashes to escape newline
!      * characters.    Backslashes other than the end marker get put into the
!      * line_buf, since CopyReadAttributesText does its own escape processing.
       *
!      * These four characters, and only these four, are assumed the same in
!      * frontend and backend encodings.
       *
!      * For speed, we try to move data to line_buf in chunks rather than one
!      * character at a time.  raw_buf_ptr points to the next character to
!      * examine; any characters from raw_buf_index to raw_buf_ptr have been
!      * determined to be part of the line, but not yet transferred to line_buf.
       *
       * For a little extra speed within the loop, we copy raw_buf and
       * raw_buf_len into local variables.
--- 2110,2162 ----
  }

  /*
!  * CopyReadLineText - inner loop of CopyReadLine for text mode
   */
  static bool
  CopyReadLineText(CopyState cstate)
  {
      char       *copy_raw_buf;
      int            raw_buf_ptr;
      int            copy_buf_len;
!     bool        need_data = false;
!     bool        hit_eof = false;
!     bool        result = false;
!     char        mblen_str[2];
!     /* CSV variables */
!     bool        first_char_in_line = true;
!     bool        in_quote = false,
!                 last_was_esc = false;
!     char        quotec = '\0';
!     char        escapec = '\0';

!     if (cstate->csv_mode)
!     {
!         quotec = cstate->quote[0];
!         escapec = cstate->escape[0];
!         /* ignore special escape processing if it's the same as quotec */
!         if (quotec == escapec)
!             escapec = '\0';
!     }

!     mblen_str[1] = '\0';

      /*
       * The objective of this loop is to transfer the entire next input line
       * into line_buf.  Hence, we only care for detecting newlines (\r and/or
       * \n) and the end-of-copy marker (\.).
       *
!      * In CSV mode, \r and \n inside a quoted field are just part of the data
!      * value and are put in line_buf.  We keep just enough state to know if we
!      * are currently in a quoted field or not.
       *
!      * These four characters, and the CSV escape and quote characters, are
!      * assumed the same in frontend and backend encodings.
       *
!      * For speed, we try to move data from raw_buf to line_buf in chunks
!      * rather than one character at a time.  raw_buf_ptr points to the next
!      * character to examine; any characters from raw_buf_index to raw_buf_ptr
!      * have been determined to be part of the line, but not yet transferred
!      * to line_buf.
       *
       * For a little extra speed within the loop, we copy raw_buf and
       * raw_buf_len into local variables.
***************
*** 2091,2118 ****
      copy_raw_buf = cstate->raw_buf;
      raw_buf_ptr = cstate->raw_buf_index;
      copy_buf_len = cstate->raw_buf_len;
-     need_data = false;            /* flag to force reading more data */
-     hit_eof = false;            /* flag indicating no more data available */

      for (;;)
      {
          int            prev_raw_ptr;
          char        c;

!         /* Load more data if needed */
          if (raw_buf_ptr >= copy_buf_len || need_data)
          {
!             /*
!              * Transfer any approved data to line_buf; must do this to be sure
!              * there is some room in raw_buf.
!              */
!             if (raw_buf_ptr > cstate->raw_buf_index)
!             {
!                 appendBinaryStringInfo(&cstate->line_buf,
!                                      cstate->raw_buf + cstate->raw_buf_index,
!                                        raw_buf_ptr - cstate->raw_buf_index);
!                 cstate->raw_buf_index = raw_buf_ptr;
!             }

              /*
               * Try to read some more data.    This will certainly reset
--- 2164,2188 ----
      copy_raw_buf = cstate->raw_buf;
      raw_buf_ptr = cstate->raw_buf_index;
      copy_buf_len = cstate->raw_buf_len;

      for (;;)
      {
          int            prev_raw_ptr;
          char        c;

!         /*
!          *    Load more data if needed.  Ideally we would just force four bytes
!          *    of read-ahead and avoid the many calls to
!          *    IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(), but the COPY_OLD_FE
!          *    protocol does not allow us to read too far ahead or we might
!          *    read into the next data, so we read-ahead only as far as we know
!          *    we can.  One optimization would be to read-ahead four bytes here
!          *    if cstate->copy_dest != COPY_OLD_FE, but it hardly seems worth it,
!          *    considering the size of the buffer.
!          */
          if (raw_buf_ptr >= copy_buf_len || need_data)
          {
!             REFILL_LINEBUF;

              /*
               * Try to read some more data.    This will certainly reset
***************
*** 2139,2472 ****
          prev_raw_ptr = raw_buf_ptr;
          c = copy_raw_buf[raw_buf_ptr++];

!         if (c == '\r')
!         {
!             /* Check for \r\n on first line, _and_ handle \r\n. */
!             if (cstate->eol_type == EOL_UNKNOWN ||
!                 cstate->eol_type == EOL_CRNL)
!             {
!                 /*
!                  * If need more data, go back to loop top to load it.
!                  *
!                  * Note that if we are at EOF, c will wind up as '\0' because
!                  * of the guaranteed pad of raw_buf.
!                  */
!                 if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!                 {
!                     raw_buf_ptr = prev_raw_ptr; /* undo fetch */
!                     need_data = true;
!                     continue;
!                 }
!                 c = copy_raw_buf[raw_buf_ptr];
!
!                 if (c == '\n')
!                 {
!                     raw_buf_ptr++;        /* eat newline */
!                     cstate->eol_type = EOL_CRNL;        /* in case not set yet */
!                 }
!                 else
!                 {
!                     /* found \r, but no \n */
!                     if (cstate->eol_type == EOL_CRNL)
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                              errmsg("literal carriage return found in data"),
!                                  errhint("Use \"\\r\" to represent carriage return.")));
!
!                     /*
!                      * if we got here, it is the first line and we didn't find
!                      * \n, so don't consume the peeked character
!                      */
!                     cstate->eol_type = EOL_CR;
!                 }
!             }
!             else if (cstate->eol_type == EOL_NL)
!                 ereport(ERROR,
!                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                          errmsg("literal carriage return found in data"),
!                       errhint("Use \"\\r\" to represent carriage return.")));
!             /* If reach here, we have found the line terminator */
!             break;
!         }
!
!         if (c == '\n')
!         {
!             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
!                 ereport(ERROR,
!                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                          errmsg("literal newline found in data"),
!                          errhint("Use \"\\n\" to represent newline.")));
!             cstate->eol_type = EOL_NL;    /* in case not set yet */
!             /* If reach here, we have found the line terminator */
!             break;
!         }
!
!         if (c == '\\')
          {
              /*
!              * If need more data, go back to loop top to load it.
               */
!             if (raw_buf_ptr >= copy_buf_len)
              {
!                 if (hit_eof)
!                 {
!                     /* backslash just before EOF, treat as data char */
!                     result = true;
!                     break;
!                 }
!                 raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                 need_data = true;
!                 continue;
              }

              /*
!              * In non-CSV mode, backslash quotes the following character even
!              * if it's a newline, so we always advance to next character
               */
!             c = copy_raw_buf[raw_buf_ptr++];
!
!             if (c == '.')
!             {
!                 if (cstate->eol_type == EOL_CRNL)
!                 {
!                     if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!                     {
!                         raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                         need_data = true;
!                         continue;
!                     }
!                     /* if hit_eof, c will become '\0' */
!                     c = copy_raw_buf[raw_buf_ptr++];
!                     if (c == '\n')
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker does not match previous newline style")));
!                     if (c != '\r')
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker corrupt")));
!                 }
!                 if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!                 {
!                     raw_buf_ptr = prev_raw_ptr; /* undo fetch */
!                     need_data = true;
!                     continue;
!                 }
!                 /* if hit_eof, c will become '\0' */
!                 c = copy_raw_buf[raw_buf_ptr++];
!                 if (c != '\r' && c != '\n')
!                     ereport(ERROR,
!                             (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                              errmsg("end-of-copy marker corrupt")));
!                 if ((cstate->eol_type == EOL_NL && c != '\n') ||
!                     (cstate->eol_type == EOL_CRNL && c != '\n') ||
!                     (cstate->eol_type == EOL_CR && c != '\r'))
!                     ereport(ERROR,
!                             (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                              errmsg("end-of-copy marker does not match previous newline style")));
!
!                 /*
!                  * Transfer only the data before the \. into line_buf, then
!                  * discard the data and the \. sequence.
!                  */
!                 if (prev_raw_ptr > cstate->raw_buf_index)
!                     appendBinaryStringInfo(&cstate->line_buf,
!                                      cstate->raw_buf + cstate->raw_buf_index,
!                                        prev_raw_ptr - cstate->raw_buf_index);
!                 cstate->raw_buf_index = raw_buf_ptr;
!                 result = true;    /* report EOF */
!                 break;
!             }
!         }
!
!         /*
!          * Do we need to be careful about trailing bytes of multibyte
!          * characters?    (See note above about client_only_encoding)
!          *
!          * We assume here that pg_encoding_mblen only looks at the first byte
!          * of the character!
!          */
!         if (cstate->client_only_encoding)
!         {
!             int            mblen;
!
!             s[0] = c;
!             mblen = pg_encoding_mblen(cstate->client_encoding, s);
!             if (raw_buf_ptr + (mblen - 1) > copy_buf_len)
!             {
!                 if (hit_eof)
!                 {
!                     /* consume the partial character (conversion will fail) */
!                     raw_buf_ptr = copy_buf_len;
!                     result = true;
!                     break;
!                 }
!                 raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                 need_data = true;
!                 continue;
!             }
!             raw_buf_ptr += mblen - 1;
!         }
!     }                            /* end of outer loop */
!
!     /*
!      * Transfer any still-uncopied data to line_buf.
!      */
!     if (raw_buf_ptr > cstate->raw_buf_index)
!     {
!         appendBinaryStringInfo(&cstate->line_buf,
!                                cstate->raw_buf + cstate->raw_buf_index,
!                                raw_buf_ptr - cstate->raw_buf_index);
!         cstate->raw_buf_index = raw_buf_ptr;
!     }
!
!     return result;
! }
!
! /*
!  * CopyReadLineCSV - inner loop of CopyReadLine for CSV mode
!  *
!  * If you need to change this, better look at CopyReadLineText too
!  */
! static bool
! CopyReadLineCSV(CopyState cstate)
! {
!     bool        result;
!     char       *copy_raw_buf;
!     int            raw_buf_ptr;
!     int            copy_buf_len;
!     bool        need_data;
!     bool        hit_eof;
!     char        s[2];
!     bool        in_quote = false,
                  last_was_esc = false;
-     char        quotec = cstate->quote[0];
-     char        escapec = cstate->escape[0];
-
-     /* ignore special escape processing if it's the same as quotec */
-     if (quotec == escapec)
-         escapec = '\0';
-
-     s[1] = 0;
-
-     /* set default status */
-     result = false;
-
-     /*
-      * The objective of this loop is to transfer the entire next input line
-      * into line_buf.  Hence, we only care for detecting newlines (\r and/or
-      * \n) and the end-of-copy marker (\.).
-      *
-      * In CSV mode, \r and \n inside a quoted field are just part of the data
-      * value and are put in line_buf.  We keep just enough state to know if we
-      * are currently in a quoted field or not.
-      *
-      * These four characters, and the CSV escape and quote characters, are
-      * assumed the same in frontend and backend encodings.
-      *
-      * For speed, we try to move data to line_buf in chunks rather than one
-      * character at a time.  raw_buf_ptr points to the next character to
-      * examine; any characters from raw_buf_index to raw_buf_ptr have been
-      * determined to be part of the line, but not yet transferred to line_buf.
-      *
-      * For a little extra speed within the loop, we copy raw_buf and
-      * raw_buf_len into local variables.
-      */
-     copy_raw_buf = cstate->raw_buf;
-     raw_buf_ptr = cstate->raw_buf_index;
-     copy_buf_len = cstate->raw_buf_len;
-     need_data = false;            /* flag to force reading more data */
-     hit_eof = false;            /* flag indicating no more data available */
-
-     for (;;)
-     {
-         int            prev_raw_ptr;
-         char        c;
-
-         /* Load more data if needed */
-         if (raw_buf_ptr >= copy_buf_len || need_data)
-         {
-             /*
-              * Transfer any approved data to line_buf; must do this to be sure
-              * there is some room in raw_buf.
-              */
-             if (raw_buf_ptr > cstate->raw_buf_index)
-             {
-                 appendBinaryStringInfo(&cstate->line_buf,
-                                      cstate->raw_buf + cstate->raw_buf_index,
-                                        raw_buf_ptr - cstate->raw_buf_index);
-                 cstate->raw_buf_index = raw_buf_ptr;
-             }
-
-             /*
-              * Try to read some more data.    This will certainly reset
-              * raw_buf_index to zero, and raw_buf_ptr must go with it.
-              */
-             if (!CopyLoadRawBuf(cstate))
-                 hit_eof = true;
-             raw_buf_ptr = 0;
-             copy_buf_len = cstate->raw_buf_len;

              /*
!              * If we are completely out of data, break out of the loop,
!              * reporting EOF.
               */
!             if (copy_buf_len <= 0)
!             {
!                 result = true;
!                 break;
!             }
!             need_data = false;
!         }
!
!         /* OK to fetch a character */
!         prev_raw_ptr = raw_buf_ptr;
!         c = copy_raw_buf[raw_buf_ptr++];
!
!         /*
!          * If character is '\\' or '\r', we may need to look ahead below.
!          * Force fetch of the next character if we don't already have it. We
!          * need to do this before changing CSV state, in case one of these
!          * characters is also the quote or escape character.
!          *
!          * Note: old-protocol does not like forced prefetch, but it's OK here
!          * since we cannot validly be at EOF.
!          */
!         if (c == '\\' || c == '\r')
!         {
!             if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!             {
!                 raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                 need_data = true;
!                 continue;
!             }
          }

!         /*
!          * Dealing with quotes and escapes here is mildly tricky. If the quote
!          * char is also the escape char, there's no problem - we  just use the
!          * char as a toggle. If they are different, we need to ensure that we
!          * only take account of an escape inside a quoted field and
!          * immediately preceding a quote char, and not the second in a
!          * escape-escape sequence.
!          */
!         if (in_quote && c == escapec)
!             last_was_esc = !last_was_esc;
!         if (c == quotec && !last_was_esc)
!             in_quote = !in_quote;
!         if (c != escapec)
!             last_was_esc = false;
!
!         /*
!          * Updating the line count for embedded CR and/or LF chars is
!          * necessarily a little fragile - this test is probably about the best
!          * we can do.  (XXX it's arguable whether we should do this at all ---
!          * is cur_lineno a physical or logical count?)
!          */
!         if (in_quote && c == (cstate->eol_type == EOL_NL ? '\n' : '\r'))
!             cstate->cur_lineno++;
!
!         if (c == '\r' && !in_quote)
          {
              /* Check for \r\n on first line, _and_ handle \r\n. */
              if (cstate->eol_type == EOL_UNKNOWN ||
--- 2209,2257 ----
          prev_raw_ptr = raw_buf_ptr;
          c = copy_raw_buf[raw_buf_ptr++];

!         if (cstate->csv_mode)
          {
              /*
!              * If character is '\\' or '\r', we may need to look ahead below.
!              * Force fetch of the next character if we don't already have it. We
!              * need to do this before changing CSV state, in case one of these
!              * characters is also the quote or escape character.
!              *
!              * Note: old-protocol does not like forced prefetch, but it's OK here
!              * since we cannot validly be at EOF.
               */
!             if (c == '\\' || c == '\r')
              {
!                 IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
              }

              /*
!              * Dealing with quotes and escapes here is mildly tricky. If the quote
!              * char is also the escape char, there's no problem - we  just use the
!              * char as a toggle. If they are different, we need to ensure that we
!              * only take account of an escape inside a quoted field and
!              * immediately preceding a quote char, and not the second in a
!              * escape-escape sequence.
               */
!             if (in_quote && c == escapec)
!                 last_was_esc = !last_was_esc;
!             if (c == quotec && !last_was_esc)
!                 in_quote = !in_quote;
!             if (c != escapec)
                  last_was_esc = false;

              /*
!              * Updating the line count for embedded CR and/or LF chars is
!              * necessarily a little fragile - this test is probably about the best
!              * we can do.  (XXX it's arguable whether we should do this at all ---
!              * is cur_lineno a physical or logical count?)
               */
!             if (in_quote && c == (cstate->eol_type == EOL_NL ? '\n' : '\r'))
!                 cstate->cur_lineno++;
          }

!         /* Process \r */
!         if (c == '\r' && (!cstate->csv_mode || !in_quote))
          {
              /* Check for \r\n on first line, _and_ handle \r\n. */
              if (cstate->eol_type == EOL_UNKNOWN ||
***************
*** 2478,2489 ****
                   * Note that if we are at EOF, c will wind up as '\0' because
                   * of the guaranteed pad of raw_buf.
                   */
!                 if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!                 {
!                     raw_buf_ptr = prev_raw_ptr; /* undo fetch */
!                     need_data = true;
!                     continue;
!                 }
                  c = copy_raw_buf[raw_buf_ptr];

                  if (c == '\n')
--- 2263,2271 ----
                   * Note that if we are at EOF, c will wind up as '\0' because
                   * of the guaranteed pad of raw_buf.
                   */
!                 IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
!
!                 /* get next char */
                  c = copy_raw_buf[raw_buf_ptr];

                  if (c == '\n')
***************
*** 2497,2505 ****
                      if (cstate->eol_type == EOL_CRNL)
                          ereport(ERROR,
                                  (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                             errmsg("unquoted carriage return found in data"),
!                                  errhint("Use quoted CSV field to represent carriage return.")));
!
                      /*
                       * if we got here, it is the first line and we didn't find
                       * \n, so don't consume the peeked character
--- 2279,2290 ----
                      if (cstate->eol_type == EOL_CRNL)
                          ereport(ERROR,
                                  (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                              errmsg(!cstate->csv_mode ?
!                                     "literal carriage return found in data" :
!                                     "unquoted carriage return found in data"),
!                                  errhint(!cstate->csv_mode ?
!                                         "Use \"\\r\" to represent carriage return." :
!                                         "Use quoted CSV field to represent carriage return.")));
                      /*
                       * if we got here, it is the first line and we didn't find
                       * \n, so don't consume the peeked character
***************
*** 2510,2559 ****
              else if (cstate->eol_type == EOL_NL)
                  ereport(ERROR,
                          (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                          errmsg("unquoted carriage return found in CSV data"),
!                          errhint("Use quoted CSV field to represent carriage return.")));
              /* If reach here, we have found the line terminator */
              break;
          }

!         if (c == '\n' && !in_quote)
          {
              if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                  ereport(ERROR,
                          (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                          errmsg("unquoted newline found in data"),
!                      errhint("Use quoted CSV field to represent newline.")));
              cstate->eol_type = EOL_NL;    /* in case not set yet */
              /* If reach here, we have found the line terminator */
              break;
          }

          /*
!          * In CSV mode, we only recognize \. at start of line
           */
!         if (c == '\\' && cstate->line_buf.len == 0)
          {
              char        c2;

!             /*
!              * If need more data, go back to loop top to load it.
!              */
!             if (raw_buf_ptr >= copy_buf_len)
!             {
!                 if (hit_eof)
!                 {
!                     /* backslash just before EOF, treat as data char */
!                     result = true;
!                     break;
!                 }
!                 raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                 need_data = true;
!                 continue;
!             }

!             /*
!              * Note: we do not change c here since we aren't treating \ as
!              * escaping the next character.
               */
              c2 = copy_raw_buf[raw_buf_ptr];

--- 2295,2343 ----
              else if (cstate->eol_type == EOL_NL)
                  ereport(ERROR,
                          (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                      errmsg(!cstate->csv_mode ?
!                                 "literal carriage return found in data" :
!                                 "unquoted carriage return found in data"),
!                          errhint(!cstate->csv_mode ?
!                                 "Use \"\\r\" to represent carriage return." :
!                                 "Use quoted CSV field to represent carriage return.")));
              /* If reach here, we have found the line terminator */
              break;
          }

!         /* Process \n */
!         if (c == '\n' && (!cstate->csv_mode || !in_quote))
          {
              if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                  ereport(ERROR,
                          (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                          errmsg(!cstate->csv_mode ?
!                                 "literal newline found in data" :
!                                 "unquoted newline found in data"),
!                          errhint(!cstate->csv_mode ?
!                                  "Use \"\\n\" to represent newline." :
!                                  "Use quoted CSV field to represent newline.")));
              cstate->eol_type = EOL_NL;    /* in case not set yet */
              /* If reach here, we have found the line terminator */
              break;
          }

          /*
!          *    In CSV mode, we only recognize \. alone on a line.  This is
!          *    because \. is a valid CSV data value.
           */
!         if (c == '\\' && (!cstate->csv_mode || first_char_in_line))
          {
              char        c2;

!             IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
!             IF_NEED_REFILL_AND_EOF_BREAK(0);

!             /* -----
!              * get next character
!              * Note: we do not change c so if it isn't \., we can fall
!              * through and continue processing for client encoding.
!              * -----
               */
              c2 = copy_raw_buf[raw_buf_ptr];

***************
*** 2568,2662 ****
                   */
                  if (cstate->eol_type == EOL_CRNL)
                  {
!                     if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!                     {
!                         raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                         need_data = true;
!                         continue;
!                     }
                      /* if hit_eof, c2 will become '\0' */
                      c2 = copy_raw_buf[raw_buf_ptr++];
                      if (c2 == '\n')
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker does not match previous newline style")));
!                     if (c2 != '\r')
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker corrupt")));
!                 }
!                 if (raw_buf_ptr >= copy_buf_len && !hit_eof)
!                 {
!                     raw_buf_ptr = prev_raw_ptr; /* undo fetch */
!                     need_data = true;
!                     continue;
                  }
                  /* if hit_eof, c2 will become '\0' */
                  c2 = copy_raw_buf[raw_buf_ptr++];
                  if (c2 != '\r' && c2 != '\n')
!                     ereport(ERROR,
!                             (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                              errmsg("end-of-copy marker corrupt")));
                  if ((cstate->eol_type == EOL_NL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CRNL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CR && c2 != '\r'))
                      ereport(ERROR,
                              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                               errmsg("end-of-copy marker does not match previous newline style")));

                  /*
                   * Transfer only the data before the \. into line_buf, then
                   * discard the data and the \. sequence.
                   */
                  if (prev_raw_ptr > cstate->raw_buf_index)
!                     appendBinaryStringInfo(&cstate->line_buf, cstate->raw_buf + cstate->raw_buf_index,
                                         prev_raw_ptr - cstate->raw_buf_index);
                  cstate->raw_buf_index = raw_buf_ptr;
                  result = true;    /* report EOF */
                  break;
              }
          }

          /*
!          * Do we need to be careful about trailing bytes of multibyte
!          * characters?    (See note above about client_only_encoding)
           *
!          * We assume here that pg_encoding_mblen only looks at the first byte
!          * of the character!
           */
!         if (cstate->client_only_encoding)
          {
              int            mblen;

!             s[0] = c;
!             mblen = pg_encoding_mblen(cstate->client_encoding, s);
!             if (raw_buf_ptr + (mblen - 1) > copy_buf_len)
!             {
!                 if (hit_eof)
!                 {
!                     /* consume the partial character (will fail below) */
!                     raw_buf_ptr = copy_buf_len;
!                     result = true;
!                     break;
!                 }
!                 raw_buf_ptr = prev_raw_ptr;        /* undo fetch */
!                 need_data = true;
!                 continue;
!             }
              raw_buf_ptr += mblen - 1;
          }
      }                            /* end of outer loop */

      /*
       * Transfer any still-uncopied data to line_buf.
       */
!     if (raw_buf_ptr > cstate->raw_buf_index)
!     {
!         appendBinaryStringInfo(&cstate->line_buf,
!                                cstate->raw_buf + cstate->raw_buf_index,
!                                raw_buf_ptr - cstate->raw_buf_index);
!         cstate->raw_buf_index = raw_buf_ptr;
!     }

      return result;
  }
--- 2352,2466 ----
                   */
                  if (cstate->eol_type == EOL_CRNL)
                  {
!                     /* Get the next character */
!                     IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
                      /* if hit_eof, c2 will become '\0' */
                      c2 = copy_raw_buf[raw_buf_ptr++];
+
                      if (c2 == '\n')
!                     {
!                         if (!cstate->csv_mode)
!                             ereport(ERROR,
!                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                      errmsg("end-of-copy marker does not match previous newline style")));
!                         else
!                             NO_END_OF_COPY_GOTO;
!                     }
!                     else if (c2 != '\r')
!                     {
!                         if (!cstate->csv_mode)
!                             ereport(ERROR,
!                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                      errmsg("end-of-copy marker corrupt")));
!                         else
!                             NO_END_OF_COPY_GOTO;
!                     }
                  }
+
+                 /* Get the next character */
+                 IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
                  /* if hit_eof, c2 will become '\0' */
                  c2 = copy_raw_buf[raw_buf_ptr++];
+
                  if (c2 != '\r' && c2 != '\n')
!                 {
!                     if (!cstate->csv_mode)
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker corrupt")));
!                     else
!                         NO_END_OF_COPY_GOTO;
!                 }
!
                  if ((cstate->eol_type == EOL_NL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CRNL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CR && c2 != '\r'))
+                 {
                      ereport(ERROR,
                              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                               errmsg("end-of-copy marker does not match previous newline style")));
+                 }

                  /*
                   * Transfer only the data before the \. into line_buf, then
                   * discard the data and the \. sequence.
                   */
                  if (prev_raw_ptr > cstate->raw_buf_index)
!                     appendBinaryStringInfo(&cstate->line_buf,
!                                      cstate->raw_buf + cstate->raw_buf_index,
                                         prev_raw_ptr - cstate->raw_buf_index);
                  cstate->raw_buf_index = raw_buf_ptr;
                  result = true;    /* report EOF */
                  break;
              }
+             else if (!cstate->csv_mode)
+                 /*
+                  *    If we are here, it means we found a backslash followed by
+                  *    something other than a period.  In non-CSV mode, anything
+                  *    after a backslash is special, so we skip over that second
+                  *    character too.  If we didn't do that \\. would be
+                  *    considered an end-of-copy, while in non-CSV mode it is a
+                  *    literal backslash followed by a period.  In CSV mode,
+                  *    backslashes are not special, so we want to process the
+                  *    character after the backslash just like a normal character,
+                  *    so we don't increment in those cases.
+                  */
+                 raw_buf_ptr++;
          }

          /*
!          * This label is for CSV cases where \. appears at the start of a line,
!          * but there is more text after it, meaning it was a data value.
!          * We are more strict for \. in CSV mode because \. could be a data
!          * value, while in non-CSV mode, \. cannot be a data value.
!          */
! not_end_of_copy:
!
!         /*
!          * Process all bytes of a multi-byte character as a group.
           *
!          * We only support multi-byte sequences where the first byte
!          * has the high-bit set, so as an optimization we can avoid
!          * this block entirely if it is not set.
           */
!         if (cstate->encoding_embeds_ascii && IS_HIGHBIT_SET(c))
          {
              int            mblen;

!             mblen_str[0] = c;
!             /* All our encodings only read the first byte to get the length */
!             mblen = pg_encoding_mblen(cstate->client_encoding, mblen_str);
!             IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1);
!             IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
              raw_buf_ptr += mblen - 1;
          }
+         first_char_in_line = false;
      }                            /* end of outer loop */

      /*
       * Transfer any still-uncopied data to line_buf.
       */
!     REFILL_LINEBUF;

      return result;
  }
***************
*** 3150,3156 ****
                   * safe, because in valid backend encodings, extra bytes of a
                   * multibyte character never look like ASCII.
                   */
!                 if (cstate->client_only_encoding)
                      mblen = pg_encoding_mblen(cstate->client_encoding, string);
                  CopySendData(cstate, string, mblen);
                  break;
--- 2954,2960 ----
                   * safe, because in valid backend encodings, extra bytes of a
                   * multibyte character never look like ASCII.
                   */
!                 if (cstate->encoding_embeds_ascii && IS_HIGHBIT_SET(c))
                      mblen = pg_encoding_mblen(cstate->client_encoding, string);
                  CopySendData(cstate, string, mblen);
                  break;
***************
*** 3196,3202 ****
                  use_quote = true;
                  break;
              }
!             if (cstate->client_only_encoding)
                  mblen = pg_encoding_mblen(cstate->client_encoding, tstring);
              else
                  mblen = 1;
--- 3000,3006 ----
                  use_quote = true;
                  break;
              }
!             if (cstate->encoding_embeds_ascii && IS_HIGHBIT_SET(c))
                  mblen = pg_encoding_mblen(cstate->client_encoding, tstring);
              else
                  mblen = 1;
***************
*** 3210,3216 ****
      {
          if (use_quote && (c == quotec || c == escapec))
              CopySendChar(cstate, escapec);
!         if (cstate->client_only_encoding)
              mblen = pg_encoding_mblen(cstate->client_encoding, string);
          else
              mblen = 1;
--- 3014,3020 ----
      {
          if (use_quote && (c == quotec || c == escapec))
              CopySendChar(cstate, escapec);
!         if (cstate->encoding_embeds_ascii && IS_HIGHBIT_SET(c))
              mblen = pg_encoding_mblen(cstate->client_encoding, string);
          else
              mblen = 1;
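
The quote/escape tracking described in the CSV comment of the patch above can be tried in isolation. The following is only a minimal sketch: the function name ends_inside_quote is hypothetical, and the step that clears escapec when it equals quotec is an assumption that does not appear in the hunks shown here. It is included because the comment says an identical quote and escape character simply acts as a toggle.

```c
#include <stdbool.h>

/*
 * Return true if the text in s ends inside an open CSV quoted field.
 * quotec and escapec stand for the configured quote and escape characters.
 */
static bool ends_inside_quote(const char *s, char quotec, char escapec)
{
    bool in_quote = false;
    bool last_was_esc = false;

    /* Assumption: an escape identical to the quote is handled as a pure toggle. */
    if (escapec == quotec)
        escapec = '\0';

    for (; *s != '\0'; s++)
    {
        char c = *s;

        /* An escape only counts inside a quoted field; two in a row cancel out. */
        if (in_quote && c == escapec)
            last_was_esc = !last_was_esc;
        /* An unescaped quote character opens or closes the field. */
        if (c == quotec && !last_was_esc)
            in_quote = !in_quote;
        /* Any other character ends a pending escape. */
        if (c != escapec)
            last_was_esc = false;
    }
    return in_quote;
}
```

A line terminator seen while this state is true belongs to the field's data, which is why the patch only treats \r and \n as line ends when in_quote is false.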
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.254.2.1
diff -c -c -r1.254.2.1 copy.c
*** src/backend/commands/copy.c    22 Nov 2005 18:23:07 -0000    1.254.2.1
--- src/backend/commands/copy.c    27 Dec 2005 01:38:22 -0000
***************
*** 2338,2343 ****
--- 2338,2344 ----
      bool        need_data;
      bool        hit_eof;
      char        s[2];
+     bool        first_char_in_line = true;
      bool        in_quote = false,
                  last_was_esc = false;
      char        quotec = cstate->quote[0];
***************
*** 2531,2537 ****
          /*
           * In CSV mode, we only recognize \. at start of line
           */
!         if (c == '\\' && cstate->line_buf.len == 0)
          {
              char        c2;

--- 2532,2538 ----
          /*
           * In CSV mode, we only recognize \. at start of line
           */
!         if (c == '\\' && first_char_in_line)
          {
              char        c2;

***************
*** 2577,2589 ****
                      /* if hit_eof, c2 will become '\0' */
                      c2 = copy_raw_buf[raw_buf_ptr++];
                      if (c2 == '\n')
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker does not match previous newline style")));
                      if (c2 != '\r')
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("end-of-copy marker corrupt")));
                  }
                  if (raw_buf_ptr >= copy_buf_len && !hit_eof)
                  {
--- 2578,2592 ----
                      /* if hit_eof, c2 will become '\0' */
                      c2 = copy_raw_buf[raw_buf_ptr++];
                      if (c2 == '\n')
!                     {
!                         raw_buf_ptr = prev_raw_ptr + 1;
!                         goto not_end_of_copy;
!                     }
                      if (c2 != '\r')
!                     {
!                         raw_buf_ptr = prev_raw_ptr + 1;
!                         goto not_end_of_copy;
!                     }
                  }
                  if (raw_buf_ptr >= copy_buf_len && !hit_eof)
                  {
***************
*** 2594,2609 ****
                  /* if hit_eof, c2 will become '\0' */
                  c2 = copy_raw_buf[raw_buf_ptr++];
                  if (c2 != '\r' && c2 != '\n')
!                     ereport(ERROR,
!                             (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                              errmsg("end-of-copy marker corrupt")));
                  if ((cstate->eol_type == EOL_NL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CRNL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CR && c2 != '\r'))
                      ereport(ERROR,
                              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                               errmsg("end-of-copy marker does not match previous newline style")));
-
                  /*
                   * Transfer only the data before the \. into line_buf, then
                   * discard the data and the \. sequence.
--- 2597,2612 ----
                  /* if hit_eof, c2 will become '\0' */
                  c2 = copy_raw_buf[raw_buf_ptr++];
                  if (c2 != '\r' && c2 != '\n')
!                 {
!                     raw_buf_ptr = prev_raw_ptr + 1;
!                     goto not_end_of_copy;
!                 }
                  if ((cstate->eol_type == EOL_NL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CRNL && c2 != '\n') ||
                      (cstate->eol_type == EOL_CR && c2 != '\r'))
                      ereport(ERROR,
                              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                               errmsg("end-of-copy marker does not match previous newline style")));
                  /*
                   * Transfer only the data before the \. into line_buf, then
                   * discard the data and the \. sequence.
***************
*** 2618,2628 ****
          }

          /*
!          * Do we need to be careful about trailing bytes of multibyte
!          * characters?    (See note above about client_only_encoding)
!          *
!          * We assume here that pg_encoding_mblen only looks at the first byte
!          * of the character!
           */
          if (cstate->client_only_encoding)
          {
--- 2621,2635 ----
          }

          /*
!          * This label is for CSV cases where \. appears at the start of a line,
!          * but there is more text after it, meaning it was a data value.
!          * We are more strict for \. in CSV mode because \. could be a data
!          * value, while in non-CSV mode, \. cannot be a data value.
!          */
! not_end_of_copy:
!
!         /*
!          * Process all bytes of a multi-byte character as a group.
           */
          if (cstate->client_only_encoding)
          {
***************
*** 2645,2650 ****
--- 2652,2658 ----
              }
              raw_buf_ptr += mblen - 1;
          }
+         first_char_in_line = false;
      }                            /* end of outer loop */

      /*
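
The idea behind first_char_in_line can be shown outside copy.c. This is a hedged sketch, not the patched function: the helper name count_lines_starting_with_backslash is hypothetical, and unlike the patch (which reads one line per call and sets the flag at function entry) it walks a whole multi-line buffer and resets the flag after each newline. The point is that the test depends only on position within the line, not on how many bytes have been flushed to an output buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the lines in buf[0..len) whose first byte is a backslash.
 * The flag is true only while we are looking at the first byte of a line.
 */
static int count_lines_starting_with_backslash(const char *buf, size_t len)
{
    bool first_char_in_line = true;
    int count = 0;

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '\\' && first_char_in_line)
            count++;

        /* The next byte starts a line only if this one was a newline. */
        first_char_in_line = (c == '\n');
    }
    return count;
}
```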

Re: [BUGS] BUG #2114: (patch) COPY FROM ... end of copy

From
Andrew Dunstan
Date:

Bruce Momjian wrote:

> The big problem is that \. is also a valid
>CSV data value (though not a valid non-CSV data value).  So, the
>solution we came up with was to require \. to appear alone on a line in
>CSV mode for it to be treated as end-of-copy.
>

According to the docs, that's the way to specify EOD in both text and
CSV mode:

  End of data can be represented by a single line containing just
backslash-period.

Your analysis regarding line_buf.len seems correct.

We probably should have a regression test with \. in a CSV field.

cheers

andrew


Re: [BUGS] BUG #2114: (patch) COPY FROM ... end of copy marker

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
>
>
> Bruce Momjian wrote:
>
> > The big problem is that \. is also a valid
> >CSV data value (though not a valid non-CSV data value).  So, the
> >solution we came up with was to require \. to appear alone on a line in
> >CSV mode for it to be treated as end-of-copy.
> >
>
> According to the docs, that's the way to specify EOD in both text and
> CSV mode:
>
>   End of data can be represented by a single line containing just
> backslash-period.

Right, but in non-CSV mode we accept \. at the end of any line, because
the sequence is unambiguous there, so I kept that behavior.  That is not
documented, however.

> Your analysis regarding line_buf.len seems correct.
>
> We probably should have a regression test with \. in a CSV field.

Agreed.  My test for CSV was simple, just try loading:

    x\.
    x\.b
    \.c

all should load literally, but they fail.
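
The intended rule for each mode can be summarized in a small classifier. This is a sketch under stated assumptions, not the copy.c logic: the names LineKind and classify_line are hypothetical, the input is a single line with its terminator already stripped, and CSV quoting and newline-style checks are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
    LINE_DATA,              /* ordinary data line */
    LINE_END_OF_COPY,       /* line ends the COPY data */
    LINE_CORRUPT_MARKER     /* text mode: \. followed by more text */
} LineKind;

static LineKind classify_line(const char *line, bool csv_mode)
{
    /* CSV mode: only a line consisting of exactly \. is the marker. */
    if (csv_mode)
        return strcmp(line, "\\.") == 0 ? LINE_END_OF_COPY : LINE_DATA;

    /* Text mode: an unescaped \. must be the last thing on the line. */
    for (size_t i = 0; line[i] != '\0'; i++)
    {
        if (line[i] != '\\')
            continue;
        if (line[i + 1] == '.')
            return line[i + 2] == '\0' ? LINE_END_OF_COPY : LINE_CORRUPT_MARKER;
        if (line[i + 1] != '\0')
            i++;            /* skip the escaped character, so \\. stays data */
    }
    return LINE_DATA;
}
```

With csv_mode set, the three test lines above all classify as LINE_DATA, and only a bare \. classifies as LINE_END_OF_COPY.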

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [BUGS] BUG #2114: (patch) COPY FROM ... end of copy marker

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Andrew Dunstan wrote:
> >
> >
> > Bruce Momjian wrote:
> >
> > > The big problem is that \. is also a valid
> > >CSV data value (though not a valid non-CSV data value).  So, the
> > >solution we came up with was to require \. to appear alone on a line in
> > >CSV mode for it to be treated as end-of-copy.
> > >
> >
> > According to the docs, that's the way to specify EOD in both text and
> > CSV mode:
> >
> >   End of data can be represented by a single line containing just
> > backslash-period.
>
> Right, but in non-CSV mode we accept \. at the end of any line, because
> the sequence is unambiguous there, so I kept that behavior.  That is not
> documented, however.
>
> > Your analysis regarding line_buf.len seems correct.
> >
> > We probably should have a regression test with \. in a CSV field.
>
> Agreed.  My test for CSV was simple, just try loading:
>
>     x\.
>     x\.b
>     \.c
>
> all should load literally, but they fail.

OK, original patch applied to HEAD and smaller version to 8.1.X, and
regression test added, now attached.

Index: src/test/regress/expected/copy2.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/copy2.out,v
retrieving revision 1.22
diff -c -c -r1.22 copy2.out
*** src/test/regress/expected/copy2.out    26 Jun 2005 03:04:18 -0000    1.22
--- src/test/regress/expected/copy2.out    27 Dec 2005 18:19:36 -0000
***************
*** 194,199 ****
--- 194,202 ----
  --test that we read consecutive LFs properly
  CREATE TEMP TABLE testnl (a int, b text, c int);
  COPY testnl FROM stdin CSV;
+ -- test end of copy marker
+ CREATE TEMP TABLE testeoc (a text);
+ COPY testeoc FROM stdin CSV;
  DROP TABLE x, y;
  DROP FUNCTION fn_x_before();
  DROP FUNCTION fn_x_after();
Index: src/test/regress/sql/copy2.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/copy2.sql,v
retrieving revision 1.13
diff -c -c -r1.13 copy2.sql
*** src/test/regress/sql/copy2.sql    26 Jun 2005 03:04:37 -0000    1.13
--- src/test/regress/sql/copy2.sql    27 Dec 2005 18:19:36 -0000
***************
*** 139,144 ****
--- 139,153 ----
  inside",2
  \.

+ -- test end of copy marker
+ CREATE TEMP TABLE testeoc (a text);
+
+ COPY testeoc FROM stdin CSV;
+ a\.
+ \.b
+ c\.d
+ \.
+

  DROP TABLE x, y;
  DROP FUNCTION fn_x_before();

Re: [BUGS] BUG #2114: (patch) COPY FROM ... end of

From
"Luke Lonergan"
Date:
Bruce,

On 12/27/05 10:20 AM, "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:

> OK, original patch applied to HEAD and smaller version to 8.1.X, and
> regression test added, now attached.

Great, good catch.

Have you tested performance, before and after?

The only good way to test performance is using a fast enough I/O subsystem
that you are CPU-bound, which means >60MB/s of write speed.

I'd be happy to get you an account on one.

- Luke



Re: [BUGS] BUG #2114: (patch) COPY FROM ... end of

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> Bruce,
>
> On 12/27/05 10:20 AM, "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
>
> > OK, original patch applied to HEAD and smaller version to 8.1.X, and
> > regression test added, now attached.
>
> Great, good catch.
>
> Have you tested performance, before and after?
>
> The only good way to test performance is using a fast enough I/O subsystem
> that you are CPU-bound, which means >60MB/s of write speed.
>
> I'd be happy to get you an account on one.

I don't need to test performance because it is the same code, just with
macros and the two functions merged.  I do have an optimization for that
loop, but I saw no improvement so I didn't apply it.  It was basically to
advance the pointer in a tight loop just checking for \r, \n, and \\,
but it seems the larger loop isn't much slower than a tight one.
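
For readers curious what such a tight loop might look like, here is a rough sketch. It is a guess at the shape of the idea, not the unapplied patch: the name skip_ordinary_bytes is hypothetical, and the sketch ignores CSV quote state and multibyte encodings, both of which the real loop has to handle.

```c
#include <stddef.h>

/*
 * Return the index of the first '\r', '\n', or '\\' in buf[0..len),
 * or len if none of them occurs.  The caller would resume the full
 * per-character processing at the returned position.
 */
static size_t skip_ordinary_bytes(const char *buf, size_t len)
{
    size_t i = 0;

    while (i < len && buf[i] != '\r' && buf[i] != '\n' && buf[i] != '\\')
        i++;
    return i;
}
```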
