Updated COPY CSV patch - Mailing list pgsql-patches

From Bruce Momjian
Subject Updated COPY CSV patch
Date
Msg-id 200404130420.i3D4KEE02485@candle.pha.pa.us
Whole thread Raw
In response to Re: COPY for CSV documentation  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Updated COPY CSV patch
List pgsql-patches
Andrew Dunstan wrote:
> If the null marker is not an empty string, it gets an error, of
> course - if it is it gets a null:
>
> [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp
> table foo (a int, b text, c text); copy foo from stdin delimiter
> ',\"' null '\\\\N';" ERROR:  invalid input syntax for integer:
> "" CONTEXT:  COPY foo, line 1, column a: "" [andrew@marmaduke
> pginst]$ echo ',,' | bin/psql -c "create temp table foo (a int,
> b text, c text); copy foo from stdin delimiter ',\"' ;"
> [andrew@marmaduke pginst]$
>
>
> I hope that is expected behaviour - it's what *I* expect, at
> least.
> >
>
> Attached patch has these additions to previously posted patch:
> . quote character may not appear in NULL marker
> . any non-null value that matches the NULL marker is forced to be quoted
> when written.

OK, here is a new version of the patch that includes the grammar
changes we agreed upon, SGML changes, and \copy support.  I will not
make any more changes without contacting you so feel free to make
adjustments and repost.

I have two open issues.  First, CSV should support WITH OIDS, no?

Second, I found a problem with NULLs.  If I do:
.
        test=> create table test (x text, y text);
        CREATE TABLE
        test=> insert into test values ('', NULL);
        INSERT 17221 1
        test=>

then this:

        test=> copy test to '/tmp/b' with csv;

creates:

        "",

and this:

        test=> copy test to '/tmp/b' with csv NULL 'fred';

creates:

        ,fred

Is that logical?  A non-null field went from "" to nothing.

I think it is caused by this code:

         bool force_quote = (strcmp(string, null_print) == 0);
         CopyAttributeOutCSV(string, delim, quote, escape,
                             force_quote);

The reason it happens is that when the null string is '', it matches a
zero-length string, so the value is quoted.  When the null stirng isn't
blank, a zero-length string doesn't match the null string so it isn't
quoted.    I think we need to add special logic for zero-length strings
so they are always quoted, even if there is a special null string.  This
will make our dumps more consistent, I think, or maybe the current
behavior is OK.  It just struck me as strange.

I did a dump/reload test with a null string and null, and it worked
fine.

Is there any data that can not be dumped/reloaded via CSV?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.55
diff -c -c -r1.55 copy.sgml
*** doc/src/sgml/ref/copy.sgml    13 Dec 2003 23:59:07 -0000    1.55
--- doc/src/sgml/ref/copy.sgml    13 Apr 2004 04:18:22 -0000
***************
*** 26,32 ****
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ]

  COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable>
[,...] ) ] 
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
--- 26,34 ----
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
!           [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ]
!                 [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ] ]

  COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable>
[,...] ) ] 
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
***************
*** 34,40 ****
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ]
  </synopsis>
   </refsynopsisdiv>

--- 36,44 ----
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
!           [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ]
!                 [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ] ]
  </synopsis>
   </refsynopsisdiv>

***************
*** 136,142 ****
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)
       </para>
      </listitem>
     </varlistentry>
--- 140,146 ----
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)  FIX CSV FOR OIDS!
       </para>
      </listitem>
     </varlistentry>
***************
*** 146,152 ****
      <listitem>
       <para>
        The single character that separates columns within each row
!       (line) of the file.  The default is a tab character.
       </para>
      </listitem>
     </varlistentry>
--- 150,157 ----
      <listitem>
       <para>
        The single character that separates columns within each row
!       (line) of the file.  The default is a tab character in normal mode,
!       a comma in <literal>CSV</> mode.
       </para>
      </listitem>
     </varlistentry>
***************
*** 156,175 ****
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N). You might prefer an empty
!       string, for example.
       </para>

       <note>
        <para>
!        On a <command>COPY FROM</command>, any data item that matches
         this string will be stored as a null value, so you should make
         sure that you use the same string as you used with
         <command>COPY TO</command>.
        </para>
       </note>
      </listitem>
     </varlistentry>
    </variablelist>
   </refsect1>

--- 161,225 ----
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N) in normal mode, and a missing
!       value (no quotes) in <literal>CSV</> mode. You might prefer an empty
!       string in cases where you don't want to distinguish nulls from
!       empty strings.
       </para>

       <note>
        <para>
!        When using <command>COPY FROM</command>, any data item that matches
         this string will be stored as a null value, so you should make
         sure that you use the same string as you used with
         <command>COPY TO</command>.
        </para>
+
+       <para>
+        If you do not want anything used as null when using
+        <command>COPY FROM</command>, you can specify some value that is very
+        unlikely to appear in the file, such as <literal>frobnitz</literal> or
+        <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is
+        <literal>md5('frobnitz')</literal>). This could be especially useful
+        when importing a <literal>CSV</> file into a table with <literal>NOT NULL</>
+        columns.
+       </para>
       </note>
+
      </listitem>
     </varlistentry>
+
+    <varlistentry>
+     <term><literal>CSV</literal></term>
+     <listitem>
+      <para>
+       Enables Comma Separated Variable (<literal>CSV</>) mode.  It sets the
+       default <literal>DELIMITER</> to comma, and <literal>QUOTE</> and
+       <literal>ESCAPE</> values to double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><replaceable class="parameter">quote string</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the quotation character in <literal>CSV</> mode.
+       The default is double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><replaceable class="parameter">escape string</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the character that should appear before a <literal>QUOTE</>
+       data character value in <literal>CSV</> mode.  The default is double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
   </refsect1>

***************
*** 253,259 ****

     <para>
      When <command>COPY</command> is used without the <literal>BINARY</literal> option,
!     the data read or written is a text file with one line per table row.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
--- 303,310 ----

     <para>
      When <command>COPY</command> is used without the <literal>BINARY</literal> option,
!     the data read or written is a text file with one line per table row,
!     unless <literal>CSV</> mode is used.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
***************
*** 377,382 ****
--- 428,473 ----
      meant as data, <command>COPY FROM</command> will complain if the line
      endings in the input are not all alike.
     </para>
+   </refsect2>
+
+   <refsect2>
+    <title>CSV Format</title>
+
+    <para>
+     This format is used for importing from and exporting to the Comma
+     Separated Variable (<literal>CSV</>) file format used by many other programs,
+     such as spreadsheets. Instead of the escaping used by
+     <productname>PostgreSQL</productname>'s standard text mode, it produces
+     and recognises the common CSV escaping mechanism.
+    </para>
+
+    <para>
+     The values in each record are separated by the delimiter character.
+     If the value contains the delimiter character, the <literal>QUOTE</> character
+     or a carriage return or line feed character, then the whole value is prefixed
+     and suffixed by the quote character, and any occurrence within the value
+     of a quote character or the <literal>ESCAPE</> character is preceded
+     by the escape character.
+    </para>
+
+    <note>
+     <para>
+      CSV mode will both recognise and produce CSV files with quoted values
+      containing embedded carriage returns and line feeds. Thus the files are
+      not strictly one line per table row like non-CSV files.
+     </para>
+    </note>
+
+    <note>
+     <para>
+      Many programs produce strange and occasionally perverse CSV files, so
+      the file format is more a convention than a standard. Thus you might
+      encounter some files that cannot be imported using this mechanism, and
+      <command>COPY</> might produce files that other programs can not
+      process.
+     </para>
+    </note>
+
    </refsect2>

    <refsect2>
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.110
diff -c -c -r1.110 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml    12 Apr 2004 15:58:52 -0000    1.110
--- doc/src/sgml/ref/psql-ref.sgml    13 Apr 2004 04:18:25 -0000
***************
*** 711,716 ****
--- 711,718 ----
              [ <literal>oids</literal> ]
              [ <literal>delimiter [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
              [ <literal>null [as] </literal> '<replaceable class="parameter">string</replaceable>' ]</literal>
+             [ <literal>csv [ quote [as] </literal> '<replaceable class="parameter">string</replaceable>' ]
+                            [ <literal>escape [as] </literal> '<replaceable class="parameter">string</replaceable>' ]
]
          </term>

          <listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.219
diff -c -c -r1.219 copy.c
*** src/backend/commands/copy.c    6 Apr 2004 13:21:33 -0000    1.219
--- src/backend/commands/copy.c    13 Apr 2004 04:18:27 -0000
***************
*** 70,76 ****
  typedef enum CopyReadResult
  {
      NORMAL_ATTR,
!     END_OF_LINE
  } CopyReadResult;

  /*
--- 70,77 ----
  typedef enum CopyReadResult
  {
      NORMAL_ATTR,
!     END_OF_LINE,
!     UNTERMINATED_FIELD
  } CopyReadResult;

  /*
***************
*** 130,144 ****

  /* non-export function prototypes */
  static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print);
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print);
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                 CopyReadResult *result, bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                          Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);

--- 131,150 ----

  /* non-export function prototypes */
  static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print, bool csv_mode, char *quote, char *escape);
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print, bool csv_mode, char *quote, char *escape);
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                 CopyReadResult *result, bool *isnull);
+ static char *CopyReadAttributeCSV(const char *delim, const char *null_print,
+                                char *quote, char *escape,
+                                CopyReadResult *result, bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                          Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
+ static void CopyAttributeOutCSV(char *string, char *delim, char *quote,
+                                 char *escape, bool force_quote);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);

***************
*** 682,688 ****
--- 688,697 ----
      List       *attnumlist;
      bool        binary = false;
      bool        oids = false;
+     bool        csv_mode = false;
      char       *delim = NULL;
+     char       *quote = NULL;
+     char       *escape = NULL;
      char       *null_print = NULL;
      Relation    rel;
      AclMode        required_access = (is_from ? ACL_INSERT : ACL_SELECT);
***************
*** 725,730 ****
--- 734,763 ----
                           errmsg("conflicting or redundant options")));
              null_print = strVal(defel->arg);
          }
+         else if (strcmp(defel->defname, "csv") == 0)
+         {
+             if (csv_mode)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             csv_mode = intVal(defel->arg);
+         }
+         else if (strcmp(defel->defname, "quote") == 0)
+         {
+             if (quote)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             quote = strVal(defel->arg);
+         }
+         else if (strcmp(defel->defname, "escape") == 0)
+         {
+             if (escape)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             escape = strVal(defel->arg);
+         }
          else
              elog(ERROR, "option \"%s\" not recognized",
                   defel->defname);
***************
*** 735,740 ****
--- 768,778 ----
                  (errcode(ERRCODE_SYNTAX_ERROR),
                   errmsg("cannot specify DELIMITER in BINARY mode")));

+     if (binary && csv_mode)
+         ereport(ERROR,
+                 (errcode(ERRCODE_SYNTAX_ERROR),
+                  errmsg("cannot specify CSV in BINARY mode")));
+
      if (binary && null_print)
          ereport(ERROR,
                  (errcode(ERRCODE_SYNTAX_ERROR),
***************
*** 742,751 ****

      /* Set defaults */
      if (!delim)
!         delim = "\t";
!
      if (!null_print)
!         null_print = "\\N";

      /*
       * Open and lock the relation, using the appropriate lock type.
--- 780,847 ----

      /* Set defaults */
      if (!delim)
!         delim = csv_mode ? "," : "\t";
!
      if (!null_print)
!         null_print = csv_mode ? "" : "\\N";
!
!     if (csv_mode)
!     {
!         if (!quote)
!             quote = "\"";
!         if (!escape)
!             escape = "\"";
!     }
!
!     /*
!      * Only single-character delimiter strings are supported.
!      */
!     if (strlen(delim) != 1)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY delimiter must be a single character")));
!
!     /*
!      * Check quote
!      */
!     if (!csv_mode && quote != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY quote available only in CSV mode")));
!
!     if (csv_mode && strlen(quote) != 1)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY quote must be a single character")));
!
!     /*
!      * Check escape
!      */
!     if (!csv_mode && escape != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY escape available only in CSV mode")));
!
!     if (csv_mode && strlen(escape) != 1)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY escape must be a single character")));
!
!     /*
!      * Don't allow the delimiter to appear in the null string.
!      */
!     if (strchr(null_print, delim[0]) != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY delimiter must not appear in the NULL specification")));
!
!     /*
!      * Don't allow the csv quote char to appear in the null string.
!      */
!     if (csv_mode && strchr(null_print, quote[0]) != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("CSV quote character must not appear in the NULL specification")));

      /*
       * Open and lock the relation, using the appropriate lock type.
***************
*** 772,791 ****
                         "psql's \\copy command also works for anyone.")));

      /*
!      * Presently, only single-character delimiter strings are supported.
       */
-     if (strlen(delim) != 1)
-         ereport(ERROR,
-                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                  errmsg("COPY delimiter must be a single character")));

!     /*
!      * Don't allow the delimiter to appear in the null string.
!      */
!     if (strchr(null_print, delim[0]) != NULL)
          ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY delimiter must not appear in the NULL specification")));

      /*
       * Don't allow COPY w/ OIDs to or from a table without them
--- 868,880 ----
                         "psql's \\copy command also works for anyone.")));

      /*
!      * Don't allow OIDs in CSV mode
       */

!     if (csv_mode && oids)  // FIX ME bjm
          ereport(ERROR,
!                 (errcode(ERRCODE_SYNTAX_ERROR),
!                  errmsg("Cannot specify OIDS in CSV mode ")));

      /*
       * Don't allow COPY w/ OIDs to or from a table without them
***************
*** 864,870 ****
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyFrom(rel, attnumlist, binary, oids, delim, null_print);
      }
      else
      {                            /* copy from database to file */
--- 953,960 ----
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
!                  quote, escape);
      }
      else
      {                            /* copy from database to file */
***************
*** 926,932 ****
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyTo(rel, attnumlist, binary, oids, delim, null_print);
      }

      if (!pipe)
--- 1016,1023 ----
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
!                 quote, escape);
      }

      if (!pipe)
***************
*** 958,964 ****
   */
  static void
  CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
--- 1049,1056 ----
   */
  static void
  CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print, bool csv_mode, char *quote,
!        char *escape)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
***************
*** 1051,1057 ****
      while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
      {
          bool        need_delim = false;
-
          CHECK_FOR_INTERRUPTS();

          MemoryContextReset(mycontext);
--- 1143,1148 ----
***************
*** 1113,1119 ****
                                                             value,
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
!                     CopyAttributeOut(string, delim);
                  }
                  else
                  {
--- 1204,1218 ----
                                                             value,
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
!                     if (csv_mode)
!                     {
!                         bool force_quote = (strcmp(string, null_print) == 0);
!                         CopyAttributeOutCSV(string, delim, quote, escape,
!                                             force_quote);
!                     }
!                     else
!                         CopyAttributeOut(string, delim);
!
                  }
                  else
                  {
***************
*** 1243,1249 ****
   */
  static void
  CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
--- 1342,1349 ----
   */
  static void
  CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print, bool csv_mode, char *quote,
!          char *escape)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
***************
*** 1388,1396 ****
      ExecBSInsertTriggers(estate, resultRelInfo);

      if (!binary)
-     {
          file_has_oids = oids;    /* must rely on user to tell us this... */
-     }
      else
      {
          /* Read and verify binary header */
--- 1488,1494 ----
***************
*** 1499,1504 ****
--- 1597,1603 ----

              if (file_has_oids)
              {
+                 /* can't be in CSV mode here */
                  string = CopyReadAttribute(delim, null_print,
                                             &result, &isnull);

***************
*** 1537,1550 ****
                               errmsg("missing data for column \"%s\"",
                                      NameStr(attr[m]->attname))));

!                 string = CopyReadAttribute(delim, null_print,
!                                            &result, &isnull);
!
!                 if (isnull)
                  {
!                     /* we read an SQL NULL, no need to do anything */
                  }
                  else
                  {
                      copy_attname = NameStr(attr[m]->attname);
                      values[m] = FunctionCall3(&in_functions[m],
--- 1636,1658 ----
                               errmsg("missing data for column \"%s\"",
                                      NameStr(attr[m]->attname))));

!                 if (csv_mode)
                  {
!                     string = CopyReadAttributeCSV(delim, null_print, quote,
!                                                   escape, &result, &isnull);
!                     if (result == UNTERMINATED_FIELD)
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("unterminated CSV quoted field")));
                  }
                  else
+                     string = CopyReadAttribute(delim, null_print,
+                                                &result, &isnull);
+
+
+
+                 /* we read an SQL NULL, no need to do anything */
+                 if (!isnull)
                  {
                      copy_attname = NameStr(attr[m]->attname);
                      values[m] = FunctionCall3(&in_functions[m],
***************
*** 2069,2074 ****
--- 2177,2328 ----
      return attribute_buf.data;
  }

+
+ /*
+  * Read the value of a single attribute in CSV mode,
+  * performing de-escaping as needed. Escaping does not follow the normal
+  * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage.
+  *
+  * Quoted fields can span lines, in which case the line end is embedded
+  * in the returned string.
+  *
+  * null_print is the null marker string.  Note that this is compared to
+  * the pre-de-escaped input string (thus if it is quoted it is not a NULL).
+  *
+  * *result is set to indicate what terminated the read:
+  *        NORMAL_ATTR:    column delimiter
+  *        END_OF_LINE:    end of line
+  *      UNTERMINATED_FIELD no quote detected at end of a quoted field
+  *
+  * In any case, the string read up to the terminator (or end of file)
+  * is returned.
+  *
+  * *isnull is set true or false depending on whether the input matched
+  * the null marker.  Note that the caller cannot check this since the
+  * returned string will be the post-de-escaping equivalent, which may
+  * look the same as some valid data string.
+  *----------
+  */
+
+ static char *
+ CopyReadAttributeCSV(const char *delim, const char *null_print, char *quote,
+                      char *escape, CopyReadResult *result, bool *isnull)
+ {
+     char        delimc = delim[0];
+     char        quotec = quote[0];
+     char        escapec = escape[0];
+     char        c;
+     int            start_cursor = line_buf.cursor;
+     int            end_cursor = start_cursor;
+     int            input_len;
+     bool        in_quote = false;
+     bool        saw_quote = false;
+
+     /* reset attribute_buf to empty */
+     attribute_buf.len = 0;
+     attribute_buf.data[0] = '\0';
+
+     /* set default status */
+     *result = END_OF_LINE;
+
+     for (;;)
+     {
+         /* handle multiline quoted fields */
+         if (in_quote && line_buf.cursor >= line_buf.len)
+         {
+             bool done;
+
+             switch(eol_type)
+             {
+                 case EOL_NL:
+                     appendStringInfoString(&attribute_buf,"\n");
+                     break;
+                 case EOL_CR:
+                     appendStringInfoString(&attribute_buf,"\r");
+                     break;
+                 case EOL_CRNL:
+                     appendStringInfoString(&attribute_buf,"\r\n");
+                     break;
+                 case EOL_UNKNOWN:
+                     /* shouldn't happen - just keep going */
+                     break;
+             }
+
+             copy_lineno++;
+             done = CopyReadLine();
+             if (done && line_buf.len == 0)
+                 break;
+             start_cursor = line_buf.cursor;
+         }
+
+         end_cursor = line_buf.cursor;
+         if (line_buf.cursor >= line_buf.len)
+             break;
+         c = line_buf.data[line_buf.cursor++];
+         /*
+          * unquoted field delimiter
+          */
+         if (!in_quote && c == delimc)
+         {
+             *result = NORMAL_ATTR;
+             break;
+         }
+         /*
+          * start of quoted field (or part of field)
+          */
+         if (!in_quote && c == quotec)
+         {
+             saw_quote = true;
+             in_quote = true;
+             continue;
+         }
+         /*
+          * escape within a quoted field
+          */
+         if (in_quote && c == escapec)
+         {
+             /*
+              * peek at the next char if available, and escape it if it
+              * is an escape char or a quote char
+              */
+             if (line_buf.cursor <= line_buf.len)
+             {
+                 char nextc = line_buf.data[line_buf.cursor];
+                 if (nextc == escapec || nextc == quotec)
+                 {
+                     appendStringInfoCharMacro(&attribute_buf, nextc);
+                     line_buf.cursor++;
+                     continue;
+                 }
+             }
+         }
+         /*
+          * end of quoted field.
+          * Must do this test after testing for escape in case quote char
+          * and escape char are the same (which is the common case).
+          */
+         if (in_quote && c == quotec)
+         {
+             in_quote = false;
+             continue;
+         }
+         appendStringInfoCharMacro(&attribute_buf, c);
+     }
+
+     if (in_quote)
+         *result = UNTERMINATED_FIELD;
+
+     /* check whether raw input matched null marker */
+     input_len = end_cursor - start_cursor;
+     if (!saw_quote && input_len == strlen(null_print) &&
+         strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0)
+         *isnull = true;
+     else
+         *isnull = false;
+
+     return attribute_buf.data;
+ }
+
  /*
   * Read a binary attribute
   */
***************
*** 2192,2197 ****
--- 2446,2518 ----
                  break;
          }
      }
+ }
+
+ /*
+  * Send CSV representation of one attribute, with conversion and
+  * CSV type escaping
+  */
+ static void
+ CopyAttributeOutCSV(char *server_string, char *delim, char *quote,
+                     char *escape, bool force_quote)
+ {
+     char       *string;
+     char        c;
+     char        delimc = delim[0];
+     char        quotec = quote[0];
+      char        escapec = escape[0];
+     bool        need_quote = force_quote;
+     char        *test_string;
+     bool        same_encoding;
+     int            mblen;
+     int            i;
+
+     same_encoding = (server_encoding == client_encoding);
+     if (!same_encoding)
+         string = (char *) pg_server_to_client((unsigned char *) server_string,
+                                               strlen(server_string));
+     else
+         string = server_string;
+
+     /* have to run through the string twice,
+      * first time to see if it needs quoting, second to actually send it
+      */
+
+     for(test_string = string;
+         !need_quote && (c = *test_string) != '\0';
+         test_string += mblen)
+     {
+         if (c == delimc || c == quotec || c == '\n' || c == '\r')
+             need_quote = true;
+         if (!same_encoding)
+             mblen = pg_encoding_mblen(client_encoding, test_string);
+         else
+             mblen = 1;
+     }
+
+     if (need_quote)
+         CopySendChar(quotec);
+
+     for (; (c = *string) != '\0'; string += mblen)
+     {
+         if (c == quotec || c == escapec)
+             CopySendChar(escapec);
+
+         CopySendChar(c);
+
+         if (!same_encoding)
+         {
+             /* send additional bytes of the char, if any */
+             mblen = pg_encoding_mblen(client_encoding, string);
+             for (i = 1; i < mblen; i++)
+                 CopySendChar(string[i]);
+         }
+         else
+             mblen = 1;
+     }
+
+     if (need_quote)
+         CopySendChar(quotec);
  }

  /*
Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/gram.y,v
retrieving revision 2.450
diff -c -c -r2.450 gram.y
*** src/backend/parser/gram.y    5 Apr 2004 03:07:26 -0000    2.450
--- src/backend/parser/gram.y    13 Apr 2004 04:18:31 -0000
***************
*** 343,349 ****
      CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
!     CREATEUSER CROSS CURRENT_DATE CURRENT_TIME
      CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 343,349 ----
      CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
!     CREATEUSER CROSS CSV CURRENT_DATE CURRENT_TIME
      CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
***************
*** 386,391 ****
--- 386,393 ----
      PRECISION PRESERVE PREPARE PRIMARY
      PRIOR PRIVILEGES PROCEDURAL PROCEDURE

+     QUOTE
+
      READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
      RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
      RULE
***************
*** 1356,1361 ****
--- 1358,1375 ----
                  {
                      $$ = makeDefElem("delimiter", (Node *)makeString($3));
                  }
+             | CSV
+                 {
+                     $$ = makeDefElem("csv", (Node *)makeInteger(TRUE));
+                 }
+             | QUOTE opt_as Sconst
+                 {
+                     $$ = makeDefElem("quote", (Node *)makeString($3));
+                 }
+             | ESCAPE opt_as Sconst
+                 {
+                     $$ = makeDefElem("escape", (Node *)makeString($3));
+                 }
              | NULL_P opt_as Sconst
                  {
                      $$ = makeDefElem("null", (Node *)makeString($3));
***************
*** 7420,7425 ****
--- 7434,7440 ----
              | COPY
              | CREATEDB
              | CREATEUSER
+             | CSV
              | CURSOR
              | CYCLE
              | DATABASE
***************
*** 7507,7512 ****
--- 7522,7528 ----
              | PRIVILEGES
              | PROCEDURAL
              | PROCEDURE
+             | QUOTE
              | READ
              | RECHECK
              | REINDEX
Index: src/backend/parser/keywords.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/keywords.c,v
retrieving revision 1.147
diff -c -c -r1.147 keywords.c
*** src/backend/parser/keywords.c    11 Mar 2004 01:47:40 -0000    1.147
--- src/backend/parser/keywords.c    13 Apr 2004 04:18:31 -0000
***************
*** 90,95 ****
--- 90,96 ----
      {"createdb", CREATEDB},
      {"createuser", CREATEUSER},
      {"cross", CROSS},
+     {"csv", CSV},
      {"current_date", CURRENT_DATE},
      {"current_time", CURRENT_TIME},
      {"current_timestamp", CURRENT_TIMESTAMP},
***************
*** 248,253 ****
--- 249,255 ----
      {"privileges", PRIVILEGES},
      {"procedural", PROCEDURAL},
      {"procedure", PROCEDURE},
+     {"quote", QUOTE},
      {"read", READ},
      {"real", REAL},
      {"recheck", RECHECK},
Index: src/bin/psql/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/psql/copy.c,v
retrieving revision 1.43
diff -c -c -r1.43 copy.c
*** src/bin/psql/copy.c    12 Apr 2004 15:58:52 -0000    1.43
--- src/bin/psql/copy.c    13 Apr 2004 04:18:32 -0000
***************
*** 66,73 ****
--- 66,76 ----
      bool        from;
      bool        binary;
      bool        oids;
+     bool        csv_mode;
      char       *delim;
      char       *null;
+     char       *quote;
+     char       *escape;
  };


***************
*** 81,86 ****
--- 84,91 ----
      free(ptr->file);
      free(ptr->delim);
      free(ptr->null);
+     free(ptr->quote);
+     free(ptr->escape);
      free(ptr);
  }

***************
*** 277,282 ****
--- 282,291 ----
              {
                  result->oids = true;
              }
+             else if (strcasecmp(token, "csv") == 0)
+             {
+                 result->csv_mode = true;
+             }
              else if (strcasecmp(token, "delimiter") == 0)
              {
                  token = strtokx(NULL, whitespace, NULL, "'",
***************
*** 301,306 ****
--- 310,339 ----
                  else
                      goto error;
              }
+             else if (strcasecmp(token, "quote") == 0)
+             {
+                 token = strtokx(NULL, whitespace, NULL, "'",
+                                 '\\', false, pset.encoding);
+                 if (token && strcasecmp(token, "as") == 0)
+                     token = strtokx(NULL, whitespace, NULL, "'",
+                                     '\\', false, pset.encoding);
+                 if (token)
+                     result->quote = pg_strdup(token);
+                 else
+                     goto error;
+             }
+             else if (strcasecmp(token, "escape") == 0)
+             {
+                 token = strtokx(NULL, whitespace, NULL, "'",
+                                 '\\', false, pset.encoding);
+                 if (token && strcasecmp(token, "as") == 0)
+                     token = strtokx(NULL, whitespace, NULL, "'",
+                                     '\\', false, pset.encoding);
+                 if (token)
+                     result->escape = pg_strdup(token);
+                 else
+                     goto error;
+             }
              else
                  goto error;

***************
*** 340,346 ****
      PGresult   *result;
      bool        success;
      struct stat st;
!
      /* parse options */
      options = parse_slash_copy(args);

--- 373,380 ----
      PGresult   *result;
      bool        success;
      struct stat st;
!     bool with_output = false;
!
      /* parse options */
      options = parse_slash_copy(args);

***************
*** 379,390 ****
--- 413,454 ----
                                options->delim);
      }

+     /* There is no backward-compatible CSV syntax */
      if (options->null)
      {
          if (options->null[0] == '\'')
              appendPQExpBuffer(&query, " WITH NULL AS %s", options->null);
          else
              appendPQExpBuffer(&query, " WITH NULL AS '%s'", options->null);
+         with_output = true;
+     }
+
+     if (options->csv_mode)
+     {
+         appendPQExpBuffer(&query, " %sCSV ", with_output ? "" : "WITH ");
+         with_output = true;
+     }
+
+     if (options->quote)
+     {
+         if (options->quote[0] == '\'')
+             appendPQExpBuffer(&query, " %sQUOTE AS %s",
+                 with_output ? "" : "WITH ", options->quote);
+         else
+             appendPQExpBuffer(&query, " %sQUOTE AS '%s'",
+                 with_output ? "" : "WITH ", options->quote);
+         with_output = true;
+     }
+
+     if (options->escape)
+     {
+         if (options->escape[0] == '\'')
+             appendPQExpBuffer(&query, " %sESCAPE AS %s",
+                 with_output ? "" : "WITH ", options->escape);
+         else
+             appendPQExpBuffer(&query, " %sESCAPE AS '%s'",
+                 with_output ? "" : "WITH ", options->escape);
+         with_output = true;
      }

      if (options->from)

pgsql-patches by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: aclitem accessor functions
Next
From: Claudio Natoli
Date:
Subject: win32 fixes