COPY for CSV documentation - Mailing list pgsql-patches

From Andrew Dunstan
Subject COPY for CSV documentation
Date
Msg-id 40781FB7.3090103@dunslane.net
Whole thread Raw
Responses Re: COPY for CSV documentation
List pgsql-patches
Attached is a patch with documentation for the CSV mode of COPY patch
submitted yesterday.

cheers

andrew
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.55
diff -c -r1.55 copy.sgml
*** doc/src/sgml/ref/copy.sgml    13 Dec 2003 23:59:07 -0000    1.55
--- doc/src/sgml/ref/copy.sgml    10 Apr 2004 16:22:23 -0000
***************
*** 136,142 ****
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)
       </para>
      </listitem>
     </varlistentry>
--- 136,143 ----
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs, or if CSV mode is selected by specifying
!       <literal>DELIMITER</literal> longer then one character.)
       </para>
      </listitem>
     </varlistentry>
***************
*** 145,152 ****
      <term><replaceable class="parameter">delimiter</replaceable></term>
      <listitem>
       <para>
!       The single character that separates columns within each row
!       (line) of the file.  The default is a tab character.
       </para>
      </listitem>
     </varlistentry>
--- 146,162 ----
      <term><replaceable class="parameter">delimiter</replaceable></term>
      <listitem>
       <para>
!       A one- to three-character string containing characters used in importing
!       or exporting Text or Comma Separated Variable (CSV) files.
!       With only one character, Text file format is used.
!       If there are two or three characters, CSV format is used.
!       The first character for either file format is the delimiter character
!       used between values for a single row.
!       The second character is the CSV quote character,
!       and the third character, if present, is the CSV escape character
!       used inside quoted values, and defaults to the quote character.
!       The default for this parameter is a single tab character. (Thus TEXT
!       file mode is the default.)
       </para>
      </listitem>
     </varlistentry>
***************
*** 156,163 ****
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N). You might prefer an empty
!       string, for example.
       </para>

       <note>
--- 166,174 ----
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N), unless a CSV is being processed,
!       in which case it is an empty string. You might prefer an empty string
!       in any case.
       </para>

       <note>
***************
*** 168,173 ****
--- 179,197 ----
         <command>COPY TO</command>.
        </para>
       </note>
+
+      <note>
+       <para>
+        If you don't want anything used as null when using
+        <command>COPY FROM</command>, you can specify some value that is very
+        unlikely to appear in the file, such as <literal>frobnitz</literal> or
+        <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is
+        <literal>md5('frobnitz')</literal>). This could be especially useful
+        when importing a CSV file into a table with <literal>NOT NULL</>
+        columns.
+       </para>
+      </note>
+
      </listitem>
     </varlistentry>
    </variablelist>
***************
*** 252,259 ****
     <title>Text Format</title>

     <para>
!     When <command>COPY</command> is used without the <literal>BINARY</literal> option,
!     the data read or written is a text file with one line per table row.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
--- 276,285 ----
     <title>Text Format</title>

     <para>
!     When <command>COPY</command> is used without the <literal>BINARY</literal>
!     option, the data read or written is a text file with one line per table
!     row unless <literal>DELIMITER</literal> is longer than one character,
!     in which case CSV format is used.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
***************
*** 380,385 ****
--- 406,459 ----
    </refsect2>

    <refsect2>
+    <title>CSV format</title>
+
+    <para>
+     This format is used for importing from and exporting to the Comma
+     Separated Variable (CSV) file format used by many other programs,
+     such as spreadsheets. Instead of escaping in
+     <productname>PostgreSQL</productname>'s standard text mode, it produces
+     and recognises the common CSV escaping mechanism.
+    </para>
+
+    <para>
+     The values in each record are separated by the delimiter character, which
+     is the first character in the <literal>DELIMITER</literal> parameter.
+     If the value contains the delimiter character, the quote character (the
+     second character in the <literal>DELIMITER</literal> parameter), or a
+     carriage return or line feed character, then the whole value is prefixed
+     and suffixed by the quote character, and any occurrence within the value
+     of a quote character or the escape character (the third character in the
+     <literal>DELIMITER</literal> parameter if present, or else just the quote
+     character itself) is preceded by the escape character.
+    </para>
+
+    <para>
+     In the most common cases, simply specifying <literal>DELIMITER ',"'</>
+     or <literal>DELIMITER ',\''</> should suffice, both for
+     <command>COPY FROM</> and <command>COPY TO</>.
+    </para>
+
+    <note>
+     <para>
+      CSV mode will both recognise and produce CSV files with quoted values
+      containing embedded carriage returns and line feeds. Thus the files are
+      not strictly one line per table row, as TEXT files are.
+     </para>
+    </note>
+
+    <note>
+     <para>
+      Many programs produce strange and occasionally perverse CSV files, and
+      the file format is a convention rather than a standard. Thus you might
+      encounter some files that cannot be imported using this mechanism, and
+      you might produce files that other programs fail to recognise properly.
+     </para>
+    </note>
+
+   </refsect2>
+
+   <refsect2>
     <title>Binary Format</title>

     <para>
***************
*** 532,537 ****
--- 606,620 ----
     using the vertical bar (<literal>|</literal>) as the field delimiter:
  <programlisting>
  COPY country TO STDOUT WITH DELIMITER '|';
+ </programlisting>
+   </para>
+
+   <para>
+    To do the same thing but instead produce a standard CSV file, using
+    <literal>,</> as the delimiter and <literal>"</> as both the quote and
+    escape characters:
+ <programlisting>
+ COPY country TO STDOUT WITH DELIMITER ',"';
  </programlisting>
    </para>


pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: stdin/stdout mismatch for COPY and \copy
Next
From: Bruce Momjian
Date:
Subject: Re: sound extract century/millennium date_part