Thread: COPY for CSV documentation

COPY for CSV documentation

From
Andrew Dunstan
Date:
Attached is a patch with documentation for the CSV mode of COPY patch
submitted yesterday.

cheers

andrew
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.55
diff -c -r1.55 copy.sgml
*** doc/src/sgml/ref/copy.sgml    13 Dec 2003 23:59:07 -0000    1.55
--- doc/src/sgml/ref/copy.sgml    10 Apr 2004 16:22:23 -0000
***************
*** 136,142 ****
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)
       </para>
      </listitem>
     </varlistentry>
--- 136,143 ----
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs, or if CSV mode is selected by specifying
!       <literal>DELIMITER</literal> longer then one character.)
       </para>
      </listitem>
     </varlistentry>
***************
*** 145,152 ****
      <term><replaceable class="parameter">delimiter</replaceable></term>
      <listitem>
       <para>
!       The single character that separates columns within each row
!       (line) of the file.  The default is a tab character.
       </para>
      </listitem>
     </varlistentry>
--- 146,162 ----
      <term><replaceable class="parameter">delimiter</replaceable></term>
      <listitem>
       <para>
!       A one- to three-character string containing characters used in importing
!       or exporting Text or Comma Separated Variable (CSV) files.
!       With only one character, Text file format is used.
!       If there are two or three characters, CSV format is used.
!       The first character for either file format is the delimiter character
!       used between values for a single row.
!       The second character is the CSV quote character,
!       and the third character, if present, is the CSV escape character
!       used inside quoted values, and defaults to the quote character.
!       The default for this parameter is a single tab character. (Thus TEXT
!       file mode is the default.)
       </para>
      </listitem>
     </varlistentry>
***************
*** 156,163 ****
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N). You might prefer an empty
!       string, for example.
       </para>

       <note>
--- 166,174 ----
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N), unless a CSV is being processed,
!       in which case it is an empty string. You might prefer an empty string
!       in any case.
       </para>

       <note>
***************
*** 168,173 ****
--- 179,197 ----
         <command>COPY TO</command>.
        </para>
       </note>
+
+      <note>
+       <para>
+        If you don't want anything used as null when using
+        <command>COPY FROM</command>, you can specify some value that is very
+        unlikely to appear in the file, such as <literal>frobnitz</literal> or
+        <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is
+        <literal>md5('frobnitz')</literal>). This could be especially useful
+        when importing a CSV file into a table with <literal>NOT NULL</>
+        columns.
+       </para>
+      </note>
+
      </listitem>
     </varlistentry>
    </variablelist>
***************
*** 252,259 ****
     <title>Text Format</title>

     <para>
!     When <command>COPY</command> is used without the <literal>BINARY</literal> option,
!     the data read or written is a text file with one line per table row.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
--- 276,285 ----
     <title>Text Format</title>

     <para>
!     When <command>COPY</command> is used without the <literal>BINARY</literal>
!     option, the data read or written is a text file with one line per table
!     row unless <literal>DELIMITER</literal> is longer than one character,
!     in which case CSV format is used.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
***************
*** 380,385 ****
--- 406,459 ----
    </refsect2>

    <refsect2>
+    <title>CSV format</title>
+
+    <para>
+     This format is used for importing from and exporting to the Comma
+     Separated Variable (CSV) file format used by many other programs,
+     such as spreadsheets. Instead of escaping in
+     <productname>PostgreSQL</productname>'s standard text mode, it produces
+     and recognises the common CSV escaping mechanism.
+    </para>
+
+    <para>
+     The values in each record are separated by the delimiter character, which
+     is the first character in the <literal>DELIMITER</literal> parameter.
+     If the value contains the delimiter character, the quote character (the
+     second character in the <literal>DELIMITER</literal> parameter), or a
+     carriage return or line feed character, then the whole value is prefixed
+     and suffixed by the quote character, and any occurrence within the value
+     of a quote character or the escape character (the third character in the
+     <literal>DELIMITER</literal> parameter if present, or else just the quote
+     character itself) is preceded by the escape character.
+    </para>
+
+    <para>
+     In the most common cases, simply specifying <literal>DELIMITER ',"'</>
+     or <literal>DELIMITER ',\''</> should suffice, both for
+     <command>COPY FROM</> and <command>COPY TO</>.
+    </para>
+
+    <note>
+     <para>
+      CSV mode will both recognise and produce CSV files with quoted values
+      containing embedded carriage returns and line feeds. Thus the files are
+      not strictly one line per table row, as TEXT files are.
+     </para>
+    </note>
+
+    <note>
+     <para>
+      Many programs produce strange and occasionally perverse CSV files, and
+      the file format is a convention rather than a standard. Thus you might
+      encounter some files that cannot be imported using this mechanism, and
+      you might produce files that other programs fail to recognise properly.
+     </para>
+    </note>
+
+   </refsect2>
+
+   <refsect2>
     <title>Binary Format</title>

     <para>
***************
*** 532,537 ****
--- 606,620 ----
     using the vertical bar (<literal>|</literal>) as the field delimiter:
  <programlisting>
  COPY country TO STDOUT WITH DELIMITER '|';
+ </programlisting>
+   </para>
+
+   <para>
+    To do the same thing but instead produce a standard CSV file, using
+    <literal>,</> as the delimiter and <literal>"</> as both the quote and
+    escape characters:
+ <programlisting>
+ COPY country TO STDOUT WITH DELIMITER ',"';
  </programlisting>
    </para>


Re: COPY for CSV documentation

From
Bruce Momjian
Date:
I have reviewed this patch.  Basically, CSV is enabled by specifying
more than one delimiter character to COPY, e.g. DELIMITER ',"' or
DELIMITER ',""'.  Is this API good for folks?

Prior to 7.2, a multi-character delimiter could be specified, but only
the first character was used.  7.2 release notes state:

     * COPY DELIMITERS string must be exactly one character (Tom)

I am a little worried about multibyte too, as you mentioned.  It doesn't
look like we support multi-byte delimiters.  I see in the code:

    if (strlen(delim) != 1)
        ereport(ERROR,

With this code, if you use multibyte delimiters, you get an error.
However, with CVS, I can see cases where someone trying to use multibyte
would get a very strange output file rather than an error.  It would use
byte one of the multibyte for the delimiter, and the second byte for the
quote, and maybe a third byte for quoting quotes.

We could use pg_mblen().  Maybe we should compute pg_mblen and strlen on
the delimiter and throw an error if the they are different lengths.
That would prevent multibyte delimiters.

Comments?

---------------------------------------------------------------------------

Andrew Dunstan wrote:
>
> Attached is a patch with documentation for the CSV mode of COPY patch
> submitted yesterday.
>
> cheers
>
> andrew

> Index: doc/src/sgml/ref/copy.sgml
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
> retrieving revision 1.55
> diff -c -r1.55 copy.sgml
> *** doc/src/sgml/ref/copy.sgml    13 Dec 2003 23:59:07 -0000    1.55
> --- doc/src/sgml/ref/copy.sgml    10 Apr 2004 16:22:23 -0000
> ***************
> *** 136,142 ****
>        <para>
>         Specifies copying the OID for each row.  (An error is raised if
>         <literal>OIDS</literal> is specified for a table that does not
> !       have OIDs.)
>        </para>
>       </listitem>
>      </varlistentry>
> --- 136,143 ----
>        <para>
>         Specifies copying the OID for each row.  (An error is raised if
>         <literal>OIDS</literal> is specified for a table that does not
> !       have OIDs, or if CSV mode is selected by specifying
> !       <literal>DELIMITER</literal> longer then one character.)
>        </para>
>       </listitem>
>      </varlistentry>
> ***************
> *** 145,152 ****
>       <term><replaceable class="parameter">delimiter</replaceable></term>
>       <listitem>
>        <para>
> !       The single character that separates columns within each row
> !       (line) of the file.  The default is a tab character.
>        </para>
>       </listitem>
>      </varlistentry>
> --- 146,162 ----
>       <term><replaceable class="parameter">delimiter</replaceable></term>
>       <listitem>
>        <para>
> !       A one- to three-character string containing characters used in importing
> !       or exporting Text or Comma Separated Variable (CSV) files.
> !       With only one character, Text file format is used.
> !       If there are two or three characters, CSV format is used.
> !       The first character for either file format is the delimiter character
> !       used between values for a single row.
> !       The second character is the CSV quote character,
> !       and the third character, if present, is the CSV escape character
> !       used inside quoted values, and defaults to the quote character.
> !       The default for this parameter is a single tab character. (Thus TEXT
> !       file mode is the default.)
>        </para>
>       </listitem>
>      </varlistentry>
> ***************
> *** 156,163 ****
>       <listitem>
>        <para>
>         The string that represents a null value. The default is
> !       <literal>\N</literal> (backslash-N). You might prefer an empty
> !       string, for example.
>        </para>
>
>        <note>
> --- 166,174 ----
>       <listitem>
>        <para>
>         The string that represents a null value. The default is
> !       <literal>\N</literal> (backslash-N), unless a CSV is being processed,
> !       in which case it is an empty string. You might prefer an empty string
> !       in any case.
>        </para>
>
>        <note>
> ***************
> *** 168,173 ****
> --- 179,197 ----
>          <command>COPY TO</command>.
>         </para>
>        </note>
> +
> +      <note>
> +       <para>
> +        If you don't want anything used as null when using
> +        <command>COPY FROM</command>, you can specify some value that is very
> +        unlikely to appear in the file, such as <literal>frobnitz</literal> or
> +        <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is
> +        <literal>md5('frobnitz')</literal>). This could be especially useful
> +        when importing a CSV file into a table with <literal>NOT NULL</>
> +        columns.
> +       </para>
> +      </note>
> +
>       </listitem>
>      </varlistentry>
>     </variablelist>
> ***************
> *** 252,259 ****
>      <title>Text Format</title>
>
>      <para>
> !     When <command>COPY</command> is used without the <literal>BINARY</literal> option,
> !     the data read or written is a text file with one line per table row.
>       Columns in a row are separated by the delimiter character.
>       The column values themselves are strings generated by the
>       output function, or acceptable to the input function, of each
> --- 276,285 ----
>      <title>Text Format</title>
>
>      <para>
> !     When <command>COPY</command> is used without the <literal>BINARY</literal>
> !     option, the data read or written is a text file with one line per table
> !     row unless <literal>DELIMITER</literal> is longer than one character,
> !     in which case CSV format is used.
>       Columns in a row are separated by the delimiter character.
>       The column values themselves are strings generated by the
>       output function, or acceptable to the input function, of each
> ***************
> *** 380,385 ****
> --- 406,459 ----
>     </refsect2>
>
>     <refsect2>
> +    <title>CSV format</title>
> +
> +    <para>
> +     This format is used for importing from and exporting to the Comma
> +     Separated Variable (CSV) file format used by many other programs,
> +     such as spreadsheets. Instead of escaping in
> +     <productname>PostgreSQL</productname>'s standard text mode, it produces
> +     and recognises the common CSV escaping mechanism.
> +    </para>
> +
> +    <para>
> +     The values in each record are separated by the delimiter character, which
> +     is the first character in the <literal>DELIMITER</literal> parameter.
> +     If the value contains the delimiter character, the quote character (the
> +     second character in the <literal>DELIMITER</literal> parameter), or a
> +     carriage return or line feed character, then the whole value is prefixed
> +     and suffixed by the quote character, and any occurrence within the value
> +     of a quote character or the escape character (the third character in the
> +     <literal>DELIMITER</literal> parameter if present, or else just the quote
> +     character itself) is preceded by the escape character.
> +    </para>
> +
> +    <para>
> +     In the most common cases, simply specifying <literal>DELIMITER ',"'</>
> +     or <literal>DELIMITER ',\''</> should suffice, both for
> +     <command>COPY FROM</> and <command>COPY TO</>.
> +    </para>
> +
> +    <note>
> +     <para>
> +      CSV mode will both recognise and produce CSV files with quoted values
> +      containing embedded carriage returns and line feeds. Thus the files are
> +      not strictly one line per table row, as TEXT files are.
> +     </para>
> +    </note>
> +
> +    <note>
> +     <para>
> +      Many programs produce strange and occasionally perverse CSV files, and
> +      the file format is a convention rather than a standard. Thus you might
> +      encounter some files that cannot be imported using this mechanism, and
> +      you might produce files that other programs fail to recognise properly.
> +     </para>
> +    </note>
> +
> +   </refsect2>
> +
> +   <refsect2>
>      <title>Binary Format</title>
>
>      <para>
> ***************
> *** 532,537 ****
> --- 606,620 ----
>      using the vertical bar (<literal>|</literal>) as the field delimiter:
>   <programlisting>
>   COPY country TO STDOUT WITH DELIMITER '|';
> + </programlisting>
> +   </para>
> +
> +   <para>
> +    To do the same thing but instead produce a standard CSV file, using
> +    <literal>,</> as the delimiter and <literal>"</> as both the quote and
> +    escape characters:
> + <programlisting>
> + COPY country TO STDOUT WITH DELIMITER ',"';
>   </programlisting>
>     </para>
>

>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Andrew Dunstan
Date:

Bruce Momjian wrote:

>I have reviewed this patch.  Basically, CSV is enabled by specifying
>more than one delimiter character to COPY, e.g. DELIMITER ',"' or
>DELIMITER ',""'.  Is this API good for folks?
>

(if not I'll be displeased, as it is what was the general consensus
about a month ago)

>
>Prior to 7.2, a multi-character delimiter could be specified, but only
>the first character was used.  7.2 release notes state:
>
>     * COPY DELIMITERS string must be exactly one character (Tom)
>
>I am a little worried about multibyte too, as you mentioned.
>

I am not sure that CSVs even make sense in a multibyte world. If not, it
would make sense to disable it in such a case.

(Is that your only worry?)

cheers

andrew





Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
>
>
> Bruce Momjian wrote:
>
> >I have reviewed this patch.  Basically, CSV is enabled by specifying
> >more than one delimiter character to COPY, e.g. DELIMITER ',"' or
> >DELIMITER ',""'.  Is this API good for folks?
> >
>
> (if not I'll be displeased, as it is what was the general consensus
> about a month ago)

OK

> >Prior to 7.2, a multi-character delimiter could be specified, but only
> >the first character was used.  7.2 release notes state:
> >
> >     * COPY DELIMITERS string must be exactly one character (Tom)
> >
> >I am a little worried about multibyte too, as you mentioned.
> >
>
> I am not sure that CSVs even make sense in a multibyte world. If not, it
> would make sense to disable it in such a case.
>
> (Is that your only worry?)

Yes, my worry is that someone will use a multibyte character that the
system sees as several bytes and enters CSV mode.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Andrew Dunstan
Date:

Bruce Momjian wrote:

>>>Prior to 7.2, a multi-character delimiter could be specified, but only
>>>the first character was used.  7.2 release notes state:
>>>
>>>    * COPY DELIMITERS string must be exactly one character (Tom)
>>>
>>>I am a little worried about multibyte too, as you mentioned.
>>>
>>>
>>>
>>I am not sure that CSVs even make sense in a multibyte world. If not, it
>>would make sense to disable it in such a case.
>>
>>(Is that your only worry?)
>>
>>
>
>Yes, my worry is that someone will use a multibyte character that the
>system sees as several bytes and enters CSV mode.
>


How about if we specify it explicitly, like BINARY, instead of it being
implied by the length of DELIMITER?

COPY a FROM stdin CSV DELIMITER ',"';

That would make the patch somewhat more extensive, but maybe not hugely
more invasive (I tried to keep it as uninvasive as possible). I could do
that, I think.

cheers

andrew


Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> >>I am not sure that CSVs even make sense in a multibyte world. If not, it
> >>would make sense to disable it in such a case.
> >>
> >>(Is that your only worry?)
> >>
> >>
> >
> >Yes, my worry is that someone will use a multibyte character that the
> >system sees as several bytes and enters CSV mode.
> >
>
>
> How about if we specify it explicitly, like BINARY, instead of it being
> implied by the length of DELIMITER?
>
> COPY a FROM stdin CSV DELIMITER ',"';
>
> That would make the patch somewhat more extensive, but maybe not hugely
> more invasive (I tried to keep it as uninvasive as possible). I could do
> that, I think.

That's what I was wondering.  Is triggering CSV for multi-character
delimiters a little too clever?  This reminds me of the use of LIMIT X,Y
with no indication which is limit and which is offset.

We certainly could code to prevent the multibyte problem I mentioned,
but should we?

I am thinking just:

> COPY a FROM stdin WITH CSV ',"';

or

> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';

EQUOTE for embedded quote.  These are used in very limited situations
and don't have to be reserved words or anything.

I can help with these changes if folks like them.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
>> >Yes, my worry is that someone will use a multibyte character that the
>> >system sees as several bytes and enters CSV mode.
>> >
>>
>>
>> How about if we specify it explicitly, like BINARY, instead of it
>> being  implied by the length of DELIMITER?
>>
>> COPY a FROM stdin CSV DELIMITER ',"';
>>
>> That would make the patch somewhat more extensive, but maybe not
>> hugely  more invasive (I tried to keep it as uninvasive as possible).
>> I could do  that, I think.
>
> That's what I was wondering.  Is triggering CSV for multi-character
> delimiters a little too clever?  This reminds me of the use of LIMIT
> X,Y with no indication which is limit and which is offset.
>
> We certainly could code to prevent the multibyte problem I mentioned,
> but should we?


I confess that in my anglocentric world I have remained lamentably
ignorant of how MBCS works. Just reading up a little, and looking over
some of our code (e.g. the scanner) it looks like the simple solution
would be to check that the delimiter was 8-bit clean. (I assume that ASCII
is a subset of every MBCS we support - is that correct?)

However ...

>
> I am thinking just:
>
>> COPY a FROM stdin WITH CSV ',"';
>
> or
>
>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
>
> EQUOTE for embedded quote.  These are used in very limited situations
> and don't have to be reserved words or anything.
>
> I can help with these changes if folks like them.
>


I prefer either the first, because it ensures things are specified
together.

If you want to do that I will work on some regression tests.

cheers

andrew





Re: COPY for CSV documentation

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> That's what I was wondering.  Is triggering CSV for multi-character
> delimiters a little too clever?  This reminds me of the use of LIMIT X,Y
> with no indication which is limit and which is offset.

I agree, this seems risky and not at all readable to someone who doesn't
remember exactly how the parameter is defined.

>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
> EQUOTE for embedded quote.

ESCAPE would be better no?  It's already a keyword ...

BTW, don't forget that the syntax options have to be provided in psql's
\copy as well.  Did the patch cover that?

            regards, tom lane

Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> > I am thinking just:
> >
> >> COPY a FROM stdin WITH CSV ',"';
> >
> > or
> >
> >> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
> >
> > EQUOTE for embedded quote.  These are used in very limited situations
> > and don't have to be reserved words or anything.
> >
> > I can help with these changes if folks like them.
> >
>
>
> I prefer either the first, because it ensures things are specified
> together.
>
> If you want to do that I will work on some regression tests.


( Jump to the bottom for my final solution.)


I thought about this today.  I am thinking of:

> COPY a FROM stdin WITH CSV

or

> COPY a FROM stdin WITH CSV '""' DELIMITER ','

In other words, the string after CSV is optional.  However, looking at
the COPY syntax, there isn't any case where we have an optional string
after a keyword.  Is that OK?

In this case, CVS is a mode that makes the delimiter ',' and the quote
and quote escape '"'.  This way, CVS doesn't require any special quote
if you want the default.

However, this still has CSV using a two-character string with special
meaning for the first and second characters.  What if we call it QUOTE
mode:

> COPY a FROM stdin WITH QUOTE

that enables CVS with comma delimiters and '"' for quote escape, so the
above would be equavalent to:

> COPY a FROM stdin WITH QUOTE '"' ESCAPE '"' DELIMITER ','

(I have used ESCAPE because Tom suggested it and it is already a
keyword.)

I am a little worried that ESCAPE only has meaning with QUOTE, and QUOTE
sets defaults for all the others.  This makes the syntax addition pretty
confusing for users.

---------------------------------------------------------------------------

Thinking further, maybe we need to add CSV, QUOTE, and ESCAPE to COPY.
QUOTE and ESCAPE are only available in CVS mode, so you can say:

> COPY a FROM stdin WITH CSV

or

> COPY a FROM stdin WITH CSV ESCAPE '\\'

This means that there is no optional string for keywords.  Here is the
line at the bottom we have to add to the COPY syntax.

       COPY tablename [ ( column [, ...] ) ]
           TO { 'filename' | STDOUT }
           [ [ WITH ]
                 [ BINARY ]
                 [ OIDS ]
                 [ DELIMITER [ AS ] 'delimiter' ]
                 [ NULL [ AS ] 'null string' ] ]

                 [ CSV [ QUOTE 'quote' ] [ ESCAPE 'escape' ] ]

DELIMITER default to tab, except in CSV mode, where it is a comma.

That sounds very clear to me.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > That's what I was wondering.  Is triggering CSV for multi-character
> > delimiters a little too clever?  This reminds me of the use of LIMIT X,Y
> > with no indication which is limit and which is offset.
>
> I agree, this seems risky and not at all readable to someone who doesn't
> remember exactly how the parameter is defined.

Yep.

> >> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
> > EQUOTE for embedded quote.
>
> ESCAPE would be better no?  It's already a keyword ...

Yep.

> BTW, don't forget that the syntax options have to be provided in psql's
> \copy as well.  Did the patch cover that?

Oh, yea.  That needs to be done too.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> In other words, the string after CSV is optional.  However, looking at
> the COPY syntax, there isn't any case where we have an optional string
> after a keyword.  Is that OK?

Seems better to avoid it.

> However, this still has CSV using a two-character string with special
> meaning for the first and second characters.

One point that I don't think was made before is that if we do any such
thing, we'll be forever foreclosing any chance of allowing
multi-character delimiters.  ISTM that would not be forward-looking.

            regards, tom lane

Re: COPY for CSV documentation

From
"Andrew Dunstan"
Date:
Tom Lane said:
>>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
>> EQUOTE for embedded quote.
>
> ESCAPE would be better no?  It's already a keyword ...

much better.

>
> BTW, don't forget that the syntax options have to be provided in psql's
> \copy as well.  Did the patch cover that?
>

No, because it didn't make a syntax change. I admit I didn't test that
though.

cheers

andrew



Re: COPY for CSV documentation

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
>
>       COPY tablename [ ( column [, ...] ) ]
>           TO { 'filename' | STDOUT }
>           [ [ WITH ]
>                 [ BINARY ]
>                 [ OIDS ]
>                 [ DELIMITER [ AS ] 'delimiter' ]
>                 [ NULL [ AS ] 'null string' ] ]
>
>                 [ CSV [ QUOTE 'quote' ] [ ESCAPE 'escape' ] ]
>
> DELIMITER default to tab, except in CSV mode, where it is a comma.
>
> That sounds very clear to me.
>


TIMTOWTDI, as we say in the perl world, and we could debate it endlessly.
This looks fine to me.

a few points:

. ESCAPE should default to same-as-quote
. in CSV mode, NULL should default to '' - that was in what I sent in.
. I assume that for now we will enforce the one byte rule for QUOTE and
ESCAPE as well as for DELIMITER.

cheers

andrew





Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Bruce Momjian said:
> >
> >       COPY tablename [ ( column [, ...] ) ]
> >           TO { 'filename' | STDOUT }
> >           [ [ WITH ]
> >                 [ BINARY ]
> >                 [ OIDS ]
> >                 [ DELIMITER [ AS ] 'delimiter' ]
> >                 [ NULL [ AS ] 'null string' ] ]
> >
> >                 [ CSV [ QUOTE 'quote' ] [ ESCAPE 'escape' ] ]
> >
> > DELIMITER default to tab, except in CSV mode, where it is a comma.
> >
> > That sounds very clear to me.
> >
>
>
> TIMTOWTDI, as we say in the perl world, and we could debate it endlessly.
> This looks fine to me.
>
> a few points:
>
> . ESCAPE should default to same-as-quote
> . in CSV mode, NULL should default to '' - that was in what I sent in.
> . I assume that for now we will enforce the one byte rule for QUOTE and
> ESCAPE as well as for DELIMITER.

Yes, agreed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
> Andrew Dunstan wrote:
>> Bruce Momjian said:
>> >
>> >       COPY tablename [ ( column [, ...] ) ]
>> >           TO { 'filename' | STDOUT }
>> >           [ [ WITH ]
>> >                 [ BINARY ]
>> >                 [ OIDS ]
>> >                 [ DELIMITER [ AS ] 'delimiter' ]
>> >                 [ NULL [ AS ] 'null string' ] ]
>> >
>> >                 [ CSV [ QUOTE 'quote' ] [ ESCAPE 'escape' ] ]
>> >
>> > DELIMITER default to tab, except in CSV mode, where it is a comma.
>> >
>> > That sounds very clear to me.
>> >
>>
>>
>> TIMTOWTDI, as we say in the perl world, and we could debate it
>> endlessly. This looks fine to me.
>>
>> a few points:
>>
>> . ESCAPE should default to same-as-quote
>> . in CSV mode, NULL should default to '' - that was in what I sent in.
>> . I assume that for now we will enforce the one byte rule for QUOTE
>> and ESCAPE as well as for DELIMITER.
>
> Andrew, would you like me to take your patch, implement the changes
> discussed, add \copy, and post the result for your testing?
>

That would be great. Thanks. I don't think the changes to what I did with
copy.c will be huge.

BTW, how to do this is asked at least once a day on the #postgresql
channel on freenode - a lot of people *really* want this.

cheers

andrew



Re: COPY for CSV documentation

From
Bruno Wolff III
Date:
On Mon, Apr 12, 2004 at 02:26:14 -0400,
  Andrew Dunstan <andrew@dunslane.net> wrote:
>
> a few points:
>
> . in CSV mode, NULL should default to '' - that was in what I sent in.

Postgres normally treats an empty string as an empty string. Are you sure
you really want it to be treated as a NULL by default in this one place?

Re: COPY for CSV documentation

From
"Andrew Dunstan"
Date:
Bruno Wolff III said:
> On Mon, Apr 12, 2004 at 02:26:14 -0400,
>  Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>> a few points:
>>
>> . in CSV mode, NULL should default to '' - that was in what I sent in.
>
> Postgres normally treats an empty string as an empty string. Are you
> sure you really want it to be treated as a NULL by default in this one
> place?
>

Yes ;-)

Otherwise, what will happen when we try to import into some non-text field
for which '' is not a valid value?

Spreadsheets commonly represent missing values as empty strings when
reading/writing CSVs - that's why this behaviour should be the default.

If you want to force it to use an empty string instead, simply specify
some unlikely value for NULL, like 'frobnitz'. But if you do, be prepared
for lots of errors unless you are importing into fields where empty string
is a valid text value.

Of course, a NOT NULL constraint will also break things, unless we went to
the MySQL method of handling insertion of NULL into a NOT NULL field ...
no I really am kidding.

cheers

andrew



Re: COPY for CSV documentation

From
Bruno Wolff III
Date:
On Mon, Apr 12, 2004 at 08:07:12 -0400,
  Andrew Dunstan <andrew@dunslane.net> wrote:
>
> Otherwise, what will happen when we try to import into some non-text field
> for which '' is not a valid value?

I would expect the copy to fail as it does normally.

> Spreadsheets commonly represent missing values as empty strings when
> reading/writing CSVs - that's why this behaviour should be the default.

That doesn't strike me as an overwhelming reason to do it that way.

> If you want to force it to use an empty string instead, simply specify
> some unlikely value for NULL, like 'frobnitz'. But if you do, be prepared
> for lots of errors unless you are importing into fields where empty string
> is a valid text value.

This concerns me. If NULL is on by default, there should be some way to
turn it off, not to change it to some value you hope won't show up in
the input stream.

Re: COPY for CSV documentation

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> On Mon, Apr 12, 2004 at 02:26:14 -0400,
>   Andrew Dunstan <andrew@dunslane.net> wrote:
>> a few points:
>> . in CSV mode, NULL should default to '' - that was in what I sent in.

> Postgres normally treats an empty string as an empty string. Are you sure
> you really want it to be treated as a NULL by default in this one place?

I think that's a spectacularly bad idea too.  People who really want
that can write "NULL ''", but it shouldn't be implied by CSV mode.

            regards, tom lane

Re: COPY for CSV documentation

From
Andrew Dunstan
Date:
Tom Lane wrote:

>Bruno Wolff III <bruno@wolff.to> writes:
>
>
>>On Mon, Apr 12, 2004 at 02:26:14 -0400,
>>  Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>>
>>>a few points:
>>>. in CSV mode, NULL should default to '' - that was in what I sent in.
>>>
>>>
>
>
>
>>Postgres normally treats an empty string as an empty string. Are you sure
>>you really want it to be treated as a NULL by default in this one place?
>>
>>
>
>I think that's a spectacularly bad idea too.  People who really want
>that can write "NULL ''", but it shouldn't be implied by CSV mode.
>
>
>

Spectacularly? Hmm.

My approach was that the default should be the most common case. Perhaps
on import it's a tossup, but on export a CSV containing lots of \N cells
is likely to be ... unexpected.

But, honestly, it's not worth dying in a ditch over.

cheers

andrew

Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Tom Lane wrote:
>
> >Bruno Wolff III <bruno@wolff.to> writes:
> >
> >
> >>On Mon, Apr 12, 2004 at 02:26:14 -0400,
> >>  Andrew Dunstan <andrew@dunslane.net> wrote:
> >>
> >>
> >>>a few points:
> >>>. in CSV mode, NULL should default to '' - that was in what I sent in.
> >>>
> >>>
> >
> >
> >
> >>Postgres normally treats an empty string as an empty string. Are you sure
> >>you really want it to be treated as a NULL by default in this one place?
> >>
> >>
> >
> >I think that's a spectacularly bad idea too.  People who really want
> >that can write "NULL ''", but it shouldn't be implied by CSV mode.
> >
> >
> >
>
> Spectacularly? Hmm.
>
> My approach was that the default should be the most common case. Perhaps
> on import it's a tossup, but on export a CSV containing lots of \N cells
> is likely to be ... unexpected.
>
> But, honestly, it's not worth dying in a ditch over.

It is my understanding that \N is a valid column value (no backslash
escape in CSV, right?), so we can't use it for NULL.

The only thing I can think of is for NULL to be:

    ,,

(no quotes) and a zero-length string to be:

    ,"",

How do most applications handle those two cases?  If they accept either,
can we use that so we can read our own CSV files without losing the NULL
specification?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Bruno Wolff III
Date:
On Mon, Apr 12, 2004 at 10:30:22 -0400,
  Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>
> It is my understanding that \N is a valid column value (no backslash
> escape in CSV, right?), so we can't use it for NULL.
> The only thing I can think of is for NULL to be:
>
>     ,,
>
> (no quotes) and a zero-length string to be:
>
>     ,"",
>
> How do most applications handle those two cases?  If they accept either,
> can we use that so we can read our own CSV files without losing the NULL
> specification?

I think the above are going to be treated as equvialent by most CSV parsers.

There doesn't seem to be a standard for CSV. From what I found describing
it, there isn't any feature for distinguishing NULLs from empty strings.
So whatever gets done is going to be application specific.

Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Bruno Wolff III wrote:
> On Mon, Apr 12, 2004 at 10:30:22 -0400,
>   Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> >
> > It is my understanding that \N is a valid column value (no backslash
> > escape in CSV, right?), so we can't use it for NULL.
> > The only thing I can think of is for NULL to be:
> >
> >     ,,
> >
> > (no quotes) and a zero-length string to be:
> >
> >     ,"",
> >
> > How do most applications handle those two cases?  If they accept either,
> > can we use that so we can read our own CSV files without losing the NULL
> > specification?
>
> I think the above are going to be treated as equvialent by most CSV parsers.
>
> There doesn't seem to be a standard for CSV. From what I found describing
> it, there isn't any feature for distinguishing NULLs from empty strings.
> So whatever gets done is going to be application specific.

I am thinking we could enable this NULL handling by default, and allow
it to be over-ridden with the NULL keyword.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Andrew Dunstan
Date:
Bruce Momjian wrote:

>It is my understanding that \N is a valid column value (no backslash
>escape in CSV, right?), so we can't use it for NULL.
>
>

\ is not conventionally magical in CSVs in the general case. That
doesn't mean we couldn't use \N, but to me it would violate the
principle of least surprise - no other application will interpret it in
any special way, and the whole point about this facility is exchanging
data with other applications.

>The only thing I can think of is for NULL to be:
>
>    ,,
>
>(no quotes) and a zero-length string to be:
>
>    ,"",
>
>How do most applications handle those two cases?  If they accept either,
>can we use that so we can read our own CSV files without losing the NULL
>specification?
>
>
>

In fact, in the patch I sent in, no quoted string is marked as null when
being read (so even if you use \N as the null marker, "\N" will be that
literal and not a null marker). And the null marker, whatever it is,
should be made quote safe by us throwing an error if it contains the
quote marker, just as we now make sure that the null marker is
delimiter-safe.

I will check on the write behaviour - it might need ammending too.

I'll submit a revised patch based on the original syntax scheme, and
then you (Bruce) can make the syntax/psql changes that seem to be agreed
on now - is that ok?

The default NULL value issue can be determined at the end of any
exhaustive debate we have - in the end it's a one line code change ;-)

cheers

andrew



Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> In fact, in the patch I sent in, no quoted string is marked as null when
> being read (so even if you use \N as the null marker, "\N" will be that
> literal and not a null marker). And the null marker, whatever it is,
> should be made quote safe by us throwing an error if it contains the
> quote marker, just as we now make sure that the null marker is
> delimiter-safe.

What value does an int column get if the input file has ',,'.  Don't
tell me zero?  :-)  Error?

> I will check on the write behaviour - it might need ammending too.
>
> I'll submit a revised patch based on the original syntax scheme, and
> then you (Bruce) can make the syntax/psql changes that seem to be agreed
> on now - is that ok?

OK, go as far as you want and post it.  I will turn around a new patch
in a few hours after you post.

> The default NULL value issue can be determined at the end of any
> exhaustive debate we have - in the end it's a one line code change ;-)

Agreed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Andrew Dunstan
Date:
Bruce Momjian wrote:

>Andrew Dunstan wrote:
>
>
>>In fact, in the patch I sent in, no quoted string is marked as null when
>>being read (so even if you use \N as the null marker, "\N" will be that
>>literal and not a null marker). And the null marker, whatever it is,
>>should be made quote safe by us throwing an error if it contains the
>>quote marker, just as we now make sure that the null marker is
>>delimiter-safe.
>>
>>
>
>What value does an int column get if the input file has ',,'.  Don't
>tell me zero?  :-)  Error?
>
>

If the null marker is not an empty string, it gets an error, of course -
if it is it gets a null:

[andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp table
foo (a int, b text, c text); copy foo from stdin delimiter ',\"' null
'\\\\N';"
ERROR:  invalid input syntax for integer: ""
CONTEXT:  COPY foo, line 1, column a: ""
[andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp table
foo (a int, b text, c text); copy foo from stdin delimiter ',\"' ;"
[andrew@marmaduke pginst]$


I hope that is expected behaviour - it's what *I* expect, at least.


>
>
>>I will check on the write behaviour - it might need ammending too.
>>
>>I'll submit a revised patch based on the original syntax scheme, and
>>then you (Bruce) can make the syntax/psql changes that seem to be agreed
>>on now - is that ok?
>>
>>
>
>OK, go as far as you want and post it.  I will turn around a new patch
>in a few hours after you post.
>
>
>
>>The default NULL value issue can be determined at the end of any
>>exhaustive debate we have - in the end it's a one line code change ;-)
>>
>>
>
>Agreed.
>
>
>

Attached patch has these additions to previously posted patch:
. quote character may not appear in NULL marker
. any non-null value that matches the NULL marker is forced to be quoted
when written.


cheers

andrew

Index: src/backend/commands/copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.219
diff -c -r1.219 copy.c
*** src/backend/commands/copy.c    6 Apr 2004 13:21:33 -0000    1.219
--- src/backend/commands/copy.c    12 Apr 2004 16:21:33 -0000
***************
*** 70,76 ****
  typedef enum CopyReadResult
  {
      NORMAL_ATTR,
!     END_OF_LINE
  } CopyReadResult;

  /*
--- 70,77 ----
  typedef enum CopyReadResult
  {
      NORMAL_ATTR,
!     END_OF_LINE,
!     UNTERMINATED_FIELD
  } CopyReadResult;

  /*
***************
*** 136,144 ****
--- 137,148 ----
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                 CopyReadResult *result, bool *isnull);
+ static char *CopyReadAttributeCSV(const char *delim, const char *null_print,
+                                CopyReadResult *result, bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                          Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
+ static void CopyAttributeOutCSV(char *string, char *delim, bool force_quote);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);

***************
*** 682,687 ****
--- 686,692 ----
      List       *attnumlist;
      bool        binary = false;
      bool        oids = false;
+     bool        csv_mode = false;
      char       *delim = NULL;
      char       *null_print = NULL;
      Relation    rel;
***************
*** 744,751 ****
      if (!delim)
          delim = "\t";

      if (!null_print)
!         null_print = "\\N";

      /*
       * Open and lock the relation, using the appropriate lock type.
--- 749,759 ----
      if (!delim)
          delim = "\t";

+     if (strlen(delim) > 1)
+         csv_mode = true;
+
      if (!null_print)
!         null_print = csv_mode ? "" : "\\N";

      /*
       * Open and lock the relation, using the appropriate lock type.
***************
*** 772,783 ****
                         "psql's \\copy command also works for anyone.")));

      /*
!      * Presently, only single-character delimiter strings are supported.
       */
!     if (strlen(delim) != 1)
          ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                   errmsg("COPY delimiter must be a single character")));

      /*
       * Don't allow the delimiter to appear in the null string.
--- 780,806 ----
                         "psql's \\copy command also works for anyone.")));

      /*
!      * Only single-character delimiter strings are supported,
!      * except in CSV mode, where the string must be
!      * delimiter-char quote-char [escape-char]
       */
!     if (!csv_mode && strlen(delim) != 1)
          ereport(ERROR,
                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                   errmsg("COPY delimiter must be a single character")));
+     else if (csv_mode)
+     {
+         if(strlen(delim) > 3)
+             ereport(ERROR,
+                 (errcode(ERRCODE_SYNTAX_ERROR),
+                  errmsg("COPY delimiters for CSV must be a 2 or 3 characters")));
+         if (delim[0] == delim[1] ||
+             (strlen(delim) == 3 && delim[0] == delim[2]))
+             ereport(ERROR,
+                 (errcode(ERRCODE_SYNTAX_ERROR),
+                  errmsg("CSV delimiter character must not be same as quote character or escape character")));
+
+     }

      /*
       * Don't allow the delimiter to appear in the null string.
***************
*** 788,793 ****
--- 811,833 ----
                   errmsg("COPY delimiter must not appear in the NULL specification")));

      /*
+      * Don't allow the csv quote char to appear in the null string.
+      */
+     if (csv_mode && strchr(null_print, delim[1]) != NULL)
+         ereport(ERROR,
+                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                  errmsg("CSV quote character  must not appear in the NULL specification")));
+
+     /*
+      * Don't allow OIDs in CSV mode
+      */
+
+     if (csv_mode && oids)
+         ereport(ERROR,
+                 (errcode(ERRCODE_SYNTAX_ERROR),
+                  errmsg("Cannot specify OIDS in CSV mode ")));
+
+     /*
       * Don't allow COPY w/ OIDs to or from a table without them
       */
      if (oids && !rel->rd_rel->relhasoids)
***************
*** 969,974 ****
--- 1009,1015 ----
      FmgrInfo   *out_functions;
      Oid           *elements;
      bool       *isvarlena;
+     bool       csv_mode;
      char       *string;
      Snapshot    mySnapshot;
      List       *cur;
***************
*** 979,984 ****
--- 1020,1026 ----
      attr = tupDesc->attrs;
      num_phys_attrs = tupDesc->natts;
      attr_count = length(attnumlist);
+     csv_mode = (strlen(delim) > 1);

      /*
       * Get info about the columns we need to process.
***************
*** 1051,1057 ****
      while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
      {
          bool        need_delim = false;
-
          CHECK_FOR_INTERRUPTS();

          MemoryContextReset(mycontext);
--- 1093,1098 ----
***************
*** 1113,1119 ****
                                                             value,
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
!                     CopyAttributeOut(string, delim);
                  }
                  else
                  {
--- 1154,1167 ----
                                                             value,
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
!                     if (csv_mode)
!                     {
!                         bool force_quote = (strcmp(string,null_print) == 0);
!                         CopyAttributeOutCSV(string, delim, force_quote);
!                     }
!                     else
!                         CopyAttributeOut(string, delim);
!
                  }
                  else
                  {
***************
*** 1263,1268 ****
--- 1311,1317 ----
      Datum       *values;
      char       *nulls;
      bool        done = false;
+     bool        csv_mode;
      bool        isnull;
      ResultRelInfo *resultRelInfo;
      EState       *estate = CreateExecutorState(); /* for ExecConstraints() */
***************
*** 1280,1285 ****
--- 1329,1335 ----
      num_phys_attrs = tupDesc->natts;
      attr_count = length(attnumlist);
      num_defaults = 0;
+     csv_mode = (strlen(delim) > 1);

      /*
       * We need a ResultRelInfo so we can use the regular executor's
***************
*** 1499,1504 ****
--- 1549,1555 ----

              if (file_has_oids)
              {
+                 /* can't be in CSV mode here */
                  string = CopyReadAttribute(delim, null_print,
                                             &result, &isnull);

***************
*** 1537,1544 ****
                               errmsg("missing data for column \"%s\"",
                                      NameStr(attr[m]->attname))));

!                 string = CopyReadAttribute(delim, null_print,
!                                            &result, &isnull);

                  if (isnull)
                  {
--- 1588,1608 ----
                               errmsg("missing data for column \"%s\"",
                                      NameStr(attr[m]->attname))));

!                 if (csv_mode)
!                 {
!                     string = CopyReadAttributeCSV(delim, null_print,
!                                                   &result, &isnull);
!                     if (result == UNTERMINATED_FIELD)
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("unterminated CSV quoted field")));
!                 }
!                 else
!                 {
!                     string = CopyReadAttribute(delim, null_print,
!                                                &result, &isnull);
!                 }
!

                  if (isnull)
                  {
***************
*** 2069,2074 ****
--- 2133,2288 ----
      return attribute_buf.data;
  }

+
+ /*
+  * Read the value of a single attribute in CSV mode,
+  * performing de-escaping as needed. Escaping does not follow the normal
+  * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage.
+  *
+  * Quoted fields can span lines, in which case the line end is embedded
+  * in the returned string.
+  *
+  * delim is a 2- or 3-character string. The first character is the
+  * field delimiter, the second the quote character, the third is the
+  * escape character indise quotes, and defaults to the quote character.
+  *
+  * null_print is the null marker string.  Note that this is compared to
+  * the pre-de-escaped input string (thus if it is quoted it is not a NULL).
+  *
+  * *result is set to indicate what terminated the read:
+  *        NORMAL_ATTR:    column delimiter
+  *        END_OF_LINE:    end of line
+  *      UNTERMINATED_FIELD no quote detected at end of a quoted field
+  *
+  * In any case, the string read up to the terminator (or end of file)
+  * is returned.
+  *
+  * *isnull is set true or false depending on whether the input matched
+  * the null marker.  Note that the caller cannot check this since the
+  * returned string will be the post-de-escaping equivalent, which may
+  * look the same as some valid data string.
+  *----------
+  */
+
+ static char *
+ CopyReadAttributeCSV(const char *delim, const char *null_print,
+                   CopyReadResult *result, bool *isnull)
+ {
+     char        delimc = delim[0];
+     char        quotec = delim[1];
+     char        escapec = delim[2] ? delim[2] : delim[1];
+     char        c;
+     int            start_cursor = line_buf.cursor;
+     int            end_cursor = start_cursor;;
+     int            input_len;
+     bool        in_quote = false;
+     bool        saw_quote = false;
+
+     /* reset attribute_buf to empty */
+     attribute_buf.len = 0;
+     attribute_buf.data[0] = '\0';
+
+     /* set default status */
+     *result = END_OF_LINE;
+
+     for (;;)
+     {
+         /* handle multiline quoted fields */
+         if (in_quote && line_buf.cursor >= line_buf.len)
+         {
+             bool done;
+
+             switch(eol_type)
+             {
+                 case EOL_NL:
+                     appendStringInfoString(&attribute_buf,"\n");
+                     break;
+                 case EOL_CR:
+                     appendStringInfoString(&attribute_buf,"\r");
+                     break;
+                 case EOL_CRNL:
+                     appendStringInfoString(&attribute_buf,"\r\n");
+                     break;
+                 case EOL_UNKNOWN:
+                     /* shouldn't happen - just keep going */
+                     break;
+             }
+
+             copy_lineno++;
+             done = CopyReadLine();
+             if (done && line_buf.len == 0)
+                 break;
+             start_cursor = line_buf.cursor;
+         }
+
+         end_cursor = line_buf.cursor;
+         if (line_buf.cursor >= line_buf.len)
+             break;
+         c = line_buf.data[line_buf.cursor++];
+         /*
+          * unquoted field delimiter
+          */
+         if (!in_quote && c == delimc)
+         {
+             *result = NORMAL_ATTR;
+             break;
+         }
+         /*
+          * start of quoted field (or part of field)
+          */
+         if (!in_quote && c == quotec)
+         {
+             saw_quote = true;
+             in_quote = true;
+             continue;
+         }
+         /*
+          * escape within a quoted field
+          */
+         if (in_quote && c == escapec)
+         {
+             /*
+              * peek at the next char if available, and escape it if it
+              * is an escape char or a quote char
+              */
+             if (line_buf.cursor <= line_buf.len)
+             {
+                 char nextc = line_buf.data[line_buf.cursor];
+                 if (nextc == escapec || nextc == quotec)
+                 {
+                     appendStringInfoCharMacro(&attribute_buf, nextc);
+                     line_buf.cursor++;
+                     continue;
+                 }
+             }
+         }
+         /*
+          * end of quoted field.
+          * Must do this test after testing for escape in case quote char
+          * and escape char are the same (which is the common case).
+          */
+         if(in_quote && c == quotec)
+         {
+             in_quote = false;
+             continue;
+         }
+         appendStringInfoCharMacro(&attribute_buf, c);
+     }
+
+     if (in_quote)
+         *result = UNTERMINATED_FIELD;
+
+     /* check whether raw input matched null marker */
+     input_len = end_cursor - start_cursor;
+     if (!saw_quote && input_len == strlen(null_print) &&
+         strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0)
+         *isnull = true;
+     else
+         *isnull = false;
+
+     return attribute_buf.data;
+ }
+
  /*
   * Read a binary attribute
   */
***************
*** 2192,2197 ****
--- 2406,2479 ----
                  break;
          }
      }
+ }
+
+ /*
+  * Send CSV representation of one attribute, with conversion and
+  * CSV type escaping
+  */
+ static void
+ CopyAttributeOutCSV(char *server_string, char *delim, bool force_quote)
+ {
+     char       *string;
+     char        c;
+     char        delimc = delim[0];
+     char        quotec = delim[1];
+      char        escapec = delim[2] ? delim[2] : delim[1];
+     bool        need_quote = force_quote;
+     char        *test_string;
+     bool        same_encoding;
+     int            mblen;
+     int            i;
+
+     same_encoding = (server_encoding == client_encoding);
+     if (!same_encoding)
+         string = (char *) pg_server_to_client((unsigned char *) server_string,
+                                               strlen(server_string));
+     else
+         string = server_string;
+
+     /* have to run through the string twice,
+      * first time to see if it needs quoting, second to actually send it
+      */
+
+     for(test_string = string;
+         !need_quote && (c = *test_string) != '\0';
+         test_string += mblen)
+     {
+         if (c == delimc || c == quotec || c == '\n' || c == '\r')
+         {
+             need_quote = true;
+         }
+         if (!same_encoding)
+             mblen = pg_encoding_mblen(client_encoding, test_string);
+         else
+             mblen = 1;
+     }
+
+     if (need_quote)
+         CopySendChar(quotec);
+
+     for (; (c = *string) != '\0'; string += mblen)
+     {
+         if (c == quotec || c == escapec)
+             CopySendChar(escapec);
+
+         CopySendChar(c);
+
+         if (!same_encoding)
+         {
+             /* send additional bytes of the char, if any */
+             mblen = pg_encoding_mblen(client_encoding, string);
+             for (i = 1; i < mblen; i++)
+                 CopySendChar(string[i]);
+         }
+         else
+             mblen = 1;
+     }
+
+     if (need_quote)
+         CopySendChar(quotec);
  }

  /*

Re: COPY for CSV documentation

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Bruce Momjian wrote:
>
> >Andrew Dunstan wrote:
> >
> >
> >>In fact, in the patch I sent in, no quoted string is marked as null when
> >>being read (so even if you use \N as the null marker, "\N" will be that
> >>literal and not a null marker). And the null marker, whatever it is,
> >>should be made quote safe by us throwing an error if it contains the
> >>quote marker, just as we now make sure that the null marker is
> >>delimiter-safe.
> >>
> >>
> >
> >What value does an int column get if the input file has ',,'.  Don't
> >tell me zero?  :-)  Error?
> >
> >
>
> If the null marker is not an empty string, it gets an error, of course -
> if it is it gets a null:
>
> [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp table
> foo (a int, b text, c text); copy foo from stdin delimiter ',\"' null
> '\\\\N';"
> ERROR:  invalid input syntax for integer: ""
> CONTEXT:  COPY foo, line 1, column a: ""
> [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp table
> foo (a int, b text, c text); copy foo from stdin delimiter ',\"' ;"
> [andrew@marmaduke pginst]$
>
>
> I hope that is expected behaviour - it's what *I* expect, at least.
>

Yea, sorry, I misread your "no quoted string is marked as null" as not
handling ,, as null.

> Attached patch has these additions to previously posted patch:
> . quote character may not appear in NULL marker
> . any non-null value that matches the NULL marker is forced to be quoted
> when written.

OK, let me work on this now and repost.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY for CSV documentation

From
Bruce Momjian
Date:
FYI, this CVS is turning into quite a job, but doing it right takes this
kind of effort.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> Bruce Momjian wrote:
>
> >Andrew Dunstan wrote:
> >
> >
> >>In fact, in the patch I sent in, no quoted string is marked as null when
> >>being read (so even if you use \N as the null marker, "\N" will be that
> >>literal and not a null marker). And the null marker, whatever it is,
> >>should be made quote safe by us throwing an error if it contains the
> >>quote marker, just as we now make sure that the null marker is
> >>delimiter-safe.
> >>
> >>
> >
> >What value does an int column get if the input file has ',,'.  Don't
> >tell me zero?  :-)  Error?
> >
> >
>
> If the null marker is not an empty string, it gets an error, of course -
> if it is it gets a null:
>
> [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp table
> foo (a int, b text, c text); copy foo from stdin delimiter ',\"' null
> '\\\\N';"
> ERROR:  invalid input syntax for integer: ""
> CONTEXT:  COPY foo, line 1, column a: ""
> [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp table
> foo (a int, b text, c text); copy foo from stdin delimiter ',\"' ;"
> [andrew@marmaduke pginst]$
>
>
> I hope that is expected behaviour - it's what *I* expect, at least.
>
>
> >
> >
> >>I will check on the write behaviour - it might need ammending too.
> >>
> >>I'll submit a revised patch based on the original syntax scheme, and
> >>then you (Bruce) can make the syntax/psql changes that seem to be agreed
> >>on now - is that ok?
> >>
> >>
> >
> >OK, go as far as you want and post it.  I will turn around a new patch
> >in a few hours after you post.
> >
> >
> >
> >>The default NULL value issue can be determined at the end of any
> >>exhaustive debate we have - in the end it's a one line code change ;-)
> >>
> >>
> >
> >Agreed.
> >
> >
> >
>
> Attached patch has these additions to previously posted patch:
> . quote character may not appear in NULL marker
> . any non-null value that matches the NULL marker is forced to be quoted
> when written.
>
>
> cheers
>
> andrew
>

> Index: src/backend/commands/copy.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/copy.c,v
> retrieving revision 1.219
> diff -c -r1.219 copy.c
> *** src/backend/commands/copy.c    6 Apr 2004 13:21:33 -0000    1.219
> --- src/backend/commands/copy.c    12 Apr 2004 16:21:33 -0000
> ***************
> *** 70,76 ****
>   typedef enum CopyReadResult
>   {
>       NORMAL_ATTR,
> !     END_OF_LINE
>   } CopyReadResult;
>
>   /*
> --- 70,77 ----
>   typedef enum CopyReadResult
>   {
>       NORMAL_ATTR,
> !     END_OF_LINE,
> !     UNTERMINATED_FIELD
>   } CopyReadResult;
>
>   /*
> ***************
> *** 136,144 ****
> --- 137,148 ----
>   static bool CopyReadLine(void);
>   static char *CopyReadAttribute(const char *delim, const char *null_print,
>                                  CopyReadResult *result, bool *isnull);
> + static char *CopyReadAttributeCSV(const char *delim, const char *null_print,
> +                                CopyReadResult *result, bool *isnull);
>   static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
>                           Oid typelem, bool *isnull);
>   static void CopyAttributeOut(char *string, char *delim);
> + static void CopyAttributeOutCSV(char *string, char *delim, bool force_quote);
>   static List *CopyGetAttnums(Relation rel, List *attnamelist);
>   static void limit_printout_length(StringInfo buf);
>
> ***************
> *** 682,687 ****
> --- 686,692 ----
>       List       *attnumlist;
>       bool        binary = false;
>       bool        oids = false;
> +     bool        csv_mode = false;
>       char       *delim = NULL;
>       char       *null_print = NULL;
>       Relation    rel;
> ***************
> *** 744,751 ****
>       if (!delim)
>           delim = "\t";
>
>       if (!null_print)
> !         null_print = "\\N";
>
>       /*
>        * Open and lock the relation, using the appropriate lock type.
> --- 749,759 ----
>       if (!delim)
>           delim = "\t";
>
> +     if (strlen(delim) > 1)
> +         csv_mode = true;
> +
>       if (!null_print)
> !         null_print = csv_mode ? "" : "\\N";
>
>       /*
>        * Open and lock the relation, using the appropriate lock type.
> ***************
> *** 772,783 ****
>                          "psql's \\copy command also works for anyone.")));
>
>       /*
> !      * Presently, only single-character delimiter strings are supported.
>        */
> !     if (strlen(delim) != 1)
>           ereport(ERROR,
>                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                    errmsg("COPY delimiter must be a single character")));
>
>       /*
>        * Don't allow the delimiter to appear in the null string.
> --- 780,806 ----
>                          "psql's \\copy command also works for anyone.")));
>
>       /*
> !      * Only single-character delimiter strings are supported,
> !      * except in CSV mode, where the string must be
> !      * delimiter-char quote-char [escape-char]
>        */
> !     if (!csv_mode && strlen(delim) != 1)
>           ereport(ERROR,
>                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                    errmsg("COPY delimiter must be a single character")));
> +     else if (csv_mode)
> +     {
> +         if(strlen(delim) > 3)
> +             ereport(ERROR,
> +                 (errcode(ERRCODE_SYNTAX_ERROR),
> +                  errmsg("COPY delimiters for CSV must be a 2 or 3 characters")));
> +         if (delim[0] == delim[1] ||
> +             (strlen(delim) == 3 && delim[0] == delim[2]))
> +             ereport(ERROR,
> +                 (errcode(ERRCODE_SYNTAX_ERROR),
> +                  errmsg("CSV delimiter character must not be same as quote character or escape character")));
> +
> +     }
>
>       /*
>        * Don't allow the delimiter to appear in the null string.
> ***************
> *** 788,793 ****
> --- 811,833 ----
>                    errmsg("COPY delimiter must not appear in the NULL specification")));
>
>       /*
> +      * Don't allow the csv quote char to appear in the null string.
> +      */
> +     if (csv_mode && strchr(null_print, delim[1]) != NULL)
> +         ereport(ERROR,
> +                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                  errmsg("CSV quote character  must not appear in the NULL specification")));
> +
> +     /*
> +      * Don't allow OIDs in CSV mode
> +      */
> +
> +     if (csv_mode && oids)
> +         ereport(ERROR,
> +                 (errcode(ERRCODE_SYNTAX_ERROR),
> +                  errmsg("Cannot specify OIDS in CSV mode ")));
> +
> +     /*
>        * Don't allow COPY w/ OIDs to or from a table without them
>        */
>       if (oids && !rel->rd_rel->relhasoids)
> ***************
> *** 969,974 ****
> --- 1009,1015 ----
>       FmgrInfo   *out_functions;
>       Oid           *elements;
>       bool       *isvarlena;
> +     bool       csv_mode;
>       char       *string;
>       Snapshot    mySnapshot;
>       List       *cur;
> ***************
> *** 979,984 ****
> --- 1020,1026 ----
>       attr = tupDesc->attrs;
>       num_phys_attrs = tupDesc->natts;
>       attr_count = length(attnumlist);
> +     csv_mode = (strlen(delim) > 1);
>
>       /*
>        * Get info about the columns we need to process.
> ***************
> *** 1051,1057 ****
>       while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
>       {
>           bool        need_delim = false;
> -
>           CHECK_FOR_INTERRUPTS();
>
>           MemoryContextReset(mycontext);
> --- 1093,1098 ----
> ***************
> *** 1113,1119 ****
>                                                              value,
>                                     ObjectIdGetDatum(elements[attnum - 1]),
>                               Int32GetDatum(attr[attnum - 1]->atttypmod)));
> !                     CopyAttributeOut(string, delim);
>                   }
>                   else
>                   {
> --- 1154,1167 ----
>                                                              value,
>                                     ObjectIdGetDatum(elements[attnum - 1]),
>                               Int32GetDatum(attr[attnum - 1]->atttypmod)));
> !                     if (csv_mode)
> !                     {
> !                         bool force_quote = (strcmp(string,null_print) == 0);
> !                         CopyAttributeOutCSV(string, delim, force_quote);
> !                     }
> !                     else
> !                         CopyAttributeOut(string, delim);
> !
>                   }
>                   else
>                   {
> ***************
> *** 1263,1268 ****
> --- 1311,1317 ----
>       Datum       *values;
>       char       *nulls;
>       bool        done = false;
> +     bool        csv_mode;
>       bool        isnull;
>       ResultRelInfo *resultRelInfo;
>       EState       *estate = CreateExecutorState(); /* for ExecConstraints() */
> ***************
> *** 1280,1285 ****
> --- 1329,1335 ----
>       num_phys_attrs = tupDesc->natts;
>       attr_count = length(attnumlist);
>       num_defaults = 0;
> +     csv_mode = (strlen(delim) > 1);
>
>       /*
>        * We need a ResultRelInfo so we can use the regular executor's
> ***************
> *** 1499,1504 ****
> --- 1549,1555 ----
>
>               if (file_has_oids)
>               {
> +                 /* can't be in CSV mode here */
>                   string = CopyReadAttribute(delim, null_print,
>                                              &result, &isnull);
>
> ***************
> *** 1537,1544 ****
>                                errmsg("missing data for column \"%s\"",
>                                       NameStr(attr[m]->attname))));
>
> !                 string = CopyReadAttribute(delim, null_print,
> !                                            &result, &isnull);
>
>                   if (isnull)
>                   {
> --- 1588,1608 ----
>                                errmsg("missing data for column \"%s\"",
>                                       NameStr(attr[m]->attname))));
>
> !                 if (csv_mode)
> !                 {
> !                     string = CopyReadAttributeCSV(delim, null_print,
> !                                                   &result, &isnull);
> !                     if (result == UNTERMINATED_FIELD)
> !                         ereport(ERROR,
> !                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> !                                  errmsg("unterminated CSV quoted field")));
> !                 }
> !                 else
> !                 {
> !                     string = CopyReadAttribute(delim, null_print,
> !                                                &result, &isnull);
> !                 }
> !
>
>                   if (isnull)
>                   {
> ***************
> *** 2069,2074 ****
> --- 2133,2288 ----
>       return attribute_buf.data;
>   }
>
> +
> + /*
> +  * Read the value of a single attribute in CSV mode,
> +  * performing de-escaping as needed. Escaping does not follow the normal
> +  * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage.
> +  *
> +  * Quoted fields can span lines, in which case the line end is embedded
> +  * in the returned string.
> +  *
> +  * delim is a 2- or 3-character string. The first character is the
> +  * field delimiter, the second the quote character, the third is the
> +  * escape character indise quotes, and defaults to the quote character.
> +  *
> +  * null_print is the null marker string.  Note that this is compared to
> +  * the pre-de-escaped input string (thus if it is quoted it is not a NULL).
> +  *
> +  * *result is set to indicate what terminated the read:
> +  *        NORMAL_ATTR:    column delimiter
> +  *        END_OF_LINE:    end of line
> +  *      UNTERMINATED_FIELD no quote detected at end of a quoted field
> +  *
> +  * In any case, the string read up to the terminator (or end of file)
> +  * is returned.
> +  *
> +  * *isnull is set true or false depending on whether the input matched
> +  * the null marker.  Note that the caller cannot check this since the
> +  * returned string will be the post-de-escaping equivalent, which may
> +  * look the same as some valid data string.
> +  *----------
> +  */
> +
> + static char *
> + CopyReadAttributeCSV(const char *delim, const char *null_print,
> +                   CopyReadResult *result, bool *isnull)
> + {
> +     char        delimc = delim[0];
> +     char        quotec = delim[1];
> +     char        escapec = delim[2] ? delim[2] : delim[1];
> +     char        c;
> +     int            start_cursor = line_buf.cursor;
> +     int            end_cursor = start_cursor;;
> +     int            input_len;
> +     bool        in_quote = false;
> +     bool        saw_quote = false;
> +
> +     /* reset attribute_buf to empty */
> +     attribute_buf.len = 0;
> +     attribute_buf.data[0] = '\0';
> +
> +     /* set default status */
> +     *result = END_OF_LINE;
> +
> +     for (;;)
> +     {
> +         /* handle multiline quoted fields */
> +         if (in_quote && line_buf.cursor >= line_buf.len)
> +         {
> +             bool done;
> +
> +             switch(eol_type)
> +             {
> +                 case EOL_NL:
> +                     appendStringInfoString(&attribute_buf,"\n");
> +                     break;
> +                 case EOL_CR:
> +                     appendStringInfoString(&attribute_buf,"\r");
> +                     break;
> +                 case EOL_CRNL:
> +                     appendStringInfoString(&attribute_buf,"\r\n");
> +                     break;
> +                 case EOL_UNKNOWN:
> +                     /* shouldn't happen - just keep going */
> +                     break;
> +             }
> +
> +             copy_lineno++;
> +             done = CopyReadLine();
> +             if (done && line_buf.len == 0)
> +                 break;
> +             start_cursor = line_buf.cursor;
> +         }
> +
> +         end_cursor = line_buf.cursor;
> +         if (line_buf.cursor >= line_buf.len)
> +             break;
> +         c = line_buf.data[line_buf.cursor++];
> +         /*
> +          * unquoted field delimiter
> +          */
> +         if (!in_quote && c == delimc)
> +         {
> +             *result = NORMAL_ATTR;
> +             break;
> +         }
> +         /*
> +          * start of quoted field (or part of field)
> +          */
> +         if (!in_quote && c == quotec)
> +         {
> +             saw_quote = true;
> +             in_quote = true;
> +             continue;
> +         }
> +         /*
> +          * escape within a quoted field
> +          */
> +         if (in_quote && c == escapec)
> +         {
> +             /*
> +              * peek at the next char if available, and escape it if it
> +              * is an escape char or a quote char
> +              */
> +             if (line_buf.cursor <= line_buf.len)
> +             {
> +                 char nextc = line_buf.data[line_buf.cursor];
> +                 if (nextc == escapec || nextc == quotec)
> +                 {
> +                     appendStringInfoCharMacro(&attribute_buf, nextc);
> +                     line_buf.cursor++;
> +                     continue;
> +                 }
> +             }
> +         }
> +         /*
> +          * end of quoted field.
> +          * Must do this test after testing for escape in case quote char
> +          * and escape char are the same (which is the common case).
> +          */
> +         if(in_quote && c == quotec)
> +         {
> +             in_quote = false;
> +             continue;
> +         }
> +         appendStringInfoCharMacro(&attribute_buf, c);
> +     }
> +
> +     if (in_quote)
> +         *result = UNTERMINATED_FIELD;
> +
> +     /* check whether raw input matched null marker */
> +     input_len = end_cursor - start_cursor;
> +     if (!saw_quote && input_len == strlen(null_print) &&
> +         strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0)
> +         *isnull = true;
> +     else
> +         *isnull = false;
> +
> +     return attribute_buf.data;
> + }
> +
>   /*
>    * Read a binary attribute
>    */
> ***************
> *** 2192,2197 ****
> --- 2406,2479 ----
>                   break;
>           }
>       }
> + }
> +
> + /*
> +  * Send CSV representation of one attribute, with conversion and
> +  * CSV type escaping
> +  */
> + static void
> + CopyAttributeOutCSV(char *server_string, char *delim, bool force_quote)
> + {
> +     char       *string;
> +     char        c;
> +     char        delimc = delim[0];
> +     char        quotec = delim[1];
> +      char        escapec = delim[2] ? delim[2] : delim[1];
> +     bool        need_quote = force_quote;
> +     char        *test_string;
> +     bool        same_encoding;
> +     int            mblen;
> +     int            i;
> +
> +     same_encoding = (server_encoding == client_encoding);
> +     if (!same_encoding)
> +         string = (char *) pg_server_to_client((unsigned char *) server_string,
> +                                               strlen(server_string));
> +     else
> +         string = server_string;
> +
> +     /* have to run through the string twice,
> +      * first time to see if it needs quoting, second to actually send it
> +      */
> +
> +     for(test_string = string;
> +         !need_quote && (c = *test_string) != '\0';
> +         test_string += mblen)
> +     {
> +         if (c == delimc || c == quotec || c == '\n' || c == '\r')
> +         {
> +             need_quote = true;
> +         }
> +         if (!same_encoding)
> +             mblen = pg_encoding_mblen(client_encoding, test_string);
> +         else
> +             mblen = 1;
> +     }
> +
> +     if (need_quote)
> +         CopySendChar(quotec);
> +
> +     for (; (c = *string) != '\0'; string += mblen)
> +     {
> +         if (c == quotec || c == escapec)
> +             CopySendChar(escapec);
> +
> +         CopySendChar(c);
> +
> +         if (!same_encoding)
> +         {
> +             /* send additional bytes of the char, if any */
> +             mblen = pg_encoding_mblen(client_encoding, string);
> +             for (i = 1; i < mblen; i++)
> +                 CopySendChar(string[i]);
> +         }
> +         else
> +             mblen = 1;
> +     }
> +
> +     if (need_quote)
> +         CopySendChar(quotec);
>   }
>
>   /*

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Updated COPY CSV patch

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> If the null marker is not an empty string, it gets an error, of
> course - if it is it gets a null:
>
> [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp
> table foo (a int, b text, c text); copy foo from stdin delimiter
> ',\"' null '\\\\N';" ERROR:  invalid input syntax for integer:
> "" CONTEXT:  COPY foo, line 1, column a: "" [andrew@marmaduke
> pginst]$ echo ',,' | bin/psql -c "create temp table foo (a int,
> b text, c text); copy foo from stdin delimiter ',\"' ;"
> [andrew@marmaduke pginst]$
>
>
> I hope that is expected behaviour - it's what *I* expect, at
> least.
> >
>
> Attached patch has these additions to previously posted patch:
> . quote character may not appear in NULL marker
> . any non-null value that matches the NULL marker is forced to be quoted
> when written.

OK, here is a new version of the patch that includes the grammar
changes we agreed upon, SGML changes, and \copy support.  I will not
make any more changes without contacting you so feel free to make
adjustments and repost.

I have two open issues.  First, CSV should support WITH OIDS, no?

Second, I found a problem with NULLs.  If I do:
.
        test=> create table test (x text, y text);
        CREATE TABLE
        test=> insert into test values ('', NULL);
        INSERT 17221 1
        test=>

then this:

        test=> copy test to '/tmp/b' with csv;

creates:

        "",

and this:

        test=> copy test to '/tmp/b' with csv NULL 'fred';

creates:

        ,fred

Is that logical?  A non-null field went from "" to nothing.

I think it is caused by this code:

         bool force_quote = (strcmp(string, null_print) == 0);
         CopyAttributeOutCSV(string, delim, quote, escape,
                             force_quote);

The reason it happens is that when the null string is '', it matches a
zero-length string, so the value is quoted.  When the null stirng isn't
blank, a zero-length string doesn't match the null string so it isn't
quoted.    I think we need to add special logic for zero-length strings
so they are always quoted, even if there is a special null string.  This
will make our dumps more consistent, I think, or maybe the current
behavior is OK.  It just struck me as strange.

I did a dump/reload test with a null string and null, and it worked
fine.

Is there any data that can not be dumped/reloaded via CSV?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.55
diff -c -c -r1.55 copy.sgml
*** doc/src/sgml/ref/copy.sgml    13 Dec 2003 23:59:07 -0000    1.55
--- doc/src/sgml/ref/copy.sgml    13 Apr 2004 04:18:22 -0000
***************
*** 26,32 ****
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ]

  COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable>
[,...] ) ] 
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
--- 26,34 ----
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
!           [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ]
!                 [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ] ]

  COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable>
[,...] ) ] 
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
***************
*** 34,40 ****
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ]
  </synopsis>
   </refsynopsisdiv>

--- 36,44 ----
            [ BINARY ]
            [ OIDS ]
            [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
!           [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
!           [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ]
!                 [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ] ]
  </synopsis>
   </refsynopsisdiv>

***************
*** 136,142 ****
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)
       </para>
      </listitem>
     </varlistentry>
--- 140,146 ----
       <para>
        Specifies copying the OID for each row.  (An error is raised if
        <literal>OIDS</literal> is specified for a table that does not
!       have OIDs.)  FIX CSV FOR OIDS!
       </para>
      </listitem>
     </varlistentry>
***************
*** 146,152 ****
      <listitem>
       <para>
        The single character that separates columns within each row
!       (line) of the file.  The default is a tab character.
       </para>
      </listitem>
     </varlistentry>
--- 150,157 ----
      <listitem>
       <para>
        The single character that separates columns within each row
!       (line) of the file.  The default is a tab character in normal mode,
!       a comma in <literal>CSV</> mode.
       </para>
      </listitem>
     </varlistentry>
***************
*** 156,175 ****
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N). You might prefer an empty
!       string, for example.
       </para>

       <note>
        <para>
!        On a <command>COPY FROM</command>, any data item that matches
         this string will be stored as a null value, so you should make
         sure that you use the same string as you used with
         <command>COPY TO</command>.
        </para>
       </note>
      </listitem>
     </varlistentry>
    </variablelist>
   </refsect1>

--- 161,225 ----
      <listitem>
       <para>
        The string that represents a null value. The default is
!       <literal>\N</literal> (backslash-N) in normal mode, and a missing
!       value (no quotes) in <literal>CSV</> mode. You might prefer an empty
!       string in cases where you don't want to distinguish nulls from
!       empty strings.
       </para>

       <note>
        <para>
!        When using <command>COPY FROM</command>, any data item that matches
         this string will be stored as a null value, so you should make
         sure that you use the same string as you used with
         <command>COPY TO</command>.
        </para>
+
+       <para>
+        If you do not want anything used as null when using
+        <command>COPY FROM</command>, you can specify some value that is very
+        unlikely to appear in the file, such as <literal>frobnitz</literal> or
+        <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is
+        <literal>md5('frobnitz')</literal>). This could be especially useful
+        when importing a <literal>CSV</> file into a table with <literal>NOT NULL</>
+        columns.
+       </para>
       </note>
+
      </listitem>
     </varlistentry>
+
+    <varlistentry>
+     <term><literal>CSV</literal></term>
+     <listitem>
+      <para>
+       Enables Comma Separated Variable (<literal>CSV</>) mode.  It sets the
+       default <literal>DELIMITER</> to comma, and <literal>QUOTE</> and
+       <literal>ESCAPE</> values to double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><replaceable class="parameter">quote string</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the quotation character in <literal>CSV</> mode.
+       The default is double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><replaceable class="parameter">escape string</replaceable></term>
+     <listitem>
+      <para>
+       Specifies the character that should appear before a <literal>QUOTE</>
+       data character value in <literal>CSV</> mode.  The default is double-quote.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
   </refsect1>

***************
*** 253,259 ****

     <para>
      When <command>COPY</command> is used without the <literal>BINARY</literal> option,
!     the data read or written is a text file with one line per table row.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
--- 303,310 ----

     <para>
      When <command>COPY</command> is used without the <literal>BINARY</literal> option,
!     the data read or written is a text file with one line per table row,
!     unless <literal>CSV</> mode is used.
      Columns in a row are separated by the delimiter character.
      The column values themselves are strings generated by the
      output function, or acceptable to the input function, of each
***************
*** 377,382 ****
--- 428,473 ----
      meant as data, <command>COPY FROM</command> will complain if the line
      endings in the input are not all alike.
     </para>
+   </refsect2>
+
+   <refsect2>
+    <title>CSV Format</title>
+
+    <para>
+     This format is used for importing from and exporting to the Comma
+     Separated Variable (<literal>CSV</>) file format used by many other programs,
+     such as spreadsheets. Instead of the escaping used by
+     <productname>PostgreSQL</productname>'s standard text mode, it produces
+     and recognises the common CSV escaping mechanism.
+    </para>
+
+    <para>
+     The values in each record are separated by the delimiter character.
+     If the value contains the delimiter character, the <literal>QUOTE</> character
+     or a carriage return or line feed character, then the whole value is prefixed
+     and suffixed by the quote character, and any occurrence within the value
+     of a quote character or the <literal>ESCAPE</> character is preceded
+     by the escape character.
+    </para>
+
+    <note>
+     <para>
+      CSV mode will both recognise and produce CSV files with quoted values
+      containing embedded carriage returns and line feeds. Thus the files are
+      not strictly one line per table row like non-CSV files.
+     </para>
+    </note>
+
+    <note>
+     <para>
+      Many programs produce strange and occasionally perverse CSV files, so
+      the file format is more a convention than a standard. Thus you might
+      encounter some files that cannot be imported using this mechanism, and
+      <command>COPY</> might produce files that other programs can not
+      process.
+     </para>
+    </note>
+
    </refsect2>

    <refsect2>
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.110
diff -c -c -r1.110 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml    12 Apr 2004 15:58:52 -0000    1.110
--- doc/src/sgml/ref/psql-ref.sgml    13 Apr 2004 04:18:25 -0000
***************
*** 711,716 ****
--- 711,718 ----
              [ <literal>oids</literal> ]
              [ <literal>delimiter [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
              [ <literal>null [as] </literal> '<replaceable class="parameter">string</replaceable>' ]</literal>
+             [ <literal>csv [ quote [as] </literal> '<replaceable class="parameter">string</replaceable>' ]
+                            [ <literal>escape [as] </literal> '<replaceable class="parameter">string</replaceable>' ]
]
          </term>

          <listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.219
diff -c -c -r1.219 copy.c
*** src/backend/commands/copy.c    6 Apr 2004 13:21:33 -0000    1.219
--- src/backend/commands/copy.c    13 Apr 2004 04:18:27 -0000
***************
*** 70,76 ****
  typedef enum CopyReadResult
  {
      NORMAL_ATTR,
!     END_OF_LINE
  } CopyReadResult;

  /*
--- 70,77 ----
  typedef enum CopyReadResult
  {
      NORMAL_ATTR,
!     END_OF_LINE,
!     UNTERMINATED_FIELD
  } CopyReadResult;

  /*
***************
*** 130,144 ****

  /* non-export function prototypes */
  static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print);
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print);
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                 CopyReadResult *result, bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                          Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);

--- 131,150 ----

  /* non-export function prototypes */
  static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print, bool csv_mode, char *quote, char *escape);
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print, bool csv_mode, char *quote, char *escape);
  static bool CopyReadLine(void);
  static char *CopyReadAttribute(const char *delim, const char *null_print,
                                 CopyReadResult *result, bool *isnull);
+ static char *CopyReadAttributeCSV(const char *delim, const char *null_print,
+                                char *quote, char *escape,
+                                CopyReadResult *result, bool *isnull);
  static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
                          Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
+ static void CopyAttributeOutCSV(char *string, char *delim, char *quote,
+                                 char *escape, bool force_quote);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);
  static void limit_printout_length(StringInfo buf);

***************
*** 682,688 ****
--- 688,697 ----
      List       *attnumlist;
      bool        binary = false;
      bool        oids = false;
+     bool        csv_mode = false;
      char       *delim = NULL;
+     char       *quote = NULL;
+     char       *escape = NULL;
      char       *null_print = NULL;
      Relation    rel;
      AclMode        required_access = (is_from ? ACL_INSERT : ACL_SELECT);
***************
*** 725,730 ****
--- 734,763 ----
                           errmsg("conflicting or redundant options")));
              null_print = strVal(defel->arg);
          }
+         else if (strcmp(defel->defname, "csv") == 0)
+         {
+             if (csv_mode)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             csv_mode = intVal(defel->arg);
+         }
+         else if (strcmp(defel->defname, "quote") == 0)
+         {
+             if (quote)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             quote = strVal(defel->arg);
+         }
+         else if (strcmp(defel->defname, "escape") == 0)
+         {
+             if (escape)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             escape = strVal(defel->arg);
+         }
          else
              elog(ERROR, "option \"%s\" not recognized",
                   defel->defname);
***************
*** 735,740 ****
--- 768,778 ----
                  (errcode(ERRCODE_SYNTAX_ERROR),
                   errmsg("cannot specify DELIMITER in BINARY mode")));

+     if (binary && csv_mode)
+         ereport(ERROR,
+                 (errcode(ERRCODE_SYNTAX_ERROR),
+                  errmsg("cannot specify CSV in BINARY mode")));
+
      if (binary && null_print)
          ereport(ERROR,
                  (errcode(ERRCODE_SYNTAX_ERROR),
***************
*** 742,751 ****

      /* Set defaults */
      if (!delim)
!         delim = "\t";
!
      if (!null_print)
!         null_print = "\\N";

      /*
       * Open and lock the relation, using the appropriate lock type.
--- 780,847 ----

      /* Set defaults */
      if (!delim)
!         delim = csv_mode ? "," : "\t";
!
      if (!null_print)
!         null_print = csv_mode ? "" : "\\N";
!
!     if (csv_mode)
!     {
!         if (!quote)
!             quote = "\"";
!         if (!escape)
!             escape = "\"";
!     }
!
!     /*
!      * Only single-character delimiter strings are supported.
!      */
!     if (strlen(delim) != 1)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY delimiter must be a single character")));
!
!     /*
!      * Check quote
!      */
!     if (!csv_mode && quote != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY quote available only in CSV mode")));
!
!     if (csv_mode && strlen(quote) != 1)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY quote must be a single character")));
!
!     /*
!      * Check escape
!      */
!     if (!csv_mode && escape != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY escape available only in CSV mode")));
!
!     if (csv_mode && strlen(escape) != 1)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY escape must be a single character")));
!
!     /*
!      * Don't allow the delimiter to appear in the null string.
!      */
!     if (strchr(null_print, delim[0]) != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY delimiter must not appear in the NULL specification")));
!
!     /*
!      * Don't allow the csv quote char to appear in the null string.
!      */
!     if (csv_mode && strchr(null_print, quote[0]) != NULL)
!         ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("CSV quote character must not appear in the NULL specification")));

      /*
       * Open and lock the relation, using the appropriate lock type.
***************
*** 772,791 ****
                         "psql's \\copy command also works for anyone.")));

      /*
!      * Presently, only single-character delimiter strings are supported.
       */
-     if (strlen(delim) != 1)
-         ereport(ERROR,
-                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                  errmsg("COPY delimiter must be a single character")));

!     /*
!      * Don't allow the delimiter to appear in the null string.
!      */
!     if (strchr(null_print, delim[0]) != NULL)
          ereport(ERROR,
!                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
!                  errmsg("COPY delimiter must not appear in the NULL specification")));

      /*
       * Don't allow COPY w/ OIDs to or from a table without them
--- 868,880 ----
                         "psql's \\copy command also works for anyone.")));

      /*
!      * Don't allow OIDs in CSV mode
       */

!     if (csv_mode && oids)  // FIX ME bjm
          ereport(ERROR,
!                 (errcode(ERRCODE_SYNTAX_ERROR),
!                  errmsg("Cannot specify OIDS in CSV mode ")));

      /*
       * Don't allow COPY w/ OIDs to or from a table without them
***************
*** 864,870 ****
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyFrom(rel, attnumlist, binary, oids, delim, null_print);
      }
      else
      {                            /* copy from database to file */
--- 953,960 ----
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
!                  quote, escape);
      }
      else
      {                            /* copy from database to file */
***************
*** 926,932 ****
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyTo(rel, attnumlist, binary, oids, delim, null_print);
      }

      if (!pipe)
--- 1016,1023 ----
                           errmsg("\"%s\" is a directory", filename)));
              }
          }
!         CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
!                 quote, escape);
      }

      if (!pipe)
***************
*** 958,964 ****
   */
  static void
  CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
--- 1049,1056 ----
   */
  static void
  CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
!        char *delim, char *null_print, bool csv_mode, char *quote,
!        char *escape)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
***************
*** 1051,1057 ****
      while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
      {
          bool        need_delim = false;
-
          CHECK_FOR_INTERRUPTS();

          MemoryContextReset(mycontext);
--- 1143,1148 ----
***************
*** 1113,1119 ****
                                                             value,
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
!                     CopyAttributeOut(string, delim);
                  }
                  else
                  {
--- 1204,1218 ----
                                                             value,
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
!                     if (csv_mode)
!                     {
!                         bool force_quote = (strcmp(string, null_print) == 0);
!                         CopyAttributeOutCSV(string, delim, quote, escape,
!                                             force_quote);
!                     }
!                     else
!                         CopyAttributeOut(string, delim);
!
                  }
                  else
                  {
***************
*** 1243,1249 ****
   */
  static void
  CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
--- 1342,1349 ----
   */
  static void
  CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
!          char *delim, char *null_print, bool csv_mode, char *quote,
!          char *escape)
  {
      HeapTuple    tuple;
      TupleDesc    tupDesc;
***************
*** 1388,1396 ****
      ExecBSInsertTriggers(estate, resultRelInfo);

      if (!binary)
-     {
          file_has_oids = oids;    /* must rely on user to tell us this... */
-     }
      else
      {
          /* Read and verify binary header */
--- 1488,1494 ----
***************
*** 1499,1504 ****
--- 1597,1603 ----

              if (file_has_oids)
              {
+                 /* can't be in CSV mode here */
                  string = CopyReadAttribute(delim, null_print,
                                             &result, &isnull);

***************
*** 1537,1550 ****
                               errmsg("missing data for column \"%s\"",
                                      NameStr(attr[m]->attname))));

!                 string = CopyReadAttribute(delim, null_print,
!                                            &result, &isnull);
!
!                 if (isnull)
                  {
!                     /* we read an SQL NULL, no need to do anything */
                  }
                  else
                  {
                      copy_attname = NameStr(attr[m]->attname);
                      values[m] = FunctionCall3(&in_functions[m],
--- 1636,1658 ----
                               errmsg("missing data for column \"%s\"",
                                      NameStr(attr[m]->attname))));

!                 if (csv_mode)
                  {
!                     string = CopyReadAttributeCSV(delim, null_print, quote,
!                                                   escape, &result, &isnull);
!                     if (result == UNTERMINATED_FIELD)
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                                  errmsg("unterminated CSV quoted field")));
                  }
                  else
+                     string = CopyReadAttribute(delim, null_print,
+                                                &result, &isnull);
+
+
+
+                 /* we read an SQL NULL, no need to do anything */
+                 if (!isnull)
                  {
                      copy_attname = NameStr(attr[m]->attname);
                      values[m] = FunctionCall3(&in_functions[m],
***************
*** 2069,2074 ****
--- 2177,2328 ----
      return attribute_buf.data;
  }

+
+ /*
+  * Read the value of a single attribute in CSV mode,
+  * performing de-escaping as needed. Escaping does not follow the normal
+  * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage.
+  *
+  * Quoted fields can span lines, in which case the line end is embedded
+  * in the returned string.
+  *
+  * null_print is the null marker string.  Note that this is compared to
+  * the pre-de-escaped input string (thus if it is quoted it is not a NULL).
+  *
+  * *result is set to indicate what terminated the read:
+  *        NORMAL_ATTR:    column delimiter
+  *        END_OF_LINE:    end of line
+  *      UNTERMINATED_FIELD no quote detected at end of a quoted field
+  *
+  * In any case, the string read up to the terminator (or end of file)
+  * is returned.
+  *
+  * *isnull is set true or false depending on whether the input matched
+  * the null marker.  Note that the caller cannot check this since the
+  * returned string will be the post-de-escaping equivalent, which may
+  * look the same as some valid data string.
+  *----------
+  */
+
+ static char *
+ CopyReadAttributeCSV(const char *delim, const char *null_print, char *quote,
+                      char *escape, CopyReadResult *result, bool *isnull)
+ {
+     char        delimc = delim[0];
+     char        quotec = quote[0];
+     char        escapec = escape[0];
+     char        c;
+     int            start_cursor = line_buf.cursor;
+     int            end_cursor = start_cursor;
+     int            input_len;
+     bool        in_quote = false;
+     bool        saw_quote = false;
+
+     /* reset attribute_buf to empty */
+     attribute_buf.len = 0;
+     attribute_buf.data[0] = '\0';
+
+     /* set default status */
+     *result = END_OF_LINE;
+
+     for (;;)
+     {
+         /* handle multiline quoted fields */
+         if (in_quote && line_buf.cursor >= line_buf.len)
+         {
+             bool done;
+
+             switch(eol_type)
+             {
+                 case EOL_NL:
+                     appendStringInfoString(&attribute_buf,"\n");
+                     break;
+                 case EOL_CR:
+                     appendStringInfoString(&attribute_buf,"\r");
+                     break;
+                 case EOL_CRNL:
+                     appendStringInfoString(&attribute_buf,"\r\n");
+                     break;
+                 case EOL_UNKNOWN:
+                     /* shouldn't happen - just keep going */
+                     break;
+             }
+
+             copy_lineno++;
+             done = CopyReadLine();
+             if (done && line_buf.len == 0)
+                 break;
+             start_cursor = line_buf.cursor;
+         }
+
+         end_cursor = line_buf.cursor;
+         if (line_buf.cursor >= line_buf.len)
+             break;
+         c = line_buf.data[line_buf.cursor++];
+         /*
+          * unquoted field delimiter
+          */
+         if (!in_quote && c == delimc)
+         {
+             *result = NORMAL_ATTR;
+             break;
+         }
+         /*
+          * start of quoted field (or part of field)
+          */
+         if (!in_quote && c == quotec)
+         {
+             saw_quote = true;
+             in_quote = true;
+             continue;
+         }
+         /*
+          * escape within a quoted field
+          */
+         if (in_quote && c == escapec)
+         {
+             /*
+              * peek at the next char if available, and escape it if it
+              * is an escape char or a quote char
+              */
+             if (line_buf.cursor <= line_buf.len)
+             {
+                 char nextc = line_buf.data[line_buf.cursor];
+                 if (nextc == escapec || nextc == quotec)
+                 {
+                     appendStringInfoCharMacro(&attribute_buf, nextc);
+                     line_buf.cursor++;
+                     continue;
+                 }
+             }
+         }
+         /*
+          * end of quoted field.
+          * Must do this test after testing for escape in case quote char
+          * and escape char are the same (which is the common case).
+          */
+         if (in_quote && c == quotec)
+         {
+             in_quote = false;
+             continue;
+         }
+         appendStringInfoCharMacro(&attribute_buf, c);
+     }
+
+     if (in_quote)
+         *result = UNTERMINATED_FIELD;
+
+     /* check whether raw input matched null marker */
+     input_len = end_cursor - start_cursor;
+     if (!saw_quote && input_len == strlen(null_print) &&
+         strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0)
+         *isnull = true;
+     else
+         *isnull = false;
+
+     return attribute_buf.data;
+ }
+
  /*
   * Read a binary attribute
   */
***************
*** 2192,2197 ****
--- 2446,2518 ----
                  break;
          }
      }
+ }
+
+ /*
+  * Send CSV representation of one attribute, with conversion and
+  * CSV type escaping
+  */
+ static void
+ CopyAttributeOutCSV(char *server_string, char *delim, char *quote,
+                     char *escape, bool force_quote)
+ {
+     char       *string;
+     char        c;
+     char        delimc = delim[0];
+     char        quotec = quote[0];
+      char        escapec = escape[0];
+     bool        need_quote = force_quote;
+     char        *test_string;
+     bool        same_encoding;
+     int            mblen;
+     int            i;
+
+     same_encoding = (server_encoding == client_encoding);
+     if (!same_encoding)
+         string = (char *) pg_server_to_client((unsigned char *) server_string,
+                                               strlen(server_string));
+     else
+         string = server_string;
+
+     /* have to run through the string twice,
+      * first time to see if it needs quoting, second to actually send it
+      */
+
+     for(test_string = string;
+         !need_quote && (c = *test_string) != '\0';
+         test_string += mblen)
+     {
+         if (c == delimc || c == quotec || c == '\n' || c == '\r')
+             need_quote = true;
+         if (!same_encoding)
+             mblen = pg_encoding_mblen(client_encoding, test_string);
+         else
+             mblen = 1;
+     }
+
+     if (need_quote)
+         CopySendChar(quotec);
+
+     for (; (c = *string) != '\0'; string += mblen)
+     {
+         if (c == quotec || c == escapec)
+             CopySendChar(escapec);
+
+         CopySendChar(c);
+
+         if (!same_encoding)
+         {
+             /* send additional bytes of the char, if any */
+             mblen = pg_encoding_mblen(client_encoding, string);
+             for (i = 1; i < mblen; i++)
+                 CopySendChar(string[i]);
+         }
+         else
+             mblen = 1;
+     }
+
+     if (need_quote)
+         CopySendChar(quotec);
  }

  /*
Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/gram.y,v
retrieving revision 2.450
diff -c -c -r2.450 gram.y
*** src/backend/parser/gram.y    5 Apr 2004 03:07:26 -0000    2.450
--- src/backend/parser/gram.y    13 Apr 2004 04:18:31 -0000
***************
*** 343,349 ****
      CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
!     CREATEUSER CROSS CURRENT_DATE CURRENT_TIME
      CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 343,349 ----
      CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
!     CREATEUSER CROSS CSV CURRENT_DATE CURRENT_TIME
      CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
***************
*** 386,391 ****
--- 386,393 ----
      PRECISION PRESERVE PREPARE PRIMARY
      PRIOR PRIVILEGES PROCEDURAL PROCEDURE

+     QUOTE
+
      READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
      RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
      RULE
***************
*** 1356,1361 ****
--- 1358,1375 ----
                  {
                      $$ = makeDefElem("delimiter", (Node *)makeString($3));
                  }
+             | CSV
+                 {
+                     $$ = makeDefElem("csv", (Node *)makeInteger(TRUE));
+                 }
+             | QUOTE opt_as Sconst
+                 {
+                     $$ = makeDefElem("quote", (Node *)makeString($3));
+                 }
+             | ESCAPE opt_as Sconst
+                 {
+                     $$ = makeDefElem("escape", (Node *)makeString($3));
+                 }
              | NULL_P opt_as Sconst
                  {
                      $$ = makeDefElem("null", (Node *)makeString($3));
***************
*** 7420,7425 ****
--- 7434,7440 ----
              | COPY
              | CREATEDB
              | CREATEUSER
+             | CSV
              | CURSOR
              | CYCLE
              | DATABASE
***************
*** 7507,7512 ****
--- 7522,7528 ----
              | PRIVILEGES
              | PROCEDURAL
              | PROCEDURE
+             | QUOTE
              | READ
              | RECHECK
              | REINDEX
Index: src/backend/parser/keywords.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/keywords.c,v
retrieving revision 1.147
diff -c -c -r1.147 keywords.c
*** src/backend/parser/keywords.c    11 Mar 2004 01:47:40 -0000    1.147
--- src/backend/parser/keywords.c    13 Apr 2004 04:18:31 -0000
***************
*** 90,95 ****
--- 90,96 ----
      {"createdb", CREATEDB},
      {"createuser", CREATEUSER},
      {"cross", CROSS},
+     {"csv", CSV},
      {"current_date", CURRENT_DATE},
      {"current_time", CURRENT_TIME},
      {"current_timestamp", CURRENT_TIMESTAMP},
***************
*** 248,253 ****
--- 249,255 ----
      {"privileges", PRIVILEGES},
      {"procedural", PROCEDURAL},
      {"procedure", PROCEDURE},
+     {"quote", QUOTE},
      {"read", READ},
      {"real", REAL},
      {"recheck", RECHECK},
Index: src/bin/psql/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/psql/copy.c,v
retrieving revision 1.43
diff -c -c -r1.43 copy.c
*** src/bin/psql/copy.c    12 Apr 2004 15:58:52 -0000    1.43
--- src/bin/psql/copy.c    13 Apr 2004 04:18:32 -0000
***************
*** 66,73 ****
--- 66,76 ----
      bool        from;
      bool        binary;
      bool        oids;
+     bool        csv_mode;
      char       *delim;
      char       *null;
+     char       *quote;
+     char       *escape;
  };


***************
*** 81,86 ****
--- 84,91 ----
      free(ptr->file);
      free(ptr->delim);
      free(ptr->null);
+     free(ptr->quote);
+     free(ptr->escape);
      free(ptr);
  }

***************
*** 277,282 ****
--- 282,291 ----
              {
                  result->oids = true;
              }
+             else if (strcasecmp(token, "csv") == 0)
+             {
+                 result->csv_mode = true;
+             }
              else if (strcasecmp(token, "delimiter") == 0)
              {
                  token = strtokx(NULL, whitespace, NULL, "'",
***************
*** 301,306 ****
--- 310,339 ----
                  else
                      goto error;
              }
+             else if (strcasecmp(token, "quote") == 0)
+             {
+                 token = strtokx(NULL, whitespace, NULL, "'",
+                                 '\\', false, pset.encoding);
+                 if (token && strcasecmp(token, "as") == 0)
+                     token = strtokx(NULL, whitespace, NULL, "'",
+                                     '\\', false, pset.encoding);
+                 if (token)
+                     result->quote = pg_strdup(token);
+                 else
+                     goto error;
+             }
+             else if (strcasecmp(token, "escape") == 0)
+             {
+                 token = strtokx(NULL, whitespace, NULL, "'",
+                                 '\\', false, pset.encoding);
+                 if (token && strcasecmp(token, "as") == 0)
+                     token = strtokx(NULL, whitespace, NULL, "'",
+                                     '\\', false, pset.encoding);
+                 if (token)
+                     result->escape = pg_strdup(token);
+                 else
+                     goto error;
+             }
              else
                  goto error;

***************
*** 340,346 ****
      PGresult   *result;
      bool        success;
      struct stat st;
!
      /* parse options */
      options = parse_slash_copy(args);

--- 373,380 ----
      PGresult   *result;
      bool        success;
      struct stat st;
!     bool with_output = false;
!
      /* parse options */
      options = parse_slash_copy(args);

***************
*** 379,390 ****
--- 413,454 ----
                                options->delim);
      }

+     /* There is no backward-compatible CSV syntax */
      if (options->null)
      {
          if (options->null[0] == '\'')
              appendPQExpBuffer(&query, " WITH NULL AS %s", options->null);
          else
              appendPQExpBuffer(&query, " WITH NULL AS '%s'", options->null);
+         with_output = true;
+     }
+
+     if (options->csv_mode)
+     {
+         appendPQExpBuffer(&query, " %sCSV ", with_output ? "" : "WITH ");
+         with_output = true;
+     }
+
+     if (options->quote)
+     {
+         if (options->quote[0] == '\'')
+             appendPQExpBuffer(&query, " %sQUOTE AS %s",
+                 with_output ? "" : "WITH ", options->quote);
+         else
+             appendPQExpBuffer(&query, " %sQUOTE AS '%s'",
+                 with_output ? "" : "WITH ", options->quote);
+         with_output = true;
+     }
+
+     if (options->escape)
+     {
+         if (options->escape[0] == '\'')
+             appendPQExpBuffer(&query, " %sESCAPE AS %s",
+                 with_output ? "" : "WITH ", options->escape);
+         else
+             appendPQExpBuffer(&query, " %sESCAPE AS '%s'",
+                 with_output ? "" : "WITH ", options->escape);
+         with_output = true;
      }

      if (options->from)

Re: Updated COPY CSV patch

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
> Second, I found a problem with NULLs.  If I do:
> .
>        test=> create table test (x text, y text);
>        CREATE TABLE
>        test=> insert into test values ('', NULL);
>        INSERT 17221 1
>        test=>
>
> then this:
>
>        test=> copy test to '/tmp/b' with csv;
>
> creates:
>
>        "",
>
> and this:
>
>        test=> copy test to '/tmp/b' with csv NULL 'fred';
>
> creates:
>
>        ,fred
>
> Is that logical?  A non-null field went from "" to nothing.
>

One more point about this - we can't force quoting of every non-null
value, which would remove the "inconsistency" you see here, because
spreadsheets especially infer information from whether or not a CSV value
is quoted. In particular, they will not usually treat a quoted numeric
value as numeric, which would be a very undesirable effect.

cheers

andrew



Re: Updated COPY CSV patch

From
Andrew Dunstan
Date:
Andrew Dunstan wrote:

>Bruce Momjian said:
>
>
>>Second, I found a problem with NULLs.  If I do:
>>.
>>       test=> create table test (x text, y text);
>>       CREATE TABLE
>>       test=> insert into test values ('', NULL);
>>       INSERT 17221 1
>>       test=>
>>
>>then this:
>>
>>       test=> copy test to '/tmp/b' with csv;
>>
>>creates:
>>
>>       "",
>>
>>and this:
>>
>>       test=> copy test to '/tmp/b' with csv NULL 'fred';
>>
>>creates:
>>
>>       ,fred
>>
>>Is that logical?  A non-null field went from "" to nothing.
>>
>>
>>
>
>One more point about this - we can't force quoting of every non-null
>value, which would remove the "inconsistency" you see here, because
>spreadsheets especially infer information from whether or not a CSV value
>is quoted. In particular, they will not usually treat a quoted numeric
>value as numeric, which would be a very undesirable effect.
>
>
>

Thinking about this some more .... maybe the right rule would be "quote
all non-numeric non-null values". That would fix the odd effect that
Bruce saw, and it would also stop a spreadsheet from interpreting a date
like 2002-03-04 as an arithmetic expression.

Note that we don't care if a value is quoted or not - the only inference
we draw from it is that if it is quoted it can't be null. We don't need
to draw type inferences from it because we know the type we are trying
to import into from the table defn. This fits nicely with the rule about
being liberal with what you accept.

Thoughts?

cheers

andrew

Re: Updated COPY CSV patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Thinking about this some more .... maybe the right rule would be "quote
> all non-numeric non-null values".

And how would you define "numeric"?

I do *not* like putting data-type-specific knowledge into COPY.

            regards, tom lane

Re: Updated COPY CSV patch

From
Andrew Dunstan
Date:
Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>
>>Thinking about this some more .... maybe the right rule would be "quote
>>all non-numeric non-null values".
>>
>>
>
>And how would you define "numeric"?
>
>

At least the following:

 int8
 int2
 int4
 float4
 float8
 numeric
 money

and domains based on them.

>I do *not* like putting data-type-specific knowledge into COPY.
>
>

I'm trying to keep this as simple as possible. But we have to be a bit
smart if we want to be able to export nicely. Here's the problem: say
you have a text field that stores something that has numeric form (phone
number, SSN, whatever). You want that exported as text (i.e. quoted).
Otherwise, things like leading zeros will get lost by the importing
program. However, you *must* not quote genuine number values, or they
will not be imported correctly either. So, either we get a little bit
smart about the types we are exporting, or our export is going to be
broken in some cases. We need to be able to use more information than is
contained in the text representation of the value.

I would certainly hope to keep all this confined to the CSV code path,
and not intrude on any existing functionality in any way.

Exporting nicely has a lot more wrinkles than importing nicely, because
predicting the behaviour of the program we might be exporting to is
difficult.

If you can suggest a better way I'm all ears.

cheers

andrew

Re: Updated COPY CSV patch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> And how would you define "numeric"?

> At least the following:

>  int8
>  int2
>  int4
>  float4
>  float8
>  numeric
>  money

> and domains based on them.

Wrong answer, as this excludes user-defined types.  COPY should not
discriminate on the basis of recognizing particular data types.

> I'm trying to keep this as simple as possible. But we have to be a bit
> smart if we want to be able to export nicely. Here's the problem: say
> you have a text field that stores something that has numeric form (phone
> number, SSN, whatever). You want that exported as text (i.e. quoted).
> Otherwise, things like leading zeros will get lost by the importing
> program. However, you *must* not quote genuine number values, or they
> will not be imported correctly either.

Again, you are trying to make COPY into something it isn't and shouldn't
be.

> Exporting nicely has a lot more wrinkles than importing nicely, because
> predicting the behaviour of the program we might be exporting to is
> difficult.

s/difficult/impossible/.  I might be willing to accept this sort of
cruft if it were well-defined cruft, but in point of fact you are trying
to set up expectations that will be impossible to satisfy.  We will be
forever making more little tweaks to COPY that render its behavior ever
less predictable, in the vain hope of reclosing the can of worms you
want to open.  It would be a lot wiser to implement this sort of
behavior outside the backend, in code that is easily hacked by users.

            regards, tom lane

Re: Updated COPY CSV patch

From
Andrew Dunstan
Date:
Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>
>>Tom Lane wrote:
>>
>>
>>>And how would you define "numeric"?
>>>
>>>
>
>
>
>>At least the following:
>>
>>
>
>
>
>> int8
>> int2
>> int4
>> float4
>> float8
>> numeric
>> money
>>
>>
>
>
>
>>and domains based on them.
>>
>>
>
>Wrong answer, as this excludes user-defined types.  COPY should not
>discriminate on the basis of recognizing particular data types.
>
>
>
>>I'm trying to keep this as simple as possible. But we have to be a bit
>>smart if we want to be able to export nicely. Here's the problem: say
>>you have a text field that stores something that has numeric form (phone
>>number, SSN, whatever). You want that exported as text (i.e. quoted).
>>Otherwise, things like leading zeros will get lost by the importing
>>program. However, you *must* not quote genuine number values, or they
>>will not be imported correctly either.
>>
>>
>
>Again, you are trying to make COPY into something it isn't and shouldn't
>be.
>
>
>
>>Exporting nicely has a lot more wrinkles than importing nicely, because
>>predicting the behaviour of the program we might be exporting to is
>>difficult.
>>
>>
>
>s/difficult/impossible/.  I might be willing to accept this sort of
>cruft if it were well-defined cruft, but in point of fact you are trying
>to set up expectations that will be impossible to satisfy.  We will be
>forever making more little tweaks to COPY that render its behavior ever
>less predictable, in the vain hope of reclosing the can of worms you
>want to open.  It would be a lot wiser to implement this sort of
>behavior outside the backend, in code that is easily hacked by users.
>
>
>

I have neither the time nor the inclination for a fight over this. I
hope I never have to use this anyway.

Plan B would be to provide a libpq client for importing/exporting CSVs.
This is such a basic function that I'd like to see it in the core
distribution. Most of the required logic is in what has already been
done.  It would mean that it had to be done via a client connection,
which might have speed implications for very large imports that could
run faster direct from the server's file system.

I thought there was an agreement on how to do this, but the straight
jacket into which it is being forced will significantly limit its
utility, IMNSHO.

cheers

andrew



Re: Updated COPY CSV patch

From
Karel Zak
Date:
On Tue, Apr 13, 2004 at 10:43:35AM -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Thinking about this some more .... maybe the right rule would be "quote
> > all non-numeric non-null values".
>
> And how would you define "numeric"?

 And don't  forget that  number format depend  on locale. Now  we ignore
 LC_NUMERIC, but  maybe at  some time  in future  we will  support it. I
 think the best solution is quote all values including numerics too.

 123,456  (in Czech this is one number with decimal "point" ;-)

    Karel

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

Re: Updated COPY CSV patch

From
"Andrew Dunstan"
Date:
Karel Zak said:
> On Tue, Apr 13, 2004 at 10:43:35AM -0400, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>> > Thinking about this some more .... maybe the right rule would be
>> > "quote  all non-numeric non-null values".
>>
>> And how would you define "numeric"?
>
> And don't  forget that  number format depend  on locale. Now  we ignore
> LC_NUMERIC, but  maybe at  some time  in future  we will  support it. I
> think the best solution is quote all values including numerics too.
>
> 123,456  (in Czech this is one number with decimal "point" ;-)
>

I am assuming that such users would use a delimiter other than a comma.
Any string that contains the delimiter *must* be quoted, regardless of
type. That's just the way CSVs work - it's not an artifact of ours.

cheers

andrew