Thread: COPY not handling csv files with quoted fiedls with embedded delimiters correctly

COPY not handling csv files with quoted fiedls with embedded delimiters correctly

From
Mary Anderson
Date:
Hi All,
    Alas, my PhD is mathematics, not linquistics.  I do not seem to be
able to parse the explanation of quotes and delimiters in thte COPY
command documentation correctly.
   I thought that the following COPY command would correctly read the
file outlined below.  (I checked, the lines end when they say the end.)
But it does not seem to handle the enclosed comma correctly.

   Command:

COPY input.input_file_m105 (test_date,deaths_)
FROM '/home/maryfran/memdev/DATA/DATES/Mmmm_ddCOMMA_yyyy.txt'
WITH CSV HEADER

   File:

test_date,deaths                                         /* this is
first line of file*/
"January 3, 1999","123"
"February 3, 1999","123"
"March 3, 1999","123"
"April 3, 1999","123"
"May 3, 1999","123"
"June 3, 1999","123"
"July 3, 1999","123"
"August 3, 1999","123"
"September 3, 1999","123"
"October 3, 1999","123"
"November 3, 1999","123"
"December 3, 1999","123"

I checked.  The last quote on the line is in fact the last character on
the line.  This file is one of a number of test files, the others
without embedded commas, all of which were copied correctly with no
quotes.  Leaving off the quotes on the 123's does not help matters.  I
get the same error message.

Error message:

ERROR: extra data after last expected column CONTEXT: COPY
input_file_m105, line 2: "January 3, 1999,123"

First, I would like to know what is going on, since it is my
understanding that the comma is the default deliminter, the double quote
is the default quote, and enclosing the field in quotes should escape
the embedded comma.

Second, I am writing a web application which will allow selected users
to upload files in very general formats.  I am hoping that the user will
be able to give the application basic meta data on the columns in the
file and have the file uploaded and processed.  If postgresql is going
to have idiosycrasies which force the user to preprocess files to escape
special characters or embed delimiters I would like to know about them.

Mary Anderson <maryfran@demog.berkeley.edu> writes:
>    Command:

> COPY input.input_file_m105 (test_date,deaths_)
> FROM '/home/maryfran/memdev/DATA/DATES/Mmmm_ddCOMMA_yyyy.txt'
> WITH CSV HEADER

>    File:

> test_date,deaths                                         /* this is
> first line of file*/
> "January 3, 1999","123"

> Error message:

> ERROR: extra data after last expected column CONTEXT: COPY
> input_file_m105, line 2: "January 3, 1999,123"

Maybe you aren't reading the same file you think you are?  Because
the error message says that there are no quotes on the line that the
backend saw.

            regards, tom lane