Updated COPY CSV patch - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Updated COPY CSV patch |
Date | |
Msg-id | 200404130420.i3D4KEE02485@candle.pha.pa.us Whole thread Raw |
In response to | Re: COPY for CSV documentation (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: Updated COPY CSV patch
|
List | pgsql-patches |
Andrew Dunstan wrote: > If the null marker is not an empty string, it gets an error, of > course - if it is it gets a null: > > [andrew@marmaduke pginst]$ echo ',,' | bin/psql -c "create temp > table foo (a int, b text, c text); copy foo from stdin delimiter > ',\"' null '\\\\N';" ERROR: invalid input syntax for integer: > "" CONTEXT: COPY foo, line 1, column a: "" [andrew@marmaduke > pginst]$ echo ',,' | bin/psql -c "create temp table foo (a int, > b text, c text); copy foo from stdin delimiter ',\"' ;" > [andrew@marmaduke pginst]$ > > > I hope that is expected behaviour - it's what *I* expect, at > least. > > > > Attached patch has these additions to previously posted patch: > . quote character may not appear in NULL marker > . any non-null value that matches the NULL marker is forced to be quoted > when written. OK, here is a new version of the patch that includes the grammar changes we agreed upon, SGML changes, and \copy support. I will not make any more changes without contacting you so feel free to make adjustments and repost. I have two open issues. First, CSV should support WITH OIDS, no? Second, I found a problem with NULLs. If I do: . test=> create table test (x text, y text); CREATE TABLE test=> insert into test values ('', NULL); INSERT 17221 1 test=> then this: test=> copy test to '/tmp/b' with csv; creates: "", and this: test=> copy test to '/tmp/b' with csv NULL 'fred'; creates: ,fred Is that logical? A non-null field went from "" to nothing. I think it is caused by this code: bool force_quote = (strcmp(string, null_print) == 0); CopyAttributeOutCSV(string, delim, quote, escape, force_quote); The reason it happens is that when the null string is '', it matches a zero-length string, so the value is quoted. When the null stirng isn't blank, a zero-length string doesn't match the null string so it isn't quoted. I think we need to add special logic for zero-length strings so they are always quoted, even if there is a special null string. This will make our dumps more consistent, I think, or maybe the current behavior is OK. It just struck me as strange. I did a dump/reload test with a null string and null, and it worked fine. Is there any data that can not be dumped/reloaded via CSV? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.55 diff -c -c -r1.55 copy.sgml *** doc/src/sgml/ref/copy.sgml 13 Dec 2003 23:59:07 -0000 1.55 --- doc/src/sgml/ref/copy.sgml 13 Apr 2004 04:18:22 -0000 *************** *** 26,32 **** [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ] ! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ] COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable> [,...] ) ] TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT } --- 26,34 ---- [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ] ! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ! [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] ! [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ] ] COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable> [,...] ) ] TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT } *************** *** 34,40 **** [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ] ! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ] </synopsis> </refsynopsisdiv> --- 36,44 ---- [ BINARY ] [ OIDS ] [ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ] ! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ! [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] ! [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ] ] </synopsis> </refsynopsisdiv> *************** *** 136,142 **** <para> Specifies copying the OID for each row. (An error is raised if <literal>OIDS</literal> is specified for a table that does not ! have OIDs.) </para> </listitem> </varlistentry> --- 140,146 ---- <para> Specifies copying the OID for each row. (An error is raised if <literal>OIDS</literal> is specified for a table that does not ! have OIDs.) FIX CSV FOR OIDS! </para> </listitem> </varlistentry> *************** *** 146,152 **** <listitem> <para> The single character that separates columns within each row ! (line) of the file. The default is a tab character. </para> </listitem> </varlistentry> --- 150,157 ---- <listitem> <para> The single character that separates columns within each row ! (line) of the file. The default is a tab character in normal mode, ! a comma in <literal>CSV</> mode. </para> </listitem> </varlistentry> *************** *** 156,175 **** <listitem> <para> The string that represents a null value. The default is ! <literal>\N</literal> (backslash-N). You might prefer an empty ! string, for example. </para> <note> <para> ! On a <command>COPY FROM</command>, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with <command>COPY TO</command>. </para> </note> </listitem> </varlistentry> </variablelist> </refsect1> --- 161,225 ---- <listitem> <para> The string that represents a null value. The default is ! <literal>\N</literal> (backslash-N) in normal mode, and a missing ! value (no quotes) in <literal>CSV</> mode. You might prefer an empty ! string in cases where you don't want to distinguish nulls from ! empty strings. </para> <note> <para> ! When using <command>COPY FROM</command>, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with <command>COPY TO</command>. </para> + + <para> + If you do not want anything used as null when using + <command>COPY FROM</command>, you can specify some value that is very + unlikely to appear in the file, such as <literal>frobnitz</literal> or + <literal>d5f4074b254c76cd8ae37bf1731f4aed</literal> (which is + <literal>md5('frobnitz')</literal>). This could be especially useful + when importing a <literal>CSV</> file into a table with <literal>NOT NULL</> + columns. + </para> </note> + </listitem> </varlistentry> + + <varlistentry> + <term><literal>CSV</literal></term> + <listitem> + <para> + Enables Comma Separated Variable (<literal>CSV</>) mode. It sets the + default <literal>DELIMITER</> to comma, and <literal>QUOTE</> and + <literal>ESCAPE</> values to double-quote. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="parameter">quote string</replaceable></term> + <listitem> + <para> + Specifies the quotation character in <literal>CSV</> mode. + The default is double-quote. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable class="parameter">escape string</replaceable></term> + <listitem> + <para> + Specifies the character that should appear before a <literal>QUOTE</> + data character value in <literal>CSV</> mode. The default is double-quote. + </para> + </listitem> + </varlistentry> + </variablelist> </refsect1> *************** *** 253,259 **** <para> When <command>COPY</command> is used without the <literal>BINARY</literal> option, ! the data read or written is a text file with one line per table row. Columns in a row are separated by the delimiter character. The column values themselves are strings generated by the output function, or acceptable to the input function, of each --- 303,310 ---- <para> When <command>COPY</command> is used without the <literal>BINARY</literal> option, ! the data read or written is a text file with one line per table row, ! unless <literal>CSV</> mode is used. Columns in a row are separated by the delimiter character. The column values themselves are strings generated by the output function, or acceptable to the input function, of each *************** *** 377,382 **** --- 428,473 ---- meant as data, <command>COPY FROM</command> will complain if the line endings in the input are not all alike. </para> + </refsect2> + + <refsect2> + <title>CSV Format</title> + + <para> + This format is used for importing from and exporting to the Comma + Separated Variable (<literal>CSV</>) file format used by many other programs, + such as spreadsheets. Instead of the escaping used by + <productname>PostgreSQL</productname>'s standard text mode, it produces + and recognises the common CSV escaping mechanism. + </para> + + <para> + The values in each record are separated by the delimiter character. + If the value contains the delimiter character, the <literal>QUOTE</> character + or a carriage return or line feed character, then the whole value is prefixed + and suffixed by the quote character, and any occurrence within the value + of a quote character or the <literal>ESCAPE</> character is preceded + by the escape character. + </para> + + <note> + <para> + CSV mode will both recognise and produce CSV files with quoted values + containing embedded carriage returns and line feeds. Thus the files are + not strictly one line per table row like non-CSV files. + </para> + </note> + + <note> + <para> + Many programs produce strange and occasionally perverse CSV files, so + the file format is more a convention than a standard. Thus you might + encounter some files that cannot be imported using this mechanism, and + <command>COPY</> might produce files that other programs can not + process. + </para> + </note> + </refsect2> <refsect2> Index: doc/src/sgml/ref/psql-ref.sgml =================================================================== RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/psql-ref.sgml,v retrieving revision 1.110 diff -c -c -r1.110 psql-ref.sgml *** doc/src/sgml/ref/psql-ref.sgml 12 Apr 2004 15:58:52 -0000 1.110 --- doc/src/sgml/ref/psql-ref.sgml 13 Apr 2004 04:18:25 -0000 *************** *** 711,716 **** --- 711,718 ---- [ <literal>oids</literal> ] [ <literal>delimiter [as] </literal> '<replaceable class="parameter">character</replaceable>' ] [ <literal>null [as] </literal> '<replaceable class="parameter">string</replaceable>' ]</literal> + [ <literal>csv [ quote [as] </literal> '<replaceable class="parameter">string</replaceable>' ] + [ <literal>escape [as] </literal> '<replaceable class="parameter">string</replaceable>' ] ] </term> <listitem> Index: src/backend/commands/copy.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v retrieving revision 1.219 diff -c -c -r1.219 copy.c *** src/backend/commands/copy.c 6 Apr 2004 13:21:33 -0000 1.219 --- src/backend/commands/copy.c 13 Apr 2004 04:18:27 -0000 *************** *** 70,76 **** typedef enum CopyReadResult { NORMAL_ATTR, ! END_OF_LINE } CopyReadResult; /* --- 70,77 ---- typedef enum CopyReadResult { NORMAL_ATTR, ! END_OF_LINE, ! UNTERMINATED_FIELD } CopyReadResult; /* *************** *** 130,144 **** /* non-export function prototypes */ static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print); static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print); static bool CopyReadLine(void); static char *CopyReadAttribute(const char *delim, const char *null_print, CopyReadResult *result, bool *isnull); static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo, Oid typelem, bool *isnull); static void CopyAttributeOut(char *string, char *delim); static List *CopyGetAttnums(Relation rel, List *attnamelist); static void limit_printout_length(StringInfo buf); --- 131,150 ---- /* non-export function prototypes */ static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print, bool csv_mode, char *quote, char *escape); static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print, bool csv_mode, char *quote, char *escape); static bool CopyReadLine(void); static char *CopyReadAttribute(const char *delim, const char *null_print, CopyReadResult *result, bool *isnull); + static char *CopyReadAttributeCSV(const char *delim, const char *null_print, + char *quote, char *escape, + CopyReadResult *result, bool *isnull); static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo, Oid typelem, bool *isnull); static void CopyAttributeOut(char *string, char *delim); + static void CopyAttributeOutCSV(char *string, char *delim, char *quote, + char *escape, bool force_quote); static List *CopyGetAttnums(Relation rel, List *attnamelist); static void limit_printout_length(StringInfo buf); *************** *** 682,688 **** --- 688,697 ---- List *attnumlist; bool binary = false; bool oids = false; + bool csv_mode = false; char *delim = NULL; + char *quote = NULL; + char *escape = NULL; char *null_print = NULL; Relation rel; AclMode required_access = (is_from ? ACL_INSERT : ACL_SELECT); *************** *** 725,730 **** --- 734,763 ---- errmsg("conflicting or redundant options"))); null_print = strVal(defel->arg); } + else if (strcmp(defel->defname, "csv") == 0) + { + if (csv_mode) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + csv_mode = intVal(defel->arg); + } + else if (strcmp(defel->defname, "quote") == 0) + { + if (quote) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + quote = strVal(defel->arg); + } + else if (strcmp(defel->defname, "escape") == 0) + { + if (escape) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + escape = strVal(defel->arg); + } else elog(ERROR, "option \"%s\" not recognized", defel->defname); *************** *** 735,740 **** --- 768,778 ---- (errcode(ERRCODE_SYNTAX_ERROR), errmsg("cannot specify DELIMITER in BINARY mode"))); + if (binary && csv_mode) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot specify CSV in BINARY mode"))); + if (binary && null_print) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *************** *** 742,751 **** /* Set defaults */ if (!delim) ! delim = "\t"; ! if (!null_print) ! null_print = "\\N"; /* * Open and lock the relation, using the appropriate lock type. --- 780,847 ---- /* Set defaults */ if (!delim) ! delim = csv_mode ? "," : "\t"; ! if (!null_print) ! null_print = csv_mode ? "" : "\\N"; ! ! if (csv_mode) ! { ! if (!quote) ! quote = "\""; ! if (!escape) ! escape = "\""; ! } ! ! /* ! * Only single-character delimiter strings are supported. ! */ ! if (strlen(delim) != 1) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY delimiter must be a single character"))); ! ! /* ! * Check quote ! */ ! if (!csv_mode && quote != NULL) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY quote available only in CSV mode"))); ! ! if (csv_mode && strlen(quote) != 1) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY quote must be a single character"))); ! ! /* ! * Check escape ! */ ! if (!csv_mode && escape != NULL) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY escape available only in CSV mode"))); ! ! if (csv_mode && strlen(escape) != 1) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY escape must be a single character"))); ! ! /* ! * Don't allow the delimiter to appear in the null string. ! */ ! if (strchr(null_print, delim[0]) != NULL) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY delimiter must not appear in the NULL specification"))); ! ! /* ! * Don't allow the csv quote char to appear in the null string. ! */ ! if (csv_mode && strchr(null_print, quote[0]) != NULL) ! ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("CSV quote character must not appear in the NULL specification"))); /* * Open and lock the relation, using the appropriate lock type. *************** *** 772,791 **** "psql's \\copy command also works for anyone."))); /* ! * Presently, only single-character delimiter strings are supported. */ - if (strlen(delim) != 1) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("COPY delimiter must be a single character"))); ! /* ! * Don't allow the delimiter to appear in the null string. ! */ ! if (strchr(null_print, delim[0]) != NULL) ereport(ERROR, ! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), ! errmsg("COPY delimiter must not appear in the NULL specification"))); /* * Don't allow COPY w/ OIDs to or from a table without them --- 868,880 ---- "psql's \\copy command also works for anyone."))); /* ! * Don't allow OIDs in CSV mode */ ! if (csv_mode && oids) // FIX ME bjm ereport(ERROR, ! (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("Cannot specify OIDS in CSV mode "))); /* * Don't allow COPY w/ OIDs to or from a table without them *************** *** 864,870 **** errmsg("\"%s\" is a directory", filename))); } } ! CopyFrom(rel, attnumlist, binary, oids, delim, null_print); } else { /* copy from database to file */ --- 953,960 ---- errmsg("\"%s\" is a directory", filename))); } } ! CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode, ! quote, escape); } else { /* copy from database to file */ *************** *** 926,932 **** errmsg("\"%s\" is a directory", filename))); } } ! CopyTo(rel, attnumlist, binary, oids, delim, null_print); } if (!pipe) --- 1016,1023 ---- errmsg("\"%s\" is a directory", filename))); } } ! CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode, ! quote, escape); } if (!pipe) *************** *** 958,964 **** */ static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print) { HeapTuple tuple; TupleDesc tupDesc; --- 1049,1056 ---- */ static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print, bool csv_mode, char *quote, ! char *escape) { HeapTuple tuple; TupleDesc tupDesc; *************** *** 1051,1057 **** while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL) { bool need_delim = false; - CHECK_FOR_INTERRUPTS(); MemoryContextReset(mycontext); --- 1143,1148 ---- *************** *** 1113,1119 **** value, ObjectIdGetDatum(elements[attnum - 1]), Int32GetDatum(attr[attnum - 1]->atttypmod))); ! CopyAttributeOut(string, delim); } else { --- 1204,1218 ---- value, ObjectIdGetDatum(elements[attnum - 1]), Int32GetDatum(attr[attnum - 1]->atttypmod))); ! if (csv_mode) ! { ! bool force_quote = (strcmp(string, null_print) == 0); ! CopyAttributeOutCSV(string, delim, quote, escape, ! force_quote); ! } ! else ! CopyAttributeOut(string, delim); ! } else { *************** *** 1243,1249 **** */ static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print) { HeapTuple tuple; TupleDesc tupDesc; --- 1342,1349 ---- */ static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids, ! char *delim, char *null_print, bool csv_mode, char *quote, ! char *escape) { HeapTuple tuple; TupleDesc tupDesc; *************** *** 1388,1396 **** ExecBSInsertTriggers(estate, resultRelInfo); if (!binary) - { file_has_oids = oids; /* must rely on user to tell us this... */ - } else { /* Read and verify binary header */ --- 1488,1494 ---- *************** *** 1499,1504 **** --- 1597,1603 ---- if (file_has_oids) { + /* can't be in CSV mode here */ string = CopyReadAttribute(delim, null_print, &result, &isnull); *************** *** 1537,1550 **** errmsg("missing data for column \"%s\"", NameStr(attr[m]->attname)))); ! string = CopyReadAttribute(delim, null_print, ! &result, &isnull); ! ! if (isnull) { ! /* we read an SQL NULL, no need to do anything */ } else { copy_attname = NameStr(attr[m]->attname); values[m] = FunctionCall3(&in_functions[m], --- 1636,1658 ---- errmsg("missing data for column \"%s\"", NameStr(attr[m]->attname)))); ! if (csv_mode) { ! string = CopyReadAttributeCSV(delim, null_print, quote, ! escape, &result, &isnull); ! if (result == UNTERMINATED_FIELD) ! ereport(ERROR, ! (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), ! errmsg("unterminated CSV quoted field"))); } else + string = CopyReadAttribute(delim, null_print, + &result, &isnull); + + + + /* we read an SQL NULL, no need to do anything */ + if (!isnull) { copy_attname = NameStr(attr[m]->attname); values[m] = FunctionCall3(&in_functions[m], *************** *** 2069,2074 **** --- 2177,2328 ---- return attribute_buf.data; } + + /* + * Read the value of a single attribute in CSV mode, + * performing de-escaping as needed. Escaping does not follow the normal + * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage. + * + * Quoted fields can span lines, in which case the line end is embedded + * in the returned string. + * + * null_print is the null marker string. Note that this is compared to + * the pre-de-escaped input string (thus if it is quoted it is not a NULL). + * + * *result is set to indicate what terminated the read: + * NORMAL_ATTR: column delimiter + * END_OF_LINE: end of line + * UNTERMINATED_FIELD no quote detected at end of a quoted field + * + * In any case, the string read up to the terminator (or end of file) + * is returned. + * + * *isnull is set true or false depending on whether the input matched + * the null marker. Note that the caller cannot check this since the + * returned string will be the post-de-escaping equivalent, which may + * look the same as some valid data string. + *---------- + */ + + static char * + CopyReadAttributeCSV(const char *delim, const char *null_print, char *quote, + char *escape, CopyReadResult *result, bool *isnull) + { + char delimc = delim[0]; + char quotec = quote[0]; + char escapec = escape[0]; + char c; + int start_cursor = line_buf.cursor; + int end_cursor = start_cursor; + int input_len; + bool in_quote = false; + bool saw_quote = false; + + /* reset attribute_buf to empty */ + attribute_buf.len = 0; + attribute_buf.data[0] = '\0'; + + /* set default status */ + *result = END_OF_LINE; + + for (;;) + { + /* handle multiline quoted fields */ + if (in_quote && line_buf.cursor >= line_buf.len) + { + bool done; + + switch(eol_type) + { + case EOL_NL: + appendStringInfoString(&attribute_buf,"\n"); + break; + case EOL_CR: + appendStringInfoString(&attribute_buf,"\r"); + break; + case EOL_CRNL: + appendStringInfoString(&attribute_buf,"\r\n"); + break; + case EOL_UNKNOWN: + /* shouldn't happen - just keep going */ + break; + } + + copy_lineno++; + done = CopyReadLine(); + if (done && line_buf.len == 0) + break; + start_cursor = line_buf.cursor; + } + + end_cursor = line_buf.cursor; + if (line_buf.cursor >= line_buf.len) + break; + c = line_buf.data[line_buf.cursor++]; + /* + * unquoted field delimiter + */ + if (!in_quote && c == delimc) + { + *result = NORMAL_ATTR; + break; + } + /* + * start of quoted field (or part of field) + */ + if (!in_quote && c == quotec) + { + saw_quote = true; + in_quote = true; + continue; + } + /* + * escape within a quoted field + */ + if (in_quote && c == escapec) + { + /* + * peek at the next char if available, and escape it if it + * is an escape char or a quote char + */ + if (line_buf.cursor <= line_buf.len) + { + char nextc = line_buf.data[line_buf.cursor]; + if (nextc == escapec || nextc == quotec) + { + appendStringInfoCharMacro(&attribute_buf, nextc); + line_buf.cursor++; + continue; + } + } + } + /* + * end of quoted field. + * Must do this test after testing for escape in case quote char + * and escape char are the same (which is the common case). + */ + if (in_quote && c == quotec) + { + in_quote = false; + continue; + } + appendStringInfoCharMacro(&attribute_buf, c); + } + + if (in_quote) + *result = UNTERMINATED_FIELD; + + /* check whether raw input matched null marker */ + input_len = end_cursor - start_cursor; + if (!saw_quote && input_len == strlen(null_print) && + strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0) + *isnull = true; + else + *isnull = false; + + return attribute_buf.data; + } + /* * Read a binary attribute */ *************** *** 2192,2197 **** --- 2446,2518 ---- break; } } + } + + /* + * Send CSV representation of one attribute, with conversion and + * CSV type escaping + */ + static void + CopyAttributeOutCSV(char *server_string, char *delim, char *quote, + char *escape, bool force_quote) + { + char *string; + char c; + char delimc = delim[0]; + char quotec = quote[0]; + char escapec = escape[0]; + bool need_quote = force_quote; + char *test_string; + bool same_encoding; + int mblen; + int i; + + same_encoding = (server_encoding == client_encoding); + if (!same_encoding) + string = (char *) pg_server_to_client((unsigned char *) server_string, + strlen(server_string)); + else + string = server_string; + + /* have to run through the string twice, + * first time to see if it needs quoting, second to actually send it + */ + + for(test_string = string; + !need_quote && (c = *test_string) != '\0'; + test_string += mblen) + { + if (c == delimc || c == quotec || c == '\n' || c == '\r') + need_quote = true; + if (!same_encoding) + mblen = pg_encoding_mblen(client_encoding, test_string); + else + mblen = 1; + } + + if (need_quote) + CopySendChar(quotec); + + for (; (c = *string) != '\0'; string += mblen) + { + if (c == quotec || c == escapec) + CopySendChar(escapec); + + CopySendChar(c); + + if (!same_encoding) + { + /* send additional bytes of the char, if any */ + mblen = pg_encoding_mblen(client_encoding, string); + for (i = 1; i < mblen; i++) + CopySendChar(string[i]); + } + else + mblen = 1; + } + + if (need_quote) + CopySendChar(quotec); } /* Index: src/backend/parser/gram.y =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/parser/gram.y,v retrieving revision 2.450 diff -c -c -r2.450 gram.y *** src/backend/parser/gram.y 5 Apr 2004 03:07:26 -0000 2.450 --- src/backend/parser/gram.y 13 Apr 2004 04:18:31 -0000 *************** *** 343,349 **** CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB ! CREATEUSER CROSS CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS --- 343,349 ---- CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB ! CREATEUSER CROSS CSV CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS *************** *** 386,391 **** --- 386,393 ---- PRECISION PRESERVE PREPARE PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE + QUOTE + READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS RULE *************** *** 1356,1361 **** --- 1358,1375 ---- { $$ = makeDefElem("delimiter", (Node *)makeString($3)); } + | CSV + { + $$ = makeDefElem("csv", (Node *)makeInteger(TRUE)); + } + | QUOTE opt_as Sconst + { + $$ = makeDefElem("quote", (Node *)makeString($3)); + } + | ESCAPE opt_as Sconst + { + $$ = makeDefElem("escape", (Node *)makeString($3)); + } | NULL_P opt_as Sconst { $$ = makeDefElem("null", (Node *)makeString($3)); *************** *** 7420,7425 **** --- 7434,7440 ---- | COPY | CREATEDB | CREATEUSER + | CSV | CURSOR | CYCLE | DATABASE *************** *** 7507,7512 **** --- 7522,7528 ---- | PRIVILEGES | PROCEDURAL | PROCEDURE + | QUOTE | READ | RECHECK | REINDEX Index: src/backend/parser/keywords.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/parser/keywords.c,v retrieving revision 1.147 diff -c -c -r1.147 keywords.c *** src/backend/parser/keywords.c 11 Mar 2004 01:47:40 -0000 1.147 --- src/backend/parser/keywords.c 13 Apr 2004 04:18:31 -0000 *************** *** 90,95 **** --- 90,96 ---- {"createdb", CREATEDB}, {"createuser", CREATEUSER}, {"cross", CROSS}, + {"csv", CSV}, {"current_date", CURRENT_DATE}, {"current_time", CURRENT_TIME}, {"current_timestamp", CURRENT_TIMESTAMP}, *************** *** 248,253 **** --- 249,255 ---- {"privileges", PRIVILEGES}, {"procedural", PROCEDURAL}, {"procedure", PROCEDURE}, + {"quote", QUOTE}, {"read", READ}, {"real", REAL}, {"recheck", RECHECK}, Index: src/bin/psql/copy.c =================================================================== RCS file: /cvsroot/pgsql-server/src/bin/psql/copy.c,v retrieving revision 1.43 diff -c -c -r1.43 copy.c *** src/bin/psql/copy.c 12 Apr 2004 15:58:52 -0000 1.43 --- src/bin/psql/copy.c 13 Apr 2004 04:18:32 -0000 *************** *** 66,73 **** --- 66,76 ---- bool from; bool binary; bool oids; + bool csv_mode; char *delim; char *null; + char *quote; + char *escape; }; *************** *** 81,86 **** --- 84,91 ---- free(ptr->file); free(ptr->delim); free(ptr->null); + free(ptr->quote); + free(ptr->escape); free(ptr); } *************** *** 277,282 **** --- 282,291 ---- { result->oids = true; } + else if (strcasecmp(token, "csv") == 0) + { + result->csv_mode = true; + } else if (strcasecmp(token, "delimiter") == 0) { token = strtokx(NULL, whitespace, NULL, "'", *************** *** 301,306 **** --- 310,339 ---- else goto error; } + else if (strcasecmp(token, "quote") == 0) + { + token = strtokx(NULL, whitespace, NULL, "'", + '\\', false, pset.encoding); + if (token && strcasecmp(token, "as") == 0) + token = strtokx(NULL, whitespace, NULL, "'", + '\\', false, pset.encoding); + if (token) + result->quote = pg_strdup(token); + else + goto error; + } + else if (strcasecmp(token, "escape") == 0) + { + token = strtokx(NULL, whitespace, NULL, "'", + '\\', false, pset.encoding); + if (token && strcasecmp(token, "as") == 0) + token = strtokx(NULL, whitespace, NULL, "'", + '\\', false, pset.encoding); + if (token) + result->escape = pg_strdup(token); + else + goto error; + } else goto error; *************** *** 340,346 **** PGresult *result; bool success; struct stat st; ! /* parse options */ options = parse_slash_copy(args); --- 373,380 ---- PGresult *result; bool success; struct stat st; ! bool with_output = false; ! /* parse options */ options = parse_slash_copy(args); *************** *** 379,390 **** --- 413,454 ---- options->delim); } + /* There is no backward-compatible CSV syntax */ if (options->null) { if (options->null[0] == '\'') appendPQExpBuffer(&query, " WITH NULL AS %s", options->null); else appendPQExpBuffer(&query, " WITH NULL AS '%s'", options->null); + with_output = true; + } + + if (options->csv_mode) + { + appendPQExpBuffer(&query, " %sCSV ", with_output ? "" : "WITH "); + with_output = true; + } + + if (options->quote) + { + if (options->quote[0] == '\'') + appendPQExpBuffer(&query, " %sQUOTE AS %s", + with_output ? "" : "WITH ", options->quote); + else + appendPQExpBuffer(&query, " %sQUOTE AS '%s'", + with_output ? "" : "WITH ", options->quote); + with_output = true; + } + + if (options->escape) + { + if (options->escape[0] == '\'') + appendPQExpBuffer(&query, " %sESCAPE AS %s", + with_output ? "" : "WITH ", options->escape); + else + appendPQExpBuffer(&query, " %sESCAPE AS '%s'", + with_output ? "" : "WITH ", options->escape); + with_output = true; } if (options->from)
pgsql-patches by date: