Re: multiline CSV fields - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: multiline CSV fields |
Date | |
Msg-id | 41AFAC25.3080405@dunslane.net |
In response to | Re: multiline CSV fields (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: multiline CSV fields (Tom Lane <tgl@sss.pgh.pa.us>); Re: [PATCHES] multiline CSV fields (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
I wrote:

>
> If it bothers you that much. I'd make a flag, cleared at the start of
> each COPY, and then where we test for CR or LF in CopyAttributeOutCSV,
> if the flag is not set then set it and issue the warning.

I didn't realise until Bruce told me just now that I was on the hook for
this. I guess I should keep my big mouth shut. (Yeah, that's gonna
happen ...)

Anyway, here's a tiny patch that does what I had in mind.

cheers

andrew

Index: copy.c
===================================================================
RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.234
diff -c -r1.234 copy.c
*** copy.c 6 Nov 2004 17:46:27 -0000 1.234
--- copy.c 2 Dec 2004 23:34:20 -0000
***************
*** 98,103 ****
--- 98,104 ----
  static EolType eol_type; /* EOL type of input */
  static int client_encoding; /* remote side's character encoding */
  static int server_encoding; /* local encoding */
+ static bool embedded_line_warning;
  
  /* these are just for error messages, see copy_in_error_callback */
  static bool copy_binary; /* is it a binary copy? */
***************
*** 1190,1195 ****
--- 1191,1197 ----
      attr = tupDesc->attrs;
      num_phys_attrs = tupDesc->natts;
      attr_count = list_length(attnumlist);
+     embedded_line_warning = false;
  
      /*
       * Get info about the columns we need to process.
***************
*** 2627,2632 ****
--- 2629,2653 ----
           !use_quote && (c = *test_string) != '\0';
           test_string += mblen)
      {
+         /*
+          * We don't know here what the surrounding line end characters
+          * might be. It might not even be under postgres' control. So
+          * we simple warn on ANY embedded line ending character.
+          *
+          * This warning will disappear when we make line parsing field-aware,
+          * so that we can reliably read in embedded line ending characters
+          * regardless of the file's line-end context.
+          *
+          */
+ 
+         if (!embedded_line_warning && (c == '\n' || c == '\r') )
+         {
+             embedded_line_warning = true;
+             elog(WARNING,
+                  "CSV fields with embedded linefeed or carriage return "
+                  "characters might not be able to be reimported");
+         }
+ 
          if (c == delimc || c == quotec || c == '\n' || c == '\r')
              use_quote = true;
          if (!same_encoding)
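For anyone who wants to poke at the warn-once logic outside the backend, here is a rough standalone sketch of the same idea. It is not the patch itself, just an illustration under a few assumptions: copy_begin(), field_needs_quote() and warning_count are hypothetical names that do not exist in copy.c, fprintf(stderr, ...) stands in for elog(WARNING, ...), and the scan advances one byte at a time instead of by mblen. The loop condition is copied from the patch, so the scan stops at the first character that forces quoting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the file-level state in copy.c. */
static bool embedded_line_warning;  /* has this COPY already warned? */
static int  warning_count;          /* illustration only: warnings emitted so far */

/* Call at the start of each simulated COPY to re-arm the warning. */
static void
copy_begin(void)
{
    embedded_line_warning = false;
}

/*
 * Return true if the field would need quoting in CSV output.
 * Warns at most once per COPY, on the first CR or LF that is examined.
 * As in the patch, the scan stops once use_quote becomes true.
 */
static bool
field_needs_quote(const char *field, char delimc, char quotec)
{
    bool        use_quote = false;
    const char *test_string;
    char        c;

    assert(field != NULL);

    for (test_string = field;
         !use_quote && (c = *test_string) != '\0';
         test_string++)         /* assumption: single-byte encoding */
    {
        if (!embedded_line_warning && (c == '\n' || c == '\r'))
        {
            embedded_line_warning = true;
            warning_count++;
            /* stand-in for elog(WARNING, ...) */
            fprintf(stderr,
                    "WARNING: CSV fields with embedded linefeed or carriage "
                    "return characters might not be able to be reimported\n");
        }

        if (c == delimc || c == quotec || c == '\n' || c == '\r')
            use_quote = true;
    }

    return use_quote;
}
```

Calling copy_begin() before each simulated COPY clears the flag, so any number of multi-line fields within one COPY produce a single warning, and the next COPY can warn again.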