Thread: Re: [HACKERS] multiline CSV fields
I wrote: > > If it bothers you that much. I'd make a flag, cleared at the start of > each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, > if the flag is not set then set it and issue the warning. I didn't realise until Bruce told me just now that I was on the hook for this. I guess i should keep my big mouth shut. (Yeah, that's gonna happen ...) Anyway, here's a tiny patch that does what I had in mind. cheers andrew Index: copy.c =================================================================== RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v retrieving revision 1.234 diff -c -r1.234 copy.c *** copy.c 6 Nov 2004 17:46:27 -0000 1.234 --- copy.c 2 Dec 2004 23:34:20 -0000 *************** *** 98,103 **** --- 98,104 ---- static EolType eol_type; /* EOL type of input */ static int client_encoding; /* remote side's character encoding */ static int server_encoding; /* local encoding */ + static bool embedded_line_warning; /* these are just for error messages, see copy_in_error_callback */ static bool copy_binary; /* is it a binary copy? */ *************** *** 1190,1195 **** --- 1191,1197 ---- attr = tupDesc->attrs; num_phys_attrs = tupDesc->natts; attr_count = list_length(attnumlist); + embedded_line_warning = false; /* * Get info about the columns we need to process. *************** *** 2627,2632 **** --- 2629,2653 ---- !use_quote && (c = *test_string) != '\0'; test_string += mblen) { + /* + * We don't know here what the surrounding line end characters + * might be. It might not even be under postgres' control. So + * we simple warn on ANY embedded line ending character. + * + * This warning will disappear when we make line parsing field-aware, + * so that we can reliably read in embedded line ending characters + * regardless of the file's line-end context. + * + */ + + if (!embedded_line_warning && (c == '\n' || c == '\r') ) + { + embedded_line_warning = true; + elog(WARNING, + "CSV fields with embedded linefeed or carriage return " + "characters might not be able to be reimported"); + } + if (c == delimc || c == quotec || c == '\n' || c == '\r') use_quote = true; if (!same_encoding)
Andrew Dunstan <andrew@dunslane.net> writes: > + if (!embedded_line_warning && (c == '\n' || c == '\r') ) > + { > + embedded_line_warning = true; > + elog(WARNING, > + "CSV fields with embedded linefeed or carriage return " > + "characters might not be able to be reimported"); > + } What about forcibly translating them to the two-character sequences \n or \r? Or is that not considered a CSV-compatible representation? regards, tom lane
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>+ if (!embedded_line_warning && (c == '\n' || c == '\r') ) >>+ { >>+ embedded_line_warning = true; >>+ elog(WARNING, >>+ "CSV fields with embedded linefeed or carriage return " >>+ "characters might not be able to be reimported"); >>+ } >> >> > >What about forcibly translating them to the two-character sequences \n >or \r? Or is that not considered a CSV-compatible representation? > > > > Not compatible AFAIK. Certainly not portably. And the warning would still be true, because we don't do this unescaping on the way back in. I think the way the comment in the patch suggests and previous emails have discussed is the right way to go with this - I will attend to it after we branch ;-) cheers andrew
Patch applied. Thanks. --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > I wrote: > > > > > If it bothers you that much. I'd make a flag, cleared at the start of > > each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, > > if the flag is not set then set it and issue the warning. > > > > I didn't realise until Bruce told me just now that I was on the hook for > this. I guess i should keep my big mouth shut. (Yeah, that's gonna > happen ...) > > Anyway, here's a tiny patch that does what I had in mind. > > cheers > > andrew [ text/x-patch is unsupported, treating like TEXT/PLAIN ] > Index: copy.c > =================================================================== > RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v > retrieving revision 1.234 > diff -c -r1.234 copy.c > *** copy.c 6 Nov 2004 17:46:27 -0000 1.234 > --- copy.c 2 Dec 2004 23:34:20 -0000 > *************** > *** 98,103 **** > --- 98,104 ---- > static EolType eol_type; /* EOL type of input */ > static int client_encoding; /* remote side's character encoding */ > static int server_encoding; /* local encoding */ > + static bool embedded_line_warning; > > /* these are just for error messages, see copy_in_error_callback */ > static bool copy_binary; /* is it a binary copy? */ > *************** > *** 1190,1195 **** > --- 1191,1197 ---- > attr = tupDesc->attrs; > num_phys_attrs = tupDesc->natts; > attr_count = list_length(attnumlist); > + embedded_line_warning = false; > > /* > * Get info about the columns we need to process. > *************** > *** 2627,2632 **** > --- 2629,2653 ---- > !use_quote && (c = *test_string) != '\0'; > test_string += mblen) > { > + /* > + * We don't know here what the surrounding line end characters > + * might be. It might not even be under postgres' control. So > + * we simple warn on ANY embedded line ending character. > + * > + * This warning will disappear when we make line parsing field-aware, > + * so that we can reliably read in embedded line ending characters > + * regardless of the file's line-end context. > + * > + */ > + > + if (!embedded_line_warning && (c == '\n' || c == '\r') ) > + { > + embedded_line_warning = true; > + elog(WARNING, > + "CSV fields with embedded linefeed or carriage return " > + "characters might not be able to be reimported"); > + } > + > if (c == delimc || c == quotec || c == '\n' || c == '\r') > use_quote = true; > if (!same_encoding) > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073