Re: [PATCHES] multiline CSV fields - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [PATCHES] multiline CSV fields |
Date | |
Msg-id | 200412031713.iB3HDQE04288@candle.pha.pa.us Whole thread Raw |
In response to | Re: multiline CSV fields (Andrew Dunstan <andrew@dunslane.net>) |
List | pgsql-hackers |
Patch applied. Thanks. --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > I wrote: > > > > > If it bothers you that much. I'd make a flag, cleared at the start of > > each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, > > if the flag is not set then set it and issue the warning. > > > > I didn't realise until Bruce told me just now that I was on the hook for > this. I guess i should keep my big mouth shut. (Yeah, that's gonna > happen ...) > > Anyway, here's a tiny patch that does what I had in mind. > > cheers > > andrew [ text/x-patch is unsupported, treating like TEXT/PLAIN ] > Index: copy.c > =================================================================== > RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v > retrieving revision 1.234 > diff -c -r1.234 copy.c > *** copy.c 6 Nov 2004 17:46:27 -0000 1.234 > --- copy.c 2 Dec 2004 23:34:20 -0000 > *************** > *** 98,103 **** > --- 98,104 ---- > static EolType eol_type; /* EOL type of input */ > static int client_encoding; /* remote side's character encoding */ > static int server_encoding; /* local encoding */ > + static bool embedded_line_warning; > > /* these are just for error messages, see copy_in_error_callback */ > static bool copy_binary; /* is it a binary copy? */ > *************** > *** 1190,1195 **** > --- 1191,1197 ---- > attr = tupDesc->attrs; > num_phys_attrs = tupDesc->natts; > attr_count = list_length(attnumlist); > + embedded_line_warning = false; > > /* > * Get info about the columns we need to process. > *************** > *** 2627,2632 **** > --- 2629,2653 ---- > !use_quote && (c = *test_string) != '\0'; > test_string += mblen) > { > + /* > + * We don't know here what the surrounding line end characters > + * might be. It might not even be under postgres' control. So > + * we simple warn on ANY embedded line ending character. > + * > + * This warning will disappear when we make line parsing field-aware, > + * so that we can reliably read in embedded line ending characters > + * regardless of the file's line-end context. > + * > + */ > + > + if (!embedded_line_warning && (c == '\n' || c == '\r') ) > + { > + embedded_line_warning = true; > + elog(WARNING, > + "CSV fields with embedded linefeed or carriage return " > + "characters might not be able to be reimported"); > + } > + > if (c == delimc || c == quotec || c == '\n' || c == '\r') > use_quote = true; > if (!same_encoding) > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-hackers by date: