Re: multiline CSV fields - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: multiline CSV fields
Msg-id 41AFAC25.3080405@dunslane.net
In response to Re: multiline CSV fields  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: multiline CSV fields  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [PATCHES] multiline CSV fields  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers

I wrote:

>
> If it bothers you that much, I'd make a flag, cleared at the start of
> each COPY, and then where we test for CR or LF in CopyAttributeOutCSV,
> if the flag is not set then set it and issue the warning.



I didn't realise until Bruce told me just now that I was on the hook for
this. I guess I should keep my big mouth shut. (Yeah, that's gonna
happen ...)

Anyway, here's a tiny patch that does what I had in mind.

cheers

andrew
Index: copy.c
===================================================================
RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.234
diff -c -r1.234 copy.c
*** copy.c    6 Nov 2004 17:46:27 -0000    1.234
--- copy.c    2 Dec 2004 23:34:20 -0000
***************
*** 98,103 ****
--- 98,104 ----
  static EolType eol_type;        /* EOL type of input */
  static int    client_encoding;    /* remote side's character encoding */
  static int    server_encoding;    /* local encoding */
+ static bool embedded_line_warning;

  /* these are just for error messages, see copy_in_error_callback */
  static bool copy_binary;        /* is it a binary copy? */
***************
*** 1190,1195 ****
--- 1191,1197 ----
      attr = tupDesc->attrs;
      num_phys_attrs = tupDesc->natts;
      attr_count = list_length(attnumlist);
+     embedded_line_warning = false;

      /*
       * Get info about the columns we need to process.
***************
*** 2627,2632 ****
--- 2629,2653 ----
           !use_quote && (c = *test_string) != '\0';
           test_string += mblen)
      {
+         /*
+          * We don't know here what the surrounding line end characters
+          * might be. It might not even be under postgres' control. So
+          * we simply warn on ANY embedded line ending character.
+          *
+          * This warning will disappear when we make line parsing field-aware,
+          * so that we can reliably read in embedded line ending characters
+          * regardless of the file's line-end context.
+          *
+          */
+
+         if (!embedded_line_warning  && (c == '\n' || c == '\r') )
+         {
+             embedded_line_warning = true;
+             elog(WARNING,
+                  "CSV fields with embedded linefeed or carriage return "
+                  "characters might not be able to be reimported");
+         }
+
          if (c == delimc || c == quotec || c == '\n' || c == '\r')
              use_quote = true;
          if (!same_encoding)
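
For readers who want to try the warn-once idea outside the backend, here is a minimal standalone sketch of the same flag logic. It is not the patch itself: begin_copy(), field_needs_quote() and warnings_issued are hypothetical names invented for illustration, and fprintf() stands in for elog(WARNING, ...). Only embedded_line_warning and the shape of the scanning loop are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same role as the flag added by the patch: one warning per COPY. */
static bool embedded_line_warning;

/* Hypothetical counter, only here so the behaviour can be checked. */
static int warnings_issued;

/* Hypothetical stand-in for the start of a COPY: clear the flag. */
static void
begin_copy(void)
{
    embedded_line_warning = false;
}

/*
 * Hypothetical stand-in for the quoting test in CopyAttributeOutCSV.
 * Scans single-byte characters only (the real code steps by mblen) and
 * stops at the first character that forces quoting.
 */
static bool
field_needs_quote(const char *s, char delimc, char quotec)
{
    bool use_quote = false;
    char c;

    assert(s != NULL);

    for (; !use_quote && (c = *s) != '\0'; s++)
    {
        /* Warn on the first CR or LF seen since begin_copy(). */
        if (!embedded_line_warning && (c == '\n' || c == '\r'))
        {
            embedded_line_warning = true;
            warnings_issued++;
            fprintf(stderr,
                    "WARNING: CSV fields with embedded linefeed or carriage "
                    "return characters might not be able to be reimported\n");
        }

        if (c == delimc || c == quotec || c == '\n' || c == '\r')
            use_quote = true;
    }

    return use_quote;
}
```

One property of this sketch worth noting: because the scan stops at the first character that forces quoting, a field such as "a,b\nc" is quoted without the warning ever firing, since the delimiter is seen before the linefeed. Whether that matters depends on how complete the warning is meant to be.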
