Thread: CSV consecutive newline bug

CSV consecutive newline bug

From
Andrew Dunstan
Date:
I have just been alerted to a bug in the 8.0 handling of embedded
newlines in CSV data. Basically it barfs on consecutive newlines. The
attached patch for 8.0 appears to fix it. The bug isn't present in the
HEAD branch, and I'm wondering if we should not backpatch the HEAD
multiline patch rather than applying this. OTOH, applying this patch
would probably be more in keeping with our conservative approach to
changes to stable branches, I guess.

cheers

andrew
Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.236
diff -c -r1.236 copy.c
*** src/backend/commands/copy.c    31 Dec 2004 21:59:41 -0000    1.236
--- src/backend/commands/copy.c    11 May 2005 21:12:07 -0000
***************
*** 2395,2400 ****
--- 2395,2401 ----
              if (done && line_buf.len == 0)
                  break;
              start_cursor = line_buf.cursor;
+             continue;
          }

          end_cursor = line_buf.cursor;

Re: CSV consecutive newline bug

From
Neil Conway
Date:
Andrew Dunstan wrote:
> I have just been alerted to a bug in the 8.0 handling of embedded
> newlines in CSV data. Basically it barfs on consecutive newlines. The
> attached patch for 8.0 appears to fix it. The bug isn't present in the
> HEAD branch, and I'm wondering if we should not backpatch the HEAD
> multiline patch rather than applying this.

Is there a particular reason to backport the larger patch? As a general
rule I'm inclined to apply minimally-invasive fixes to stable branches,
but I don't know the code in question, so perhaps there is some reason
to make an exception in this case.

Also, a regression test for this bug would be nice :)

-Neil

Re: CSV consecutive newline bug

From
Andrew Dunstan
Date:

Neil Conway wrote:

> Andrew Dunstan wrote:
>
>> I have just been alerted to a bug in the 8.0 handling of embedded
>> newlines in CSV data. Basically it barfs on consecutive newlines. The
>> attached patch for 8.0 appears to fix it. The bug isn't present in
>> the HEAD branch, and I'm wondering if we should not backpatch the
>> HEAD multiline patch rather than applying this.
>
>
> Is there a particular reason to backport the larger patch? As a
> general rule I'm inclined to apply minimally-invasive fixes to stable
> branches, but I don't know the code in question, so perhaps there is
> some reason to make an exception in this case.


Well, if I'd known we were as far away from a release as we turned out
to be at the time the original multiline limitation was discovered, I
would have submitted a patch for inclusion in 8.0. Never mind -
hindsight doesn't help much. Just go with the tiny patch. If anyone
wants the later fix it's very easy to get, because it was the first
patch applied after 8.0 branched. Just dropping in the later version of
copy.c should work.


>
> Also, a regression test for this bug would be nice :)
>
>

regression patch against 8.0 branch attached.

cheers

andrew


Index: expected/copy2.out
===================================================================
RCS file: /home/cvsmirror/pgsql/src/test/regress/expected/copy2.out,v
retrieving revision 1.19.4.1
diff -c -r1.19.4.1 copy2.out
*** expected/copy2.out    22 Jan 2005 05:14:19 -0000    1.19.4.1
--- expected/copy2.out    12 May 2005 01:08:20 -0000
***************
*** 191,196 ****
--- 191,199 ----
  "Jackson, Sam","\\h"
  "It is \"perfect\".","    "
  "",
+ --test that we read consecutive LFs properly
+ CREATE TEMP TABLE testnl (a int, b text, c int);
+ COPY testnl FROM stdin CSV;
  DROP TABLE x, y;
  DROP FUNCTION fn_x_before();
  DROP FUNCTION fn_x_after();
Index: sql/copy2.sql
===================================================================
RCS file: /home/cvsmirror/pgsql/src/test/regress/sql/copy2.sql,v
retrieving revision 1.10.4.1
diff -c -r1.10.4.1 copy2.sql
*** sql/copy2.sql    22 Jan 2005 05:14:25 -0000    1.10.4.1
--- sql/copy2.sql    12 May 2005 01:05:12 -0000
***************
*** 129,134 ****
--- 129,145 ----
  COPY y TO stdout WITH CSV QUOTE '''' DELIMITER '|';
  COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE '\\';

+ --test that we read consecutive LFs properly
+
+ CREATE TEMP TABLE testnl (a int, b text, c int);
+
+ COPY testnl FROM stdin CSV;
+ 1,"a field with two LFs
+
+ inside",2
+ \.
+
+
  DROP TABLE x, y;
  DROP FUNCTION fn_x_before();
  DROP FUNCTION fn_x_after();

Re: CSV consecutive newline bug

From
Neil Conway
Date:
Andrew Dunstan wrote:
> regression patch against 8.0 branch attached.

The tiny patch has been applied to REL8_0_STABLE, and the regression
test patch has been applied to both REL8_0_STABLE and HEAD. Thanks for
the patches.

-Neil