Thread: bug in COPY

bug in COPY

From
nconway@klamath.dyndns.org (Neil Conway)
Date:
This behavior doesn't look right:

nconway=# create table foo (a int default 50, b int default 100);
CREATE TABLE
nconway=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 
>> \.
nconway=# select * from foo;a | b 
---+---0 |  
(1 row)

(The first line of the COPY input is blank: i.e. just a newline)

The problem appears to be in both 7.2.1 and current CVS.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


Re: bug in COPY

From
Tom Lane
Date:
nconway@klamath.dyndns.org (Neil Conway) writes:
> This behavior doesn't look right:

It's not, but I believe the correct point of view is that the input
data is defective and should be rejected.  See past discussions
leading up to the TODO item that mentions rejecting COPY input rows
with the wrong number of fields (rather than silently filling with
NULLs as we do now).

A subsidiary point here is that pg_atoi() explicitly takes a zero-length
string as valid input of value 0.  I think this is quite bogus myself,
but I don't know why that behavior was put in or whether we'd be breaking
anything if we tightened it up.
        regards, tom lane


Re: bug in COPY

From
nconway@klamath.dyndns.org (Neil Conway)
Date:
On Wed, Jul 24, 2002 at 04:23:56PM -0400, Tom Lane wrote:
> nconway@klamath.dyndns.org (Neil Conway) writes:
> > This behavior doesn't look right:
> 
> It's not, but I believe the correct point of view is that the input
> data is defective and should be rejected.  See past discussions
> leading up to the TODO item that mentions rejecting COPY input rows
> with the wrong number of fields (rather than silently filling with
> NULLs as we do now).

Yeah, I was thinking that too. Now that we have column lists in
COPY, there is no need to keep this functionality around: if the
user wants to load data that is missing a column, they can just
omit the column from the column list and have the column default
inserted (which is a lot more sensible than inserting NULL).

Unfortunately, I think that removing this properly will require
refactoring some of the COPY code. I'll take a look at implementing
this...

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


Re: bug in COPY

From
Tom Lane
Date:
nconway@klamath.dyndns.org (Neil Conway) writes:
>> leading up to the TODO item that mentions rejecting COPY input rows
>> with the wrong number of fields (rather than silently filling with
>> NULLs as we do now).

> Yeah, I was thinking that too. Now that we have column lists in
> COPY, there is no need to keep this functionality around: if the
> user wants to load data that is missing a column, they can just
> omit the column from the column list and have the column default
> inserted (which is a lot more sensible than inserting NULL).

Right, that was the thinking exactly.

> Unfortunately, I think that removing this properly will require
> refactoring some of the COPY code. I'll take a look at implementing
> this...

I thought it could be hacked in pretty easily, but if you want to
clean up the structure while you're in there, go for it.
        regards, tom lane


Re: bug in COPY

From
Bruce Momjian
Date:
Tom Lane wrote:
> nconway@klamath.dyndns.org (Neil Conway) writes:
> > This behavior doesn't look right:
> 
> It's not, but I believe the correct point of view is that the input
> data is defective and should be rejected.  See past discussions
> leading up to the TODO item that mentions rejecting COPY input rows
> with the wrong number of fields (rather than silently filling with
> NULLs as we do now).
> 
> A subsidiary point here is that pg_atoi() explicitly takes a zero-length
> string as valid input of value 0.  I think this is quite bogus myself,
> but I don't know why that behavior was put in or whether we'd be breaking
> anything if we tightened it up.

Yea, I found the atoi zero-length behavior when I tried to clean up the
use of strtol() recently.  If you remove that behavior, the regression
tests fail.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: bug in COPY

From
Bruce Momjian
Date:
Tom Lane wrote:
> A subsidiary point here is that pg_atoi() explicitly takes a zero-length
> string as valid input of value 0.  I think this is quite bogus myself,
> but I don't know why that behavior was put in or whether we'd be breaking
> anything if we tightened it up.

I have attached a patch the throws an error if pg_atoi() is passed a
zero-length string, and have included regression diffs showing the
effects of the patch.

Seems the new code catches a few places that were bad, like defineing {}
for an array of 0,0.  The copy2.out change is because pg_atoi catches
the problem before COPY does.

The tightening up of pg_atoi seems safe and makes sense to me.

If no adverse comments, I will apply and fix up the regression results.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/adt/numutils.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/adt/numutils.c,v
retrieving revision 1.51
diff -c -c -r1.51 numutils.c
*** src/backend/utils/adt/numutils.c    16 Jul 2002 18:34:16 -0000    1.51
--- src/backend/utils/adt/numutils.c    27 Aug 2002 17:51:45 -0000
***************
*** 60,66 ****
      if (s == (char *) NULL)
          elog(ERROR, "pg_atoi: NULL pointer!");
      else if (*s == 0)
!         l = (long) 0;
      else
          l = strtol(s, &badp, 10);

--- 60,66 ----
      if (s == (char *) NULL)
          elog(ERROR, "pg_atoi: NULL pointer!");
      else if (*s == 0)
!         elog(ERROR, "pg_atoi: zero-length string!");
      else
          l = strtol(s, &badp, 10);

*** ./expected/arrays.out    Thu Nov 29 16:02:41 2001
--- ./results/arrays.out    Tue Aug 27 14:14:47 2002
***************
*** 16,21 ****
--- 16,22 ----
  --
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
     VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}');
+ ERROR:  pg_atoi: zero-length string!
  UPDATE arrtest SET e[0] = '1.1';
  UPDATE arrtest SET e[1] = '2.2';
  INSERT INTO arrtest (f)
***************
*** 29,39 ****
     VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}');
  SELECT * FROM arrtest;
        a      |        b        |     c     |       d       |     e     |        f        |      g
! -------------+-----------------+-----------+---------------+-----------+-----------------+-------------
!  {1,2,3,4,5} | {{{0,0},{1,2}}} | {}        | {}            |           | {}              | {}
   {11,12,23}  | {{3,4},{4,5}}   | {foobar}  | {{elt1,elt2}} | {3.4,6.7} | {"abc  ",abcde} | {abc,abcde}
   {}          | {3,4}           | {foo,bar} | {bar,foo}     |           |                 |
! (3 rows)

  SELECT arrtest.a[1],
            arrtest.b[1][1][1],
--- 30,39 ----
     VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}');
  SELECT * FROM arrtest;
       a      |       b       |     c     |       d       |     e     |        f        |      g
! ------------+---------------+-----------+---------------+-----------+-----------------+-------------
   {11,12,23} | {{3,4},{4,5}} | {foobar}  | {{elt1,elt2}} | {3.4,6.7} | {"abc  ",abcde} | {abc,abcde}
   {}         | {3,4}         | {foo,bar} | {bar,foo}     |           |                 |
! (2 rows)

  SELECT arrtest.a[1],
            arrtest.b[1][1][1],
***************
*** 43,61 ****
     FROM arrtest;
   a  | b |   c    |  d   | e
  ----+---+--------+------+---
-   1 | 0 |        |      |
   11 |   | foobar | elt1 |
      |   | foo    |      |
! (3 rows)

  SELECT a[1], b[1][1][1], c[1], d[1][1], e[0]
     FROM arrtest;
   a  | b |   c    |  d   | e
  ----+---+--------+------+---
-   1 | 0 |        |      |
   11 |   | foobar | elt1 |
      |   | foo    |      |
! (3 rows)

  SELECT a[1:3],
            b[1:1][1:2][1:2],
--- 43,59 ----
     FROM arrtest;
   a  | b |   c    |  d   | e
  ----+---+--------+------+---
   11 |   | foobar | elt1 |
      |   | foo    |      |
! (2 rows)

  SELECT a[1], b[1][1][1], c[1], d[1][1], e[0]
     FROM arrtest;
   a  | b |   c    |  d   | e
  ----+---+--------+------+---
   11 |   | foobar | elt1 |
      |   | foo    |      |
! (2 rows)

  SELECT a[1:3],
            b[1:1][1:2][1:2],
***************
*** 63,82 ****
            d[1:1][1:2]
     FROM arrtest;
       a      |        b        |     c     |       d
! ------------+-----------------+-----------+---------------
!  {1,2,3}    | {{{0,0},{1,2}}} |           |
   {11,12,23} |                 | {foobar}  | {{elt1,elt2}}
              |                 | {foo,bar} |
! (3 rows)

  SELECT array_dims(a) AS a,array_dims(b) AS b,array_dims(c) AS c
     FROM arrtest;
     a   |        b        |   c
! -------+-----------------+-------
!  [1:5] | [1:1][1:2][1:2] |
   [1:3] | [1:2][1:2]      | [1:1]
         | [1:2]           | [1:2]
! (3 rows)

  -- returns nothing
  SELECT *
--- 61,78 ----
            d[1:1][1:2]
     FROM arrtest;
       a      | b |     c     |       d
! ------------+---+-----------+---------------
   {11,12,23} |   | {foobar}  | {{elt1,elt2}}
              |   | {foo,bar} |
! (2 rows)

  SELECT array_dims(a) AS a,array_dims(b) AS b,array_dims(c) AS c
     FROM arrtest;
     a   |     b      |   c
! -------+------------+-------
   [1:3] | [1:2][1:2] | [1:1]
         | [1:2]      | [1:2]
! (2 rows)

  -- returns nothing
  SELECT *
***************
*** 99,109 ****
    WHERE array_dims(c) is not null;
  SELECT a,b,c FROM arrtest;
         a       |           b           |         c
! ---------------+-----------------------+-------------------
!  {16,25,3,4,5} | {{{113,142},{1,147}}} | {}
   {}            | {3,4}                 | {foo,new_word}
   {16,25,23}    | {{3,4},{4,5}}         | {foobar,new_word}
! (3 rows)

  SELECT a[1:3],
            b[1:1][1:2][1:2],
--- 95,104 ----
    WHERE array_dims(c) is not null;
  SELECT a,b,c FROM arrtest;
       a      |       b       |         c
! ------------+---------------+-------------------
   {}         | {3,4}         | {foo,new_word}
   {16,25,23} | {{3,4},{4,5}} | {foobar,new_word}
! (2 rows)

  SELECT a[1:3],
            b[1:1][1:2][1:2],
***************
*** 111,119 ****
            d[1:1][2:2]
     FROM arrtest;
       a      |           b           |         c         |    d
! ------------+-----------------------+-------------------+----------
!  {16,25,3}  | {{{113,142},{1,147}}} |                   |
              |                       | {foo,new_word}    |
   {16,25,23} |                       | {foobar,new_word} | {{elt2}}
! (3 rows)

--- 106,113 ----
            d[1:1][2:2]
     FROM arrtest;
       a      | b |         c         |    d
! ------------+---+-------------------+----------
              |   | {foo,new_word}    |
   {16,25,23} |   | {foobar,new_word} | {{elt2}}
! (2 rows)


======================================================================

*** ./expected/copy2.out    Wed Aug 21 22:25:28 2002
--- ./results/copy2.out    Tue Aug 27 14:15:50 2002
***************
*** 34,40 ****
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, Missing data for column "b"
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"
--- 34,40 ----
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, pg_atoi: zero-length string!
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"

======================================================================


Re: bug in COPY

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>       else if (*s == 0)
> !         elog(ERROR, "pg_atoi: zero-length string!");

The exclamation point seems inappropriate.  Perhaps "zero-length input"
would be better than "string" also.

Otherwise this seems like a reasonable thing to do.
        regards, tom lane


Re: bug in COPY

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >       else if (*s == 0)
> > !         elog(ERROR, "pg_atoi: zero-length string!");
> 
> The exclamation point seems inappropriate.  Perhaps "zero-length input"
> would be better than "string" also.

I copied the other test case:
   if (s == (char *) NULL)       elog(ERROR, "pg_atoi: NULL pointer!"); 

I removed them both '!'.

> Otherwise this seems like a reasonable thing to do.

OK, will apply now.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: bug in COPY

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> The exclamation point seems inappropriate.  Perhaps "zero-length input"
>> would be better than "string" also.

> I copied the other test case:

>     if (s == (char *) NULL)
>         elog(ERROR, "pg_atoi: NULL pointer!"); 

Well, the NULL-pointer test might equally well be coded as an Assert:
it's to catch backend coding errors, not cases of incorrect user input.
So the exclamation point there didn't bother me.

> I removed them both '!'.

If you like.  But the two conditions are not comparable.
        regards, tom lane


Re: bug in COPY

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> The exclamation point seems inappropriate.  Perhaps "zero-length input"
> >> would be better than "string" also.
> 
> > I copied the other test case:
> 
> >     if (s == (char *) NULL)
> >         elog(ERROR, "pg_atoi: NULL pointer!"); 
> 
> Well, the NULL-pointer test might equally well be coded as an Assert:
> it's to catch backend coding errors, not cases of incorrect user input.
> So the exclamation point there didn't bother me.


You really think it should be Assert?  I don't mind opening the chance
to throw an error for bad input, but are you sure that a NULL should
never get there?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: bug in COPY

From
Bruce Momjian
Date:
Patch applied.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> The exclamation point seems inappropriate.  Perhaps "zero-length input"
> >> would be better than "string" also.
>
> > I copied the other test case:
>
> >     if (s == (char *) NULL)
> >         elog(ERROR, "pg_atoi: NULL pointer!");
>
> Well, the NULL-pointer test might equally well be coded as an Assert:
> it's to catch backend coding errors, not cases of incorrect user input.
> So the exclamation point there didn't bother me.
>
> > I removed them both '!'.
>
> If you like.  But the two conditions are not comparable.
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/adt/numutils.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/adt/numutils.c,v
retrieving revision 1.51
diff -c -c -r1.51 numutils.c
*** src/backend/utils/adt/numutils.c    16 Jul 2002 18:34:16 -0000    1.51
--- src/backend/utils/adt/numutils.c    27 Aug 2002 20:28:41 -0000
***************
*** 58,66 ****
       */

      if (s == (char *) NULL)
!         elog(ERROR, "pg_atoi: NULL pointer!");
      else if (*s == 0)
!         l = (long) 0;
      else
          l = strtol(s, &badp, 10);

--- 58,66 ----
       */

      if (s == (char *) NULL)
!         elog(ERROR, "pg_atoi: NULL pointer");
      else if (*s == 0)
!         elog(ERROR, "pg_atoi: zero-length string");
      else
          l = strtol(s, &badp, 10);

Index: src/test/regress/expected/arrays.out
===================================================================
RCS file: /cvsroot/pgsql-server/src/test/regress/expected/arrays.out,v
retrieving revision 1.9
diff -c -c -r1.9 arrays.out
*** src/test/regress/expected/arrays.out    29 Nov 2001 21:02:41 -0000    1.9
--- src/test/regress/expected/arrays.out    27 Aug 2002 20:28:45 -0000
***************
*** 15,21 ****
  -- 'e' is also a large object.
  --
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}');
  UPDATE arrtest SET e[0] = '1.1';
  UPDATE arrtest SET e[1] = '2.2';
  INSERT INTO arrtest (f)
--- 15,21 ----
  -- 'e' is also a large object.
  --
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{0,0},{1,2}}}', '{}', '{}', '{}', '{}');
  UPDATE arrtest SET e[0] = '1.1';
  UPDATE arrtest SET e[1] = '2.2';
  INSERT INTO arrtest (f)
Index: src/test/regress/expected/copy2.out
===================================================================
RCS file: /cvsroot/pgsql-server/src/test/regress/expected/copy2.out,v
retrieving revision 1.6
diff -c -c -r1.6 copy2.out
*** src/test/regress/expected/copy2.out    22 Aug 2002 00:01:51 -0000    1.6
--- src/test/regress/expected/copy2.out    27 Aug 2002 20:28:45 -0000
***************
*** 34,40 ****
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, Missing data for column "b"
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"
--- 34,40 ----
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, pg_atoi: zero-length string
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"
Index: src/test/regress/sql/arrays.sql
===================================================================
RCS file: /cvsroot/pgsql-server/src/test/regress/sql/arrays.sql,v
retrieving revision 1.8
diff -c -c -r1.8 arrays.sql
*** src/test/regress/sql/arrays.sql    21 May 2001 16:54:46 -0000    1.8
--- src/test/regress/sql/arrays.sql    27 Aug 2002 20:28:45 -0000
***************
*** 18,24 ****
  --

  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}');

  UPDATE arrtest SET e[0] = '1.1';

--- 18,24 ----
  --

  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{0,0},{1,2}}}', '{}', '{}', '{}', '{}');

  UPDATE arrtest SET e[0] = '1.1';


Re: bug in COPY

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Well, the NULL-pointer test might equally well be coded as an Assert:

> You really think it should be Assert?

I don't feel a need to change it, no.  I was just pointing out that
there shouldn't be any way for a user to cause that condition to occur.
        regards, tom lane