Thread: bug in COPY
This behavior doesn't look right: nconway=# create table foo (a int default 50, b int default 100); CREATE TABLE nconway=# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> >> \. nconway=# select * from foo;a | b ---+---0 | (1 row) (The first line of the COPY input is blank: i.e. just a newline) The problem appears to be in both 7.2.1 and current CVS. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
nconway@klamath.dyndns.org (Neil Conway) writes: > This behavior doesn't look right: It's not, but I believe the correct point of view is that the input data is defective and should be rejected. See past discussions leading up to the TODO item that mentions rejecting COPY input rows with the wrong number of fields (rather than silently filling with NULLs as we do now). A subsidiary point here is that pg_atoi() explicitly takes a zero-length string as valid input of value 0. I think this is quite bogus myself, but I don't know why that behavior was put in or whether we'd be breaking anything if we tightened it up. regards, tom lane
On Wed, Jul 24, 2002 at 04:23:56PM -0400, Tom Lane wrote: > nconway@klamath.dyndns.org (Neil Conway) writes: > > This behavior doesn't look right: > > It's not, but I believe the correct point of view is that the input > data is defective and should be rejected. See past discussions > leading up to the TODO item that mentions rejecting COPY input rows > with the wrong number of fields (rather than silently filling with > NULLs as we do now). Yeah, I was thinking that too. Now that we have column lists in COPY, there is no need to keep this functionality around: if the user wants to load data that is missing a column, they can just omit the column from the column list and have the column default inserted (which is a lot more sensible than inserting NULL). Unfortunately, I think that removing this properly will require refactoring some of the COPY code. I'll take a look at implementing this... Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
nconway@klamath.dyndns.org (Neil Conway) writes: >> leading up to the TODO item that mentions rejecting COPY input rows >> with the wrong number of fields (rather than silently filling with >> NULLs as we do now). > Yeah, I was thinking that too. Now that we have column lists in > COPY, there is no need to keep this functionality around: if the > user wants to load data that is missing a column, they can just > omit the column from the column list and have the column default > inserted (which is a lot more sensible than inserting NULL). Right, that was the thinking exactly. > Unfortunately, I think that removing this properly will require > refactoring some of the COPY code. I'll take a look at implementing > this... I thought it could be hacked in pretty easily, but if you want to clean up the structure while you're in there, go for it. regards, tom lane
Tom Lane wrote: > nconway@klamath.dyndns.org (Neil Conway) writes: > > This behavior doesn't look right: > > It's not, but I believe the correct point of view is that the input > data is defective and should be rejected. See past discussions > leading up to the TODO item that mentions rejecting COPY input rows > with the wrong number of fields (rather than silently filling with > NULLs as we do now). > > A subsidiary point here is that pg_atoi() explicitly takes a zero-length > string as valid input of value 0. I think this is quite bogus myself, > but I don't know why that behavior was put in or whether we'd be breaking > anything if we tightened it up. Yea, I found the atoi zero-length behavior when I tried to clean up the use of strtol() recently. If you remove that behavior, the regression tests fail. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > A subsidiary point here is that pg_atoi() explicitly takes a zero-length > string as valid input of value 0. I think this is quite bogus myself, > but I don't know why that behavior was put in or whether we'd be breaking > anything if we tightened it up. I have attached a patch the throws an error if pg_atoi() is passed a zero-length string, and have included regression diffs showing the effects of the patch. Seems the new code catches a few places that were bad, like defineing {} for an array of 0,0. The copy2.out change is because pg_atoi catches the problem before COPY does. The tightening up of pg_atoi seems safe and makes sense to me. If no adverse comments, I will apply and fix up the regression results. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: src/backend/utils/adt/numutils.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/utils/adt/numutils.c,v retrieving revision 1.51 diff -c -c -r1.51 numutils.c *** src/backend/utils/adt/numutils.c 16 Jul 2002 18:34:16 -0000 1.51 --- src/backend/utils/adt/numutils.c 27 Aug 2002 17:51:45 -0000 *************** *** 60,66 **** if (s == (char *) NULL) elog(ERROR, "pg_atoi: NULL pointer!"); else if (*s == 0) ! l = (long) 0; else l = strtol(s, &badp, 10); --- 60,66 ---- if (s == (char *) NULL) elog(ERROR, "pg_atoi: NULL pointer!"); else if (*s == 0) ! elog(ERROR, "pg_atoi: zero-length string!"); else l = strtol(s, &badp, 10); *** ./expected/arrays.out Thu Nov 29 16:02:41 2001 --- ./results/arrays.out Tue Aug 27 14:14:47 2002 *************** *** 16,21 **** --- 16,22 ---- -- INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g) VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}'); + ERROR: pg_atoi: zero-length string! UPDATE arrtest SET e[0] = '1.1'; UPDATE arrtest SET e[1] = '2.2'; INSERT INTO arrtest (f) *************** *** 29,39 **** VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}'); SELECT * FROM arrtest; a | b | c | d | e | f | g ! -------------+-----------------+-----------+---------------+-----------+-----------------+------------- ! {1,2,3,4,5} | {{{0,0},{1,2}}} | {} | {} | | {} | {} {11,12,23} | {{3,4},{4,5}} | {foobar} | {{elt1,elt2}} | {3.4,6.7} | {"abc ",abcde} | {abc,abcde} {} | {3,4} | {foo,bar} | {bar,foo} | | | ! (3 rows) SELECT arrtest.a[1], arrtest.b[1][1][1], --- 30,39 ---- VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}'); SELECT * FROM arrtest; a | b | c | d | e | f | g ! ------------+---------------+-----------+---------------+-----------+-----------------+------------- {11,12,23} | {{3,4},{4,5}} | {foobar} | {{elt1,elt2}} | {3.4,6.7} | {"abc ",abcde} | {abc,abcde} {} | {3,4} | {foo,bar} | {bar,foo} | | | ! (2 rows) SELECT arrtest.a[1], arrtest.b[1][1][1], *************** *** 43,61 **** FROM arrtest; a | b | c | d | e ----+---+--------+------+--- - 1 | 0 | | | 11 | | foobar | elt1 | | | foo | | ! (3 rows) SELECT a[1], b[1][1][1], c[1], d[1][1], e[0] FROM arrtest; a | b | c | d | e ----+---+--------+------+--- - 1 | 0 | | | 11 | | foobar | elt1 | | | foo | | ! (3 rows) SELECT a[1:3], b[1:1][1:2][1:2], --- 43,59 ---- FROM arrtest; a | b | c | d | e ----+---+--------+------+--- 11 | | foobar | elt1 | | | foo | | ! (2 rows) SELECT a[1], b[1][1][1], c[1], d[1][1], e[0] FROM arrtest; a | b | c | d | e ----+---+--------+------+--- 11 | | foobar | elt1 | | | foo | | ! (2 rows) SELECT a[1:3], b[1:1][1:2][1:2], *************** *** 63,82 **** d[1:1][1:2] FROM arrtest; a | b | c | d ! ------------+-----------------+-----------+--------------- ! {1,2,3} | {{{0,0},{1,2}}} | | {11,12,23} | | {foobar} | {{elt1,elt2}} | | {foo,bar} | ! (3 rows) SELECT array_dims(a) AS a,array_dims(b) AS b,array_dims(c) AS c FROM arrtest; a | b | c ! -------+-----------------+------- ! [1:5] | [1:1][1:2][1:2] | [1:3] | [1:2][1:2] | [1:1] | [1:2] | [1:2] ! (3 rows) -- returns nothing SELECT * --- 61,78 ---- d[1:1][1:2] FROM arrtest; a | b | c | d ! ------------+---+-----------+--------------- {11,12,23} | | {foobar} | {{elt1,elt2}} | | {foo,bar} | ! (2 rows) SELECT array_dims(a) AS a,array_dims(b) AS b,array_dims(c) AS c FROM arrtest; a | b | c ! -------+------------+------- [1:3] | [1:2][1:2] | [1:1] | [1:2] | [1:2] ! (2 rows) -- returns nothing SELECT * *************** *** 99,109 **** WHERE array_dims(c) is not null; SELECT a,b,c FROM arrtest; a | b | c ! ---------------+-----------------------+------------------- ! {16,25,3,4,5} | {{{113,142},{1,147}}} | {} {} | {3,4} | {foo,new_word} {16,25,23} | {{3,4},{4,5}} | {foobar,new_word} ! (3 rows) SELECT a[1:3], b[1:1][1:2][1:2], --- 95,104 ---- WHERE array_dims(c) is not null; SELECT a,b,c FROM arrtest; a | b | c ! ------------+---------------+------------------- {} | {3,4} | {foo,new_word} {16,25,23} | {{3,4},{4,5}} | {foobar,new_word} ! (2 rows) SELECT a[1:3], b[1:1][1:2][1:2], *************** *** 111,119 **** d[1:1][2:2] FROM arrtest; a | b | c | d ! ------------+-----------------------+-------------------+---------- ! {16,25,3} | {{{113,142},{1,147}}} | | | | {foo,new_word} | {16,25,23} | | {foobar,new_word} | {{elt2}} ! (3 rows) --- 106,113 ---- d[1:1][2:2] FROM arrtest; a | b | c | d ! ------------+---+-------------------+---------- | | {foo,new_word} | {16,25,23} | | {foobar,new_word} | {{elt2}} ! (2 rows) ====================================================================== *** ./expected/copy2.out Wed Aug 21 22:25:28 2002 --- ./results/copy2.out Tue Aug 27 14:15:50 2002 *************** *** 34,40 **** ERROR: Attribute "d" specified more than once -- missing data: should fail COPY x from stdin; ! ERROR: copy: line 1, Missing data for column "b" lost synchronization with server, resetting connection COPY x from stdin; ERROR: copy: line 1, Missing data for column "e" --- 34,40 ---- ERROR: Attribute "d" specified more than once -- missing data: should fail COPY x from stdin; ! ERROR: copy: line 1, pg_atoi: zero-length string! lost synchronization with server, resetting connection COPY x from stdin; ERROR: copy: line 1, Missing data for column "e" ======================================================================
Bruce Momjian <pgman@candle.pha.pa.us> writes: > else if (*s == 0) > ! elog(ERROR, "pg_atoi: zero-length string!"); The exclamation point seems inappropriate. Perhaps "zero-length input" would be better than "string" also. Otherwise this seems like a reasonable thing to do. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > else if (*s == 0) > > ! elog(ERROR, "pg_atoi: zero-length string!"); > > The exclamation point seems inappropriate. Perhaps "zero-length input" > would be better than "string" also. I copied the other test case: if (s == (char *) NULL) elog(ERROR, "pg_atoi: NULL pointer!"); I removed them both '!'. > Otherwise this seems like a reasonable thing to do. OK, will apply now. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> The exclamation point seems inappropriate. Perhaps "zero-length input" >> would be better than "string" also. > I copied the other test case: > if (s == (char *) NULL) > elog(ERROR, "pg_atoi: NULL pointer!"); Well, the NULL-pointer test might equally well be coded as an Assert: it's to catch backend coding errors, not cases of incorrect user input. So the exclamation point there didn't bother me. > I removed them both '!'. If you like. But the two conditions are not comparable. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> The exclamation point seems inappropriate. Perhaps "zero-length input" > >> would be better than "string" also. > > > I copied the other test case: > > > if (s == (char *) NULL) > > elog(ERROR, "pg_atoi: NULL pointer!"); > > Well, the NULL-pointer test might equally well be coded as an Assert: > it's to catch backend coding errors, not cases of incorrect user input. > So the exclamation point there didn't bother me. You really think it should be Assert? I don't mind opening the chance to throw an error for bad input, but are you sure that a NULL should never get there? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Patch applied. --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> The exclamation point seems inappropriate. Perhaps "zero-length input" > >> would be better than "string" also. > > > I copied the other test case: > > > if (s == (char *) NULL) > > elog(ERROR, "pg_atoi: NULL pointer!"); > > Well, the NULL-pointer test might equally well be coded as an Assert: > it's to catch backend coding errors, not cases of incorrect user input. > So the exclamation point there didn't bother me. > > > I removed them both '!'. > > If you like. But the two conditions are not comparable. > > regards, tom lane > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: src/backend/utils/adt/numutils.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/utils/adt/numutils.c,v retrieving revision 1.51 diff -c -c -r1.51 numutils.c *** src/backend/utils/adt/numutils.c 16 Jul 2002 18:34:16 -0000 1.51 --- src/backend/utils/adt/numutils.c 27 Aug 2002 20:28:41 -0000 *************** *** 58,66 **** */ if (s == (char *) NULL) ! elog(ERROR, "pg_atoi: NULL pointer!"); else if (*s == 0) ! l = (long) 0; else l = strtol(s, &badp, 10); --- 58,66 ---- */ if (s == (char *) NULL) ! elog(ERROR, "pg_atoi: NULL pointer"); else if (*s == 0) ! elog(ERROR, "pg_atoi: zero-length string"); else l = strtol(s, &badp, 10); Index: src/test/regress/expected/arrays.out =================================================================== RCS file: /cvsroot/pgsql-server/src/test/regress/expected/arrays.out,v retrieving revision 1.9 diff -c -c -r1.9 arrays.out *** src/test/regress/expected/arrays.out 29 Nov 2001 21:02:41 -0000 1.9 --- src/test/regress/expected/arrays.out 27 Aug 2002 20:28:45 -0000 *************** *** 15,21 **** -- 'e' is also a large object. -- INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g) ! VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}'); UPDATE arrtest SET e[0] = '1.1'; UPDATE arrtest SET e[1] = '2.2'; INSERT INTO arrtest (f) --- 15,21 ---- -- 'e' is also a large object. -- INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g) ! VALUES ('{1,2,3,4,5}', '{{{0,0},{1,2}}}', '{}', '{}', '{}', '{}'); UPDATE arrtest SET e[0] = '1.1'; UPDATE arrtest SET e[1] = '2.2'; INSERT INTO arrtest (f) Index: src/test/regress/expected/copy2.out =================================================================== RCS file: /cvsroot/pgsql-server/src/test/regress/expected/copy2.out,v retrieving revision 1.6 diff -c -c -r1.6 copy2.out *** src/test/regress/expected/copy2.out 22 Aug 2002 00:01:51 -0000 1.6 --- src/test/regress/expected/copy2.out 27 Aug 2002 20:28:45 -0000 *************** *** 34,40 **** ERROR: Attribute "d" specified more than once -- missing data: should fail COPY x from stdin; ! ERROR: copy: line 1, Missing data for column "b" lost synchronization with server, resetting connection COPY x from stdin; ERROR: copy: line 1, Missing data for column "e" --- 34,40 ---- ERROR: Attribute "d" specified more than once -- missing data: should fail COPY x from stdin; ! ERROR: copy: line 1, pg_atoi: zero-length string lost synchronization with server, resetting connection COPY x from stdin; ERROR: copy: line 1, Missing data for column "e" Index: src/test/regress/sql/arrays.sql =================================================================== RCS file: /cvsroot/pgsql-server/src/test/regress/sql/arrays.sql,v retrieving revision 1.8 diff -c -c -r1.8 arrays.sql *** src/test/regress/sql/arrays.sql 21 May 2001 16:54:46 -0000 1.8 --- src/test/regress/sql/arrays.sql 27 Aug 2002 20:28:45 -0000 *************** *** 18,24 **** -- INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g) ! VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}'); UPDATE arrtest SET e[0] = '1.1'; --- 18,24 ---- -- INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g) ! VALUES ('{1,2,3,4,5}', '{{{0,0},{1,2}}}', '{}', '{}', '{}', '{}'); UPDATE arrtest SET e[0] = '1.1';
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> Well, the NULL-pointer test might equally well be coded as an Assert: > You really think it should be Assert? I don't feel a need to change it, no. I was just pointing out that there shouldn't be any way for a user to cause that condition to occur. regards, tom lane