Thread: COPY incorrectly uses null instead of an empty string in last field

COPY incorrectly uses null instead of an empty string in last field

From
Oliver Elphick
Date:
Release 7.2:

Where the last field of a line contains an empty string, COPY
incorrectly inserts a NULL.  This will cause data to be loaded
incorrectly from pg_dump.  This happens because, when a newline is seen,
COPY needs to know whether there was a preceding delimiter and use an
empty string in that case; there is a difference between a last field
that is an empty string and one or more last fields that are completely
unspecified.

Changing this should not affect COPY TO or pg_dump, which already put
out \N in the last field if it really is NULL.

For example:

--
-- Selected TOC Entries:
--
\connect  -  "olly"

--
-- TOC Entry ID 2 (OID 1522144)
--
-- Name: schau Type: TABLE Owner: olly
--

CREATE TABLE "schau" ("feld1" text NOT NULL,"feld2" character varying(10) NOT NULL,Constraint "pk_schau" Primary Key
("feld1","feld2")
 
);

--
-- Data for TOC Entry ID 3 (OID 1522144)
--
-- Name: schau Type: TABLE DATA Owner: olly
--


COPY "schau" FROM stdin;
Helge    
Arne    
Alle    Keiner
\.

I tried to make a fix, but managed to break it completely :-(

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
    "But God commendeth his love toward us, in that, while      we were yet sinners, Christ died for us."
                           Romans 5:8 
 



Re: COPY incorrectly uses null instead of an empty string in last field

From
Tom Lane
Date:
Oliver Elphick <olly@lfix.co.uk> writes:
> Where the last field of a line contains an empty string, COPY
> incorrectly inserts a NULL.

Certainly not.  Using your example table definition, I did:

regression=# insert into schau values ('test','foo');
INSERT 146293 1
regression=# insert into schau values ('test2','');
INSERT 146294 1
regression=# insert into schau values ('test2',null);
ERROR:  ExecAppend: Fail to add null value in not null attribute feld2
regression=# copy schau to '/tmp/schau.out';
COPY
regression=# delete from schau;
DELETE 2
regression=# copy schau from '/tmp/schau.out';
COPY
regression=# select * from schau ;feld1 | feld2
-------+-------test  | footest2 |
(2 rows)

regression=# select * from schau where feld2 is null;feld1 | feld2
-------+-------
(0 rows)

The contents of /tmp/schau.out are:

test    foo
test2    

(there's a tab after test2, in case you can't see it ;-))

I don't see any problem here.
        regards, tom lane


Re: COPY incorrectly uses null instead of an empty string

From
Oliver Elphick
Date:
On Tue, 2002-02-26 at 19:02, Tom Lane wrote:
> > I did it this morning on getting the Debian bug report (against 7.1.3)
> > and it failed.  I repeated it just now to check after reading your reply
> > and it succeeded, using exactly the same input file (in which there is
> > indeed a tab after the first field before the empty string.)  And I was
> > certainly running 7.2 both times.
> 
> What the heck?  Something is weird there.  Maybe there is some
> additional condition needed to cause a problem.  Can you send me the bug
> report?

I will if I can make it happen again.

The only thing I can think of at the moment is that this morning I may
have used cut-and-paste to write the input file or to read in the
commands, either of which would have lost the tabs.

The original reporter's mail had no tabs at all, but it was not an
attachment and it had had lost all its tabs along the way.  His input
script came from 7.0 pg_dump.  I am following up with him whether there
are actually tabs before the empty fields in that pg_dump output.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
    "But God commendeth his love toward us, in that, while      we were yet sinners, Christ died for us."
                           Romans 5:8 
 



Re: COPY incorrectly uses null instead of an empty string in last field

From
Tom Lane
Date:
> The original reporter's mail had no tabs at all, but it was not an
> attachment and it had had lost all its tabs along the way.  His input
> script came from 7.0 pg_dump.  I am following up with him whether there
> are actually tabs before the empty fields in that pg_dump output.

If a trailing tab got lost in a dump file, then COPY IN would silently
assume that fields after the tab position should be NULL.  I suspect
that that is what happened here.  Probably some "helpful" program
stripped trailing whitespace from the file.

One of the things we've agreed to do in 7.3 is change COPY IN to remove
that assumption --- a line with too few fields (too few tabs) will draw
an error report instead of silently doing what's likely the wrong thing.
        regards, tom lane


Re: COPY incorrectly uses null instead of an empty string in last field

From
"Zeugswetter Andreas SB SD"
Date:
> One of the things we've agreed to do in 7.3 is change COPY IN to remove
> that assumption --- a line with too few fields (too few tabs) will draw
> an error report instead of silently doing what's likely the wrong thing.

But there will be new syntax for COPY, that allows missing trailing columns.
I hope.

Andreas


Re: COPY incorrectly uses null instead of an empty string in last field

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> One of the things we've agreed to do in 7.3 is change COPY IN to remove
>> that assumption --- a line with too few fields (too few tabs) will draw
>> an error report instead of silently doing what's likely the wrong thing.

> But there will be new syntax for COPY, that allows missing trailing columns. 
> I hope.

Why?
        regards, tom lane


Re: COPY incorrectly uses null instead of an empty string in last field

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> But there will be new syntax for COPY, that allows missing trailing columns. 
> I hope.
>> 
>> Why?

> Well, good question. For one for backwards compatibility.

It's an undocumented feature.  How many people are likely to be using it?

> I guess I would prefer COPY syntax that allows you to specify columns
> as has been previously discussed.

Yes, that's on the to-do list as well.  But no matter what the expected
set of columns is, COPY ought to complain if a line is missing some
fields.
        regards, tom lane


Re: COPY incorrectly uses null instead of an empty string in last field

From
"Zeugswetter Andreas SB SD"
Date:
> >> One of the things we've agreed to do in 7.3 is change COPY IN to remove
> >> that assumption --- a line with too few fields (too few tabs) will draw
> >> an error report instead of silently doing what's likely the wrong thing.
>
> > But there will be new syntax for COPY, that allows missing trailing columns.
> > I hope.
>
> Why?

Well, good question. For one for backwards compatibility.
I guess I would prefer COPY syntax that allows you to specify columns
as has been previously discussed. Having that, that would be sufficient
and safer than only a switch.

COPY 'afile' to atab (a1, a3, a5, a2)

Andreas