Re: Updated COPY CSV patch - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Updated COPY CSV patch
Date
Msg-id 200404141741.i3EHfm616546@candle.pha.pa.us
Whole thread Raw
In response to Re: Updated COPY CSV patch  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Updated COPY CSV patch  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-patches
Andrew Dunstan wrote:
> Again, I think this will break that property. But if that's what it
> takes to be able to import to a table with NOT NULL in at least some
> cases I could live with it. Just. But in the general case it won't work.
> Say you are importing into a table with the following defn: (a text, b
> text not null, c int). then the line 'x,,' will fail on b if '' is null,
> and will fail on c if '' is empty string. And yet this sort of output is
> exactly what is to be expected from a spreadsheet.

I thought about this case.  I think the big objection to the original
idea was trying to tie the CSV behavior to specific types.  With a
type-agnostic system like PostgreSQL, doing that will certainly fail to
catch all cases and be a maintenance nightmare.

However, your idea of testing pg_attribute.attnotnull is acceptable, I
think.  It isn't type-specific.

One way of solving the above issue is to allow comma-comma to promote to
a zero-length string if the column is NOT NULL.  One idea would be to
print a warning the first time a NOT NULL column sees a comma-comma, and
state that it will be changed to a zero-length string.  You could use
STRICT to disable that and throw an error.

I don't see why this has to be type-specific.  If a zero-length string
doesn't match an int column, you throw an error. If you see comma-comma
for an INT NOT NULL column, you have to throw an error.  There is no
proper value for the column.

You might also want to do that anyway in some cases even if the column
isn't NOT NULL.  This is where your FORCE col1, col2 came from, where
you can specify which columns should be NULL and which zero-length
strings.

I guess the issue is can we get a single specification for the entire
table, or do we need per-column specifications?

Basically, we are trying to over-load the CSV system to store null
information by using the distinction of whether a column has quote-quote
or nothing.

Let's look at your example again:

    CREATE TABLE test (col1 TEXT NOT NULL, col2 INT)

and you have a CSV input file of:

    ,

If we say both are NULL, it fails, and if we say neither is NULL, it
fails.

Do we need control for each column?  What if we go with preferring NULL
for comma-comma, and then print warnings for NOT NULL columns and try
the promote.  If you want comma-comma to be zero-length string, you can
create the column with NOT NULL, load the file, then ALTER TABLE to
allow NULL's again.  Basically, the NOT NULL specification on the column
is the COPY CSV control method, rather than having it be in COPY.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Updated COPY CSV patch
Next
From: Andrew Dunstan
Date:
Subject: Re: Updated COPY CSV patch