Re: Updated COPY CSV patch - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: Updated COPY CSV patch |
Date | |
Msg-id | 200404141741.i3EHfm616546@candle.pha.pa.us Whole thread Raw |
In response to | Re: Updated COPY CSV patch (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: Updated COPY CSV patch
|
List | pgsql-patches |
Andrew Dunstan wrote: > Again, I think this will break that property. But if that's what it > takes to be able to import to a table with NOT NULL in at least some > cases I could live with it. Just. But in the general case it won't work. > Say you are importing into a table with the following defn: (a text, b > text not null, c int). then the line 'x,,' will fail on b if '' is null, > and will fail on c if '' is empty string. And yet this sort of output is > exactly what is to be expected from a spreadsheet. I thought about this case. I think the big objection to the original idea was trying to tie the CSV behavior to specific types. With a type-agnostic system like PostgreSQL, doing that will certainly fail to catch all cases and be a maintenance nightmare. However, your idea of testing pg_attribute.attnotnull is acceptable, I think. It isn't type-specific. One way of solving the above issue is to allow comma-comma to promote to a zero-length string if the column is NOT NULL. One idea would be to print a warning the first time a NOT NULL column sees a comma-comma, and state that it will be changed to a zero-length string. You could use STRICT to disable that and throw an error. I don't see why this has to be type-specific. If a zero-length string doesn't match an int column, you throw an error. If you see comma-comma for an INT NOT NULL column, you have to throw an error. There is no proper value for the column. You might also want to do that anyway in some cases even if the column isn't NOT NULL. This is where your FORCE col1, col2 came from, where you can specify which columns should be NULL and which zero-length strings. I guess the issue is can we get a single specification for the entire table, or do we need per-column specifications? Basically, we are trying to over-load the CSV system to store null information by using the distinction of whether a column has quote-quote or nothing. Let's look at your example again: CREATE TABLE test (col1 TEXT NOT NULL, col2 INT) and you have a CSV input file of: , If we say both are NULL, it fails, and if we say neither is NULL, it fails. Do we need control for each column? What if we go with preferring NULL for comma-comma, and then print warnings for NOT NULL columns and try the promote. If you want comma-comma to be zero-length string, you can create the column with NOT NULL, load the file, then ALTER TABLE to allow NULL's again. Basically, the NOT NULL specification on the column is the COPY CSV control method, rather than having it be in COPY. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-patches by date: