Re: Updated COPY CSV patch - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Updated COPY CSV patch
Date
Msg-id 200404141913.i3EJDNi02660@candle.pha.pa.us
In response to Re: Updated COPY CSV patch  (Andrew Dunstan <andrew@dunslane.net>)
Responses Plan for CSV handling of quotes, NULL
List pgsql-patches
Andrew Dunstan wrote:
> Bruce Momjian wrote:
>
> >
> >Do we need control for each column?  What if we go with preferring NULL
> >for comma-comma, and then print warnings for NOT NULL columns and try
> >to promote.  If you want comma-comma to be a zero-length string, you can
> >create the column with NOT NULL, load the file, then ALTER TABLE to
> >allow NULLs again.  Basically, the NOT NULL specification on the column
> >is the COPY CSV control method, rather than having it be in COPY.
> >
> >
> >
>
>
> If we can't do type specific stuff then we need to be able to have
> column-specific controls on export, at least.
>
> Consider a text column containing US 5-digit ZIP codes. If they are not
> quoted, a spreadsheet will almost certainly not preserve the leading
> zero some of them have, producing very undesirable results. However, a
> genuine numeric-type field must not be quoted, or the same spreadsheet
> won't see that value as a number. Unless we do stuff based on type, we
> have no way of knowing from the text representation of the data what we
> really need. Thus my proposal from this morning for column-specific user
> control over this aspect. And if we are going to have per column user
> control on export, why not on import too, to handle the NOT NULL
> problem? It might make life easier for us code-wise than chasing down
> nullability (e.g. in domains).

Wow, that is certainly an excellent point.  When we import, we know the
resulting data type, but spreadsheets don't, and rely on the quoting to
know what to do with the value.

The ZIP code is an excellent example.  You can't even decide per value
by testing for a leading zero, because then some spreadsheet values in
the column would be text and some numeric.
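
To make the export side concrete, here is a rough sketch in Python, not
the patch itself.  The names format_field, to_csv_line and
quoted_columns are made up for illustration.  It assumes the user names
the columns that must always be quoted, and everything else is quoted
only when the value contains a comma, quote or newline:

```python
def format_field(value, force_quote):
    """Render one value as a CSV field (hypothetical helper)."""
    text = "" if value is None else str(value)
    # Quote when the user asked for it, or when the text would
    # otherwise break the CSV structure.
    needs_quote = force_quote or any(c in text for c in ',"\n\r')
    if needs_quote:
        return '"' + text.replace('"', '""') + '"'
    return text


def to_csv_line(row, columns, quoted_columns):
    """Join one row into a CSV line, forcing quotes on chosen columns."""
    return ",".join(
        format_field(row[name], name in quoted_columns) for name in columns
    )


# ZIP codes are forced into quotes; the numeric amount is left bare.
line = to_csv_line({"zip": "01234", "amount": 42}, ["zip", "amount"], {"zip"})
```

With that choice every value in the zip column is quoted, whether or
not it starts with a zero, so the spreadsheet sees the whole column as
text.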

We do have a column list capability with COPY already:

       COPY tablename [ ( column [, ...] ) ]
           FROM { 'filename' | STDIN }

Maybe we should extend that to control quoting on export and NULL
handling on import.  Does that solve our problems?
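
For the import side, a similar sketch, again in Python and again with
made-up names (split_csv_line, load_row, not_null_columns).  It assumes
an unquoted empty field (comma-comma) means NULL unless the user lists
the column as not-null, in which case it becomes a zero-length string.
It handles only a single line, with no embedded newlines:

```python
def split_csv_line(line):
    """Split one CSV line into (text, was_quoted) pairs."""
    fields = []
    buf = []
    quoted = False     # did the current field contain quotes?
    in_quotes = False  # are we currently inside a quoted section?
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf.append('"')  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
            quoted = True
        elif ch == ",":
            fields.append(("".join(buf), quoted))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(("".join(buf), quoted))
    return fields


def load_row(line, columns, not_null_columns):
    """Map a CSV line to a dict, using None to stand in for NULL."""
    row = {}
    for name, (text, was_quoted) in zip(columns, split_csv_line(line)):
        if text == "" and not was_quoted and name not in not_null_columns:
            row[name] = None  # comma-comma on a nullable column
        else:
            row[name] = text  # real text, quoted "", or a not-null column
    return row


row = load_row("01234,,", ["zip", "note", "city"], {"city"})
```

Here note comes back as NULL (None) and city as a zero-length string,
even though both were comma-comma in the file.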

FYI, do you have IM?  I am:

    AIM    bmomjian
    ICQ    151255111
    Yahoo    bmomjian
    MSN    root@candle.pha.pa.us
    IRC    bmomjian via FreeNode or EFNet

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
