Re: Problem with COPY CSV - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Problem with COPY CSV
Date
Msg-id 438E7A24.9020106@dunslane.net
Whole thread Raw
In response to Problem with COPY CSV  (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
List pgsql-hackers

Christopher Kings-Lynne wrote:

> Attached is a small test extract from the USDA nutrient database.
>
> The problem is that the script won't load the COPY data correctly.  
> This is with CVS HEAD (and 8.1).
>
> It is the 4th column in the table that gives the problem (nutr_no 
> integer).  Each of the 3 COPY rows has a different way of specifying a 
> number:
>
> * As an unquoted empty value
> * As a quoted number
> * As a quoted empty value
>
> Now, I'm specifying "NULL AS ''" in the COPY command but it refuses to 
> recognize the quoted empty value as NULL.
>
> Anyone have any ideas?  Is this a bug?  The manual even says that 
> using "NULL AS ''" lets you make no distinction between empty and 
> quoted empty values.


It is not a bug. It is working as designed and as documented. The rule 
is basically that a null value is never quoted and a quoted value is 
never null. The docs say:

"The CSV format has no standard way to distinguish a NULL value from an 
empty string. PostgreSQL's COPY handles this by quoting. A NULL is 
output as the NULL string and is not quoted, while a data value matching 
the NULL string is quoted. Therefore, using the default settings, a NULL 
is written as an unquoted empty string, while an empty string is written 
with double quotes (""). Reading values follows similar rules. You can 
use FORCE NOT NULL to prevent NULL input comparisons for specific columns."

The corollary of this is that you can't make a quoted value be accepted 
as null (FORCE NOT NULL does the converse, i.e. it makes an unquoted 
value be considered as not null).

The handling of nulls in CSV mode was discussed in almost nauseous 
detail at the time we did CSV about 18 months ago, so there should be no 
surprise here.

For data as irregular as this I suggest that you import it into a text 
field and then update the value of that field to NULL where it's empty - 
you could then do ALTER TYPE ... USING ... . The other possibility would 
be to preprocess the data.

I at least am not inclined to tinker too much more with CSV mode - we 
could end up catering for every weird output format in the world if 
we're not careful.

cheers

andrew





pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: [pgsql-www] Upcoming PG re-releases
Next
From: "Joshua D. Drake"
Date:
Subject: Re: [pgsql-www] Upcoming PG re-releases