Plan for CSV handling of quotes, NULL - Mailing list pgsql-patches

From Bruce Momjian
Subject Plan for CSV handling of quotes, NULL
Date
Msg-id 200404142251.i3EMpiY07491@candle.pha.pa.us
Whole thread Raw
In response to Re: Updated COPY CSV patch  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Plan for CSV handling of quotes, NULL  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Plan for CSV handling of quotes, NULL  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Plan for CSV handling of quotes, NULL  ("Andrew Dunstan" <andrew@dunslane.net>)
List pgsql-patches
Bruce Momjian wrote:
> Wow, that is certainly an excellent point.  When we import, we know the
> resulting data type, but spreadsheets don't, and rely on the quoting to
> know what to do with the value.
>
> The zipcode is an excellent example.  You can't even test for leading
> zeros because then some spreadsheet values in the column are text and
> some numeric.

I talked to Andrew on IRC and we went over the open CSV issues.

We talked about how we could do quoting for zipcode in TEXT fields and
not quote true numeric values without hardcoding datatypes into the
system.  The most tricky case was NUMERIC vs. TEXT with zipcodes.
NUMERIC and TEXT have almost identical pg_type entries, so there is
nothing there to help us.

I found parse_coerce.c::TypeCategory(), which contains information about
which data type oids are in which grouping, e.g. DATETIME, STRING,
NUMERIC, etc.  It seems that function, if called with
pg_type.typbasetype could help determine if quotes should be used.  My
idea is to skip quotes for NUMERIC and DATETIME types, and quote
everything else.  This means that user-defined types will always be
output with quotes, which is probably OK.

So, for open CSV items we have:

    o  add oid dump/reload
    o  handle loading of comma-comma into NOT NULL collumns
    o  handle quoting of TEXT type with zipcodes, etc.

For the NOT NULL cases, I am thinking we can just throw a warning the
first time a comma-comma is loaded into a TEXT column and promote to a
zero-lengh string.  If the column is INT, it throws an error.

If we head in this direction, we will not need any additional syntax
except that which is in the patch already.

I have the master version of the patch and made his suggested changes
for the default for 'escape'.

Comments?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: win32 fixes
Next
From: Bruce Momjian
Date:
Subject: Re: Plan for CSV handling of quotes, NULL