Thread: allow CSV quote in NULL

allow CSV quote in NULL

From
Stephen Frost
Date:
Greetings,

  Please find attached a minor patch to remove the constraints that a
  user can't include the delimiter or quote characters in a 'NULL AS'
  string when importing CSV files.

  This allows a user to explicitly request that NULL conversion happen
  on fields which are quoted.  As the quote character is being allowed
  to be in the 'NULL AS' string now, there's no reason to exclude the
  delimiter character from being seen in that string as well, though
  unless quoted using the CSV quote character it won't ever be matched.

  An example of the usage:

  sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv
            with csv header quote as '"' null as '""'

  This is no contrived example, it's an issue I ran into earlier today
  when I got a file which had (for reasons unknown to me and not easily
  changed upstream):

  "1","V","WASHDCABC12","","120033"...

  Both of the ending columns shown are integer fields, the "" here being
  used to indicate a NULL value.

  Without the patch, an ERROR occurs:

  sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv
            with csv header quote as '"'
  ERROR:  invalid input syntax for integer: ""

  And there's no way to get it to import with COPY CSV mode.  The
  patch adds this ability without affecting existing usage or changing
  the syntax.  Even with the patch an ERROR occurs with the default
  treatment of CSV files:

  sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv
           with csv header quote as '"'
  ERROR:  invalid input syntax for integer: ""

  Which would be expected.  If the file is modified to remove the ""s
  for NULL columns, it imports just fine with the syntax above.

  It'd be really nice to have this included.

      Thanks!

        Stephen

Attachment

Re: allow CSV quote in NULL

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
>   Please find attached a minor patch to remove the constraints that a
>   user can't include the delimiter or quote characters in a 'NULL AS'
>   string when importing CSV files.

This can't really be sane can it?

            regards, tom lane

Re: allow CSV quote in NULL

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> >   Please find attached a minor patch to remove the constraints that a
> >   user can't include the delimiter or quote characters in a 'NULL AS'
> >   string when importing CSV files.
>
> This can't really be sane can it?

hm?  It's a problem I've run into a number of times and I finally got
fed up with it enough to create a patch for it.  It solves my problem
and I don't believe breaks or adversely affects anything else.  It also
has to be explicitly asked for by the user, and when the user asks for
it, what's done is intuitively what would be expected.

I don't see the logic in forbidding the user from being able to do
this...  My specific case involved an integer field which was quoted in
the CSV file (which isn't generally a problem, it's only an issue if it
ends up being a an empty string).  I could easily see it being used for
text fields as well though, if, for example, they want empty strings,
even when quoted, to import as NULLs.  Of course, the default remains
that it'd be imported as an empty string, having it be a NULL has to be
explicitly requested.

    Thanks,

        Stephen

Attachment

Re: allow CSV quote in NULL

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Stephen Frost <sfrost@snowman.net> writes:
>>   Please find attached a minor patch to remove the constraints that a
>>   user can't include the delimiter or quote characters in a 'NULL AS'
>>   string when importing CSV files.
>
> This can't really be sane can it?

Including unquoted delimiter characters seems kind of insane. But including
the quote characters or quoted delimiter characters seems reasonable.

The alternative would be interpreting NULL strings after dequoting but that
would leave no way to include the NULL string literally. This solution means
there's no way to include it (if it needs quoting) but only when you specify
it this way.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: allow CSV quote in NULL

From
Stephen Frost
Date:
* Gregory Stark (stark@enterprisedb.com) wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Stephen Frost <sfrost@snowman.net> writes:
> >>   Please find attached a minor patch to remove the constraints that a
> >>   user can't include the delimiter or quote characters in a 'NULL AS'
> >>   string when importing CSV files.
> >
> > This can't really be sane can it?
>
> Including unquoted delimiter characters seems kind of insane. But including
> the quote characters or quoted delimiter characters seems reasonable.

Right.  We could write something to check if the user provides an
unquoted delimiter character but I'm not really sure it's worth it, to
be perfectly honest.  If it'll make people happier with the patch I can
do it though.

> The alternative would be interpreting NULL strings after dequoting but that
> would leave no way to include the NULL string literally. This solution means
> there's no way to include it (if it needs quoting) but only when you specify
> it this way.

Yeah, interpreting NULLs after dequoting means you've lost the
information about if it's quoted or not, or you have to add some funky
syntax to say "if it's quoted, do it differently...", which is no good,
imv.

What the patch does basically is say "give us the exact string that
shows up between the unquoted delimiters that you want to be treated
as a NULL."  This removes the complexity of the question about quoting,
unquoting, whatever, and makes it a very clear-cut, straight-forward
solution with no impact on existing users, imv.

    Thanks,

        Stephen

Attachment

Re: allow CSV quote in NULL

From
Alvaro Herrera
Date:
Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > >   Please find attached a minor patch to remove the constraints that a
> > >   user can't include the delimiter or quote characters in a 'NULL AS'
> > >   string when importing CSV files.
> >
> > This can't really be sane can it?
>
> hm?  It's a problem I've run into a number of times and I finally got
> fed up with it enough to create a patch for it.

I've seen people in the same situation before, so this makes sense from
that perspective.  If there's a vote, the patch gets a +1 from me.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: allow CSV quote in NULL

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Stephen Frost wrote:
> Greetings,
>
>   Please find attached a minor patch to remove the constraints that a
>   user can't include the delimiter or quote characters in a 'NULL AS'
>   string when importing CSV files.
>
>   This allows a user to explicitly request that NULL conversion happen
>   on fields which are quoted.  As the quote character is being allowed
>   to be in the 'NULL AS' string now, there's no reason to exclude the
>   delimiter character from being seen in that string as well, though
>   unless quoted using the CSV quote character it won't ever be matched.
>
>   An example of the usage:
>
>   sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv
>             with csv header quote as '"' null as '""'
>
>   This is no contrived example, it's an issue I ran into earlier today
>   when I got a file which had (for reasons unknown to me and not easily
>   changed upstream):
>
>   "1","V","WASHDCABC12","","120033"...
>
>   Both of the ending columns shown are integer fields, the "" here being
>   used to indicate a NULL value.
>
>   Without the patch, an ERROR occurs:
>
>   sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv
>             with csv header quote as '"'
>   ERROR:  invalid input syntax for integer: ""
>
>   And there's no way to get it to import with COPY CSV mode.  The
>   patch adds this ability without affecting existing usage or changing
>   the syntax.  Even with the patch an ERROR occurs with the default
>   treatment of CSV files:
>
>   sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv
>            with csv header quote as '"'
>   ERROR:  invalid input syntax for integer: ""
>
>   Which would be expected.  If the file is modified to remove the ""s
>   for NULL columns, it imports just fine with the syntax above.
>
>   It'd be really nice to have this included.
>
>       Thanks!
>
>         Stephen

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +