Re: Patch: FORCE_NULL option for copy COPY in CSV mode - Mailing list pgsql-hackers

From David Fetter
Subject Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date
Msg-id 20131009172556.GB13993@fetter.org
Whole thread Raw
In response to Re: Patch: FORCE_NULL option for copy COPY in CSV mode  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Patch: FORCE_NULL option for copy COPY in CSV mode
List pgsql-hackers
On Wed, Oct 09, 2013 at 09:52:29AM -0400, Andrew Dunstan wrote:
> 
> On 10/09/2013 09:22 AM, Amit Kapila wrote:
> >On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> >>>On 10/07/2013 11:34 PM, Amit Kapila wrote:
> >>>>On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
> >>>>wrote:
> >>>>>On 10/07/2013 03:06 PM, Robert Haas wrote:
> >>>>>>
> >>>>>>>Also if your use case is to treat empty strings as NULL (as per above
> >>>>>>>documentation), can't it be handled with "WITH NULL AS" option.
> >>>>>>>For example, something like:
> >>>>>>>
> >>>>>>>postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
> >>>>>>>Enter data to be copied followed by a newline.
> >>>>>>>End with a backslash and a period on a line by itself.
> >>>>>>>>>50,
> >>>>>>>>>\.
> >>>>>>>postgres=# select * from testnull;
> >>>>>>>    a  |  b
> >>>>>>>----+------
> >>>>>>>    50 | NULL
> >>>>>>>(1 row)
> >>>>>>Good point.  If this patch is just implementing something that can
> >>>>>>already be done with another syntax, we don't need it.
> >>>>>>
> >>>>>Isn't the point of this option to allow a *quoted* empty string to be
> >>>>>forced
> >>>>>to NULL? If so, this is not testing the same case - in fact the COPY
> >>>>>command
> >>>>>above just makes explicit the default CSV NULL setting anyway.
> >>>>I am really not sure if all the purpose of patch can be achieved by
> >>>>existing syntax, neither it is explained clearly.
> >>>>However the proposal hasn't discussed why it's not good idea to extend
> >>>>some similar syntax "COPY .. NULL" which is used to replace string
> >>>>with NULL's?
> >>>>Description of NULL says: "Specifies the string that represents a null
> >>>>value."
> >>>>Now why can't this syntax be extended to support quoted empty string
> >>>>if it's not supported currently?
> >>>>I have not checked completely, If it's difficult or not possible to
> >>>>support in existing syntax, then even it add's more value to introduce
> >>>>new syntax.
> >>>>
> >>>>By asking above question, I doesn't mean that we should not go for the
> >>>>new proposed syntax, rather it's to know and understand the benefit of
> >>>>new syntax, also it helps during CF review for reviewer's if the
> >>>>proposal involves new syntax and that's discussed previously.
> >>>>
> >>>Quite apart from any other consideration, this suggestion is inferior to
> >>>what's proposed in that it's an all or nothing deal, while the patch allows
> >>>you to specify the behaviour very explicitly on a per column basis. I can
> >>>well imagine wanting to be able to force a quoted empty string to null for
> >>>numeric fields but not for text.
> >     Okay, but can't it be done by extending current syntax such as
> >     NULL FOR <col1, col2> AS ""- which would mean it will replace
> >corresponding column's values as NULL if they contain empty string.
> >
> >>>The basic principal of our CSV processing is that we don't ever turn a NULL
> >>>into something quoted and we don't ever turn something quoted into NULL.
> >>>That's what lets us round-trip test just about every combination of options.
> >>>I'm only going to be happy violating that, as this patch does, in a very
> >>>explicit and controlled way.
> >>Will this option allow only quoted empty string to be NULL or will
> >>handle without quoted empty string as well?
> >    I had checked patch and it seems for both quoted and unquoted empty
> >string, it will replace it with NULL.
> >    Now here it's bit different from FORCE_NOT_NULL, because already
> >for un-quoted empty strings we have a way to replace them with NULL.
> >
> >    I think having 2 different syntax for replacing empty strings (one
> >for quoted and another for un-quoted) as NULL's might not be best way
> >to accomplish
> >    this feature.
> >
> >
> 
> 
> I really don't know what you're saying here.
> 
> Here is the situation we have today (assuming the default null
> marker of empty-string):
> 
> default:               empty-string -> null, quoted-empty-string ->
> emptystring
> with force_not_null:   empty-string -> emptystring,
> quoted-empty-string -> emptystring
> 
> and the proposal would add to that:
> 
> with force-null:       empty-string -> null, quoted-empty-string -> null
> 
> So it appears to be quite on all fours with the way force_not_null
> works now, it just does the reverse.
> 
> I don't see at all that your suggested alternative has any
> advantages over what's been written. If you can say "NULL FOR (foo)
> as '""' how will you specify the null for some other column(s)?

Idea:

NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,

> Are we going to have multiple such clauses? It looks like a real
> mess.

Is that not part of what parsers ordinarily do?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem
Next
From: Robert Haas
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem