Re: Patch: FORCE_NULL option for copy COPY in CSV mode - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date
Msg-id CAA4eK1+-+A09acLR7WOyQ_XPdOtkPu1YU5pvRaDACCi4Kg6QOA@mail.gmail.com
Whole thread Raw
In response to Re: Patch: FORCE_NULL option for copy COPY in CSV mode  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Patch: FORCE_NULL option for copy COPY in CSV mode  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Wed, Oct 9, 2013 at 7:22 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 10/09/2013 09:22 AM, Amit Kapila wrote:
>>
>> On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>>
>>> On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net>
>>> wrote:
>>>>
>>>> On 10/07/2013 11:34 PM, Amit Kapila wrote:
>>>>>
>>>>> On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
>>>>> wrote:
>>>>>>
>>>>>> On 10/07/2013 03:06 PM, Robert Haas wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Also if your use case is to treat empty strings as NULL (as per
>>>>>>>> above
>>>>>>>> documentation), can't it be handled with "WITH NULL AS" option.
>>>>>>>> For example, something like:
>>>>>>>>
>>>>>>>> postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
>>>>>>>> Enter data to be copied followed by a newline.
>>>>>>>> End with a backslash and a period on a line by itself.
>>>>>>>>>>
>>>>>>>>>> 50,
>>>>>>>>>> \.
>>>>>>>>
>>>>>>>> postgres=# select * from testnull;
>>>>>>>>     a  |  b
>>>>>>>> ----+------
>>>>>>>>     50 | NULL
>>>>>>>> (1 row)
>>>>>>>
>>>>>>> Good point.  If this patch is just implementing something that can
>>>>>>> already be done with another syntax, we don't need it.
>>>>>>>
>>>>>> Isn't the point of this option to allow a *quoted* empty string to be
>>>>>> forced
>>>>>> to NULL? If so, this is not testing the same case - in fact the COPY
>>>>>> command
>>>>>> above just makes explicit the default CSV NULL setting anyway.
>>>>>
>>>>> I am really not sure if all the purpose of patch can be achieved by
>>>>> existing syntax, neither it is explained clearly.
>>>>> However the proposal hasn't discussed why it's not good idea to extend
>>>>> some similar syntax "COPY .. NULL" which is used to replace string
>>>>> with NULL's?
>>>>> Description of NULL says: "Specifies the string that represents a null
>>>>> value."
>>>>> Now why can't this syntax be extended to support quoted empty string
>>>>> if it's not supported currently?
>>>>> I have not checked completely, If it's difficult or not possible to
>>>>> support in existing syntax, then even it add's more value to introduce
>>>>> new syntax.
>>>>>
>>>>> By asking above question, I doesn't mean that we should not go for the
>>>>> new proposed syntax, rather it's to know and understand the benefit of
>>>>> new syntax, also it helps during CF review for reviewer's if the
>>>>> proposal involves new syntax and that's discussed previously.
>>>>>
>>>> Quite apart from any other consideration, this suggestion is inferior to
>>>> what's proposed in that it's an all or nothing deal, while the patch
>>>> allows
>>>> you to specify the behaviour very explicitly on a per column basis. I
>>>> can
>>>> well imagine wanting to be able to force a quoted empty string to null
>>>> for
>>>> numeric fields but not for text.
>>
>>      Okay, but can't it be done by extending current syntax such as
>>      NULL FOR <col1, col2> AS ""- which would mean it will replace
>> corresponding column's values as NULL if they contain empty string.
>>
>>>> The basic principal of our CSV processing is that we don't ever turn a
>>>> NULL
>>>> into something quoted and we don't ever turn something quoted into NULL.
>>>> That's what lets us round-trip test just about every combination of
>>>> options.
>>>> I'm only going to be happy violating that, as this patch does, in a very
>>>> explicit and controlled way.
>>>
>>> Will this option allow only quoted empty string to be NULL or will
>>> handle without quoted empty string as well?
>>
>>     I had checked patch and it seems for both quoted and unquoted empty
>> string, it will replace it with NULL.
>>     Now here it's bit different from FORCE_NOT_NULL, because already
>> for un-quoted empty strings we have a way to replace them with NULL.
>>
>>     I think having 2 different syntax for replacing empty strings (one
>> for quoted and another for un-quoted) as NULL's might not be best way
>> to accomplish
>>     this feature.
>>
>>
>
>
> I really don't know what you're saying here.
>
> Here is the situation we have today (assuming the default null marker of
> empty-string):
>
> default:               empty-string -> null, quoted-empty-string ->
> emptystring
> with force_not_null:   empty-string -> emptystring, quoted-empty-string ->
> emptystring
>
> and the proposal would add to that:
>
> with force-null:       empty-string -> null, quoted-empty-string -> null
>
> So it appears to be quite on all fours with the way force_not_null works
> now, it just does the reverse.
>
> I don't see at all that your suggested alternative has any advantages over
> what's been written. If you can say "NULL FOR (foo) as '""' how will you
> specify the null for some other column(s)? Are we going to have multiple
> such clauses?
  No, if user wants for particular string (ex. quoted empty string)
that few of it's columns becomes NULL, he can specify them together (
NULL for (col1, col2)).

> It looks like a real mess.
  One of the advantage, I could see using "NULL For .." syntax is
that already we have one syntax with which user can specify what
strings can be replaced with NULL, now just to handle quoted empty
string why to add different syntax.

"FORCE_NULL" has advantage that it can be used for some columns rather
than all columns, but I think for that existing syntax can also be
modified to support it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem
Next
From: Pavel Stehule
Date:
Subject: Re: PSQL return coder