Thread: Fwd: Copy out wording

Fwd: Copy out wording

From
Magnus Hagander
Date:
Crap, I just realized I sent to pgadmin hackers by mystake. Meh.



Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
 The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL string?


-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Fwd: Copy out wording

From
Andrew Dunstan
Date:

Magnus Hagander wrote:
> Our documentation for COPY
> (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
> following to say:
> "
>  The CSV format has no standard way to distinguish a NULL value from
> an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
> output as the NULL string and is not quoted, while a data value
> matching the NULL string is quoted. Therefore, using the default
> settings, a NULL is written as an unquoted empty string, while an
> empty string is written with double quotes (""). Reading values
> follows similar rules. You can use FORCE NOT NULL to prevent NULL
> input comparisons for specific columns.
> "
>
> Shouldn't that be:
> "A NULL is output as the NULL string and is not quoted, while a data
> value matching the empty string is quoted"?
>
> If not, then what really is the difference between a NULL and a NULL string?
>
>   


No, it shouldn't. Let's say NULL is represented as "foo". Then a null 
between delimiters will be written as
   delimiter foo delimiter

while the string "foo" will be
   delimiter quotechar foo quotechar delimiter

and an empty non-null string will be
   delimiter delimiter

unless you have FORCE QUOTE on for it, in which case it will be
   delimiter quotechar quotechar delimiter


We had quite a bit of debate on the shape of CSV output at the time it 
was done (during 8.0), and that's what we came up with. It has the 
useful property that we can round-trip the data, i.e. we can read back 
the data we output without losing information about nulls, no matter 
what the NULL string is, something we have always been resistant to 
changing.

If you think we could explain it better, by all means have a go at it. 
But your proposed change isn't accurate. Here is an illustration of the 
above:
   andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 )  ) to   stdout null 'foo' csv header;
column1,column2,column3  1,"foo",2   3,foo,4   5,,6   andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 )  ) to
stdoutnull 'foo' csv header force quote column2;   column1,column2,column3   1,"foo",2   3,foo,4   5,"",6
 


HTH

cheers

andrew





Re: Fwd: Copy out wording

From
Magnus Hagander
Date:
On Thu, Sep 3, 2009 at 13:19, Andrew Dunstan<andrew@dunslane.net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> Our documentation for COPY
>> (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
>> following to say:
>> "
>>  The CSV format has no standard way to distinguish a NULL value from
>> an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
>> output as the NULL string and is not quoted, while a data value
>> matching the NULL string is quoted. Therefore, using the default
>> settings, a NULL is written as an unquoted empty string, while an
>> empty string is written with double quotes (""). Reading values
>> follows similar rules. You can use FORCE NOT NULL to prevent NULL
>> input comparisons for specific columns.
>> "
>>
>> Shouldn't that be:
>> "A NULL is output as the NULL string and is not quoted, while a data
>> value matching the empty string is quoted"?
>>
>> If not, then what really is the difference between a NULL and a NULL
>> string?
>>
>>
>
>
> No, it shouldn't. Let's say NULL is represented as "foo". Then a null
> between delimiters will be written as
>
>   delimiter foo delimiter
>
> while the string "foo" will be
>
>   delimiter quotechar foo quotechar delimiter
>
> and an empty non-null string will be
>
>   delimiter delimiter
>
> unless you have FORCE QUOTE on for it, in which case it will be
>
>   delimiter quotechar quotechar delimiter
>
>
> We had quite a bit of debate on the shape of CSV output at the time it was
> done (during 8.0), and that's what we came up with. It has the useful
> property that we can round-trip the data, i.e. we can read back the data we
> output without losing information about nulls, no matter what the NULL
> string is, something we have always been resistant to changing.
>
> If you think we could explain it better, by all means have a go at it. But
> your proposed change isn't accurate. Here is an illustration of the above:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

(FWIW, I totally agree with the feature, I was just confused by the docs)

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Fwd: Copy out wording

From
Andrew Dunstan
Date:

Magnus Hagander wrote:
> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
> part of the data? I read it as "a string being NULL". Maybe something
> like "the value of the NULL string parameter" to be overly clear for
> people like me? :-)
>   

We could change:
   A NULL is output as the NULL string and is not quoted, while a data value   matching the NULL string is quoted.
     

to
   A NULL is output as the NULL parameter and is not quoted, while a non-NULL data value whose text representation
matchesthe NULL parameter is quoted.
 


or something similar. Would that be better?


cheers

andrew




Re: Fwd: Copy out wording

From
Magnus Hagander
Date:
On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
>> part of the data? I read it as "a string being NULL". Maybe something
>> like "the value of the NULL string parameter" to be overly clear for
>> people like me? :-)
>>
>
> We could change:
>
>   A NULL is output as the NULL string and is not quoted, while a data value
>   matching the NULL string is quoted.
>
>
> to
>
>   A NULL is output as the NULL parameter and is not quoted, while a non-NULL
> data value whose text representation
>   matches the NULL parameter is quoted.
>
>
> or something similar. Would that be better?

Yes, much better IMO.

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Fwd: Copy out wording

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
> >
> >
> > Magnus Hagander wrote:
> >>
> >> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
> >> part of the data? I read it as "a string being NULL". Maybe something
> >> like "the value of the NULL string parameter" to be overly clear for
> >> people like me? :-)
> >>
> >
> > We could change:
> >
> > ? A NULL is output as the NULL string and is not quoted, while a data value
> > ? matching the NULL string is quoted.
> >
> >
> > to
> >
> > ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
> > data value whose text representation
> > ? matches the NULL parameter is quoted.
> >
> >
> > or something similar. Would that be better?
>
> Yes, much better IMO.

I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.87
diff -c -c -r1.87 copy.sgml
*** doc/src/sgml/ref/copy.sgml    5 Sep 2009 23:58:01 -0000    1.87
--- doc/src/sgml/ref/copy.sgml    17 Sep 2009 21:06:59 -0000
***************
*** 550,562 ****
     <para>
      The <literal>CSV</> format has no standard way to distinguish a
      <literal>NULL</> value from an empty string.
!     <productname>PostgreSQL</>'s <command>COPY</> handles this by
!     quoting. A <literal>NULL</> is output as the <literal>NULL</>
!     string and is not quoted, while a data value matching the
!     <literal>NULL</> string is quoted. Therefore, using the default
!     settings, a <literal>NULL</> is written as an unquoted empty
!     string, while an empty string is written with double quotes
!     (<literal>""</>). Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.
     </para>
--- 550,559 ----
     <para>
      The <literal>CSV</> format has no standard way to distinguish a
      <literal>NULL</> value from an empty string.
!     <productname>PostgreSQL</>'s <command>COPY</> handles this using
!     quoting. A <literal>NULL</> is output as an empty string without
!     quotes, while an empty string data value is double-quoted
!     (<literal>""</>).  Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.
     </para>

Re: Fwd: Copy out wording

From
Andrew Dunstan
Date:

Bruce Momjian wrote:
> Magnus Hagander wrote:
>   
>> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
>>     
>>> Magnus Hagander wrote:
>>>       
>>>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
>>>> part of the data? I read it as "a string being NULL". Maybe something
>>>> like "the value of the NULL string parameter" to be overly clear for
>>>> people like me? :-)
>>>>
>>>>         
>>> We could change:
>>>
>>> ? A NULL is output as the NULL string and is not quoted, while a data value
>>> ? matching the NULL string is quoted.
>>>
>>>
>>> to
>>>
>>> ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
>>> data value whose text representation
>>> ? matches the NULL parameter is quoted.
>>>
>>>
>>> or something similar. Would that be better?
>>>       
>> Yes, much better IMO.
>>     
>
> I have applied the attached documentation clarification patch, and
> backpatched it to 8.4.X.
>
>   
>   

Why didn't you follow the wording I actually suggested, which had the 
virtue of being accurate.

cheers

andrew


Re: Fwd: Copy out wording

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
>
>
> Bruce Momjian wrote:
> > Magnus Hagander wrote:
> >
> >> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
> >>
> >>> Magnus Hagander wrote:
> >>>
> >>>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a
> >>>> part of the data? I read it as "a string being NULL". Maybe something
> >>>> like "the value of the NULL string parameter" to be overly clear for
> >>>> people like me? :-)
> >>>>
> >>>>
> >>> We could change:
> >>>
> >>> ? A NULL is output as the NULL string and is not quoted, while a data value
> >>> ? matching the NULL string is quoted.
> >>>
> >>>
> >>> to
> >>>
> >>> ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
> >>> data value whose text representation
> >>> ? matches the NULL parameter is quoted.
> >>>
> >>>
> >>> or something similar. Would that be better?
> >>>
> >> Yes, much better IMO.
> >>
> >
> > I have applied the attached documentation clarification patch, and
> > backpatched it to 8.4.X.
> >
> >
> >
>
> Why didn't you follow the wording I actually suggested, which had the
> virtue of being accurate.

I thought the problem was the use of the word "null string", which
clearly was confusing.

I have updated the docs to use your wording, with a little
clarification.  The diff against yesterday's CVS is attached, which is
not smaller.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.87
diff -c -r1.87 copy.sgml
*** copy.sgml    5 Sep 2009 23:58:01 -0000    1.87
--- copy.sgml    17 Sep 2009 21:47:11 -0000
***************
*** 1,5 ****
  <!--
! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.87 2009/09/05 23:58:01 tgl Exp $
  PostgreSQL documentation
  -->

--- 1,5 ----
  <!--
! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.88 2009/09/17 21:13:01 momjian Exp $
  PostgreSQL documentation
  -->

***************
*** 552,561 ****
      <literal>NULL</> value from an empty string.
      <productname>PostgreSQL</>'s <command>COPY</> handles this by
      quoting. A <literal>NULL</> is output as the <literal>NULL</>
!     string and is not quoted, while a data value matching the
!     <literal>NULL</> string is quoted. Therefore, using the default
      settings, a <literal>NULL</> is written as an unquoted empty
!     string, while an empty string is written with double quotes
      (<literal>""</>). Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.
--- 552,561 ----
      <literal>NULL</> value from an empty string.
      <productname>PostgreSQL</>'s <command>COPY</> handles this by
      quoting. A <literal>NULL</> is output as the <literal>NULL</>
!     parameter and is not quoted, while a non-NULL value matching the
!     the <literal>NULL</> parameter string is quoted. Therefore, using the default
      settings, a <literal>NULL</> is written as an unquoted empty
!     string, while an empty string data value is written with double quotes
      (<literal>""</>). Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.