Thread: Fwd: Copy out wording
Crap, I just realized I sent to pgadmin hackers by mystake. Meh. Our documentation for COPY (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the following to say: " The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL's COPY handles this by quoting. A NULL is output as the NULL string and is not quoted, while a data value matching the NULL string is quoted. Therefore, using the default settings, a NULL is written as an unquoted empty string, while an empty string is written with double quotes (""). Reading values follows similar rules. You can use FORCE NOT NULL to prevent NULL input comparisons for specific columns. " Shouldn't that be: "A NULL is output as the NULL string and is not quoted, while a data value matching the empty string is quoted"? If not, then what really is the difference between a NULL and a NULL string? -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
Magnus Hagander wrote: > Our documentation for COPY > (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the > following to say: > " > The CSV format has no standard way to distinguish a NULL value from > an empty string. PostgreSQL's COPY handles this by quoting. A NULL is > output as the NULL string and is not quoted, while a data value > matching the NULL string is quoted. Therefore, using the default > settings, a NULL is written as an unquoted empty string, while an > empty string is written with double quotes (""). Reading values > follows similar rules. You can use FORCE NOT NULL to prevent NULL > input comparisons for specific columns. > " > > Shouldn't that be: > "A NULL is output as the NULL string and is not quoted, while a data > value matching the empty string is quoted"? > > If not, then what really is the difference between a NULL and a NULL string? > > No, it shouldn't. Let's say NULL is represented as "foo". Then a null between delimiters will be written as delimiter foo delimiter while the string "foo" will be delimiter quotechar foo quotechar delimiter and an empty non-null string will be delimiter delimiter unless you have FORCE QUOTE on for it, in which case it will be delimiter quotechar quotechar delimiter We had quite a bit of debate on the shape of CSV output at the time it was done (during 8.0), and that's what we came up with. It has the useful property that we can round-trip the data, i.e. we can read back the data we output without losing information about nulls, no matter what the NULL string is, something we have always been resistant to changing. If you think we could explain it better, by all means have a go at it. But your proposed change isn't accurate. Here is an illustration of the above: andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to stdout null 'foo' csv header; column1,column2,column3 1,"foo",2 3,foo,4 5,,6 andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to stdoutnull 'foo' csv header force quote column2; column1,column2,column3 1,"foo",2 3,foo,4 5,"",6 HTH cheers andrew
On Thu, Sep 3, 2009 at 13:19, Andrew Dunstan<andrew@dunslane.net> wrote: > > > Magnus Hagander wrote: >> >> Our documentation for COPY >> (http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the >> following to say: >> " >> The CSV format has no standard way to distinguish a NULL value from >> an empty string. PostgreSQL's COPY handles this by quoting. A NULL is >> output as the NULL string and is not quoted, while a data value >> matching the NULL string is quoted. Therefore, using the default >> settings, a NULL is written as an unquoted empty string, while an >> empty string is written with double quotes (""). Reading values >> follows similar rules. You can use FORCE NOT NULL to prevent NULL >> input comparisons for specific columns. >> " >> >> Shouldn't that be: >> "A NULL is output as the NULL string and is not quoted, while a data >> value matching the empty string is quoted"? >> >> If not, then what really is the difference between a NULL and a NULL >> string? >> >> > > > No, it shouldn't. Let's say NULL is represented as "foo". Then a null > between delimiters will be written as > > delimiter foo delimiter > > while the string "foo" will be > > delimiter quotechar foo quotechar delimiter > > and an empty non-null string will be > > delimiter delimiter > > unless you have FORCE QUOTE on for it, in which case it will be > > delimiter quotechar quotechar delimiter > > > We had quite a bit of debate on the shape of CSV output at the time it was > done (during 8.0), and that's what we came up with. It has the useful > property that we can round-trip the data, i.e. we can read back the data we > output without losing information about nulls, no matter what the NULL > string is, something we have always been resistant to changing. > > If you think we could explain it better, by all means have a go at it. But > your proposed change isn't accurate. Here is an illustration of the above: Oh, hang on, "the NULL string" refers to the copy parameter? Not a part of the data? I read it as "a string being NULL". Maybe something like "the value of the NULL string parameter" to be overly clear for people like me? :-) (FWIW, I totally agree with the feature, I was just confused by the docs) -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
Magnus Hagander wrote: > Oh, hang on, "the NULL string" refers to the copy parameter? Not a > part of the data? I read it as "a string being NULL". Maybe something > like "the value of the NULL string parameter" to be overly clear for > people like me? :-) > We could change: A NULL is output as the NULL string and is not quoted, while a data value matching the NULL string is quoted. to A NULL is output as the NULL parameter and is not quoted, while a non-NULL data value whose text representation matchesthe NULL parameter is quoted. or something similar. Would that be better? cheers andrew
On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote: > > > Magnus Hagander wrote: >> >> Oh, hang on, "the NULL string" refers to the copy parameter? Not a >> part of the data? I read it as "a string being NULL". Maybe something >> like "the value of the NULL string parameter" to be overly clear for >> people like me? :-) >> > > We could change: > > A NULL is output as the NULL string and is not quoted, while a data value > matching the NULL string is quoted. > > > to > > A NULL is output as the NULL parameter and is not quoted, while a non-NULL > data value whose text representation > matches the NULL parameter is quoted. > > > or something similar. Would that be better? Yes, much better IMO. -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
Magnus Hagander wrote: > On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote: > > > > > > Magnus Hagander wrote: > >> > >> Oh, hang on, "the NULL string" refers to the copy parameter? Not a > >> part of the data? I read it as "a string being NULL". Maybe something > >> like "the value of the NULL string parameter" to be overly clear for > >> people like me? :-) > >> > > > > We could change: > > > > ? A NULL is output as the NULL string and is not quoted, while a data value > > ? matching the NULL string is quoted. > > > > > > to > > > > ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL > > data value whose text representation > > ? matches the NULL parameter is quoted. > > > > > > or something similar. Would that be better? > > Yes, much better IMO. I have applied the attached documentation clarification patch, and backpatched it to 8.4.X. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/ref/copy.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.87 diff -c -c -r1.87 copy.sgml *** doc/src/sgml/ref/copy.sgml 5 Sep 2009 23:58:01 -0000 1.87 --- doc/src/sgml/ref/copy.sgml 17 Sep 2009 21:06:59 -0000 *************** *** 550,562 **** <para> The <literal>CSV</> format has no standard way to distinguish a <literal>NULL</> value from an empty string. ! <productname>PostgreSQL</>'s <command>COPY</> handles this by ! quoting. A <literal>NULL</> is output as the <literal>NULL</> ! string and is not quoted, while a data value matching the ! <literal>NULL</> string is quoted. Therefore, using the default ! settings, a <literal>NULL</> is written as an unquoted empty ! string, while an empty string is written with double quotes ! (<literal>""</>). Reading values follows similar rules. You can use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input comparisons for specific columns. </para> --- 550,559 ---- <para> The <literal>CSV</> format has no standard way to distinguish a <literal>NULL</> value from an empty string. ! <productname>PostgreSQL</>'s <command>COPY</> handles this using ! quoting. A <literal>NULL</> is output as an empty string without ! quotes, while an empty string data value is double-quoted ! (<literal>""</>). Reading values follows similar rules. You can use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input comparisons for specific columns. </para>
Bruce Momjian wrote: > Magnus Hagander wrote: > >> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote: >> >>> Magnus Hagander wrote: >>> >>>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a >>>> part of the data? I read it as "a string being NULL". Maybe something >>>> like "the value of the NULL string parameter" to be overly clear for >>>> people like me? :-) >>>> >>>> >>> We could change: >>> >>> ? A NULL is output as the NULL string and is not quoted, while a data value >>> ? matching the NULL string is quoted. >>> >>> >>> to >>> >>> ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL >>> data value whose text representation >>> ? matches the NULL parameter is quoted. >>> >>> >>> or something similar. Would that be better? >>> >> Yes, much better IMO. >> > > I have applied the attached documentation clarification patch, and > backpatched it to 8.4.X. > > > Why didn't you follow the wording I actually suggested, which had the virtue of being accurate. cheers andrew
Andrew Dunstan wrote: > > > Bruce Momjian wrote: > > Magnus Hagander wrote: > > > >> On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote: > >> > >>> Magnus Hagander wrote: > >>> > >>>> Oh, hang on, "the NULL string" refers to the copy parameter? Not a > >>>> part of the data? I read it as "a string being NULL". Maybe something > >>>> like "the value of the NULL string parameter" to be overly clear for > >>>> people like me? :-) > >>>> > >>>> > >>> We could change: > >>> > >>> ? A NULL is output as the NULL string and is not quoted, while a data value > >>> ? matching the NULL string is quoted. > >>> > >>> > >>> to > >>> > >>> ? A NULL is output as the NULL parameter and is not quoted, while a non-NULL > >>> data value whose text representation > >>> ? matches the NULL parameter is quoted. > >>> > >>> > >>> or something similar. Would that be better? > >>> > >> Yes, much better IMO. > >> > > > > I have applied the attached documentation clarification patch, and > > backpatched it to 8.4.X. > > > > > > > > Why didn't you follow the wording I actually suggested, which had the > virtue of being accurate. I thought the problem was the use of the word "null string", which clearly was confusing. I have updated the docs to use your wording, with a little clarification. The diff against yesterday's CVS is attached, which is not smaller. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: copy.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v retrieving revision 1.87 diff -c -r1.87 copy.sgml *** copy.sgml 5 Sep 2009 23:58:01 -0000 1.87 --- copy.sgml 17 Sep 2009 21:47:11 -0000 *************** *** 1,5 **** <!-- ! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.87 2009/09/05 23:58:01 tgl Exp $ PostgreSQL documentation --> --- 1,5 ---- <!-- ! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.88 2009/09/17 21:13:01 momjian Exp $ PostgreSQL documentation --> *************** *** 552,561 **** <literal>NULL</> value from an empty string. <productname>PostgreSQL</>'s <command>COPY</> handles this by quoting. A <literal>NULL</> is output as the <literal>NULL</> ! string and is not quoted, while a data value matching the ! <literal>NULL</> string is quoted. Therefore, using the default settings, a <literal>NULL</> is written as an unquoted empty ! string, while an empty string is written with double quotes (<literal>""</>). Reading values follows similar rules. You can use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input comparisons for specific columns. --- 552,561 ---- <literal>NULL</> value from an empty string. <productname>PostgreSQL</>'s <command>COPY</> handles this by quoting. A <literal>NULL</> is output as the <literal>NULL</> ! parameter and is not quoted, while a non-NULL value matching the ! the <literal>NULL</> parameter string is quoted. Therefore, using the default settings, a <literal>NULL</> is written as an unquoted empty ! string, while an empty string data value is written with double quotes (<literal>""</>). Reading values follows similar rules. You can use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input comparisons for specific columns.