Thread: pgsql: Add regression tests for CSV and \., and add automatic quoting of

pgsql: Add regression tests for CSV and \., and add automatic quoting of

From
momjian@postgresql.org (Bruce Momjian)
Date:
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly.  I
also added documentation describing this issue.

Backpatch to 8.1.X.

Tags:
----
REL8_1_STABLE

Modified Files:
--------------
    pgsql/doc/src/sgml/ref:
        copy.sgml (r1.70 -> r1.70.2.1)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/copy.sgml.diff?r1=1.70&r2=1.70.2.1)
    pgsql/src/backend/commands:
        copy.c (r1.254.2.2 -> r1.254.2.3)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.254.2.2&r2=1.254.2.3)
    pgsql/src/test/regress/expected:
        copy2.out (r1.22 -> r1.22.2.1)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/copy2.out.diff?r1=1.22&r2=1.22.2.1)
    pgsql/src/test/regress/sql:
        copy2.sql (r1.13 -> r1.13.2.1)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/copy2.sql.diff?r1=1.13&r2=1.13.2.1)

Re: pgsql: Add regression tests for CSV and \., and add automatic quoting of

From
"Andrew Dunstan"
Date:
Andrew Dunstan said:
> Bruce Momjian said:
>> Log Message:
>> -----------
>> Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly.  I also
added documentation describing this issue.
>>
>
> This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.
>


Now that I've woken up properly I realise that it's also just wrong - it
will miss the case we need to catch of the first column of a multi-column
line beginning with \. - just treat them all the same and all will be well.

Also, this test is suspicious:

    strcmp(string, "\\.") == 0

Don't we also want to quote it if the field reads \.x ?
    strncmp(string, "\\.",2) == 0
seems like it would be a better test.


cheers

andrew




Re: pgsql: Add regression tests for CSV and \., and add

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Andrew Dunstan said:
> > Bruce Momjian said:
> >> Log Message:
> >> -----------
> >> Add regression tests for CSV and \., and add automatic quoting of a
> single column dump that has a \. value, so the load works properly.  I also
> added documentation describing this issue.
> >>
> >
> > This seems unnecessarily elaborate, in code that is already byzantine. I
> think we can safely quote *any* field that has \. regardless of whether or
> not it is a singleton. There's no need to make a single column a special
> case - if it's valid for a singleton it's valid for any, and vice versa.
> >
>
>
> Now that I've woken up properly I realise that it's also just wrong - it
> will miss the case we need to catch of the first column of a multi-column
> line beginning with \. - just treat them all the same and all will be well.
>
> Also, this test is suspicious:
>
>     strcmp(string, "\\.") == 0
>
> Don't we also want to quote it if the field reads \.x ?
>     strncmp(string, "\\.",2) == 0
> seems like it would be a better test.

Have you looked at the regression tests I added?  \.x will no longer be
interpreted as an end-of-data marker:

    COPY testeoc FROM stdin CSV;
    a\.
    \.b
    c\.d
    "\."
    \.

    COPY testeoc TO stdout CSV;
    a\.
    \.b
    c\.d
    "\."

Our documentation says \. must appear alone on a line.  With non-CSV, we
allow \. to appear on the end of a line too because it can not be a data
value, but for CSV, we have to enforce that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql: Add regression tests for CSV and \., and add

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
>> Now that I've woken up properly I realise that it's also just wrong -
>> it will miss the case we need to catch of the first column of a
>> multi-column line beginning with \. - just treat them all the same and
>> all will be well.
>>
>> Also, this test is suspicious:
>>
>>     strcmp(string, "\\.") == 0
>>
>> Don't we also want to quote it if the field reads \.x ?
>>     strncmp(string, "\\.",2) == 0
>> seems like it would be a better test.
>
> Have you looked at the regression tests I added?  \.x will no longer be
> interpreted as an end-of-data marker:
>

Oh. I didn't realise that part of the behaviour had changed. Does that alter
our robustness?

I still think it seems odd to quote the field depending on it being the only
field. That feels more inconsistent than quoting it whenever it is \.

But I guess it's a corner case that is hardly likely to happen in real life.


cheers

andrew



Re: pgsql: Add regression tests for CSV and \., and add

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Bruce Momjian said:
> >> Now that I've woken up properly I realise that it's also just wrong -
> >> it will miss the case we need to catch of the first column of a
> >> multi-column line beginning with \. - just treat them all the same and
> >> all will be well.
> >>
> >> Also, this test is suspicious:
> >>
> >>     strcmp(string, "\\.") == 0
> >>
> >> Don't we also want to quote it if the field reads \.x ?
> >>     strncmp(string, "\\.",2) == 0
> >> seems like it would be a better test.
> >
> > Have you looked at the regression tests I added?  \.x will no longer be
> > interpreted as an end-of-data marker:
> >
>
> Oh. I didn't realise that part of the behaviour had changed. Does that alter
> our robustness?

It improves our robustness, for sure.

> I still think it seems odd to quote the field depending on it being the only
> field. That feels more inconsistent than quoting it whenever it is \.
>
> But I guess it's a corner case that is hardly likely to happen in real life.

That is why I did it only for the single-column case, so the quoting
would happen only when required, and it would limit confusion.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073