Thread: pgsql: Add regression tests for CSV and \., and add automatic quoting of
pgsql: Add regression tests for CSV and \., and add automatic quoting of
From
momjian@postgresql.org (Bruce Momjian)
Date:
Log Message: ----------- Add regression tests for CSV and \., and add automatic quoting of a single column dump that has a \. value, so the load works properly. I also added documentation describing this issue. Backpatch to 8.1.X. Tags: ---- REL8_1_STABLE Modified Files: -------------- pgsql/doc/src/sgml/ref: copy.sgml (r1.70 -> r1.70.2.1) (http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/copy.sgml.diff?r1=1.70&r2=1.70.2.1) pgsql/src/backend/commands: copy.c (r1.254.2.2 -> r1.254.2.3) (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.254.2.2&r2=1.254.2.3) pgsql/src/test/regress/expected: copy2.out (r1.22 -> r1.22.2.1) (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/copy2.out.diff?r1=1.22&r2=1.22.2.1) pgsql/src/test/regress/sql: copy2.sql (r1.13 -> r1.13.2.1) (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/copy2.sql.diff?r1=1.13&r2=1.13.2.1)
Re: pgsql: Add regression tests for CSV and \., and add automatic quoting of
From
"Andrew Dunstan"
Date:
Andrew Dunstan said: > Bruce Momjian said: >> Log Message: >> ----------- >> Add regression tests for CSV and \., and add automatic quoting of a single column dump that has a \. value, so the load works properly. I also added documentation describing this issue. >> > > This seems unnecessarily elaborate, in code that is already byzantine. I think we can safely quote *any* field that has \. regardless of whether or not it is a singleton. There's no need to make a single column a special case - if it's valid for a singleton it's valid for any, and vice versa. > Now that I've woken up properly I realise that it's also just wrong - it will miss the case we need to catch of the first column of a multi-column line beginning with \. - just treat them all the same and all will be well. Also, this test is suspicious: strcmp(string, "\\.") == 0 Don't we also want to quote it if the field reads \.x ? strncmp(string, "\\.",2) == 0 seems like it would be a better test. cheers andrew
Andrew Dunstan wrote: > Andrew Dunstan said: > > Bruce Momjian said: > >> Log Message: > >> ----------- > >> Add regression tests for CSV and \., and add automatic quoting of a > single column dump that has a \. value, so the load works properly. I also > added documentation describing this issue. > >> > > > > This seems unnecessarily elaborate, in code that is already byzantine. I > think we can safely quote *any* field that has \. regardless of whether or > not it is a singleton. There's no need to make a single column a special > case - if it's valid for a singleton it's valid for any, and vice versa. > > > > > Now that I've woken up properly I realise that it's also just wrong - it > will miss the case we need to catch of the first column of a multi-column > line beginning with \. - just treat them all the same and all will be well. > > Also, this test is suspicious: > > strcmp(string, "\\.") == 0 > > Don't we also want to quote it if the field reads \.x ? > strncmp(string, "\\.",2) == 0 > seems like it would be a better test. Have you looked at the regression tests I added? \.x will no longer be interpreted as an end-of-data marker: COPY testeoc FROM stdin CSV; a\. \.b c\.d "\." \. COPY testeoc TO stdout CSV; a\. \.b c\.d "\." Our documentation says \. must appear alone on a line. With non-CSV, we allow \. to appear on the end of a line too because it can not be a data value, but for CSV, we have to enforce that. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian said: >> Now that I've woken up properly I realise that it's also just wrong - >> it will miss the case we need to catch of the first column of a >> multi-column line beginning with \. - just treat them all the same and >> all will be well. >> >> Also, this test is suspicious: >> >> strcmp(string, "\\.") == 0 >> >> Don't we also want to quote it if the field reads \.x ? >> strncmp(string, "\\.",2) == 0 >> seems like it would be a better test. > > Have you looked at the regression tests I added? \.x will no longer be > interpreted as an end-of-data marker: > Oh. I didn't realise that part of the behaviour had changed. Does that alter our robustness? I still think it seems odd to quote the field depending on it being the only field. That feels more inconsistent than quoting it whenever it is \. But I guess it's a corner case that is hardly likely to happen in real life. cheers andrew
Andrew Dunstan wrote: > Bruce Momjian said: > >> Now that I've woken up properly I realise that it's also just wrong - > >> it will miss the case we need to catch of the first column of a > >> multi-column line beginning with \. - just treat them all the same and > >> all will be well. > >> > >> Also, this test is suspicious: > >> > >> strcmp(string, "\\.") == 0 > >> > >> Don't we also want to quote it if the field reads \.x ? > >> strncmp(string, "\\.",2) == 0 > >> seems like it would be a better test. > > > > Have you looked at the regression tests I added? \.x will no longer be > > interpreted as an end-of-data marker: > > > > Oh. I didn't realise that part of the behaviour had changed. Does that alter > our robustness? It improves our robustness, for sure. > I still think it seems odd to quote the field depending on it being the only > field. That feels more inconsistent than quoting it whenever it is \. > > But I guess it's a corner case that is hardly likely to happen in real life. That is why I did it only for the single-column case, so the quoting would happen only when required, and it would limit confusion. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073