Thread: [BUGS] \copy produces CSV output that cannot be read by \copy
Description: If the end-of-data marker \. occurs on a line of its own in a multiline string value, \copy outputs it to CSV without any quoting. Subsequently reading that CSV file using \copy fails.
Version:
$ psql
psql (9.5.7)
Type "help" for help.
test=# select version();
version
---------------------------------------------------------------------------------------------------
PostgreSQL 9.5.7 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-16) 6.3.0 20170425, 64-bit
(1 row)
$ psql
psql (9.5.7)
Type "help" for help.
test=# select version();
version
---------------------------------------------------------------------------------------------------
PostgreSQL 9.5.7 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-16) 6.3.0 20170425, 64-bit
(1 row)
How to reproduce:
CREATE TABLE test (t varchar);
INSERT INTO test (t) VALUES ('
\.
');
\copy (SELECT t FROM test) TO 'test.csv' WITH CSV;
\copy test FROM 'test.csv' WITH (FORMAT CSV);
This results in:INSERT INTO test (t) VALUES ('
\.
');
\copy (SELECT t FROM test) TO 'test.csv' WITH CSV;
\copy test FROM 'test.csv' WITH (FORMAT CSV);
ERROR: unterminated CSV quoted field
CONTEXT: COPY test, line 1: ""
"
Expected result: \copy can read the files it produced itself.
Analysis: I think that neither the writing nor the reading is incompatible with the documentation. Rather, there doesn’t seem to be a way to write such values to CSV that will be read back correctly (or at least I couldn‘t find such a way), as the reader always stops when encountering \. on a line by itself, and because there doesn’t seem to be a way to escape such a value in a way that wouldn’t produce a \. on a line by itself.
Nicolas
On Fri, Aug 4, 2017 at 2:38 PM, Nicolas Barbier <nicolas.barbier@gmail.com> wrote: > Analysis: I think that neither the writing nor the reading is incompatible > with the documentation. Rather, there doesn’t seem to be a way to write such > values to CSV that will be read back correctly (or at least I couldn‘t find > such a way), as the reader always stops when encountering \. on a line by > itself, and because there doesn’t seem to be a way to escape such a value in > a way that wouldn’t produce a \. on a line by itself. The format produced by COPY OUT looks fine to me, and can be reloaded with a plain COPY (not \copy). And you may be interested in this bit from src/bin/psql/copy.c: /* * This code erroneously assumes '\.' on a line alone * inside a quoted CSV string terminates the \copy. * http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org */ if(strcmp(buf, "\\.\n") == 0 || strcmp(buf, "\\.\r\n") == 0) { copydone = true; break; } postgresql.org is offline now, and I don't have this thread at hand, but I guess that the answer is there... -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Michael Paquier <michael.paquier@gmail.com> writes: > The format produced by COPY OUT looks fine to me, and can be reloaded > with a plain COPY (not \copy). And you may be interested in this bit > from src/bin/psql/copy.c: > /* > * This code erroneously assumes '\.' on a line alone > * inside a quoted CSV string terminates the \copy. > * > http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org > */ I wonder if it would improve matters to check for "\." only when copystream == pset.cur_cmd_source, that is, only when the copy data is inlined into the SQL stream. That would create an inconsistency between inline and out-of-line data, but it might be a reasonable thing to do anyway. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Sat, Aug 5, 2017 at 6:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> The format produced by COPY OUT looks fine to me, and can be reloaded >> with a plain COPY (not \copy). And you may be interested in this bit >> from src/bin/psql/copy.c: >> /* >> * This code erroneously assumes '\.' on a line alone >> * inside a quoted CSV string terminates the \copy. >> * >> http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org >> */ > > I wonder if it would improve matters to check for "\." only when > copystream == pset.cur_cmd_source, that is, only when the copy data > is inlined into the SQL stream. That would create an inconsistency > between inline and out-of-line data, but it might be a reasonable > thing to do anyway. A complete solution would be to look for the quote option provided by the user and track if the string being passed to the backend is within a quoted area or not, no? If that's a quoted area, the check for "\." could be bypassed. Now, as parse_slash_copy() has its own way to parse the command options given by the user, perhaps all this extra engineering is not worth fixing an edge case. In short, I agree that what you propose here has value to fix the case proposed here, as even COPY FROM stdin (not only \copy) fails now. -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Aug 17, 2017 at 01:53:05PM +0900, Michael Paquier wrote: > On Sat, Aug 5, 2017 at 6:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael.paquier@gmail.com> writes: > >> The format produced by COPY OUT looks fine to me, and can be reloaded > >> with a plain COPY (not \copy). And you may be interested in this bit > >> from src/bin/psql/copy.c: > >> /* > >> * This code erroneously assumes '\.' on a line alone > >> * inside a quoted CSV string terminates the \copy. > >> * > >> http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org > >> */ > > > > I wonder if it would improve matters to check for "\." only when > > copystream == pset.cur_cmd_source, that is, only when the copy data > > is inlined into the SQL stream. That would create an inconsistency > > between inline and out-of-line data, but it might be a reasonable > > thing to do anyway. > > A complete solution would be to look for the quote option provided by > the user and track if the string being passed to the backend is within > a quoted area or not, no? If that's a quoted area, the check for "\." > could be bypassed. Now, as parse_slash_copy() has its own way to parse > the command options given by the user, perhaps all this extra > engineering is not worth fixing an edge case. > > In short, I agree that what you propose here has value to fix the case > proposed here, as even COPY FROM stdin (not only \copy) fails now. This is a six-year-old thread, but I have applied the following doc patch to at least document this behavior. --------------------------------------------------------------------------- commit 42d3125ada Author: Bruce Momjian <bruce@momjian.us> Date: Fri Nov 3 13:57:59 2023 -0400 doc: \copy can get data values \. and end-of-input confused Reported-by: Svante Richter Discussion: https://postgr.es/m/fcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com Backpatch-through: 11 diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index d94e3cacfc..cc7d797159 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1119,6 +1119,10 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g destination, because all data must pass through the client/server connection. For large amounts of data the <acronym>SQL</acronym> command might be preferable. + Also, because of this pass-through method, <literal>\copy + ... from</literal> in <acronym>CSV</acronym> mode will erroneously + treat a <literal>\.</literal> data value alone on a line as an + end-of-input marker. </para> </tip> diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c index b3cc3d9a29..dbbbdb8898 100644 --- a/src/bin/psql/copy.c +++ b/src/bin/psql/copy.c @@ -627,6 +627,8 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res) * This code erroneously assumes '\.' on a line alone * inside a quoted CSV string terminates the \copy. * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org + * + * https://www.postgresql.org/message-id/bfcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com */ if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) || (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0)) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.