Thread: [BUGS] \copy produces CSV output that cannot be read by \copy

[BUGS] \copy produces CSV output that cannot be read by \copy

From
Nicolas Barbier
Date:
Description: If the end-of-data marker \. occurs on a line of its own in a multiline string value, \copy outputs it to CSV without any quoting. Subsequently reading that CSV file using \copy fails.

Version:

$ psql
psql (9.5.7)
Type "help" for help.

test=# select version();
                                              version                                             
---------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.7 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-16) 6.3.0 20170425, 64-bit
(1 row)

How to reproduce:

CREATE TABLE test (t varchar);
INSERT INTO test (t) VALUES ('
\.
');
\copy (SELECT t FROM test) TO 'test.csv' WITH CSV;
\copy test FROM 'test.csv' WITH (FORMAT CSV);

This results in:

ERROR:  unterminated CSV quoted field
CONTEXT:  COPY test, line 1: ""
"

Expected result: \copy can read the files it produced itself.

Analysis: I think that neither the writing nor the reading is incompatible with the documentation. Rather, there doesn’t seem to be a way to write such values to CSV that will be read back correctly (or at least I couldn‘t find such a way), as the reader always stops when encountering \. on a line by itself, and because there doesn’t seem to be a way to escape such a value in a way that wouldn’t produce a \. on a line by itself.

Nicolas

Re: [BUGS] \copy produces CSV output that cannot be read by \copy

From
Michael Paquier
Date:
On Fri, Aug 4, 2017 at 2:38 PM, Nicolas Barbier
<nicolas.barbier@gmail.com> wrote:
> Analysis: I think that neither the writing nor the reading is incompatible
> with the documentation. Rather, there doesn’t seem to be a way to write such
> values to CSV that will be read back correctly (or at least I couldn‘t find
> such a way), as the reader always stops when encountering \. on a line by
> itself, and because there doesn’t seem to be a way to escape such a value in
> a way that wouldn’t produce a \. on a line by itself.

The format produced by COPY OUT looks fine to me, and can be reloaded
with a plain COPY (not \copy). And you may be interested in this bit
from src/bin/psql/copy.c:                   /*                    * This code erroneously assumes '\.' on a line alone
                 * inside a quoted CSV string terminates the \copy.                    * 
http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org                    */
if(strcmp(buf, "\\.\n") == 0 ||                       strcmp(buf, "\\.\r\n") == 0)                   {
    copydone = true;                       break;                   } 
postgresql.org is offline now, and I don't have this thread at hand,
but I guess that the answer is there...
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] \copy produces CSV output that cannot be read by \copy

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> The format produced by COPY OUT looks fine to me, and can be reloaded
> with a plain COPY (not \copy). And you may be interested in this bit
> from src/bin/psql/copy.c:
>                     /*
>                      * This code erroneously assumes '\.' on a line alone
>                      * inside a quoted CSV string terminates the \copy.
>                      *
> http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
>                      */

I wonder if it would improve matters to check for "\." only when
copystream == pset.cur_cmd_source, that is, only when the copy data
is inlined into the SQL stream.  That would create an inconsistency
between inline and out-of-line data, but it might be a reasonable
thing to do anyway.
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] \copy produces CSV output that cannot be read by \copy

From
Michael Paquier
Date:
On Sat, Aug 5, 2017 at 6:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> The format produced by COPY OUT looks fine to me, and can be reloaded
>> with a plain COPY (not \copy). And you may be interested in this bit
>> from src/bin/psql/copy.c:
>>                     /*
>>                      * This code erroneously assumes '\.' on a line alone
>>                      * inside a quoted CSV string terminates the \copy.
>>                      *
>> http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
>>                      */
>
> I wonder if it would improve matters to check for "\." only when
> copystream == pset.cur_cmd_source, that is, only when the copy data
> is inlined into the SQL stream.  That would create an inconsistency
> between inline and out-of-line data, but it might be a reasonable
> thing to do anyway.

A complete solution would be to look for the quote option provided by
the user and track if the string being passed to the backend is within
a quoted area or not, no? If that's a quoted area, the check for "\."
could be bypassed. Now, as parse_slash_copy() has its own way to parse
the command options given by the user, perhaps all this extra
engineering is not worth fixing an edge case.

In short, I agree that what you propose here has value to fix the case
proposed here, as even COPY FROM stdin (not only \copy) fails now.
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] \copy produces CSV output that cannot be read by \copy

From
Bruce Momjian
Date:
On Thu, Aug 17, 2017 at 01:53:05PM +0900, Michael Paquier wrote:
> On Sat, Aug 5, 2017 at 6:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> The format produced by COPY OUT looks fine to me, and can be reloaded
> >> with a plain COPY (not \copy). And you may be interested in this bit
> >> from src/bin/psql/copy.c:
> >>                     /*
> >>                      * This code erroneously assumes '\.' on a line alone
> >>                      * inside a quoted CSV string terminates the \copy.
> >>                      *
> >> http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
> >>                      */
> >
> > I wonder if it would improve matters to check for "\." only when
> > copystream == pset.cur_cmd_source, that is, only when the copy data
> > is inlined into the SQL stream.  That would create an inconsistency
> > between inline and out-of-line data, but it might be a reasonable
> > thing to do anyway.
> 
> A complete solution would be to look for the quote option provided by
> the user and track if the string being passed to the backend is within
> a quoted area or not, no? If that's a quoted area, the check for "\."
> could be bypassed. Now, as parse_slash_copy() has its own way to parse
> the command options given by the user, perhaps all this extra
> engineering is not worth fixing an edge case.
> 
> In short, I agree that what you propose here has value to fix the case
> proposed here, as even COPY FROM stdin (not only \copy) fails now.

This is a six-year-old thread, but I have applied the following doc
patch to at least document this behavior.

---------------------------------------------------------------------------

commit 42d3125ada
Author: Bruce Momjian <bruce@momjian.us>
Date:   Fri Nov 3 13:57:59 2023 -0400

    doc: \copy can get data values \. and end-of-input confused

    Reported-by: Svante Richter

    Discussion: https://postgr.es/m/fcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com

    Backpatch-through: 11

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d94e3cacfc..cc7d797159 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,6 +1119,10 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
         destination, because all data must pass through the client/server
         connection.  For large amounts of data the <acronym>SQL</acronym>
         command might be preferable.
+        Also, because of this pass-through method, <literal>\copy
+        ... from</literal> in <acronym>CSV</acronym> mode will erroneously
+        treat a <literal>\.</literal> data value alone on a line as an
+        end-of-input marker.
         </para>
         </tip>

diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index b3cc3d9a29..dbbbdb8898 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -627,6 +627,8 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
                         * This code erroneously assumes '\.' on a line alone
                         * inside a quoted CSV string terminates the \copy.
                         * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
+                        *
+                        *
https://www.postgresql.org/message-id/bfcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com
                         */
                        if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
                            (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))


-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.