Re: Fixing backslash dot for COPY FROM...CSV - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Fixing backslash dot for COPY FROM...CSV
Date
Msg-id 20240719.151031.622666269892202885.kou@clear-code.com
Whole thread Raw
In response to Re: Fixing backslash dot for COPY FROM...CSV  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fixing backslash dot for COPY FROM...CSV
Re: Fixing backslash dot for COPY FROM...CSV
List pgsql-hackers
Hi,

I'm reviewing patches in Commitfest 2024-07 from top to bottom:
https://commitfest.postgresql.org/48/

This is the 4th patch:
https://commitfest.postgresql.org/48/4710/

FYI: https://commitfest.postgresql.org/48/4681/ is my patch.

In <2726138.1712462833@sss.pgh.pa.us>
  "Re: Fixing backslash dot for COPY FROM...CSV" on Sun, 07 Apr 2024 00:07:13 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Here's the problem: if some client-side code thinks it's okay to
> quote "." as "\.", what is likely to happen when it's presented
> a data value consisting only of "."?  It could very easily fall
> into the trap of producing an end-of-data marker.
> 
> If we get rid of the whole concept of end-of-data markers, then
> it'd be totally reasonable to accept "\." as ".".  But as long
> as we still have end-of-data markers, I think it's unwise to allow
> "\." to appear as anything but an end-of-data marker.  It'd just
> add camouflage to the booby trap.

I read through this thread. It seems that this thread
discuss 2 things:

1. \. in CSV mode
2. \. in non-CSV mode

Recent messages discussed mainly 2. but how about create a
separated thread for 2.? Because the original mail focused
on 1. and it seems that we can handle them separately.


Here are comments for the latest v4 patch:

----
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index 961ae32694..a39818b193 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -620,20 +620,32 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
...
+                            if (copystream == pset.cur_cmd_source)
+                            {
+                                *fgresult='\0';
+                                buflen -= linelen;
----

Spaces around "=" are missing for the fgresult line.

BTW, here is a diff after pgindent:

----
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index a39818b1933..4ee8481998a 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -621,13 +621,13 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
                 if (buf[buflen - 1] == '\n')
                 {
                     /*
-                     * When at the beginning of the line, check for EOF marker.
-                     * If the marker is found and the data is inlined,
+                     * When at the beginning of the line, check for EOF
+                     * marker. If the marker is found and the data is inlined,
                      * we must stop at this point.  If not, the \. line can be
-                     * sent to the server, and we let it decide whether
-                     * it's an EOF or not depending on the format: in
-                     * basic TEXT, \. is going to be interpreted as an EOF, in
-                     * CSV, it will not.
+                     * sent to the server, and we let it decide whether it's
+                     * an EOF or not depending on the format: in basic TEXT,
+                     * \. is going to be interpreted as an EOF, in CSV, it
+                     * will not.
                      */
                     if (at_line_begin && copystream == pset.cur_cmd_source)
                     {
@@ -635,15 +635,16 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
                             (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
                         {
                             copydone = true;
+
                             /*
-                             * Remove the EOF marker from the data sent.
-                             * In the case of CSV, the EOF marker must be
+                             * Remove the EOF marker from the data sent. In
+                             * the case of CSV, the EOF marker must be
                              * removed, otherwise it would be interpreted by
                              * the server as valid data.
                              */
                             if (copystream == pset.cur_cmd_source)
                             {
-                                *fgresult='\0';
+                                *fgresult = '\0';
                                 buflen -= linelen;
                             }
                         }
----


I also confirmed that the updated server and non-updated
psql compatibility problem (the end-of-data "\." is
inserted). It seems that it's difficult to solve without
introducing incompatibility.

How about introducing a new COPY option that controls
whether "\." is ignored or not instead of this approach?
Here is a migration idea:

1. Add the new COPY option to v18
2. psql detects whether a server has the new COPY option
   2.1. If it's available, psql uses the option and doesn't send "\."
   2.2. If it's not available, psql doesn't use the option and sends "\."
3. psql always doesn't sends "\." after v17 or earlier reach EOL


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Shlok Kyal
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: Amit Kapila
Date:
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR