Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
Date
Msg-id ZT0q_djqSO3d6MQU@momjian.us
Whole thread Raw
In response to Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV  (Bruce Momjian <bruce@momjian.us>)
Responses Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV  (Bruce Momjian <bruce@momjian.us>)
List pgsql-bugs
On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote:
> On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > The psql documentation says [\copy ... from stdin] looks for "\.".  It says
> > > nothing about doing that for [\copy ... from filename].  I wonder if psql
> > > should change the filename case to send the whole file to the server, ignoring
> > > "\." inside.  (That is to say, change to match the psql documentation.)
> > 
> > This seems like a sensible solution.  The need to use an in-band EOF
> > marker when reading from the command source file is clear, and there's
> > no non-kluge way there.  But that doesn't mean we should extend it
> > to separate files.  (I'm surprised to realize we do, actually.)
> 
> So, look what I found in psql/copy.c:
> 
>         /* check for EOF marker, but not on a partial line */
>         if (at_line_begin)
>         {
>         /*
> -->         * This code erroneously assumes '\.' on a line alone
> -->         * inside a quoted CSV string terminates the \copy.
> -->         * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
>          */
>         if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
>             (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
>         {
>             copydone = true;
>         }
>         }
> 
> I wondered who would reference an email thread in the source code, so I
> looked at the whole 2012 thread:
> 
>     https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org
> 
> and it was, of course, me.  :-(
> 
> Tom's analysis was similar ten years ago:
> 
> > Ugh.  This seems like a rather fundamental oversight in the CSV feature.
> > The problem is that psql has no idea whether the copy is being done in
> > CSV mode or not --- and even if it did, it doesn't parse the data fully
> > enough to realize whether a \. line is inside quotes or not.
> > 
> > In the case of out-of-line data files, it might be reasonable to just
> > dispense with the check for \. altogether and always ship the whole file
> > to the backend; I think there's a \. check on the backend side.  (Not
> > sure this is safe in V2 protocol, but I doubt anyone cares anymore
> > about that.)
> > 
> > In the case of in-line data in a script file, CSV mode seems a bit
> > broken in any case; there's no concept of a terminator in CSV, AFAIK.
> > So maybe we don't have to worry about that.
> 
> I think at this point we should either fix this or document it.

I have decided to document this with the attached patch.  I plan to
apply it to all supported Postgres versions.  If we ever fix FROM file
so only FROM STDIN is a problem, we can update the documentation.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

pgsql-bugs by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: BUG #18170: Unexpected error: no relation entry for relid 3
Next
From: Tom Lane
Date:
Subject: Re: BUG #18172: High memory usage in tSRF function context