Re: Should CSV parsing be stricter about mid-field quotes? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Date
Msg-id f594c28d-0da7-40d8-82f4-8e238ee3ccde@app.fastmail.com
Whole thread Raw
In response to Re: Should CSV parsing be stricter about mid-field quotes?  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: Should CSV parsing be stricter about mid-field quotes?
List pgsql-hackers
On Thu, May 18, 2023, at 18:48, Daniel Verite wrote:
> Joel Jacobson wrote:
>> OTOH, one would then need to inspect the TSV file doesn't contain \. on an
>> empty line...
>
> Note that this is the case for valid CSV contents, since backslash-dot
> on a line by itself is both an end-of-data marker for COPY FROM and a
> valid CSV line.
> Having this line in the data results in either an error or having the
> rest of the data silently discarded, depending on the context. There
> is some previous discussion about this in [1].
> Since the TEXT format doesn't have this kind of problem, one solution
> is to filter the data through PROGRAM with an [untrusted CSV]->TEXT
> filter. This is to be preferred over direct CSV loading when
> strictness or robustness are more important than convenience.
>
>
> [1]
> https://www.postgresql.org/message-id/10e3eff6-eb04-4b3f-aeb4-b920192b977a@manitou-mail.org

Thanks for sharing the old thread, very useful.
I see I've failed miserably to understand all the details of the COPY command.
 
Upon reading the thread, I'm still puzzled about one thing:

Why does \. need to have a special meaning when using COPY FROM with files?

I understand its necessity for STDIN, given that the end of input needs to be
explicitly defined.
However, for files, we have a known file size and the end-of-file can be
detected without the need for special markers.

Also, is the difference in how server-side COPY CSV is capable of dealing
with \. but apparently not the client-side \COPY CSV documented somewhere?

CREATE TABLE t (c text);
INSERT INTO t (c) VALUES ('foo'), (E'\n\\.\n'), ('bar');

-- Works OK:
COPY t TO '/tmp/t.csv' WITH CSV;
TRUNCATE t;
COPY t FROM '/tmp/t.csv' WITH CSV;

-- Doesn't work:
\COPY t TO '/tmp/t.csv' WITH CSV;
TRUNCATE t;
\COPY t FROM '/tmp/t.csv' WITH CSV;
ERROR:  unterminated CSV quoted field
CONTEXT:  COPY t, line 4: ""
\.
"

/Joel



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Conflict between regression tests namespace & transactions due to recent changes
Next
From: Marina Polyakova
Date:
Subject: Re: Conflict between regression tests namespace & transactions due to recent changes