Re: Should CSV parsing be stricter about mid-field quotes? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Date
Msg-id 92a9ae94-6884-4860-ab5f-3995c6f2b137@app.fastmail.com
Whole thread Raw
In response to Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Should CSV parsing be stricter about mid-field quotes?
List pgsql-hackers
On Tue, May 16, 2023, at 13:43, Joel Jacobson wrote:
>If we made midfield quoting a CSV error, those users who are currently mistaken
>about their TSV/TEXT files being CSV while also having balanced quotes in their
>data, would encounter an error rather than a silent failure, which I believe
>would be an enhancement.

Furthermore, I think it could be beneficial to add a HINT message for all type
of CSV/TEXT parsing errors, since the precise ERROR messages might just cause
the user to tinker with the options until it works, instead of carefully reading
through the documentation on the various formats.

Perhaps something like this:

HINT: Are you sure the FORMAT matches your input?

Also, the COPY documentation says nothing about TSV, and I know TEXT isn't
exactly TSV, but it's at least much more TSV than CSV, so maybe we should
describe the differences, such as \N. I think the best advise to users would be
to avoid exporting to .TSV and use .CSV instead, since I've noticed e.g.
Google Sheets to replace newlines in fields with blank space when
exporting .TSV, which effectively destroys data.

The first search results for "postgresql tsv" on Google link to postgresql.org
pages, but the COPY docs are not one of them unfortunately.

The first relevant hit is this one:

"Importing a TSV File into Postgres | by Riley Wong" [1]

Sadly, this author has also misunderstood how to properly import a .TSV file,
he got it all wrong, and doesn't understand or at least doesn't mention there
are more differences than just the delimiter:

COPY listings 
FROM '/home/ec2-user/list.tsv'
DELIMITER E'\t'
CSV HEADER;

I must confess I have used PostgreSQL for over two decades without having really
understood the detailed differences between TEXT and CSV, until recently.


pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Order changes in PG16 since ICU introduction
Next
From: Pavel Stehule
Date:
Subject: Re: Schema variables - new implementation for Postgres 15