Re: Should CSV parsing be stricter about mid-field quotes? - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Date
Msg-id 62663752-ddc4-74e0-e982-ed781692eb90@dunslane.net
Whole thread Raw
In response to Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Should CSV parsing be stricter about mid-field quotes?
List pgsql-hackers


On 2023-05-16 Tu 13:15, Joel Jacobson wrote:
On Tue, May 16, 2023, at 13:43, Joel Jacobson wrote:
>If we made midfield quoting a CSV error, those users who are currently mistaken
>about their TSV/TEXT files being CSV while also having balanced quotes in their
>data, would encounter an error rather than a silent failure, which I believe
>would be an enhancement.

Furthermore, I think it could be beneficial to add a HINT message for all type
of CSV/TEXT parsing errors, since the precise ERROR messages might just cause
the user to tinker with the options until it works, instead of carefully reading
through the documentation on the various formats.

Perhaps something like this:

HINT: Are you sure the FORMAT matches your input?

Also, the COPY documentation says nothing about TSV, and I know TEXT isn't
exactly TSV, but it's at least much more TSV than CSV, so maybe we should
describe the differences, such as \N. I think the best advise to users would be
to avoid exporting to .TSV and use .CSV instead, since I've noticed e.g.
Google Sheets to replace newlines in fields with blank space when
exporting .TSV, which effectively destroys data.

The first search results for "postgresql tsv" on Google link to postgresql.org
pages, but the COPY docs are not one of them unfortunately.

The first relevant hit is this one:

"Importing a TSV File into Postgres | by Riley Wong" [1]

Sadly, this author has also misunderstood how to properly import a .TSV file,
he got it all wrong, and doesn't understand or at least doesn't mention there
are more differences than just the delimiter:

COPY listings 
FROM '/home/ec2-user/list.tsv'
DELIMITER E'\t'
CSV HEADER;

I must confess I have used PostgreSQL for over two decades without having really
understood the detailed differences between TEXT and CSV, until recently.


You can use CSV mode pretty reliably for TSV files. The trick is to use a quoting char that shouldn't appear, such as E'\x01' as well as setting the delimiter to E'\t'. Yes, it's far from obvious.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Kirk Wolak
Date:
Subject: Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN)
Next
From: Laurenz Albe
Date:
Subject: Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN)