Re: Should CSV parsing be stricter about mid-field quotes? - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Date
Msg-id 5094828d-8f56-47de-a4eb-7dfbbca783c8@manitou-mail.org
Whole thread Raw
In response to Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Should CSV parsing be stricter about mid-field quotes?
List pgsql-hackers
    Joel Jacobson wrote:

> I've been using that trick myself many times in the past, but thanks to this
> deep-dive into this topic, it looks to me like TEXT would be a better format
> fit when dealing with unquoted TSV files, or?
>
> OTOH, one would then need to inspect the TSV file doesn't contain \. on an
> empty line...

Note that this is the case for valid CSV contents, since backslash-dot
on a line by itself is both an end-of-data marker for COPY FROM and a
valid CSV line.
Having this line in the data results in either an error or having the
rest of the data silently discarded, depending on the context. There
is some previous discussion about this in [1].
Since the TEXT format doesn't have this kind of problem, one solution
is to filter the data through PROGRAM with an [untrusted CSV]->TEXT
filter. This is to be preferred over direct CSV loading when
strictness or robustness are more important than convenience.


[1]
https://www.postgresql.org/message-id/10e3eff6-eb04-4b3f-aeb4-b920192b977a@manitou-mail.org

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Assert failure of the cross-check for nullingrels
Next
From: Jeff Davis
Date:
Subject: Re: Order changes in PG16 since ICU introduction