Re: Should CSV parsing be stricter about mid-field quotes? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Msg-id 43e1e852-e3ba-4f24-a72b-72224acdbea4@app.fastmail.com
In response to Re: Should CSV parsing be stricter about mid-field quotes?  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers
On Fri, Oct 11, 2024, at 15:04, Joel Jacobson wrote:
> On Thu, Oct 10, 2024, at 10:37, Daniel Verite wrote:
>> Joel Jacobson wrote:
>>
>>> - No Headers or Metadata:
>>
>> It's not clear why it's necessary to disable the HEADER option
>> for this format?
>
> It's not necessary, no; I just couldn't see a use case,
> since I only thought about the COPY FROM case,
> where one would be dealing with unstructured, undelimited
> text files, such as log files coming from some other system,
> which I've never seen have header rows.
>
> However, thanks to your question, I see how a user
> might want to use the raw format to export a text
> column "as is" using COPY TO, in which case it would
> be useful to use HEADER and then HEADER MATCH
> for COPY FROM.
>
> I therefore think the HEADER option should be supported
> for the new raw format.
>
>>>  The format does not support header rows or end-of-data markers;
>>>  every line is treated as data.
>>
>> With COPY FROM STDIN followed by inline data in a script,
>> an end-of-data marker is required.  That's also a problem
>> for CSV except it's mitigated by the possibility of quoting
>> (using "\." instead of \.)
>
> Right. As long as \. won't have any special meaning for the raw format
> except in the STDIN case, that seems fine.
>
> I haven't looked at that part of the code in detail yet though.
>
> As a preparatory step, I think we should replace the two
> "binary" and "csv_mode" bool fields in CopyFormatOptions
> with a single "format" field of a new CopyFormat enum type.
>
> If we instead introduced another bool field, I think the code
> would be too cluttered.
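
To make the enum idea concrete, here is a rough standalone sketch of what such a refactoring might look like. Only "binary", "csv_mode", "format", CopyFormatOptions and CopyFormat come from the text above; the enum value names, the helper functions and the cut-down struct are assumptions made for illustration and are not taken from any patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical enum replacing the "binary" and "csv_mode" bools.
 * The value names are assumptions, including the one for the
 * proposed raw format.
 */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_CSV,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_RAW
} CopyFormat;

/* Cut-down stand-in for CopyFormatOptions; the real struct has more fields. */
typedef struct SketchFormatOptions
{
    CopyFormat  format;
} SketchFormatOptions;

/* Translate the old pair of bools into the single enum value. */
static CopyFormat
format_from_bools(bool binary, bool csv_mode)
{
    if (binary)
        return COPY_FORMAT_BINARY;
    if (csv_mode)
        return COPY_FORMAT_CSV;
    return COPY_FORMAT_TEXT;
}

/*
 * Map a FORMAT option string to the enum.  Returns false for an
 * unrecognized name and leaves *out untouched in that case.
 */
static bool
parse_copy_format(const char *name, CopyFormat *out)
{
    if (strcmp(name, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(name, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(name, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else if (strcmp(name, "raw") == 0)
        *out = COPY_FORMAT_RAW;
    else
        return false;
    return true;
}
```

With a single field, adding a format means adding one enum value and one branch, and the invalid state of both bools being set at once can no longer be represented.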

I'm starting a new thread for this with a more suitable subject.

/Joel


