Re: New "single" COPY format - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: New "single" COPY format
Date
Msg-id 7abe064b-f660-465d-a522-341a325fe530@app.fastmail.com
In response to Re: New "single" COPY format  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: New "single" COPY format
List pgsql-hackers
On Fri, Nov 8, 2024, at 20:44, Daniel Verite wrote:
> Aleksander Alekseev wrote:
>
>> IMO it should be 'text' we already have with special options e.g.
>> DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
>> and column delimiters (and no NULLs designations, and what else I
>> forgot) then your text file just contains one tuple per line.
>
> +1 for the idea that accepting "no delimiter"  and "no escape"
> as a valid combination for the text format seems better
> than adding a new format.
> However inviting "NULL" into that syntax when it has nothing to do
> with the SQL "NULL" does not look like a good idea.
> Maybe DELIMITER '' ESCAPE '', or DELIMITER NONE ESCAPE NONE.

Okay, let's see if we can solve all the problems I see with
overloading the 'text' format:

1. Text files containing \. in the middle of the file
% cat /tmp/test.txt
foo
\.
bar

How do we import such a file?
Is it simply not supported?
Or do we add another option to turn off the special meaning of \.?
Both seem like bad ideas to me; maybe there is a nice idea I fail to see?

2. NULL option is \N for 'text', so to import a plain text
file safely, where \N lines should not be converted to NULL,
users would need to also specify NULL '', which seems
like a footgun to me.

3. What should happen if specifying DELIMITER NONE, and:
- specifying a column list with more than one column?
- not also specifying ESCAPE NONE?

4. What should happen if specifying ESCAPE NONE, and
- specifying a column list with more than one column?

5. What about the isomorphism violation I brought up in my
previous email, that is, the non-bijective mapping and irreversibility
for records with embedded newlines?
This is also a problem with a separate format,
but I wonder what you think about the problem,
if it's acceptable, or needs to be solved, and if so,
if you see any solutions.
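
To make points 1, 2 and 5 concrete, here is a toy Python model.
It is only a sketch, not PostgreSQL's COPY parser: the function names
(read_text_like, write_plain, read_plain) are made up for illustration,
and the model simply assumes, as described above, that a line consisting
of \. ends the data and that a line consisting of \N means NULL.

```python
# Toy model only: this is NOT PostgreSQL's COPY implementation.
# All function names below are hypothetical and exist only for illustration.

def _lines(data):
    """Split on "\n" only, dropping the empty piece after a final newline."""
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines

def read_text_like(data, null_marker="\\N", end_marker="\\."):
    """Model of a 'text'-like reader with a NULL marker and an end marker."""
    rows = []
    for line in _lines(data):
        if line == end_marker:
            break  # this model drops everything after the marker
        rows.append(None if line == null_marker else line)
    return rows

def write_plain(rows):
    """Write one record per line, with no escaping at all."""
    return "".join(row + "\n" for row in rows)

def read_plain(data):
    """Read one record per line, with no special sequences."""
    return _lines(data)

# Point 1: the example file with \. in the middle.
file_contents = "foo\n\\.\nbar\n"
text_rows = read_text_like(file_contents)   # "bar" is lost in this model
plain_rows = read_plain(file_contents)      # all three lines survive

# Point 2: a line that is literally \N turns into NULL (None) by default.
null_rows = read_text_like("a\n\\N\nb\n")

# Point 5: a record with an embedded newline does not round-trip.
original = ["line one\nline two", "second"]
round_trip = read_plain(write_plain(original))
```

In this model, text_rows has one element while plain_rows has three,
and round_trip has three records where original had two, so the
write/read mapping is not reversible without some form of escaping.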

> Besides, "single" as a format name does not sound right.
> Generally the name for a text format designates a set
> of characteristics meaning that certain combinations of
> characters have specific behaviors.
> Sometimes "plain" is used in the context of text formats
> to indicate that no character is special ("plain" is also the
> default subtype of "text" in MIME types).
>
> "single" as proposed is to be understood as "single-column",
> which is a consequence of the lack of a field delimiter, but
> not an intrinsic characteristic of the format.
> If COPY accepted fixed-length fields, it could be in a
> no-delimiter no-escape mode and still handle multiple
> columns, in opposition to what "single" suggests.

Good points. I agree "plain" is a better name.

/Joel


