Re: New "single" COPY format - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: New "single" COPY format |
Date | |
Msg-id | 7abe064b-f660-465d-a522-341a325fe530@app.fastmail.com |
In response to | Re: New "single" COPY format ("Daniel Verite" <daniel@manitou-mail.org>) |
Responses | Re: New "single" COPY format |
List | pgsql-hackers |
On Fri, Nov 8, 2024, at 20:44, Daniel Verite wrote:
> Aleksander Alekseev wrote:
>
>> IMO it should be 'text' we already have with special options e.g.
>> DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
>> and column delimiters (and no NULLs designations, and what else I
>> forgot) then your text file just contains one tuple per line.
>
> +1 for the idea that accepting "no delimiter" and "no escape"
> as a valid combination for the text format seems better
> than adding a new format.
> However inviting "NULL" into that syntax when it has nothing to do
> with the SQL "NULL" does not look like a good idea.
> Maybe DELIMITER '' ESCAPE '', or DELIMITER NONE ESCAPE NONE.

Okay, let's see if we can solve all the problems I see with overloading
the 'text' format:

1. Text files containing \. in the middle of the file

    % cat /tmp/test.txt
    foo
    \.
    bar

How do we import such a file? Is it not supported?
Or do we add another option to turn off the special meaning of \.?
Both seem like bad ideas to me; maybe there is a nice idea I fail to see?

2. The NULL option defaults to \N for 'text', so to import a plain text
file safely, where \N lines should not be converted to NULL, users would
also need to specify NULL '', which seems like a footgun to me.

3. What should happen when specifying DELIMITER NONE, and:
- specifying a column list with more than one column?
- not also specifying ESCAPE NONE?

4. What should happen when specifying ESCAPE NONE, and:
- specifying a column list with more than one column?

5. What about the isomorphism violation I brought up in my previous
email, that is, the non-bijective mapping and irreversibility for
records with embedded newlines? This is also a problem with a separate
format, but I wonder what you think about the problem: whether it's
acceptable or needs to be solved, and if so, whether you see any solutions.

> Besides, "single" as a format name does not sound right.
> Generally the name for a text format designates a set
> of characteristics meaning that certain combinations of
> characters have specific behaviors.
> Sometimes "plain" is used in the context of text formats
> to indicate that no character is special ("plain" is also the
> default subtype of "text" in MIME types).
>
> "single" as proposed is to be understood as "single-column",
> which is a consequence of the lack of a field delimiter, but
> not an intrinsic characteristic of the format.
> If COPY accepted fixed-length fields, it could be in a
> no-delimiter no-escape mode and still handle multiple
> columns, in opposition to what "single" suggests.

Good points. I agree "plain" is a better name.

/Joel
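
Points 1, 2 and 5 in the message above can be illustrated with a small toy model. The sketch below is not PostgreSQL's COPY parser, and the helpers `read_text_like`, `read_plain` and `write_plain` are hypothetical names introduced only for this illustration. It assumes a single-column reader that keeps two conventions of the 'text' format (a line containing just `\.` ends the input, and a line containing just `\N` becomes NULL) and compares it with a reader in which no character is special.

```python
# Toy model only. read_text_like, read_plain and write_plain are hypothetical
# helpers; they do not reproduce PostgreSQL's actual COPY parser.

def split_lines(data):
    # Split on the newline character only, dropping the empty piece that
    # follows a final newline.
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines

def read_text_like(data, null_marker="\\N", end_marker="\\."):
    # Assumption: single column, end_marker on its own line stops reading,
    # null_marker on its own line is read as None (standing in for NULL).
    rows = []
    for line in split_lines(data):
        if line == end_marker:
            break
        rows.append(None if line == null_marker else line)
    return rows

def read_plain(data):
    # "Plain" reader: every line is one record, no character is special.
    return split_lines(data)

def write_plain(rows):
    # "Plain" writer: one record per line, nothing is escaped.
    return "".join(row + "\n" for row in rows)

# Point 1: a \. line in the middle of the file hides everything after it.
sample = "foo\n\\.\nbar\n"
print(read_text_like(sample))  # ['foo']
print(read_plain(sample))      # ['foo', '\\.', 'bar']

# Point 2: a literal \N line is turned into NULL unless NULL is overridden.
print(read_text_like("x\n\\N\ny\n"))                   # ['x', None, 'y']
print(read_text_like("x\n\\N\ny\n", null_marker=""))   # ['x', '\\N', 'y']

# Point 5: without escaping, a record with an embedded newline does not
# survive a round trip, and two different inputs produce the same file.
print(read_plain(write_plain(["a\nb", "c"])))  # ['a', 'b', 'c']
```

In this model the last case shows why the mapping is not bijective: `["a\nb", "c"]` and `["a", "b", "c"]` are written as the same bytes, so no reader can tell them apart. That holds whether the behaviour is exposed as options on 'text' or as a separate format.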