Re: Fixing backslash dot for COPY FROM...CSV - Mailing list pgsql-hackers
From | Daniel Verite |
---|---|
Subject | Re: Fixing backslash dot for COPY FROM...CSV |
Date | |
Msg-id | 1fba50b1-604c-44f9-b6a6-a3a81e8d0bb8@manitou-mail.org |
In response to | Re: Fixing backslash dot for COPY FROM...CSV (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Fixing backslash dot for COPY FROM...CSV |
List | pgsql-hackers |
Tom Lane wrote:

> This is sufficiently weird that I'm starting to come around to
> Daniel's original proposal that we just drop the server's recognition
> of \. altogether (which would allow removal of some dozens of lines of
> complicated and now known-buggy code)

FWIW my plan was to not change anything in the TEXT mode, but I wasn't aware it had this issue that you found when \. is not on a line by itself.

> Alternatively, we could fix it so that \. at the end of a line draws
> "end-of-copy marker corrupt"
> which would at least make things consistent, but I'm not sure that has
> any great advantage. I surely don't want to document the current
> behavioral details as being the right thing that we're going to keep
> doing.

Agreed we don't want to document that, but also why doesn't \. in the contents represent just a dot (as opposed to being an error), just like \a is a?

I mean if eofdata contains

foobar\a
foobaz\aother

then we get after import:

      f1
--------------
 foobara
 foobazaother
(2 rows)

Reading the current doc on the text format, I can't see why importing:

foobar\.
foobar\.other

is not supposed to produce

      f1
--------------
 foobar.
 foobar.other
(2 rows)

I see these rules in [1] about backslash:

#1. "End of data can be represented by a single line containing just backslash-period (\.)."
foobar\. and foobar\.other do not match that, so #1 does not describe how they're interpreted.

#2. "Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters."
Dot is not a column delimiter (it's forbidden anyway), so #2 does not apply.

#3. "In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character"
Dot is not in that list, so #3 does not apply.

#4. "The following special backslash sequences are recognized by COPY FROM:" (followed by the table with \b, \f, ...)
Dot is not mentioned.

#5. "Any other backslashed character that is not mentioned in the above table will be taken to represent itself"
Here we say that backslash dot represents a dot (unless other rules apply):
foobar\. => foobar.
foobar\.other => foobar.other

#6. "However, beware of adding backslashes unnecessarily, since that might accidentally produce a string matching the end-of-data marker (\.) or the null string (\N by default)."
So we *recommend* not to use \. but as I understand it, the warning about the EOD marker is about accidentally creating a line that matches #1, that is, \. alone on a line.

#7. "These strings will be recognized before any other backslash processing is done."
TBH I don't understand what #7 implies. The order of backslash processing looks like an implementation detail that should not matter in understanding the format?

Considering this, it seems to me that #5 says that backslash-dot represents a dot unless #1 applies, and the other rules (#2, #3, #4, #6, #7) do not state anything that would contradict that.

[1] https://www.postgresql.org/docs/current/sql-copy.html

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
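To make the reading argued for in the message concrete, here is a minimal Python sketch of a text-format decoder in which \. ends the data only when it is alone on a line (rule #1) and otherwise stands for a plain dot (rule #5). This is an illustration only, not PostgreSQL's implementation: the names `SPECIAL`, `decode_field`, and `read_rows` are hypothetical, a single column is assumed, and the null string (\N) and the octal/hex escapes are deliberately left out.

```python
# Hypothetical sketch of the interpretation discussed above; not server code.
# Assumptions: one column per line, no \N handling, no octal/hex escapes.
SPECIAL = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def decode_field(raw):
    """Decode backslash sequences in one field of text-format data."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Rule #4: sequences listed in the table map to control characters.
            # Rule #5: any other backslashed character represents itself,
            # so a backslash followed by a dot yields a dot.
            out.append(SPECIAL.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def read_rows(lines):
    """Decode lines until one consists solely of backslash-period (rule #1)."""
    rows = []
    for line in lines:
        if line == "\\.":
            break  # end-of-data marker: only when alone on its line
        rows.append(decode_field(line))
    return rows
```

Under these assumptions, `read_rows(["foobar\\.", "foobar\\.other"])` returns the two rows with plain dots, while a line holding only `\.` stops the import.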