Thread: Unworkable column delimiter characters for COPY
Currently, copy.c rejects newline, carriage return, and backslash as settings for the column delimiter character (in non-CSV mode). These all seem necessary to avoid confusion. However, I just noticed that the letters r, n, t, etc would also not work: on output, data characters matching such a delimiter would get escaped as \r, \n, etc, which on input would be read as C-style control characters.

I think at minimum we need to forbid b, f, n, r, t, v, which are the control character representations currently recognized by COPY. But I'm tempted to make it reject all 26 lower-case ASCII letters, as a form of future-proofing. Thoughts?

regards, tom lane
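The round-trip failure described above can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not the logic in copy.c: the names `escape_field`, `unescape_field`, and `round_trip` are hypothetical, the writer is assumed to backslash-escape only backslashes and characters equal to the delimiter, and the reader is assumed to decode only the six control-character letters named in the message.

```python
# Simplified model of COPY text-format escaping (an illustration, not copy.c).
# Letters the reader decodes as C-style control characters after a backslash.
CONTROL_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def escape_field(value: str, delim: str) -> str:
    """Writer side: put a backslash before backslashes and delimiter characters."""
    out = []
    for ch in value:
        if ch == "\\" or ch == delim:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def unescape_field(text: str) -> str:
    """Reader side: decode backslash sequences, treating b/f/n/r/t/v as control chars."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Unknown escapes fall back to the literal character.
            out.append(CONTROL_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def round_trip(value: str, delim: str) -> str:
    """Write a field and read it back with the same delimiter."""
    return unescape_field(escape_field(value, delim))
```

In this model, `round_trip("cat", "t")` does not return `"cat"`: the final `t` is written as backslash-t and read back as a tab character, whereas a delimiter such as `|` survives the round trip unchanged.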
Tom Lane wrote:
> Currently, copy.c rejects newline, carriage return, and backslash as
> settings for the column delimiter character (in non-CSV mode). These
> all seem necessary to avoid confusion. However, I just noticed that the
> letters r, n, t, etc would also not work: on output, data characters
> matching such a delimiter would get escaped as \r, \n, etc, which on
> input would be read as C-style control characters.
>
> I think at minimum we need to forbid b, f, n, r, t, v, which are the
> control character representations currently recognized by COPY.
> But I'm tempted to make it reject all 26 lower-case ASCII letters,
> as a form of future-proofing. Thoughts?

Assuming this is only for non-CSV mode, it seems OK.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>> control character representations currently recognized by COPY.
>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>> as a form of future-proofing. Thoughts?
>
> Assuming this is only for non-CSV mode, it seems OK.

On looking closer, 'x', octal digits, and '.' would also be trouble. So I made it reject a-z, 0-9, and dot.

It appears that the CSV mode is a few bricks shy of a load here as well: it will let you do CSV DELIMITER '"' resulting in entirely broken output. It seems we ought to forbid delimiter from matching CSV quote or escape characters. I'll let you clean up that case though...

regards, tom lane
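The checks discussed in this message could be sketched as follows. This is a hypothetical Python illustration, not the code committed to copy.c: the function name `check_delimiter`, its parameters, and the error messages are invented here, and the CSV branch models the restriction that is only proposed above. Treating the escape character as defaulting to the quote character is likewise an assumption of the sketch.

```python
import string

# Characters rejected as a text-mode (non-CSV) delimiter per the thread:
# newline, carriage return, backslash, a-z, 0-9, and dot.
TEXT_MODE_FORBIDDEN = (
    set("\n\r\\.") | set(string.ascii_lowercase) | set(string.digits)
)


def check_delimiter(delim, csv_mode=False, quote='"', escape=None):
    """Return delim if acceptable, else raise ValueError (hypothetical helper)."""
    if len(delim) != 1:
        raise ValueError("delimiter must be a single character")
    if csv_mode:
        # Assumption: the escape character defaults to the quote character.
        if escape is None:
            escape = quote
        if delim in (quote, escape):
            raise ValueError("CSV delimiter must not match quote or escape")
    elif delim in TEXT_MODE_FORBIDDEN:
        raise ValueError("delimiter %r is not allowed in text mode" % delim)
    return delim
```

Under these assumptions, `check_delimiter("|")` is accepted and `check_delimiter("t")` raises `ValueError`, while `check_delimiter('"', csv_mode=True)` is refused because the delimiter matches the quote character.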
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>>> control character representations currently recognized by COPY.
>>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>>> as a form of future-proofing. Thoughts?
>>
>> Assuming this is only for non-CSV mode, it seems OK.
>
> On looking closer, 'x', octal digits, and '.' would also be trouble.
> So I made it reject a-z, 0-9, and dot.

I take it upper case A-F are safe, even though they are hex digits, because they wouldn't immediately follow the backslash?

> It appears that the CSV mode is a few bricks shy of a load here as
> well: it will let you do CSV DELIMITER '"' resulting in entirely
> broken output. It seems we ought to forbid delimiter from matching CSV
> quote or escape characters. I'll let you clean up that case though...

Lucky me. Ok, I'll look at it. Should be simple enough.

cheers

andrew