Thread: Unworkable column delimiter characters for COPY


From: Tom Lane
Date:

Currently, copy.c rejects newline, carriage return, and backslash as
settings for the column delimiter character (in non-CSV mode).  These
all seem necessary to avoid confusion.  However, I just noticed that the
letters r, n, t, etc. would also not work: on output, data characters
matching such a delimiter would get escaped as \r, \n, etc., which on
input would be read as C-style control characters.

I think at minimum we need to forbid b, f, n, r, t, v, which are the
control character representations currently recognized by COPY.
But I'm tempted to make it reject all 26 lower-case ASCII letters,
as a form of future-proofing.  Thoughts?
        regards, tom lane


Re: Unworkable column delimiter characters for COPY

From: Andrew Dunstan
Date:

Tom Lane wrote:
> Currently, copy.c rejects newline, carriage return, and backslash as
> settings for the column delimiter character (in non-CSV mode).  These
> all seem necessary to avoid confusion.  However, I just noticed that the
> letters r, n, t, etc. would also not work: on output, data characters
> matching such a delimiter would get escaped as \r, \n, etc., which on
> input would be read as C-style control characters.
>
> I think at minimum we need to forbid b, f, n, r, t, v, which are the
> control character representations currently recognized by COPY.
> But I'm tempted to make it reject all 26 lower-case ASCII letters,
> as a form of future-proofing.  Thoughts?

Assuming this is only for non-CSV mode, it seems OK.

cheers

andrew



Re: Unworkable column delimiter characters for COPY

From: Tom Lane
Date:

Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>> control character representations currently recognized by COPY.
>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>> as a form of future-proofing.  Thoughts?

> Assuming this is only for non-CSV mode, it seems OK.

On looking closer, 'x', octal digits, and '.' would also be trouble.
So I made it reject a-z, 0-9, and dot.

It appears that the CSV mode is a few bricks shy of a load here as
well: it will let you do CSV DELIMITER '"' resulting in entirely
broken output.  It seems we ought to forbid delimiter from matching CSV
quote or escape characters.  I'll let you clean up that case though...
        regards, tom lane


Re: Unworkable column delimiter characters for COPY

From: Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>>> control character representations currently recognized by COPY.
>>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>>> as a form of future-proofing.  Thoughts?
>
>> Assuming this is only for non-CSV mode, it seems OK.
>
> On looking closer, 'x', octal digits, and '.' would also be trouble.
> So I made it reject a-z, 0-9, and dot.

I take it upper-case A-F are safe, even though they are hex digits,
because they wouldn't immediately follow the backslash?

> It appears that the CSV mode is a few bricks shy of a load here as
> well: it will let you do CSV DELIMITER '"' resulting in entirely
> broken output.  It seems we ought to forbid delimiter from matching CSV
> quote or escape characters.  I'll let you clean up that case though...

Lucky me. OK, I'll look at it. Should be simple enough.

cheers

andrew