Thread: Re: [HACKERS] Unworkable column delimiter characters for COPY

Re: [HACKERS] Unworkable column delimiter characters for COPY

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Tom Lane wrote:
>>
>>> I think at minimum we need to forbid b, f, n, r, t, v, which are the
>>> control character representations currently recognized by COPY.
>>> But I'm tempted to make it reject all 26 lower-case ASCII letters,
>>> as a form of future-proofing.  Thoughts?
>>>
>
>
>> Assuming this is only for non-CSV mode, it seems OK.
>>
>
> On looking closer, 'x', octal digits, and '.' would also be trouble.
> So I made it reject a-z, 0-9, and dot.
>
> It appears that the CSV mode is a few bricks shy of a load here as
> well: it will let you do CSV DELIMITER '"' resulting in entirely
> broken output.  It seems we ought to forbid delimiter from matching CSV
> quote or escape characters.  I'll let you clean up that case though...
>


This should do the trick - I'll apply it tomorrow.

cheers

andrew

Index: copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.293
diff -c -r1.293 copy.c
*** copy.c      27 Dec 2007 18:28:58 -0000      1.293
--- copy.c      28 Dec 2007 04:07:06 -0000
***************
*** 889,894 ****
--- 889,907 ----
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                 errmsg("COPY delimiter cannot be
\"%s\"", cstate->delim)));

+       /* In CSV mode, disallow quote or escape chars as delimiter */
+       if (cstate->csv_mode)
+       {
+               if (cstate->delim[0] == cstate->quote[0])
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                        errmsg("COPY delimiter and
quote must be different")));
+               else if (cstate->delim[0] == cstate->escape[0])
+                       ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                        errmsg("COPY delimiter and
escape must be different")));
+       }
+
        /* Check header */
        if (!cstate->csv_mode && cstate->header_line)
                ereport(ERROR,



Re: [HACKERS] Unworkable column delimiter characters for COPY

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> It seems we ought to forbid delimiter from matching CSV
>> quote or escape characters.  I'll let you clean up that case though...

> This should do the trick - I'll apply it tomorrow.

A couple thoughts:

* This test needs to appear further down --- it is not sensible until
after you've checked strlen() == 1 for all the strings involved.

* I see that we disallow the CSV quote character from appearing in the
null_print string, but not the escape character.  Is this
correct/sensible?  If it is correct, maybe the delimiter could also
match the escape character?

            regards, tom lane

Re: [HACKERS] Unworkable column delimiter characters for COPY

From
Andrew Dunstan
Date:

Tom Lane wrote:
> I see that we disallow the CSV quote character from appearing in the
> null_print string, but not the escape character.  Is this
> correct/sensible?

Yes, because:
. nulls are never quoted
. fields containing the quote char must be quoted
. the escape char is only magical inside quoted fields


> If it is correct, maybe the delimiter could also
> match the escape character?
>
>

Yes, probably. Crazy, but I think it's workable. I'll take that test
out, and just make sure that the delimiter is different from the quote.

cheers

andrew