Thread: BUG #2221: Bad delimiters allowed in COPY ... TO
The following bug has been logged online: Bug reference: 2221 Logged by: David Fetter Email address: david@fetter.org PostgreSQL version: all Operating system: all Description: Bad delimiters allowed in COPY ... TO Details: This came up while I was testing my pg_dump "specify DELIMITER AS and/or NULL AS" patch. Below is a repro using newline. A similar problem happens when you specify the delimiter as a backslash. Should COPY simply not allow these delimiters? What about carriage return? Just in general, what bytes other than null ought COPY to reject out of hand? Also, what regression tests do we want to put in in order to ensure that COPY ... TO generates output that COPY ... FROM can understand? CREATE TABLE pqxxevents ( "year" integer, event text ); COPY pqxxevents ("year", event) FROM stdin DELIMITER AS '^M'; 2010^MOdyssey Two 2038^Mtime_t overflow 1971^Mjtv 1981^MC:\\> 1997^MAsian crisis 1999^M\N 1978^Mbloody\t\tcold 1989^MOde an die Freiheit 2001^MNew millennium 2001^M'911' WTC attack 2001^MA Space Odyssey 2002^Mlibpqxx 3001^MFinal Odyssey \.
On Sun, Jan 29, 2006 at 09:50:08PM +0000, David Fetter wrote: > > The following bug has been logged online: > > Bug reference: 2221 > Logged by: David Fetter > Email address: david@fetter.org > PostgreSQL version: all > Operating system: all > Description: Bad delimiters allowed in COPY ... TO > Details: > > This came up while I was testing my pg_dump "specify DELIMITER AS and/or > NULL AS" patch. Folks, Please pardon the self-followup. I believe that this patch fixes the problem in COPY. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
Attachment
On Sun, Jan 29, 2006 at 04:41:43PM -0800, David Fetter wrote: > On Sun, Jan 29, 2006 at 09:50:08PM +0000, David Fetter wrote: > > > > The following bug has been logged online: > > > > Bug reference: 2221 > > Logged by: David Fetter > > Email address: david@fetter.org > > PostgreSQL version: all > > Operating system: all > > Description: Bad delimiters allowed in COPY ... TO > > Details: > > > > This came up while I was testing my pg_dump "specify DELIMITER AS and/or > > NULL AS" patch. > > Folks, > > Please pardon the self-followup. I believe that this patch fixes > the problem in COPY. Another followup, this time with the comment done right. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
Attachment
On Sun, 2006-01-29 at 17:03 -0800, David Fetter wrote: > Another followup, this time with the comment done right. + /* Disallow the forbidden_delimiter strings */ + if (strcspn(cstate->delim, BADCHARS) != 1) + elog(ERROR, "COPY delimiter cannot be %#02x", + *cstate->delim); + The comment is still wrong: referencing "forbidden_delimiter" makes it sound like there is something named forbidden_delimiter, but there is not (at least in the patch as submitted). The patch should also use ereport rather than elog, because this error message might reasonably be encountered by the user. -Neil
On Sun, Jan 29, 2006 at 10:20:47PM -0500, Neil Conway wrote: > On Sun, 2006-01-29 at 17:03 -0800, David Fetter wrote: > > Another followup, this time with the comment done right. > > + /* Disallow the forbidden_delimiter strings */ > + if (strcspn(cstate->delim, BADCHARS) != 1) > + elog(ERROR, "COPY delimiter cannot be %#02x", > + *cstate->delim); > + > > The comment is still wrong: referencing "forbidden_delimiter" makes > it sound like there is something named forbidden_delimiter, but > there is not (at least in the patch as submitted). > > The patch should also use ereport rather than elog, because this > error message might reasonably be encountered by the user. Patch with BADCHARS attached :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
David Fetter wrote: > >+ /* Disallow BADCHARS characters */ >+ if (strcspn(cstate->delim, BADCHARS) != 1) >+ ereport(ERROR, >+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >+ errmsg("COPY delimiter cannot be \"%#02x\"", >+ *cstate->delim))); >+ > > > Is ERRCODE_FEATURE_NOT_SUPPORTED the right errcode? This isn't a missing feature; we are performing a sanity check here. We can reasonably expect never to support CR, LF or \ as the text delimiter. Maybe ERRCODE_INVALID_PARAMETER_VALUE ? Or maybe we need a new one. Also, I would probably make the format %#.02x so the result would look like 0x0d (for a CR). (I bet David never thought there would so much fuss over a handful of lines of code) cheers andrew
On Mon, Jan 30, 2006 at 08:21:34AM -0500, Andrew Dunstan wrote: > > > David Fetter wrote: > > > > >+ /* Disallow BADCHARS characters */ > >+ if (strcspn(cstate->delim, BADCHARS) != 1) > >+ ereport(ERROR, > >+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > >+ errmsg("COPY delimiter cannot be \"%#02x\"", > >+ *cstate->delim))); > >+ > > Is ERRCODE_FEATURE_NOT_SUPPORTED the right errcode? This isn't a > missing feature; we are performing a sanity check here. We can > reasonably expect never to support CR, LF or \ as the text > delimiter. I guess that depends on whether we ever plan to allow people to set the output record separator to something other than CR?LF. > Maybe ERRCODE_INVALID_PARAMETER_VALUE ? Or maybe we need a new one. > > Also, I would probably make the format %#.02x so the result would > look like 0x0d (for a CR). > > (I bet David never thought there would so much fuss over a handful > of lines of code) Actually, I'm happy to see it's getting QA. COPY is something that has Consequences⢠if anything goes wrong with it, so I'd rather do best efforts up front to get it right. :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
> Is ERRCODE_FEATURE_NOT_SUPPORTED the right errcode? This isn't a > missing feature; we are performing a sanity check here. We can > reasonably expect never to support CR, LF or \ as the text > delimiter. I guess that depends on whether we ever plan to allow people to set the output record separator to something other than CR?LF. > Maybe ERRCODE_INVALID_PARAMETER_VALUE ? Or maybe we need a new one. Agreed. Right now it is invalid and there are no plans to support other values for end-of-line. I will make the change when the patch is applied. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Uh, couldn't the delimiter be a backslash in CVS mode? + #define BADCHARS "\r\n\\" Also, should we disable DELIMITER and NULL from sharing characters? --------------------------------------------------------------------------- David Fetter wrote: > On Sun, Jan 29, 2006 at 10:20:47PM -0500, Neil Conway wrote: > > On Sun, 2006-01-29 at 17:03 -0800, David Fetter wrote: > > > Another followup, this time with the comment done right. > > > > + /* Disallow the forbidden_delimiter strings */ > > + if (strcspn(cstate->delim, BADCHARS) != 1) > > + elog(ERROR, "COPY delimiter cannot be %#02x", > > + *cstate->delim); > > + > > > > The comment is still wrong: referencing "forbidden_delimiter" makes > > it sound like there is something named forbidden_delimiter, but > > there is not (at least in the patch as submitted). > > > > The patch should also use ereport rather than elog, because this > > error message might reasonably be encountered by the user. > > Patch with BADCHARS attached :) > > Cheers, > D > -- > David Fetter david@fetter.org http://fetter.org/ > phone: +1 415 235 3778 > > Remember to vote! [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, Jan 31, 2006 at 08:03:41PM -0500, Bruce Momjian wrote: > Uh, couldn't the delimiter be a backslash in CVS mode? I don't think so. Folks? Anyhow, if there are different sets, I could do something like: #define BADCHARS "\r\n\\" #define BADCHARS_CSV "\r\n" and then check for csv_mode, etc. > + #define BADCHARS "\r\n\\" > > Also, should we disable DELIMITER and NULL from sharing characters? That's on about line 916, post-patch: /* Don't allow the delimiter to appear in the null string. */ if (strchr(cstate->null_print, cstate->delim[0]) != NULL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("COPY delimiter must not appear in the NULL specification"))); I suppose that a different error code might be The Right Thing⢠here. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
On Tue, Jan 31, 2006 at 09:50:26PM -0600, Andrew Dunstan wrote: > David Fetter said: > > On Tue, Jan 31, 2006 at 08:03:41PM -0500, Bruce Momjian wrote: > >> Uh, couldn't the delimiter be a backslash in CVS mode? > > > > I don't think so. Folks? > > Using backslash as a delimiter in CSV would be odd, to say the least. As an > escape char it is occasionally used, but not as a delimiter in my > experience. Maybe we should apply the "be liberal in what you accept" rule, > but I think this would be stretching it. <aol>So do I.</aol> > >> Also, should we disable DELIMITER and NULL from sharing characters? > > > > That's on about line 916, post-patch: > > > > /* Don't allow the delimiter to appear in the null string. */ > > if (strchr(cstate->null_print, cstate->delim[0]) != NULL) > > ereport(ERROR, > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > errmsg("COPY delimiter must not appear in the NULL > > specification"))); > > > > I suppose that a different error code might be The Right Thing⢠> > here. > > ERRCODE_WHAT WERE_YOU_THINKING ? That's an excellent candidate, or maybe ERRCODE_INVALID_PARAMETER_VALUE. My vote is for ERRCODE_D00D_WTF ;) Maybe we need an error code for mutually incompatible param values. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
David Fetter said: > On Tue, Jan 31, 2006 at 08:03:41PM -0500, Bruce Momjian wrote: >> Uh, couldn't the delimiter be a backslash in CVS mode? > > I don't think so. Folks? Using backslash as a delimiter in CSV would be odd, to say the least. As an escape char it is occasionally used, but not as a delimiter in my experience. Maybe we should apply the "be liberal in what you accept" rule, but I think this would be stretching it. > > Anyhow, if there are different sets, I could do something like: > > #define BADCHARS "\r\n\\" > #define BADCHARS_CSV "\r\n" > > and then check for csv_mode, etc. > >> + #define BADCHARS "\r\n\\" >> >> Also, should we disable DELIMITER and NULL from sharing characters? > > That's on about line 916, post-patch: > > /* Don't allow the delimiter to appear in the null string. */ > if (strchr(cstate->null_print, cstate->delim[0]) != NULL) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("COPY delimiter must not appear in the NULL > specification"))); > > I suppose that a different error code might be The Right Thing⢠here. > ERRCODE_WHAT WERE_YOU_THINKING ? cheers andrew