Thread: non-standard escapes in string literals
I posted this some time ago to pgsql-bugs[1], to no response. So I'll venture to try here. Postgres breaks the standard for string literals by supporting C-like escape sequences. This causes pain for people trying to write portable applications. Is there any hope for an option to follow the standard strictly? Cc's of replies appreciated. Thanks, Andrew [1] http://archives.postgresql.org/pgsql-bugs/2001-12/msg00048.php
Andrew Pimlott wrote: > I posted this some time ago to pgsql-bugs[1], to no response. So > I'll venture to try here. > > Postgres breaks the standard for string literals by supporting > C-like escape sequences. This causes pain for people trying to > write portable applications. Is there any hope for an option to > follow the standard strictly? This is actually the first time this has come up (that I remember). We do support C escaping, but you are the first to mention that it can cause problems for portable applications. Anyone else want to comment? I don't know how to address this. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Thu, 25 Apr 2002 10:41:56 EDT, Bruce Momjian wrote: > Andrew Pimlott wrote: > > I posted this some time ago to pgsql-bugs[1], to no response. So > > I'll venture to try here. > > > > Postgres breaks the standard for string literals by supporting > > C-like escape sequences. This causes pain for people trying to > > write portable applications. Is there any hope for an option to > > follow the standard strictly? > > This is actually the first time this has come up (that I remember). We > do support C escaping, but you are the first to mention that it can > cause problems for portable applications. > > Anyone else want to comment? I don't know how to address this. IMHO, I agree that I would like to see the ANSI standard implemented. While I really like PostgreSQL, it currently does not scale as large as other DBMS systems. Due to this, we try to code as database agnostic as possible so that a port requires a minimum of effort. Currently there are only a few areas remaining that are at issue. (Intervals and implicit type conversion have/are being addressed). I believe that the reason that it hasn't come up as an issue, per se, is that it would only affect strings with a backslash in them. Backslash is not a commonly used character. In addition, MySQL, also broken, uses backslashes in the same/similar way. Lots of people using PostgreSQL are stepping up from MySQL. This also poses the biggest problem in terms of legacy compatibility. Perhaps the answer is to add a runtime config option (and default it to ANSI) and possibly deprecate the C escaping. Thanks, F Harvell -- Mr. F Harvell Phone: +1.407.673.2529 FTS International Data Systems, Inc. Cell: +1.407.467.1919 7457 Aloma Ave, Suite 302 Fax: +1.407.673.4472 Winter Park, FL 32792 mailto:fharvell@fts.net
F Harvell <fharvell@fts.net> writes: > This also poses the biggest problem in terms of legacy compatibility. > Perhaps the answer is to add a runtime config option (and default it > to ANSI) and possibly deprecate the C escaping. While I wouldn't necessarily object to a runtime option, I do object to both the other parts of your proposal ;-). Backslash escaping is not broken; we aren't going to remove it or deprecate it, and I would vote against making it non-default. regards, tom lane
Tom Lane wrote: > F Harvell <fharvell@fts.net> writes: > > This also poses the biggest problem in terms of legacy compatibility. > > Perhaps the answer is to add a runtime config option (and default it > > to ANSI) and possibly deprecate the C escaping. > > While I wouldn't necessarily object to a runtime option, I do object > to both the other parts of your proposal ;-). Backslash escaping is > not broken; we aren't going to remove it or deprecate it, and I would > vote against making it non-default. Added to TODO: * Allow backslash handling in quoted strings to be disabled for portability -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Thu, 25 Apr 2002 15:07:44 EDT, Tom Lane wrote: > F Harvell <fharvell@fts.net> writes: > > This also poses the biggest problem in terms of legacy compatibility. > > Perhaps the answer is to add a runtime config option (and default it > > to ANSI) and possibly deprecate the C escaping. > > While I wouldn't necessarily object to a runtime option, I do object > to both the other parts of your proposal ;-). Backslash escaping is > not broken; we aren't going to remove it or deprecate it, and I would > vote against making it non-default. > Sorry, didn't mean to imply that backslash escaping was broken, just non-compliant. Beyond that, your recommendations are also probably the best course of action. I do desire that the "default" operation of the database be as ANSI standard compliant as possible, however, I certainly understand the need to be as backwards compliant as possible. The only issue that I can see with keeping the backslash escaping default is that new, non-PostgreSQL programmers will not be expecting the escaping and will be potentially blindsided by it. (A bigger deal since backslashes are unusual and are not often tested for/with.) Perhaps prominent notice in the documentation will be adequate/appropriate. Maybe a section on differences with the ANSI standard should be created. (Is there currently a compilation of differences anywhere or are they all dispersed within the documentation?). Thanks, F -- Mr. F Harvell Phone: +1.407.673.2529 FTS International Data Systems, Inc. Cell: +1.407.467.1919 7457 Aloma Ave, Suite 302 Fax: +1.407.673.4472 Winter Park, FL 32792 mailto:fharvell@fts.net
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Added to TODO: > > * Allow backslash handling in quoted strings to be disabled for portability BTW, what about embedded NUL characters in text strings? ;-) -- Florian Weimer Weimer@CERT.Uni-Stuttgart.DE University of Stuttgart http://CERT.Uni-Stuttgart.DE/people/fw/ RUS-CERT +49-711-685-5973/fax +49-711-685-5898
Florian Weimer <Weimer@CERT.Uni-Stuttgart.DE> writes: > BTW, what about embedded NUL characters in text strings? ;-) There's approximately zero chance of that happening in the foreseeable future. Since null-terminated strings are the API for both the parser and all datatype I/O routines, there'd have to be a lot of code changed to support this. To take just one example: strcoll() uses null-terminated strings, therefore we'd not be able to support locale-aware text comparisons unless we write our own replacement for the entire locale library. (Which we might do someday, but it's not a trivial task.) The amount of pain involved seems to far outweigh the gain... regards, tom lane
> > This also poses the biggest problem in terms of legacy compatibility. > > Perhaps the answer is to add a runtime config option (and default it > > to ANSI) and possibly deprecate the C escaping. > > While I wouldn't necessarily object to a runtime option, I do object > to both the other parts of your proposal ;-). Backslash escaping is > not broken; we aren't going to remove it or deprecate it, and I would > vote against making it non-default. When we are talking about the places where you need double escaping (once for parser, once for input function) to make it work, I would also say that that is very cumbersome (not broken, since it is thus documented) :-) I would also default to strict ANSI, but not depricate the escaping when set. All imho of course. Andreas
On Wed, May 08, 2002 at 06:47:46PM +0200, Zeugswetter Andreas SB SD wrote: > When we are talking about the places where you need double escaping > (once for parser, once for input function) to make it work, I would also > say that that is very cumbersome (not broken, since it is thus documented) :-) > I would also default to strict ANSI, but not depricate the escaping when set. > All imho of course. As the original reporter of this issue, I am gratified to hear it acknowledged by the developers. Thanks! (I also apologize if I exaggerated the pain caused, as apparently not many other people have been bitten by this specific problem. Well, it was painful for me. ;-) ) I must say, though, that I remain bothered by the "not broken" attitude. There is an obvious standard for PostgreSQL to follow, yet it is non-compliant in utterly trivial ways, which provide marginal or no benefits. Granted, changing long-standing defaults may not be acceptible; but there is a big difference between, "it is broken but we just can't change it for compatibility reasons", and, "it is not broken". It is my experience that most other free software projects take standards compliance more seriously than PostgreSQL, and my strong opinion that both the project and its users (not to mention the whole SQL database industry, eventually) would benefit from better support for the SQL standard. Ok, I've said my peace. Andrew
> It is my experience that most other free software projects take > standards compliance more seriously than PostgreSQL, and my strong > opinion that both the project and its users (not to mention the > whole SQL database industry, eventually) would benefit from better > support for the SQL standard. Ummm - I think you'd be hard pressed to find a open source db team more committed to standards compliance. Chris
Andrew Pimlott wrote: > On Wed, May 08, 2002 at 06:47:46PM +0200, Zeugswetter Andreas SB SD wrote: > > When we are talking about the places where you need double escaping > > (once for parser, once for input function) to make it work, I would also > > say that that is very cumbersome (not broken, since it is thus documented) :-) > > I would also default to strict ANSI, but not depricate the escaping when set. > > All imho of course. > > As the original reporter of this issue, I am gratified to hear it > acknowledged by the developers. Thanks! (I also apologize if I > exaggerated the pain caused, as apparently not many other people > have been bitten by this specific problem. Well, it was painful for > me. ;-) ) > > I must say, though, that I remain bothered by the "not broken" > attitude. There is an obvious standard for PostgreSQL to follow, > yet it is non-compliant in utterly trivial ways, which provide > marginal or no benefits. Granted, changing long-standing defaults > may not be acceptible; but there is a big difference between, "it is > broken but we just can't change it for compatibility reasons", and, > "it is not broken". > > It is my experience that most other free software projects take > standards compliance more seriously than PostgreSQL, and my strong > opinion that both the project and its users (not to mention the > whole SQL database industry, eventually) would benefit from better > support for the SQL standard. > > Ok, I've said my peace. Yes, these are good points. Our big problem is that we use backslash for two things, one for escaping single quotes and for escaping standard C characters, like \n. While we can use the standard-supported '' to insert single quotes, what should we do with \n? The problem is switching to standard ANSI solution reduces our functionality. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Mon, June 03 Bruce wrote: > > On Wed, May 08, 2002 at 06:47:46PM +0200, Zeugswetter SB SD Andreas wrote: > > > When we are talking about the places where you need double escaping > > > (once for parser, once for input function) to make it work, I would also > > > say that that is very cumbersome (not broken, since it is thus documented) :-) > > > I would also default to strict ANSI, but not depricate the escaping when set. > > > All imho of course. > Yes, these are good points. Our big problem is that we use backslash > for two things, one for escaping single quotes and for escaping standard > C characters, like \n. While we can use the standard-supported '' to > insert single quotes, what should we do with \n? The problem is > switching to standard ANSI solution reduces our functionality. The problem imho is, that this (no doubt in many cases valuable) feature reduces the functionality from the ANSI SQL perspective. Consider a field that is supposed to store Windows filenames, nam_file='C:\node1\resend\b.dat' :-) Thus I think a GUC to turn off all escaping except '' would be valuable. Andreas
At 01:20 PM 6/3/02 +0200, Zeugswetter Andreas SB SD wrote: > > for two things, one for escaping single quotes and for escaping standard > > C characters, like \n. While we can use the standard-supported '' to > > insert single quotes, what should we do with \n? The problem is > > switching to standard ANSI solution reduces our functionality. > >The problem imho is, that this (no doubt in many cases valuable) >feature reduces the functionality from the ANSI SQL perspective. >Consider a field that is supposed to store Windows filenames, >nam_file='C:\node1\resend\b.dat' :-) > >Thus I think a GUC to turn off all escaping except '' would be valuable. With current behaviour 'C:\node1\resend\b.dat' can be quoted as 'C:\\node1\\resend\\b.dat' But for the ANSI standard how does one stuff \r\n\t and other control characters into the database? If there's no way other than actually sending the control characters then that is a bad idea especially from a security viewpoint. Cheerio, Link.
Lincoln Yeoh writes: > But for the ANSI standard how does one stuff \r\n\t and other control > characters into the database? > > If there's no way other than actually sending the control characters then > that is a bad idea especially from a security viewpoint. Why?? -- Peter Eisentraut peter_e@gmx.net
At 09:58 PM 6/4/02 +0200, Peter Eisentraut wrote: >Lincoln Yeoh writes: > > > But for the ANSI standard how does one stuff \r\n\t and other control > > characters into the database? > > > > If there's no way other than actually sending the control characters then > > that is a bad idea especially from a security viewpoint. > >Why?? Quoting is to help separate data from commands. Though '' is sufficient for quoting ' it seems to me not sufficient for control characters. There could be control characters that cause problems with the DB, and people may not be sufficiently aware of potential problems. If you just remove the problematic characters, it means you can't store them in the database - the db can become less useful. Whereas with the current way of quoting control characters, if you are unsure what to quote, you could safely quote every "untrusted" character. Less chance of things going wrong. Also being able to quote allows you to store control characters in the database. An example of what could go wrong: a RDBMS may treat raw backspaces as part of the command stream and not the data, and thus insert into pics (data) values ('$CGIPARAM') could become - insert into pics (data) values('....JFIF^H^H^H^H^H^H...^H^H^HUPDATE row from IMPORTANT where (rowid='1') Which is treated as UPDATE row from IMPORTANT where (rowid='1') And so a file upload becomes an insiduous alteration of important data. Hope that helps, Link.
OK, I was wrong. '' can be sufficient. The DB just has to treat everything between single quotes as data except for '' which is treated as a ' in the data. However raw control characters can still cause problems in the various stages from the source to the DB. Cheerio, Link. Lincoln Yeoh wrote: Quoting is to help separate data from commands. Though '' is sufficient for quoting ' it seems to me not sufficient for control characters.
Lincoln Yeoh writes: > However raw control characters can still cause problems in the various > stages from the source to the DB. I still don't see why. You are merely speculating about implementation fallacies that aren't there. -- Peter Eisentraut peter_e@gmx.net
Yes it's speculation. The implementation at the DB isn't there, neither are the associated DBD/JDBC/ODBC drivers for it. Basically if the fallacies aren't in postgresql _if_ the decision is to implement it, I'd be happy. I was just noting (perhaps superfluously) that backspaces and friends (nulls) have been useful for exploiting databases (and other programs). Recently at least one multibyte character (0x81a2) allowed potential security problems with certain configurations/installations of Postgresql. Would switching to the standard cause such problems to be less or more likely? Would making it an option make such problems more likely? Cheerio, Link. p.s. Even +++AT[H]<cr>(remove square brackets and <cr> = carriage return) as data can cause problems sometimes - esp with crappy modems. Once there was a site whose EDI metadata had lots of +++ and they were experiencing "bad connections" <grin>... At 07:10 PM 6/6/02 +0200, Peter Eisentraut wrote: >Lincoln Yeoh writes: > > > However raw control characters can still cause problems in the various > > stages from the source to the DB. > >I still don't see why. You are merely speculating about implementation >fallacies that aren't there. > >-- >Peter Eisentraut peter_e@gmx.net