Thread: non-standard escapes in string literals

non-standard escapes in string literals

From
pimlott@idiomtech.com (Andrew Pimlott)
Date:
I posted this some time ago to pgsql-bugs[1], to no response.  So
I'll venture to try here.

Postgres breaks the standard for string literals by supporting
C-like escape sequences.  This causes pain for people trying to
write portable applications.  Is there any hope for an option to
follow the standard strictly?

Cc's of replies appreciated.

Thanks,
Andrew

[1] http://archives.postgresql.org/pgsql-bugs/2001-12/msg00048.php


Re: non-standard escapes in string literals

From
Bruce Momjian
Date:
Andrew Pimlott wrote:
> I posted this some time ago to pgsql-bugs[1], to no response.  So
> I'll venture to try here.
> 
> Postgres breaks the standard for string literals by supporting
> C-like escape sequences.  This causes pain for people trying to
> write portable applications.  Is there any hope for an option to
> follow the standard strictly?

This is actually the first time this has come up (that I remember).  We
do support C escaping, but you are the first to mention that it can
cause problems for portable applications.

Anyone else want to comment?  I don't know how to address this.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: non-standard escapes in string literals

From
F Harvell
Date:
On Thu, 25 Apr 2002 10:41:56 EDT, Bruce Momjian wrote:
> Andrew Pimlott wrote:
> > I posted this some time ago to pgsql-bugs[1], to no response.  So
> > I'll venture to try here.
> > 
> > Postgres breaks the standard for string literals by supporting
> > C-like escape sequences.  This causes pain for people trying to
> > write portable applications.  Is there any hope for an option to
> > follow the standard strictly?
> 
> This is actually the first time this has come up (that I remember).  We
> do support C escaping, but you are the first to mention that it can
> cause problems for portable applications.
> 
> Anyone else want to comment?  I don't know how to address this.

IMHO, I agree that I would like to see the ANSI standard implemented.

While I really like PostgreSQL, it currently does not scale as large
as other DBMS systems.  Due to this, we try to code as database
agnostic as possible so that a port requires a minimum of effort.
Currently there are only a few areas remaining that are at issue.
(Intervals and implicit type conversion have/are being addressed).

I believe that the reason that it hasn't come up as an issue, per se,
is that it would only affect strings with a backslash in them.
Backslash is not a commonly used character.  In addition, MySQL, also
broken, uses backslashes in the same/similar way.  Lots of people
using PostgreSQL are stepping up from MySQL.

This also poses the biggest problem in terms of legacy compatibility.
Perhaps the answer is to add a runtime config option (and default it
to ANSI) and possibly deprecate the C escaping.

Thanks,
F Harvell

-- 
Mr. F Harvell                          Phone: +1.407.673.2529
FTS International Data Systems, Inc.    Cell: +1.407.467.1919
7457 Aloma Ave, Suite 302                Fax: +1.407.673.4472
Winter Park, FL 32792                 mailto:fharvell@fts.net




Re: non-standard escapes in string literals

From
Tom Lane
Date:
F Harvell <fharvell@fts.net> writes:
> This also poses the biggest problem in terms of legacy compatibility.
> Perhaps the answer is to add a runtime config option (and default it
> to ANSI) and possibly deprecate the C escaping.

While I wouldn't necessarily object to a runtime option, I do object
to both the other parts of your proposal ;-).  Backslash escaping is
not broken; we aren't going to remove it or deprecate it, and I would
vote against making it non-default.
        regards, tom lane


Re: non-standard escapes in string literals

From
Bruce Momjian
Date:
Tom Lane wrote:
> F Harvell <fharvell@fts.net> writes:
> > This also poses the biggest problem in terms of legacy compatibility.
> > Perhaps the answer is to add a runtime config option (and default it
> > to ANSI) and possibly deprecate the C escaping.
> 
> While I wouldn't necessarily object to a runtime option, I do object
> to both the other parts of your proposal ;-).  Backslash escaping is
> not broken; we aren't going to remove it or deprecate it, and I would
> vote against making it non-default.

Added to TODO:
* Allow backslash handling in quoted strings to be disabled for portability

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: non-standard escapes in string literals

From
F Harvell
Date:
On Thu, 25 Apr 2002 15:07:44 EDT, Tom Lane wrote:
> F Harvell <fharvell@fts.net> writes:
> > This also poses the biggest problem in terms of legacy compatibility.
> > Perhaps the answer is to add a runtime config option (and default it
> > to ANSI) and possibly deprecate the C escaping.
> 
> While I wouldn't necessarily object to a runtime option, I do object
> to both the other parts of your proposal ;-).  Backslash escaping is
> not broken; we aren't going to remove it or deprecate it, and I would
> vote against making it non-default.
> 

Sorry, didn't mean to imply that backslash escaping was broken, just
non-compliant.  Beyond that, your recommendations are also probably
the best course of action.

I do desire that the "default" operation of the database be as ANSI
standard compliant as possible, however, I certainly understand the
need to be as backwards compliant as possible.  The only issue that I
can see with keeping the backslash escaping default is that new,
non-PostgreSQL programmers will not be expecting the escaping and will
be potentially blindsided by it.  (A bigger deal since backslashes are
unusual and are not often tested for/with.)  Perhaps prominent notice
in the documentation will be adequate/appropriate.  Maybe a section on
differences with the ANSI standard should be created.  (Is there
currently a compilation of differences anywhere or are they all
dispersed within the documentation?).

Thanks, F

-- 
Mr. F Harvell                          Phone: +1.407.673.2529
FTS International Data Systems, Inc.    Cell: +1.407.467.1919
7457 Aloma Ave, Suite 302                Fax: +1.407.673.4472
Winter Park, FL 32792                 mailto:fharvell@fts.net




Re: non-standard escapes in string literals

From
Florian Weimer
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Added to TODO:
>
>  * Allow backslash handling in quoted strings to be disabled for portability

BTW, what about embedded NUL characters in text strings? ;-)

-- 
Florian Weimer                       Weimer@CERT.Uni-Stuttgart.DE
University of Stuttgart           http://CERT.Uni-Stuttgart.DE/people/fw/
RUS-CERT                          +49-711-685-5973/fax +49-711-685-5898


Re: non-standard escapes in string literals

From
Tom Lane
Date:
Florian Weimer <Weimer@CERT.Uni-Stuttgart.DE> writes:
> BTW, what about embedded NUL characters in text strings? ;-)

There's approximately zero chance of that happening in the foreseeable
future.  Since null-terminated strings are the API for both the parser
and all datatype I/O routines, there'd have to be a lot of code changed
to support this.  To take just one example: strcoll() uses
null-terminated strings, therefore we'd not be able to support
locale-aware text comparisons unless we write our own replacement for
the entire locale library.  (Which we might do someday, but it's not
a trivial task.)

The amount of pain involved seems to far outweigh the gain...
        regards, tom lane


Re: non-standard escapes in string literals

From
"Zeugswetter Andreas SB SD"
Date:
> > This also poses the biggest problem in terms of legacy compatibility.
> > Perhaps the answer is to add a runtime config option (and default it
> > to ANSI) and possibly deprecate the C escaping.
>
> While I wouldn't necessarily object to a runtime option, I do object
> to both the other parts of your proposal ;-).  Backslash escaping is
> not broken; we aren't going to remove it or deprecate it, and I would
> vote against making it non-default.

When we are talking about the places where you need double escaping
(once for parser, once for input function) to make it work, I would also
say that that is very cumbersome (not broken, since it is thus documented) :-)
I would also default to strict ANSI, but not depricate the escaping when set.
All imho of course.

Andreas


Re: non-standard escapes in string literals

From
pimlott@idiomtech.com (Andrew Pimlott)
Date:
On Wed, May 08, 2002 at 06:47:46PM +0200, Zeugswetter Andreas SB SD wrote:
> When we are talking about the places where you need double escaping 
> (once for parser, once for input function) to make it work, I would also 
> say that that is very cumbersome (not broken, since it is thus documented) :-) 
> I would also default to strict ANSI, but not depricate the escaping when set.
> All imho of course.

As the original reporter of this issue, I am gratified to hear it
acknowledged by the developers.  Thanks!  (I also apologize if I
exaggerated the pain caused, as apparently not many other people
have been bitten by this specific problem.  Well, it was painful for
me. ;-) )

I must say, though, that I remain bothered by the "not broken"
attitude.  There is an obvious standard for PostgreSQL to follow,
yet it is non-compliant in utterly trivial ways, which provide
marginal or no benefits.  Granted, changing long-standing defaults
may not be acceptible; but there is a big difference between, "it is
broken but we just can't change it for compatibility reasons", and,
"it is not broken".

It is my experience that most other free software projects take
standards compliance more seriously than PostgreSQL, and my strong
opinion that both the project and its users (not to mention the
whole SQL database industry, eventually) would benefit from better
support for the SQL standard.

Ok, I've said my peace.

Andrew


Re: non-standard escapes in string literals

From
"Christopher Kings-Lynne"
Date:
> It is my experience that most other free software projects take
> standards compliance more seriously than PostgreSQL, and my strong
> opinion that both the project and its users (not to mention the
> whole SQL database industry, eventually) would benefit from better
> support for the SQL standard.

Ummm - I think you'd be hard pressed to find a open source db team more
committed to standards compliance.

Chris



Re: non-standard escapes in string literals

From
Bruce Momjian
Date:
Andrew Pimlott wrote:
> On Wed, May 08, 2002 at 06:47:46PM +0200, Zeugswetter Andreas SB SD wrote:
> > When we are talking about the places where you need double escaping 
> > (once for parser, once for input function) to make it work, I would also 
> > say that that is very cumbersome (not broken, since it is thus documented) :-) 
> > I would also default to strict ANSI, but not depricate the escaping when set.
> > All imho of course.
> 
> As the original reporter of this issue, I am gratified to hear it
> acknowledged by the developers.  Thanks!  (I also apologize if I
> exaggerated the pain caused, as apparently not many other people
> have been bitten by this specific problem.  Well, it was painful for
> me. ;-) )
> 
> I must say, though, that I remain bothered by the "not broken"
> attitude.  There is an obvious standard for PostgreSQL to follow,
> yet it is non-compliant in utterly trivial ways, which provide
> marginal or no benefits.  Granted, changing long-standing defaults
> may not be acceptible; but there is a big difference between, "it is
> broken but we just can't change it for compatibility reasons", and,
> "it is not broken".
> 
> It is my experience that most other free software projects take
> standards compliance more seriously than PostgreSQL, and my strong
> opinion that both the project and its users (not to mention the
> whole SQL database industry, eventually) would benefit from better
> support for the SQL standard.
> 
> Ok, I've said my peace.

Yes, these are good points.  Our big problem is that we use backslash
for two things, one for escaping single quotes and for escaping standard
C characters, like \n.  While we can use the standard-supported '' to
insert single quotes, what should we do with \n?  The problem is
switching to standard ANSI solution reduces our functionality.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: non-standard escapes in string literals

From
"Zeugswetter Andreas SB SD"
Date:
On Mon, June 03 Bruce wrote:
> > On Wed, May 08, 2002 at 06:47:46PM +0200, Zeugswetter SB SD Andreas wrote:
> > > When we are talking about the places where you need double escaping
> > > (once for parser, once for input function) to make it work, I would also
> > > say that that is very cumbersome (not broken, since it is thus documented) :-)
> > > I would also default to strict ANSI, but not depricate the escaping when set.
> > > All imho of course.

> Yes, these are good points.  Our big problem is that we use backslash
> for two things, one for escaping single quotes and for escaping standard
> C characters, like \n.  While we can use the standard-supported '' to
> insert single quotes, what should we do with \n?  The problem is
> switching to standard ANSI solution reduces our functionality.

The problem imho is, that this (no doubt in many cases valuable)
feature reduces the functionality from the ANSI SQL perspective.
Consider a field that is supposed to store Windows filenames,
nam_file='C:\node1\resend\b.dat' :-)

Thus I think a GUC to turn off all escaping except '' would be valuable.

Andreas


Re: non-standard escapes in string literals

From
Lincoln Yeoh
Date:
At 01:20 PM 6/3/02 +0200, Zeugswetter Andreas SB SD wrote:
> > for two things, one for escaping single quotes and for escaping standard
> > C characters, like \n.  While we can use the standard-supported '' to
> > insert single quotes, what should we do with \n?  The problem is
> > switching to standard ANSI solution reduces our functionality.
>
>The problem imho is, that this (no doubt in many cases valuable)
>feature reduces the functionality from the ANSI SQL perspective.
>Consider a field that is supposed to store Windows filenames,
>nam_file='C:\node1\resend\b.dat' :-)
>
>Thus I think a GUC to turn off all escaping except '' would be valuable.

With current behaviour 'C:\node1\resend\b.dat' can be quoted as 
'C:\\node1\\resend\\b.dat'

But for the ANSI standard how does one stuff \r\n\t and other control 
characters into the database?

If there's no way other than actually sending the control characters then 
that is a bad idea especially from a security viewpoint.

Cheerio,
Link.



Re: non-standard escapes in string literals

From
Peter Eisentraut
Date:
Lincoln Yeoh writes:

> But for the ANSI standard how does one stuff \r\n\t and other control
> characters into the database?
>
> If there's no way other than actually sending the control characters then
> that is a bad idea especially from a security viewpoint.

Why??

-- 
Peter Eisentraut   peter_e@gmx.net



Re: non-standard escapes in string literals

From
Lincoln Yeoh
Date:
At 09:58 PM 6/4/02 +0200, Peter Eisentraut wrote:
>Lincoln Yeoh writes:
>
> > But for the ANSI standard how does one stuff \r\n\t and other control
> > characters into the database?
> >
> > If there's no way other than actually sending the control characters then
> > that is a bad idea especially from a security viewpoint.
>
>Why??

Quoting is to help separate data from commands. Though '' is sufficient for 
quoting ' it seems to me not sufficient for control characters.

There could be control characters that cause problems with the DB, and 
people may not be sufficiently aware of potential problems. If you just 
remove the problematic characters, it means you can't store them in the 
database - the db can become less useful.

Whereas with the current way of quoting control characters, if you are 
unsure what to quote, you could safely quote every "untrusted" character. 
Less chance of things going wrong. Also being able to quote allows you to 
store control characters in the database.

An example of what could go wrong: a RDBMS may treat raw backspaces as part 
of the command stream and not the data, and thus

insert into pics (data) values ('$CGIPARAM')
could become -
insert into pics (data) values('....JFIF^H^H^H^H^H^H...^H^H^HUPDATE row 
from IMPORTANT where (rowid='1')
Which is treated as
UPDATE row from IMPORTANT where (rowid='1')

And so a file upload becomes an insiduous alteration of important data.

Hope that helps,
Link.




Re: non-standard escapes in string literals

From
Lincoln Yeoh
Date:
OK, I was wrong. '' can be sufficient. The DB just has to treat everything 
between single quotes as data except for '' which is treated as a ' in the 
data.

However raw control characters can still cause problems in the various 
stages from the source to the DB.

Cheerio,
Link.

Lincoln Yeoh wrote:
Quoting is to help separate data from commands. Though '' is sufficient for 
quoting ' it seems to me not sufficient for control characters.



Re: non-standard escapes in string literals

From
Peter Eisentraut
Date:
Lincoln Yeoh writes:

> However raw control characters can still cause problems in the various
> stages from the source to the DB.

I still don't see why.  You are merely speculating about implementation
fallacies that aren't there.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: non-standard escapes in string literals

From
Lincoln Yeoh
Date:
Yes it's speculation. The implementation at the DB isn't there, neither are 
the associated DBD/JDBC/ODBC drivers for it.

Basically if the fallacies aren't in postgresql _if_ the decision is to 
implement it, I'd be happy.

I was just noting (perhaps superfluously) that backspaces and friends 
(nulls) have been useful for exploiting databases (and other programs). 
Recently at least one multibyte character (0x81a2) allowed potential 
security problems with certain configurations/installations of Postgresql. 
Would switching to the standard cause such problems to be less or more 
likely? Would making it an option make such problems more likely?

Cheerio,
Link.

p.s. Even +++AT[H]<cr>(remove square brackets and <cr> = carriage return) 
as data can cause problems sometimes - esp with crappy modems. Once there 
was a site whose EDI metadata had lots of +++ and they were experiencing 
"bad connections" <grin>...


At 07:10 PM 6/6/02 +0200, Peter Eisentraut wrote:
>Lincoln Yeoh writes:
>
> > However raw control characters can still cause problems in the various
> > stages from the source to the DB.
>
>I still don't see why.  You are merely speculating about implementation
>fallacies that aren't there.
>
>--
>Peter Eisentraut   peter_e@gmx.net