Thread: Escape handling in COPY, strings, psql

Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Here is an updated version of the COPY \x patch.  It is the first patch
> > attached.
> > Also, I realized that if we support \x in COPY, we should also support
> > \x in strings to the backend.  This is the second patch.
> 
> Do we really want to do any of these things?  We've been getting beaten
> up recently about the fact that we have non-SQL-spec string escapes
> (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> especially when there's little if any demand for it.

I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

I have wanted to post on this issue so now is a good time.  I think we
have been validly beaten up in that we pride ourselves on standards
compliance but have escape requirement on all strings.  Our string
escapes are a major problem --- not the number of them but the
requirement to double backslashes on input, like 'C:\\tmp'.  I am
thinking the only clean solution is to add a special keyword like ESCAPE
before strings that contain escape information.  I think a GUC is too
general.  You know if the string is a constant if it contains escapes
just by looking at it, and if it is a variable, hopefully you know if it
has escapes.  

Basically, I think we have to deal with this somehow. I think it could
be implemented by looking for the ESCAPE keyword in parser/scan.l and
handling it all in there by ignoring backslash escapes if ESCAPE
preceeds the string.  By the time you are in gram.y, it is too late.

> I don't object too much to the COPY addition, since that's outside any
> spec anyway, but I do think we ought to think twice about adding this
> to SQL literal handling.
> 
> > Third, I found out that psql has some unusual handling of escaped
> > numbers.  Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
> > is octal, and \0xddd is decimal.  It is basically following the strtol()
> > rules for an escaped value.  This seems confusing and contradicts how
> > the rest of our system works.
> 
> I agree, that's just going to confuse people.
> 
> > ! xqescape        [\\][^0-7x]
> 
> If you are going to insist on this, at least make it case-insensitive.

The submitted COPY patch also was case-insensitive, \x and \X, but I
changed that because we are case-sensitive for all backslashes in COPY,
and C is the same (\n and \N are different too, so we actually use the
case-sensitivity).  Should we allow \X just so it is case-insensitive
like the SQL specification X'4f'?  That is the only logic I can think of
for it to be case-insensitive, but we have to then do that at all
levels, and I am not sure it makes sense.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> I thought about that, but adding additional escape letters isn't our
> problem --- it is the escape mechanism itself that is the issue.

In a random-encoding environment, the option to specify byte values 
directly -- at any level -- is of limited value anyway and is a 
potential source of errors.  So let's stay away from that.

I did not find the original posts that your quotations came from, but it 
has to be considered that COPY is also subject to encoding processing.  
Overall, I find this proposal to be a dangerous option.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Here is an updated version of the COPY \x patch.  It is the first patch
> > > attached.
> > > Also, I realized that if we support \x in COPY, we should also support
> > > \x in strings to the backend.  This is the second patch.
> > 
> > Do we really want to do any of these things?  We've been getting beaten
> > up recently about the fact that we have non-SQL-spec string escapes
> > (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> > especially when there's little if any demand for it.
> 
> I thought about that, but adding additional escape letters isn't our
> problem --- it is the escape mechanism itself that is the issue.
> 
> I have wanted to post on this issue so now is a good time.  I think we
> have been validly beaten up in that we pride ourselves on standards
> compliance but have escape requirement on all strings.  Our string
> escapes are a major problem --- not the number of them but the
> requirement to double backslashes on input, like 'C:\\tmp'.  I am
> thinking the only clean solution is to add a special keyword like ESCAPE
> before strings that contain escape information.  I think a GUC is too
> general.  You know if the string is a constant if it contains escapes
> just by looking at it, and if it is a variable, hopefully you know if it
> has escapes.  
> 
> Basically, I think we have to deal with this somehow. I think it could
> be implemented by looking for the ESCAPE keyword in parser/scan.l and
> handling it all in there by ignoring backslash escapes if ESCAPE
> preceeds the string.  By the time you are in gram.y, it is too late.

One other idea would be to remove escape processing for single-quoted
strings but keep it for our $$ strings, becuase they are not ANSI
standard.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> One other idea would be to remove escape processing for single-quoted
> strings but keep it for our $$ strings, becuase they are not ANSI
> standard.

There is *no* escape processing within $$, and never will be, because
that would destroy the entire point.  You'd be right back having to
double backslashes.
        regards, tom lane


Re: Escape handling in COPY, strings, psql

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> >  I am thinking the only clean solution is to add a special keyword
> > like ESCAPE before strings that contain escape information.  I
> > think a GUC is too general.  You know if the string is a constant
> > if it contains escapes just by looking at it, and if it is a
> > variable, hopefully you know if it has escapes.

I do support gradually phasing out backslash escapes in standard string 
literals in the interest of portability.  Most of the current escape 
sequences are of limited value anyway.  Let's think about ways to get 
there:

Enabling escape sequences in string literals controls the formatting of 
input (and output?) data, so it is akin to, say, the client encoding 
and the date style, so a GUC variable isn't out of the question in my 
mind.  It makes most sense, though, if we want to eventually make users 
switch it off all the time, that is, as a transition aid.  But before 
that can happen, we need to come up with an alternative mechanism to 
enter weird characters.

One such way may be to provide functions (say, chr(), tab(), etc.) to 
give access to unprintable characters, but that will result in terrible 
performance for long strings and it also won't help with COPY or places 
where only literals are allowed.

Another way would be to allow escape sequences only in specially marked 
strings.  The proposal above doing 'foo' ESCAPE 'x' seems fairly 
elegant for SQL linguists but would be pretty weird to implement in the 
lexer.  It won't help with COPY either, but that is really the case for 
all solutions.

A more compact represenation may be using a prefix letter, like E'foo'.  
This fits the SQL syntax, is familiar with Python programmers (although 
in the other direction), and can be implemented efficiently in the 
lexer.  I like that the best, personally.

For COPY, we would probably have to use a flag in the COPY command 
itself either way (like already done for NULL AS).

Comments?  Other ideas?  Keep the escapes?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Escape handling in COPY, strings, psql

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I do support gradually phasing out backslash escapes in standard string 
> literals in the interest of portability.  Most of the current escape 
> sequences are of limited value anyway.  Let's think about ways to get 
> there:

I really don't think there is any way to get there without creating
gaping security holes in all kinds of client code :-(.  If we change
the escaping rules, then a client that is expecting some other rule
than happens to be in force will be subject to trivial SQL-injection
attacks.  This will make the autocommit fiasco pale by comparison ...

> For COPY, we would probably have to use a flag in the COPY command 
> itself either way (like already done for NULL AS).

The spec-compatibility argument for removing escapes does not apply to
COPY at all, so I see no need to fool with the COPY definition in any
case.
        regards, tom lane


Re: Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > >  I am thinking the only clean solution is to add a special keyword
> > > like ESCAPE before strings that contain escape information.  I
> > > think a GUC is too general.  You know if the string is a constant
> > > if it contains escapes just by looking at it, and if it is a
> > > variable, hopefully you know if it has escapes.
> 
> I do support gradually phasing out backslash escapes in standard string 
> literals in the interest of portability.  Most of the current escape 
> sequences are of limited value anyway.  Let's think about ways to get 
> there:
> 
> Enabling escape sequences in string literals controls the formatting of 
> input (and output?) data, so it is akin to, say, the client encoding 
> and the date style, so a GUC variable isn't out of the question in my 
> mind.  It makes most sense, though, if we want to eventually make users 
> switch it off all the time, that is, as a transition aid.  But before 
> that can happen, we need to come up with an alternative mechanism to 
> enter weird characters.
> 
> One such way may be to provide functions (say, chr(), tab(), etc.) to 
> give access to unprintable characters, but that will result in terrible 
> performance for long strings and it also won't help with COPY or places 
> where only literals are allowed.
> 
> Another way would be to allow escape sequences only in specially marked 
> strings.  The proposal above doing 'foo' ESCAPE 'x' seems fairly 
> elegant for SQL linguists but would be pretty weird to implement in the 
> lexer.  It won't help with COPY either, but that is really the case for 
> all solutions.

I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to be
before the string so scan.l can alter its processing of the string ---
after the string is too late --- there is no way to undo any escaping
that has happened, and it might already be used by gram.y.

I could probably hack up a sample implementation if people are
interested.

> A more compact representation may be using a prefix letter, like E'foo'.  
> This fits the SQL syntax, is familiar with Python programmers (although 
> in the other direction), and can be implemented efficiently in the 
> lexer.  I like that the best, personally.
> 
> For COPY, we would probably have to use a flag in the COPY command 
> itself either way (like already done for NULL AS).

I agree with Tom that COPY has to be left unchanged.  The fundamental
problem is the representation of NULL values, that I don't think we can
do without some escape mechanism.  Single-quote escapes works by
doubling them, but once you need to represent something more like
null's, I can't think of a solution without escapes.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I do support gradually phasing out backslash escapes in standard string 
> > literals in the interest of portability.  Most of the current escape 
> > sequences are of limited value anyway.  Let's think about ways to get 
> > there:
> 
> I really don't think there is any way to get there without creating
> gaping security holes in all kinds of client code :-(.  If we change
> the escaping rules, then a client that is expecting some other rule
> than happens to be in force will be subject to trivial SQL-injection
> attacks.  This will make the autocommit fiasco pale by comparison ...

I looked at PQescapeString() and fortunately it escapes single quotes
by doing double-single quotes, not by using a backslash.  This was
probably chosen for standards compliance.

Basically, I think our current behavior is not sustainable.  I think we
are going to need to do something, and I think we should consider a
solution now rather than later.  I don't think we can be as serious a
contender for portability without some kind of solution.

I am thinking we should first tell people in 8.1 that they should start
using only double-single quotes, and perhaps support the ESCAPE phrase
as a no-op, and then consider some kind of solution in 8.2 or later.

I don't think fixing this is going to be a huge security problem, but it
might be a small one.  The good thing is that double-single quotes work,
so if people use only that for quote escaping, if you forget the ESCAPE
clause, you just get literal backslashes, not a security problem.

I ran the following test:test=> select $$\$$; ?column?---------- \(1 row)test=> create table test (x TEXT);CREATE
TABLEtest=>INSERT INTO test VALUES ($$\$$);INSERT 0 1test=> SELECT * FROM test; x--- \(1 row)
 

and the good news is that output of backslashes is fine --- it is just
input that is the issue, and the security problem is only using \',
which we would have to tell people to avoid and start using only ''.

I think we can tell people in 8.1 that they should modify their
applications to only use '', and that \' might be a security problem in
the future.  If we get to that then using ESC or not only affects input
of values and literal backslashes being entered, and my guess is that
90% of the backslash entries that want escaping are literal in the
application and not supplied by program variables.  In fact, if we
disable backslash by default then strings coming in only have to deal
with single quotes (like other databases) and the system is more secure
because there is no special backslash handling by default.

> > For COPY, we would probably have to use a flag in the COPY command 
> > itself either way (like already done for NULL AS).
> 
> The spec-compatibility argument for removing escapes does not apply to
> COPY at all, so I see no need to fool with the COPY definition in any
> case.

Agreed.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Christopher Kings-Lynne
Date:
> I think we can tell people in 8.1 that they should modify their
> applications to only use '', and that \' might be a security problem in
> the future.  If we get to that then using ESC or not only affects input
> of values and literal backslashes being entered, and my guess is that
> 90% of the backslash entries that want escaping are literal in the
> application and not supplied by program variables.  In fact, if we
> disable backslash by default then strings coming in only have to deal
> with single quotes (like other databases) and the system is more secure
> because there is no special backslash handling by default.

I can tell you right now this will be a problem :)  There are loads of 
PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

Chris



Re: Escape handling in COPY, strings, psql

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to
> be before the string so scan.l can alter its processing of the string
> --- after the string is too late --- there is no way to undo any
> escaping that has happened, and it might already be used by gram.y.

That pretty much corresponds to my E'string' proposal.  Both are 
probably equally trivial to implement.

> I agree with Tom that COPY has to be left unchanged.  The fundamental
> problem is the representation of NULL values, that I don't think we
> can do without some escape mechanism.  Single-quote escapes works by
> doubling them, but once you need to represent something more like
> null's, I can't think of a solution without escapes.

Yes, I now realize that COPY has a whole set of different rules anyway, 
so we can leave that out of this discussion.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> > I think we can tell people in 8.1 that they should modify their
> > applications to only use '', and that \' might be a security problem in
> > the future.  If we get to that then using ESC or not only affects input
> > of values and literal backslashes being entered, and my guess is that
> > 90% of the backslash entries that want escaping are literal in the
> > application and not supplied by program variables.  In fact, if we
> > disable backslash by default then strings coming in only have to deal
> > with single quotes (like other databases) and the system is more secure
> > because there is no special backslash handling by default.
> 
> I can tell you right now this will be a problem :)  There are loads of 
> PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

I read the PHP addslashes() manual page:
http://us3.php.net/addslashes

First, I see what people mean about PHP having most of the complex
content in comments, rather than in the actual manual text, and this
tendency is certainly something we want to avoid --- you end up having
to digest all the comments to find the details that should be in the
manual already.

On to the case at hand, the comments mention that addslashes() isn't
safe for all databases, and in fact isn't the prefered method.  I do
think it could be a problem we have to have people avoid.  One idea for
8.1 is to throw a warning if \' appears in a string, thereby helping
people find the places they are using the incorrect non-standard
escaping.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Christopher Kings-Lynne
Date:
> I read the PHP addslashes() manual page:
> 
>     http://us3.php.net/addslashes
> 
> First, I see what people mean about PHP having most of the complex
> content in comments, rather than in the actual manual text, and this
> tendency is certainly something we want to avoid --- you end up having
> to digest all the comments to find the details that should be in the
> manual already.

Actually, all the comments are posted on the php-doc list, with 
automatic urls in them for 'fixed in cvs', 'rejected', etc.

Each comment is supposed to be acted upon (ie. fixed in source), then 
deleted.

There's still a lot of old comment around that hasn't had that treatment 
though...

Chris


Re: Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> > I read the PHP addslashes() manual page:
> > 
> >     http://us3.php.net/addslashes
> > 
> > First, I see what people mean about PHP having most of the complex
> > content in comments, rather than in the actual manual text, and this
> > tendency is certainly something we want to avoid --- you end up having
> > to digest all the comments to find the details that should be in the
> > manual already.
> 
> Actually, all the comments are posted on the php-doc list, with 
> automatic urls in them for 'fixed in cvs', 'rejected', etc.
> 
> Each comment is supposed to be acted upon (ie. fixed in source), then 
> deleted.
> 
> There's still a lot of old comment around that hasn't had that treatment 
> though...

Right, they are more _usage_ comments, but still I think they could be
consolidated into manual text.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to
> > be before the string so scan.l can alter its processing of the string
> > --- after the string is too late --- there is no way to undo any
> > escaping that has happened, and it might already be used by gram.y.
> 
> That pretty much corresponds to my E'string' proposal.  Both are 
> probably equally trivial to implement.

Right.  I think your E'' idea has the benefit of fitting with our
existing X'' and B'' modifiers.  It is also simpler and cleaner to do in
scan.l, so I think your idea is best.

> > I agree with Tom that COPY has to be left unchanged.  The fundamental
> > problem is the representation of NULL values, that I don't think we
> > can do without some escape mechanism.  Single-quote escapes works by
> > doubling them, but once you need to represent something more like
> > null's, I can't think of a solution without escapes.
> 
> Yes, I now realize that COPY has a whole set of different rules anyway, 
> so we can leave that out of this discussion.

Cool.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Escape handling in COPY, strings, psql

From
Greg Stark
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Christopher Kings-Lynne wrote:
> > 
> > Each comment is supposed to be acted upon (ie. fixed in source), then 
> > deleted.
> 
> Right, they are more _usage_ comments, but still I think they could be
> consolidated into manual text.

If that's "supposed" to happen it certainly hasn't been the de facto
procedure.

I think they have things partly right here though. A lot of those comments
aren't actually the kinds of things that belong in the canonical reference.
They include things like "watch out for this common error" or "here's a handy
use for this function". Often the "common error" or "handy use" are pretty
bogus but every now and then there's a genuinely useful one.

These kinds of things would just clutter up a reference. A reference should
just state unambiguously this function does XYZ and give examples that help
explain XYZ.

The PHP Docs do have a bit of a problem in that often the comments include
things like "In case X, what happens is Y" which really ought to be covered by
the canonical reference. That's a problem.


-- 
greg



Backslash handling in strings

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > Bruce Momjian wrote:
> > > I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to
> > > be before the string so scan.l can alter its processing of the string
> > > --- after the string is too late --- there is no way to undo any
> > > escaping that has happened, and it might already be used by gram.y.
> > 
> > That pretty much corresponds to my E'string' proposal.  Both are 
> > probably equally trivial to implement.
> 
> Right.  I think your E'' idea has the benefit of fitting with our
> existing X'' and B'' modifiers.  It is also simpler and cleaner to do in
> scan.l, so I think your idea is best.

[ CC list trimmed.]

OK, I talked to Tom and Peter and I have come up with a tentative plan.

The goal, at some point, is that we would have two types of strings, ''
strings and E'' strings.  '' strings don't have any special backslash
handling for compatibility with with the ANSI spec and all other
databases except MySQL (and in MySQL it is now optional).  E'' strings
behave just like our strings do now, with backslash handling.

In 8.0.X, we add support for E'' strings, but it is a noop.  This is
done just for portability with future releases.  We also state that
users should stop using \' to escape quotes in strings, and instead use
'', and that we will throw a warning in 8.1 if we see a \' in a non-E
string.  (We could probably throw a warning for E'' use of \' too, but I
want to give users the ability to avoid the warning if they can't change
from using \' to ''.)

In 8.1, we start issuing the warning for \' in non-E strings, plus we
tell users who want escape processing that they will have to use E''
strings for this starting in release 8.2, and they should start
migrating their escaped strings over to E''.

Tom also suggested a readonly GUC variable that is sent to clients that
indicates if simple strings have backslash handling, for use by
applications that are doing escapes themselves, perhaps
'escape_all_strings'.

PQescapeString() and PQescapeBytea() can still be used, but only with
E'' strings in 8.2.  We could create PQquoteString() for 8.1 and later
to allow for just single-quote doubling for non-E strings.

Tom asked about how to handle pg_dump contents that have strings, like
function bodies.  We could start using E'' for those in 8.0 but it does
break backward movement of dumps, and someone upgrading from 7.1 to 8.2
would be in trouble.  :-(  Perhaps we will have another round of
subrelease fixes and we can bundle this into that and tell people they
have to upgrade to the newest subrelease before going to 8.2.  I think
we have had that requirement in the past when we had broken pg_dump
processing.

The good news is that once everyone uses only '' to quote string, we
will not have any data security issues with this change.  The only
potential problem is the mishandling of backslash characters if there is
a mismatch between what the client expects and the server uses.  By
backpatching E'' perhaps even to 7.4 and earlier (as a noop), we could
minimize this problem.

Is this whole thing ugly?  Yes.  Can we just close our eyes and hope we
can continue with our current behavior while growing a larger userbase
--- probabably not.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Backslash handling in strings

From
Greg Stark
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> The goal, at some point, is that we would have two types of strings, ''
> strings and E'' strings.  '' strings don't have any special backslash
> handling for compatibility with with the ANSI spec and all other
> databases except MySQL (and in MySQL it is now optional).  E'' strings
> behave just like our strings do now, with backslash handling.


The only thing I'm not clear on is what exactly is the use case for E''
strings. That is, who do you expect to actually use them?

Any new applications are recommended to be using '' strings. And any existing
applications obviously won't be using them since they don't currently exist.

The only potential candidates are existing applications being ported forward.
And that only makes sense if they're currently using some function like
addslash. Is it really easier to change all the SQL queries to use E'' (and
still have a bug) than it is to replace addslash with PQquoteString() ?



Also, I'm really confused why you would make PQescapeString require E''
strings and introduce a new function. That means existing non-buggy
applications would suddenly be buggy? And it would be impossible to write a
properly functioning application that interpolates a constant into a query
that would be portable to 8.2 and 8.0?


-- 
greg



Re: Backslash handling in strings

From
Bruce Momjian
Date:
Greg Stark wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > The goal, at some point, is that we would have two types of strings, ''
> > strings and E'' strings.  '' strings don't have any special backslash
> > handling for compatibility with with the ANSI spec and all other
> > databases except MySQL (and in MySQL it is now optional).  E'' strings
> > behave just like our strings do now, with backslash handling.
> 
> 
> The only thing I'm not clear on is what exactly is the use case for E''
> strings. That is, who do you expect to actually use them?
> 
> Any new applications are recommended to be using '' strings. And any existing
> applications obviously won't be using them since they don't currently exist.

We are saying to use '' to escape single quotes in all strings.  E'' is
still useful if you want to use backslash escapes in your strings.

Does that answer your questions?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Backslash handling in strings

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> The only thing I'm not clear on is what exactly is the use case for E''
> strings. That is, who do you expect to actually use them?

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database.  Try to do it with psql:t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline.  So it's nice to be able to use \t.

There are related issues involving \r and \n depending on your platform.
And this doesn't even scratch the surface of encoding-related funnies.

So there's definitely a use-case for keeping the existing backslash
behavior, and E'string' seems like a reasonable proposal for doing that
without conflicting with the SQL spec.

What I do not see at the moment is how we get there from here (ie,
dropping backslashing in regular literals) without incurring tremendous
pain --- including breaking all existing pg_dump files, introducing
security holes and/or data corruption into many existing apps that are
not presently broken, and probably some other ways of ruining your day.
I'm quite unconvinced that this particular letter of the SQL spec is
worth complying with ...
        regards, tom lane


Re: Backslash handling in strings

From
Dennis Bjorklund
Date:
On Tue, 31 May 2005, Tom Lane wrote:

> The case that convinced me we need to keep some sort of backslash
> capability is this: suppose you want to put a string including a tab
> into your database.  Try to do it with psql:
>     t=> insert into foo values ('<TAB>
> Guess what: you won't get anywhere, at least not unless you disable
> readline.  So it's nice to be able to use \t.

To insert a tab using readline you can press ESC followed by TAB. This
works as least in readline as it is setup in redhat/fedora (and readline
can be setup in 1000 different ways so who knows how portable this is).

-- 
/Dennis Björklund



Re: Backslash handling in strings

From
Tom Ivar Helbekkmo
Date:
Dennis Bjorklund <db@zigo.dhs.org> writes:

> To insert a tab using readline you can press ESC followed by TAB.

...or ^V followed by TAB, as per age-old tradition.  :-)

-tih
-- 
Don't ascribe to stupidity what can be adequately explained by ignorance.


Re: Backslash handling in strings

From
Dennis Bjorklund
Date:
On Tue, 31 May 2005, Tom Ivar Helbekkmo wrote:

> ...or ^V followed by TAB, as per age-old tradition.  :-)

Right, I forgot about that one. One can also do other control characters
instead of TAB by pressing CTRL-J and similar.

Well, I just wanted to point out that it's possible. The main problem is
still to make sure that old dumps work and can be imported. I don't see
how that can work without a GUC variable in addition to the E'foo' stuff
(but that's not so bad as it can be phased in to support old pg_dumps and
phased out again in pg 10 or something).

-- 
/Dennis Björklund



Re: Backslash handling in strings

From
Bruce Momjian
Date:
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > The only thing I'm not clear on is what exactly is the use case for E''
> > strings. That is, who do you expect to actually use them?
> 
> The case that convinced me we need to keep some sort of backslash
> capability is this: suppose you want to put a string including a tab
> into your database.  Try to do it with psql:
>     t=> insert into foo values ('<TAB>
> Guess what: you won't get anywhere, at least not unless you disable
> readline.  So it's nice to be able to use \t.
> 
> There are related issues involving \r and \n depending on your platform.
> And this doesn't even scratch the surface of encoding-related funnies.
> 
> So there's definitely a use-case for keeping the existing backslash
> behavior, and E'string' seems like a reasonable proposal for doing that
> without conflicting with the SQL spec.
> 
> What I do not see at the moment is how we get there from here (ie,
> dropping backslashing in regular literals) without incurring tremendous
> pain --- including breaking all existing pg_dump files, introducing
> security holes and/or data corruption into many existing apps that are
> not presently broken, and probably some other ways of ruining your day.
> I'm quite unconvinced that this particular letter of the SQL spec is
> worth complying with ...

I think this is going to be like the Win32 port, where there is little
excitement from our existing users, but it is needed to grow our user
base.

I think the E'' is useful becuase it gives people a migration path for
the escapes they are already using, and the escape mechanism itself it
something useful to keep.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Backslash handling in strings

From
Bruno Wolff III
Date:
On Tue, May 31, 2005 at 11:49:20 +0200, Dennis Bjorklund <db@zigo.dhs.org> wrote:
> On Tue, 31 May 2005, Tom Lane wrote:
> 
> > The case that convinced me we need to keep some sort of backslash
> > capability is this: suppose you want to put a string including a tab
> > into your database.  Try to do it with psql:
> >     t=> insert into foo values ('<TAB>
> > Guess what: you won't get anywhere, at least not unless you disable
> > readline.  So it's nice to be able to use \t.
> 
> To insert a tab using readline you can press ESC followed by TAB. This
> works as least in readline as it is setup in redhat/fedora (and readline
> can be setup in 1000 different ways so who knows how portable this is).

There are still advantages to having printable backslashed escaped characters
in strings that are saved to files. It makes it easier to see what is really
in the string and they are less likely to get accidentally munged when
editing the file or moving it between systems with different line termination
conventions.


Re: Backslash handling in strings

From
Bruce Momjian
Date:
Here is a summary of the issues with moving to no escapes for non-E
strings:
http://candle.pha.pa.us/cgi-bin/pgescape

---------------------------------------------------------------------------

Bruno Wolff III wrote:
> On Tue, May 31, 2005 at 11:49:20 +0200,
>   Dennis Bjorklund <db@zigo.dhs.org> wrote:
> > On Tue, 31 May 2005, Tom Lane wrote:
> > 
> > > The case that convinced me we need to keep some sort of backslash
> > > capability is this: suppose you want to put a string including a tab
> > > into your database.  Try to do it with psql:
> > >     t=> insert into foo values ('<TAB>
> > > Guess what: you won't get anywhere, at least not unless you disable
> > > readline.  So it's nice to be able to use \t.
> > 
> > To insert a tab using readline you can press ESC followed by TAB. This
> > works as least in readline as it is setup in redhat/fedora (and readline
> > can be setup in 1000 different ways so who knows how portable this is).
> 
> There are still advantages to having printable backslashed escaped characters
> in strings that are saved to files. It makes it easier to see what is really
> in the string and they are less likely to get accidentally munged when
> editing the file or moving it between systems with different line termination
> conventions.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073