Re: Unicode string literals versus the world - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Unicode string literals versus the world
Date
Msg-id 12063.1239724473@sss.pgh.pa.us
In response to Re: Unicode string literals versus the world  (Marko Kreen <markokr@gmail.com>)
Responses Re: Unicode string literals versus the world  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Marko Kreen <markokr@gmail.com> writes:
> I would prefer that such quoting extensions would wait until
> stdstr=on setting is the only mode Postgres will operate.
> Fitting new quoting ways to environment with flippable stdstr setting
> will be rather painful for everyone.

It would certainly be a lot safer to wait until non-standard-conforming
strings don't exist anymore.  The problem is that that may never happen,
and is certainly not on the roadmap to happen in the foreseeable future.

> I still stand on my proposal, how about extending E'' strings with
> unicode escapes (eg. \uXXXX)?  The E'' strings are already more
> clearly defined than '' and they are our "own", we don't need to
> consider random standards, but can consider our sanity.

That's one way we could proceed.  The other proposal that seemed
attractive to me was a decode-like function:
    uescape('foo\00e9bar')
    uescape('foo\00e9bar', '\')

(double all the backslashes if you assume not
standard_conforming_strings).  The arguments in favor of this one
are (1) you can apply it to the result of an expression, it's not
strictly tied to literals; and (2) it's a lot lower-footprint solution
since it doesn't affect basic literal handling.  If you wish to suppose
that this is only a stopgap until someday when we can implement the SQL
standard syntax more safely, then low footprint is good.  One could
even imagine back-porting this into existing releases as a user-defined
function.
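
To make the decode-like idea concrete, here is a rough sketch of what such a function might do, written in Python rather than as an actual PostgreSQL function. The message above does not spell out the semantics, so the details are assumptions: the escape character is followed by exactly four hex digits, a doubled escape character yields one literal escape character (borrowed from the SQL-standard U&'' literals), the second argument defaults to a backslash, and anything else is passed through unchanged.

```python
import re

def uescape(text, escape="\\"):
    """Decode <escape>XXXX (four hex digits) into the matching code point.

    Assumption: a doubled escape character stands for one literal escape
    character. Other uses of the escape character are left untouched.
    """
    if len(escape) != 1:
        raise ValueError("escape must be a single character")
    esc = re.escape(escape)
    # Group 1: doubled escape character. Group 2: four hex digits.
    pattern = re.compile(esc + r"(?:(" + esc + r")|([0-9A-Fa-f]{4}))")

    def replace(match):
        if match.group(1) is not None:
            return escape
        return chr(int(match.group(2), 16))

    return pattern.sub(replace, text)

print(uescape(r"foo\00e9bar"))       # default escape character
print(uescape("foo!00e9bar", "!"))   # caller-chosen escape character
```

Because the input is an ordinary value, the same call works on the result of any expression, which is point (1) above.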

The solution with \u in extended literals is probably workable too.
I'm slightly worried about the possibility of issues with code that
thinks it knows what an E-literal means but doesn't really.  In
particular something might think it knows that "\u" just means "u",
and proceed to strip the backslash.  I don't see a path for that to
become a security hole though, only a garden-variety bug.  So I could
live with that one on the grounds of being easier to use (which it
would be, because of less typing compared to uescape()).
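
The garden-variety bug can be sketched as follows. Both decoders below are hypothetical stand-ins for client-side code that interprets the body of an E'' literal; they handle only a few single-character escapes and ignore the octal and hex forms. The first follows the existing rule that an unrecognized escaped character stands for itself, the second assumes the proposed \uXXXX form.

```python
import re

# A few single-character escapes; octal and \x forms are omitted for brevity.
SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}

def decode_old(body):
    """Existing reading: an unrecognized escaped character stands for itself."""
    return re.sub(r"\\(.)",
                  lambda m: SIMPLE.get(m.group(1), m.group(1)),
                  body, flags=re.S)

def decode_new(body):
    """Proposed reading: backslash, 'u', four hex digits is one code point."""
    def replace(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return SIMPLE.get(m.group(2), m.group(2))
    return re.sub(r"\\(?:u([0-9A-Fa-f]{4})|(.))", replace, body, flags=re.S)

body = r"caf\u00e9"
print(decode_old(body))  # backslash stripped, digits kept as text
print(decode_new(body))  # escape decoded to a single character
```

The two readings disagree on the same input, but the old one only produces wrong text; it does not change where the literal ends.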
        regards, tom lane

