On Fri, Dec 16, 2011 at 12:13 PM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
> So, if the server encoding is not UTF-8, should we ban Unicode escapes:
>
> "\u00FCber"
>
> or non-ASCII characters?
>
> "über"
The former. Refusing the escapes makes sense, because it's totally
unclear how we ought to interpret them. Refusing the characters would
be just breaking something for no particular reason. Right now, for
example, EXPLAIN (FORMAT JSON) could easily end up returning non-ASCII
characters in whatever the database encoding happens to be. That
command would be unusable if we arbitrarily chucked an error every
time a non-ASCII character showed up and the database encoding wasn't
UTF-8.
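
To make that policy concrete, here is a rough Python sketch. It is not the code in the attached patch: the function names and the set of encoding spellings are invented for illustration. It rejects \uXXXX escapes inside a string literal when the server encoding is not UTF-8 and lets raw non-ASCII characters through untouched.

```python
# Hypothetical illustration only; names are not from the attached patch.
UTF8_NAMES = {"UTF8", "UTF-8"}  # assumed spellings of the UTF-8 encoding name


def has_unicode_escape(literal: str) -> bool:
    """Return True if the JSON string literal contains a \\u escape."""
    i = 0
    while i < len(literal):
        if literal[i] == "\\":
            # A backslash always consumes the next character as well,
            # so an escaped backslash followed by 'u' is not a \u escape.
            if i + 1 < len(literal) and literal[i + 1] == "u":
                return True
            i += 2
        else:
            i += 1
    return False


def check_string_literal(literal: str, server_encoding: str) -> None:
    """Raise ValueError for \\u escapes unless the server encoding is UTF-8."""
    if server_encoding.upper() in UTF8_NAMES:
        return
    if has_unicode_escape(literal):
        raise ValueError(
            "Unicode escapes are not supported when the server encoding is "
            + server_encoding
        )
    # Raw non-ASCII characters are deliberately accepted as-is.
```

With a LATIN1 server encoding, check_string_literal('"über"', "LATIN1") returns quietly, while the same call with the escaped form "\u00FCber" raises an error.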
> Also:
>
> * What if the server encoding is SQL_ASCII?
>
> * What if the server encoding is UTF-8, but the client encoding is
> something else (e.g. SQL_ASCII)?
It's not clear to me why these cases would require any special handling.
In the spirit of Simon's suggestion that we JFDI, I cooked up a patch
today that JFDI. See attached. This lacks any form of
canonicalization and therefore doesn't support comparison operators.
It also lacks documentation, regression testing, and probably an
almost uncountable number of other bells and whistles that people
would like to have. This is more or less a deliberate decision on my
part: I feel that the biggest problem with this project is that we've
spent far too much time dithering over what the exactly perfect set of
functionality would be, and not enough time getting good basic
functionality committed. So this is as basic as it gets. It does
exactly one thing: validation. If people are happy with it, we can
extend from here incrementally.
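
As a rough illustration of what "validation only" implies, here is a Python sketch, again not the patch itself; json_in is a hypothetical name. The input is checked for well-formedness and then kept byte-for-byte as given, so two documents with the same meaning can have different stored text, which is why comparison operators are not meaningful without canonicalization.

```python
import json


def _reject_constant(name: str):
    # Python's json module accepts NaN and Infinity by default; strict JSON does not.
    raise ValueError("invalid JSON constant: " + name)


def json_in(text: str) -> str:
    """Validate only: raise ValueError on malformed input, else return text unchanged."""
    json.loads(text, parse_constant=_reject_constant)  # parsed result is discarded
    return text


a = json_in('{"a": 1, "b": 2}')
b = json_in('{"b":2,"a":1}')

# Same logical value, different stored text: a naive text comparison says "not equal".
same_text = (a == b)
same_value = (json.loads(a) == json.loads(b))
```

Here same_text is False while same_value is True, which is the gap a canonical form would have to close before equality could be defined sensibly.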
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company