On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> What about limiting NCHAR to databases whose encoding is the same as,
> or "compatible" with, the NCHAR encoding (i.e., an encoding conversion
> between the two is defined)? That way, NCHAR text can be automatically
> converted from the NCHAR encoding to the database encoding on the
> server side, so we can treat NCHAR exactly the same as CHAR afterward.
> I suppose the encoding used for NCHAR should be defined at initdb time
> or at database creation (if we allow the latter, we need to add a new
> column recording which encoding is used for NCHAR).
>
> For example, "CREATE TABLE t1(t NCHAR(10))" will succeed if NCHAR is
> UTF-8 and the database encoding is UTF-8. It will even succeed if
> NCHAR is SHIFT-JIS and the database encoding is UTF-8, because there
> is a conversion between UTF-8 and SHIFT-JIS. However, it will not
> succeed if NCHAR is SHIFT-JIS and the database encoding is ISO-8859-1,
> because there is no conversion between them.
I think the point here is that, at least as I understand it, encoding
conversion and sanitization happen at a very early stage right now,
when we first receive the input from the client. If the user sends a
string of bytes as part of a query or bind placeholder that's not
valid in the database encoding, it's going to error out before any
type-specific code has an opportunity to get control. Look at
textin(), for example. There's no encoding check there. That means
it's already been done at that point. To make this work, someone's
going to have to figure out what to do about *that*. Until we have a
sketch of what the design for that looks like, I don't see how we can
credibly entertain more specific proposals.
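
For instance, in a UTF8 database something like this errors out while
the query string is still being processed, long before textin() is
reached (the exact message may vary by version):

    SELECT E'\xe3\x81'::text;
    ERROR:  invalid byte sequence for encoding "UTF8": 0xe3 0x81

The same goes for bytes arriving straight from the client: they're
verified when they're converted to the server encoding, not by the
individual type input functions.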
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company