Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From	MauMau
Subject	Re: UTF8 national character data type support WIP patch and list of open issues.
Date	September 21, 2013 00:34:14
Msg-id	75E3DA50779B4FB79FB3DFC0211E8F8A@maumau
In response to	Re: UTF8 national character data type support WIP patch and list of open issues. (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

From: "Robert Haas" <robertmhaas@gmail.com>
> On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> What about limiting the use of NCHAR to databases that have the same
>> encoding or a "compatible" encoding (one for which an encoding conversion
>> is defined)? This way, NCHAR text can be automatically converted from
>> the NCHAR encoding to the database encoding on the server side, so we can
>> treat NCHAR exactly the same as CHAR afterward.  I suppose the encoding
>> used for NCHAR should be defined at initdb time or at database creation
>> (if we allow this, we need to add a new column to record which encoding
>> is used for NCHAR).
>>
>> For example, "CREATE TABLE t1(t NCHAR(10))" will succeed if NCHAR is
>> UTF-8 and the database encoding is UTF-8. It will even succeed if NCHAR is
>> SHIFT-JIS and the database encoding is UTF-8, because there is a conversion
>> between UTF-8 and SHIFT-JIS. However, it will not succeed if NCHAR is
>> SHIFT-JIS and the database encoding is ISO-8859-1, because there's no
>> conversion between them.
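
The rule proposed above can be sketched roughly as follows.  This is only
an illustration in Python, not code from the patch: the function name
nchar_allowed and the CONVERSIONS set are hypothetical, and the set is a
small hand-written stand-in for whatever conversions the server actually
defines.

```python
# Hypothetical, hand-written subset of (source, destination) conversion
# pairs. It is an assumption for illustration, not read from a real server.
CONVERSIONS = {
    ("SJIS", "UTF8"),
    ("UTF8", "SJIS"),
    ("LATIN1", "UTF8"),
    ("UTF8", "LATIN1"),
}


def nchar_allowed(nchar_encoding: str, db_encoding: str) -> bool:
    """Return True if NCHAR text could be stored in this database.

    The proposed rule: the encodings are identical, or a conversion from
    the NCHAR encoding to the database encoding is defined.
    """
    if nchar_encoding == db_encoding:
        return True
    return (nchar_encoding, db_encoding) in CONVERSIONS


if __name__ == "__main__":
    # The three cases from the example above.
    print(nchar_allowed("UTF8", "UTF8"))
    print(nchar_allowed("SJIS", "UTF8"))
    print(nchar_allowed("SJIS", "LATIN1"))
```

Under these assumptions the first two cases are accepted and the third is
rejected, matching the CREATE TABLE example.
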
>
> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happen at a very early stage right now,
> when we first receive the input from the client.  If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the database encoding, it's going to error out before any
> type-specific code has an opportunity to get control.   Look at
> textin(), for example.  There's no encoding check there.  That means
> it's already been done at that point.  To make this work, someone's
> going to have to figure out what to do about *that*.  Until we have a
> sketch of what the design for that looks like, I don't see how we can
> credibly entertain more specific proposals.
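
To make the ordering problem concrete, here is a minimal sketch in Python.
It is a simplified model, not the server's code: receive_query_bytes and
nchar_in are hypothetical stand-ins for the early encoding check and for a
type input function, and the model only validates the bytes instead of
also converting them from a client encoding.

```python
def receive_query_bytes(raw: bytes, db_encoding: str) -> str:
    """Stand-in for the early check done when input arrives.

    Bytes that are not valid in the database encoding are rejected here,
    before any type-specific code runs.
    """
    try:
        return raw.decode(db_encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"invalid byte sequence for encoding {db_encoding}"
        ) from exc


def nchar_in(text: str) -> str:
    """Stand-in for a type input function; it only sees validated text."""
    return text


def run_query(raw: bytes, db_encoding: str) -> str:
    # Validation happens first; the input function never sees raw bytes.
    return nchar_in(receive_query_bytes(raw, db_encoding))


# A Shift-JIS encoded string sent to a UTF-8 database in this model.
sjis_bytes = "日本語".encode("shift_jis")

if __name__ == "__main__":
    try:
        run_query(sjis_bytes, "utf-8")
    except ValueError as err:
        print("rejected before nchar_in ran:", err)
```

In this model an NCHAR value in a different encoding can never reach
nchar_in, which is why the early step needs a design first.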

OK, I see your point.  Let's consider that design.  I'll study the relevant
code.  Does anybody, especially Tatsuo-san, Tom-san, or Peter-san, have any
good ideas?

Regards
MauMau
