Home > mailing lists

Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From	MauMau
Subject	Re: UTF8 national character data type support WIP patch and list of open issues.
Date	November 9, 2013 07:23:57
Msg-id	673E261C589440E3B0D8FDF9A11B1181@maumau Whole thread Raw
In response to	Re: UTF8 national character data type support WIP patch and list of open issues. (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: UTF8 national character data type support WIP patch and list of open issues.
List	pgsql-hackers

Tree view

From: "Robert Haas" <robertmhaas@gmail.com>
> On Tue, Nov 5, 2013 at 5:15 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote:
>>> Implements NCHAR/NVARCHAR as distinct data types, not as synonyms
>>
>> If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET
>> "cs", then for some "cs", NCHAR(x) must be the same as CHAR(x).
>> Therefore, an implementation as separate data types is wrong.
>
> Since the point doesn't seem to be getting through, let me try to be
> more clear: we're not going to accept any form of this patch.  A patch
> that makes some progress toward actually coping with multiple
> encodings in the same database would be very much worth considering,
> but adding compatible syntax with incompatible semantics is not of
> interest to the PostgreSQL project.  We have had this debate on many
> other topics in the past and will no doubt have it again in the
> future, but the outcome is always the same.

It doesn't seem that there is any semantics incompatible with the SQL 
standard as follows:

- In the first step, "cs" is the database encoding, which is used for 
char/varchar/text.
- In the second (or final) step, where multiple encodings per database is 
supported, "cs" is the national character encoding which is specified with 
CREATE DATABASE ... NATIONAL CHARACTER ENCODING cs.  If NATIONAL CHARACTER 
ENCODING clause is omitted, "cs" is the database encoding as step 1.

Let me repeat myself: I think the biggest and immediate issue is that 
PostgreSQL does not support national character types at least officially. 
"Officially" means the description in the manual.  So I don't have strong 
objection against the current (hidden) implementation of nchar types in 
PostgreSQL which are just synonyms, as long as the official support is 
documented.  Serious users don't want to depend on hidden features.

However, doesn't the current synonym approach have any problems?  Wouldn't 
it produce any trouble in the future?  If we treat nchar as char, we lose 
the fact that the user requested nchar.  Can we lose the fact so easily and 
produce irreversible result as below?

--------------------------------------------------
Maybe so.  I guess the distinct type for NCHAR is for future extension and
user friendliness.  As one user, I expect to get "national character"
instead of "char character set xxx" as output of psql \d and pg_dump when I
specified "national character" in DDL.  In addition, that makes it easy to
use the pg_dump output for importing data to other DBMSs for some reason.
--------------------------------------------------


Regards
MauMau

pgsql-hackers by date:

From: Amit Khandekar
Date: 09 November 2013, 06:39:47
Subject: information schema parameter_default implementation

From: David Rowley
Date: 09 November 2013, 07:30:22
Subject: Re: patch to fix unused variable warning on windows build

Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

Previous

Next