Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From MauMau
Subject Re: UTF8 national character data type support WIP patch and list of open issues.
Date
Msg-id 37B76474BB3149FD841373E12E355851@maumau
In response to Re: UTF8 national character data type support WIP patch and list of open issues.  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
From: "Robert Haas" <robertmhaas@gmail.com>
> That may be what's important to you, but it's not what's important to
> me.

National character type support may be important to some potential users of 
PostgreSQL, and so to the popularity of PostgreSQL, rather than to me 
personally.  That's why national character support is listed in the 
PostgreSQL TODO wiki.  We might be losing potential users just because their 
selection criteria include national character support.


> I am not keen to introduce support for nchar and nvarchar as
> differently-named types with identical semantics.

Similar examples already exist:

- varchar and text: the only difference is the explicit length limit
- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float

In addition, the SQL standard itself admits:

"The <key word>s NATIONAL CHARACTER are used to specify the character type 
with an implementation-defined character set. Special syntax (N'string') is 
provided for representing literals in that character set.
...
"NATIONAL CHARACTER" is equivalent to the corresponding <character string 
type> with a specification of "CHARACTER SET CSN", where "CSN" is an 
implementation-defined <character set name>."

"A <national character string literal> is equivalent to a <character string 
literal> with the "N" replaced by "<introducer><character set 
specification>", where "<character set specification>" is an 
implementation-defined <character set name>."
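
The two equivalences quoted above amount to a textual rewrite, which can be 
sketched as follows.  This is only an illustration in Python, not anything 
PostgreSQL does: the function names are hypothetical, "UTF8" is merely a 
placeholder for the implementation-defined character set name, and the 
<introducer> is assumed to be an underscore as in the standard's grammar.

```python
import re

# Placeholder for the implementation-defined <character set name> ("CSN").
# "UTF8" is an assumption made for this sketch only.
CSN = "UTF8"

# Matches NATIONAL CHARACTER, NATIONAL CHAR, or NCHAR at the start of a type,
# and captures whatever follows (VARYING, a length, and so on).
_NATIONAL_TYPE = re.compile(
    r"^\s*(?:NATIONAL\s+(?:CHARACTER|CHAR)|NCHAR)\b(?P<rest>.*?)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def expand_national_type(type_text, csn=CSN):
    """Rewrite a national character type as CHARACTER ... CHARACTER SET csn."""
    match = _NATIONAL_TYPE.match(type_text)
    if match is None:
        return type_text  # not a national character type; leave it alone
    return f"CHARACTER{match.group('rest')} CHARACTER SET {csn}"


def expand_national_literal(literal, csn=CSN):
    """Replace the leading N of N'...' with <introducer><character set name>."""
    if literal[:2] in ("N'", "n'"):
        return f"_{csn}{literal[1:]}"
    return literal  # ordinary literal; leave it alone


print(expand_national_type("NATIONAL CHARACTER(10)"))
print(expand_national_type("NCHAR VARYING(5)"))
print(expand_national_literal("N'abc'"))
```

If the character set name is the database character set, the rewritten forms 
carry no information beyond plain CHARACTER and plain string literals.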


> And I think it's an
> even worse idea to introduce them now, making them work one way, and
> then later change the behavior in a backward-incompatible fashion.

I understand your concern.  I think the incompatibility can be avoided by 
approaching it in the following way.  How about this?

- NCHAR can be used with any database encoding.

- At first, NCHAR is exactly the same as CHAR.  That is, the 
"implementation-defined character set" described in the SQL standard is the 
database character set.

- In the future, the character set for NCHAR can be selected at database 
creation, like Oracle's CREATE DATABASE ... NATIONAL CHARACTER SET 
AL16UTF16.  The default is the database character set.
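
The compatibility argument in the three points above can be sketched as a 
single resolution rule.  This is a minimal illustration in Python under the 
assumptions of this proposal; the function and parameter names are 
hypothetical and do not correspond to any existing PostgreSQL option.

```python
from typing import Optional


def nchar_character_set(
    database_encoding: str,
    national_character_set: Optional[str] = None,
) -> str:
    """Return the character set NCHAR would use under the proposal.

    national_character_set stands for a hypothetical per-database setting
    chosen at database creation.  When it is not given, NCHAR falls back to
    the database encoding, which is the proposed initial behaviour.
    """
    if national_character_set is None:
        return database_encoding
    return national_character_set


# Initial behaviour: NCHAR is the same as CHAR in any database encoding.
print(nchar_character_set("EUC_JP"))
# Possible later behaviour: an explicit choice overrides the default.
print(nchar_character_set("EUC_JP", "UTF8"))
```

Because the default in the later stage is the same as the only behaviour in 
the first stage, databases created without an explicit choice would keep 
behaving as before.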


Could you tell me what kind of specification we should implement if we 
officially support national character types?

Regards
MauMau




