Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From Albe Laurenz
Subject Re: UTF8 national character data type support WIP patch and list of open issues.
Msg-id A737B7A37273E048B164557ADEF4A58B17C56755@ntex2010i.host.magwien.gv.at
In response to Re: UTF8 national character data type support WIP patch and list of open issues.  ("MauMau" <maumau307@gmail.com>)
Responses Re: UTF8 national character data type support WIP patch and list of open issues.
List pgsql-hackers
MauMau wrote:
> From: "Albe Laurenz" <laurenz.albe@wien.gv.at>
>> If I understood the discussion correctly the use case is that
>> there are advantages to having a database encoding different
>> from UTF-8, but you'd still want some UTF-8 columns.
>>
>> Wouldn't it be a better design to allow specifying the encoding
>> per column?  That would give you more flexibility.
> 
> Yes, you are right.  In the previous discussion:
> 
> - That would be nice if available, but it is hard to implement multiple
> encodings in one database.

Granted.

> - Some people (I'm not sure whether many or few) are using NCHAR/NVARCHAR
> in other DBMSs.  To invite them to PostgreSQL, it's important to support
> the national character feature syntactically and document it in the
> manual.  This is the first step.

I looked into the Standard, and it does not have NVARCHAR.
The type is called NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING
or NCHAR VARYING.

I guess that the goal of this patch is to support Oracle syntax.
But anybody trying to port CREATE TABLE statements from Oracle
is already exposed to enough incompatibilities that the difference between
NVARCHAR and NCHAR VARYING will not be the reason to reject PostgreSQL.

In other words, I doubt that introducing the nonstandard NVARCHAR
will have more benefits than drawbacks (new reserved word).

As for the standard-compliant names of these data types, PostgreSQL
already supports them.  Maybe some documentation would help.
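
To illustrate how small that porting gap is, here is a minimal Python
sketch that rewrites the nonstandard NVARCHAR spelling into the standard
NCHAR VARYING, which PostgreSQL already accepts.  The helper name
to_standard_nchar is hypothetical (it is not part of the patch or of
PostgreSQL), and the sketch assumes the keyword never appears inside
string literals or comments in the DDL being converted.

```python
import re

# Hypothetical helper, for illustration only.
# Matches NVARCHAR as a whole word, case-insensitively, so identifiers
# such as "nvarchar_col" are left alone.
_NVARCHAR = re.compile(r"\bNVARCHAR\b", re.IGNORECASE)


def to_standard_nchar(ddl: str) -> str:
    """Replace the nonstandard NVARCHAR keyword with NCHAR VARYING."""
    return _NVARCHAR.sub("NCHAR VARYING", ddl)


if __name__ == "__main__":
    ddl = "CREATE TABLE t (id integer, name NVARCHAR(40), code NCHAR(3))"
    print(to_standard_nchar(ddl))
```

A plain textual substitution like this is only one of the many changes a
real port needs, which is the point made above.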

> - As the second step, we can implement multiple encodings in one database.
> According to the SQL standard, "NCHAR(n)" is equivalent to "CHAR(n)
> CHARACTER SET cs", where cs is an implementation-defined character set.

That second step would definitely have benefits.

But I don't think that this requires the first step that your patch
implements; the two are in fact orthogonal.

I don't think that there is any need to change NCHAR even if we
get per-column encoding; it is just syntactic sugar to support
SQL Feature F421.

Why not tackle the second step first?

Yours,
Laurenz Albe
