Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From	Albe Laurenz
Subject	Re: UTF8 national character data type support WIP patch and list of open issues.
Date	November 5, 2013 13:17:56
Msg-id	A737B7A37273E048B164557ADEF4A58B17C56755@ntex2010i.host.magwien.gv.at
In response to	Re: UTF8 national character data type support WIP patch and list of open issues. ("MauMau" <maumau307@gmail.com>)
Responses	Re: UTF8 national character data type support WIP patch and list of open issues.
List	pgsql-hackers

MauMau wrote:
> From: "Albe Laurenz" <laurenz.albe@wien.gv.at>
>> If I understood the discussion correctly the use case is that
>> there are advantages to having a database encoding different
>> from UTF-8, but you'd still want some UTF-8 columns.
>>
>> Wouldn't it be a better design to allow specifying the encoding
>> per column?  That would give you more flexibility.
> 
> Yes, you are right.  In the previous discussion:
> 
> - That would be nice if available, but it is hard to implement multiple
> encodings in one database.

Granted.

> - Some people (I'm not sure whether many or few) are using NCHAR/NVARCHAR in other DBMSs.
> To invite them to PostgreSQL, it's important to support national character
> feature syntactically and document it in the manual.  This is the first
> step.

I looked into the Standard, and it does not have NVARCHAR.
The type is called NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING
or NCHAR VARYING.

I guess that the goal of this patch is to support Oracle syntax.
But anybody trying to port CREATE TABLE statements from Oracle
is already exposed to enough incompatibilities that the difference between
NVARCHAR and NCHAR VARYING will not be the reason to reject PostgreSQL.

In other words, I doubt that introducing the nonstandard NVARCHAR
will have more benefits than drawbacks (new reserved word).

Regarding the Standard compliant names of these data types, PostgreSQL
already supports those.  Maybe some documentation would help.
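For illustration (not part of the original message), here is a minimal sketch assuming a running PostgreSQL server and a hypothetical table name `nchar_demo`. It shows that the standard spellings are accepted today and mapped onto the ordinary character types, which store data in the database encoding:

```sql
-- Hypothetical demo table: the SQL-standard national character type
-- names are accepted by the parser and mapped to ordinary CHAR/VARCHAR.
CREATE TABLE nchar_demo (
    a NATIONAL CHARACTER(10),
    b NATIONAL CHARACTER VARYING(20),
    c NATIONAL CHAR VARYING(20),
    d NCHAR VARYING(20)
);

-- Show the type each column actually received.
SELECT attname, format_type(atttypid, atttypmod)
FROM pg_attribute
WHERE attrelid = 'nchar_demo'::regclass AND attnum > 0
ORDER BY attnum;
```

Column `a` should be reported as `character(10)` and the others as `character varying(20)`; no separate national character type or per-column encoding is involved.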

> - As the second step, we can implement multiple encodings in one database.
> According to the SQL standard, "NCHAR(n)" is equivalent to "CHAR(n)
> CHARACTER SET cs", where cs is an implementation-defined character set.

That second step would definitely have benefits.

But I don't think that this requires the first step that your patch
implements; the two are in fact orthogonal.

I don't think that there is any need to change NCHAR even if we
get per-column encoding, it is just syntactic sugar to support
SQL Feature F421.

Why not tackle the second step first?

Yours,
Laurenz Albe
