This is a proposal to implement support for handling national
characters.
[Introduction]
The aim of this proposal is to eventually have a way to represent
'national characters' in a uniform way, even in non-UTF8 encoded
databases. Many of our customers in the Asian region are, as part of
their platform modernization, moving away from mainframes, where they
have used national character representation in COBOL and in other
databases. Stronger support for national character representation
will make it easier for these customers to look at PostgreSQL more
favourably when migrating from other well-known RDBMSs, which all
have varying degrees of NCHAR/NVARCHAR support.
[Specifications]
Broadly speaking, the national character implementation would ideally
include the following:
- Support for NCHAR/NVARCHAR data types
- Representing NCHAR and NVARCHAR columns in UTF-8 encoding in non-UTF8
databases
- Support for UTF16 column encoding and representing NCHAR and NVARCHAR
columns in UTF16 encoding in all databases.
- Support for NATIONAL_CHARACTER_SET GUC variable that will determine
the encoding that will be used in NCHAR/NVARCHAR columns.
The above points are only a 'wishlist' at the moment. Our aim is to
tackle them one by one as we progress. I will send a detailed proposal
with more technical details later.
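
As a rough illustration only (not part of the proposed implementation),
the short Python sketch below shows why the choice of encoding for
NCHAR/NVARCHAR columns matters for storage: the same text takes a
different number of bytes in EUC_JP, UTF-8 and UTF-16. The function
name and the sample string are hypothetical and chosen just for this
example; EUC_JP stands in for a typical non-UTF8 database encoding.

```python
# Illustration only: byte sizes of the same text under encodings that an
# NCHAR/NVARCHAR column might use. encoded_sizes() and the sample string
# are hypothetical, not part of PostgreSQL.

def encoded_sizes(text, encodings=("euc_jp", "utf-8", "utf-16-le")):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {enc: len(text.encode(enc)) for enc in encodings}

sample = "日本語"  # three Japanese characters
for enc, nbytes in encoded_sizes(sample).items():
    print(f"{enc}: {nbytes} bytes")
# euc_jp: 6 bytes
# utf-8: 9 bytes
# utf-16-le: 6 bytes
```

For this sample, UTF-16 is as compact as EUC_JP while UTF-8 needs three
bytes per character; for plain ASCII text the picture reverses, since
UTF-16 needs two bytes per character where the other two need one.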
The main aim at the moment is to get some feedback on the above, to
find out whether this feature is something that would benefit
PostgreSQL in general, and whether users maintaining databases in
non-English-speaking regions would find it beneficial.
Rgds,
Arul Shaji
P.S.: It has been quite some time since I last sent anything to this
list. Our mail server adds a standard legal disclaimer to all outgoing
mails, which I know this list is not a huge fan of. I used to have an
exemption for the mails I send to this list. If the disclaimer
appears, apologies in advance. I will rectify that for the next one.