Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From	MauMau
Subject	Re: UTF8 national character data type support WIP patch and list of open issues.
Date	September 23, 2013 06:51:03
Msg-id	D0A2FE73E8354EDCBEE56EC79268CA4E@maumau
In response to	Re: UTF8 national character data type support WIP patch and list of open issues. (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: UTF8 national character data type support WIP patch and list of open issues.
List	pgsql-hackers

From: "Tatsuo Ishii" <ishii@postgresql.org>
> I don't think the bind placeholder is the case. That is processed by
> exec_bind_message() in postgres.c. It has enough info about the type
> of the placeholder, and I think we can easily deal with NCHAR. The same
> can be said of the COPY case.

Yes, I understand that now, and I agree.  If we allow NCHAR to use an
encoding different from the database encoding, we can convert text from the
client encoding to the NCHAR encoding in nchar_in(), for example.  We can
retrieve the NCHAR encoding from pg_database and store it in a global
variable at session start.
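
As a rough illustration of that idea, here is a small Python sketch (not PostgreSQL code).  The Session class, its fields, and this nchar_in() are hypothetical stand-ins: the real function would be a C input function, and the NCHAR encoding is assumed to come from a pg_database column that does not exist today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical per-session state, filled in once at session start."""
    client_encoding: str
    database_encoding: str
    nchar_encoding: str  # assumption: looked up from pg_database at startup


def nchar_in(session: Session, raw: bytes) -> bytes:
    """Convert input text from the client encoding to the NCHAR encoding."""
    text = raw.decode(session.client_encoding)
    return text.encode(session.nchar_encoding)


# Example: SHIFT-JIS client, UTF-8 database, EUC-JP chosen for NCHAR.
session = Session("shift_jis", "utf-8", "euc_jp")
stored = nchar_in(session, "日本語".encode("shift_jis"))
```

Caching the encoding in the session object mirrors the "global variable at session start" idea, so the catalog lookup happens once rather than on every input call.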


> Problem is an ordinary query (simple protocol "Q" message) as you
> pointed out. Encoding conversion happens at a very early stage (note
> that the fast-path case has the same issue). If a query message contains,
> say, SHIFT-JIS and EUC-JP, then we are in trouble because the
> encoding conversion routine (pg_client_to_server) assumes that the
> message from the client contains only one encoding. However my question
> is, does it really happen? Because there's no text editor which can
> create SHIFT-JIS and EUC-JP mixed text. So my guess is, when a user wants
> to use NCHAR as SHIFT-JIS text, the rest of the query consists of either
> SHIFT-JIS or plain ASCII. If so, what the user needs to do is set the
> client encoding to SHIFT-JIS and everything should be fine.
>
> Maumau, is my guess correct?

Yes, I believe you are right.  Regardless of whether we support multiple 
encodings in one database or not, a single client encoding will be 
sufficient for one session.  When receiving the "Q" message, the whole SQL 
text is converted from the client encoding to the database encoding.  This 
part needs no modification.  During execution of the "Q" message, NCHAR 
values are converted from the database encoding to the NCHAR encoding.
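
The two conversion steps described above can be sketched in Python as follows.  This is only a model of the flow under stated assumptions: receive_simple_query() and nchar_literals() are hypothetical names, the regular expression is a toy stand-in for the real parser, and the encodings chosen are just examples.

```python
import re


def receive_simple_query(message: bytes, client_encoding: str,
                         database_encoding: str) -> bytes:
    """Step 1: convert the whole SQL text, assuming one client encoding."""
    return message.decode(client_encoding).encode(database_encoding)


def nchar_literals(sql: bytes, database_encoding: str,
                   nchar_encoding: str) -> list:
    """Step 2: convert each N'...' value to the NCHAR encoding.

    Toy extraction only; it ignores escaped quotes and is not a real parser.
    """
    literals = re.findall(rb"N'([^']*)'", sql)
    return [lit.decode(database_encoding).encode(nchar_encoding)
            for lit in literals]


# Example: SHIFT-JIS client, UTF-8 database, EUC-JP for NCHAR values.
message = "INSERT INTO t VALUES (N'日本語', 'abc')".encode("shift_jis")
sql = receive_simple_query(message, "shift_jis", "utf-8")
values = nchar_literals(sql, "utf-8", "euc_jp")
```

Note that step 1 is unchanged from what happens today; only step 2 is new, and it works from database-encoded text, so it can only represent characters that the database encoding can also represent.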

Thank you very much, Tatsuo-san.  Everybody, are there any other challenges
we should consider in supporting NCHAR/NVARCHAR as distinct types?

Regards
MauMau
