Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From MauMau
Subject Re: UTF8 national character data type support WIP patch and list of open issues.
Msg-id D0A2FE73E8354EDCBEE56EC79268CA4E@maumau
In response to Re: UTF8 national character data type support WIP patch and list of open issues.  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: UTF8 national character data type support WIP patch and list of open issues.
List pgsql-hackers
From: "Tatsuo Ishii" <ishii@postgresql.org>
> I don't think the bind placeholder is the case. That is processed by
> exec_bind_message() in postgres.c. It has enough info about the type
> of the placeholder, and I think we can easily deal with NCHAR. Same
> thing can be said to COPY case.

Yes, I understand that now.  Agreed.  If we allow NCHAR to use an encoding
different from the database encoding, we can convert text from the client
encoding to the NCHAR encoding in nchar_in(), for example.  We can retrieve
the NCHAR encoding from pg_database and store it in a global variable at
session start.
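
To make the idea concrete, here is a rough Python sketch of it, not backend
code.  Everything in it is an assumption for illustration: the
"nchar_encoding" field, the PG_DATABASE dictionary standing in for the
pg_database catalog, and the session_start() helper are all hypothetical,
and nchar_in() here only mimics what a C input function might do.

```python
# Hypothetical stand-in for a pg_database row. The "nchar_encoding"
# field is an assumption; no such column exists today.
PG_DATABASE = {
    "mydb": {"encoding": "utf-8", "nchar_encoding": "shift_jis"},
}

# Session-level global, filled in once at session start.
_session_nchar_encoding = None


def session_start(dbname: str) -> None:
    """Look up the NCHAR encoding once and cache it for the session."""
    global _session_nchar_encoding
    _session_nchar_encoding = PG_DATABASE[dbname]["nchar_encoding"]


def nchar_in(raw: bytes, client_encoding: str) -> bytes:
    """Convert a value from the client encoding to the NCHAR encoding."""
    if _session_nchar_encoding is None:
        raise RuntimeError("session_start() has not been called")
    return raw.decode(client_encoding).encode(_session_nchar_encoding)


# Usage: an EUC-JP client sends a value into a SHIFT-JIS NCHAR column.
session_start("mydb")
stored = nchar_in("日本語".encode("euc_jp"), "euc_jp")
```

The point of the global is only that nchar_in() does not need a catalog
lookup per value; whether a real implementation would cache it this way is
an open design choice.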


> Problem is an ordinary query (simple protocol "Q" message) as you
> pointed out. Encoding conversion happens at a very early stage (note
> that fast-path case has the same issue). If a query message contains,
> say, SHIFT-JIS and EUC-JP, then we are going into trouble because the
> encoding conversion routine (pg_client_to_server) regards that the
> message from client contains only one encoding. However my question
> is, does it really happen? Because there isn't any text editor which can
> create SHIFT-JIS and EUC-JP mixed text. So my guess is, when user want
> to use NCHAR as SHIFT-JIS text, the rest of query consist of either
> SHIFT-JIS or plain ASCII. If so, what the user need to do is, set the
> client encoding to SHIFT-JIS and everything should be fine.
>
> Maumau, is my guess correct?

Yes, I believe you are right.  Regardless of whether we support multiple 
encodings in one database or not, a single client encoding will be 
sufficient for one session.  When receiving the "Q" message, the whole SQL 
text is converted from the client encoding to the database encoding.  This 
part needs no modification.  During execution of the "Q" message, NCHAR 
values are converted from the database encoding to the NCHAR encoding.
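
The two steps can be sketched in Python as follows.  This is only an
illustration under stated assumptions: the function names are made up for
the sketch (the first merely imitates the role of pg_client_to_server), the
regular expression is a toy replacement for the real lexer, and the choice
of UTF-8 as database encoding and SHIFT-JIS as NCHAR encoding is arbitrary.

```python
import re


def client_to_server_sketch(message: bytes, client_encoding: str,
                            database_encoding: str) -> bytes:
    """Step 1: convert the whole "Q" message text in one pass.

    The message is assumed to be in a single client encoding.
    """
    return message.decode(client_encoding).encode(database_encoding)


def nchar_literals_sketch(sql: bytes, database_encoding: str,
                          nchar_encoding: str) -> list:
    """Step 2: during execution, convert only the N'...' values.

    A toy regex finds the literals; a real implementation would do this
    on parsed constants, not on the raw query string.
    """
    text = sql.decode(database_encoding)
    return [lit.encode(nchar_encoding)
            for lit in re.findall(r"N'([^']*)'", text)]


# Usage: the client encoding is SHIFT-JIS, the database encoding is UTF-8,
# and the NCHAR encoding is SHIFT-JIS.
query = "SELECT N'日本語'".encode("shift_jis")
server_text = client_to_server_sketch(query, "shift_jis", "utf-8")
nchar_values = nchar_literals_sketch(server_text, "utf-8", "shift_jis")
```

Note that the value goes through the database encoding on the way, so it
must be representable there as well.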

Thank you very much, Tatsuo san.  Everybody, are there any other challenges
we should consider in supporting NCHAR/NVARCHAR as distinct types?

Regards
MauMau



