Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: UTF8 national character data type support WIP patch and list of open issues.
Date
Msg-id 20130922.072952.1977066018971837040.t-ishii@sraoss.co.jp
Whole thread Raw
In response to Re: UTF8 national character data type support WIP patch and list of open issues.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: UTF8 national character data type support WIP patch and list of open issues.
List pgsql-hackers
> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happens at a very early stage right now,
> when we first receive the input from the client. If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the database encoding, it's going to error out before any
> type-specific code has an opportunity to get control.   Look at
> textin(), for example.  There's no encoding check there.  That means
> it's already been done at that point.  To make this work, someone's
> going to have to figure out what to do about *that*.  Until we have a
> sketch of what the design for that looks like, I don't see how we can
> credibly entertain more specific proposals.

I don't think the bind placeholder is the case. That is processed by
exec_bind_message() in postgres.c. It has enough info about the type
of the placeholder, and I think we can easily deal with NCHAR. Same
thing can be said to COPY case.

Problem is an ordinary query (simple protocol "Q" message) as you
pointed out. Encoding conversion happens at a very early stage (note
that fast-path case has the same issue). If a query message contains,
say, SHIFT-JIS and EUC-JP, then we are going into trouble because the
encoding conversion routine (pg_client_to_server) regards that the
message from client contains only one encoding. However my question
is, does it really happen? Because there's any text editor which can
create SHIFT-JIS and EUC-JP mixed text. So my guess is, when user want
to use NCHAR as SHIFT-JIS text, the rest of query consist of either
SHIFT-JIS or plain ASCII. If so, what the user need to do is, set the
client encoding to SJIFT-JIS and everything should be fine.

Maumau, is my guess correct?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: Assertions in PL/PgSQL
Next
From: Josh Berkus
Date:
Subject: Re: VMs for Reviewers Available