Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: UTF8 national character data type support WIP patch and list of open issues.
Date	September 22, 2013 01:30:19
Msg-id	20130922.072952.1977066018971837040.t-ishii@sraoss.co.jp
In response to	Re: UTF8 national character data type support WIP patch and list of open issues. (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: UTF8 national character data type support WIP patch and list of open issues.
List	pgsql-hackers

> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happens at a very early stage right now,
> when we first receive the input from the client. If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the database encoding, it's going to error out before any
> type-specific code has an opportunity to get control.   Look at
> textin(), for example.  There's no encoding check there.  That means
> it's already been done at that point.  To make this work, someone's
> going to have to figure out what to do about *that*.  Until we have a
> sketch of what the design for that looks like, I don't see how we can
> credibly entertain more specific proposals.

I don't think the bind placeholder case is a problem. It is processed
by exec_bind_message() in postgres.c, which has enough information
about the type of each placeholder, so I think we can easily deal with
NCHAR there. The same can be said of the COPY case.

The problem is an ordinary query (simple protocol "Q" message), as you
pointed out. Encoding conversion happens at a very early stage (note
that the fast-path case has the same issue). If a query message
contains, say, both SHIFT-JIS and EUC-JP, then we are in trouble,
because the encoding conversion routine (pg_client_to_server) assumes
that the message from the client contains only one encoding. However,
my question is: does that really happen? I don't think there is any
text editor that can create text mixing SHIFT-JIS and EUC-JP. So my
guess is that when a user wants to use NCHAR as SHIFT-JIS text, the
rest of the query consists of either SHIFT-JIS or plain ASCII. If so,
all the user needs to do is set the client encoding to SHIFT-JIS, and
everything should be fine.
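
To illustrate the single-encoding assumption, here is a minimal Python
sketch. It is not PostgreSQL code: Python's codecs only stand in for
pg_client_to_server, and the helper name converts_cleanly and the
sample queries are hypothetical.

```python
# Illustration only: Python codecs stand in for the server-side
# conversion, which assumes one client encoding for the whole message.

def converts_cleanly(message: bytes, client_encoding: str, expected: str) -> bool:
    """Return True if decoding the whole message with a single encoding
    succeeds and yields the text the user intended."""
    try:
        return message.decode(client_encoding) == expected
    except UnicodeDecodeError:
        return False

literal = "日本語"
intended_one = f"SELECT '{literal}'"
intended_two = f"SELECT '{literal}', '{literal}'"

# ASCII plus SHIFT-JIS: one client encoding covers the whole message.
sjis_query = intended_one.encode("shift_jis")

# A SHIFT-JIS literal and an EUC-JP literal mixed in one message.
mixed_query = (
    b"SELECT '" + literal.encode("shift_jis")
    + b"', '" + literal.encode("euc_jp") + b"'"
)

if __name__ == "__main__":
    print(converts_cleanly(sjis_query, "shift_jis", intended_one))
    print(converts_cleanly(mixed_query, "shift_jis", intended_two))
    print(converts_cleanly(mixed_query, "euc_jp", intended_two))
```

Only the first call should report a clean conversion. No single
client encoding recovers the mixed message, which is the situation
that I guess does not occur in practice.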

Maumau, is my guess correct?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
