Thread: [suggestion]support UNICODE host variables in ECPG
Hi all.

There is a demand for ECPG to support Unicode host variables.
This topic has also appeared in thread [1].
I would like to discuss whether to support this in PostgreSQL.

Do you have any opinions?

The specifications and usage described below are the same as in [1].

Requirements
============
1. Support the utext keyword in ECPG.
   utext is used to define a Unicode host variable in an ECPG application on the Windows platform.
2. Support the UVARCHAR keyword in ECPG.
   UVARCHAR is used to define a variable-length Unicode host variable in an ECPG application on the Windows platform.
3. Save the content of Unicode variables into the database in the database character set, and fetch content from the database into Unicode variables.
4. As a first step, consider only UTF8 as the database character set and UTF16 as the Unicode format for host variables. ECPG converts between UTF8 and UTF16.
5. Since UTF16 has big-endian and little-endian formats, an environment variable is used to identify which one is in use, and the byte order is converted accordingly.

Usage
============
int main()
{
    EXEC SQL BEGIN DECLARE SECTION;
    utext employee[20];      /* define a Unicode host variable */
    UVARCHAR address[50];    /* define a variable-length Unicode host variable */
    EXEC SQL END DECLARE SECTION;

    ...

    EXEC SQL CREATE TABLE emp (ename char(20), address varchar(50));

    /* UTF8 is the database character set */
    EXEC SQL INSERT INTO emp (ename, address) VALUES ('Mike', '1 sydney, NSW');

    /* Database character set converted to Unicode */
    EXEC SQL SELECT ename INTO :employee FROM emp;

    /* Database character set converted to Unicode */
    EXEC SQL SELECT address INTO :address FROM emp;

    wprintf(L"employee name is %s\n", employee);
    wprintf(L"employee address is %s\n", address.attr);

    EXEC SQL DELETE FROM emp;

    /* Unicode converted to database character set */
    EXEC SQL INSERT INTO emp (ename, address) VALUES (:employee, :address);

    EXEC SQL DROP TABLE emp;
    EXEC SQL DISCONNECT ALL;
}

[1]
https://www.postgresql.org/message-id/flat/CAF3%2BxMLcare1QrDzTxP-3JZyH5SXRkGzNUf-khSgPfmpQpkz%2BA%40mail.gmail.com

Best regards,
---------------------
Ryohei Nagaura
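Note on requirements 4 and 5: the following is only a minimal sketch of what a UTF8 to UTF16 conversion with a selectable byte order could look like. It is not taken from the proposal or from ecpglib. The names utf16_order, utf8_decode, put_unit and utf8_to_utf16 are hypothetical, and the byte order is passed as a plain argument here because the proposal does not name the environment variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order selector (the proposal would derive this
 * from an environment variable that it does not name). */
typedef enum { UTF16_LE, UTF16_BE } utf16_order;

/* Decode one UTF-8 sequence from s (n bytes available).
 * Returns the number of bytes consumed, or 0 on invalid input. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; *cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;
    if (n < len)
        return 0;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Store one 16-bit code unit in the requested byte order. */
static void put_unit(unsigned char *out, uint16_t u, utf16_order order)
{
    if (order == UTF16_LE) { out[0] = u & 0xFF; out[1] = u >> 8; }
    else                   { out[0] = u >> 8;   out[1] = u & 0xFF; }
}

/* Convert srclen bytes of UTF-8 into UTF-16 bytes in dst.
 * Returns the number of bytes written, or (size_t)-1 if the input is
 * invalid or dst is too small. No terminator is appended. */
size_t utf8_to_utf16(const unsigned char *src, size_t srclen,
                     unsigned char *dst, size_t dstcap, utf16_order order)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < srclen) {
        uint32_t cp;
        size_t used = utf8_decode(src + in, srclen - in, &cp);

        if (used == 0)
            return (size_t)-1;
        in += used;
        if (cp < 0x10000) {
            if (dstcap - out < 2)
                return (size_t)-1;
            put_unit(dst + out, (uint16_t)cp, order);
            out += 2;
        } else {
            /* Characters above U+FFFF need a surrogate pair. */
            if (dstcap - out < 4)
                return (size_t)-1;
            cp -= 0x10000;
            put_unit(dst + out, (uint16_t)(0xD800 | (cp >> 10)), order);
            put_unit(dst + out + 2, (uint16_t)(0xDC00 | (cp & 0x3FF)), order);
            out += 4;
        }
    }
    return out;
}
```

With this sketch, the single byte "A" becomes 41 00 in little-endian order and 00 41 in big-endian order, which is the difference requirement 5 has to account for.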
Nagaura-san

I understand that the previous discussion pointed out that the feature should be implemented more simply or step by step, and that a more detailed description of the implementation is needed.
I also think this kept the discussion from reaching the details of the feature.
What is your opinion about it?

Regards
Ryo Matsumura
From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
> There is a demand for ECPG to support UNICODE host variables.
> This topic has also appeared in thread [1].
> I would like to discuss whether to support in postgres.
>
> Do you have any opinion?

* What's the benefit of supporting UTF16 in host variables?
* Does your proposal comply with the SQL standard?  If not, what does the SQL standard say about support for UTF16?
* Why only Windows?

Regards
Takayuki Tsunakawa
Matsumura-san, Tsunakawa-san

Thank you for your replies.

Tsunakawa-san
> * What's the benefit of supporting UTF16 in host variables?
There are two benefits.
1) As the number of bytes per character is constant in UTF16 encoding, strings can be processed more efficiently than in other encodings.
2) It enables C programmers to use wide characters.

> * Does your proposal comply with the SQL standard?  If not, what does the
> SQL standard say about support for UTF16?
I referred to the document, but I could not find anything about it.
Does anyone know about this?

> * Why only Windows?
It should be implemented on other OSes if needed.

Matsumura-san
> I understand that the previous discussion pointed that the feature had
> better be implemented more simply or step-by-step and description about
> implementation is needed more.
> I also think it prevented the discussion to reach to the detail of feature.
> What is your opinion about it?
I wanted to discuss the necessity first, so I did not answer that point.
I'm very sorry for not having mentioned it.
If this feature is judged to be necessary, I'll redo the design.

Best regards,
---------------------
Ryohei Nagaura
"Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com> writes:
> Tsunakawa-san
>> * What's the benefit of supporting UTF16 in host variables?
> 1) As byte per character is constant in UTF16 encoding, it can process strings more efficiently than other encodings.

I don't think I buy that argument; it falls down as soon as you consider
characters above U+FFFF.  I worry that by supporting UTF16, we'd basically
be encouraging users to write code that fails on such characters, which
doesn't seem like good project policy.

			regards, tom lane
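Note on the point about characters above U+FFFF: UTF-16 encodes such characters as a surrogate pair, that is, two 16-bit code units. The sketch below is only an illustration of why "one code unit per character" does not hold; the function names utf16_units_for and utf16_count_code_points are hypothetical and are not part of ECPG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit code units UTF-16 needs for one code point. */
size_t utf16_units_for(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;
}

/* Count the code points in a buffer of n UTF-16 code units.
 * A high surrogate (D800..DBFF) followed by a low surrogate
 * (DC00..DFFF) is counted as a single character. */
size_t utf16_count_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;            /* skip the low half of the pair */
        count++;
    }
    return count;
}
```

For example, the sequence 0041 D83D DE00 ("A" followed by U+1F600) is three code units but only two characters, so code that indexes by code unit would split the second character.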
> * What's the benefit of supporting UTF16 in host variables?

I think the first benefit of the suggestion is that it provides a way for applications to handle UTF16 characters.
Whether or not to support characters above U+FFFF (i.e. surrogate pairs) can be discussed next.

For that purpose, implementing the suggestion may be easier than supporting UTF16 as client_encoding.
Uvarchar seems to be a label indicating that the stored data is encoded in UTF16.
It confines the impact to the labeled host variables only.
# At least, ecpglib is not good at treating 0x00 as part of a character.

Regards
Ryo Matsumura
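Note on the 0x00 remark: in UTF-16, every ASCII character contains a zero byte, so byte-oriented C string functions stop early. The sketch below only illustrates that general problem and says nothing about ecpglib's actual internals; utf16_byte_length is a hypothetical helper, and it assumes the buffer ends with a 16-bit zero code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of a UTF-16 string, excluding its terminator.
 * Assumption: buf is terminated by a 16-bit zero code unit,
 * i.e. two consecutive zero bytes at an even offset. */
size_t utf16_byte_length(const unsigned char *buf)
{
    size_t i = 0;

    assert(buf != NULL);
    while (buf[i] != 0 || buf[i + 1] != 0)
        i += 2;             /* advance one whole code unit */
    return i;
}
```

For the little-endian bytes 41 00 42 00 00 00 ("AB"), strlen() reports 1 because it stops at the first zero byte, while utf16_byte_length() reports 4.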
Hi,

On Fri, Dec 21, 2018 at 5:08 PM, Tom Lane wrote:
> I don't think I buy that argument; it falls down as soon as you consider
> characters above U+FFFF.  I worry that by supporting UTF16, we'd basically
> be encouraging users to write code that fails on such characters, which
> doesn't seem like good project policy.
Oh, I was mistaken.
Thank you for pointing that out.

On Mon, Dec 24, 2018 at 5:07 PM, Matsumura Ryo wrote:
> I think that the first benefit of suggestion is providing a way to treat
> UTF16 chars for application. Whether or not to support above
> U+FFFF (e.g. surrogate pair) may be a next discussion.
Thank you for your comments.
Yes, I'd like to judge the necessity of this feature before designing it.

Best regards,
---------------------
Ryohei Nagaura