Thread: [suggestion]support UNICODE host variables in ECPG

[suggestion]support UNICODE host variables in ECPG

From
"Nagaura, Ryohei"
Date:
Hi all.

There is a demand for ECPG to support UNICODE host variables.
This topic has also appeared in thread [1].
I would like to discuss whether PostgreSQL should support this.

Do you have any opinion?

The specifications and usage described below are the same as in [1].

Requirements
============
1. Support the utext keyword in ECPG.

   utext is used to define a Unicode host variable in an ECPG application
   on the Windows platform.

2. Support the UVARCHAR keyword in ECPG.

   UVARCHAR is used to define a variable-length Unicode host variable in
   an ECPG application on the Windows platform.

3. Save the content of Unicode variables into the database in the
   database character set, and fetch content from the database into
   Unicode variables.

4. Initially, consider only UTF8 as the database character set and UTF16
   as the Unicode format for host variables. ECPG converts between UTF8
   and UTF16.

5. Since UTF16 has big-endian and little-endian forms, an environment
   variable is used to identify the byte order and convert accordingly.



Usage
============
int main() {
    EXEC SQL BEGIN DECLARE SECTION;
        utext employee[20];     /* define a Unicode host variable */
        UVARCHAR address[50];   /* define a variable-length Unicode host variable */
    EXEC SQL END DECLARE SECTION;

    ...

    EXEC SQL CREATE TABLE emp (ename char(20), address varchar(50));

    /* UTF8 is the database character set */
    EXEC SQL INSERT INTO emp (ename, address) VALUES ('Mike', '1 sydney, NSW');

    /* Database character set converted to Unicode */
    EXEC SQL SELECT ename INTO :employee FROM emp;

    /* Database character set converted to Unicode */
    EXEC SQL SELECT address INTO :address FROM emp;

    /* %ls is the standard wprintf conversion for wide strings */
    wprintf(L"employee name is %ls\n", employee);

    /* .arr is the data member of an ECPG VARCHAR struct; assumed to be the same for UVARCHAR */
    wprintf(L"employee address is %ls\n", address.arr);

    EXEC SQL DELETE FROM emp;

    /* Unicode converted to database character set */
    EXEC SQL INSERT INTO emp (ename, address) VALUES (:employee, :address);

    EXEC SQL DROP TABLE emp;
    EXEC SQL DISCONNECT ALL;
}

[1]
https://www.postgresql.org/message-id/flat/CAF3%2BxMLcare1QrDzTxP-3JZyH5SXRkGzNUf-khSgPfmpQpkz%2BA%40mail.gmail.com

Best regards,
---------------------
Ryohei Nagaura




RE: [suggestion]support UNICODE host variables in ECPG

From
"Matsumura, Ryo"
Date:
Nagaura-san

As I understand it, the previous discussion pointed out that the feature
should be implemented more simply or step by step, and that a more detailed
description of the implementation is needed.
I also think this kept the discussion from reaching the details of the feature.

What is your opinion about it?

Regards
Ryo Matsumura



RE: [suggestion]support UNICODE host variables in ECPG

From
"Tsunakawa, Takayuki"
Date:
From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
> There is a demand for ECPG to support UNICODE host variables.
> This topic has also appeared in thread [1].
> I would like to discuss whether PostgreSQL should support this.
> 
> Do you have any opinion?

* What's the benefit of supporting UTF16 in host variables?
* Does your proposal comply with the SQL standard?  If not, what does the SQL standard say about support for UTF16?
* Why only Windows?


Regards
Takayuki Tsunakawa




RE: [suggestion]support UNICODE host variables in ECPG

From
"Nagaura, Ryohei"
Date:
Matsumura-san, Tsunakawa-san

Thank you for your replies.

Tsunakawa-san
> * What's the benefit of supporting UTF16 in host variables?
There are two benefits.
1) Since the number of bytes per character is constant in UTF16, strings can be processed more efficiently than in other encodings.
2) This enables C programmers to use wide characters.

> * Does your proposal comply with the SQL standard?  If not, what does the
> SQL standard say about support for UTF16?
I checked the document, but I could not find anything about it.
Does anyone know about this?

> * Why only Windows?
It should be implemented on other operating systems as well, if needed.

Matsumura-san
> As I understand it, the previous discussion pointed out that the feature
> should be implemented more simply or step by step, and that a more detailed
> description of the implementation is needed.
> I also think this kept the discussion from reaching the details of the feature.
> What is your opinion about it?
I wanted to discuss first whether the feature is needed, so I did not answer.
I'm sorry for not having mentioned that.
If the feature is judged to be necessary, I'll rework the design.

Best regards,
---------------------
Ryohei Nagaura



Re: [suggestion]support UNICODE host variables in ECPG

From
Tom Lane
Date:
"Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com> writes:
> Tsunakawa-san
>> * What's the benefit of supporting UTF16 in host variables?

> 1) Since the number of bytes per character is constant in UTF16, strings can be processed more efficiently than in other encodings.

I don't think I buy that argument; it falls down as soon as you consider
characters above U+FFFF.  I worry that by supporting UTF16, we'd basically
be encouraging users to write code that fails on such characters, which
doesn't seem like good project policy.

            regards, tom lane


RE: [suggestion]support UNICODE host variables in ECPG

From
"Matsumura, Ryo"
Date:
> * What's the benefit of supporting UTF16 in host variables?

I think the first benefit of the suggestion is that it gives applications
a way to handle UTF16 characters. Whether to support characters above
U+FFFF (i.e. surrogate pairs) could be discussed next.

For that purpose, implementing the suggestion may be easier than
supporting UTF16 as a client_encoding. UVARCHAR seems to be a label
indicating that the stored data is encoded in UTF16. It confines the
impact to the labeled host variables only.

# At least, ecpglib is not good at handling 0x00 as part of a character.

Regards
Ryo Matsumura




RE: [suggestion]support UNICODE host variables in ECPG

From
"Nagaura, Ryohei"
Date:
Hi,

On Fri, Dec 21, 2018 at 5:08 PM, Tom Lane wrote:
> I don't think I buy that argument; it falls down as soon as you consider
> characters above U+FFFF.  I worry that by supporting UTF16, we'd basically
> be encouraging users to write code that fails on such characters, which
> doesn't seem like good project policy.
Oh, I was mistaken.
Thank you for pointing that out.

On Mon, Dec 24, 2018 at 5:07 PM, Matsumura Ryo wrote:
> I think the first benefit of the suggestion is that it gives applications
> a way to handle UTF16 characters. Whether to support characters above
> U+FFFF (i.e. surrogate pairs) could be discussed next.
Thank you for your comments.
Yes, I'd like to determine whether this feature is needed before designing it.

Best regards,
---------------------
Ryohei Nagaura