Re: Why format() adds double quote? - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Why format() adds double quote?
Date
Msg-id CAFj8pRDOHSNR5L6bvChrO2Do2-avOORtUwJq2XmrsMCHteO6gg@mail.gmail.com
In response to Re: Why format() adds double quote?  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Why format() adds double quote?  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hi

2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
> 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
>
>> test=# select format('%I', t) from t1;
>>   format
>> ----------
>>  aaa
>>  "AAA"
>>  "あいう"
>> (3 rows)
>>
>> Why does the text value in the third row need to be double quoted?
>> (Note that it consists of multibyte characters.) The same can be said of
>> quote_ident().
>>
>> Elsewhere we accept identifiers made of multibyte characters without
>> double quotes (non-delimited identifiers).
>>
>> test=# create table t2(あいう text);
>> CREATE TABLE
>> test=# insert into t2 values('aaa');
>> INSERT 0 1
>> test=# select あいう from t2;
>>  あいう
>> --------
>>  aaa
>> (1 row)
>
> format uses the same routine as quote_ident, so quote_ident should be fixed
> first.

Yes, I had that in my mind too.

Attached is the proposed patch to fix the bug.
Regression tests passed.

Here is an example after the patch. Note that the third row is not
quoted any more.

test=#  select format('%I', あいう) from t2;
 format
--------
 aaa
 "AAA"
 あああ
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3783e97..b93fc27 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
         * would like to use <ctype.h> macros here, but they might yield unwanted
         * locale-specific results...
         */
-       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
+       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' || IS_HIGHBIT_SET(ident[0]));

        for (ptr = ident; *ptr; ptr++)
        {
@@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)

                if ((ch >= 'a' && ch <= 'z') ||
                        (ch >= '0' && ch <= '9') ||
-                       (ch == '_'))
+                       (ch == '_') ||
+                       (IS_HIGHBIT_SET(ch)))
                {
                        /* okay */
                }
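
The rule the patch changes can be sketched as a standalone predicate. This is only an illustration, not the code in ruleutils.c: `ident_is_safe_unquoted` and `SKETCH_HIGHBIT_SET` are hypothetical names introduced here (the latter is a local stand-in for PostgreSQL's `IS_HIGHBIT_SET`), and the reserved-keyword check that the real `quote_identifier()` also performs is left out.

```c
#include <stdbool.h>

/* Local stand-in for PostgreSQL's IS_HIGHBIT_SET macro (assumption: 0x80 mask). */
#define SKETCH_HIGHBIT_SET(ch) (((unsigned char) (ch)) & 0x80)

/*
 * Hypothetical helper, not a PostgreSQL function: returns true when the
 * identifier could be emitted without double quotes under the patched rule.
 * The keyword lookup done by the real quote_identifier() is omitted.
 */
static bool
ident_is_safe_unquoted(const char *ident)
{
    const char *ptr;

    /* First byte: lowercase ASCII letter, underscore, or a high-bit byte. */
    if (!((ident[0] >= 'a' && ident[0] <= 'z') ||
          ident[0] == '_' ||
          SKETCH_HIGHBIT_SET(ident[0])))
        return false;

    /* Remaining bytes: lowercase letters, digits, underscore, or high-bit bytes. */
    for (ptr = ident; *ptr; ptr++)
    {
        char ch = *ptr;

        if (!((ch >= 'a' && ch <= 'z') ||
              (ch >= '0' && ch <= '9') ||
              ch == '_' ||
              SKETCH_HIGHBIT_SET(ch)))
            return false;
    }

    return true;
}
```

Under this sketch, "aaa" and a UTF-8 encoded "あいう" pass unquoted, while "AAA" still needs quoting because unquoted identifiers are folded to lower case.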


This patch is simple. I remember being surprised a few months ago to learn that we allow any multibyte character.

+1

Pavel

