Re: Why format() adds double quote? - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Why format() adds double quote?
Date
Msg-id 20160120.181715.761036523837813422.t-ishii@sraoss.co.jp
In response to Re: Why format() adds double quote?  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Why format() adds double quote?  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
> Hi
> 
> 2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
> 
>> > 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
>> >
>> >> test=# select format('%I', t) from t1;
>> >>   format
>> >> ----------
>> >>  aaa
>> >>  "AAA"
>> >>  "あいう"
>> >> (3 rows)
>> >>
>> >> Why does the text value in the third row need to be double quoted?
>> >> (Note that it consists of multibyte characters.) The same applies to
>> >> quote_ident().
>> >>
>> >> Elsewhere we accept identifiers made of multibyte characters without
>> >> double quotes (non-delimited identifiers).
>> >>
>> >> test=# create table t2(あいう text);
>> >> CREATE TABLE
>> >> test=# insert into t2 values('aaa');
>> >> INSERT 0 1
>> >> test=# select あいう from t2;
>> >>  あいう
>> >> --------
>> >>  aaa
>> >> (1 row)
>> >
>> > format uses the same routine as quote_ident, so quote_ident should be
>> > fixed first.
>>
>> Yes, I had that in mind too.
>>
>> Attached is the proposed patch to fix the bug.
>> Regression tests passed.
>>
>> Here is an example after the patch. Note that the third row is not
>> quoted any more.
>>
>> test=#  select format('%I', あいう) from t2;
>>  format
>> --------
>>  aaa
>>  "AAA"
>>  あああ
>> (3 rows)
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
>> index 3783e97..b93fc27 100644
>> --- a/src/backend/utils/adt/ruleutils.c
>> +++ b/src/backend/utils/adt/ruleutils.c
>> @@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
>>          * would like to use <ctype.h> macros here, but they might yield unwanted
>>          * locale-specific results...
>>          */
>> -       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
>> +       safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' || IS_HIGHBIT_SET(ident[0]));
>>
>>         for (ptr = ident; *ptr; ptr++)
>>         {
>> @@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
>>
>>                 if ((ch >= 'a' && ch <= 'z') ||
>>                         (ch >= '0' && ch <= '9') ||
>> -                       (ch == '_'))
>> +                       (ch == '_') ||
>> +                       (IS_HIGHBIT_SET(ch)))
>>                 {
>>                         /* okay */
>>                 }
>>
>>
> This patch is simple. I remember being surprised a few months ago that we
> allow any multibyte char.
> 
> +1

If we go this way, the question is whether we should back-patch this,
since the patch changes existing behavior. Comments? I think we
should not.
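
To make the behavior change concrete, here is a minimal standalone sketch
of the character test in quote_identifier(), before and after the patch.
It is not the ruleutils.c code: the function name ident_is_safe and the
allow_highbit flag are hypothetical, IS_HIGHBIT_SET is redefined locally
on the assumption that it tests bit 0x80, and the keyword check that the
real function also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the PostgreSQL macro (assumption: tests bit 0x80). */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/*
 * Hypothetical helper, for illustration only.
 * Returns true if ident passes the character test and so would not be
 * double quoted on that basis alone.
 *   allow_highbit = false models the test before the patch,
 *   allow_highbit = true  models the test after the patch.
 */
static bool
ident_is_safe(const char *ident, bool allow_highbit)
{
    const char *ptr;
    bool        safe;

    assert(ident != NULL);

    /* First byte: lowercase ASCII letter, underscore, or (patched) high-bit byte. */
    safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' ||
            (allow_highbit && IS_HIGHBIT_SET(ident[0])));

    for (ptr = ident; *ptr; ptr++)
    {
        char ch = *ptr;

        if ((ch >= 'a' && ch <= 'z') ||
            (ch >= '0' && ch <= '9') ||
            (ch == '_') ||
            (allow_highbit && IS_HIGHBIT_SET(ch)))
        {
            /* okay */
        }
        else
        {
            /* any other byte, e.g. an uppercase letter, forces quoting */
            safe = false;
        }
    }

    return safe;
}
```

Under these assumptions, ident_is_safe("aaa", ...) is true and
ident_is_safe("AAA", ...) is false with either flag value, while a UTF-8
encoded string such as "\xe3\x81\x82\xe3\x81\x84\xe3\x81\x86" (あいう) is
false with allow_highbit = false and true with allow_highbit = true. That
difference is the visible output change that back-patching would
introduce.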

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


