Re: Why format() adds double quote? - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: Why format() adds double quote? |
Date | |
Msg-id | CAFj8pRD7-Ko2E0cuj-PMYJN7Pmcqud1X-0RCBJh1h0oY30F2Fg@mail.gmail.com Whole thread Raw |
In response to | Re: Why format() adds double quote? (Tatsuo Ishii <ishii@postgresql.org>) |
Responses |
Re: Why format() adds double quote?
(Robert Haas <robertmhaas@gmail.com>)
|
List | pgsql-hackers |
2016-01-20 10:17 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
If we would go this way, question is if we should back patch this or> Hi
>
> 2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
>
>> > 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
>> >
>> >> test=# select format('%I', t) from t1;
>> >> format
>> >> ----------
>> >> aaa
>> >> "AAA"
>> >> "あいう"
>> >> (3 rows)
>> >>
>> >> Why is the text value of the third line needed to be double quoted?
>> >> (note that it is a multi byte character). Same thing can be said to
>> >> quote_ident().
>> >>
>> >> We treat identifiers made of the multi byte characters without double
>> >> quotation (non delimited identifier) in other places.
>> >>
>> >> test=# create table t2(あいう text);
>> >> CREATE TABLE
>> >> test=# insert into t2 values('aaa');
>> >> INSERT 0 1
>> >> test=# select あいう from t2;
>> >> あいう
>> >> --------
>> >> aaa
>> >> (1 row)
>> >
>> > format uses same routine as quote_ident. So quote_ident should be fixed
>> > first.
>>
>> Yes, I had that in my mind too.
>>
>> Attached is the proposed patch to fix the bug.
>> Regression tests passed.
>>
>> Here is an example after the patch. Note that the third row is not
>> quoted any more.
>>
>> test=# select format('%I', あいう) from t2;
>> format
>> --------
>> aaa
>> "AAA"
>> あああ
>> (3 rows)
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> diff --git a/src/backend/utils/adt/ruleutils.c
>> b/src/backend/utils/adt/ruleutils.c
>> index 3783e97..b93fc27 100644
>> --- a/src/backend/utils/adt/ruleutils.c
>> +++ b/src/backend/utils/adt/ruleutils.c
>> @@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
>> * would like to use <ctype.h> macros here, but they might yield
>> unwanted
>> * locale-specific results...
>> */
>> - safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
>> + safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' ||
>> IS_HIGHBIT_SET(ident[0]));
>>
>> for (ptr = ident; *ptr; ptr++)
>> {
>> @@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
>>
>> if ((ch >= 'a' && ch <= 'z') ||
>> (ch >= '0' && ch <= '9') ||
>> - (ch == '_'))
>> + (ch == '_') ||
>> + (IS_HIGHBIT_SET(ch)))
>> {
>> /* okay */
>> }
>>
>>
> This patch ls simply - I remember I was surprised, so we allow any
> multibyte char few months ago.
>
> +1
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.
I am sure, so we should not backport this change. This can breaks customer regress tests - and the current behave isn't 100% correct, but it is safe.
Pavel
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: