Home > mailing lists

Re: Why format() adds double quote? - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: Why format() adds double quote?
Date	January 20, 2016 09:20:47
Msg-id	CAFj8pRD7-Ko2E0cuj-PMYJN7Pmcqud1X-0RCBJh1h0oY30F2Fg@mail.gmail.com Whole thread
In response to	Re: Why format() adds double quote? (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: Why format() adds double quote?
List	pgsql-hackers

Tree view

2016-01-20 10:17 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:

> Hi
>
> 2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
>
>> > 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii@postgresql.org>:
>> >
>> >> test=# select format('%I', t) from t1;
>> >> format
>> >> ----------
>> >> aaa
>> >> "AAA"
>> >> "あいう"
>> >> (3 rows)
>> >>
>> >> Why is the text value of the third line needed to be double quoted?
>> >> (note that it is a multi byte character). Same thing can be said to
>> >> quote_ident().
>> >>
>> >> We treat identifiers made of the multi byte characters without double
>> >> quotation (non delimited identifier) in other places.
>> >>
>> >> test=# create table t2(あいう text);
>> >> CREATE TABLE
>> >> test=# insert into t2 values('aaa');
>> >> INSERT 0 1
>> >> test=# select あいう from t2;
>> >> あいう
>> >> --------
>> >> aaa
>> >> (1 row)
>> >
>> > format uses same routine as quote_ident. So quote_ident should be fixed
>> > first.
>>
>> Yes, I had that in my mind too.
>>
>> Attached is the proposed patch to fix the bug.
>> Regression tests passed.
>>
>> Here is an example after the patch. Note that the third row is not
>> quoted any more.
>>
>> test=# select format('%I', あいう) from t2;
>> format
>> --------
>> aaa
>> "AAA"
>> あああ
>> (3 rows)
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> diff --git a/src/backend/utils/adt/ruleutils.c
>> b/src/backend/utils/adt/ruleutils.c
>> index 3783e97..b93fc27 100644
>> --- a/src/backend/utils/adt/ruleutils.c
>> +++ b/src/backend/utils/adt/ruleutils.c
>> @@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
>> * would like to use <ctype.h> macros here, but they might yield
>> unwanted
>> * locale-specific results...
>> */
>> - safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_');
>> + safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' ||
>> IS_HIGHBIT_SET(ident[0]));
>>
>> for (ptr = ident; *ptr; ptr++)
>> {
>> @@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
>>
>> if ((ch >= 'a' && ch <= 'z') ||
>> (ch >= '0' && ch <= '9') ||
>> - (ch == '_'))
>> + (ch == '_') ||
>> + (IS_HIGHBIT_SET(ch)))
>> {
>> /* okay */
>> }
>>
>>
> This patch ls simply - I remember I was surprised, so we allow any
> multibyte char few months ago.
>
> +1

If we would go this way, question is if we should back patch this or
not since the patch apparently changes the existing
behaviors. Comments? I would think we should not.

I am sure, so we should not backport this change. This can breaks customer regress tests - and the current behave isn't 100% correct, but it is safe.

Pavel

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

pgsql-hackers by date:

From: Tatsuo Ishii
Date: 20 January 2016, 09:17:26
Subject: Re: Why format() adds double quote?

From: Konstantin Knizhnik
Date: 20 January 2016, 09:28:45
Subject: Batch update of indexes

Re: Why format() adds double quote? - Mailing list pgsql-hackers

Previous

Next