Re: Why format() adds double quote? - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: Why format() adds double quote?
Date	January 28, 2016 03:00:46
Msg-id	20160128.090029.781286852790195741.t-ishii@sraoss.co.jp
In response to	Re: Why format() adds double quote? ("Daniel Verite" <daniel@manitou-mail.org>)
List	pgsql-hackers

> I've used white space in the example, but I'm concerned about
> punctuation too.
> 
> unicode.org has this helpful paper:
> http://www.unicode.org/L2/L2000/00260-sql.pdf
> which studies Unicode in SQL-99 identifiers.
> 
> The relevant BNF they extracted from the standard looks like this:
> 
> <identifier body> ::=
>    <identifier start>
>    [ { <underscore> | <identifier part> }... ]
> 
> <identifier start> ::=
>    <initial alphabetic character>
>    | <ideographic character>
> 
> <identifier part> ::=
>     <alphabetic character>
>     | <ideographic character>
>     | <decimal digit character>
>     | <identifier combining character>
>     | <underscore>
>     | <alternate underscore>
>     | <extender character>
>     | <identifier ignorable character>
>     | <connector character>
> 
> <delimited identifier> ::=
>    <double quote> <delimited identifier body> <double quote>
> 
> <delimited identifier body> ::= <delimited identifier part>...
> 
> <delimited identifier part> ::=
>    <nondoublequote character>
>    | <doublequote symbol>
> 
> ========
> 
> The current version of quote_ident() plays it safe by implementing
> the rule that, as soon as it encounters a character outside
> of US-ASCII, it surrounds the identifier with double quotes, no matter
> to which category or block this character belongs.
> So its output is guaranteed to be compatible with the above grammar.
> 
> The change in the patch is that multibyte characters just don't imply
> quoting.
> 
> But according to points 1 and 2 of the paper, the first character
> must have the Unicode alphabetic property, and it must not
> have the Unicode combining property.

Good point.
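
To make the two constraints concrete, here is a minimal sketch in Python
(not PostgreSQL code). The function name is hypothetical, and str.isalpha()
is only an approximation of the Unicode Alphabetic property: it covers the
Letter categories (Lu, Ll, Lt, Lm, Lo) and nothing else.

```python
import unicodedata


def is_valid_identifier_start(ch: str) -> bool:
    """Approximate check of points 1 and 2 for a first character.

    Assumption: str.isalpha() stands in for the Unicode Alphabetic
    property. It only covers general categories Lu, Ll, Lt, Lm and Lo.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    alphabetic = ch.isalpha()
    # A character counts as combining here if it has a non-zero canonical
    # combining class or belongs to one of the Mark categories (Mn, Mc, Me).
    combining = (
        unicodedata.combining(ch) != 0
        or unicodedata.category(ch).startswith("M")
    )
    return alphabetic and not combining
```

Under these assumptions a Latin or ideographic letter passes, while a
digit, a space, or a bare combining accent such as U+0301 does not.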

> I'm mostly ignorant in Unicode so I'm not sure of the precise
> implications of having such Unicode properties, but still my
> understanding is that the new quote_ident() ignores these rules,
> so in this sense it could produce outputs that wouldn't be
> compatible with SQL-99.
> 
> Also, here's what we say in the manual about non quoted identifiers:
> http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html
> 
> "SQL identifiers and key words must begin with a letter (a-z, but also
> letters with diacritical marks and non-Latin letters) or an underscore
> (_). Subsequent characters in an identifier or key word can be
> letters, underscores, digits (0-9), or dollar signs ($)"
> 
> So it explicitly allows letters in general  (and also seems less
> strict than SQL-99 about underscore), but it makes no promise about
> Unicode punctuation or spaces, for instance, even though in practice
> the parser seems to accept them just fine.
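
The rule quoted from the manual can be sketched as follows. This is a
Python illustration only, not what quote_ident() does: the function name is
hypothetical, str.isalpha() is assumed to be a fair stand-in for "letters",
and key words and case folding are ignored entirely.

```python
ASCII_DIGITS = "0123456789"


def is_plain_identifier(ident: str) -> bool:
    """Check an identifier against the manual's wording for unquoted names.

    Assumptions: "letter" means str.isalpha(); reserved key words and
    upper/lower case folding are out of scope for this sketch.
    """
    if not ident:
        return False
    first = ident[0]
    # Must begin with a letter or an underscore.
    if not (first.isalpha() or first == "_"):
        return False
    # Subsequent characters: letters, underscores, digits 0-9, dollar signs.
    return all(
        ch.isalpha() or ch == "_" or ch == "$" or ch in ASCII_DIGITS
        for ch in ident[1:]
    )
```

Read this way, a name containing a space, ASCII or ideographic (U+3000),
falls outside the documented rule and would need quoting.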

You could extend your point arbitrarily, not only to Unicode
punctuation or spaces. There are a number of characters in Unicode
that look like "-", for example. Do we want to treat them like ASCII "-"?
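
To illustrate how many "-" look-alikes there are, here is a small Python
sketch. The helper names are hypothetical. It treats any character in the
Unicode dash punctuation category (Pd) other than ASCII "-" as a
look-alike, plus U+2212 MINUS SIGN, which is a math symbol (Sm) and would
be missed by the category test alone.

```python
import unicodedata

MINUS_SIGN = "\u2212"  # category Sm, not Pd, yet visually close to "-"


def is_dash_lookalike(ch: str) -> bool:
    """True for non-ASCII characters that resemble ASCII '-'."""
    if ch == "-":
        return False
    return unicodedata.category(ch) == "Pd" or ch == MINUS_SIGN


def describe_lookalikes(text: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for each look-alike in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if is_dash_lookalike(ch)
    ]
```

For a string containing U+2010, U+2013, U+2212 and U+FF0D, this reports
HYPHEN, EN DASH, MINUS SIGN and FULLWIDTH HYPHEN-MINUS, none of which
compare equal to ASCII "-".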

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
