Re: Why format() adds double quote? - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Why format() adds double quote? |
Date | |
Msg-id | 20160128.090029.781286852790195741.t-ishii@sraoss.co.jp |
In response to | Re: Why format() adds double quote? ("Daniel Verite" <daniel@manitou-mail.org>) |
List | pgsql-hackers |
> I've used white space in the example, but I'm concerned about
> punctuation too.
>
> unicode.org has this helpful paper:
> http://www.unicode.org/L2/L2000/00260-sql.pdf
> which studies Unicode in SQL-99 identifiers.
>
> The relevant BNF they extracted from the standard looks like this:
>
> <identifier body> ::=
>   <identifier start>
>   [ { <underscore> | <identifier part> }... ]
>
> <identifier start> ::=
>     <initial alphabetic character>
>   | <ideographic character>
>
> <identifier part> ::=
>     <alphabetic character>
>   | <ideographic character>
>   | <decimal digit character>
>   | <identifier combining character>
>   | <underscore>
>   | <alternate underscore>
>   | <extender character>
>   | <identifier ignorable character>
>   | <connector character>
>
> <delimited identifier> ::=
>   <double quote> <delimited identifier body> <double quote>
>
> <delimited identifier body> ::= <delimited identifier part>...
>
> <delimited identifier part> ::=
>     <nondoublequote character>
>   | <doublequote symbol>
>
> ========
>
> The current version of quote_ident() plays it safe by implementing
> the rule that, as soon as it encounters a character outside
> of US-ASCII, it surrounds the identifier with double quotes, no matter
> to which category or block this character belongs.
> So its output is guaranteed to be compatible with the above grammar.
>
> The change in the patch is that multibyte characters just don't imply
> quoting.
>
> But according to the points 1 and 2 of the paper, the first character
> must have the Unicode alphabetic property, and it must not
> have the Unicode combining property.

Good point.

> I'm mostly ignorant in Unicode so I'm not sure of the precise
> implications of having such Unicode properties, but still my
> understanding is that the new quote_ident() ignores these rules,
> so in this sense it could produce outputs that wouldn't be
> compatible with SQL-99.
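To make the contrast in the quoted discussion concrete, here is a rough Python sketch of the two rules. Assumptions: the function names are hypothetical; the ASCII-only check mirrors the conservative behaviour described above (any character outside lowercase a-z, digits and underscore forces quoting) but omits PostgreSQL's reserved-keyword check; and the SQL-99 check approximates the paper's character classes with Unicode general categories from Python's `unicodedata` module, which is not necessarily the exact mapping the standard or the paper uses.

```python
import unicodedata

# Hypothetical helpers for illustration only; neither is PostgreSQL's
# quote_ident(), and reserved-keyword handling is deliberately omitted.


def needs_quotes_ascii_only(ident: str) -> bool:
    """Conservative rule: any character outside [a-z0-9_] forces quoting,
    so every non-ASCII character (and any uppercase letter) is quoted."""
    if not ident or not (ident[0] == "_" or "a" <= ident[0] <= "z"):
        return True
    return any(not (c == "_" or "a" <= c <= "z" or "0" <= c <= "9")
               for c in ident)


def is_sql99_regular_identifier(ident: str) -> bool:
    """Rough approximation of the SQL-99 <identifier body> rules using
    Unicode general categories (an assumption, not the exact standard):
    start = letter (L*, which covers ideographs as Lo);
    part  = letter, combining mark (Mn/Mc), decimal digit (Nd)
            or connector punctuation (Pc, which includes '_')."""
    if not ident or not unicodedata.category(ident[0]).startswith("L"):
        return False
    allowed_parts = {"Mn", "Mc", "Nd", "Pc"}
    return all(unicodedata.category(c).startswith("L")
               or unicodedata.category(c) in allowed_parts
               for c in ident[1:])


def quote(ident: str) -> str:
    """Wrap in double quotes, doubling embedded quotes (<doublequote symbol>)."""
    return '"' + ident.replace('"', '""') + '"'


if __name__ == "__main__":
    # café, 日本, leading combining acute, ideographic space, U+2010 HYPHEN
    samples = ["abc", "caf\u00e9", "\u65e5\u672c", "\u0301abc",
               "a\u3000b", "a\u2010b"]
    for name in samples:
        print(repr(name), needs_quotes_ascii_only(name),
              is_sql99_regular_identifier(name))
```

Under this approximation, "café" and "日本" pass the SQL-99-style check while the ASCII-only rule still quotes them; a leading combining accent (U+0301), an ideographic space (U+3000) or U+2010 HYPHEN fails it. That is the gap between "multibyte characters never imply quoting" and the grammar quoted above.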
>
> Also, here's what we say in the manual about non quoted identifiers:
> http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html
>
> "SQL identifiers and key words must begin with a letter (a-z, but also
> letters with diacritical marks and non-Latin letters) or an underscore
> (_). Subsequent characters in an identifier or key word can be
> letters, underscores, digits (0-9), or dollar signs ($)"
>
> So it explicitly allows letters in general (and also seems less
> strict than SQL-99 about underscore), but it makes no promise about
> Unicode punctuation or spaces, for instance, even though in practice
> the parser seems to accept them just fine.

You could extend your point arbitrarily, beyond Unicode punctuation and
spaces. For example, Unicode has a number of characters that look like
"-". Do we want to treat them like the ASCII "-"?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp