Re: Why format() adds double quote? - Mailing list pgsql-hackers
From | Daniel Verite |
---|---|
Subject | Re: Why format() adds double quote? |
Date | |
Msg-id | e3788f38-83e5-4036-9fd7-faa6ea32b774@mm |
In response to | Re: Why format() adds double quote? (Tatsuo Ishii <ishii@postgresql.org>) |
Responses |
Re: Why format() adds double quote?
|
List | pgsql-hackers |
Tatsuo Ishii wrote:

> 2) What does the SQL standard say? Do they say that non ASCII white
> spaces should be treated as ASCII white spaces?

I've used white space in the example, but I'm concerned about
punctuation too.

unicode.org has this helpful paper:
http://www.unicode.org/L2/L2000/00260-sql.pdf
which studies Unicode in SQL-99 identifiers.

The relevant BNF they extracted from the standard looks like this:

  <identifier body> ::=
      <identifier start> [ { <underscore> | <identifier part> }... ]

  <identifier start> ::=
      <initial alphabetic character>
    | <ideographic character>

  <identifier part> ::=
      <alphabetic character>
    | <ideographic character>
    | <decimal digit character>
    | <identifier combining character>
    | <underscore>
    | <alternate underscore>
    | <extender character>
    | <identifier ignorable character>
    | <connector character>

  <delimited identifier> ::=
      <double quote> <delimited identifier body> <double quote>

  <delimited identifier body> ::= <delimited identifier part>...

  <delimited identifier part> ::=
      <nondoublequote character>
    | <doublequote symbol>

========

The current version of quote_ident() plays it safe: as soon as it
encounters a character outside of US-ASCII, it surrounds the identifier
with double quotes, no matter which category or block this character
belongs to. So its output is guaranteed to be compatible with the above
grammar.

The change in the patch is that multibyte characters no longer imply
quoting. But according to points 1 and 2 of the paper, the first
character must have the Unicode alphabetic property, and it must not
have the Unicode combining property.

I'm mostly ignorant about Unicode, so I'm not sure of the precise
implications of having such Unicode properties, but my understanding is
that the new quote_ident() ignores these rules, so in this sense it
could produce output that isn't compatible with SQL-99.
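To make the difference between the two rules concrete, here is a rough Python sketch. It is only an illustration, not the quote_ident() code: both function names are made up for this example, the first function is a simplified model of the "quote anything outside a safe ASCII set" behavior (it ignores keywords entirely), and the second approximates "alphabetic and not combining" with Unicode general categories, which is an assumption and not the exact properties the paper refers to.

```python
import unicodedata


def needs_quotes_conservative(ident: str) -> bool:
    """Simplified model of the conservative rule (hypothetical helper).

    The identifier is left unquoted only if it starts with a lowercase
    ASCII letter or underscore and continues with lowercase ASCII
    letters, digits, or underscores. Anything else, including any
    non-ASCII character, forces quoting. Keyword checks are left out.
    """
    if not ident:
        return True
    first_ok = ident[0] == "_" or "a" <= ident[0] <= "z"
    rest_ok = all(
        c == "_" or "a" <= c <= "z" or "0" <= c <= "9" for c in ident[1:]
    )
    return not (first_ok and rest_ok)


def looks_like_identifier_start(ch: str) -> bool:
    """Rough stand-in for 'alphabetic and not combining' (hypothetical helper).

    Assumption: letters are general categories L* and Nl, and combining
    characters are categories M* or a non-zero canonical combining class.
    """
    category = unicodedata.category(ch)
    is_letter = category.startswith("L") or category == "Nl"
    is_combining = category.startswith("M") or unicodedata.combining(ch) != 0
    return is_letter and not is_combining


if __name__ == "__main__":
    # Plain ASCII, accented letters, a leading no-break space,
    # and a leading combining acute accent.
    for ident in ["foo", "été", "\u00a0foo", "\u0301foo"]:
        print(
            repr(ident),
            "quoted by conservative rule:", needs_quotes_conservative(ident),
            "first char plausible start:", looks_like_identifier_start(ident[0]),
        )
```

Under these assumptions, an identifier starting with a no-break space or a combining accent gets quoted by the conservative rule, but a rule that simply lets every multibyte character through unquoted would not catch that its first character is neither a letter nor non-combining.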
Also, here's what we say in the manual about non quoted identifiers:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html

"SQL identifiers and key words must begin with a letter (a-z, but also
letters with diacritical marks and non-Latin letters) or an underscore
(_). Subsequent characters in an identifier or key word can be
letters, underscores, digits (0-9), or dollar signs ($)"

So it explicitly allows letters in general (and also seems less strict
than SQL-99 about underscore), but it makes no promise about Unicode
punctuation or spaces, for instance, even though in practice the parser
seems to accept them just fine.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite