>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Also, I'm going to push back on the claim that allowing comments
Tom> there is required by the SQL spec. The relevant rules in SQL:2011
Tom> are
Tom> <Unicode character string literal> ::=
Tom> [ <introducer> <character set specification> ]
Tom> U <ampersand> <quote> [ <Unicode representation>... ] <quote>
Tom> [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ]
Tom> <Unicode escape specifier>
Tom> <Unicode escape specifier> ::=
Tom> [ UESCAPE <quote> <Unicode escape character> <quote> ]
Tom> I do not see any principled way of arguing that these rules
Tom> require comments to be allowed adjacent to UESCAPE without also
Tom> claiming that they must be allowed between, say, the initial 'U'
Tom> and the ampersand.
These are the rules that (as far as I can see) apply to that case:
  5.2 <token> and <separator>
    <separator> ::=
        { <comment> | <white space> }...
    7) Any <token> may be followed by a <separator>.
  5.3 <literal>
    11) In a <Unicode character string literal>, there shall be no
        <separator> between the "U" and the <ampersand> nor between the
        <ampersand> and the <quote>.
Tom> The only place these rules allow a <separator> is between segments
Tom> of a multiline literal. It looks to me like an extension that we
Tom> even allow whitespace around UESCAPE.
I think that that use of <separator> is only to indicate that a
<separator> there is _required_, rather than optional as it usually is
after tokens, and that the special rule about requiring newlines also
applies only to that specific use of <separator>.
If the whole <Unicode character string literal> were regarded as a
single token, so that rule 5.2.7 above did not apply around the
UESCAPE, then there would be no reason to write rule 5.3.11 forbidding
separators within the U&' part.
(In the case of X'...', there's rule 5.2.5, which as I see it would
prevent a space after the X, but that rule explicitly does not apply to
the U& cases.)
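To make that reading concrete, here is a sketch of how rules 5.2.7 and
5.3.11 would classify a few spellings. The literal and the '!' escape
character are arbitrary examples, and the comments describe the
interpretation argued above, not the tested behavior of any particular
PostgreSQL release.

```sql
-- Illustrative only: classification under the reading of 5.2.7 and
-- 5.3.11 argued above, not verified server behavior.

-- No separators beyond plain spaces around UESCAPE.
SELECT U&'d!0061t!+000061' UESCAPE '!';

-- A comment as the separator before and after UESCAPE: permitted on
-- this reading, because 5.2.7 lets any token be followed by a separator.
SELECT U&'d!0061t!+000061' /* c */ UESCAPE /* c */ '!';

-- A separator between "U" and "&", or between "&" and the quote, is
-- what 5.3.11 explicitly forbids, so these are left commented out.
-- SELECT U /* c */ &'d!0061t!+000061' UESCAPE '!';
-- SELECT U& /* c */ 'd!0061t!+000061' UESCAPE '!';
```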
As a related issue, we don't allow comments within the <separator> that
splits a multiline literal, even though the spec certainly allows those
(arguably, since the spec defines that comments are equivalent to
newlines, "select 'foo' /**/ 'bar';" should be legal too).
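The multiline-literal cases can be spelled out the same way. This is
only a sketch: the comments restate the claims in the paragraph above
about what the spec allows, and the strings are arbitrary.

```sql
-- Segments separated by white space that contains a newline: the
-- ordinary multiline literal, read as the single string 'foobar'.
SELECT 'foo'
       'bar';

-- A bracketed comment inside the separator, together with a newline:
-- allowed by the spec as read above, but one of the comment placements
-- the paragraph above says we do not accept.
SELECT 'foo' /* comment */
       'bar';

-- No newline at all, only a comment: arguably legal if comments are
-- treated as equivalent to newlines, as suggested above.
SELECT 'foo' /**/ 'bar';
```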
I've put up a summary of all these at
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard#Lexing_of_string_literals_and_comments
(under the assumption that the whole issue is filed under WONTFIX at
least for the time being)
--
Andrew (irc:RhodiumToad)