Re: BUG #15273: Lexer bug with UESCAPE - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15273: Lexer bug with UESCAPE
Msg-id 23850.1531252251@sss.pgh.pa.us
In response to BUG #15273: Lexer bug with UESCAPE  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15273: Lexer bug with UESCAPE  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> SELECT U&'a' /*c1*/ UESCAPE /*c2*/ 'x';
> ERROR:  syntax error at or near "'x'"
> LINE 1: SELECT U&'a' /*c1*/ UESCAPE /*c2*/ 'x';

> I think the former is a bug, as, per ISO SQL, a comment is equivalent to
> whitespace (with newline), and therefore, should be ignored here.

I'd classify this as "won't fix".  It'd require pretty significant bloat
in the lexer rules to make it happen, and it doesn't really seem worth it.

Also, I'm going to push back on the claim that allowing comments there
is required by the SQL spec.  The relevant rules in SQL:2011 are

<Unicode character string literal> ::=
  [ <introducer> <character set specification> ]
      U <ampersand> <quote> [ <Unicode representation>... ] <quote>
      [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ]
      <Unicode escape specifier>

<Unicode escape specifier> ::=
  [ UESCAPE <quote> <Unicode escape character> <quote> ]

I do not see any principled way of arguing that these rules require
comments to be allowed adjacent to UESCAPE without also claiming
that they must be allowed between, say, the initial 'U' and the
ampersand.  The only place these rules allow a <separator> is
between segments of a multiline literal.  It looks to me like an
extension that we even allow whitespace around UESCAPE.
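
To make the literal reading of those two productions concrete, here is a
minimal Python sketch.  It is not PostgreSQL's lexer and makes no claim
about how the flex rules are written; the names SEPARATOR, SEGMENT, STRICT
and matches_strict are invented for illustration, and <separator> is
simplified to whitespace plus non-nested /* ... */ comments.  The optional
<introducer> prefix is left out.

```python
import re

# Simplified <separator>: whitespace and/or non-nested /* ... */ comments.
# The negative lookahead keeps one comment from swallowing a later "*/".
SEPARATOR = r"(?:\s|/\*(?:(?!\*/).)*\*/)+"

# One quoted segment; a doubled '' inside it stands for a literal quote.
SEGMENT = r"'(?:[^']|'')*'"

# Literal reading of the quoted productions: a <separator> may appear only
# between segments, and nothing at all is permitted around UESCAPE.
STRICT = re.compile(
    rf"U&{SEGMENT}(?:{SEPARATOR}{SEGMENT})*(?:UESCAPE'[^']')?",
    re.DOTALL,
)


def matches_strict(text: str) -> bool:
    """Return True if text fits the productions as literally written."""
    return STRICT.fullmatch(text) is not None
```

Under this reading, matches_strict("U&'a'UESCAPE'x'") is true, while both
"U&'a' UESCAPE 'x'" and the form from the bug report with comments on either
side of UESCAPE are rejected.  A comment between two segments, as in
"U&'a' /*c*/ 'b'", is accepted, since that is the one place the grammar
names a <separator>.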

            regards, tom lane

