Thread: Lexical Structure - String Constants
Hi,
I'm building a SQL lexer/parser in Java from scratch as a hobby project, aiming for compliance with PostgreSQL 9.3. While reading chapter 4, section 4.1 (http://www.postgresql.org/docs/9.3/interactive/sql-syntax-lexical.html), I noticed a few things I thought I should mention:
In section 4.1.2.1, the following text introduces us to SQL's bizarre multiline/multisegment split style: "Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant."
The text does not say whether comments are allowed between segments, so I ran a few tests in psql (PostgreSQL 9.3.4):
version
------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-16ubuntu6) 4.8.2, 64-bit
(1 row)
postgres=# SELECT 'a'
'b';
?column?
----------
ab
(1 row)
postgres=# SELECT 'a' --comment
'b';
?column?
----------
ab
(1 row)
So far everything worked, but I got a different result with C-style block comments:
postgres=# SELECT 'a' /*comment*/
'b';
ERROR: syntax error at or near "'b'"
LINE 2: 'b';
So line-style comments (--) are accepted between segments, but C-style block comments (/* */) are not. Do you think this difference in behavior should be mentioned in the docs?
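The behavior observed above can be sketched in Java roughly as follows. This is only an illustration of what the tests on 9.3.4 showed, not PostgreSQL's actual scanner: the class and method names (StringContinuation, readString, skipSeparator) are made up for the example, and escape-string (E'...') and dollar-quoted forms are ignored.

```java
/** Hypothetical helper: reads a '...' constant plus any continuation segments. */
final class StringContinuation {

    /** Value of the constant and the index just past its last segment. */
    static final class Literal {
        final String value;
        final int end;
        Literal(String value, int end) { this.value = value; this.end = end; }
    }

    /** Expects src.charAt(pos) to be the opening quote. */
    static Literal readString(String src, int pos) {
        StringBuilder value = new StringBuilder();
        int i = readSegment(src, pos, value);
        while (true) {
            int next = skipSeparator(src, i);
            if (next < 0) break;          // no valid continuation follows
            i = readSegment(src, next, value);
        }
        return new Literal(value.toString(), i);
    }

    /** Reads one segment ('' is an escaped quote); returns the index past the closing quote. */
    private static int readSegment(String src, int pos, StringBuilder out) {
        int i = pos + 1;                  // skip the opening quote
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\'') {
                if (i + 1 < src.length() && src.charAt(i + 1) == '\'') {
                    out.append('\'');
                    i += 2;
                    continue;
                }
                return i + 1;
            }
            out.append(c);
            i++;
        }
        throw new IllegalArgumentException("unterminated string constant");
    }

    /**
     * Skips whitespace and "--" comments after a segment. Returns the index of
     * the next opening quote if at least one newline was crossed, else -1.
     * A block comment is deliberately not skipped, matching the error seen above.
     */
    private static int skipSeparator(String src, int pos) {
        int i = pos;
        boolean sawNewline = false;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n' || c == '\r') {
                sawNewline = true;
                i++;
            } else if (c == ' ' || c == '\t' || c == '\f') {
                i++;
            } else if (src.startsWith("--", i)) {
                // a line comment runs up to, but not including, the newline
                while (i < src.length() && src.charAt(i) != '\n' && src.charAt(i) != '\r') i++;
            } else {
                break;
            }
        }
        return (sawNewline && i < src.length() && src.charAt(i) == '\'') ? i : -1;
    }
}
```

With this sketch, the first two test inputs yield the single value ab, while the block-comment input yields a and stops right after the first segment, so 'b' would be lexed as a separate constant and a parser would then report the syntax error.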
I also noticed the following statement in section 4.1.2.6: "At least one digit must follow the exponent marker (e), if one is present."
As I understand it, this means the following statement should not be valid, because the exponent marker is not followed by at least one digit. Yet the expression is evaluated successfully:
postgres=# SELECT 10e;
e
----
10
(1 row)
That said, I live in Brazil and English is not my first language so I may be mistaken, but I thought I should bring this to this list.
Regards,
Sérgio Saquetim
Sérgio Saquetim <sergiosaquetim@gmail.com> writes:
> So line style comments (--) are accepted between segments but not C style
> block comments (/* */). Do you think this difference in behavior should be
> mentioned in the docs?
Hm, interesting. It looks to me like modern versions of the SQL spec require either -- or /* ... */ style comments to be allowed between segments of a quoted literal. This is pretty bad taste in language design, if you ask me, but that's what it seems to say. I think that our current lexer rules date from before the SQL standard even had /* ... */ style comments, which is why the lexer isn't taking it.
> I've also noticed that in section 4.1.2.6, the following statement: "At
> least one digit must follow the exponent marker (e), if one is present."
> As I've understood the statement, I think it says that the following
> instruction should not be valid because the exponent marker is not followed
> by at least one digit, but the expression is successfully evaluated:
> postgres=# SELECT 10e;
> e
> ----
> 10
> (1 row)
"10e" is not a valid number, just like the manual says. But "10" is a valid number, and "e" is a valid column alias, so this is equivalent to "SELECT 10 AS e". There's no requirement for white space between adjacent tokens, if the tokens couldn't validly be run together into one token.
regards, tom lane
> "10e" is not a valid number, just like the manual says. But "10" is a
> valid number, and "e" is a valid column alias, so this is equivalent
> to "SELECT 10 AS e". There's no requirement for white space between
> adjacent tokens, if the tokens couldn't validly be run together into
> one token.
Thanks Tom,
I hadn't noticed that. I'll refactor my lexer to deal with it.
Regards,
Sérgio Saquetim
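For readers following along, the tokenization rule discussed in this thread might be sketched in Java like this: the exponent is consumed only when at least one digit follows the marker, and otherwise the scan backs up so that "e" is left for the next token. The names NumberScan, scanNumber and tokenize are hypothetical, the token set is deliberately tiny, and special cases such as "1..2" are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of numeric-constant scanning with exponent backtracking. */
final class NumberScan {

    /** Returns the end (exclusive) of the numeric constant at pos, or pos if none starts there. */
    static int scanNumber(String src, int pos) {
        int i = skipDigits(src, pos);
        boolean hasInt = i > pos;
        boolean hasFrac = false;
        if (i < src.length() && src.charAt(i) == '.') {
            int fracEnd = skipDigits(src, i + 1);
            hasFrac = fracEnd > i + 1;
            if (hasInt || hasFrac) i = fracEnd;   // "10." and ".5" are fine, "." alone is not
        }
        if (!hasInt && !hasFrac) return pos;
        if (i < src.length() && (src.charAt(i) == 'e' || src.charAt(i) == 'E')) {
            int j = i + 1;
            if (j < src.length() && (src.charAt(j) == '+' || src.charAt(j) == '-')) j++;
            int expEnd = skipDigits(src, j);
            if (expEnd > j) i = expEnd;           // accept the exponent only if digits follow
            // otherwise i stays before the 'e', which becomes part of the next token
        }
        return i;
    }

    private static int skipDigits(String src, int pos) {
        int i = pos;
        while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
        return i;
    }

    /** Splits input into numbers, identifiers and single-character tokens; whitespace is skipped. */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int end = scanNumber(src, i);
            if (end == i && (Character.isLetter(c) || c == '_')) {
                end = i + 1;
                while (end < src.length()
                        && (Character.isLetterOrDigit(src.charAt(end)) || src.charAt(end) == '_')) {
                    end++;
                }
            }
            if (end == i) end = i + 1;            // anything else is a one-character token
            tokens.add(src.substring(i, end));
            i = end;
        }
        return tokens;
    }
}
```

Under these assumptions, tokenize("SELECT 10e;") produces the tokens SELECT, 10, e and ;, which is the "SELECT 10 AS e" reading explained above, while tokenize("1e5") keeps 1e5 as one number.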