Thread: Token separation
Hi,

I just tried to input a hexadecimal number in PostgreSQL (8.4) and was rather surprised by the result:

| tim=# SELECT 0x13;
|  x13
| -----
|    0
| (1 row)

| tim=# SELECT 0abc;
|  abc
| -----
|    0
| (1 row)

| tim=#

The documentation says:

| A token can be a key word, an identifier, a quoted identifier,
| a literal (or constant), or a special character symbol.
| Tokens are normally separated by whitespace (space, tab,
| newline), but need not be if there is no ambiguity (which is
| generally only the case if a special character is adjacent
| to some other token type).

Is this behaviour really conforming to the standard? Even stranger is what MySQL (5.1.59) makes of it:

| mysql> SELECT 0x40;
| +------+
| | 0x40 |
| +------+
| | @    |
| +------+
| 1 row in set (0.00 sec)

| mysql> SELECT 0abc;
| ERROR 1054 (42S22): Unknown column '0abc' in 'field list'
| mysql>

Tim
Tim Landscheidt <tim@tim-landscheidt.de> writes:
> [ "0x13" is lexed as "0" then "x13" ]
> Is this behaviour really conforming to the standard?

Well, it's pretty much the universal behavior of flex-based lexers, anyway. A token ends when the next character can no longer sensibly be added to it.

Possibly the documentation should be tweaked to mention the number-followed-by-identifier case.

regards, tom lane
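[Editor's note] The rule described above ("a token ends when the next character can no longer sensibly be added to it") can be sketched with a tiny tokenizer. This is only an illustration in Python, not PostgreSQL's actual flex scanner: the token classes, the pattern names and the `tokenize` helper are all assumptions made up for this example. Python's regex alternation picks the first matching alternative rather than the longest one, so the rules are ordered to give the same result as longest-match for these inputs.

```python
import re

# Hypothetical, simplified token rules (not taken from the PostgreSQL sources).
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)  # digits, optional fraction, optional exponent
  | (?P<ident>[A-Za-z_][A-Za-z_0-9]*)           # identifier or key word
  | (?P<space>\s+)                              # whitespace, skipped below
  | (?P<symbol>.)                               # any other single character
""", re.VERBOSE)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Every character is covered by either the space or the symbol rule,
        # so the match cannot fail.
        m = TOKEN_RE.match(text, pos)
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("SELECT 0x13;"))
# [('ident', 'SELECT'), ('number', '0'), ('ident', 'x13'), ('symbol', ';')]
```

Under these rules "0x13" splits into the number "0" and the identifier "x13", because "x" cannot extend a numeric literal. The column header "x13" in the psql output above suggests that the parser then treats the identifier as an output column alias, as if the query had been written SELECT 0 AS x13.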
Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> [ "0x13" is lexed as "0" then "x13" ]
>> Is this behaviour really conforming to the standard?

> Well, it's pretty much the universal behavior of flex-based lexers,
> anyway. A token ends when the next character can no longer sensibly
> be added to it.

I know, but - off the top of my head - in most other languages "0abc" will then give a syntax error.

> Possibly the documentation should be tweaked to mention the
> number-followed-by-identifier case.

Especially if you consider such cases:

| tim=# SELECT 1D1; SELECT 1E1; SELECT 1F1;
|  d1
| ----
|   1
| (1 row)

|  ?column?
| ----------
|        10
| (1 row)

|  f1
| ----
|   1
| (1 row)

| tim=#

I don't think it's common to hit this, but the documentation surely could use a caveat. I will write something up and submit it to -docs.

Thanks,
Tim
On 2012-01-16, Tim Landscheidt <tim@tim-landscheidt.de> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>> [ "0x13" is lexed as "0" then "x13" ]
>>> Is this behaviour really conforming to the standard?
>
>> Well, it's pretty much the universal behavior of flex-based lexers,
>> anyway. A token ends when the next character can no longer sensibly
>> be added to it.
>
> I know, but - off the top of my head - in most other languages
> "0abc" will then give a syntax error.

In most other languages "0 abc" would also be a syntax error.

"0and" doesn't give a syntax error in PHP, e.g.:

<? echo 0and 0;?>

-- 
⚂⚃ 100% natural