RE: [HACKERS] Postgres' lexer - Mailing list pgsql-hackers

From Ansley, Michael
Subject RE: [HACKERS] Postgres' lexer
Date
Msg-id 1BF7C7482189D211B03F00805F8527F748C02C@S-NATH-EXCH2
List pgsql-hackers
>> > To my mind, without spaces this construction *is* ambiguous, and
>> > frankly I'd have expected the second interpretation ('+-' is a single
>> > operator name).  Almost every computer language in the world uses
>> > "greedy" tokenization where the next token is the longest series of
>> > characters that can validly be a token.  I don't regard the above
>> > behavior as predictable, natural, nor obvious.  In fact, I'd say it's
>> > a bug that "3+-2" and "3+-x" are not lexed in the same way.
>> > 
>> 
>> Completely agree with that. This differentiating behavior looks like a
>> bug.
>> 
>> > However, aside from arguing about whether the current behavior is good
>> > or bad, these examples seem to indicate that it doesn't take an
>> > infinite amount of lookahead to reproduce the behavior.  It looks to
>> > me like we could preserve the current behavior by parsing a '-' as a
>> > separate token if it *immediately* precedes a digit, and otherwise
>> > allowing it to be folded into the preceding operator.  That could
>> > presumably be done without VLTC.
>> 
>> Ok. If we *have* to preserve the old weird behavior, here is the patch.
>> It is to be applied on top of all my other patches. Though if it were up
>> to me whether to restore the old behavior, I wouldn't do it, because it
>> is an inconsistency in the grammar, i.e. a bug.
>> 
If a construct is ambiguous, then the behaviour should be undefined (i.e.,
we can do what we like, within reason).  If the user wants something
predictable, then she should use brackets ;-)

If 3+-2 presents an ambiguity (which it does), then write 3+(-2) when you
mean addition of a negative number.  If you have an operator +- and mean
that, write (3)+-(2).  If you write 3+-2 without brackets and no +-
operator is defined, the expression is ambiguous, so its behaviour is
undefined and we can do pretty much whatever we feel like with it.  If a +-
operator is defined, the behaviour is no longer ambiguous: the longest
possible token is always matched, so +- will be identified as the operator.
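
To make the two lexing rules under discussion concrete, here is a rough
sketch in Python.  It is not the Postgres scanner: the tokenize function,
the OP_CHARS set and the split_minus_before_digit flag are all made up for
illustration, and the sketch knows nothing about which operators are
actually defined.  With the flag off it does plain longest-match on
operator characters; with the flag on it applies the rule quoted above,
where a '-' that immediately precedes a digit becomes its own token.

```python
# Assumed set of operator characters, for illustration only.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")


def tokenize(text, split_minus_before_digit=False):
    """Split text into number, identifier, operator and bracket tokens."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch in OP_CHARS:
            # Greedy rule: take the longest run of operator characters.
            j = i
            while j < n and text[j] in OP_CHARS:
                j += 1
            # Optional rule: give back a trailing '-' that sits directly
            # before a digit, so it is lexed as a separate token.
            if (split_minus_before_digit and j - i > 1
                    and text[j - 1] == "-" and j < n and text[j].isdigit()):
                j -= 1
            tokens.append(text[i:j])
            i = j
        else:
            raise ValueError("unexpected character: %r" % ch)
    return tokens


print(tokenize("3+-2"))                                 # pure longest match
print(tokenize("3+-2", split_minus_before_digit=True))  # '-' split off
print(tokenize("3+-x", split_minus_before_digit=True))  # no digit follows
print(tokenize("3+(-2)"))                               # brackets remove the ambiguity
```

With brackets, as in 3+(-2) or (3)+-(2), both settings produce the same
tokens, which is the point of asking the user to write them.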

Especially with the unary minus, my feeling is that it should be placed in
brackets if correct behaviour is desired.

MikeA


