>> > To my mind, without spaces this construction *is* ambiguous, and frankly
>> > I'd have expected the second interpretation ('+-' is a single operator
>> > name). Almost every computer language in the world uses "greedy"
>> > tokenization where the next token is the longest series of characters
>> > that can validly be a token. I don't regard the above behavior as
>> > predictable, natural, nor obvious. In fact, I'd say it's a bug that
>> > "3+-2" and "3+-x" are not lexed in the same way.
>> >
>>
>> Completely agree with that. This differentiating behavior looks like a
>> bug.
>>
>> > However, aside from arguing about whether the current behavior is good
>> > or bad, these examples seem to indicate that it doesn't take an
>> > infinite amount of lookahead to reproduce the behavior. It looks to me
>> > like we could preserve the current behavior by parsing a '-' as a
>> > separate token if it *immediately* precedes a digit, and otherwise
>> > allowing it to be folded into the preceding operator. That could
>> > presumably be done without VLTC.
>>
>> Ok. If we *have* to preserve old weird behavior, here is the patch.
>> It is to be applied over all my other patches. Though if I were to
>> decide whether to restore old behavior, I wouldn't do it. Because it
>> is inconsistency in grammar, i.e. a bug.
>>
If a construct is ambiguous, then its behaviour should be undefined (i.e.
we can do what we like, within reason). If the user wants something
predictable, then she should use brackets ;-)

If 3+-2 presents an ambiguity (which it does), then write 3+(-2) when you
mean a unary minus. If you have an operator +-, then write (3)+-(2).

If you write 3+-2 without brackets and no +- operator is defined, the
expression is ambiguous, so its behaviour is undefined and we can do pretty
much whatever we feel like with it. If a +- operator is defined, the
behaviour is no longer ambiguous: the longest possible operator name is
always matched, so +- will be recognised.

Especially with the unary minus, my feeling is that it should be placed in
brackets if correct behaviour is desired.
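
To make the two readings concrete, here is a rough sketch of a longest-match
tokenizer with the one-character lookahead rule quoted above as an option.
It is not the actual lexer: the function name tokenize, the flag
split_minus_before_digit and the operator character set OP_CHARS are all
made up for illustration.

```python
# Hypothetical sketch, not the real lexer. OP_CHARS is an assumed set of
# characters that may appear in an operator name.
OP_CHARS = set("+-*/<>=~!@#%^&|?")


def tokenize(text, split_minus_before_digit=True):
    """Split text into tokens using longest-match ("greedy") scanning.

    With split_minus_before_digit=True, a trailing '-' in a multi-character
    operator run is left as its own token when a digit follows immediately.
    With False, the whole operator run is always one token.
    """
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch in OP_CHARS:
            # Greedy step: take the longest run of operator characters.
            j = i
            while j < n and text[j] in OP_CHARS:
                j += 1
            # One character of lookahead: give back a final '-' if the
            # next character is a digit.
            if (split_minus_before_digit and j - i > 1
                    and text[j - 1] == "-" and j < n and text[j].isdigit()):
                j -= 1
            tokens.append(text[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("3+-2"))         # lookahead rule: '+' and '-' are separate
print(tokenize("3+-x"))         # no digit follows: '+-' is one operator
print(tokenize("3+-2", False))  # pure longest match: '+-' is one operator
print(tokenize("3+(-2)"))       # brackets remove the ambiguity
```

With the flag on, 3+-2 and 3+-x are lexed differently, which is the
inconsistency being complained about. With it off, both give +- as one
token, and the bracketed forms come out the same either way.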
MikeA