Thread: RE: [HACKERS] Postgres' lexer

RE: [HACKERS] Postgres' lexer

From
"Ansley, Michael"
Date:
Leon wrote:
>> Ok. Especially if there are more unary operators (I always wondered
>> what unary % in gram.y stands for :)  it is reasonable not to make
>> a special case of uminus and slightly change the old behavior. That
>> is even more convincing that constructs like 3+-2 and 3+-b were
>> parsed in different ways, and, what is worse, a>-2 and a>-b were also
>> parsed differently. So let us ask the (hopefully) last question:
>> Thomas (Lockhart), do you agree on always parsing constructs like
>> '+-' or '>-' as is, and not as '+' '-' or '>' '-'  ?
This construct doesn't always make sense.  It should only be recognised as a
'>-' if that operator exists, otherwise it should either generate an
error (which is reasonable because of the ambiguity that it creates (not for
this operator, but for the general case)), or try to complete (if that's
possible).  I have a bit of a problem with reading this: a > -2 correctly,
while not reading this: a>-2 correctly, because that implies that you are
using the space as a precedence operator.  This should be done by braces.
This: a > (-2) is totally unambiguous, spaces or no spaces.

Perhaps there is a general case for where unary operators are allowed to
appear, and we can use this, e.g.: they can only appear at the beginning of
an expression, or immediately after another operator (ignoring spaces).
This means that >- will be scanned as an operator if it exists, or a >
followed by a unary minus if >- doesn't exist as an operator.  And this
removes some ambiguity, because now we have a defined rule: if the - doesn't
appear at the beginning of an expression, or immediately (ignoring spaces)
after another operator, then it must be a binary minus.
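The positional rule above can be sketched roughly as follows. This is only
an illustration in Python, assuming the input has already been split into
tokens; the names classify_minus and OPERATORS are hypothetical and are not
taken from gram.y or the scanner.

```python
# Hypothetical illustration of the positional rule; not PostgreSQL code.
# "(" is included because it starts a new sub-expression.
OPERATORS = {"+", "-", "*", "/", "<", ">", "=", "("}

def classify_minus(tokens):
    """Label each '-' token as 'unary' or 'binary' from its position alone."""
    labels = []
    prev = None  # None means we are at the beginning of the expression
    for tok in tokens:
        if tok == "-":
            # Unary at the start of an expression or right after an operator;
            # anything else (identifier, number, ...) makes it binary.
            if prev is None or prev in OPERATORS:
                labels.append("unary")
            else:
                labels.append("binary")
        prev = tok
    return labels
```

For instance, classify_minus(["a", ">", "-", "2"]) labels the minus as
unary, while classify_minus(["a", "-", "2"]) labels it as binary.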



MikeA


Re: [HACKERS] Postgres' lexer

From
Tom Lane
Date:
"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:
> I have a bit of a problem with reading this: a > -2 correctly,
> while not reading this: a>-2 correctly, because that implies that you are
> using the space as a precedence operator.  This should be done by braces.

Not at all: this is a strictly lexical issue (where do we divide the
input into tokens) and whitespace has been considered a reasonable
lexical separator for years.  Furthermore, SQL already depends on
whitespace to separate tokens that are made of letters and digits.
You can't spell "SELECT" as "SEL ECT", nor "SELECT f1" as "SELECTf1",
nor does "SELECT 1 2;" mean "SELECT 12;".  So it seems perfectly
reasonable to me to use whitespace to separate operator names when
there would otherwise be ambiguity about what's meant.
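
A small sketch of what "strictly lexical" means here, assuming a
longest-match scanner in which any run of operator characters forms one
token and whitespace only separates tokens. The pattern TOKEN_RE and the
function tokenize are hypothetical, written in Python for illustration;
the real scanner's rules are not reproduced.

```python
import re

# Hypothetical token pattern: identifiers, integers, runs of operator
# characters, and parentheses. Leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|([-+*/<>=%!]+)|([()]))")
KINDS = ("ident", "number", "op", "paren")

def tokenize(text):
    """Split text into (kind, value) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        # Exactly one of the four groups is set for each match.
        for kind, value in zip(KINDS, m.groups()):
            if value is not None:
                tokens.append((kind, value))
        pos = m.end()
    return tokens
```

Under these assumptions "a>-2" yields the tokens a, >-, 2, whereas
"a > -2" yields a, >, -, 2. The space does the same job it does between
"SELECT" and "f1"; it has nothing to do with precedence.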

> This: a > (-2) is totally unambiguous, spaces or no spaces.

True, and there's nothing to stop you from writing that style if you
prefer it.

> Perhaps there is a general case for where unary operators are allowed to
> appear, and we can use this, e.g.: they can only appear at the beginning of
> an expression, or immediately after another operator (ignoring spaces).

Don't forget about right-unary operators...

> This means that >- will be scanned as an operator if it exists, or a >
> followed by a unary minus if >- doesn't exist as an operator.

I think it would be a really bad idea for the lexical analysis to depend
on whether or not particular operator names are defined, for the same
reasons that lexical analysis of word tokens doesn't depend on whether
there are keywords/table names/field names that match those tokens.
You get into circularity problems very quickly if you do that.
Language designers learned not to do that in the sixties...
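
To make the problem concrete, here is a sketch of the kind of
catalog-dependent splitting being argued against. It is a hypothetical
Python illustration: split_operator and the defined_ops set stand in for
a scanner that consults the operator catalog, which is not how the
existing lexer works.

```python
def split_operator(run, defined_ops):
    """Split a run of operator characters using the set of defined operators.

    Greedy: at each position take the longest prefix that names a
    defined operator. The result depends on defined_ops, which is the
    very property being criticised.
    """
    out = []
    i = 0
    while i < len(run):
        for j in range(len(run), i, -1):
            if run[i:j] in defined_ops:
                out.append(run[i:j])
                i = j
                break
        else:
            # No prefix at position i names a defined operator.
            raise ValueError(f"no defined operator matches {run[i:]!r}")
    return out
```

With defined_ops = {">", "-"} the run ">-" splits into > and -, but once
someone defines an operator named >- the same query text comes back as a
single token. The token stream can then no longer be determined from the
text alone.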
        regards, tom lane


Re: [HACKERS] Postgres' lexer

From
Leon
Date:
Tom Lane wrote:
> 
> I think it would be a really bad idea for the lexical analysis to depend
> on whether or not particular operator names are defined, for the same
> reasons that lexical analysis of word tokens doesn't depend on whether
> there are keywords/table names/field names that match those tokens.

101% correct :)

> You get into circularity problems very quickly if you do that.
> Language designers learned not to do that in the sixties...
> 

All that should be carved in stone and then erected as a monument :)
It is a good idea to state explicitly where and how to divide
functions amongst components. Though it places some (minor)
restrictions, it introduces a conceivable order, which one can
abide by. E.g. no semantics is allowed in the lexer. Even unary minus
in numbers is semantics and isn't proper for the lexer.
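
A rough sketch of that division of labour, assuming the lexer hands over
"-" and the digits as separate tokens and the sign is only interpreted
by the parser. The grammar and the name parse_expr are hypothetical, in
Python, and far simpler than gram.y.

```python
def parse_expr(tokens):
    """Parse a flat token list with binary +/- and prefix minus into a tree.

    Hypothetical grammar:
        expr  := unary (('+' | '-') unary)*
        unary := '-' unary | NUMBER
    """
    pos = 0

    def unary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "-":
            # The parser, not the lexer, decides that this minus is a sign.
            return ("neg", unary())
        if tok.isdigit():
            return ("num", int(tok))
        raise ValueError(f"unexpected token {tok!r}")

    node = unary()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        node = (op, node, unary())  # left-associative binary operator
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return node
```

Here the tokens 3, +, -, 2 parse as an addition whose right operand is a
negated 2, and the lexer never needs to know what the minus means.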

-- 
Leon.