Thread: RE: [HACKERS] Postgres' lexer
Leon wrote:
>> Ok. Especially if there are more unary operators (I always wondered
>> what unary % in gram.y stands for :) it is reasonable not to make
>> a special case of uminus and slightly change the old behavior. That
>> is even more convincing given that constructs like 3+-2 and 3+-b were
>> parsed in different ways, and, what is worse, a>-2 and a>-b were also
>> parsed differently. So let us ask the (hopefully) last question:
>> Thomas (Lockhart), do you agree on always parsing constructs like
>> '+-' or '>-' as is, and not as '+' '-' or '>' '-' ?

This construct doesn't always make sense. It should only be recognised as
a '>-' if that operator exists. Otherwise it should either generate an
error (which is reasonable because of the ambiguity that it creates, not
for this operator but for the general case) or try to complete (if that's
possible).

I have a bit of a problem with reading this: a > -2 correctly,
while not reading this: a>-2 correctly, because that implies that you are
using the space as a precedence operator. This should be done by braces.
This: a > (-2) is totally unambiguous, spaces or no spaces.

Perhaps there is a general case for where unary operators are allowed to
appear, and we can use this, e.g.: they can only appear at the beginning of
an expression, or immediately after another operator (ignoring spaces).
This means that >- will be scanned as an operator if it exists, or a >
followed by a unary minus if >- doesn't exist as an operator. And this
removes some ambiguity, because now we have a defined rule: if the - doesn't
appear at the beginning of an expression, or immediately (ignoring spaces)
after another operator, then it must be a binary minus.

MikeA
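To make MikeA's proposal concrete, here is a minimal Python sketch of a lexer that consults a set of defined operators when it splits a run of operator characters. `lex_with_catalog`, `defined_ops` and the `OP_CHARS` set are hypothetical illustrations standing in for a catalog lookup; this is not how PostgreSQL's scanner works.

```python
# Characters that may make up an operator name (assumed set for this sketch).
OP_CHARS = set("+-*/<>=~!@#%^&|`?")


def lex_with_catalog(text, defined_ops):
    """Tokenize `text`, splitting each run of operator characters into the
    longest prefixes found in `defined_ops`.

    `defined_ops` is a hypothetical stand-in for a catalog of defined
    operators; consulting it here makes tokenization depend on the catalog.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c.isalnum() or c == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        elif c in OP_CHARS:
            j = i
            while j < n and text[j] in OP_CHARS:
                j += 1
            run = text[i:j]
            while run:
                # Longest defined prefix wins; a single character is the fallback.
                for k in range(len(run), 0, -1):
                    if k == 1 or run[:k] in defined_ops:
                        tokens.append(run[:k])
                        run = run[k:]
                        break
            i = j
        else:
            tokens.append(c)  # punctuation such as ( ) , ;
            i += 1
    return tokens
```

With `defined_ops = {">", "-"}`, the text `a>-2` yields `['a', '>', '-', '2']`. After someone adds `">-"` to the set, the same text yields `['a', '>-', '2']`. The token stream for unchanged query text thus depends on which operators happen to exist.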
"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:
> I have a bit of a problem with reading this: a > -2 correctly,
> while not reading this: a>-2 correctly, because that implies that you are
> using the space as a precedence operator. This should be done by braces.

Not at all: this is a strictly lexical issue (where do we divide the input
into tokens) and whitespace has been considered a reasonable lexical
separator for years. Furthermore, SQL already depends on whitespace to
separate tokens that are made of letters and digits. You can't spell
"SELECT" as "SEL ECT", nor "SELECT f1" as "SELECTf1", nor does "SELECT 1 2;"
mean "SELECT 12;". So it seems perfectly reasonable to me to use whitespace
to separate operator names when there would otherwise be ambiguity about
what's meant.

> This: a > (-2) is totally unambiguous, spaces or no spaces.

True, and there's nothing to stop you from writing that style if you
prefer it.

> Perhaps there is a general case for where unary operators are allowed to
> appear, and we can use this, e.g.: they can only appear at the beginning of
> an expression, or immediately after another operator (ignoring spaces).

Don't forget about right-unary operators...

> This means that >- will be scanned as an operator if it exists, or a >
> followed by a unary minus if >- doesn't exist as an operator.

I think it would be a really bad idea for the lexical analysis to depend
on whether or not particular operator names are defined, for the same
reasons that lexical analysis of word tokens doesn't depend on whether
there are keywords/table names/field names that match those tokens.
You get into circularity problems very quickly if you do that.
Language designers learned not to do that in the sixties...

regards, tom lane
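The purely lexical rule Tom describes can be sketched as a maximal-munch tokenizer. A run of operator characters is always one token, just as a run of letters and digits is, and whitespace is the only way to split such a run. The helper names and the `OP_CHARS` set are assumptions for illustration, not PostgreSQL's actual scanner.

```python
# Characters that may make up an operator name (assumed set for this sketch).
OP_CHARS = set("+-*/<>=~!@#%^&|`?")


def _is_word_char(ch):
    return ch.isalnum() or ch == "_"


def _is_op_char(ch):
    return ch in OP_CHARS


def lex(text):
    """Maximal-munch tokenizer that never consults a list of defined names.

    A run of word characters is one token and a run of operator characters
    is one token; whitespace only separates tokens and is otherwise dropped.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
            continue
        if _is_word_char(c):
            in_token = _is_word_char
        elif _is_op_char(c):
            in_token = _is_op_char
        else:
            tokens.append(c)  # punctuation such as ( ) , ;
            i += 1
            continue
        j = i
        while j < n and in_token(text[j]):
            j += 1
        tokens.append(text[i:j])
        i = j
    return tokens
```

Here `lex("a>-2")` gives `['a', '>-', '2']` and `lex("a > -2")` gives `['a', '>', '-', '2']`. Likewise `lex("SELECT 1 2;")` keeps `1` and `2` apart while `lex("SELECTf1")` is a single token. Whether `>-` is actually a defined operator becomes a question for a later stage, which can report an error, rather than something that changes the tokens.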
Tom Lane wrote:
>
> I think it would be a really bad idea for the lexical analysis to depend
> on whether or not particular operator names are defined, for the same
> reasons that lexical analysis of word tokens doesn't depend on whether
> there are keywords/table names/field names that match those tokens.

101% correct :)

> You get into circularity problems very quickly if you do that.
> Language designers learned not to do that in the sixties...
>

All that should be carved in stone and then erected as a monument :)

It is a good idea to explicitly state where and how to divide functions
amongst components. Though it places some (minor) restrictions, it
introduces a conceivable order, which one can abide by. E.g. no semantics
is allowed in the lexer. Even unary minus in numbers is semantics and isn't
proper for the lexer.

-- 
Leon.
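Leon's division of labour can be sketched as a pass that runs after lexing. The lexer never produces negative numbers. A hypothetical parser-side step uses MikeA's positional rule (a `-` is unary at the start of an expression, after `(`, or after another operator) to fold a unary minus into a following integer literal. `fold_unary_minus` and `OP_CHARS` are illustrative names. The rule deliberately ignores right-unary operators, which Tom notes complicate any positional rule.

```python
# Characters that may make up an operator name (assumed set for this sketch).
OP_CHARS = set("+-*/<>=~!@#%^&|`?")


def _is_operator(tok):
    return isinstance(tok, str) and tok != "" and all(ch in OP_CHARS for ch in tok)


def fold_unary_minus(tokens):
    """Hypothetical parser-side pass over lexer output.

    A '-' counts as unary when it starts the expression or follows '(' or
    another operator; a unary '-' directly before an integer literal is
    folded into a negative constant. Everything else passes through, with
    integer literals converted to int.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        prev = out[-1] if out else None
        unary = prev is None or prev == "(" or _is_operator(prev)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "-" and unary and nxt is not None and nxt.isdigit():
            out.append(-int(nxt))  # constant folding happens here, not in the lexer
            i += 2
        else:
            out.append(int(tok) if tok.isdigit() else tok)
            i += 1
    return out
```

Given the tokens of `a > -2`, this returns `['a', '>', -2]`. The tokens of `3 - 2` stay a binary subtraction, `[3, '-', 2]`. For `a>-2`, which a maximal-munch lexer splits as `['a', '>-', '2']`, nothing is folded, so the whitespace in the source still decides the outcome.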