Thread: UNION JOIN vs UNION SELECT
I noticed you've got some really ugly stuff in gram.y to handleSELECT * FROM foo UNION JOIN bar which has a shift/reduce conflict withSELECT * FROM foo UNION SELECT * FROM bar Looks like you resolved this by requiring parens around a UNION JOIN construct. So, aside from being ugly, this fails to meet the SQL92 spec (nothing about parens there...). This is another case where a one-token lookahead between the lexer and parser would make life a lot easier: we could replace UNION JOIN with a single UNIONJOIN token and thereby eliminate the shift-reduce conflict. You'll probably recall that the ambiguity between NOT NULL and NOT DEFERRABLE gave us similar problems. We were able to get around that by pretending NOT DEFERRABLE is an independent clause and leaving some of the parsing work to be done by analyze.c, but I don't think that trick will work here. I seem to recall a third case where a lookahead would have helped, but can't find the details in the archives right now. I think it's time to bite the bullet and put in a lookahead filter. What say you? regards, tom lane
> I noticed you've got some really ugly stuff in gram.y to handle > SELECT * FROM foo UNION JOIN bar > I think it's time to bite the bullet and put in a lookahead filter. > What say you? *sigh* Probably right. The UNION vs UNION JOIN stuff illustrates it pretty well. I haven't tried assigning precedence levels to these tokens or to those subclauses; would that help to resolve the conflicts? - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> I think it's time to bite the bullet and put in a lookahead filter. >> What say you? > *sigh* Probably right. The UNION vs UNION JOIN stuff illustrates it > pretty well. I haven't tried assigning precedence levels to these tokens > or to those subclauses; would that help to resolve the conflicts? I don't see how. The real problem is that given SELECT * FROM foo UNION ... ^ parsing here you don't know whether to reduce what you have to select_clause (as you must if what follows is UNION SELECT) or shift (as you must if you want to parse "foo UNION JOIN bar" as part of the FROM-clause). Precedence will not help: the grammar is just plain not LR(1) unless you count UNION JOIN as a single token. It's barely possible that we could redesign our grammar to avoid needing to make a shift-reduce decision here, but it would be so ugly and nonintuitive that I can't see that as being a better answer than a lookahead filter. We should use precedence to implement ISO's distinction in the precedence of UNION, INTERSECT, and EXCEPT (we get that wrong currently), but I don't see how it helps for the UNION vs UNION JOIN issue. Quite apropos of this: now that we are committed to assuming our lexer is flex, does anyone object to using flex's -P option to customize the yyfoo() names emitted by flex? That seems cleaner to me than the sed-script kluges we currently rely on. regards, tom lane
> the grammar is just plain not LR(1) unless you > count UNION JOIN as a single token. Would it be bad to make UNION JOIN as a single token?
Chris <chrisb@nimrod.itg.telstra.com.au> writes: >> the grammar is just plain not LR(1) unless you >> count UNION JOIN as a single token. > Would it be bad to make UNION JOIN as a single token? That's exactly the solution I'm proposing. However, it's pretty painful to make the lexer do it directly (consider intervening comments, for example) so what I have in mind is a filter between the parser and lexer that does one-token lookahead when it finds a UNION token. If next token is JOIN, pass back just one UNIONJOIN token, else stash away the second token to be returned on next call from parser. regards, tom lane
Tom Lane wrote: > > Chris <chrisb@nimrod.itg.telstra.com.au> writes: > >> the grammar is just plain not LR(1) unless you > >> count UNION JOIN as a single token. > > > Would it be bad to make UNION JOIN as a single token? > > That's exactly the solution I'm proposing. However, it's pretty painful > to make the lexer do it directly (consider intervening comments, for > example) Comments are a pain in the parser. What if something prior to the lexer filtered out comments before either the lexer or parser could see them? Would it be as easy as s/--.*// before the lexer?
To answer my own question, of course that's no good because there are constants and other stuff. Another suggestion, could we take the SQL standards group out the back and have them flogged? :-) > > Tom Lane wrote: > > > > Chris <chrisb@nimrod.itg.telstra.com.au> writes: > > >> the grammar is just plain not LR(1) unless you > > >> count UNION JOIN as a single token. > > > > > Would it be bad to make UNION JOIN as a single token? > > > > That's exactly the solution I'm proposing. However, it's pretty painful > > to make the lexer do it directly (consider intervening comments, for > > example) > > Comments are a pain in the parser. What if something prior to the lexer > filtered out comments before either the lexer or parser could see them? > Would it be as easy as s/--.*// before the lexer?
> > > > That's exactly the solution I'm proposing. However, it's pretty painful > > to make the lexer do it directly (consider intervening comments, for > > example) > > Comments are a pain in the parser. What if something prior to the lexer > filtered out comments before either the lexer or parser could see them? > Would it be as easy as s/--.*// before the lexer? It probably wouldn't be that simple, but I do think that the solution is sound. Such a design is recommended by the Dragon book and has the benefit of simplifying both the lexer and the parser. -Andy
> You'll probably recall that the ambiguity between NOT NULL and NOT > DEFERRABLE gave us similar problems. We were able to get around that > by pretending NOT DEFERRABLE is an independent clause and leaving some > of the parsing work to be done by analyze.c, but I don't think that > trick will work here. > > I seem to recall a third case where a lookahead would have helped, > but can't find the details in the archives right now. > > I think it's time to bite the bullet and put in a lookahead filter. > What say you? Hmmm. Not real excited about that for performance reasons. Other options? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> I think it's time to bite the bullet and put in a lookahead filter. >> What say you? > Hmmm. Not real excited about that for performance reasons. Other options? It's been in there for a month. I'll bet lunch you will be unable to measure any performance cost --- one extra function call and if-test per token lexed is just not going to show on the radar screen. regards, tom lane