Thread: UNION JOIN vs UNION SELECT

UNION JOIN vs UNION SELECT

From

Tom Lane

Date:

26 August 2000, 19:37:23

I noticed you've got some really ugly stuff in gram.y to handleSELECT * FROM foo UNION JOIN bar
which has a shift/reduce conflict withSELECT * FROM foo UNION SELECT * FROM bar
Looks like you resolved this by requiring parens around a UNION JOIN
construct.  So, aside from being ugly, this fails to meet the SQL92
spec (nothing about parens there...).

This is another case where a one-token lookahead between the lexer
and parser would make life a lot easier: we could replace UNION JOIN
with a single UNIONJOIN token and thereby eliminate the shift-reduce
conflict.

You'll probably recall that the ambiguity between NOT NULL and NOT
DEFERRABLE gave us similar problems.  We were able to get around that
by pretending NOT DEFERRABLE is an independent clause and leaving some
of the parsing work to be done by analyze.c, but I don't think that
trick will work here.

I seem to recall a third case where a lookahead would have helped,
but can't find the details in the archives right now.

I think it's time to bite the bullet and put in a lookahead filter.
What say you?
        regards, tom lane

Re: UNION JOIN vs UNION SELECT

From

Thomas Lockhart

Date:

28 August 2000, 12:04:10

> I noticed you've got some really ugly stuff in gram.y to handle
>         SELECT * FROM foo UNION JOIN bar
> I think it's time to bite the bullet and put in a lookahead filter.
> What say you?

*sigh* Probably right. The UNION vs UNION JOIN stuff illustrates it
pretty well. I haven't tried assigning precedence levels to these tokens
or to those subclauses; would that help to resolve the conflicts?
                 - Thomas

Re: Re: UNION JOIN vs UNION SELECT

From

Tom Lane

Date:

28 August 2000, 14:08:11

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> I think it's time to bite the bullet and put in a lookahead filter.
>> What say you?

> *sigh* Probably right. The UNION vs UNION JOIN stuff illustrates it
> pretty well. I haven't tried assigning precedence levels to these tokens
> or to those subclauses; would that help to resolve the conflicts?

I don't see how.  The real problem is that given
SELECT * FROM foo   UNION ...                         ^ parsing here

you don't know whether to reduce what you have to select_clause
(as you must if what follows is UNION SELECT) or shift (as you must
if you want to parse "foo UNION JOIN bar" as part of the FROM-clause).
Precedence will not help: the grammar is just plain not LR(1) unless you
count UNION JOIN as a single token.  It's barely possible that we could
redesign our grammar to avoid needing to make a shift-reduce decision
here, but it would be so ugly and nonintuitive that I can't see that as
being a better answer than a lookahead filter.

We should use precedence to implement ISO's distinction in the
precedence of UNION, INTERSECT, and EXCEPT (we get that wrong
currently), but I don't see how it helps for the UNION vs UNION JOIN
issue.

Quite apropos of this: now that we are committed to assuming our lexer
is flex, does anyone object to using flex's -P option to customize the
yyfoo() names emitted by flex?  That seems cleaner to me than the
sed-script kluges we currently rely on.
        regards, tom lane

Re: Re: UNION JOIN vs UNION SELECT

From

Chris

Date:

28 August 2000, 20:13:51

> the grammar is just plain not LR(1) unless you
> count UNION JOIN as a single token. 

Would it be bad to make UNION JOIN as a single token?

Re: Re: UNION JOIN vs UNION SELECT

From

Tom Lane

Date:

28 August 2000, 21:04:49

Chris <chrisb@nimrod.itg.telstra.com.au> writes:
>> the grammar is just plain not LR(1) unless you
>> count UNION JOIN as a single token. 

> Would it be bad to make UNION JOIN as a single token?

That's exactly the solution I'm proposing.  However, it's pretty painful
to make the lexer do it directly (consider intervening comments, for
example) so what I have in mind is a filter between the parser and lexer
that does one-token lookahead when it finds a UNION token.  If next
token is JOIN, pass back just one UNIONJOIN token, else stash away the
second token to be returned on next call from parser.
        regards, tom lane

Re: Re: UNION JOIN vs UNION SELECT

From

Chris

Date:

28 August 2000, 21:08:05

Tom Lane wrote:
> 
> Chris <chrisb@nimrod.itg.telstra.com.au> writes:
> >> the grammar is just plain not LR(1) unless you
> >> count UNION JOIN as a single token.
> 
> > Would it be bad to make UNION JOIN as a single token?
> 
> That's exactly the solution I'm proposing.  However, it's pretty painful
> to make the lexer do it directly (consider intervening comments, for
> example)

Comments are a pain in the parser. What if something prior to the lexer
filtered out comments before either the lexer or parser could see them?
Would it be as easy as s/--.*// before the lexer?

Re: Re: UNION JOIN vs UNION SELECT

From

Chris

Date:

28 August 2000, 21:20:20

To answer my own question, of course that's no good because there are
constants and other stuff. Another suggestion, could we take the SQL
standards group out the back and have them flogged? :-)

> 
> Tom Lane wrote:
> >
> > Chris <chrisb@nimrod.itg.telstra.com.au> writes:
> > >> the grammar is just plain not LR(1) unless you
> > >> count UNION JOIN as a single token.
> >
> > > Would it be bad to make UNION JOIN as a single token?
> >
> > That's exactly the solution I'm proposing.  However, it's pretty painful
> > to make the lexer do it directly (consider intervening comments, for
> > example)
> 
> Comments are a pain in the parser. What if something prior to the lexer
> filtered out comments before either the lexer or parser could see them?
> Would it be as easy as s/--.*// before the lexer?

Re: Re: UNION JOIN vs UNION SELECT

From

Andrew Selle

Date:

28 August 2000, 23:12:23

> > 
> > That's exactly the solution I'm proposing.  However, it's pretty painful
> > to make the lexer do it directly (consider intervening comments, for
> > example)
> 
> Comments are a pain in the parser. What if something prior to the lexer
> filtered out comments before either the lexer or parser could see them?
> Would it be as easy as s/--.*// before the lexer?

It probably wouldn't be that simple, but I do think that the solution
is sound.  Such a design is recommended by the Dragon book and has
the benefit of simplifying both the lexer and the parser.

-Andy

Re: UNION JOIN vs UNION SELECT

From

Bruce Momjian

Date:

15 October 2000, 23:53:03

> You'll probably recall that the ambiguity between NOT NULL and NOT
> DEFERRABLE gave us similar problems.  We were able to get around that
> by pretending NOT DEFERRABLE is an independent clause and leaving some
> of the parsing work to be done by analyze.c, but I don't think that
> trick will work here.
> 
> I seem to recall a third case where a lookahead would have helped,
> but can't find the details in the archives right now.
> 
> I think it's time to bite the bullet and put in a lookahead filter.
> What say you?

Hmmm.  Not real excited about that for performance reasons.  Other options?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: UNION JOIN vs UNION SELECT

From

Tom Lane

Date:

15 October 2000, 23:59:24

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I think it's time to bite the bullet and put in a lookahead filter.
>> What say you?

> Hmmm.  Not real excited about that for performance reasons.  Other options?

It's been in there for a month.  I'll bet lunch you will be unable to
measure any performance cost --- one extra function call and if-test
per token lexed is just not going to show on the radar screen.
        regards, tom lane