Thread: Parsing ambiguity for ORDER BY ... NULLS FIRST/LAST

Parsing ambiguity for ORDER BY ... NULLS FIRST/LAST

From
Tom Lane
Date:
I've run into an annoying parsing problem while working on the NULLS
FIRST/LAST stuff.  It turns out this is ambiguous:
... ORDER BY foo! NULLS ...

It could be that the "!" is a postfix operator and the "NULLS" is the
start of a NULLS FIRST/LAST clause, or it could be that the "!" is an
infix operator and "NULLS" is just a column name.  Bison needs to know
which before shifting the "NULLS", since in the former case it has to
reduce <a_expr> at this point.

The problem would go away if we made NULLS a fully reserved word, but
I don't think that's acceptable from a backward-compatibility or
spec-compliance standpoint, considering it's not a keyword at all in any
pre-2003 SQL spec and even in SQL2003 it's a non-reserved keyword.

The only other solution I can see is to make use of the lookahead filter
we already have in filtered_base_yylex() to combine NULLS FIRST and
NULLS LAST into single tokens.  This is not an ideal solution: consider
SELECT * FROM nulls first;

This should be considered a valid selection from a relation named "nulls"
with alias "first", but if it's reduced to a single token the grammar
will not see it that way, and will give an error.  However, that's a
sufficiently unlikely scenario that maybe we can just ignore it.  (It's
possible to work around the case by inserting AS, of course.)  We could
probably fix it if we really had to, but it would involve some pretty
ugly coding AFAICS.

BTW: the existing lookahead hack for WITH CASCADED etc. has the
identical problem.

BTW^2: the draft patch Teodor submitted awhile back tries to paper over
this problem by attaching a precedence to "NULLS".  But that does not
fix it, it just makes the parser ignore the possibility of infix "!"
in the example.  This would result in bizarre behavior anytime someone
tried to use a column named "nulls" in arithmetic expressions.

Thoughts?  Anyone see a better way?
        regards, tom lane


Re: Parsing ambiguity for ORDER BY ... NULLS FIRST/LAST

From
"Simon Riggs"
Date:
On Fri, 2007-01-05 at 20:19 -0500, Tom Lane wrote:

> The only other solution I can see is to make use of the lookahead filter
> we already have in filtered_base_yylex() to combine NULLS FIRST and
> NULLS LAST into single tokens.  This is not an ideal solution: consider
> 
>     SELECT * FROM nulls first;
> 
> This should be considered a valid selection from a relation named "nulls"
> with alias "first", but if it's reduced to a single token the grammar
> will not see it that way, and will give an error.  However, that's a
> sufficiently unlikely scenario that maybe we can just ignore it.  (It's
> possible to work around the case by inserting AS, of course.)  We could
> probably fix it if we really had to, but it would involve some pretty
> ugly coding AFAICS.
> 
> BTW: the existing lookahead hack for WITH CASCADED etc. has the
> identical problem.

Since we already have that problem, it seems sensible to go that way
with the NULLS FIRST issue.

Presumably you could put extra grammar rules in to throw errors when you
see FROM NULLS FIRST? Or should we just document it, somewhere?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Parsing ambiguity for ORDER BY ... NULLS FIRST/LAST

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Presumably you could put extra grammar rules in to throw errors when you
> see FROM NULLS FIRST?

It'll throw an error just fine without any extra rules, because there
won't be any production allowing the NULLS_FIRST pseudo-token there.
You already see this in 8.2:

regression=# select * from with cascaded;
ERROR:  syntax error at or near "cascaded"
LINE 1: select * from with cascaded;                          ^
regression=#

One of the possibilities for fixing it is to add productions that
allow table_ref to expand to NULLS_FIRST, WITH_CASCADED, and the
other two-word pseudo-tokens, and then build the appropriate
relation-with-alias syntax tree out of whole cloth.  I find this pretty
ugly though, and I'm not sure that table_ref would be the only place
to fix, so I'm inclined not to do it unless we actually get complaints
from the field.  (The other avenue for fixing it would be to try to give
the lookahead filter enough context to know when not to combine the
tokens, but I think that way will probably be unworkably convoluted.)

Oh BTW, there's an interesting bug here: the expected workaround
doesn't work:

regression=# select * from with as cascaded;
ERROR:  relation "as" does not exist
regression=#

It should be complaining about "with" not "as".  I think that the
lookahead filter is getting out of sync somehow.
        regards, tom lane


Re: Parsing ambiguity for ORDER BY ... NULLS FIRST/LAST

From
Tom Lane
Date:
I wrote:
> One of the possibilities for fixing it is to add productions that
> allow table_ref to expand to NULLS_FIRST, WITH_CASCADED, and the
> other two-word pseudo-tokens, and then build the appropriate
> relation-with-alias syntax tree out of whole cloth.  I find this pretty
> ugly though, and I'm not sure that table_ref would be the only place
> to fix, so I'm inclined not to do it unless we actually get complaints
> from the field.

Actually, that way doesn't work, because for example it would fixselect * from with cascaded;
but notselect * from public.with cascaded;
So my suspicion that there'd be too many places to fix seems justified.

> (The other avenue for fixing it would be to try to give
> the lookahead filter enough context to know when not to combine the
> tokens, but I think that way will probably be unworkably convoluted.)

This idea might work though.  In particular, I think things would work
nicely if we could make the filter return the special symbols like
WITH_CASCADED only when the parser is in a state where such a symbol
could be accepted.  This appears to be possible if one is willing to get
intimate enough with the internals of the Bison parser ... but that's
something I'd prefer not to do, as it'd likely tie us to specific Bison
versions to a much greater degree than we are now.  At the moment, the
number of cases where word pairs could act unexpectedly is small enough
that I think we can just live with it.
        regards, tom lane


Re: Parsing ambiguity for ORDER BY ... NULLS FIRST/LAST

From
"Simon Riggs"
Date:
On Sat, 2007-01-06 at 13:05 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Presumably you could put extra grammar rules in to throw errors when you
> > see FROM NULLS FIRST?
> 
> It'll throw an error just fine without any extra rules, because there
> won't be any production allowing the NULLS_FIRST pseudo-token there.
> You already see this in 8.2:
> 
> regression=# select * from with cascaded;
> ERROR:  syntax error at or near "cascaded"
> LINE 1: select * from with cascaded;
>                            ^

OK, Sounds good.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com