Thread: Some surprising precedence behavior in PG's grammar

Some surprising precedence behavior in PG's grammar

From
Tom Lane
Date:
While looking at the grammar's operator-precedence declarations in
connection with a recent pgsql-docs question, it struck me that this
declaration is a foot-gun waiting to go off:

%nonassoc    IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */

The only terminal that we actually need to attach precedence to for
IS NULL and related productions is "IS".  The others are just listed
there to save attaching explicit %prec declarations to those productions.
This seems like a bad idea, because attaching a precedence to a terminal
symbol that doesn't absolutely have to have one is just asking for
trouble: it can cause bison to accept grammars that would better have
provoked a shift/reduce error, and to silently resolve the ambiguity in
a way that you maybe didn't expect.

So I thought to myself that it'd be better to remove the unnecessary
precedence markings, and tried, with the attached patch.  And behold,
I got a shift/reduce conflict:

state 2788

  1569 b_expr: b_expr qual_Op . b_expr
  1571       | b_expr qual_Op .

    NULL_P             shift, and go to state 1010
    NULL_P    [reduce using rule 1571 (b_expr)]

So the god of unintended consequences has been here already.  What this
is telling us is that in examples such as

    CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);

it is not immediately clear to bison whether to shift upon seeing the
NULL (which leads to a parse tree that says %% is an infix operator with
arguments 10 and NULL), or to reduce (which leads to a parse tree that
says that %% is a postfix operator with argument 10, and NULL is a
column declaration constraint separate from the DEFAULT expression).

If you try the experiment, you find out that the first interpretation is
preferred by the current grammar:

    ERROR:  operator does not exist: integer %% unknown

Now, this is probably a good thing, because NULL as a column declaration
constraint is not SQL standard (only NOT NULL is), so we're resolving
the ambiguity in a way that's more likely to be SQL-compatible.  But it
could be surprising to somebody who expected the other behavior,
especially since this seemingly-closely-related command is parsed the
other way:

    CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
    ERROR:  operator does not exist: integer %%

And the reason for that difference in behavior is that NOT has a
declared precedence lower than POSTFIXOP, whereas NULL has a declared
precedence that's higher.  That comparison determines how bison resolves
the shift/reduce conflict.

The fact that this behavior falls out of a precedence declaration that's
seemingly quite unrelated, and was surely not designed with this case in
mind, is a perfect example of why I say that precedence declarations are
hazardous.

So I'd still like to get rid of the precedence markings for TRUE_P,
FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
reason not to.  (It looks like we could avoid marking ZONE, too.)  But
I would be happier if we could also not mark NULL, because that's surely
used in a lot of other places, and could easily bite us a lot harder
than this.  Can anyone think of an alternative way to resolve this
particular conflict without the blunt instrument of a precedence marking?

            regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 933a1a2..2fb0418 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** static void SplitColQualList(List *qualL
*** 614,620 ****
  %left        Op OPERATOR        /* multi-character ops and user-defined operators */
  %nonassoc    NOTNULL
  %nonassoc    ISNULL
! %nonassoc    IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */
  %left        '+' '-'
  %left        '*' '/' '%'
  %left        '^'
--- 614,620 ----
  %left        Op OPERATOR        /* multi-character ops and user-defined operators */
  %nonassoc    NOTNULL
  %nonassoc    ISNULL
! %nonassoc    IS                /* sets precedence for IS NULL, etc */
  %left        '+' '-'
  %left        '*' '/' '%'
  %left        '^'
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 9887,9893 ****
               *    a ISNULL
               *    a NOTNULL
               */
!             | a_expr IS NULL_P
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
--- 9887,9893 ----
               *    a ISNULL
               *    a NOTNULL
               */
!             | a_expr IS NULL_P                            %prec IS
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 9901,9907 ****
                      n->nulltesttype = IS_NULL;
                      $$ = (Node *)n;
                  }
!             | a_expr IS NOT NULL_P
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
--- 9901,9907 ----
                      n->nulltesttype = IS_NULL;
                      $$ = (Node *)n;
                  }
!             | a_expr IS NOT NULL_P                        %prec IS
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 9919,9960 ****
                  {
                      $$ = (Node *)makeOverlaps($1, $3, @2, yyscanner);
                  }
!             | a_expr IS TRUE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT TRUE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS FALSE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT FALSE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS UNKNOWN
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_UNKNOWN;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT UNKNOWN
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
--- 9919,9960 ----
                  {
                      $$ = (Node *)makeOverlaps($1, $3, @2, yyscanner);
                  }
!             | a_expr IS TRUE_P                            %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT TRUE_P                        %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS FALSE_P                            %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT FALSE_P                        %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS UNKNOWN                            %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_UNKNOWN;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT UNKNOWN                        %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;

Re: Some surprising precedence behavior in PG's grammar

From
Andrew Dunstan
Date:

On 05/04/2011 07:39 PM, Tom Lane wrote:
> While looking at the grammar's operator-precedence declarations in
> connection with a recent pgsql-docs question, it struck me that this
> declaration is a foot-gun waiting to go off:
>
> %nonassoc    IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */
>
> The only terminal that we actually need to attach precedence to for
> IS NULL and related productions is "IS".  The others are just listed
> there to save attaching explicit %prec declarations to those productions.
> This seems like a bad idea, because attaching a precedence to a terminal
> symbol that doesn't absolutely have to have one is just asking for
> trouble: it can cause bison to accept grammars that would better have
> provoked a shift/reduce error, and to silently resolve the ambiguity in
> a way that you maybe didn't expect.
>
> So I thought to myself that it'd be better to remove the unnecessary
> precedence markings, and tried, with the attached patch.  And behold,
> I got a shift/reduce conflict:
>
> state 2788
>
>    1569 b_expr: b_expr qual_Op . b_expr
>    1571       | b_expr qual_Op .
>
>      NULL_P             shift, and go to state 1010
>      NULL_P    [reduce using rule 1571 (b_expr)]
>
> So the god of unintended consequences has been here already.  What this
> is telling us is that in examples such as
>
>     CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);
>
> it is not immediately clear to bison whether to shift upon seeing the
> NULL (which leads to a parse tree that says %% is an infix operator with
> arguments 10 and NULL), or to reduce (which leads to a parse tree that
> says that %% is a postfix operator with argument 10, and NULL is a
> column declaration constraint separate from the DEFAULT expression).
>
> If you try the experiment, you find out that the first interpretation is
> preferred by the current grammar:
>
>     ERROR:  operator does not exist: integer %% unknown

Yeah, IIRC the default action for a shift/reduce conflict is to shift, 
as it's usually the right thing to do.

> Now, this is probably a good thing, because NULL as a column declaration
> constraint is not SQL standard (only NOT NULL is), so we're resolving
> the ambiguity in a way that's more likely to be SQL-compatible.  But it
> could be surprising to somebody who expected the other behavior,
> especially since this seemingly-closely-related command is parsed the
> other way:
>
>     CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
>     ERROR:  operator does not exist: integer %%
>
> And the reason for that difference in behavior is that NOT has a
> declared precedence lower than POSTFIXOP, whereas NULL has a declared
> precedence that's higher.  That comparison determines how bison resolves
> the shift/reduce conflict.
>
> The fact that this behavior falls out of a precedence declaration that's
> seemingly quite unrelated, and was surely not designed with this case in
> mind, is a perfect example of why I say that precedence declarations are
> hazardous.
>
> So I'd still like to get rid of the precedence markings for TRUE_P,
> FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
> reason not to.  (It looks like we could avoid marking ZONE, too.)  But
> I would be happier if we could also not mark NULL, because that's surely
> used in a lot of other places, and could easily bite us a lot harder
> than this.  Can anyone think of an alternative way to resolve this
> particular conflict without the blunt instrument of a precedence marking?
>             
>

My bison-fu is a bit rusty, but years ago I could do this stuff in my 
sleep. I'll be surprised if there isn't a way.

If we do need a precedence setting for NULL_P, then I think it should 
probably be on its own and not sharing one with IS.

If you don't solve it soon I'll take a look after I clear a couple of 
higher priority items from my list.

cheers

andrew


Re: Some surprising precedence behavior in PG's grammar

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 05/04/2011 07:39 PM, Tom Lane wrote:
>> If you try the experiment, you find out that the first interpretation is
>> preferred by the current grammar:
>> ERROR:  operator does not exist: integer %% unknown

> Yeah, IIRC the default action for a shift/reduce conflict is to shift, 
> as it's usually the right thing to do.

Well, there's nothing "default" about it: we've got precedence
declarations that specifically tell it to do that.  I'm just disturbed
because this isn't what those precedences were meant to do.

>> I would be happier if we could also not mark NULL, because that's surely
>> used in a lot of other places, and could easily bite us a lot harder
>> than this.  Can anyone think of an alternative way to resolve this
>> particular conflict without the blunt instrument of a precedence marking?

> My bison-fu is a bit rusty, but years ago I could do this stuff in my 
> sleep. I'll be surprised if there isn't a way.

> If we do need a precedence setting for NULL_P, then I think it should 
> probably be on its own and not sharing one with IS.

Yeah, I was thinking that too.  If we put %prec on the IS [NOT] NULL
productions then there is no need for NULL_P to have exactly its current
precedence; anything above POSTFIXOP would preserve the current behavior
in the DEFAULT ... NULL case.  (And if we decided we wanted to flip that
behavior, anything below POSTFIXOP would do that.)

But it would still make life safer for future grammar hacking if it
didn't have precedence at all.

BTW, I wonder why NOTNULL and ISNULL have their own precedence levels,
rather than being made to act exactly like IS [NOT] NULL ...
        regards, tom lane


Re: Some surprising precedence behavior in PG's grammar

From
Greg Stark
Date:
On Thu, May 5, 2011 at 12:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> So I'd still like to get rid of the precedence markings for TRUE_P,
> FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
> reason not to.  (It looks like we could avoid marking ZONE, too.)  But
> I would be happier if we could also not mark NULL, because that's surely
> used in a lot of other places, and could easily bite us a lot harder
> than this.  Can anyone think of an alternative way to resolve this
> particular conflict without the blunt instrument of a precedence marking?
>

Isn't there already some gadget which forces postfix operators to be
discouraged compared to some other interpretation in other cases? That
would be the opposite of the current interpretation though which you
said you preferred.

--
greg


Re: Some surprising precedence behavior in PG's grammar

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Isn't there already some gadget which forces postfix operators to be
> discouraged compared to some other interpretation in other cases?

Yeah.  I'm not unhappy with the current grammar's behavior in this case.
What's bothering me is that the implementation seems likely to create
surprising/unexpected behaviors after future grammar changes.
        regards, tom lane


Re: Some surprising precedence behavior in PG's grammar

From
Greg Stark
Date:
On Thu, May 5, 2011 at 4:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> Isn't there already some gadget which forces postfix operators to be
>> discouraged compared to some other interpretation in other cases?
>
> Yeah.  I'm not unhappy with the current grammar's behavior in this case.
> What's bothering me is that the implementation seems likely to create
> surprising/unexpected behaviors after future grammar changes.

I do wonder how much we really gain from having postfix operators.
Other than ! I've never seen one and of course anyone who wanted to
use one could just as easily use a prefix operator. In practice I
think most unary operators are just special cases of binary operators
anyways and often once you have the binary operator it's clearer to
just use that anyways.

A *lot* of grammar conflicts we've had to worry about end up going
away if we didn't have postfix operators.
--
greg


Re: Some surprising precedence behavior in PG's grammar

From
Tom Lane
Date:
I wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> If we do need a precedence setting for NULL_P, then I think it should 
>> probably be on its own and not sharing one with IS.

> Yeah, I was thinking that too.  If we put %prec on the IS [NOT] NULL
> productions then there is no need for NULL_P to have exactly its current
> precedence; anything above POSTFIXOP would preserve the current behavior
> in the DEFAULT ... NULL case.  (And if we decided we wanted to flip that
> behavior, anything below POSTFIXOP would do that.)

On reflection I decided that the best quick-fix is to put NULL into the
list of keywords that are already precedence-grouped with IDENT.  That
at least makes sure that it has precedence behavior equivalent to any
plain old non-keyword.  If you can find a better fix, maybe we could
apply it to the other cases mentioned there as well.

> BTW, I wonder why NOTNULL and ISNULL have their own precedence levels,
> rather than being made to act exactly like IS [NOT] NULL ...

Is anybody up for changing that, or should we leave well enough alone?
        regards, tom lane