Some surprising precedence behavior in PG's grammar - Mailing list pgsql-hackers

From Tom Lane
Subject Some surprising precedence behavior in PG's grammar
Date
Msg-id 19623.1304552394@sss.pgh.pa.us
Whole thread Raw
Responses Re: Some surprising precedence behavior in PG's grammar  (Andrew Dunstan <andrew@dunslane.net>)
Re: Some surprising precedence behavior in PG's grammar  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
While looking at the grammar's operator-precedence declarations in
connection with a recent pgsql-docs question, it struck me that this
declaration is a foot-gun waiting to go off:

%nonassoc    IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */

The only terminal that we actually need to attach precedence to for
IS NULL and related productions is "IS".  The others are just listed
there to save attaching explicit %prec declarations to those productions.
This seems like a bad idea, because attaching a precedence to a terminal
symbol that doesn't absolutely have to have one is just asking for
trouble: it can cause bison to accept grammars that would better have
provoked a shift/reduce error, and to silently resolve the ambiguity in
a way that you maybe didn't expect.

So I thought to myself that it'd be better to remove the unnecessary
precedence markings, and tried, with the attached patch.  And behold,
I got a shift/reduce conflict:

state 2788

  1569 b_expr: b_expr qual_Op . b_expr
  1571       | b_expr qual_Op .

    NULL_P             shift, and go to state 1010
    NULL_P    [reduce using rule 1571 (b_expr)]

So the god of unintended consequences has been here already.  What this
is telling us is that in examples such as

    CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);

it is not immediately clear to bison whether to shift upon seeing the
NULL (which leads to a parse tree that says %% is an infix operator with
arguments 10 and NULL), or to reduce (which leads to a parse tree that
says that %% is a postfix operator with argument 10, and NULL is a
column declaration constraint separate from the DEFAULT expression).

If you try the experiment, you find out that the first interpretation is
preferred by the current grammar:

    ERROR:  operator does not exist: integer %% unknown

Now, this is probably a good thing, because NULL as a column declaration
constraint is not SQL standard (only NOT NULL is), so we're resolving
the ambiguity in a way that's more likely to be SQL-compatible.  But it
could be surprising to somebody who expected the other behavior,
especially since this seemingly-closely-related command is parsed the
other way:

    CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
    ERROR:  operator does not exist: integer %%

And the reason for that difference in behavior is that NOT has a
declared precedence lower than POSTFIXOP, whereas NULL has a declared
precedence that's higher.  That comparison determines how bison resolves
the shift/reduce conflict.

The fact that this behavior falls out of a precedence declaration that's
seemingly quite unrelated, and was surely not designed with this case in
mind, is a perfect example of why I say that precedence declarations are
hazardous.

So I'd still like to get rid of the precedence markings for TRUE_P,
FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
reason not to.  (It looks like we could avoid marking ZONE, too.)  But
I would be happier if we could also not mark NULL, because that's surely
used in a lot of other places, and could easily bite us a lot harder
than this.  Can anyone think of an alternative way to resolve this
particular conflict without the blunt instrument of a precedence marking?

            regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 933a1a2..2fb0418 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** static void SplitColQualList(List *qualL
*** 614,620 ****
  %left        Op OPERATOR        /* multi-character ops and user-defined operators */
  %nonassoc    NOTNULL
  %nonassoc    ISNULL
! %nonassoc    IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */
  %left        '+' '-'
  %left        '*' '/' '%'
  %left        '^'
--- 614,620 ----
  %left        Op OPERATOR        /* multi-character ops and user-defined operators */
  %nonassoc    NOTNULL
  %nonassoc    ISNULL
! %nonassoc    IS                /* sets precedence for IS NULL, etc */
  %left        '+' '-'
  %left        '*' '/' '%'
  %left        '^'
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 9887,9893 ****
               *    a ISNULL
               *    a NOTNULL
               */
!             | a_expr IS NULL_P
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
--- 9887,9893 ----
               *    a ISNULL
               *    a NOTNULL
               */
!             | a_expr IS NULL_P                            %prec IS
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 9901,9907 ****
                      n->nulltesttype = IS_NULL;
                      $$ = (Node *)n;
                  }
!             | a_expr IS NOT NULL_P
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
--- 9901,9907 ----
                      n->nulltesttype = IS_NULL;
                      $$ = (Node *)n;
                  }
!             | a_expr IS NOT NULL_P                        %prec IS
                  {
                      NullTest *n = makeNode(NullTest);
                      n->arg = (Expr *) $1;
*************** a_expr:        c_expr                                    { $$ = $1; }
*** 9919,9960 ****
                  {
                      $$ = (Node *)makeOverlaps($1, $3, @2, yyscanner);
                  }
!             | a_expr IS TRUE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT TRUE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS FALSE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT FALSE_P
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS UNKNOWN
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_UNKNOWN;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT UNKNOWN
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
--- 9919,9960 ----
                  {
                      $$ = (Node *)makeOverlaps($1, $3, @2, yyscanner);
                  }
!             | a_expr IS TRUE_P                            %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT TRUE_P                        %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_TRUE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS FALSE_P                            %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT FALSE_P                        %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_NOT_FALSE;
                      $$ = (Node *)b;
                  }
!             | a_expr IS UNKNOWN                            %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;
                      b->booltesttype = IS_UNKNOWN;
                      $$ = (Node *)b;
                  }
!             | a_expr IS NOT UNKNOWN                        %prec IS
                  {
                      BooleanTest *b = makeNode(BooleanTest);
                      b->arg = (Expr *) $1;

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: VARIANT / ANYTYPE datatype
Next
From: Tom Lane
Date:
Subject: Re: VARIANT / ANYTYPE datatype