Thread: Some surprising precedence behavior in PG's grammar
While looking at the grammar's operator-precedence declarations in connection with a recent pgsql-docs question, it struck me that this declaration is a foot-gun waiting to go off: %nonassoc IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */ The only terminal that we actually need to attach precedence to for IS NULL and related productions is "IS". The others are just listed there to save attaching explicit %prec declarations to those productions. This seems like a bad idea, because attaching a precedence to a terminal symbol that doesn't absolutely have to have one is just asking for trouble: it can cause bison to accept grammars that would better have provoked a shift/reduce error, and to silently resolve the ambiguity in a way that you maybe didn't expect. So I thought to myself that it'd be better to remove the unnecessary precedence markings, and tried, with the attached patch. And behold, I got a shift/reduce conflict: state 2788 1569 b_expr: b_expr qual_Op . b_expr 1571 | b_expr qual_Op . NULL_P shift, and go to state 1010 NULL_P [reduce using rule 1571 (b_expr)] So the god of unintended consequences has been here already. What this is telling us is that in examples such as CREATE TABLE foo (f1 int DEFAULT 10 %% NULL); it is not immediately clear to bison whether to shift upon seeing the NULL (which leads to a parse tree that says %% is an infix operator with arguments 10 and NULL), or to reduce (which leads to a parse tree that says that %% is a postfix operator with argument 10, and NULL is a column declaration constraint separate from the DEFAULT expression). If you try the experiment, you find out that the first interpretation is preferred by the current grammar: ERROR: operator does not exist: integer %% unknown Now, this is probably a good thing, because NULL as a column declaration constraint is not SQL standard (only NOT NULL is), so we're resolving the ambiguity in a way that's more likely to be SQL-compatible. But it could be surprising to somebody who expected the other behavior, especially since this seemingly-closely-related command is parsed the other way: CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL); ERROR: operator does not exist: integer %% And the reason for that difference in behavior is that NOT has a declared precedence lower than POSTFIXOP, whereas NULL has a declared precedence that's higher. That comparison determines how bison resolves the shift/reduce conflict. The fact that this behavior falls out of a precedence declaration that's seemingly quite unrelated, and was surely not designed with this case in mind, is a perfect example of why I say that precedence declarations are hazardous. So I'd still like to get rid of the precedence markings for TRUE_P, FALSE_P, and UNKNOWN, and will do so unless somebody has a really good reason not to. (It looks like we could avoid marking ZONE, too.) But I would be happier if we could also not mark NULL, because that's surely used in a lot of other places, and could easily bite us a lot harder than this. Can anyone think of an alternative way to resolve this particular conflict without the blunt instrument of a precedence marking? regards, tom lane diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 933a1a2..2fb0418 100644 *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *************** static void SplitColQualList(List *qualL *** 614,620 **** %left Op OPERATOR /* multi-character ops and user-defined operators */ %nonassoc NOTNULL %nonassoc ISNULL ! %nonassoc IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */ %left '+' '-' %left '*' '/' '%' %left '^' --- 614,620 ---- %left Op OPERATOR /* multi-character ops and user-defined operators */ %nonassoc NOTNULL %nonassoc ISNULL ! %nonassoc IS /* sets precedence for IS NULL, etc */ %left '+' '-' %left '*' '/' '%' %left '^' *************** a_expr: c_expr { $$ = $1; } *** 9887,9893 **** * a ISNULL * a NOTNULL */ ! | a_expr IS NULL_P { NullTest *n = makeNode(NullTest); n->arg = (Expr *) $1; --- 9887,9893 ---- * a ISNULL * a NOTNULL */ ! | a_expr IS NULL_P %prec IS { NullTest *n = makeNode(NullTest); n->arg = (Expr *) $1; *************** a_expr: c_expr { $$ = $1; } *** 9901,9907 **** n->nulltesttype = IS_NULL; $$ = (Node *)n; } ! | a_expr IS NOT NULL_P { NullTest *n = makeNode(NullTest); n->arg = (Expr *) $1; --- 9901,9907 ---- n->nulltesttype = IS_NULL; $$ = (Node *)n; } ! | a_expr IS NOT NULL_P %prec IS { NullTest *n = makeNode(NullTest); n->arg = (Expr *) $1; *************** a_expr: c_expr { $$ = $1; } *** 9919,9960 **** { $$ = (Node *)makeOverlaps($1, $3, @2, yyscanner); } ! | a_expr IS TRUE_P { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_TRUE; $$ = (Node *)b; } ! | a_expr IS NOT TRUE_P { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_NOT_TRUE; $$ = (Node *)b; } ! | a_expr IS FALSE_P { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_FALSE; $$ = (Node *)b; } ! | a_expr IS NOT FALSE_P { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_NOT_FALSE; $$ = (Node *)b; } ! | a_expr IS UNKNOWN { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_UNKNOWN; $$ = (Node *)b; } ! | a_expr IS NOT UNKNOWN { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; --- 9919,9960 ---- { $$ = (Node *)makeOverlaps($1, $3, @2, yyscanner); } ! | a_expr IS TRUE_P %prec IS { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_TRUE; $$ = (Node *)b; } ! | a_expr IS NOT TRUE_P %prec IS { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_NOT_TRUE; $$ = (Node *)b; } ! | a_expr IS FALSE_P %prec IS { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_FALSE; $$ = (Node *)b; } ! | a_expr IS NOT FALSE_P %prec IS { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_NOT_FALSE; $$ = (Node *)b; } ! | a_expr IS UNKNOWN %prec IS { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1; b->booltesttype = IS_UNKNOWN; $$ = (Node *)b; } ! | a_expr IS NOT UNKNOWN %prec IS { BooleanTest *b = makeNode(BooleanTest); b->arg = (Expr *) $1;
On 05/04/2011 07:39 PM, Tom Lane wrote: > While looking at the grammar's operator-precedence declarations in > connection with a recent pgsql-docs question, it struck me that this > declaration is a foot-gun waiting to go off: > > %nonassoc IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */ > > The only terminal that we actually need to attach precedence to for > IS NULL and related productions is "IS". The others are just listed > there to save attaching explicit %prec declarations to those productions. > This seems like a bad idea, because attaching a precedence to a terminal > symbol that doesn't absolutely have to have one is just asking for > trouble: it can cause bison to accept grammars that would better have > provoked a shift/reduce error, and to silently resolve the ambiguity in > a way that you maybe didn't expect. > > So I thought to myself that it'd be better to remove the unnecessary > precedence markings, and tried, with the attached patch. And behold, > I got a shift/reduce conflict: > > state 2788 > > 1569 b_expr: b_expr qual_Op . b_expr > 1571 | b_expr qual_Op . > > NULL_P shift, and go to state 1010 > NULL_P [reduce using rule 1571 (b_expr)] > > So the god of unintended consequences has been here already. What this > is telling us is that in examples such as > > CREATE TABLE foo (f1 int DEFAULT 10 %% NULL); > > it is not immediately clear to bison whether to shift upon seeing the > NULL (which leads to a parse tree that says %% is an infix operator with > arguments 10 and NULL), or to reduce (which leads to a parse tree that > says that %% is a postfix operator with argument 10, and NULL is a > column declaration constraint separate from the DEFAULT expression). > > If you try the experiment, you find out that the first interpretation is > preferred by the current grammar: > > ERROR: operator does not exist: integer %% unknown Yeah, IIRC the default action for a shift/reduce conflict is to shift, as it's usually the right thing to do. > Now, this is probably a good thing, because NULL as a column declaration > constraint is not SQL standard (only NOT NULL is), so we're resolving > the ambiguity in a way that's more likely to be SQL-compatible. But it > could be surprising to somebody who expected the other behavior, > especially since this seemingly-closely-related command is parsed the > other way: > > CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL); > ERROR: operator does not exist: integer %% > > And the reason for that difference in behavior is that NOT has a > declared precedence lower than POSTFIXOP, whereas NULL has a declared > precedence that's higher. That comparison determines how bison resolves > the shift/reduce conflict. > > The fact that this behavior falls out of a precedence declaration that's > seemingly quite unrelated, and was surely not designed with this case in > mind, is a perfect example of why I say that precedence declarations are > hazardous. > > So I'd still like to get rid of the precedence markings for TRUE_P, > FALSE_P, and UNKNOWN, and will do so unless somebody has a really good > reason not to. (It looks like we could avoid marking ZONE, too.) But > I would be happier if we could also not mark NULL, because that's surely > used in a lot of other places, and could easily bite us a lot harder > than this. Can anyone think of an alternative way to resolve this > particular conflict without the blunt instrument of a precedence marking? > > My bison-fu is a bit rusty, but years ago I could do this stuff in my sleep. I'll be surprised if there isn't a way. If we do need a precedence setting for NULL_P, then I think it should probably be on its own and not sharing one with IS. If you don't solve it soon I'll take a look after I clear a couple of higher priority items from my list. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 05/04/2011 07:39 PM, Tom Lane wrote: >> If you try the experiment, you find out that the first interpretation is >> preferred by the current grammar: >> ERROR: operator does not exist: integer %% unknown > Yeah, IIRC the default action for a shift/reduce conflict is to shift, > as it's usually the right thing to do. Well, there's nothing "default" about it: we've got precedence declarations that specifically tell it to do that. I'm just disturbed because this isn't what those precedences were meant to do. >> I would be happier if we could also not mark NULL, because that's surely >> used in a lot of other places, and could easily bite us a lot harder >> than this. Can anyone think of an alternative way to resolve this >> particular conflict without the blunt instrument of a precedence marking? > My bison-fu is a bit rusty, but years ago I could do this stuff in my > sleep. I'll be surprised if there isn't a way. > If we do need a precedence setting for NULL_P, then I think it should > probably be on its own and not sharing one with IS. Yeah, I was thinking that too. If we put %prec on the IS [NOT] NULL productions then there is no need for NULL_P to have exactly its current precedence; anything above POSTFIXOP would preserve the current behavior in the DEFAULT ... NULL case. (And if we decided we wanted to flip that behavior, anything below POSTFIXOP would do that.) But it would still make life safer for future grammar hacking if it didn't have precedence at all. BTW, I wonder why NOTNULL and ISNULL have their own precedence levels, rather than being made to act exactly like IS [NOT] NULL ... regards, tom lane
On Thu, May 5, 2011 at 12:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > So I'd still like to get rid of the precedence markings for TRUE_P, > FALSE_P, and UNKNOWN, and will do so unless somebody has a really good > reason not to. (It looks like we could avoid marking ZONE, too.) But > I would be happier if we could also not mark NULL, because that's surely > used in a lot of other places, and could easily bite us a lot harder > than this. Can anyone think of an alternative way to resolve this > particular conflict without the blunt instrument of a precedence marking? > Isn't there already some gadget which forces postfix operators to be discouraged compared to some other interpretation in other cases? That would be the opposite of the current interpretation though which you said you preferred. -- greg
Greg Stark <gsstark@mit.edu> writes: > Isn't there already some gadget which forces postfix operators to be > discouraged compared to some other interpretation in other cases? Yeah. I'm not unhappy with the current grammar's behavior in this case. What's bothering me is that the implementation seems likely to create surprising/unexpected behaviors after future grammar changes. regards, tom lane
On Thu, May 5, 2011 at 4:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> Isn't there already some gadget which forces postfix operators to be >> discouraged compared to some other interpretation in other cases? > > Yeah. I'm not unhappy with the current grammar's behavior in this case. > What's bothering me is that the implementation seems likely to create > surprising/unexpected behaviors after future grammar changes. I do wonder how much we really gain from having postfix operators. Other than ! I've never seen one and of course anyone who wanted to use one could just as easily use a prefix operator. In practice I think most unary operators are just special cases of binary operators anyways and often once you have the binary operator it's clearer to just use that anyways. A *lot* of grammar conflicts we've had to worry about end up going away if we didn't have postfix operators. -- greg
I wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> If we do need a precedence setting for NULL_P, then I think it should >> probably be on its own and not sharing one with IS. > Yeah, I was thinking that too. If we put %prec on the IS [NOT] NULL > productions then there is no need for NULL_P to have exactly its current > precedence; anything above POSTFIXOP would preserve the current behavior > in the DEFAULT ... NULL case. (And if we decided we wanted to flip that > behavior, anything below POSTFIXOP would do that.) On reflection I decided that the best quick-fix is to put NULL into the list of keywords that are already precedence-grouped with IDENT. That at least makes sure that it has precedence behavior equivalent to any plain old non-keyword. If you can find a better fix, maybe we could apply it to the other cases mentioned there as well. > BTW, I wonder why NOTNULL and ISNULL have their own precedence levels, > rather than being made to act exactly like IS [NOT] NULL ... Is anybody up for changing that, or should we leave well enough alone? regards, tom lane