Thread: BUG #15555: Syntax errors when using the COMMENT command in plpgsqland a "comment" variable

BUG #15555: Syntax errors when using the COMMENT command in plpgsqland a "comment" variable

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15555
Logged by:          Feike Steenbergen
Email address:      feikesteenbergen@gmail.com
PostgreSQL version: 11.1
Operating system:   CentOS Linux release 7.5.1804 (Core)
Description:

Recently, I ran into an issue when trying to put comments on objects in
plpgsql DO block, for example the following:

DO $$
DECLARE
    "comment" text := 'This is a comment';
BEGIN
    COMMENT ON TABLE abc IS 'This is another comment';
END;
$$;

Generates the following error message:

ERROR:  42601: syntax error at or near "ON"
LINE 5:     COMMENT ON TABLE abc IS 'This is another comment';

Renaming the variable from "comment" to something else makes the problem go
away,
as well as wrapping the COMMENT command in an EXECUTE call;
It happens both when double quoting the identifier, or when omitting the
double
quotes.

For now I'll just avoid using comment as a variable name altogether, but I
was
surprised by this behaviour.

Kind regards,

Feike


=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> Recently, I ran into an issue when trying to put comments on objects in
> plpgsql DO block, for example the following:
>
> DO $$
> DECLARE
>     "comment" text := 'This is a comment';
> BEGIN
>     COMMENT ON TABLE abc IS 'This is another comment';
> END;
> $$;
>
> Generates the following error message:
> ERROR:  42601: syntax error at or near "ON"
> LINE 5:     COMMENT ON TABLE abc IS 'This is another comment';
>
> Renaming the variable from "comment" to something else makes the problem go
> away,

This isn't specific to COMMENT, you can cause problems with variables
named like any statement-introducing keyword.  For instance

regression=# do $$
declare update int;
begin
  update foo set bar = 42;
end$$;
ERROR:  syntax error at or near "foo"
LINE 4:   update foo set bar = 42;
                 ^

On the other hand, such a variable works fine as long as you use it
as a variable:

regression=# do $$
declare update int;
begin
  update := 42;
  raise notice 'update = %', update;
end$$;
NOTICE:  update = 42
DO

One idea is to get rid of the ambiguity by making all such words reserved
so far as plpgsql is concerned, but nobody would thank us for that.
It would be a maintenance problem too, because the plpgsql parser doesn't
otherwise have a list of such keywords.

I wonder whether we could improve matters by adjusting the heuristic for
such things in pl_scanner.c:

             * If we are at start of statement, prefer unreserved keywords
             * over variable names, unless the next token is assignment or
             * '[', in which case prefer variable names.  (Note we need not
             * consider '.' as the next token; that case was handled above,
             * and we always prefer variable names in that case.)  If we are
             * not at start of statement, always prefer variable names over
             * unreserved keywords.

The trouble with special-casing unreserved keywords here is precisely
that those only include words that are special to plpgsql, not words
that introduce statements of the main SQL grammar.

Maybe, if we are at start of statement, we should not even consider the
possibility of matching an unqualified identifier to a variable name
unless the next token is assignment or '['.  IOW, the logic here would
become (1) if not at statement start, or if next token is assignment or
'[', see if the identifier matches a variable name.  (2) If not, see if
it matches an unreserved plpgsql keyword.  (3) If not, assume it is a SQL
keyword.

Are there any other cases where recognizing a variable name at statement
start is the correct thing to do?  Even if it's not correct, could this
result in worse error messages?  I think you probably just end up with
"syntax error" whenever we guess wrong, but I might be missing something.

            regards, tom lane


I wrote:
> I wonder whether we could improve matters by adjusting the heuristic for
> such things in pl_scanner.c:
>
>              * If we are at start of statement, prefer unreserved keywords
>              * over variable names, unless the next token is assignment or
>              * '[', in which case prefer variable names.  (Note we need not
>              * consider '.' as the next token; that case was handled above,
>              * and we always prefer variable names in that case.)  If we are
>              * not at start of statement, always prefer variable names over
>              * unreserved keywords.

Digging in the commit history, that code all dates to commit bb1b8f69,
which arose from a discussion about making plpgsql keywords unreserved:

https://www.postgresql.org/message-id/5030.1416778830%40sss.pgh.pa.us

AFAICS, the question of conflicts against core-parser keywords didn't
come up at all in that thread, so it's not surprising we didn't consider
that case.  I don't see any indication that we explicitly rejected
doing something for that case.

There's still the question of whether there are any cases where we
*need* to recognize a variable name that's at statement start and
isn't followed by assignment or '['.

            regards, tom lane


I wrote:
>> I wonder whether we could improve matters by adjusting the heuristic for
>> such things in pl_scanner.c:
>> 
>>         * If we are at start of statement, prefer unreserved keywords
>>         * over variable names, unless the next token is assignment or
>>         * '[', in which case prefer variable names.  (Note we need not
>>         * consider '.' as the next token; that case was handled above,
>>         * and we always prefer variable names in that case.)  If we are
>>         * not at start of statement, always prefer variable names over
>>         * unreserved keywords.

[ pokes at that for a bit ]  The logic here is a bit denser than
one could wish, but here's a draft patch that seems to get the
job done.  It passes check-world, which isn't conclusive but
at least suggests that this doesn't break anything.

I'll add this to the next CF in hopes that somebody will try to
break it.

            regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 8bacc74..03e7c6c 100644
*** a/src/pl/plpgsql/src/pl_comp.c
--- b/src/pl/plpgsql/src/pl_comp.c
*************** make_datum_param(PLpgSQL_expr *expr, int
*** 1353,1358 ****
--- 1353,1361 ----
   * yytxt is the original token text; we need this to check for quoting,
   * so that later checks for unreserved keywords work properly.
   *
+  * We attempt to recognize the token as a variable only if lookup is true
+  * and the plpgsql_IdentifierLookup context permits it.
+  *
   * If recognized as a variable, fill in *wdatum and return true;
   * if not recognized, fill in *word and return false.
   * (Note: those two pointers actually point to members of the same union,
*************** make_datum_param(PLpgSQL_expr *expr, int
*** 1360,1376 ****
   * ----------
   */
  bool
! plpgsql_parse_word(char *word1, const char *yytxt,
                     PLwdatum *wdatum, PLword *word)
  {
      PLpgSQL_nsitem *ns;

      /*
!      * We should do nothing in DECLARE sections.  In SQL expressions, there's
!      * no need to do anything either --- lookup will happen when the
!      * expression is compiled.
       */
!     if (plpgsql_IdentifierLookup == IDENTIFIER_LOOKUP_NORMAL)
      {
          /*
           * Do a lookup in the current namespace stack
--- 1363,1379 ----
   * ----------
   */
  bool
! plpgsql_parse_word(char *word1, const char *yytxt, bool lookup,
                     PLwdatum *wdatum, PLword *word)
  {
      PLpgSQL_nsitem *ns;

      /*
!      * We should not lookup variables in DECLARE sections.  In SQL
!      * expressions, there's no need to do so either --- lookup will happen
!      * when the expression is compiled.
       */
!     if (lookup && plpgsql_IdentifierLookup == IDENTIFIER_LOOKUP_NORMAL)
      {
          /*
           * Do a lookup in the current namespace stack
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index ab18946..3e746d1 100644
*** a/src/pl/plpgsql/src/pl_scanner.c
--- b/src/pl/plpgsql/src/pl_scanner.c
*************** plpgsql_yylex(void)
*** 328,333 ****
--- 328,334 ----
                  push_back_token(tok2, &aux2);
                  if (plpgsql_parse_word(aux1.lval.str,
                                         core_yy.scanbuf + aux1.lloc,
+                                        true,
                                         &aux1.lval.wdatum,
                                         &aux1.lval.word))
                      tok1 = T_DATUM;
*************** plpgsql_yylex(void)
*** 349,401 ****
              push_back_token(tok2, &aux2);

              /*
!              * If we are at start of statement, prefer unreserved keywords
!              * over variable names, unless the next token is assignment or
!              * '[', in which case prefer variable names.  (Note we need not
!              * consider '.' as the next token; that case was handled above,
!              * and we always prefer variable names in that case.)  If we are
!              * not at start of statement, always prefer variable names over
!              * unreserved keywords.
               */
!             if (AT_STMT_START(plpgsql_yytoken) &&
!                 !(tok2 == '=' || tok2 == COLON_EQUALS || tok2 == '['))
              {
!                 /* try for unreserved keyword, then for variable name */
!                 if (core_yy.scanbuf[aux1.lloc] != '"' &&
!                     (kw = ScanKeywordLookup(aux1.lval.str,
!                                             unreserved_keywords,
!                                             num_unreserved_keywords)))
!                 {
!                     aux1.lval.keyword = kw->name;
!                     tok1 = kw->value;
!                 }
!                 else if (plpgsql_parse_word(aux1.lval.str,
!                                             core_yy.scanbuf + aux1.lloc,
!                                             &aux1.lval.wdatum,
!                                             &aux1.lval.word))
!                     tok1 = T_DATUM;
!                 else
!                     tok1 = T_WORD;
              }
              else
!             {
!                 /* try for variable name, then for unreserved keyword */
!                 if (plpgsql_parse_word(aux1.lval.str,
!                                        core_yy.scanbuf + aux1.lloc,
!                                        &aux1.lval.wdatum,
!                                        &aux1.lval.word))
!                     tok1 = T_DATUM;
!                 else if (!aux1.lval.word.quoted &&
!                          (kw = ScanKeywordLookup(aux1.lval.word.ident,
!                                                  unreserved_keywords,
!                                                  num_unreserved_keywords)))
!                 {
!                     aux1.lval.keyword = kw->name;
!                     tok1 = kw->value;
!                 }
!                 else
!                     tok1 = T_WORD;
!             }
          }
      }
      else
--- 350,389 ----
              push_back_token(tok2, &aux2);

              /*
!              * See if it matches a variable name, except in the context where
!              * we are at start of statement and the next token isn't
!              * assignment or '['.  In that case, it couldn't validly be a
!              * variable name, and skipping the lookup allows variable names to
!              * be used that would conflict with plpgsql or core keywords that
!              * introduce statements (e.g., "comment").  Without this special
!              * logic, every statement-introducing keyword would effectively be
!              * reserved in PL/pgSQL, which would be unpleasant.
!              *
!              * If it isn't a variable name, try to match against unreserved
!              * plpgsql keywords.  If not one of those either, it's T_WORD.
!              *
!              * Note: we must call plpgsql_parse_word even if we don't want to
!              * do variable lookup, because it sets up aux1.lval.word for the
!              * non-variable cases.
               */
!             if (plpgsql_parse_word(aux1.lval.str,
!                                    core_yy.scanbuf + aux1.lloc,
!                                    (!AT_STMT_START(plpgsql_yytoken) ||
!                                     (tok2 == '=' || tok2 == COLON_EQUALS ||
!                                      tok2 == '[')),
!                                    &aux1.lval.wdatum,
!                                    &aux1.lval.word))
!                 tok1 = T_DATUM;
!             else if (!aux1.lval.word.quoted &&
!                      (kw = ScanKeywordLookup(aux1.lval.word.ident,
!                                              unreserved_keywords,
!                                              num_unreserved_keywords)))
              {
!                 aux1.lval.keyword = kw->name;
!                 tok1 = kw->value;
              }
              else
!                 tok1 = T_WORD;
          }
      }
      else
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 42177cc..962b9c2 100644
*** a/src/pl/plpgsql/src/plpgsql.h
--- b/src/pl/plpgsql/src/plpgsql.h
*************** extern PLpgSQL_function *plpgsql_compile
*** 1175,1181 ****
  extern PLpgSQL_function *plpgsql_compile_inline(char *proc_source);
  extern void plpgsql_parser_setup(struct ParseState *pstate,
                       PLpgSQL_expr *expr);
! extern bool plpgsql_parse_word(char *word1, const char *yytxt,
                     PLwdatum *wdatum, PLword *word);
  extern bool plpgsql_parse_dblword(char *word1, char *word2,
                        PLwdatum *wdatum, PLcword *cword);
--- 1175,1181 ----
  extern PLpgSQL_function *plpgsql_compile_inline(char *proc_source);
  extern void plpgsql_parser_setup(struct ParseState *pstate,
                       PLpgSQL_expr *expr);
! extern bool plpgsql_parse_word(char *word1, const char *yytxt, bool lookup,
                     PLwdatum *wdatum, PLword *word);
  extern bool plpgsql_parse_dblword(char *word1, char *word2,
                        PLwdatum *wdatum, PLcword *cword);
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index b9a987d..d2f68e1 100644
*** a/src/test/regress/expected/plpgsql.out
--- b/src/test/regress/expected/plpgsql.out
*************** select unreserved_test();
*** 4781,4786 ****
--- 4781,4807 ----
                43
  (1 row)

+ create or replace function unreserved_test() returns int as $$
+ declare
+   comment int := 21;
+ begin
+   comment := comment * 2;
+   comment on function unreserved_test() is 'this is a test';
+   return comment;
+ end
+ $$ language plpgsql;
+ select unreserved_test();
+  unreserved_test
+ -----------------
+               42
+ (1 row)
+
+ select obj_description('unreserved_test()'::regprocedure, 'pg_proc');
+  obj_description
+ -----------------
+  this is a test
+ (1 row)
+
  drop function unreserved_test();
  --
  -- Test FOREACH over arrays
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index 01239e2..9c8cf75 100644
*** a/src/test/regress/sql/plpgsql.sql
--- b/src/test/regress/sql/plpgsql.sql
*************** $$ language plpgsql;
*** 3892,3897 ****
--- 3892,3911 ----

  select unreserved_test();

+ create or replace function unreserved_test() returns int as $$
+ declare
+   comment int := 21;
+ begin
+   comment := comment * 2;
+   comment on function unreserved_test() is 'this is a test';
+   return comment;
+ end
+ $$ language plpgsql;
+
+ select unreserved_test();
+
+ select obj_description('unreserved_test()'::regprocedure, 'pg_proc');
+
  drop function unreserved_test();

  --

On 17/12/2018 23:05, Tom Lane wrote:
> [ pokes at that for a bit ]  The logic here is a bit denser than
> one could wish, but here's a draft patch that seems to get the
> job done.  It passes check-world, which isn't conclusive but
> at least suggests that this doesn't break anything.

This looks good to me.  I can't think of any other syntax scenarios that
it needs to address.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 17/12/2018 23:05, Tom Lane wrote:
>> [ pokes at that for a bit ]  The logic here is a bit denser than
>> one could wish, but here's a draft patch that seems to get the
>> job done.  It passes check-world, which isn't conclusive but
>> at least suggests that this doesn't break anything.

> This looks good to me.  I can't think of any other syntax scenarios that
> it needs to address.

Pushed, thanks for checking!

            regards, tom lane