Re: Replacing plpgsql's lexer - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Replacing plpgsql's lexer
Msg-id: 15478.1240159364@sss.pgh.pa.us
In response to: Re: Replacing plpgsql's lexer (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Replacing plpgsql's lexer (Greg Stark <stark@enterprisedb.com>)
           Re: Replacing plpgsql's lexer (Grzegorz Jaskiewicz <gj@pointblue.com.pl>)
List: pgsql-hackers

I wrote:
> So I think we are down to a choice of doing nothing for 8.4, or teaching
> the existing plpgsql lexer about standard_conforming_strings.  Assuming
> the current proposal for U& literals holds up, it should not be
> necessary for plpgsql to know about those explicitly as long as it obeys
> standard_conforming_strings, so this might not be too horrid a project.
> I'll take a look at that next.
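
To illustrate why the lexer has to know about standard_conforming_strings at all, here is a minimal standalone sketch of how the setting changes where a string token ends. It is not code from the patch: the function name string_token_length and its signature are hypothetical, and it deliberately ignores literal continuation across newlines. It assumes only the documented rule that backslash is an escape character in E'' literals always, and in plain literals only when standard_conforming_strings is off.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the string-literal token that
 * starts at s, or -1 if s does not start a literal or it is unterminated.
 */
long
string_token_length(const char *s, bool standard_conforming)
{
    size_t      i;
    bool        escapes;    /* true when backslash escapes the next char */

    if ((s[0] == 'e' || s[0] == 'E') && s[1] == '\'')
    {
        escapes = true;                     /* E'' literal: always escapes */
        i = 2;
    }
    else if (s[0] == '\'')
    {
        escapes = !standard_conforming;     /* plain literal: depends on GUC */
        i = 1;
    }
    else
        return -1;

    while (s[i] != '\0')
    {
        if (escapes && s[i] == '\\' && s[i + 1] != '\0')
            i += 2;                         /* backslash plus escaped char */
        else if (s[i] == '\'' && s[i + 1] == '\'')
            i += 2;                         /* doubled quote inside literal */
        else if (s[i] == '\'')
            return (long) (i + 1);          /* closing quote */
        else
            i++;
    }
    return -1;                              /* ran off the end of the input */
}
```

For the input 'a\'b' the token is four characters long when standard_conforming_strings is on (the backslash is ordinary data and the second quote closes the literal) but six characters long when it is off, so a lexer that ignores the setting can disagree with the core about token boundaries.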

The attached proposed patch rips out plpgsql's handling of comments and
string literals, and puts in scanner rules that are extracted from the
core lexer (but simplified in a few places where we don't need all the
complexity).  The net user-visible effects should be:

* Both regular and E'' literals should now be parsed exactly the same
way the core lexer parses them.

* Nested slash-star comments are now handled properly.

* Warnings and errors associated with string parsing should now match
the core, which means they might vary a bit from previous plpgsql
behavior.
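
The nested-comment behavior in the second item can be sketched outside flex with a simple depth counter, much like the xcdepth variable in the attached scan.l changes. This is only an illustration under that assumption: the function name nested_comment_length is hypothetical and does not appear in the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the slash-star comment that
 * starts at s, honoring nesting, or -1 if s does not start a comment or
 * the comment is never closed.
 */
long
nested_comment_length(const char *s)
{
    size_t      i = 2;
    int         depth = 0;  /* nesting depth below the outermost comment */

    if (s[0] != '/' || s[1] != '*')
        return -1;

    while (s[i] != '\0')
    {
        if (s[i] == '/' && s[i + 1] == '*')
        {
            depth++;                /* an inner comment opens */
            i += 2;
        }
        else if (s[i] == '*' && s[i + 1] == '/')
        {
            i += 2;
            if (depth == 0)
                return (long) i;    /* the outermost comment closes */
            depth--;                /* an inner comment closes */
        }
        else
            i++;                    /* ordinary comment text */
    }
    return -1;                      /* unterminated comment */
}
```

A scanner without the counter would treat the first */ as the end of the comment; with it, the comment in "/* a /* b */ c */ x" spans 17 characters and only the trailing " x" is left for further lexing.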

I need to test this a bit more, and it could probably do with adding
a few regression test cases, but I think it's code-complete.

Comments?

            regards, tom lane

PS: in passing I got rid of the scanner_functype/scanner_typereported
kluge, which might once have had some purpose but now is just cluttering
both the scanner and the grammar.  That change is a leftover from my failed
attempt at removing the scanner altogether.  Since it simplifies the
code I thought I'd keep it.

Index: src/pl/plpgsql/src/gram.y
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/gram.y,v
retrieving revision 1.121
diff -c -r1.121 gram.y
*** src/pl/plpgsql/src/gram.y    18 Feb 2009 11:33:04 -0000    1.121
--- src/pl/plpgsql/src/gram.y    19 Apr 2009 16:28:04 -0000
***************
*** 62,67 ****
--- 62,69 ----
                                             int lineno);
  static    void             check_sql_expr(const char *stmt);
  static    void             plpgsql_sql_error_callback(void *arg);
+ static    char            *parse_string_token(const char *token);
+ static    void             plpgsql_string_error_callback(void *arg);
  static    char            *check_label(const char *yytxt);
  static    void             check_labels(const char *start_label,
                                        const char *end_label);
***************
*** 228,235 ****
          /*
           * Other tokens
           */
- %token    T_FUNCTION
- %token    T_TRIGGER
  %token    T_STRING
  %token    T_NUMBER
  %token    T_SCALAR                /* a VAR, RECFIELD, or TRIGARG */
--- 230,235 ----
***************
*** 244,256 ****

  %%

! pl_function        : T_FUNCTION comp_optsect pl_block opt_semi
                      {
!                         yylval.program = (PLpgSQL_stmt_block *)$3;
!                     }
!                 | T_TRIGGER comp_optsect pl_block opt_semi
!                     {
!                         yylval.program = (PLpgSQL_stmt_block *)$3;
                      }
                  ;

--- 244,252 ----

  %%

! pl_function        : comp_optsect pl_block opt_semi
                      {
!                         yylval.program = (PLpgSQL_stmt_block *) $2;
                      }
                  ;

***************
*** 1403,1409 ****
                              if (tok == T_STRING)
                              {
                                  /* old style message and parameters */
!                                 new->message = plpgsql_get_string_value();
                                  /*
                                   * We expect either a semi-colon, which
                                   * indicates no parameters, or a comma that
--- 1399,1405 ----
                              if (tok == T_STRING)
                              {
                                  /* old style message and parameters */
!                                 new->message = parse_string_token(yytext);
                                  /*
                                   * We expect either a semi-colon, which
                                   * indicates no parameters, or a comma that
***************
*** 1435,1441 ****

                                      if (yylex() != T_STRING)
                                          yyerror("syntax error");
!                                     sqlstatestr = plpgsql_get_string_value();

                                      if (strlen(sqlstatestr) != 5)
                                          yyerror("invalid SQLSTATE code");
--- 1431,1437 ----

                                      if (yylex() != T_STRING)
                                          yyerror("syntax error");
!                                     sqlstatestr = parse_string_token(yytext);

                                      if (strlen(sqlstatestr) != 5)
                                          yyerror("invalid SQLSTATE code");
***************
*** 1778,1784 ****
                              /* next token should be a string literal */
                              if (yylex() != T_STRING)
                                  yyerror("syntax error");
!                             sqlstatestr = plpgsql_get_string_value();

                              if (strlen(sqlstatestr) != 5)
                                  yyerror("invalid SQLSTATE code");
--- 1774,1780 ----
                              /* next token should be a string literal */
                              if (yylex() != T_STRING)
                                  yyerror("syntax error");
!                             sqlstatestr = parse_string_token(yytext);

                              if (strlen(sqlstatestr) != 5)
                                  yyerror("invalid SQLSTATE code");
***************
*** 2738,2743 ****
--- 2734,2782 ----
      errposition(0);
  }

+ /*
+  * Convert a string-literal token to the represented string value.
+  *
+  * To do this, we need to invoke the core lexer.  To avoid confusion between
+  * the core bison/flex definitions and our own, the actual invocation is in
+  * pl_funcs.c.  Here we are only concerned with setting up the right errcontext
+  * state, which is handled the same as in check_sql_expr().
+  */
+ static char *
+ parse_string_token(const char *token)
+ {
+     char       *result;
+     ErrorContextCallback  syntax_errcontext;
+     ErrorContextCallback *previous_errcontext;
+
+     /* See comments in check_sql_expr() */
+     Assert(error_context_stack->callback == plpgsql_compile_error_callback);
+
+     previous_errcontext = error_context_stack;
+     syntax_errcontext.callback = plpgsql_string_error_callback;
+     syntax_errcontext.arg = (char *) token;
+     syntax_errcontext.previous = error_context_stack->previous;
+     error_context_stack = &syntax_errcontext;
+
+     result = plpgsql_parse_string_token(token);
+
+     /* Restore former ereport callback */
+     error_context_stack = previous_errcontext;
+
+     return result;
+ }
+
+ static void
+ plpgsql_string_error_callback(void *arg)
+ {
+     Assert(plpgsql_error_funcname);
+
+     errcontext("String literal in PL/PgSQL function \"%s\" near line %d",
+                plpgsql_error_funcname, plpgsql_error_lineno);
+     /* representing the string literal as internalquery seems overkill */
+     errposition(0);
+ }
+
  static char *
  check_label(const char *yytxt)
  {
Index: src/pl/plpgsql/src/pl_comp.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_comp.c,v
retrieving revision 1.134
diff -c -r1.134 pl_comp.c
*** src/pl/plpgsql/src/pl_comp.c    18 Feb 2009 11:33:04 -0000    1.134
--- src/pl/plpgsql/src/pl_comp.c    19 Apr 2009 16:28:04 -0000
***************
*** 261,267 ****
             bool forValidator)
  {
      Form_pg_proc procStruct = (Form_pg_proc) GETSTRUCT(procTup);
!     int            functype = CALLED_AS_TRIGGER(fcinfo) ? T_TRIGGER : T_FUNCTION;
      Datum        prosrcdatum;
      bool        isnull;
      char       *proc_source;
--- 261,267 ----
             bool forValidator)
  {
      Form_pg_proc procStruct = (Form_pg_proc) GETSTRUCT(procTup);
!     bool        is_trigger = CALLED_AS_TRIGGER(fcinfo);
      Datum        prosrcdatum;
      bool        isnull;
      char       *proc_source;
***************
*** 293,299 ****
      if (isnull)
          elog(ERROR, "null prosrc");
      proc_source = TextDatumGetCString(prosrcdatum);
!     plpgsql_scanner_init(proc_source, functype);

      plpgsql_error_funcname = pstrdup(NameStr(procStruct->proname));
      plpgsql_error_lineno = 0;
--- 293,299 ----
      if (isnull)
          elog(ERROR, "null prosrc");
      proc_source = TextDatumGetCString(prosrcdatum);
!     plpgsql_scanner_init(proc_source);

      plpgsql_error_funcname = pstrdup(NameStr(procStruct->proname));
      plpgsql_error_lineno = 0;
***************
*** 359,371 ****
      function->fn_oid = fcinfo->flinfo->fn_oid;
      function->fn_xmin = HeapTupleHeaderGetXmin(procTup->t_data);
      function->fn_tid = procTup->t_self;
!     function->fn_functype = functype;
      function->fn_cxt = func_cxt;
      function->out_param_varno = -1;        /* set up for no OUT param */

!     switch (functype)
      {
!         case T_FUNCTION:

              /*
               * Fetch info about the procedure's parameters. Allocations aren't
--- 359,371 ----
      function->fn_oid = fcinfo->flinfo->fn_oid;
      function->fn_xmin = HeapTupleHeaderGetXmin(procTup->t_data);
      function->fn_tid = procTup->t_self;
!     function->fn_is_trigger = is_trigger;
      function->fn_cxt = func_cxt;
      function->out_param_varno = -1;        /* set up for no OUT param */

!     switch (is_trigger)
      {
!         case false:

              /*
               * Fetch info about the procedure's parameters. Allocations aren't
***************
*** 564,570 ****
              ReleaseSysCache(typeTup);
              break;

!         case T_TRIGGER:
              /* Trigger procedure's return type is unknown yet */
              function->fn_rettype = InvalidOid;
              function->fn_retbyval = false;
--- 564,570 ----
              ReleaseSysCache(typeTup);
              break;

!         case true:
              /* Trigger procedure's return type is unknown yet */
              function->fn_rettype = InvalidOid;
              function->fn_retbyval = false;
***************
*** 645,651 ****
              break;

          default:
!             elog(ERROR, "unrecognized function typecode: %u", functype);
              break;
      }

--- 645,651 ----
              break;

          default:
!             elog(ERROR, "unrecognized function typecode: %d", (int) is_trigger);
              break;
      }

***************
*** 790,796 ****
       * Recognize tg_argv when compiling triggers
       * (XXX this sucks, it should be a regular variable in the namestack)
       */
!     if (plpgsql_curr_compile->fn_functype == T_TRIGGER)
      {
          if (strcmp(cp[0], "tg_argv") == 0)
          {
--- 790,796 ----
       * Recognize tg_argv when compiling triggers
       * (XXX this sucks, it should be a regular variable in the namestack)
       */
!     if (plpgsql_curr_compile->fn_is_trigger)
      {
          if (strcmp(cp[0], "tg_argv") == 0)
          {
Index: src/pl/plpgsql/src/pl_funcs.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_funcs.c,v
retrieving revision 1.76
diff -c -r1.76 pl_funcs.c
*** src/pl/plpgsql/src/pl_funcs.c    18 Feb 2009 11:33:04 -0000    1.76
--- src/pl/plpgsql/src/pl_funcs.c    19 Apr 2009 16:28:04 -0000
***************
*** 17,22 ****
--- 17,24 ----

  #include <ctype.h>

+ #include "parser/gramparse.h"
+ #include "parser/gram.h"
  #include "parser/scansup.h"


***************
*** 460,465 ****
--- 462,502 ----


  /*
+  * plpgsql_parse_string_token - get the value represented by a string literal
+  *
+  * We do not make plpgsql's lexer produce the represented value, because
+  * in many cases we don't need it.  Instead this function is invoked when
+  * we do need it.  The input is the T_STRING token as identified by the lexer.
+  *
+  * The result is a palloc'd string.
+  *
+  * Note: this is called only from plpgsql's gram.y, but we can't just put it
+  * there because including parser/gram.h there would cause confusion.
+  */
+ char *
+ plpgsql_parse_string_token(const char *token)
+ {
+     int        ctoken;
+
+     /*
+      * We use the core lexer to do the dirty work.  Aside from getting the
+      * right results for escape sequences and so on, this helps us produce
+      * appropriate warnings for escape_string_warning etc.
+      */
+     scanner_init(token);
+
+     ctoken = base_yylex();
+
+     if (ctoken != SCONST)
+         elog(ERROR, "unexpected result from base lexer: %d", ctoken);
+
+     scanner_finish();
+
+     return base_yylval.str;
+ }
+
+
+ /*
   * Statement type as a string, for use in error messages etc.
   */
  const char *
Index: src/pl/plpgsql/src/plpgsql.h
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/plpgsql.h,v
retrieving revision 1.110
diff -c -r1.110 plpgsql.h
*** src/pl/plpgsql/src/plpgsql.h    9 Apr 2009 02:57:53 -0000    1.110
--- src/pl/plpgsql/src/plpgsql.h    19 Apr 2009 16:28:04 -0000
***************
*** 650,656 ****
      Oid            fn_oid;
      TransactionId fn_xmin;
      ItemPointerData fn_tid;
!     int            fn_functype;
      PLpgSQL_func_hashkey *fn_hashkey;    /* back-link to hashtable key */
      MemoryContext fn_cxt;

--- 650,656 ----
      Oid            fn_oid;
      TransactionId fn_xmin;
      ItemPointerData fn_tid;
!     bool        fn_is_trigger;
      PLpgSQL_func_hashkey *fn_hashkey;    /* back-link to hashtable key */
      MemoryContext fn_cxt;

***************
*** 880,885 ****
--- 880,886 ----
   * ----------
   */
  extern void plpgsql_convert_ident(const char *s, char **output, int numidents);
+ extern char *plpgsql_parse_string_token(const char *token);
  extern const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
  extern void plpgsql_dumptree(PLpgSQL_function *func);

***************
*** 894,901 ****
  extern void plpgsql_push_back_token(int token);
  extern void plpgsql_yyerror(const char *message);
  extern int    plpgsql_scanner_lineno(void);
! extern void plpgsql_scanner_init(const char *str, int functype);
  extern void plpgsql_scanner_finish(void);
- extern char *plpgsql_get_string_value(void);

  #endif   /* PLPGSQL_H */
--- 895,901 ----
  extern void plpgsql_push_back_token(int token);
  extern void plpgsql_yyerror(const char *message);
  extern int    plpgsql_scanner_lineno(void);
! extern void plpgsql_scanner_init(const char *str);
  extern void plpgsql_scanner_finish(void);

  #endif   /* PLPGSQL_H */
Index: src/pl/plpgsql/src/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/scan.l,v
retrieving revision 1.67
diff -c -r1.67 scan.l
*** src/pl/plpgsql/src/scan.l    18 Feb 2009 11:33:04 -0000    1.67
--- src/pl/plpgsql/src/scan.l    19 Apr 2009 16:28:04 -0000
***************
*** 19,45 ****
  #include "mb/pg_wchar.h"


- /* No reason to constrain amount of data slurped */
- #define YY_READ_BUF_SIZE 16777216
-
  /* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
  #undef fprintf
  #define fprintf(file, fmt, msg)  ereport(ERROR, (errmsg_internal("%s", msg)))

  /* Handles to the buffer that the lexer uses internally */
  static YY_BUFFER_STATE scanbufhandle;
  static char *scanbuf;

  static const char *scanstr;        /* original input string */

- static int    scanner_functype;
- static bool    scanner_typereported;
  static int    pushback_token;
  static bool have_pushback_token;
  static const char *cur_line_start;
  static int    cur_line_num;
  static char    *dolqstart;      /* current $foo$ quote start string */
! static int    dolqlen;            /* signal to plpgsql_get_string_value */

  bool plpgsql_SpaceScanned = false;
  %}
--- 19,49 ----
  #include "mb/pg_wchar.h"


  /* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
  #undef fprintf
  #define fprintf(file, fmt, msg)  ereport(ERROR, (errmsg_internal("%s", msg)))

+ /*
+  * When we parse a token that requires multiple lexer rules to process,
+  * remember the token's starting position this way.
+  */
+ #define SAVE_TOKEN_START()  \
+     ( start_lineno = plpgsql_scanner_lineno(), start_charpos = yytext )
+
  /* Handles to the buffer that the lexer uses internally */
  static YY_BUFFER_STATE scanbufhandle;
  static char *scanbuf;

  static const char *scanstr;        /* original input string */

  static int    pushback_token;
  static bool have_pushback_token;
  static const char *cur_line_start;
  static int    cur_line_num;
+ static int        xcdepth = 0;    /* depth of nesting in slash-star comments */
  static char    *dolqstart;      /* current $foo$ quote start string */
!
! extern bool        standard_conforming_strings;

  bool plpgsql_SpaceScanned = false;
  %}
***************
*** 54,84 ****

  %option case-insensitive


! %x    IN_STRING
! %x    IN_COMMENT
! %x    IN_DOLLARQUOTE

  digit            [0-9]
  ident_start        [A-Za-z\200-\377_]
  ident_cont        [A-Za-z\200-\377_0-9\$]

  quoted_ident    (\"[^\"]*\")+

  identifier        ({ident_start}{ident_cont}*|{quoted_ident})

  param            \${digit}+

- space            [ \t\n\r\f]
-
- /* $foo$ style quotes ("dollar quoting")
-  * copied straight from the backend SQL parser
-  */
- dolq_start        [A-Za-z\200-\377_]
- dolq_cont        [A-Za-z\200-\377_0-9]
- dolqdelim        \$({dolq_start}{dolq_cont}*)?\$
- dolqinside        [^$]+
-
  %%
      /* ----------
       * Local variables in scanner to remember where
--- 58,130 ----

  %option case-insensitive

+ /*
+  * Exclusive states are a subset of the core lexer's:
+  *  <xc> extended C-style comments
+  *  <xq> standard quoted strings
+  *  <xe> extended quoted strings (support backslash escape sequences)
+  *  <xdolq> $foo$ quoted strings
+  */
+
+ %x xc
+ %x xe
+ %x xq
+ %x xdolq
+
+ /*
+  * Definitions --- these generally must match the core lexer, but in some
+  * cases we can simplify, since we only care about identifying the token
+  * boundaries and not about deriving the represented value.  Also, we
+  * aren't trying to lex multicharacter operators so their interactions
+  * with comments go away.
+  */
+
+ space            [ \t\n\r\f]
+ horiz_space        [ \t\f]
+ newline            [\n\r]
+ non_newline        [^\n\r]
+
+ comment            ("--"{non_newline}*)
+
+ whitespace        ({space}+|{comment})
+ special_whitespace        ({space}+|{comment}{newline})
+ horiz_whitespace        ({horiz_space}|{comment})
+ whitespace_with_newline    ({horiz_whitespace}*{newline}{special_whitespace}*)
+
+ quote            '
+ quotestop        {quote}{whitespace}*
+ quotecontinue    {quote}{whitespace_with_newline}{quote}
+ quotefail        {quote}{whitespace}*"-"
+
+ xestart            [eE]{quote}
+ xeinside        [^\\']+
+ xeescape        [\\].
+
+ xqstart            {quote}
+ xqdouble        {quote}{quote}
+ xqinside        [^']+
+
+ dolq_start        [A-Za-z\200-\377_]
+ dolq_cont        [A-Za-z\200-\377_0-9]
+ dolqdelim        \$({dolq_start}{dolq_cont}*)?\$
+ dolqfailed        \${dolq_start}{dolq_cont}*
+ dolqinside        [^$]+

! xcstart            \/\*
! xcstop            \*+\/
! xcinside        [^*/]+

  digit            [0-9]
  ident_start        [A-Za-z\200-\377_]
  ident_cont        [A-Za-z\200-\377_0-9\$]

+ /* This is a simpler treatment of quoted identifiers than the core uses */
  quoted_ident    (\"[^\"]*\")+

  identifier        ({ident_start}{ident_cont}*|{quoted_ident})

  param            \${digit}+

  %%
      /* ----------
       * Local variables in scanner to remember where
***************
*** 96,112 ****
      plpgsql_SpaceScanned = false;

      /* ----------
-      * On the first call to a new source report the
-      * function's type (T_FUNCTION or T_TRIGGER)
-      * ----------
-      */
-     if (!scanner_typereported)
-     {
-         scanner_typereported = true;
-         return scanner_functype;
-     }
-
-     /* ----------
       * The keyword rules
       * ----------
       */
--- 142,147 ----
***************
*** 225,343 ****

  {digit}+        { return T_NUMBER;            }

! \".                {
!                 plpgsql_error_lineno = plpgsql_scanner_lineno();
!                 ereport(ERROR,
!                         (errcode(ERRCODE_DATATYPE_MISMATCH),
!                          errmsg("unterminated quoted identifier")));
!             }
!
!     /* ----------
!      * Ignore whitespaces but remember this happened
!      * ----------
!      */
! {space}+        { plpgsql_SpaceScanned = true;        }

      /* ----------
!      * Eat up comments
       * ----------
       */
! --[^\r\n]*        ;
!
! \/\*            { start_lineno = plpgsql_scanner_lineno();
!               BEGIN(IN_COMMENT);
!             }
! <IN_COMMENT>\*\/    { BEGIN(INITIAL); plpgsql_SpaceScanned = true; }
! <IN_COMMENT>\n        ;
! <IN_COMMENT>.        ;
! <IN_COMMENT><<EOF>>    {
!                 plpgsql_error_lineno = start_lineno;
!                 ereport(ERROR,
!                         (errcode(ERRCODE_DATATYPE_MISMATCH),
!                          errmsg("unterminated /* comment")));
!             }

      /* ----------
!      * Collect anything inside of ''s and return one STRING token
!      *
!      * Hacking yytext/yyleng here lets us avoid using yymore(), which is
!      * a win for performance.  It's safe because we know the underlying
!      * input buffer is not changing.
       * ----------
       */
! '            {
!               start_lineno = plpgsql_scanner_lineno();
!               start_charpos = yytext;
!               BEGIN(IN_STRING);
!             }
! [eE]'        {
!               /* for now, treat the same as a regular literal */
!               start_lineno = plpgsql_scanner_lineno();
!               start_charpos = yytext;
!               BEGIN(IN_STRING);
!             }
! <IN_STRING>\\.        { }
! <IN_STRING>\\        { /* can only happen with \ at EOF */ }
! <IN_STRING>''        { }
! <IN_STRING>'        {
!               /* tell plpgsql_get_string_value it's not a dollar quote */
!               dolqlen = 0;
!               /* adjust yytext/yyleng to describe whole string token */
!               yyleng += (yytext - start_charpos);
!               yytext = start_charpos;
!               BEGIN(INITIAL);
!               return T_STRING;
!             }
! <IN_STRING>[^'\\]+    { }
! <IN_STRING><<EOF>>    {
!                 plpgsql_error_lineno = start_lineno;
!                 ereport(ERROR,
!                         (errcode(ERRCODE_DATATYPE_MISMATCH),
!                          errmsg("unterminated quoted string")));
!             }
!
! {dolqdelim}        {
!               start_lineno = plpgsql_scanner_lineno();
!               start_charpos = yytext;
!               dolqstart = pstrdup(yytext);
!               BEGIN(IN_DOLLARQUOTE);
!             }
! <IN_DOLLARQUOTE>{dolqdelim} {
!               if (strcmp(yytext, dolqstart) == 0)
!               {
!                     pfree(dolqstart);
!                     /* tell plpgsql_get_string_value it is a dollar quote */
!                     dolqlen = yyleng;
                      /* adjust yytext/yyleng to describe whole string token */
                      yyleng += (yytext - start_charpos);
                      yytext = start_charpos;
-                     BEGIN(INITIAL);
                      return T_STRING;
!               }
!               else
!               {
!                     /*
!                      * When we fail to match $...$ to dolqstart, transfer
!                      * the $... part to the output, but put back the final
!                      * $ for rescanning.  Consider $delim$...$junk$delim$
!                      */
!                     yyless(yyleng-1);
!               }
!             }
! <IN_DOLLARQUOTE>{dolqinside} { }
! <IN_DOLLARQUOTE>.    { /* needed for $ inside the quoted text */ }
! <IN_DOLLARQUOTE><<EOF>>    {
!                 plpgsql_error_lineno = start_lineno;
!                 ereport(ERROR,
!                         (errcode(ERRCODE_DATATYPE_MISMATCH),
!                          errmsg("unterminated dollar-quoted string")));
!             }

      /* ----------
       * Any unmatched character is returned as is
       * ----------
       */
! .            { return yytext[0];            }

  %%

--- 260,393 ----

  {digit}+        { return T_NUMBER;            }

! \".                { yyerror("unterminated quoted identifier"); }

      /* ----------
!      * Ignore whitespace (including comments) but remember this happened
       * ----------
       */
! {whitespace}    { plpgsql_SpaceScanned = true; }

      /* ----------
!      * Comment and literal handling is mostly copied from the core lexer
       * ----------
       */
! {xcstart}        {
!                     /* Set location in case of syntax error in comment */
!                     SAVE_TOKEN_START();
!                     xcdepth = 0;
!                     BEGIN(xc);
!                     plpgsql_SpaceScanned = true;
!                 }
!
! <xc>{xcstart}    {
!                     xcdepth++;
!                 }
!
! <xc>{xcstop}    {
!                     if (xcdepth <= 0)
!                         BEGIN(INITIAL);
!                     else
!                         xcdepth--;
!                 }
!
! <xc>{xcinside}    {
!                     /* ignore */
!                 }
!
! <xc>\/+            {
!                     /* ignore */
!                 }
!
! <xc>\*+            {
!                     /* ignore */
!                 }
!
! <xc><<EOF>>        { yyerror("unterminated /* comment"); }
!
! {xqstart}        {
!                     SAVE_TOKEN_START();
!                     if (standard_conforming_strings)
!                         BEGIN(xq);
!                     else
!                         BEGIN(xe);
!                 }
! {xestart}        {
!                     SAVE_TOKEN_START();
!                     BEGIN(xe);
!                 }
! <xq,xe>{quotestop}    |
! <xq,xe>{quotefail} {
!                     yyless(1);
!                     BEGIN(INITIAL);
                      /* adjust yytext/yyleng to describe whole string token */
                      yyleng += (yytext - start_charpos);
                      yytext = start_charpos;
                      return T_STRING;
!                 }
! <xq,xe>{xqdouble} {
!                 }
! <xq>{xqinside}  {
!                 }
! <xe>{xeinside}  {
!                 }
! <xe>{xeescape}  {
!                 }
! <xq,xe>{quotecontinue} {
!                     /* ignore */
!                 }
! <xe>.            {
!                     /* This is only needed for \ just before EOF */
!                 }
! <xq,xe><<EOF>>        { yyerror("unterminated quoted string"); }
!
! {dolqdelim}        {
!                     SAVE_TOKEN_START();
!                     dolqstart = pstrdup(yytext);
!                     BEGIN(xdolq);
!                 }
! {dolqfailed}    {
!                     /* throw back all but the initial "$" */
!                     yyless(1);
!                     /* and treat it as {other} */
!                     return yytext[0];
!                 }
! <xdolq>{dolqdelim} {
!                     if (strcmp(yytext, dolqstart) == 0)
!                     {
!                         pfree(dolqstart);
!                         BEGIN(INITIAL);
!                         /* adjust yytext/yyleng to describe whole string */
!                         yyleng += (yytext - start_charpos);
!                         yytext = start_charpos;
!                         return T_STRING;
!                     }
!                     else
!                     {
!                         /*
!                          * When we fail to match $...$ to dolqstart, transfer
!                          * the $... part to the output, but put back the final
!                          * $ for rescanning.  Consider $delim$...$junk$delim$
!                          */
!                         yyless(yyleng-1);
!                     }
!                 }
! <xdolq>{dolqinside} {
!                 }
! <xdolq>{dolqfailed} {
!                 }
! <xdolq>.        {
!                     /* This is only needed for $ inside the quoted text */
!                 }
! <xdolq><<EOF>>    { yyerror("unterminated dollar-quoted string"); }

      /* ----------
       * Any unmatched character is returned as is
       * ----------
       */
! .                {
!                     return yytext[0];
!                 }

  %%

***************
*** 437,443 ****
   * to cite in error messages.
   */
  void
! plpgsql_scanner_init(const char *str, int functype)
  {
      Size    slen;

--- 487,493 ----
   * to cite in error messages.
   */
  void
! plpgsql_scanner_init(const char *str)
  {
      Size    slen;

***************
*** 460,468 ****
      /* Other setup */
      scanstr = str;

-     scanner_functype = functype;
-     scanner_typereported = false;
-
      have_pushback_token = false;

      cur_line_start = scanbuf;
--- 510,515 ----
***************
*** 493,569 ****
      yy_delete_buffer(scanbufhandle);
      pfree(scanbuf);
  }
-
- /*
-  * Called after a T_STRING token is read to get the string literal's value
-  * as a palloc'd string.  (We make this a separate call because in many
-  * scenarios there's no need to get the decoded value.)
-  *
-  * Note: we expect the literal to be the most recently lexed token.  This
-  * would not work well if we supported multiple-token pushback or if
-  * plpgsql_yylex() wanted to read ahead beyond a T_STRING token.
-  */
- char *
- plpgsql_get_string_value(void)
- {
-     char       *result;
-     const char *cp;
-     int            len;
-
-     if (dolqlen > 0)
-     {
-         /* Token is a $foo$...$foo$ string */
-         len = yyleng - 2 * dolqlen;
-         Assert(len >= 0);
-         result = (char *) palloc(len + 1);
-         memcpy(result, yytext + dolqlen, len);
-         result[len] = '\0';
-     }
-     else if (*yytext == 'E' || *yytext == 'e')
-     {
-         /* Token is an E'...' string */
-         result = (char *) palloc(yyleng + 1);    /* more than enough room */
-         len = 0;
-         for (cp = yytext + 2; *cp; cp++)
-         {
-             if (*cp == '\'')
-             {
-                 if (cp[1] == '\'')
-                     result[len++] = *cp++;
-                 /* else it must be string end quote */
-             }
-             else if (*cp == '\\')
-             {
-                 if (cp[1] != '\0')    /* just a paranoid check */
-                     result[len++] = *(++cp);
-             }
-             else
-                 result[len++] = *cp;
-         }
-         result[len] = '\0';
-     }
-     else
-     {
-         /* Token is a '...' string */
-         result = (char *) palloc(yyleng + 1);    /* more than enough room */
-         len = 0;
-         for (cp = yytext + 1; *cp; cp++)
-         {
-             if (*cp == '\'')
-             {
-                 if (cp[1] == '\'')
-                     result[len++] = *cp++;
-                 /* else it must be string end quote */
-             }
-             else if (*cp == '\\')
-             {
-                 if (cp[1] != '\0')    /* just a paranoid check */
-                     result[len++] = *(++cp);
-             }
-             else
-                 result[len++] = *cp;
-         }
-         result[len] = '\0';
-     }
-     return result;
- }
--- 540,542 ----
