Thread: syntax error position "CREATE FUNCTION" bug fix

syntax error position "CREATE FUNCTION" bug fix

From
Fabien COELHO
Date:
Dear patchers,

Please find attached a patch to fix the "CREATE FUNCTION"  syntax error
position bug I reported a few days ago.

As the exact query being processed in only known to the backend (it may be
the initial query, it may be a subset of the initial query, it may be some
generated query?), the offending string is returned with the error
position, which is expressed with respect to this query (that has always
been the case by the way).

In order to do that, I did the following:

1. appended a new "query" field into the ErrorData structure,
   which is set with an added errquery function.

2. modified the error reporting part of the protocol (version 3).
   As the protocol implementation in libpq is fuzzy enough, there is
   not fix on the client reception part, only the server sending
   part is modified with a new field for the query (Q). Thus this
   addition should not harm old clients.

3. fixed yyerror to return the processed query on errors.
   a copy of the buffer is needed as the scanner buffer is modified
   and the convention about buffer termination are not the same.

4. fixed the psql position reporting code to use this reported query
   instead of the one it sent. Tom's quick fix around the problem is removed.

5. updated comments where it seemd appropriate in the code.

6. finally updated the protocol documentation for the error reporting
   part by adding the Q field and fixing the P field.

I dit it like that because I don't think it is elegantly feasible to
update the position to get back to the initial query, as escapes may have
been processed within the string, so a simple offset would not fix the
bug.

It validates for me.

Have a nice day,

--
Fabien Coelho - coelho@cri.ensmp.fr

Attachment

Re: syntax error position "CREATE FUNCTION" bug fix

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Please find attached a patch to fix the "CREATE FUNCTION"  syntax error
> position bug I reported a few days ago.

This strikes me as much too intrusive to be worthwhile ... the problem
is simply not important enough to justify a protocol change.  Moreover,
I don't like the proposed protocol change anyway.  This approach only
solves the problem for psql-like error reporting, namely clients that
are going to regurgitate a string in their error message and aren't
very picky about whether that string really matches the original input.
One of the design goals for the error reporting feature is that GUI-type
clients should be able to mark syntax error positions by highlighting in
the original input window.  This proposal abandons that goal.

            regards, tom lane

Re: syntax error position "CREATE FUNCTION" bug fix

From
Fabien COELHO
Date:
Dear Tom,

> > Please find attached a patch to fix the "CREATE FUNCTION"  syntax error
> > position bug I reported a few days ago.
>
> This strikes me as much too intrusive to be worthwhile ... the problem
> is simply not important enough to justify a protocol change.

Then I can suggest to happend the string after the position.
NO protocol change, but the convention that the string is shown after
the position. Small difference, I sent a proposal later.

> Moreover, I don't like the proposed protocol change anyway.  This
> approach only solves the problem for psql-like error reporting, namely
> clients that are going to regurgitate a string in their error message
> and aren't very picky about whether that string really matches the
> original input. One of the design goals for the error reporting feature
> is that GUI-type clients should be able to mark syntax error positions
> by highlighting in the original input window.  This proposal abandons
> that goal.

I cannot see your point.

Any GUI can take advantage of the returned string to show it in a window
with fancy colors and do any hilighting around the position.

I've implemented the stuff for psql (with your help btw), but I cannot see
why it couldn't be used with other interfaces?

The issue is that the error position is not enough in some cases.
I proposed the only possible fix for that, which is to provide the
missing information to the client, which will do something or nothing
about it. The current status is that the information is not available,
so nothing can be done.

Moreover, I had problems with syntax errors in string-embedded queries in
the past, and this is trickier to solve, so I don't see why the error
position should not be fixed for those errors.

So my point is that I agree that the protocole should not be changed, but
I disagree that the "bug" should remain as it is because of some
principles.

Have a nice day,

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: syntax error position "CREATE FUNCTION" bug fix

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Moreover, I don't like the proposed protocol change anyway.  This
>> approach only solves the problem for psql-like error reporting, namely
>> clients that are going to regurgitate a string in their error message
>> and aren't very picky about whether that string really matches the
>> original input. One of the design goals for the error reporting feature
>> is that GUI-type clients should be able to mark syntax error positions
>> by highlighting in the original input window.  This proposal abandons
>> that goal.

> I cannot see your point.

> Any GUI can take advantage of the returned string to show it in a window
> with fancy colors and do any hilighting around the position.

But it cannot (easily) match it up with the *original input* and put the
cursor in the right place in the *input* window.  You are envisioning
something that's no better than psql with window borders.  My idea of a
GUI syntax error report is something that puts my editing cursor in the
right place.

> I've implemented the stuff for psql (with your help btw), but I cannot see
> why it couldn't be used with other interfaces?

It's not the right thing for other interfaces.  If it were the right
thing for all interfaces, we'd have put the functionality in the backend
to begin with.

            regards, tom lane

Re: syntax error position "CREATE FUNCTION" bug fix

From
Fabien COELHO
Date:
Dear Tom,

> > Any GUI can take advantage of the returned string to show it in a window
> > with fancy colors and do any hilighting around the position.
>
> But it cannot (easily) match it up with the *original input*

Sure. Even the parser in the backend cannot do it, that's the problem!;-)

> and put the cursor in the right place in the *input* window.  You are
> envisioning something that's no better than psql with window borders.

I try to envision what is achievable with a reasonnable effort;-)

If I read you correctly, it is all interfaces or none... As a mostly;
psql user, I'm not lucky;-)

I don't think it is really easy to compute the good position wrt to the
original query if you want to keep into account escapes that are eaten by
the first parsing. I can provide a fix that would catch simple cases,
but not all of them.

Would you accept a "it works sometimes, but it may be wrong others" hack?

> My idea of a GUI syntax error report is something that puts my editing
> cursor in the right place.

Thus you decided that you prefer that NO interface should be able to show
the correct position, rather than having at least one to do it, and other
being able to display something, because you decided that the only place
to show something in a GUI is in the initial window or never. You don't
like dialog box, I guess;-)

Have a nice day,

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: syntax error position "CREATE FUNCTION" bug fix

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> But it cannot (easily) match it up with the *original input*

> Sure. Even the parser in the backend cannot do it, that's the problem!;-)

We *can* do it, it's just a matter of the pain level.  For instance we
could have the main lexer save aside an indication of the number of
quoting characters it discarded at each position.  The problem in my
mind is to not put a heavy overhead on the main lexing path for
information that will only get used in an error case.

I was toying with the idea of having the CREATE FUNCTION error callback
go back to the original command string (which I believe is now always
saved in the Portal, and certainly could be if it isn't) and re-parse
it using a slower variant of the lexer that keeps this info.  Not sure
what all is involved in that approach, but in principle this would let
us meet the original design goal without any overhead except in the
error case.

> Would you accept a "it works sometimes, but it may be wrong others" hack?

I think it'd be okay if it gets the common cases right and doesn't
generate lies in the cases it can't get right.

Also, I think it might be an acceptable compromise to give the position
correctly only when the function body is quoted using dollar-quoting.
(Everybody's gonna be using dollar quoting to write their functions in
7.5, right? ;-))  In that case it is trivial to convert the "inside"
syntax error offset to an offset in the original string literal, and you
only have to know the position of that literal in the original command
to finish up.

BTW, it's worth pointing out that the client will have to special-case
the situation where a CONTEXT entry is present anyhow, since that most
likely means that the syntax error is inside some function called by the
query, and not in the query itself.  So the hack in psql doesn't go away
in any case; all that we do differently is not send CONTEXT from the SQL
function parse error callback if we are able to convert the syntax error
offset to something relative to the original command.  (One could
envision a really smart client pulling back the text of the function
identified by the topmost CONTEXT entry and putting the cursor on that
--- of course this would have to be in a popup window not the original
query, but it's doable in principle.)

>> My idea of a GUI syntax error report is something that puts my editing
>> cursor in the right place.

> Thus you decided that you prefer that NO interface should be able to show
> the correct position, rather than having at least one to do it, and other
> being able to display something, because you decided that the only place
> to show something in a GUI is in the initial window or never.

No, I say that we shouldn't put in a kluge that gets it sort-of right in
a simple interface and makes it impossible for better interfaces to get
it really right.  I think we can do better than that.

            regards, tom lane

Re: syntax error position "CREATE FUNCTION" bug fix

From
Tom Lane
Date:
Attached is a proposed patch that fixes the
cursor-position-in-CREATE-FUNCTION issue per my earlier suggestion.

Since I complained that your proposal was too grotty, it's only fair to
throw this out to let people take potshots at it ;-).  The main
grottiness here is providing access to the executing Portal so that the
callback function can get at the original command string.  I don't think
this is unreasonably bad, since we already have a global PortalContext
that points to the portal's memory context --- I just added parallel
code that updates a new global ActivePortal.

The re-parsing of the original command is simplistic but will handle all
normal cases.

            regards, tom lane


*** src/backend/catalog/pg_proc.c.orig    Sat Mar 13 20:58:41 2004
--- src/backend/catalog/pg_proc.c    Thu Mar 18 18:20:20 2004
***************
*** 23,31 ****
--- 23,33 ----
  #include "executor/executor.h"
  #include "fmgr.h"
  #include "miscadmin.h"
+ #include "mb/pg_wchar.h"
  #include "parser/parse_coerce.h"
  #include "parser/parse_expr.h"
  #include "parser/parse_type.h"
+ #include "tcop/pquery.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
***************
*** 45,50 ****
--- 47,56 ----
  static Datum create_parameternames_array(int parameterCount,
                                           const char *parameterNames[]);
  static void sql_function_parse_error_callback(void *arg);
+ static int    match_prosrc_to_query(const char *prosrc, const char *queryText,
+                                   int cursorpos);
+ static bool match_prosrc_to_literal(const char *prosrc, const char *literal,
+                                     int cursorpos, int *newcursorpos);


  /* ----------------------------------------------------------------
***************
*** 768,774 ****
           * location is inside the function body, not out in CREATE FUNCTION.
           */
          sqlerrcontext.callback = sql_function_parse_error_callback;
!         sqlerrcontext.arg = proc;
          sqlerrcontext.previous = error_context_stack;
          error_context_stack = &sqlerrcontext;

--- 774,780 ----
           * location is inside the function body, not out in CREATE FUNCTION.
           */
          sqlerrcontext.callback = sql_function_parse_error_callback;
!         sqlerrcontext.arg = tuple;
          sqlerrcontext.previous = error_context_stack;
          error_context_stack = &sqlerrcontext;

***************
*** 800,821 ****
  }

  /*
!  * error context callback to let us supply a context marker
   */
  static void
  sql_function_parse_error_callback(void *arg)
  {
!     Form_pg_proc proc = (Form_pg_proc) arg;

      /*
!      * XXX it'd be really nice to adjust the syntax error position to
!      * account for the offset from the start of the statement to the
!      * function body string, not to mention any quoting characters in
!      * the string, but I can't see any decent way to do that...
       *
!      * In the meantime, put in a CONTEXT entry that can cue clients
!      * not to trust the syntax error position completely.
       */
!     errcontext("SQL function \"%s\"",
!                NameStr(proc->proname));
  }
--- 806,976 ----
  }

  /*
!  * error context callback to let us adjust syntax errors from SQL functions
   */
  static void
  sql_function_parse_error_callback(void *arg)
  {
!     HeapTuple    tuple = (HeapTuple) arg;
!     Form_pg_proc proc = (Form_pg_proc) GETSTRUCT(tuple);
!     int            origerrposition;
!     const char *queryText;
!     bool        isnull;
!     Datum        tmp;
!     char       *prosrc;
!     int            newerrposition;
!
!     /*
!      * Nothing to do unless we are dealing with a syntax error that has
!      * a cursor position.  In that case, we need to try to adjust the
!      * position to match the original query, not the text of the function.
!      */
!     origerrposition = geterrposition();
!     if (origerrposition <= 0)
!         return;
!
!     /*
!      * We can get the original query text from the active portal (hack...)
!      */
!     Assert(ActivePortal && ActivePortal->portalActive);
!     queryText = ActivePortal->sourceText;
!
!     /*
!      * Try to locate the function text in the original query.
!      */
!     tmp = SysCacheGetAttr(PROCOID, tuple, Anum_pg_proc_prosrc, &isnull);
!     if (isnull)
!         elog(ERROR, "null prosrc");
!     prosrc = DatumGetCString(DirectFunctionCall1(textout, tmp));
!
!     newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
!
!     if (newerrposition > 0)
!     {
!         /* Successful, so fix the error position */
!         errposition(newerrposition);
!     }
!     else
!     {
!         /*
!          * If unsuccessful, put in a CONTEXT entry that will cue clients
!          * not to treat the error position as relative to the original query.
!          */
!         errcontext("SQL function \"%s\"",
!                    NameStr(proc->proname));
!     }
! }
!
! /*
!  * Try to locate the string literal containing the function body in the
!  * given text of the CREATE FUNCTION command.  If successful, return the
!  * character (not byte) index within the command corresponding to the
!  * given character index within the literal.  If not successful, return 0.
!  */
! static int
! match_prosrc_to_query(const char *prosrc, const char *queryText,
!                       int cursorpos)
! {
!     /*
!      * Rather than fully parsing the CREATE FUNCTION command, we just scan
!      * the command looking for $prosrc$ or 'prosrc'.  This could be fooled
!      * (though not in any very probable scenarios), so fail if we find
!      * more than one match.
!      */
!     int        prosrclen = strlen(prosrc);
!     int        querylen = strlen(queryText);
!     int        matchpos = 0;
!     int        curpos;
!     int        newcursorpos;
!
!     for (curpos = 0; curpos < querylen-prosrclen; curpos++)
!     {
!         if (queryText[curpos] == '$' &&
!             strncmp(prosrc, &queryText[curpos+1], prosrclen) == 0 &&
!             queryText[curpos+1+prosrclen] == '$')
!         {
!             /*
!              * Found a $foo$ match.  Since there are no embedded quoting
!              * characters in a dollar-quoted literal, we don't have to do
!              * any fancy arithmetic; just offset by the starting position.
!              */
!             if (matchpos)
!                 return 0;        /* multiple matches, fail */
!             matchpos = pg_mbstrlen_with_len(queryText, curpos+1)
!                 + cursorpos;
!         }
!         else if (queryText[curpos] == '\'' &&
!                  match_prosrc_to_literal(prosrc, &queryText[curpos+1],
!                                          cursorpos, &newcursorpos))
!         {
!             /*
!              * Found a 'foo' match.  match_prosrc_to_literal() has adjusted
!              * for any quotes or backslashes embedded in the literal.
!              */
!             if (matchpos)
!                 return 0;        /* multiple matches, fail */
!             matchpos = pg_mbstrlen_with_len(queryText, curpos+1)
!                 + newcursorpos;
!         }
!     }
!
!     return matchpos;
! }
!
! /*
!  * Try to match the given source text to a single-quoted literal.
!  * If successful, adjust newcursorpos to correspond to the character
!  * (not byte) index corresponding to cursorpos in the source text.
!  *
!  * At entry, literal points just past a ' character.  We must check for the
!  * trailing quote.
!  */
! static bool
! match_prosrc_to_literal(const char *prosrc, const char *literal,
!                         int cursorpos, int *newcursorpos)
! {
!     int            newcp = cursorpos;
!     int            chlen;

      /*
!      * This implementation handles backslashes and doubled quotes in the
!      * string literal.  It does not handle the SQL syntax for literals
!      * continued across line boundaries.
       *
!      * We do the comparison a character at a time, not a byte at a time,
!      * so that we can do the correct cursorpos math.
       */
!     while (*prosrc)
!     {
!         cursorpos--;            /* characters left before cursor */
!         /*
!          * Check for backslashes and doubled quotes in the literal; adjust
!          * newcp when one is found before the cursor.
!          */
!         if (*literal == '\\')
!         {
!             literal++;
!             if (cursorpos > 0)
!                 newcp++;
!         }
!         else if (*literal == '\'')
!         {
!             if (literal[1] != '\'')
!                 return false;
!             literal++;
!             if (cursorpos > 0)
!                 newcp++;
!         }
!         chlen = pg_mblen(prosrc);
!         if (strncmp(prosrc, literal, chlen) != 0)
!             return false;
!         prosrc += chlen;
!         literal += chlen;
!     }
!
!     *newcursorpos = newcp;
!
!     if (*literal == '\'' && literal[1] != '\'')
!         return true;
!     return false;
  }
*** src/backend/commands/portalcmds.c.orig    Sat Nov 29 14:51:47 2003
--- src/backend/commands/portalcmds.c    Thu Mar 18 16:57:10 2004
***************
*** 270,275 ****
--- 270,276 ----
  PersistHoldablePortal(Portal portal)
  {
      QueryDesc  *queryDesc = PortalGetQueryDesc(portal);
+     Portal        saveActivePortal;
      MemoryContext savePortalContext;
      MemoryContext saveQueryContext;
      MemoryContext oldcxt;
***************
*** 311,316 ****
--- 312,319 ----
      /*
       * Set global portal context pointers.
       */
+     saveActivePortal = ActivePortal;
+     ActivePortal = portal;
      savePortalContext = PortalContext;
      PortalContext = PortalGetHeapMemory(portal);
      saveQueryContext = QueryContext;
***************
*** 342,347 ****
--- 345,351 ----
      /* Mark portal not active */
      portal->portalActive = false;

+     ActivePortal = saveActivePortal;
      PortalContext = savePortalContext;
      QueryContext = saveQueryContext;

*** src/backend/tcop/postgres.c.orig    Mon Mar 15 15:01:58 2004
--- src/backend/tcop/postgres.c    Thu Mar 18 16:56:55 2004
***************
*** 2710,2715 ****
--- 2710,2716 ----
           */
          MemoryContextSwitchTo(TopMemoryContext);
          MemoryContextResetAndDeleteChildren(ErrorContext);
+         ActivePortal = NULL;
          PortalContext = NULL;
          QueryContext = NULL;

*** src/backend/tcop/pquery.c.orig    Fri Mar  5 15:57:31 2004
--- src/backend/tcop/pquery.c    Thu Mar 18 16:56:56 2004
***************
*** 23,28 ****
--- 23,35 ----
  #include "utils/memutils.h"


+ /*
+  * ActivePortal is the currently executing Portal (the most closely nested,
+  * if there are several).
+  */
+ Portal    ActivePortal = NULL;
+
+
  static uint32 RunFromStore(Portal portal, ScanDirection direction, long count,
               DestReceiver *dest);
  static long PortalRunSelect(Portal portal, bool forward, long count,
***************
*** 395,400 ****
--- 402,408 ----
            char *completionTag)
  {
      bool        result;
+     Portal        saveActivePortal;
      MemoryContext savePortalContext;
      MemoryContext saveQueryContext;
      MemoryContext oldContext;
***************
*** 433,438 ****
--- 441,448 ----
      /*
       * Set global portal context pointers.
       */
+     saveActivePortal = ActivePortal;
+     ActivePortal = portal;
      savePortalContext = PortalContext;
      PortalContext = PortalGetHeapMemory(portal);
      saveQueryContext = QueryContext;
***************
*** 508,513 ****
--- 518,524 ----
      /* Mark portal not active */
      portal->portalActive = false;

+     ActivePortal = saveActivePortal;
      PortalContext = savePortalContext;
      QueryContext = saveQueryContext;

***************
*** 925,930 ****
--- 936,942 ----
                 DestReceiver *dest)
  {
      long        result;
+     Portal        saveActivePortal;
      MemoryContext savePortalContext;
      MemoryContext saveQueryContext;
      MemoryContext oldContext;
***************
*** 948,953 ****
--- 960,967 ----
      /*
       * Set global portal context pointers.
       */
+     saveActivePortal = ActivePortal;
+     ActivePortal = portal;
      savePortalContext = PortalContext;
      PortalContext = PortalGetHeapMemory(portal);
      saveQueryContext = QueryContext;
***************
*** 972,977 ****
--- 986,992 ----
      /* Mark portal not active */
      portal->portalActive = false;

+     ActivePortal = saveActivePortal;
      PortalContext = savePortalContext;
      QueryContext = saveQueryContext;

*** src/backend/utils/error/elog.c.orig    Mon Mar 15 15:01:58 2004
--- src/backend/utils/error/elog.c    Thu Mar 18 17:36:58 2004
***************
*** 808,813 ****
--- 808,829 ----
      return 0;                    /* return value does not matter */
  }

+ /*
+  * geterrposition --- return the currently set error position (0 if none)
+  *
+  * This is only intended for use in error callback subroutines, since there
+  * is no other place outside elog.c where the concept is meaningful.
+  */
+ int
+ geterrposition(void)
+ {
+     ErrorData  *edata = &errordata[errordata_stack_depth];
+
+     /* we don't bother incrementing recursion_depth */
+     CHECK_STACK_DEPTH();
+
+     return edata->cursorpos;
+ }

  /*
   * elog_finish --- finish up for old-style API
*** src/include/tcop/pquery.h.orig    Sat Nov 29 17:41:14 2003
--- src/include/tcop/pquery.h    Thu Mar 18 16:56:49 2004
***************
*** 17,22 ****
--- 17,25 ----
  #include "utils/portal.h"


+ extern DLLIMPORT Portal ActivePortal;
+
+
  extern void ProcessQuery(Query *parsetree,
               Plan *plan,
               ParamListInfo params,
*** src/include/utils/elog.h.orig    Mon Mar 15 15:02:09 2004
--- src/include/utils/elog.h    Thu Mar 18 17:36:51 2004
***************
*** 132,137 ****
--- 132,139 ----
  extern int    errfunction(const char *funcname);
  extern int    errposition(int cursorpos);

+ extern int    geterrposition(void);
+

  /*----------
   * Old-style error reporting API: to be used in this way:

Re: syntax error position "CREATE FUNCTION" bug fix

From
Fabien COELHO
Date:
Dear Tom,

> Attached is a proposed patch that fixes the
> cursor-position-in-CREATE-FUNCTION issue per my earlier suggestion.
>
> The re-parsing of the original command is simplistic but will handle all
> normal cases.
> [...]

That's quite a demonstration;-)

However, I still stick with my "bad" simple idea because the simpler the
better, and also because of the following example:

psql> CREATE OR REPLACE FUNCTION count_tup(TEXT) RETURNS INTEGER AS '
DECLARE
  n RECORD;
BEGIN
  FOR n IN EXECUTE \'SELECT COUNT(*) AS c FRM \' || $1
  LOOP
    RETURN n.c;
  END LOOP;
  RETURN NULL;
END;'
LANGUAGE plpgsql;

psql> SELECT count_tup('pg_shadow');
ERROR:  syntax error at or near "FRM" at character 22
CONTEXT:  PL/pgSQL function "count_tup" line 4 at for over execute statement

As you can notice, the extract is not in the submitted query, so there
is no point to show it there.

Maybe real PL/pgSQL programmers will never have syntax errors with their
SQL stuff.

Thus I really think that the parser should return the processed query,
at least in some cases.

Anyway, have a nice day!

--
Fabien Coelho - coelho@cri.ensmp.fr