Thread: Better handling of parse errors

Better handling of parse errors

From
Gavin Sherry
Date:
Hi all,

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error. It is not even close to an implementation of error handling
a-la SQL99 but it certainly makes debugging complicated queries easier.

I've done some testing and it appears to hit the offending token pretty
accurately.

Can anyone find a way to break this? If not, I'd love to see it in 7.3.

Gavin

Re: Better handling of parse errors

From
Bruce Momjian
Date:
Can we see some sample output?

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Hi all,
> 
> Attached is a small patch to scan.l for consideration. It hands
> yyerror() the position in the query string of the token which caused a
> parse error. It is not even close to an implementation of error handling
> a-la SQL99 but it certainly makes debugging complicated queries easier.
> 
> I've done some testing and it appears to hit the offending token pretty
> accurately.
> 
> Can anyone find a way to break this? If not, I'd love to see it in 7.3.
> 
> Gavin

Content-Description: 

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Better handling of parse errors

From
Gavin Sherry
Date:
On Tue, 6 Aug 2002, Bruce Momjian wrote:

> 
> Can we see some sample output?

template1=# select * frum pg_class;
ERROR:  parser: parse error at or near "frum" at character 10
template1=# select relname from pg_class a excepr select relname from
pg_class limit a;
ERROR:  parser: parse error at or near "excepr" at character 32
template1=# create table a (desc int);
ERROR:  parser: parse error at or near "desc" at character 17

Gavin

> 
> ---------------------------------------------------------------------------
> 
> Gavin Sherry wrote:
> > Hi all,
> > 
> > Attached is a small patch to scan.l for consideration. It hands
> > yyerror() the position in the query string of the token which caused a
> > parse error. It is not even close to an implementation of error handling
> > a-la SQL99 but it certainly makes debugging complicated queries easier.
> > 
> > I've done some testing and it appears to hit the offending token pretty
> > accurately.
> > 
> > Can anyone find a way to break this? If not, I'd love to see it in 7.3.
> > 
> > Gavin
> 
> Content-Description: 
> 
> [ Attachment, skipping... ]
> 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> > 
> > http://archives.postgresql.org
> 
> 



Re: Better handling of parse errors

From
Gavin Sherry
Date:
On Tue, 6 Aug 2002, Bruce Momjian wrote:

> 
> Can we see some sample output?

One other important one:

template1=# select
template1-# *
template1-#
template1-# frum
template1-# pg_class;
ERROR:  parser: parse error at or near "frum" at character 10

Note that it counts non-printable characters. This could be both good and
bad.

Gavin




Re: Better handling of parse errors

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> Attached is a small patch to scan.l for consideration. It hands
> yyerror() the position in the query string of the token which caused a
> parse error.

Isn't that the hard way to do it?  Seems like you could just subtract
scanbuf from the error pointer, instead of adding overhead to the basic
lex loop.

Things that need to be decided (and documented somewhere):

Is the number an offset (counted from 0) or an index (counted from 1)?
Which end of the token does it point at?  Can the message be phrased
so as to make it reasonably clear what the number is?


A related change I'd been meaning to make is to get it to sayparse error at end of input
when that's the case, rather than the rather uselessparse error at or near ""
that you get now.  I'd still be inclined to just say "end of input"
in that case, and not bother with a character count.
        regards, tom lane


Re: Better handling of parse errors

From
Bruce Momjian
Date:
Gavin, have you answered these issues brought up about the patch?

---------------------------------------------------------------------------

Tom Lane wrote:
> Gavin Sherry <swm@linuxworld.com.au> writes:
> > Attached is a small patch to scan.l for consideration. It hands
> > yyerror() the position in the query string of the token which caused a
> > parse error.
> 
> Isn't that the hard way to do it?  Seems like you could just subtract
> scanbuf from the error pointer, instead of adding overhead to the basic
> lex loop.
> 
> Things that need to be decided (and documented somewhere):
> 
> Is the number an offset (counted from 0) or an index (counted from 1)?
> Which end of the token does it point at?  Can the message be phrased
> so as to make it reasonably clear what the number is?
> 
> 
> A related change I'd been meaning to make is to get it to say
>     parse error at end of input
> when that's the case, rather than the rather useless
>     parse error at or near ""
> that you get now.  I'd still be inclined to just say "end of input"
> in that case, and not bother with a character count.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Better handling of parse errors

From
Gavin Sherry
Date:
Bruce,

I was working on this on the train this morning. There are a few
possibilities I'd like to look at before I submit another patch.

Before I do, there is one important question that I didn't raise when I
submitted the first patch (which was only a proof of concept kind of
patch). Namely: do we want to modify every 7.2 error message? Many people
have written error message parsers into their applications to make up for
the fact that Postgres doesn't use error codes or issue 'standardised'
error messages.

Is this going to annoy people too much? Should it be delayed until we have
SQL99 diagnostics/error codes?

Gavin

On Wed, 14 Aug 2002, Bruce Momjian wrote:

> 
> Gavin, have you answered these issues brought up about the patch?
> 
> ---------------------------------------------------------------------------
> 
> Tom Lane wrote:
> > Gavin Sherry <swm@linuxworld.com.au> writes:
> > > Attached is a small patch to scan.l for consideration. It hands
> > > yyerror() the position in the query string of the token which caused a
> > > parse error.
> > 
> > Isn't that the hard way to do it?  Seems like you could just subtract
> > scanbuf from the error pointer, instead of adding overhead to the basic
> > lex loop.
> > 
> > Things that need to be decided (and documented somewhere):
> > 
> > Is the number an offset (counted from 0) or an index (counted from 1)?
> > Which end of the token does it point at?  Can the message be phrased
> > so as to make it reasonably clear what the number is?
> > 
> > 
> > A related change I'd been meaning to make is to get it to say
> >     parse error at end of input
> > when that's the case, rather than the rather useless
> >     parse error at or near ""
> > that you get now.  I'd still be inclined to just say "end of input"
> > in that case, and not bother with a character count.
> > 
> >             regards, tom lane
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> > 
> 
> 



Re: Better handling of parse errors

From
Bruce Momjian
Date:
I don't think anyone will mind, but you can throw a message to 'general'
just to be sure.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Bruce,
> 
> I was working on this on the train this morning. There are a few
> possibilities I'd like to look at before I submit another patch.
> 
> Before I do, there is one important question that I didn't raise when I
> submitted the first patch (which was only a proof of concept kind of
> patch). Namely: do we want to modify every 7.2 error message? Many people
> have written error message parsers into their applications to make up for
> the fact that Postgres doesn't use error codes or issue 'standardised'
> error messages.
> 
> Is this going to annoy people too much? Should it be delayed until we have
> SQL99 diagnostics/error codes?
> 
> Gavin
> 
> On Wed, 14 Aug 2002, Bruce Momjian wrote:
> 
> > 
> > Gavin, have you answered these issues brought up about the patch?
> > 
> > ---------------------------------------------------------------------------
> > 
> > Tom Lane wrote:
> > > Gavin Sherry <swm@linuxworld.com.au> writes:
> > > > Attached is a small patch to scan.l for consideration. It hands
> > > > yyerror() the position in the query string of the token which caused a
> > > > parse error.
> > > 
> > > Isn't that the hard way to do it?  Seems like you could just subtract
> > > scanbuf from the error pointer, instead of adding overhead to the basic
> > > lex loop.
> > > 
> > > Things that need to be decided (and documented somewhere):
> > > 
> > > Is the number an offset (counted from 0) or an index (counted from 1)?
> > > Which end of the token does it point at?  Can the message be phrased
> > > so as to make it reasonably clear what the number is?
> > > 
> > > 
> > > A related change I'd been meaning to make is to get it to say
> > >     parse error at end of input
> > > when that's the case, rather than the rather useless
> > >     parse error at or near ""
> > > that you get now.  I'd still be inclined to just say "end of input"
> > > in that case, and not bother with a character count.
> > > 
> > >             regards, tom lane
> > > 
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> > > 
> > 
> > 
> 
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Better handling of parse errors

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> ... do we want to modify every 7.2 error message?

Nyet ... but I don't think tacking an offset onto the end of
"parse error at or near foo" messages is likely to cause the
sort of generalized havoc you suggest ...

I'm on record as favoring a thorough redesign of the error-reporting
facility, and I hope to see that happen in conjunction with a frontend
protocol change in 7.4.  But I think your proposed change in this patch
is fairly harmless and could be done now.
        regards, tom lane


Re: Better handling of parse errors

From
Gavin Sherry
Date:
On Wed, 14 Aug 2002, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > ... do we want to modify every 7.2 error message?
>
> Nyet ... but I don't think tacking an offset onto the end of
> "parse error at or near foo" messages is likely to cause the
> sort of generalized havoc you suggest ...

In that case, attached is a patch which locates the beginning of the
offending token more efficiently (per your suggestion of using
scanbuf). The new patch does the same as before:

template1=# select * frum pg_class;
ERROR:  parser: parse error at or near "frum" at character 10

It also implement's Tom's suggestion:

template1=# select * from pg_class where\g
ERROR:  parse: parse error at end of input

Gavin

Re: Better handling of parse errors

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Gavin Sherry wrote:
> On Wed, 14 Aug 2002, Tom Lane wrote:
>
> > Gavin Sherry <swm@linuxworld.com.au> writes:
> > > ... do we want to modify every 7.2 error message?
> >
> > Nyet ... but I don't think tacking an offset onto the end of
> > "parse error at or near foo" messages is likely to cause the
> > sort of generalized havoc you suggest ...
>
> In that case, attached is a patch which locates the beginning of the
> offending token more efficiently (per your suggestion of using
> scanbuf). The new patch does the same as before:
>
> template1=# select * frum pg_class;
> ERROR:  parser: parse error at or near "frum" at character 10
>
> It also implement's Tom's suggestion:
>
> template1=# select * from pg_class where\g
> ERROR:  parse: parse error at end of input
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Better handling of parse errors

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


Gavin Sherry wrote:
> On Wed, 14 Aug 2002, Tom Lane wrote:
>
> > Gavin Sherry <swm@linuxworld.com.au> writes:
> > > ... do we want to modify every 7.2 error message?
> >
> > Nyet ... but I don't think tacking an offset onto the end of
> > "parse error at or near foo" messages is likely to cause the
> > sort of generalized havoc you suggest ...
>
> In that case, attached is a patch which locates the beginning of the
> offending token more efficiently (per your suggestion of using
> scanbuf). The new patch does the same as before:
>
> template1=# select * frum pg_class;
> ERROR:  parser: parse error at or near "frum" at character 10
>
> It also implement's Tom's suggestion:
>
> template1=# select * from pg_class where\g
> ERROR:  parse: parse error at end of input
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PATCHES] Better handling of parse errors

From
Peter Eisentraut
Date:
Gavin Sherry writes:

> In that case, attached is a patch which locates the beginning of the
> offending token more efficiently (per your suggestion of using
> scanbuf).

In the regression tests there are a couple of cases that could be
improved:

In strings.sql:

-- illegal string continuation syntax
SELECT 'first line'
' - next line' /* this comment is not allowed here */
' - third line'       AS "Illegal comment within continuation";
ERROR:  parser: parse error at or near "' - third line'" at character 89

Character 89 is the end of the "third line" line, but the parse error is
at the beginning of that line.

In create_function_1.sql:

CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql   AS 'not even SQL';
ERROR:  parser: parse error at or near "not" at character 1

Clearly confusing.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: [PATCHES] Better handling of parse errors

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> In strings.sql:

> -- illegal string continuation syntax
> SELECT 'first line'
> ' - next line' /* this comment is not allowed here */
> ' - third line'
>         AS "Illegal comment within continuation";
> ERROR:  parser: parse error at or near "' - third line'" at character 89

> Character 89 is the end of the "third line" line, but the parse error is
> at the beginning of that line.

This is fixed as of my later commit.

> In create_function_1.sql:

> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql
>     AS 'not even SQL';
> ERROR:  parser: parse error at or near "not" at character 1

> Clearly confusing.

"Character 1" is correct as of the context that the parser is working
in, namely the function body.  I don't think we can do much to change
that, but perhaps we could make the message read like
ERROR:  parser: parse error at or near "not" at character 1 of function body
This would require giving the parser some sort of context-identifying
string to tack onto the message, but that doesn't seem too hard.
        regards, tom lane


Re: [PATCHES] Better handling of parse errors

From
Gavin Sherry
Date:
On Sun, 18 Aug 2002, Tom Lane wrote:

> Peter Eisentraut <peter_e@gmx.net> writes:
> > In strings.sql:
> 
> > -- illegal string continuation syntax
> > SELECT 'first line'
> > ' - next line' /* this comment is not allowed here */
> > ' - third line'
> >         AS "Illegal comment within continuation";
> > ERROR:  parser: parse error at or near "' - third line'" at character 89
> 
> > Character 89 is the end of the "third line" line, but the parse error is
> > at the beginning of that line.
> 
> This is fixed as of my later commit.
> 
> > In create_function_1.sql:
> 
> > CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql
> >     AS 'not even SQL';
> > ERROR:  parser: parse error at or near "not" at character 1
> 
> > Clearly confusing.
> 
> "Character 1" is correct as of the context that the parser is working
> in, namely the function body.  I don't think we can do much to change
> that, but perhaps we could make the message read like
> ERROR:  parser: parse error at or near "not" at character 1 of function body
> This would require giving the parser some sort of context-identifying
> string to tack onto the message, but that doesn't seem too hard.

Tom,

Reworking the code to taken into account token_start seems to work.
       elog(ERROR, "parser: %s at or near \"%s\" at character %i",           message,token_start ? token_start :
yytext,          token_start ? (unsigned int)(token_start - scanbuf + 1) :                         (unsigned
int)(yytext- scanbuf + 1));
 

I will submit a patch once I do some more testing to find other possible
situations where this plays up.

Gavin



Re: [PATCHES] Better handling of parse errors

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> Reworking the code to taken into account token_start seems to work.

Yes, I did that last night ...
        regards, tom lane


Re: [PATCHES] Better handling of parse errors

From
Peter Eisentraut
Date:
Tom Lane writes:

> "Character 1" is correct as of the context that the parser is working
> in, namely the function body.  I don't think we can do much to change
> that, but perhaps we could make the message read like
> ERROR:  parser: parse error at or near "not" at character 1 of function body

That would be better.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: [PATCHES] Better handling of parse errors

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> "Character 1" is correct as of the context that the parser is working
>> in, namely the function body.  I don't think we can do much to change
>> that, but perhaps we could make the message read like
>> ERROR:  parser: parse error at or near "not" at character 1 of function body

> That would be better.

After a quick look through the sources, it seems we could fairly easily
do this: callers of pg_parse_and_rewrite() and some related routines
could pass a string like "SQL function body", which would get plugged
into the parse-error message.  Two issues though:

* Is this okay from an internationalization point of view?  We can
gettext() the "SQL function body" string but I don't know if there
are serious problems with pasting that intoparse error at or near "%s" at character %d of %s
On the other hand I'm not comfortable with having the far-end caller
supply that whole string, either, since most of it is the lexer's
responsibility.

* The natural thing to say in _SPI_execute's call is "SPI query",
but this will probably not go over big with plpgsql users, who will
see that and probably have no idea what SPI is.  But I'm very
loathe to change the SPI API so that plpgsql can pass down the
context string --- that'll break existing user functions that use
SPI.  Do we want to uglify the SPI API to the extent of having
parallel calls that just add a context string parameter?  Is there
a better way?
        regards, tom lane