Thread: Re: [PATCHES] Patch for more readable parse error messages

Re: [PATCHES] Patch for more readable parse error messages

From
Tom Lane
Date:
Jeroen van Vianen <jeroen@design.nl> writes:
>> Does this work with a non-bison parser?  It looks mighty
>> bison-dependent to me...

> I'm not sure, but it probably is flex dependent (but Postgres always needed 
> flex anyway). I'm not aware of any yacc / byacc / bison dependencies. Don't 
> know if anybody has been successful building Postgres with another parser 
> generator.

Um, you're right of course --- those are lexer not parser datastructures
you're poking into.  Sorry for my confusion.

We do in fact work with non-bison parser generators, or did last time
I tried it (around 6.5 release).  I would not like us to stop working
with non-bison yaccs, since bison's output depends on alloca() which
is not available everywhere.

I'm not sure about the situation with lexers.  We have been saying for
a long time that flex was required, but since we got rid of the
scanner's use of trailing context ('/' rules) I think there is a better
chance that it would work with vanilla lex.  Anyone want to try that
with current sources?

> BTW, as we ship flex's output lex.yy.c (as scan.c) and bison's output
> (gram.c) in the distribution, any user would be able to compile the
> sources, but if they want to start hacking the .l or .y files, they'll
> need appropriate tools.

Right.  I am not aware of any portability problems with flex's output
as there are with bison's, so it may be that the concern is moot.
We may just be able to say "use the prebuilt scan.c or get flex; we
don't care about supporting vendor lexes anymore".

I do see a potential problem with this patch that's not related to
portability questions; it is that you're assuming that the lexer's
furthest penetration into the source text is a good place to point
at for parser errors.  That may not be true always.  In particular,
I've been advocating solving some other problems by inserting a
one-token lookahead buffer between the parser and the lexer.  If that
happens then you'd be off by (at least) one token in some cases.

I think the way that this sort of thing is customarily handled in
"real" compilers is that each token carries along an indication of
just where it was found in the source, and then error messages can
finger the right place without making assumptions about synchronization
between different phases of the scanning/parsing process.  That might
be more work than we can justify for SQL queries; not sure.
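
A minimal sketch of the per-token location idea described above, combined
with a one-token lookahead buffer. All names here (Token, Lexer, lexer_init,
next_token) are hypothetical and are not taken from the patch or from the
PostgreSQL sources, and the tokenizer is a toy that only splits words and
punctuation. The point is that the error position comes from the token's own
offset rather than from the lexer's furthest scan position:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token: each one remembers where it started in the query. */
typedef struct Token
{
    int    kind;    /* 0 = end of input, 'w' = word, else the character */
    size_t offset;  /* byte offset of the token's first character */
    size_t length;  /* token length in bytes */
} Token;

typedef struct Lexer
{
    const char *src;
    size_t      pos;            /* furthest point scanned so far */
    Token       lookahead;      /* one-token buffer in front of the parser */
    int         has_lookahead;
} Lexer;

void lexer_init(Lexer *lx, const char *src)
{
    assert(lx != NULL && src != NULL);
    lx->src = src;
    lx->pos = 0;
    lx->lookahead.kind = 0;
    lx->lookahead.offset = 0;
    lx->lookahead.length = 0;
    lx->has_lookahead = 0;
}

/* Scan one token directly from the source text. */
static Token lex_raw(Lexer *lx)
{
    Token t;

    while (isspace((unsigned char) lx->src[lx->pos]))
        lx->pos++;
    t.offset = lx->pos;
    if (lx->src[lx->pos] == '\0')
        t.kind = 0;
    else if (isalnum((unsigned char) lx->src[lx->pos]))
    {
        t.kind = 'w';
        while (isalnum((unsigned char) lx->src[lx->pos]))
            lx->pos++;
    }
    else
        t.kind = (unsigned char) lx->src[lx->pos++];
    t.length = lx->pos - t.offset;
    return t;
}

/* Parser-facing call: hands out the buffered token and scans one ahead. */
Token next_token(Lexer *lx)
{
    Token t;

    if (!lx->has_lookahead)
    {
        lx->lookahead = lex_raw(lx);
        lx->has_lookahead = 1;
    }
    t = lx->lookahead;
    lx->lookahead = lex_raw(lx);
    return t;
}
```

With the query "SELECT foo FROM FROM bar", the fourth token handed to the
parser carries offset 16 (the second FROM), while the lexer's own position
has already moved past the buffered "bar" token. Pointing at lx->pos would
therefore be off by one token; pointing at the token's offset is not.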

BTW, I think that the immediate problem of generating a good error
message for unterminated comments and literals could be solved in other
ways.  This patch or something like it might be cool anyway, but you
should bear in mind that printing out a query and then a marker that's
supposed to line up with something in the query doesn't always work
all that well.  Consider a query that's several dozen lines long,
such as a large table definition.  If we had more control over the
user interface and could highlight the offending token directly,
I'd be more excited about doing something like this.  (Actually, you
could partially address that problem by only printing one line's worth
of query text leading up to the error marker point.  It would still be
tricky to get it right in the presence of newlines, tabs, etc.)
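
The one-line variant could look roughly like the following sketch.
format_error_context is a hypothetical helper, not part of the patch. It
assumes the error position is known as a byte offset, replaces each tab with
a single space so the marker stays aligned, and ignores multibyte characters
and terminal line wrapping:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write into out the single line of query that contains byte offset,
 * then a newline, then a marker line with '*' under the offending byte.
 * Returns 0 on success, -1 if out is too small.
 */
int format_error_context(const char *query, size_t offset,
                         char *out, size_t outsize)
{
    size_t len, start, end, i, n = 0;

    assert(query != NULL && out != NULL);
    len = strlen(query);
    if (offset > len)
        offset = len;

    /* Find the boundaries of the line containing the offset. */
    start = offset;
    while (start > 0 && query[start - 1] != '\n')
        start--;
    end = offset;
    while (end < len && query[end] != '\n')
        end++;

    /* line text + '\n' + padding + '*' + terminating '\0' */
    if ((end - start) + 1 + (offset - start) + 1 + 1 > outsize)
        return -1;

    /* Copy the line, turning tabs into single spaces. */
    for (i = start; i < end; i++)
        out[n++] = (query[i] == '\t') ? ' ' : query[i];
    out[n++] = '\n';

    /* One space per byte before the error, then the marker. */
    for (i = start; i < offset; i++)
        out[n++] = ' ';
    out[n++] = '*';
    out[n] = '\0';
    return 0;
}
```

Only the line holding the error is printed, so a several-dozen-line table
definition does not push the marker far away from the text it refers to.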
        regards, tom lane


Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Chris Bitmead
Date:
Tom Lane wrote:

> We do in fact work with non-bison parser generators, or did last time
> I tried it (around 6.5 release).  I would not like us to stop working
> with non-bison yaccs, since bison's output depends on alloca() which
> is not available everywhere.

I think GNU alloca should work on any platform because it's written in a
portable way.


Re: [PATCHES] Patch for more readable parse error messages

From
Jeroen van Vianen
Date:
At 20:03 20-02-00 -0500, Tom Lane wrote:
>I do see a potential problem with this patch that's not related to
>portability questions; it is that you're assuming that the lexer's
>furthest penetration into the source text is a good place to point
>at for parser errors.  That may not be true always.  In particular,
>I've been advocating solving some other problems by inserting a
>one-token lookahead buffer between the parser and the lexer.  If that
>happens then you'd be off by (at least) one token in some cases.

That's true, but the '*' indicator might at least indicate the approximate 
location of the error. I'm not aware of many (programming) languages that 
are able to indicate the error at the correct location all the time, anyway.

>I think the way that this sort of thing is customarily handled in
>"real" compilers is that each token carries along an indication of
>just where it was found in the source, and then error messages can
>finger the right place without making assumptions about synchronization
>between different phases of the scanning/parsing process.  That might
>be more work than we can justify for SQL queries; not sure.

True, but requires a lot more work.

>BTW, I think that the immediate problem of generating a good error
>message for unterminated comments and literals could be solved in other
>ways.  This patch or something like it might be cool anyway, but you
>should bear in mind that printing out a query and then a marker that's
>supposed to line up with something in the query doesn't always work
>all that well.  Consider a query that's several dozen lines long,
>such as a large table definition.  If we had more control over the
>user interface and could highlight the offending token directly,
>I'd be more excited about doing something like this.  (Actually, you
>could partially address that problem by only printing one line's worth
>of query text leading up to the error marker point.  It would still be
>tricky to get it right in the presence of newlines, tabs, etc.)

I try to make a good guess at where the location of the error is, but am 
hesitant to only print a few tokens near the error locations, as you won't 
be able to know where the error was found in complex queries or table 
definitions. Please try with more complex queries and tell me what you think.


Jeroen



Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Peter Eisentraut
Date:
On 2000-02-20, Tom Lane mentioned:

> I would not like us to stop working
> with non-bison yaccs, since bison's output depends on alloca() which
> is not available everywhere.

Couldn't alloca(x) be defined to palloc(x) where missing? The performance
will be worse, but it ought to work.
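
A rough sketch of what such a fallback might look like. HAVE_ALLOCA is an
assumed configure-style symbol, sum_copy is a made-up caller, and palloc is
stubbed with malloc here so that the fragment compiles outside the backend.
Note that the semantics differ: alloca memory disappears when the function
returns, while palloc'd memory lives until it is pfree'd or its memory
context is reset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the backend's palloc(); assumption for this sketch only. */
static void *palloc(size_t size)
{
    void *p = malloc(size);

    assert(p != NULL);
    return p;
}

#ifndef HAVE_ALLOCA             /* hypothetical symbol set by configure */
#undef alloca
#define alloca(x) palloc(x)     /* slower than a real alloca, but portable */
#endif

/* Made-up caller that uses alloca() for scratch space. */
int sum_copy(const int *vals, int n)
{
    int *tmp;
    int  i, total = 0;

    if (n <= 0)
        return 0;
    tmp = (int *) alloca((size_t) n * sizeof(int));
    memcpy(tmp, vals, (size_t) n * sizeof(int));
    for (i = 0; i < n; i++)
        total += tmp[i];
    return total;               /* tmp is not released at return here */
}
```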

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 2000-02-20, Tom Lane mentioned:
>> I would not like us to stop working
>> with non-bison yaccs, since bison's output depends on alloca() which
>> is not available everywhere.

> Couldn't alloca(x) be defined to palloc(x) where missing?

Probably, but I wasn't looking for a workaround; that was just one
quick illustration of a reason not to want to use bison (one that's
bitten me personally, so I knew it offhand).  We should try not to
become dependent on bison when there are near-equivalent tools, just
on general principles of maintaining portability.  For an analogy,
I believe most of the developers use gcc, but it would be a real bad
idea for us to abandon support for other compilers.

For the same sort of reasons I'd prefer that our scanner worked
with vanilla lex, not just flex.  I'm not sure how far away we are
from that; it may be an unrealistic goal.  But if it is within reach
then we shouldn't give it up lightly.
        regards, tom lane


Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Bruce Momjian
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On 2000-02-20, Tom Lane mentioned:
> >> I would not like us to stop working
> >> with non-bison yaccs, since bison's output depends on alloca() which
> >> is not available everywhere.
> 
> > Couldn't alloca(x) be defined to palloc(x) where missing?
> 
> Probably, but I wasn't looking for a workaround; that was just one
> quick illustration of a reason not to want to use bison (one that's
> bitten me personally, so I knew it offhand).  We should try not to
> become dependent on bison when there are near-equivalent tools, just
> on general principles of maintaining portability.  For an analogy,
> I believe most of the developers use gcc, but it would be a real bad
> idea for us to abandon support for other compilers.

But I don't see non-bison solutions for finding the location of errors. 
Is it possible?  Could we enable the feature just for bison?


--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Don Baccus
Date:
At 10:56 PM 2/21/00 -0500, Tom Lane wrote:
>Probably, but I wasn't looking for a workaround; that was just one
>quick illustration of a reason not to want to use bison (one that's
>bitten me personally, so I knew it offhand).  We should try not to
>become dependent on bison when there are near-equivalent tools, just
>on general principles of maintaining portability.  For an analogy,
>I believe most of the developers use gcc, but it would be a real bad
>idea for us to abandon support for other compilers.
>
>For the same sort of reasons I'd prefer that our scanner worked
>with vanilla lex, not just flex.  I'm not sure how far away we are
>from that; it may be an unrealistic goal.  But if it is within reach
>then we shouldn't give it up lightly.

I agree entirely with the above.  The more portable the tool, the larger
the potential user base.   Unless the goal is to bundle-up Postgres with
a pre-defined set of software, i.e. GNU in this case (despite the fact
that I don't see Postgres on their site as part of their list of open-source
software, and I think I looked twice), go for the cover-the-earth approach.

SQL syntax isn't particularly difficult.  On the other hand, I realize there's
a legacy to support.  Still, making portions of the product dependent on one
tool or another is an issue that merits close scrutiny.  Shouldn't be done 
except under compelling reasons.

I mean, presuming a reasonably modern C, C tools, and large-scale
operating-system environment makes sense (no reason to run native on a
Palm Pilot at this point).  But unnecessary dependence on particular
tools doesn't make much sense.

Just IMO, of course.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Peter Eisentraut
Date:
On Mon, 21 Feb 2000, Tom Lane wrote:

> For the same sort of reasons I'd prefer that our scanner worked
> with vanilla lex, not just flex.  I'm not sure how far away we are
> from that; it may be an unrealistic goal.  But if it is within reach
> then we shouldn't give it up lightly.

I concur. Somewhere in between vanilla lex and flex is also POSIX lex,
which does support exclusive start conditions but no <<EOF>>.

Anyone for getting rid of GNU make?

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Tom Lane
Date:
Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
> ... Somewhere in between vanilla lex and flex is also POSIX lex,
> which does support exclusive start conditions but no <<EOF>>.

I noticed that in the flex manual.  Does it help us any?  That is,
are there a useful number of lexes out there that do the full POSIX
spec?  If flex is our only real choice for exclusive start conditions
then it's pointless to avoid <<EOF>>.

> Anyone for getting rid of GNU make?

No ;-).  GNU make has enough important features that there is no
near-equivalent non-GNU make.  VPATH, for example.  One thing I hope
we will be able to do sometime soon is build in an object directory
tree separate from the source tree... can't realistically do that
with any non-GNU make that I've heard of.
        regards, tom lane


Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Peter Eisentraut
Date:
On 2000-02-22, Tom Lane mentioned:

> > Anyone for getting rid of GNU make?
> 
> No ;-).  GNU make has enough important features that there is no
> near-equivalent non-GNU make.  VPATH, for example.

There are other makes that support this too. While I love GNU make, too,
all the talk about allowing vanilla lex, etc. is pointless while GNU make
is required. Users don't see lex at all, they do see make.

OTOH, it is very hard for me to get an overview these days what's actually
out there in terms of other make's, other lex's, other yacc's, other
compilers. You should have an edge there (HPUX and all). Most
installations of commercial Unix vendors I get to nowadays use gcc, gmake,
flex as system tools. Yesterday I read that Sun builds Java proper with
GNU make!

The best way of going about this seems to take one of the perpetrators
(make file, gram.y, etc.) and try to port it to some given non-GNU tool
and take a look at the consequences. For example, if we get PostgreSQL to
compile with FreeBSD's make without crippling everything, that would be a
win for the user base. This may in fact be the first experiment.

> One thing I hope we will be able to do sometime soon is build in an
> object directory tree separate from the source tree... can't
> realistically do that with any non-GNU make that I've heard of.

I'm planning to work on that for 7.1. But here's an interesting tidbit:
Automake does support this feature but in its manual it claims that it
does not use any GNU make specific features. And in fact, VPATH exists in
both System V's and 4.3 BSD's make.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 2000-02-22, Tom Lane mentioned:
>>>> Anyone for getting rid of GNU make?
>> 
>> No ;-).  GNU make has enough important features that there is no
>> near-equivalent non-GNU make.  VPATH, for example.

> There are other makes that support this too. While I love GNU make, too,
> all the talk about allowing vanilla lex, etc. is pointless while GNU make
> is required. Users don't see lex at all, they do see make.

Huh?  Assuming someone will have program X installed is not the same as
assuming they will have program Y installed.  In this particular case,
a more exact way of putting it is that assuming program X is installed
is not the same as assuming that program Y's prebuilt-on-another-machine
output is usable on this platform.

> OTOH, it is very hard for me to get an overview these days what's actually
> out there in terms of other make's, other lex's, other yacc's, other
> compilers.

Not much.  The real problem here is "what set of tool features do you
assume you have, and what's it costing you in portability?"  GNU make
provides a very rich feature set that's widely portable, although you
do have to port the particular implementation.  If you don't want to
assume GNU make but just a generic make, there's a big gap in features
before you drop down to what's actually portable to a wide class of
vendor-provided makes.  VPATH, for example, does exist in *some*
vendor makes, but as a practical matter if you use it then you'd better
tell people "my program requires GNU make".  It's not worth the trouble
to keep track of the exceptions.

I will be the first to admit this is all a matter of judgment calls
rather than certainties.  As far as I can see, it's not worth our
trouble to try to operate with non-GNU makes; it is worth the trouble
to work with non-GNU yaccs, because we're not really using any bison-
specific features; it's looking like we should forget about non-GNU
lexes, but I'm not quite convinced yet.  You're free to hold different
opinions of course.  I've been around for a few years in the portable-
software game, so I tend to think I know where the minefields are, but
perhaps my hard experiences are out of date.

> The best way of going about this seems to take one of the perpetrators
> (make file, gram.y, etc.) and try to port it to some given non-GNU tool
> and take a look at the consequences.

But that only tells you about the one tool; in fact, only about the one
version of the one tool that you test.  In practice, useful knowledge
in this area comes from the school of hard knocks: ship an open-source
program and see what complaints you get.  I'd rather rely on experience
previously gained than learn those lessons again...

>> One thing I hope we will be able to do sometime soon is build in an
>> object directory tree separate from the source tree... can't
>> realistically do that with any non-GNU make that I've heard of.

> I'm planning to work on that for 7.1. But here's an interesting tidbit:
> Automake does support this feature but in its manual it claims that it
> does not use any GNU make specific features.

Yeah?  Do they claim not to need VPATH to do it?  I suppose it might
be possible, if they are willing to write sufficiently ugly and
non-hand-maintainable makefiles.  Not sure that's a good tradeoff
though.

> And in fact, VPATH exists in both System V's and 4.3 BSD's make.

You're still confusing two datapoints with the wide world...
        regards, tom lane


GNU make (Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages)

From
Peter Eisentraut
Date:
On Wed, 23 Feb 2000, Tom Lane wrote:

> > And in fact, VPATH exists in both System V's and 4.3 BSD's make.
> 
> You're still confusing two datapoints with the wide world...

I challenge everyone to show me a make without VPATH. In fact, show me two
makes without a feature that you can't live without, and I shall forever
hold my peace. It's certainly easier to say "let's support yacc, because
we actually don't use any non-yacc features" than saying it for make. But
it's not the idea to say "we need GNU make because it has all these
features" when 93% of these features in fact exist in all other reasonable
makes as well. It's not the end of the world but it's something that
shouldn't be ignored.


-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



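Re: GNU make (Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages)

From
Tom Lane
Date:
Peter Eisentraut <e99re41@DoCS.UU.SE> writes: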
> On Wed, 23 Feb 2000, Tom Lane wrote:
>>>> And in fact, VPATH exists in both System V's and 4.3 BSD's make.
>> 
>> You're still confusing two datapoints with the wide world...

> I challenge everyone to show me a make without VPATH. In fact, show me two
> makes without a feature that you can't live without, and I shall forever
> hold my peace.

Out of the four systems I have easy access to: HPUX 10, HPUX 9, Linux
(some fairly old RedHat version), and SunOS 4.1.4, two have makes
without VPATH ... and Linux doesn't really count since it's using gmake
anyway.

Now you can argue that HPUX 9 and SunOS 4.1.4 are dinosaurs that should
be put out of their misery, and I wouldn't disagree --- but reality is
that a lot of people are running older systems and don't have the time
or interest to upgrade 'em.  "Portability" doesn't mean "portability to
the newest and most standards-conformant systems", it means portability
to what's actually out there.

> it's not the idea to say "we need GNU make because it has all these
> features" when 93% of these features in fact exist in all other reasonable
> makes as well.

If I thought we were anywhere near that close to being able to use old
makes, I'd be arguing for removing the GNU-make dependency too.  But
I don't think it's going to be practical...
        regards, tom lane


Re: GNU make (Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages)

From
Peter Eisentraut
Date:
On Wed, 23 Feb 2000, Tom Lane wrote:

> Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
> > I challenge everyone to show me a make without VPATH. In fact, show me two
> > makes without a feature that you can't live without, and I shall forever
> > hold my peace.
> 
> Out of the four systems I have easy access to: HPUX 10, HPUX 9, Linux
> (some fairly old RedHat version), and SunOS 4.1.4, two have makes
> without VPATH

You win. ;)

I surveyed several machines as well (Solaris, IRIX, FreeBSD, HPUX) which
all had this feature. I feel better now with actual data points, I hope
that's fair enough.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Re: [PATCHES] Patch for more readable parse error messages

From
Bruce Momjian
Date:
Here is more information about it.

> Jeroen van Vianen <jeroen@design.nl> writes:
> >> Does this work with a non-bison parser?  It looks mighty
> >> bison-dependent to me...
>
> > I'm not sure, but it probably is flex dependent (but Postgres always needed
> > flex anyway). I'm not aware of any yacc / byacc / bison dependencies. Don't
> > know if anybody has been successful building Postgres with another parser
> > generator.
>
> Um, you're right of course --- those are lexer not parser datastructures
> you're poking into.  Sorry for my confusion.
>
> We do in fact work with non-bison parser generators, or did last time
> I tried it (around 6.5 release).  I would not like us to stop working
> with non-bison yaccs, since bison's output depends on alloca() which
> is not available everywhere.
>
> I'm not sure about the situation with lexers.  We have been saying for
> a long time that flex was required, but since we got rid of the
> scanner's use of trailing context ('/' rules) I think there is a better
> chance that it would work with vanilla lex.  Anyone want to try that
> with current sources?
>
> > BTW, as we ship flex's output lex.yy.c (as scan.c) and bison's output
> > (gram.c) in the distribution, any user would be able to compile the
> > sources, but if they want to start hacking the .l or .y files, they'll
> > need appropriate tools.
>
> Right.  I am not aware of any portability problems with flex's output
> as there are with bison's, so it may be that the concern is moot.
> We may just be able to say "use the prebuilt scan.c or get flex; we
> don't care about supporting vendor lexes anymore".
>
> I do see a potential problem with this patch that's not related to
> portability questions; it is that you're assuming that the lexer's
> furthest penetration into the source text is a good place to point
> at for parser errors.  That may not be true always.  In particular,
> I've been advocating solving some other problems by inserting a
> one-token lookahead buffer between the parser and the lexer.  If that
> happens then you'd be off by (at least) one token in some cases.
>
> I think the way that this sort of thing is customarily handled in
> "real" compilers is that each token carries along an indication of
> just where it was found in the source, and then error messages can
> finger the right place without making assumptions about synchronization
> between different phases of the scanning/parsing process.  That might
> be more work than we can justify for SQL queries; not sure.
>
> BTW, I think that the immediate problem of generating a good error
> message for unterminated comments and literals could be solved in other
> ways.  This patch or something like it might be cool anyway, but you
> should bear in mind that printing out a query and then a marker that's
> supposed to line up with something in the query doesn't always work
> all that well.  Consider a query that's several dozen lines long,
> such as a large table definition.  If we had more control over the
> user interface and could highlight the offending token directly,
> I'd be more excited about doing something like this.  (Actually, you
> could partially address that problem by only printing one line's worth
> of query text leading up to the error marker point.  It would still be
> tricky to get it right in the presence of newlines, tabs, etc.)
>
>             regards, tom lane


--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026