Thread: Yacc / Bison difficulties
I was trying to make a minor change today to the gram.y file to make PostgreSQL recognize "DOUBLE" as a data type the way DB2 does. I ran into reduce / reduce conflicts using both of the methods I tried. Having fought extensively with Bison before on a SQL oriented language translation project, I am amazed that you were able to get a grammar as complex as PostgreSQL to work without major difficulty. I was wondering about what the sense of the list would be to someday accepting a rewrite using a hand-coded LL(k) recursive descent parser. Anyone? - Mark Butler
> > I was trying to make a minor change today to the gram.y file to make > PostgreSQL recognize "DOUBLE" as a data type the way DB2 does. I ran into > reduce / reduce conflicts using both of the methods I tried. > > Having fought extensively with Bison before on a SQL oriented language > translation project, I am amazed that you were able to get a grammar as > complex as PostgreSQL to work without major difficulty. > > I was wondering about what the sense of the list would be to someday accepting > a rewrite using a hand-coded LL(k) recursive descent parser. Anyone? Interesting. What advantages would there be? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > Interesting. What advantages would there be? As any one who has ever attempted to build a C++ parser using Yacc or Bison can attest, it is very difficult to get an LALR based parser to correctly parse a sophisticated grammar. The advantages of using a hand written recursive descent parser lie in four areas: 1) ease of implementing grammar changes 2) ease of debugging 3) ability to handle unusual cases 4) ability to support context sensitive grammars Context sensitivity is useful for handling things like embedded programming languages without having to escape whole procedures as string literals, for example. We could support procedural language plugins without the current limitations on syntax. Another nice capability is the ability to enable and disable grammar rules at run time - you could add run time options to disable all non SQL-92 grammar rules for application portability testing or emulate Oracle's outer join syntax, as a couple of examples. - Mark Butler
Mark Butler <butlerm@middle.net> writes: > Bruce Momjian wrote: > > > Interesting. What advantages would there be? > > As any one who has ever attempted to build a C++ parser using Yacc or Bison > can attest, it is very difficult to get an LALR based parser to correctly > parse a sophisticated grammar. The advantages of using a hand written > recursive descent parser lie in four areas: > > 1) ease of implementing grammar changes > 2) ease of debugging > 3) ability to handle unusual cases > 4) ability to support context sensitive grammars > > Context sensitivity is useful for handling things like embedded programming > languages without having to escape whole procedures as string literals, for > example. We could support procedural language plugins without the current > limitations on syntax. > > Another nice capability is the ability to enable and disable grammar rules at > run time - you could add run time options to disable all non SQL-92 grammar > rules for application portability testing or emulate Oracle's outer join > syntax, as a couple of examples. On the other hand, recursive descent parsers tend to be more ad hoc, they tend to be harder to maintain, and they tend to be less efficient. I believe that yacc based parsers support context sensitivity just as well as recursive descent parsers. I'm not sure that C++ is a fair example. The problem with parsing C++ is the result of a convoluted declaration syntax, so much so that some constructs are completely syntactically ambiguous no matter how you look at them and must be decided by semantic rules. It's true that in such a case recursive descent is easier, because it is easier to mix semantic rules and syntax rules. But the SQL syntax is much simpler; is this really a problem with SQL? And I note that despite the difficulties, the g++ parser is yacc based. Ian ---------------------------(end of broadcast)--------------------------- TIP 462: A mushroom cloud has no silver lining.
On Wed, Apr 11, 2001 at 10:44:59PM -0700, Ian Lance Taylor wrote: > Mark Butler <butlerm@middle.net> writes: > > ... > > The advantages of using a hand written recursive descent parser lie in > > 1) ease of implementing grammar changes > > 2) ease of debugging > > 3) ability to handle unusual cases > > 4) ability to support context sensitive grammars > > ... > > Another nice capability is the ability to enable and disable grammar > > rules at run time ... > > On the other hand, recursive descent parsers tend to be more ad hoc, > they tend to be harder to maintain, and they tend to be less > efficient. ... And I note that despite the > difficulties, the g++ parser is yacc based. Yacc and yacc-like programs are most useful when the target grammar (or your understanding of it) is not very stable. With Yacc you can make sweeping changes much more easily; big changes can be a lot of work in a hand-coded parser. Once your grammar stabilizes, though, hand coding can provide flexibility that is inconceivable in a parser generator, albeit at some cost in speed and compact description. (I doubt parser speed is an issue for PG.) G++ has flirted seriously with switching to a recursive-descent parser, largely to be able to offer meaningful error messages and to recover better from errors, as well as to be able to parse some problematic but conformant (if unlikely) programs. Note that the choice is not just between Yacc and a hand-coded parser. Since Yacc, many more powerful parser generators have been released, one of which might be just right for PG. Nathan Myers ncm@zembu.com
Mark Butler writes: > I was trying to make a minor change today to the gram.y file to make > PostgreSQL recognize "DOUBLE" as a data type the way DB2 does. I ran into > reduce / reduce conflicts using both of the methods I tried. See attached patch. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Thanks. I didn't realize the need to move the DOUBLE token from the TokenId to the ColId production. Will this patch be integrated into the head branch? - Mark Butler Peter Eisentraut wrote: > > Mark Butler writes: > > > I was trying to make a minor change today to the gram.y file to make > > PostgreSQL recognize "DOUBLE" as a data type the way DB2 does. I ran into > > reduce / reduce conflicts using both of the methods I tried. > > See attached patch.
Mark Butler writes: > Thanks. I didn't realize the need to move the DOUBLE token from the TokenId to > the ColId production. Will this patch be integrated into the head branch? Not sure. It's not a standard type, but at least two other RDBMS have it and the name does make sense. Any comments? > > - Mark Butler > > Peter Eisentraut wrote: > > > > Mark Butler writes: > > > > > I was trying to make a minor change today to the gram.y file to make > > > PostgreSQL recognize "DOUBLE" as a data type the way DB2 does. I ran into > > > reduce / reduce conflicts using both of the methods I tried. > > > > See attached patch. > > -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
> Mark Butler writes: > > > Thanks. I didn't realize the need to move the DOUBLE token from the TokenId to > > the ColId production. Will this patch be integrated into the head branch? > > Not sure. It's not a standard type, but at least two other RDBMS have it > and the name does make sense. Any comments? That's a tough call. We already have some duplicate type symbols, but this is not a standard SQL type. I would see if we can get others to say it is a good idea. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > That's a tough call. We already have some duplicate type symbols, but > this is not a standard SQL type. I would see if we can get others to > say it is a good idea. But the DOUBLE keyword is already reserved by ANSI for use in the "DOUBLE PRECISION" type, so a "DOUBLE" synonym for it shouldn't make much of a difference. Right? - Mark Butler
Peter Eisentraut <peter_e@gmx.net> writes: > Mark Butler writes: >> Thanks. I didn't realize the need to move the DOUBLE token from the TokenId to >> the ColId production. Will this patch be integrated into the head branch? > Not sure. It's not a standard type, but at least two other RDBMS have it > and the name does make sense. Any comments? Seems like a reasonable change (for 7.2, not now). regards, tom lane
ncm@zembu.com (Nathan Myers) writes: > Yacc and yacc-like programs are most useful when the target grammar (or > your understanding of it) is not very stable. With Yacc you can make > sweeping changes much more easily; big changes can be a lot of work in > a hand-coded parser. And, in fact, this is precisely the killer reason why we will not switch to a handwritten parser anytime in the foreseeable future. Postgres' grammar is NOT stable. Compare the gram.y files for any two recent releases. I foresee changes at least as large in upcoming releases, btw, as we implement more of SQL92/99 and drop ancient PostQuel-isms. I have some interest in proposals to switch to a better parser-generator tool than yacc ... but yacc has the advantages of being widely available and widely understood. You'd need a pretty significant improvement over yacc to make it worth switching. regards, tom lane
On Fri, Apr 13, 2001 at 03:12:57AM -0400, Tom Lane wrote: > I have some interest in proposals to switch to a better parser-generator > tool than yacc ... There are tools to produce efficient top-down parsers as well. Those doing such parsers "by hand" may be interested in looking at PCCTS (http://www.polhode.com/pccts.html) or ANTLR (http://www.antlr.org/). -- Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/