On Tue, Dec 18, 2012 at 4:33 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> And on the other hand, if you could get a clean split between the two
>> grammars, then regardless of exactly what the split was, it might seem
>> a win. But it seemed to me when I looked at this that you'd have to
>> duplicate a lot of stuff and the small parser still wouldn't end up
>> being very small, which I found hard to get excited about.
>
> I think the goal is not so much about getting a much smaller parser, but
> more about having a separate parser that you don't care about the "bloat"
> of, so that you can improve DDL without fearing main parser
> performance regressions.

Well, that would be nice, but the problem is that I see no way to
implement it. If, with a unified parser, the parser is 14% of our
source code, then splitting it in two will probably crank that number
up well over 20%, because there will be duplication between the two.
That seems double-plus un-good.

I can't help but suspect that the way we handle keywords today is
monumentally inefficient. The unreserved_keyword productions, et al,
just seem somehow badly wrong-headed. We take the trouble to
distinguish all of those cases so that we can turn around and not
distinguish them. I feel like there ought to be some way to use lexer
states to handle this - if we're in a context where an unreserved
keyword will be treated as an IDENT, then have the lexer return IDENT
when it sees an unreserved keyword. I might be wrong, but it seems
like that would eliminate a whole lot of parser state transitions.
However, even if I'm right, I have no idea how to implement it. It
just seems very wasteful that we have so many parser states that have
no purpose other than (effectively) to convert an unreserved_keyword
into an IDENT when the lexer could do the same thing much more cheaply
given a bit more context.
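
To make the idea concrete, here is a toy sketch in Python. It is not a
proposal for how this would fit into our flex/bison setup. The keyword
sets are tiny made-up samples, and expects_name() is a hypothetical
stand-in for the lexer state. It dodges exactly the part I don't know
how to do, which is deriving those contexts from the grammar.

```python
import re

# Tiny illustrative samples; the real keyword lists are far longer.
RESERVED_KEYWORDS = {"select", "from", "as"}
UNRESERVED_KEYWORDS = {"begin", "work", "names"}


def expects_name(tokens):
    """Hypothetical context rule: a name is expected right after
    SELECT, FROM, or AS. A real lexer would need this information
    from the grammar; here it is simply hard-coded."""
    return bool(tokens) and tokens[-1][0] in ("SELECT", "FROM", "AS")


def lex(sql, in_ident_context):
    """Return a list of (token_type, text) pairs.

    An unreserved keyword comes back as IDENT whenever
    in_ident_context(tokens_so_far) says it would be treated as an
    identifier anyway, so the parser never has to undo the distinction.
    """
    tokens = []
    for text in re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\S", sql):
        word = text.lower()
        if not (text[0].isalpha() or text[0] == "_"):
            tokens.append(("OTHER", text))        # punctuation, digits
        elif word in RESERVED_KEYWORDS:
            tokens.append((word.upper(), text))   # always a keyword
        elif word in UNRESERVED_KEYWORDS and not in_ident_context(tokens):
            tokens.append((word.upper(), text))   # keyword in this context
        else:
            tokens.append(("IDENT", text))        # plain name, or demoted keyword
    return tokens
```

With this, lex("BEGIN WORK", expects_name) keeps WORK as a keyword
token, while lex("SELECT work FROM names", expects_name) hands back
"work" and "names" as IDENT, with no parser rule needed to convert
them.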

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company