Re: Remove useless associativity/precedence from parsers - Mailing list pgsql-hackers
From | Akim Demaille |
---|---|
Subject | Re: Remove useless associativity/precedence from parsers |
Date | |
Msg-id | 74DD0F55-F3CF-447E-ACF2-88C01E42AAC3@lrde.epita.fr Whole thread Raw |
In response to | Re: Remove useless associativity/precedence from parsers (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi Tom, > Le 21 mai 2019 à 21:06, Tom Lane <tgl@sss.pgh.pa.us> a écrit : > > Akim Demaille <akim@lrde.epita.fr> writes: >>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit : >>> 2013? Certainly not. We have a lot of buildfarm critters running >>> older platforms than that. > >> This I fully understand. However, Bison is a source generator, >> and it's quite customary to use modern tools on the maintainer >> side, and then deploy the result them on possibly much older >> architectures. >> Usually users of Bison build tarballs with the generated parsers >> in them, and ship/test from that. > > As do we, but at the same time we don't want to make our tool > requirements too onerous. I think that really the practical limit > right now is Bison 2.3 --- Apple is still shipping that as their system > version, And do not expect Apple to update it at all. Apple refuses the GPLv3, and stopped updating Bison to the last GPLv2 release, as it did for every GPL'd program. > so requiring something newer than 2.3 would put extra burden > on people doing PG development on Macs (of which there are a lot). Honestly, I seriously doubt that you have contributors that don't have MacPorts or Brew installed, and both are pretty up to date on Bison. >> So Bison, and your use of it today, is exactly what you need? >> There's no limitation of that tool that you'd like to see >> address that would make it a better tool for PostgreSQL? > > Well, there are a couple of pain points, but they're not going to be > addressed by marginal hacking on declarations ;-). The things that > we find really painful, IMV, are: > > * Speed of the generated parser could be better. Expect news this year about that. I precisely came to look at PostgreSQL for this. Is there an easy way to bench pg and the various costs? To be explicit: is there a way to see how long the parsing phase takes? And some mighty inputs to bench against? > I suspect this has > a lot to do with the fact that our grammar is huge, and so are the > tables, and that causes lots of cache misses. Maybe this could be > addressed by trying to make the tables smaller and/or laid out in > a way with better cache locality? The improvement I have in mind is about LR techniques, not about this. But you are right that it might be interesting. It's unlikely that the table can be made smaller though. Bison was designed when space was really scarce, and a lot of efforts were invested to make the tables as small as possible. The current trend is actually to optionally consume more space in exchange for better services (such as more accurate error messages). > * Lack of run-time extensibility of the parser. There are many PG > extensions that wish they could add things into the grammar, and can't. Making the grammars extensible definitely makes sense, and it's in the wishlist. But making this doable at runtime is a much bigger problem... However, maybe this can be achieved by calling the plugin parser from the outer parser. Provided, of course, that the grammar of the plugin is really in a "separate world"; if it also wants to get bits of the host grammar, it's certainly not so easy. Are there documented examples of this? What would that look like? > * LALR(1) parsing can only barely cope with SQL, and the standards > committee keeps making it harder. But Bison does more: it also provides support for LR(1) and IELR(1), which accept more (deterministic) grammars, and are not subject to "mysterious s/r conflicts" as in LALR. But maybe you refer to even beyond LR(1): > We've got some hacks that fake > an additional token of lookahead in some places, but that's just an > ugly (and performance-eating) hack. More than k=1 is unlikely to happen. Given that we have GLR, which provides us with k=∞ :) > Maybe Bison's GLR mode would already solve that, No doubt about that. Bison's grammar is not LR(1) either, because the rules are not mandated to end with ';', so when reading a grammar, in a sequence such as "<ID> <:> <ID>" the parser cannot know whether the second <ID> is the RHS of the rule introduced by the first <ID>, or the beginning of another rule if the sequence is actually "<ID> <:> <ID> <:>". Because of that, Bison is also playing dirty tricks to turn this LR(2) into LR(1). But as a consequence the grammar is much harder to evolve, the locations are less accurate (because two tokens are merged together into a mega token), etc. So it is considered to turn Bison's own parser to GLR. > but no one here has really looked into whether > it could improve matters or whether it'd come at a performance cost. That should be very easy to check: just adding %glr to the grammar should not change the API, should not change the visible behavior, but will give you a hint of the intrinsic cost of the GLR backend. > The Bison manual's description of GLR doesn't give me a warm feeling > either about the performance impact The GLR backend is efficient... for a GLR backend. I have not benched it against the deterministic backend, but now that you mention it, it's an obvious need... > or whether we'd still get > compile-time warnings about bogus grammars. Well, you can't fight nature here. Ambiguity of a grammar is undecidable. That being said, there's a modified Bison which implements heuristics to detect ambiguities in grammar [1]. Would that make you feel more comfortable? [1] http://www.lsv.fr/~schmitz/pub/expamb.pdf
pgsql-hackers by date: