Thread: Remove useless associativity/precedence from parsers
Hi all,

Many of the grammars could be clarified. For instance, there are a number of useless associativity and/or precedence declarations. Maybe the point is to leave some form of documentation, but since it is not used at all by the tool, that documentation is not checked. In the following two proposed patches, I remove directives that are completely useless.

In other places, some associativity is declared (e.g., with %left) although the associativity is useless; only the precedence matters. I have not changed this, because I don't know what version of Bison is required. Given that it's a maintainer-side tool, I would suggest targeting recent versions of Bison, but opinions might differ here.

Cheers!

commit 75e597aa239d8ebc332d3a29630ecad0133d3d6f
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sun May 19 14:24:33 2019 +0200

    json_path: remove useless precedence directives

    These directives are useless: the generated parser is exactly the same
    (except for line number changes).

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 22c2089f78..82b6529414 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -115,11 +115,9 @@ static JsonPathParseItem *makeItemLikeRegex(JsonPathParseItem *expr,
 %left OR_P
 %left AND_P
-%right NOT_P
 %left '+' '-'
 %left '*' '/' '%'
 %left UMINUS
-%nonassoc '(' ')'
 /* Grammar follows */
 %%

This second patch could be made simpler: just remove the %token declarations I provided, but then the generated files are different (but, of course, both parsers are equivalent).

commit 5322f7303a1a9dfa7cd959d68caeced847ae0466
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sun May 19 14:32:15 2019 +0200

    parser: remove useless associativity/precedence

    Use %token instead to guarantee that the token numbers are the same
    before and after this patch. As a consequence, the generated files
    are equal.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3dc0e8a4fb..3d4c552cfa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -766,10 +766,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %left AT          /* sets precedence for AT TIME ZONE */
 %left COLLATE
 %right UMINUS
-%left '[' ']'
+%token '[' ']'
 %left '(' ')'
 %left TYPECAST
-%left '.'
+%token '.'
 /*
  * These might seem to be low-precedence, but actually they are not part
  * of the arithmetic hierarchy at all in their use as JOIN operators.
Akim Demaille <akim@lrde.epita.fr> writes: > In the following two proposed patches, I remove directives that are > completely useless. I'm far from convinced that the proposed changes in gram.y are a good idea. Both [] and . (field selection) *are* left-associative in a meaningful sense, so even if this change happens not to affect what Bison does, I think the declarations are good documentation. Would you have us also change the user documentation at https://www.postgresql.org/docs/devel/sql-syntax-lexical.html#SQL-PRECEDENCE ? I haven't looked at the jsonpath grammar closely enough to have an opinion about that one. regards, tom lane
Hi Tom,

> Le 19 mai 2019 à 20:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> In the following two proposed patches, I remove directives that are
>> completely useless.
>
> I'm far from convinced that the proposed changes in gram.y are a good
> idea. Both [] and . (field selection) *are* left-associative in a
> meaningful sense, so even if this change happens not to affect what
> Bison does, I think the declarations are good documentation.

I don't dispute the overall behavior of the grammar as a whole, I'm only referring to these directives. In my experience, leaving useless associativity and precedence directives can be misleading (since these directives have no impact, you could put them anywhere: their contribution is not checked in any way) or even dangerous (some day, some change introduces unexpected shift-reduce conflicts that someone should have studied, but because of "stray" directives, they are "fixed" in some uncontrolled way).

> Would
> you have us also change the user documentation at
> https://www.postgresql.org/docs/devel/sql-syntax-lexical.html#SQL-PRECEDENCE
> ?

No, of course not! Whether you define the arithmetic with an unambiguous grammar (expr/term/fact and no associativity/precedence directives) or with an ambiguous grammar (a single expr rule plus associativity/precedence directives), the result is the same behavior: the usual behavior of these operators. And the documentation should document that, of course.

It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc) where associativity plays no role: %precedence is made for this. But it was introduced in Bison 2.7.1 (2013-04-15), and I don't know if requiring it is acceptable to PostgreSQL.

Cheers!
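To make the %precedence point concrete, here is a minimal sketch (not taken from the PostgreSQL grammars; tokens and rules are invented for the example): unary operators need a precedence level so that conflicts against the binary operators can be resolved, but their associativity is never consulted, so %left or %right would only suggest a property that Bison never uses.

/* Minimal illustrative sketch; %precedence requires Bison >= 2.7.1. */
%token NUM
%precedence NOT        /* unary: only its precedence level is ever consulted    */
%left '+' '-'          /* binary: left associativity makes a - b - c = (a-b)-c  */
%left '*' '/'
%precedence UMINUS     /* phantom precedence symbol, used only through %prec    */

%%
expr: expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | NOT expr
    | '-' expr %prec UMINUS
    | '(' expr ')'
    | NUM
    ;

With these declarations the grammar compiles without any shift/reduce warnings, exactly as it would with %left NOT; the difference is purely that %precedence no longer claims an associativity that is never exercised.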
Akim Demaille <akim@lrde.epita.fr> writes:
> It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc) where associativity plays no role: %precedence is made for this. But it was introduced in Bison 2.7.1 (2013-04-15), and I don't know if requiring it is acceptable to PostgreSQL.

2013? Certainly not. We have a lot of buildfarm critters running older platforms than that. I believe our (documented and tested) minimum version of Bison is still 1.875. While we'd be willing to move that goalpost if there were clear benefits from doing so, I'm not even convinced that %precedence as you describe it here is any improvement at all.

			regards, tom lane
Hi Tom!

> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc) where associativity plays no role: %precedence is made for this. But it was introduced in Bison 2.7.1 (2013-04-15), and I don't know if requiring it is acceptable to PostgreSQL.
>
> 2013? Certainly not. We have a lot of buildfarm critters running
> older platforms than that.

This I fully understand. However, Bison is a source generator, and it's quite customary to use modern tools on the maintainer side, and then deploy the result on possibly much older architectures. Usually users of Bison build tarballs with the generated parsers in them, and ship/test from that.

> I believe our (documented and tested)
> minimum version of Bison is still 1.875. While we'd be willing
> to move that goalpost if there were clear benefits from doing so,
> I'm not even convinced that %precedence as you describe it here
> is any improvement at all.

Ok. I find this really surprising: you are leaving dormant directives that may fire some day without anyone knowing. You could comment out the useless associativity/precedence directives; that would document them just as well, without this risk. But, OK, that's only my opinion.

So Bison, and your use of it today, is exactly what you need? There's no limitation of that tool that you'd like to see addressed that would make it a better tool for PostgreSQL?
Akim Demaille <akim@lrde.epita.fr> writes:
>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> 2013? Certainly not. We have a lot of buildfarm critters running
>> older platforms than that.

> This I fully understand. However, Bison is a source generator,
> and it's quite customary to use modern tools on the maintainer
> side, and then deploy the result on possibly much older
> architectures.
> Usually users of Bison build tarballs with the generated parsers
> in them, and ship/test from that.

As do we, but at the same time we don't want to make our tool requirements too onerous. I think that really the practical limit right now is Bison 2.3 --- Apple is still shipping that as their system version, so requiring something newer than 2.3 would put extra burden on people doing PG development on Macs (of which there are a lot). The fact that we still test 1.875 is mostly just an "if it ain't broke don't break it" thing, ie don't move the goalposts without a reason.

> So Bison, and your use of it today, is exactly what you need?
> There's no limitation of that tool that you'd like to see
> addressed that would make it a better tool for PostgreSQL?

Well, there are a couple of pain points, but they're not going to be addressed by marginal hacking on declarations ;-). The things that we find really painful, IMV, are:

* Speed of the generated parser could be better. I suspect this has a lot to do with the fact that our grammar is huge, and so are the tables, and that causes lots of cache misses. Maybe this could be addressed by trying to make the tables smaller and/or laid out in a way with better cache locality?

* Lack of run-time extensibility of the parser. There are many PG extensions that wish they could add things into the grammar, and can't. This is pretty pie-in-the-sky, I know. One of the main reasons we stick to Bison is the compile-time grammar sanity checks it provides, and it's not apparent how to have that and extensibility too. But it's still a pain point.

* LALR(1) parsing can only barely cope with SQL, and the standards committee keeps making it harder. We've got some hacks that fake an additional token of lookahead in some places, but that's just an ugly (and performance-eating) hack. Maybe Bison's GLR mode would already solve that, but no one here has really looked into whether it could improve matters or whether it'd come at a performance cost. The Bison manual's description of GLR doesn't give me a warm feeling either about the performance impact or whether we'd still get compile-time warnings about bogus grammars.

Other PG hackers might have a different laundry list, but that's mine.

			regards, tom lane
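For readers not familiar with the lookahead hack mentioned in the last point, here is a rough sketch of the general technique (the token names and rules below are invented for the illustration, not copied from gram.y): a thin filter between the lexer and the parser peeks at the next token and, for a few troublesome keywords, hands the grammar a distinct merged token, which effectively buys one extra token of lookahead at that spot.

/* Illustrative sketch only, not the PostgreSQL source.  A filtering
 * wrapper around the real lexer returns NOT_LA instead of NOT whenever
 * the *following* token is LIKE; the grammar can then distinguish the
 * two uses of NOT without needing more than one token of lookahead. */
%token IDENT LIKE NOT NOT_LA

%%
predicate:
      IDENT LIKE IDENT          /* x LIKE y                                   */
    | IDENT NOT_LA LIKE IDENT   /* x NOT LIKE y  (NOT_LA emitted by the filter) */
    | NOT predicate             /* ordinary logical NOT elsewhere             */
    ;

The cost Tom alludes to is that the filter has to buffer and re-examine tokens on every call, which sits on the hot path of the scanner.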
Hi Tom,

> Le 21 mai 2019 à 21:06, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>>> 2013? Certainly not. We have a lot of buildfarm critters running
>>> older platforms than that.
>
>> This I fully understand. However, Bison is a source generator,
>> and it's quite customary to use modern tools on the maintainer
>> side, and then deploy the result on possibly much older
>> architectures.
>> Usually users of Bison build tarballs with the generated parsers
>> in them, and ship/test from that.
>
> As do we, but at the same time we don't want to make our tool
> requirements too onerous. I think that really the practical limit
> right now is Bison 2.3 --- Apple is still shipping that as their system
> version,

And do not expect Apple to update it at all. Apple refuses the GPLv3, and stopped updating Bison at the last GPLv2 release, as it did for every GPL'd program.

> so requiring something newer than 2.3 would put extra burden
> on people doing PG development on Macs (of which there are a lot).

Honestly, I seriously doubt that you have contributors that don't have MacPorts or Brew installed, and both are pretty up to date on Bison.

>> So Bison, and your use of it today, is exactly what you need?
>> There's no limitation of that tool that you'd like to see
>> addressed that would make it a better tool for PostgreSQL?
>
> Well, there are a couple of pain points, but they're not going to be
> addressed by marginal hacking on declarations ;-). The things that
> we find really painful, IMV, are:
>
> * Speed of the generated parser could be better.

Expect news this year about that. I precisely came to look at PostgreSQL for this. Is there an easy way to bench pg and the various costs? To be explicit: is there a way to see how long the parsing phase takes? And some mighty inputs to bench against?

> I suspect this has
> a lot to do with the fact that our grammar is huge, and so are the
> tables, and that causes lots of cache misses. Maybe this could be
> addressed by trying to make the tables smaller and/or laid out in
> a way with better cache locality?

The improvement I have in mind is about LR techniques, not about this. But you are right that it might be interesting. It's unlikely that the tables can be made smaller, though. Bison was designed when space was really scarce, and a lot of effort was invested in making the tables as small as possible. The current trend is actually to optionally consume more space in exchange for better services (such as more accurate error messages).

> * Lack of run-time extensibility of the parser. There are many PG
> extensions that wish they could add things into the grammar, and can't.

Making the grammars extensible definitely makes sense, and it's on the wishlist. But making this doable at runtime is a much bigger problem... However, maybe this can be achieved by calling the plugin parser from the outer parser. Provided, of course, that the grammar of the plugin is really in a "separate world"; if it also wants to get bits of the host grammar, it's certainly not so easy. Are there documented examples of this? What would that look like?

> * LALR(1) parsing can only barely cope with SQL, and the standards
> committee keeps making it harder.

But Bison does more: it also provides support for canonical LR(1) and IELR(1), which accept more (deterministic) grammars, and are not subject to the "mysterious" shift/reduce conflicts of LALR.
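For completeness, selecting the table-construction algorithm is a single %define; this is a generic sketch, not a patch against the PostgreSQL grammar, and lr.type needs Bison 2.5 or later:

/* Sketch: pick the table-construction algorithm (Bison >= 2.5).
 * "lalr" is the default; "ielr" accepts the same sentences as canonical
 * LR(1) while keeping tables of roughly LALR size; "canonical-lr"
 * builds the full (much larger) LR(1) automaton. */
%define lr.type ielr
%token NUM
%%
exp: NUM
   | exp '+' NUM
   ;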
But maybe you refer to something even beyond LR(1):

> We've got some hacks that fake
> an additional token of lookahead in some places, but that's just an
> ugly (and performance-eating) hack.

More than k=1 is unlikely to happen. Given that we have GLR, which provides us with k=∞ :)

> Maybe Bison's GLR mode would already solve that,

No doubt about that. Bison's own grammar is not LR(1) either, because rules are not required to end with ';', so when reading a grammar, in a sequence such as "<ID> <:> <ID>" the parser cannot know whether the second <ID> is part of the RHS of the rule introduced by the first <ID>, or the beginning of another rule if the sequence is actually "<ID> <:> <ID> <:>". Because of that, Bison also plays dirty tricks to turn this LR(2) problem into LR(1). But as a consequence the grammar is much harder to evolve, the locations are less accurate (because two tokens are merged into a single mega-token), etc. So we are considering turning Bison's own parser into a GLR one.

> but no one here has really looked into whether
> it could improve matters or whether it'd come at a performance cost.

That should be very easy to check: just adding %glr-parser to the grammar should not change the API, should not change the visible behavior, but will give you a hint of the intrinsic cost of the GLR backend.

> The Bison manual's description of GLR doesn't give me a warm feeling
> either about the performance impact

The GLR backend is efficient... for a GLR backend. I have not benched it against the deterministic backend, but now that you mention it, it's an obvious need...

> or whether we'd still get
> compile-time warnings about bogus grammars.

Well, you can't fight nature here: ambiguity of a grammar is undecidable. That being said, there's a modified Bison which implements heuristics to detect ambiguities in grammars [1]. Would that make you feel more comfortable?

[1] http://www.lsv.fr/~schmitz/pub/expamb.pdf
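To give a feel for what the %glr-parser switch buys in cases like this, here is a minimal sketch in the spirit of the rules-need-not-end-with-';' example above (it is illustrative only, not Bison's actual grammar): the grammar is unambiguous but not LR(1), and in GLR mode the single declared shift/reduce conflict is simply explored both ways at run time, with the wrong branch dying one token later.

/* Illustrative sketch: a grammar of grammar rules, unambiguous but not
 * LR(1).  After "a : b", an ID in the lookahead may either extend the
 * current right-hand side or start the next rule; only the token after
 * it (':' or not) settles the question.  GLR resolves this at run time. */
%glr-parser
%expect 1           /* the one shift/reduce conflict described above */
%token ID
%%
rules: rule
     | rules rule
     ;
rule : ID ':' rhs ;
rhs  : /* empty */
     | rhs ID
     ;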
Akim Demaille <akim@lrde.epita.fr> writes:
> Honestly, I seriously doubt that you have contributors that don't
> have MacPorts or Brew installed, and both are pretty up to date on
> Bison.

Hm, well, I'm a counterexample ;-). Right now you can develop PG on a Mac just fine without any additional stuff, excepting maybe OpenSSL if you want that. If we have a strong reason to require a newer Bison, I'd be willing to do so, but it needs to be a strong reason.

>> * Speed of the generated parser could be better.

> Expect news this year about that. I precisely came to look at
> PostgreSQL for this.

That's very cool news.

> Is there an easy way to bench pg and the various
> costs? To be explicit: is there a way to see how long the parsing
> phase takes? And some mighty inputs to bench against?

The easiest method is to fire up some client code that repeatedly does whatever you want to test, and then look at perf or oprofile or local equivalent to see where the time is going in the backend process.

For the particular case of stressing the parser, probably the best thing to look at is test cases that do a lot of low-overhead DDL, such as creating views. You could do worse than just repeatedly sourcing our standard view files, like

    src/backend/catalog/system_views.sql
    src/backend/catalog/information_schema.sql

(In either case, I'd suggest adapting the file to create all its objects in some transient schema that you can just drop. Repointing information_schema.sql to some other schema is trivial, just change a couple of commands at the top; and you could tweak system_views.sql similarly. Also consider wrapping the whole thing in BEGIN; ... ROLLBACK; instead of spending time on an explicit DROP.)

Somebody else might know of a better test case but I'd try that first. There would still be a fair amount of I/O and catalog lookup overhead in a test run that way, but it would be an honest approximation of useful real-world cases. If you're willing to put some blinders on and just micro-optimize the flex/bison code, you could set up a custom function that just calls that stuff. I actually did that not too long ago; C code attached for amusement's sake.

>> * Lack of run-time extensibility of the parser. There are many PG
>> extensions that wish they could add things into the grammar, and can't.

> Are there documented examples of this? What would that look like?

I'm just vaguely recalling occasional how-could-I-do-this complaints on the pgsql-hackers mailing list. Perhaps somebody else could say something more concrete.
			regards, tom lane

/*
 * build this into a Postgres shared library, then
 *
 * create function drive_parser(query text, count int) returns void
 *   strict volatile language c as '.../drive_parser.so';
 *
 * \timing
 * select drive_parser('some-query', 1000);
 */
#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
#include "utils/memutils.h"

PG_MODULE_MAGIC;

/*
 * drive_parser(query text, count int) returns void
 */
PG_FUNCTION_INFO_V1(drive_parser);

Datum
drive_parser(PG_FUNCTION_ARGS)
{
	text	   *txt = PG_GETARG_TEXT_PP(0);
	int32		count = PG_GETARG_INT32(1);
	char	   *query_string = text_to_cstring(txt);
	MemoryContext mycontext;

	mycontext = AllocSetContextCreate(CurrentMemoryContext,
									  "drive_parser work cxt",
									  ALLOCSET_DEFAULT_SIZES);

	while (count-- > 0)
	{
		List	   *parsetree_list;
		MemoryContext oldcontext;

		oldcontext = MemoryContextSwitchTo(mycontext);

		/* This times raw parsing only */
		parsetree_list = pg_parse_query(query_string);

		MemoryContextSwitchTo(oldcontext);
		MemoryContextReset(mycontext);

		CHECK_FOR_INTERRUPTS();
	}

	PG_RETURN_VOID();
}
> On 22 May 2019, at 23:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Akim Demaille <akim@lrde.epita.fr> writes: >> Honestly, I seriously doubt that you have contributors that don't >> have MacPorts or Brew installed, and both are pretty up to date on >> Bison. > > Hm, well, I'm a counterexample ;-) And one more. While I do have brew installed, I prefer to use it as little as possible. cheers ./daniel
On 5/21/19 11:49 AM, Akim Demaille wrote:
> Hi Tom!
>
>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>>
>> Akim Demaille <akim@lrde.epita.fr> writes:
>>> It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc) where associativity plays no role: %precedence is made for this. But it was introduced in Bison 2.7.1 (2013-04-15), and I don't know if requiring it is acceptable to PostgreSQL.
>> 2013? Certainly not. We have a lot of buildfarm critters running
>> older platforms than that.
> This I fully understand. However, Bison is a source generator,
> and it's quite customary to use modern tools on the maintainer
> side, and then deploy the result on possibly much older
> architectures.
>
> Usually users of Bison build tarballs with the generated parsers
> in them, and ship/test from that.

The buildfarm client does not build from tarballs, it builds from git, meaning it has to run bison. Thus Tom's objection is quite valid, and your dismissal of it is not.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 5/21/19 11:49 AM, Akim Demaille wrote:
>> Usually users of Bison build tarballs with the generated parsers
>> in them, and ship/test from that.

> The buildfarm client does not build from tarballs, it builds from git,
> meaning it has to run bison. Thus Tom's objection is quite valid, and
> your dismissal of it is not.

Right, but that's a much narrower set of people who need to update than "all PG users" or even "all PG developers". I checked the buildfarm's configure results not too long ago, and noted that the oldest bison versions are

    gaur          | configure: using bison (GNU Bison) 1.875
    prairiedog    | configure: using bison (GNU Bison) 1.875
    dromedary     | configure: using bison (GNU Bison) 2.3
    locust        | configure: using bison (GNU Bison) 2.3
    longfin       | configure: using bison (GNU Bison) 2.3
    nudibranch    | configure: using bison (GNU Bison) 2.3
    anole         | configure: using bison (GNU Bison) 2.4.1
    fulmar        | configure: using bison (GNU Bison) 2.4.1
    gharial       | configure: using bison (GNU Bison) 2.4.1
    grouse        | configure: using bison (GNU Bison) 2.4.1
    koreaceratops | configure: using bison (GNU Bison) 2.4.1
    leech         | configure: using bison (GNU Bison) 2.4.1
    magpie        | configure: using bison (GNU Bison) 2.4.1
    treepie       | configure: using bison (GNU Bison) 2.4.1
    coypu         | configure: using bison (GNU Bison) 2.4.3
    friarbird     | configure: using bison (GNU Bison) 2.4.3
    nightjar      | configure: using bison (GNU Bison) 2.4.3

(then 2.5 and later)

(This doesn't cover the Windows members, unfortunately.)

gaur and prairiedog are my own pet dinosaurs, and updating them would not be very painful. (Neither of them are using the original vendor Bison to begin with ... as I said, they're dinosaurs.) Meanwhile, three of the 2.3 members are Mac systems; nudibranch is SUSE 11. Requiring anything newer than 2.4.1 would start to cause problems for a fair number of people, I think.

Still, the bottom line here is that we could require a new(ish) Bison if we could point to clear benefits that outweigh the pain. Right now there's not much argument for it.

			regards, tom lane
On Tue, May 21, 2019 at 3:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Other PG hackers might have a different laundry list, but that's mine. Good list. Another thing is that it would be nice to have a better way of resolving conflicts than attaching precedence declarations. Some problems can't be solved that way at all, and others can only be solved that way at the risk of unforeseen side effects. One possible idea is a way to mark a rule %weak, meaning that it should only be used if no non-%weak rule could apply. I'm not sure if that would really be the best way, but it's one idea. A more general version would be some kind of ability to give rules different strengths; in the case of a grammar conflict, the "stronger" rule would win. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Another thing is that it would be nice to have a better way of > resolving conflicts than attaching precedence declarations. Some > problems can't be solved that way at all, and others can only be > solved that way at the risk of unforeseen side effects. Yeah, we've definitely found that resolving shift/reduce conflicts via precedence declarations has more potential for surprising side-effects than one would think. It feels to me that there's something basically wrong with that concept, or at least wrong with the way we've used it. Some relevant commits: 670a6c7a2, 12b716457, 6fe27ca2f, and the "x NOT-something y" hacks in commit c6b3c939b (that one has a whole bunch of other cruft in it, so it might be hard to spot what I'm talking about). > One possible > idea is a way to mark a rule %weak, meaning that it should only be > used if no non-%weak rule could apply. I'm not sure if that would > really be the best way, but it's one idea. A more general version > would be some kind of ability to give rules different strengths; in > the case of a grammar conflict, the "stronger" rule would win. Hmmm ... not apparent to me that that's really going to help. Maybe it will, but it sounds like more likely it's just another mechanism with not-as-obvious-as-you-thought side effects. regards, tom lane
On Thu, May 23, 2019 at 12:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > One possible > > idea is a way to mark a rule %weak, meaning that it should only be > > used if no non-%weak rule could apply. I'm not sure if that would > > really be the best way, but it's one idea. A more general version > > would be some kind of ability to give rules different strengths; in > > the case of a grammar conflict, the "stronger" rule would win. > > Hmmm ... not apparent to me that that's really going to help. > Maybe it will, but it sounds like more likely it's just another > mechanism with not-as-obvious-as-you-thought side effects. That's possible; I'm open to other ideas. If you wanted to be really explicit about it, you could have a way to stick an optional name on a grammar rule, and a way to say that the current rule should lose to a list of named other rules. It seems pretty clear, though, that our use of %prec proves that we can't just write a grammar that is intrinsically conflict-free; we sometimes need to have conflicts and then tell the parser generator which option to prefer. And I think it's also pretty clear that %prec is, for anything other than operator precedence, a horrible way of doing that. A method that was merely mediocre could still be a big improvement over what we have available today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2019-05-22 17:25:31 -0400, Tom Lane wrote: > The easiest method is to fire up some client code that repeatedly > does whatever you want to test, and then look at perf or oprofile > or local equivalent to see where the time is going in the backend > process. > > For the particular case of stressing the parser, probably the > best thing to look at is test cases that do a lot of low-overhead > DDL, such as creating views. You could do worse than just repeatedly > sourcing our standard view files, like > src/backend/catalog/system_views.sql > src/backend/catalog/information_schema.sql > (In either case, I'd suggest adapting the file to create all > its objects in some transient schema that you can just drop. > Repointing information_schema.sql to some other schema is > trivial, just change a couple of commands at the top; and > you could tweak system_views.sql similarly. Also consider > wrapping the whole thing in BEGIN; ... ROLLBACK; instead of > spending time on an explicit DROP.) > > Somebody else might know of a better test case but I'd try > that first. > There would still be a fair amount of I/O and catalog lookup > overhead in a test run that way, but it would be an honest > approximation of useful real-world cases. If you're willing to > put some blinders on and just micro-optimize the flex/bison > code, you could set up a custom function that just calls that > stuff. I actually did that not too long ago; C code attached > for amusement's sake. FWIW, this is why I'd suggested the hack of EXPLAIN (PARSE_ANALYZE OFF, OPTIMIZE OFF) a few years back. Right now it's hard to measure the parser in isolation. Greetings, Andres Freund
Hi Tom!

> Le 23 mai 2019 à 00:29, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 5/21/19 11:49 AM, Akim Demaille wrote:
>>> Usually users of Bison build tarballs with the generated parsers
>>> in them, and ship/test from that.
>
>> The buildfarm client does not build from tarballs, it builds from git,
>> meaning it has to run bison. Thus Tom's objection is quite valid, and
>> your dismissal of it is not.

I had not realized I had been rude to anybody. I apologize to Tom; I did not mean to dismiss anything.

> Right, but that's a much narrower set of people who need to update
> than "all PG users" or even "all PG developers".
>
> I checked the buildfarm's configure results not too long ago, and noted
> that the oldest bison versions are
>
>     gaur          | configure: using bison (GNU Bison) 1.875
>     prairiedog    | configure: using bison (GNU Bison) 1.875
>     dromedary     | configure: using bison (GNU Bison) 2.3
>     locust        | configure: using bison (GNU Bison) 2.3
>     longfin       | configure: using bison (GNU Bison) 2.3
>     nudibranch    | configure: using bison (GNU Bison) 2.3
>     anole         | configure: using bison (GNU Bison) 2.4.1
>     fulmar        | configure: using bison (GNU Bison) 2.4.1
>     gharial       | configure: using bison (GNU Bison) 2.4.1
>     grouse        | configure: using bison (GNU Bison) 2.4.1
>     koreaceratops | configure: using bison (GNU Bison) 2.4.1
>     leech         | configure: using bison (GNU Bison) 2.4.1
>     magpie        | configure: using bison (GNU Bison) 2.4.1
>     treepie       | configure: using bison (GNU Bison) 2.4.1
>     coypu         | configure: using bison (GNU Bison) 2.4.3
>     friarbird     | configure: using bison (GNU Bison) 2.4.3
>     nightjar      | configure: using bison (GNU Bison) 2.4.3
> (then 2.5 and later)
>
> (This doesn't cover the Windows members, unfortunately.)
>
> gaur and prairiedog are my own pet dinosaurs, and updating them would
> not be very painful.

I don't want to be painful, but the fact that the buildfarm starts from git is also a design choice. In order to be fully reproducible, I know of projects whose first step is to build the end-user type of tarball, which is then run on all sorts of architectures. Of course, it's more initial setup, but you gain independence from many tools, which in turn makes it easier to check on more architectures.

> Still, the bottom line here is that we could require a new(ish) Bison
> if we could point to clear benefits that outweigh the pain. Right
> now there's not much argument for it.

I get that, thanks.

Cheers!
Hey Tom,

> Le 22 mai 2019 à 23:25, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> Honestly, I seriously doubt that you have contributors that don't
>> have MacPorts or Brew installed, and both are pretty up to date on
>> Bison.
>
> Hm, well, I'm a counterexample ;-).

Wow :) I have even more respect now :) I'm soooo happy to use the bleeding edge compilers to get all the possible warnings and sanitizers...

Thanks for the tips on how to bench. I'll see what I can do (I'm not a (direct) user of pg myself, nor am I used to writing SQL by hand).

Cheers!
> Le 22 mai 2019 à 23:44, Daniel Gustafsson <daniel@yesql.se> a écrit : > >> On 22 May 2019, at 23:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Akim Demaille <akim@lrde.epita.fr> writes: >>> Honestly, I seriously doubt that you have contributors that don't >>> have MacPorts or Brew installed, and both are pretty up to date on >>> Bison. >> >> Hm, well, I'm a counterexample ;-) > > And one more. While I do have brew installed, I prefer to use it as little as > possible. Err... You're not exactly what I call a counterexample.
Tom,

> Le 23 mai 2019 à 06:00, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Robert Haas <robertmhaas@gmail.com> writes:
>> Another thing is that it would be nice to have a better way of
>> resolving conflicts than attaching precedence declarations. Some
>> problems can't be solved that way at all, and others can only be
>> solved that way at the risk of unforeseen side effects.
>
> Yeah, we've definitely found that resolving shift/reduce conflicts via
> precedence declarations has more potential for surprising side-effects
> than one would think.

That's why in recent versions of Bison we also provide a means to put %expect directives on the rules themselves, to be more precise about what happens.

> It feels to me that there's something basically
> wrong with that concept, or at least wrong with the way we've used it.

I'm trying to find means to scope the prec/assoc directives, because they are too powerful, and that's dangerous. This is also why I try to remove the useless ones. Some people don't trust assoc/prec directives at all and use only unambiguous grammars. But this can be very verbose...

I agree something is not so cool about these directives. GLR parsers have a clear concept of in-between-rules precedence (%dprec). Something similar for LR (hence fully static) would be nice, but it remains to be invented.
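For readers who have not met %dprec: it is a GLR-only device that ranks whole rules against each other when the same input admits both parses. A minimal sketch follows (the grammar is an invented toy modelled on the classic "declaration versus function-style cast" ambiguity, not anything from PostgreSQL):

/* Sketch: GLR rule-level precedence.  "TYPE ( ID ) ;" can be read
 * either as a declaration or as a cast expression used as a statement;
 * %dprec makes the GLR parser keep the declaration reading when both
 * survive to the end of the statement. */
%glr-parser
%expect-rr 1              /* the one reduce/reduce conflict is expected */
%token TYPE ID
%%
stmt: decl  %dprec 2      /* preferred reading */
    | expr  %dprec 1
    ;
decl: TYPE '(' ID ')' ';' ;
expr: TYPE '(' ID ')' ';' ;

An LR analogue of this, as noted above, does not exist today; with deterministic tables the conflict has to be decided at table-construction time, which is exactly what token precedence ends up doing.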
>>>>> "Akim" == Akim Demaille <akim@lrde.epita.fr> writes:

 >> Yeah, we've definitely found that resolving shift/reduce conflicts
 >> via precedence declarations has more potential for surprising
 >> side-effects than one would think.

 Akim> That's why in recent versions of Bison we also provide a means to
 Akim> put %expect directives on the rules themselves, to be more
 Akim> precise about what happens.

It's possibly worth looking at the details of each case where we've run into problems to see whether there is a better solution. The main cases I know of are:

1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the standard SQL grammar, since UNBOUNDED is a non-reserved keyword and so it can also appear as a legal <identifier>, and the construct RANGE <unsigned value specification> PRECEDING allows <identifier> to appear as a <SQL parameter reference>. We solve this by giving UNBOUNDED a precedence below PRECEDING.

2. CUBE() - in the SQL spec, GROUP BY does not allow expressions, only column references, but we allow expressions as an extension. The syntax GROUP BY CUBE(a,b) is a shorthand for grouping sets, but this is ambiguous with a function cube(...). (CUBE is also a reserved word in the spec, but it's an unreserved keyword for us.) We solve this by giving CUBE (and ROLLUP) precedence below '('.

3. General handling of postfix operator conflicts. The fact that we allow postfix operators means that any sequence which looks like <expression> <identifier> is ambiguous. This affects the use of aliases in the SELECT list, and also PRECEDING, FOLLOWING, GENERATED, and NULL can all follow expressions.

4. Not reserving words that the spec says should be reserved. We avoid reserving PARTITION, RANGE, ROWS, GROUPS by using precedence hacks.

-- 
Andrew (irc:RhodiumToad)
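To make case 1 concrete, here is a hedged sketch of the technique (a reduced toy grammar with invented precedence levels, not the actual gram.y declarations): because UNBOUNDED is also usable as an identifier, the parser hits a shift/reduce conflict on the lookahead PRECEDING, and giving UNBOUNDED a precedence just below PRECEDING resolves it toward the shift, so RANGE UNBOUNDED PRECEDING gets the frame-bound reading.

/* Toy sketch of the precedence hack, not PostgreSQL's gram.y. */
%token RANGE IDENT NUM
%nonassoc UNBOUNDED      /* precedence just below PRECEDING ...            */
%nonassoc PRECEDING      /* ... so "UNBOUNDED PRECEDING" shifts PRECEDING  */

%%
frame_clause: RANGE frame_bound ;
frame_bound : UNBOUNDED PRECEDING
            | expr PRECEDING
            ;
expr        : NUM
            | IDENT
            | UNBOUNDED   /* unreserved keyword usable as an identifier */
            ;

The side-effect risk discussed earlier in the thread is visible even in this toy: the precedence attached to UNBOUNDED silently applies to every other conflict that token ever participates in, not just this one.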
On 2019-May-28, Andrew Gierth wrote: > The main cases I know of are: > > 1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the > standard SQL grammar, since UNBOUNDED is a non-reserved keyword and so > it can also appear as a legal <identifier>, and the construct > RANGE <unsigned value specification> PRECEDING allows <identifier> to > appear as a <SQL parameter reference>. Should we report this to the SQL committee? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, May 21, 2019 at 03:06:43PM -0400, Tom Lane wrote: > * Speed of the generated parser could be better. I suspect this has > a lot to do with the fact that our grammar is huge, and so are the > tables, and that causes lots of cache misses. Maybe this could be > addressed by trying to make the tables smaller and/or laid out in > a way with better cache locality? Agreed. This was brought up in January, with a little more specificity: https://www.postgresql.org/message-id/20190125223859.GD13803@momjian.us With our scanner keywords list now more cache-aware, and with us planning to use Bison for years to come, has anyone ever looked at reordering the bison state machine array to be more cache aware, e.g., having common states next to each other rather than scattered around the array? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +