Thread: Remove useless associativity/precedence from parsers

Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
Hi all,

Many of the grammars could be clarified.  For instance there's a number of useless associativity and/or precedence
declarations. Maybe the point is to leave some form of a documentation, but actually, since it's not used at all by the
tool,that documentation is not checked. 

In the following two proposed patches, I remove directives that are completely useless.  In other places, some
associativityis declared (e.g., with %left) although the associativity is useless, only the precedence matters.  I have
notchanged this, because I don't know what is the version of Bison that is required.  Given that it's a maintainer-side
tool,I would suggest targeting recent versions of Bison, but opinions might differ here. 

Cheers!

commit 75e597aa239d8ebc332d3a29630ecad0133d3d6f
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sun May 19 14:24:33 2019 +0200

    json_path: remove useless precedence directives

    These directives are useless: the generated parser is exactly the
    same (except for line number changes).

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 22c2089f78..82b6529414 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -115,11 +115,9 @@ static JsonPathParseItem *makeItemLikeRegex(JsonPathParseItem *expr,

 %left    OR_P
 %left    AND_P
-%right    NOT_P
 %left    '+' '-'
 %left    '*' '/' '%'
 %left    UMINUS
-%nonassoc '(' ')'

 /* Grammar follows */
 %%



This second patch could be made simpler: just remove the %token declarations I provided, but then the generated files
aredifferent (but, of course, both parsers are equivalent). 

commit 5322f7303a1a9dfa7cd959d68caeced847ae0466
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sun May 19 14:32:15 2019 +0200

    parser: remove useless associativity/precedence

    Use %token instead to guarantee that the token numbers are the same
    before and after this patch.  As a consequence, the generated files
    are equal.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3dc0e8a4fb..3d4c552cfa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -766,10 +766,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %left        AT                /* sets precedence for AT TIME ZONE */
 %left        COLLATE
 %right        UMINUS
-%left        '[' ']'
+%token        '[' ']'
 %left        '(' ')'
 %left        TYPECAST
-%left        '.'
+%token        '.'
 /*
  * These might seem to be low-precedence, but actually they are not part
  * of the arithmetic hierarchy at all in their use as JOIN operators.






Re: Remove useless associativity/precedence from parsers

From
Tom Lane
Date:
Akim Demaille <akim@lrde.epita.fr> writes:
> In the following two proposed patches, I remove directives that are
> completely useless.

I'm far from convinced that the proposed changes in gram.y are a good
idea.  Both [] and . (field selection) *are* left-associative in a
meaningful sense, so even if this change happens not to affect what
Bison does, I think the declarations are good documentation.  Would
you have us also change the user documentation at
https://www.postgresql.org/docs/devel/sql-syntax-lexical.html#SQL-PRECEDENCE
?

I haven't looked at the jsonpath grammar closely enough to have an
opinion about that one.

            regards, tom lane



Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
Hi Tom,

> Le 19 mai 2019 à 20:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> In the following two proposed patches, I remove directives that are
>> completely useless.
>
> I'm far from convinced that the proposed changes in gram.y are a good
> idea.  Both [] and . (field selection) *are* left-associative in a
> meaningful sense, so even if this change happens not to affect what
> Bison does, I think the declarations are good documentation.

I don't dispute the overall behavior of the grammar as a whole, I'm only referring to these directives.  In my
experience,leaving useless associativity and precedence directives can be misleading (since these directives have no
impact,you could put them anywhere: their contribution is not checked in any way) or even dangerous (some day, some
changeintroduces unexpected shift-reduce conflicts that someone should have studied, but because of "stray" directives,
theyare "fixed" in some uncontrolled way). 

> Would
> you have us also change the user documentation at
> https://www.postgresql.org/docs/devel/sql-syntax-lexical.html#SQL-PRECEDENCE
> ?

No, of course not!  That you define the arithmetics with an unambiguous grammar (expr/term/fact and no
associativity/precedencedirective) or with an ambiguous grammar (expr and associativity/precedence directives) still
resultsin the same behavior: the usual behavior of these operators.  And the documentation should document that, of
course.


It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc) where
associativityplays no role: %precedence is made for this.  But it was introduced in Bison 2.7.1 (2013-04-15), and I
don'tknow if requiring it is acceptable to PostgreSQL. 

Cheers!


Re: Remove useless associativity/precedence from parsers

From
Tom Lane
Date:
Akim Demaille <akim@lrde.epita.fr> writes:
> It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc) where
associativityplays no role: %precedence is made for this.  But it was introduced in Bison 2.7.1 (2013-04-15), and I
don'tknow if requiring it is acceptable to PostgreSQL. 

2013?  Certainly not.  We have a lot of buildfarm critters running
older platforms than that.  I believe our (documented and tested)
minimum version of Bison is still 1.875.  While we'd be willing
to move that goalpost if there were clear benefits from doing so,
I'm not even convinced that %precedence as you describe it here
is any improvement at all.

            regards, tom lane



Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
Hi Tom!

> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc)
whereassociativity plays no role: %precedence is made for this.  But it was introduced in Bison 2.7.1 (2013-04-15), and
Idon't know if requiring it is acceptable to PostgreSQL. 
>
> 2013?  Certainly not.  We have a lot of buildfarm critters running
> older platforms than that.

This I fully understand.  However, Bison is a source generator,
and it's quite customary to use modern tools on the maintainer
side, and then deploy the result them on possibly much older
architectures.

Usually users of Bison build tarballs with the generated parsers
in them, and ship/test from that.

> I believe our (documented and tested)
> minimum version of Bison is still 1.875.  While we'd be willing
> to move that goalpost if there were clear benefits from doing so,
> I'm not even convinced that %precedence as you describe it here
> is any improvement at all.

Ok.  I find this really surprising: you are leaving dormant directives
that may fire some day without anyone knowing.

You could comment out the useless associativity/precedence directives,
that would just as well document them, without this risk.

But, Ok, that's only my opinion.


So Bison, and your use of it today, is exactly what you need?
There's no limitation of that tool that you'd like to see
address that would make it a better tool for PostgreSQL?


Re: Remove useless associativity/precedence from parsers

From
Tom Lane
Date:
Akim Demaille <akim@lrde.epita.fr> writes:
>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> 2013?  Certainly not.  We have a lot of buildfarm critters running
>> older platforms than that.

> This I fully understand.  However, Bison is a source generator,
> and it's quite customary to use modern tools on the maintainer
> side, and then deploy the result them on possibly much older
> architectures.
> Usually users of Bison build tarballs with the generated parsers
> in them, and ship/test from that.

As do we, but at the same time we don't want to make our tool
requirements too onerous.  I think that really the practical limit
right now is Bison 2.3 --- Apple is still shipping that as their system
version, so requiring something newer than 2.3 would put extra burden
on people doing PG development on Macs (of which there are a lot).
The fact that we still test 1.875 is mostly just an "if it ain't broke
don't break it" thing, ie don't move the goalposts without a reason.

> So Bison, and your use of it today, is exactly what you need?
> There's no limitation of that tool that you'd like to see
> address that would make it a better tool for PostgreSQL?

Well, there are a couple of pain points, but they're not going to be
addressed by marginal hacking on declarations ;-).  The things that
we find really painful, IMV, are:

* Speed of the generated parser could be better.  I suspect this has
a lot to do with the fact that our grammar is huge, and so are the
tables, and that causes lots of cache misses.  Maybe this could be
addressed by trying to make the tables smaller and/or laid out in
a way with better cache locality?

* Lack of run-time extensibility of the parser.  There are many PG
extensions that wish they could add things into the grammar, and can't.
This is pretty pie-in-the-sky, I know.  One of the main reasons we stick
to Bison is the compile-time grammar sanity checks it provides, and
it's not apparent how to have that and extensibility too.  But it's
still a pain point.

* LALR(1) parsing can only barely cope with SQL, and the standards
committee keeps making it harder.  We've got some hacks that fake
an additional token of lookahead in some places, but that's just an
ugly (and performance-eating) hack.  Maybe Bison's GLR mode would
already solve that, but no one here has really looked into whether
it could improve matters or whether it'd come at a performance cost.
The Bison manual's description of GLR doesn't give me a warm feeling
either about the performance impact or whether we'd still get
compile-time warnings about bogus grammars.

Other PG hackers might have a different laundry list, but that's mine.

            regards, tom lane



Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
Hi Tom,

> Le 21 mai 2019 à 21:06, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>>> 2013?  Certainly not.  We have a lot of buildfarm critters running
>>> older platforms than that.
>
>> This I fully understand.  However, Bison is a source generator,
>> and it's quite customary to use modern tools on the maintainer
>> side, and then deploy the result them on possibly much older
>> architectures.
>> Usually users of Bison build tarballs with the generated parsers
>> in them, and ship/test from that.
>
> As do we, but at the same time we don't want to make our tool
> requirements too onerous.  I think that really the practical limit
> right now is Bison 2.3 --- Apple is still shipping that as their system
> version,

And do not expect Apple to update it at all.  Apple refuses the
GPLv3, and stopped updating Bison to the last GPLv2 release,
as it did for every GPL'd program.

> so requiring something newer than 2.3 would put extra burden
> on people doing PG development on Macs (of which there are a lot).

Honestly, I seriously doubt that you have contributors that don't
have MacPorts or Brew installed, and both are pretty up to date on
Bison.

>> So Bison, and your use of it today, is exactly what you need?
>> There's no limitation of that tool that you'd like to see
>> address that would make it a better tool for PostgreSQL?
>
> Well, there are a couple of pain points, but they're not going to be
> addressed by marginal hacking on declarations ;-).  The things that
> we find really painful, IMV, are:
>
> * Speed of the generated parser could be better.

Expect news this year about that.  I precisely came to look at
PostgreSQL for this.  Is there an easy way to bench pg and the various
costs?  To be explicit: is there a way to see how long the parsing
phase takes?  And some mighty inputs to bench against?

> I suspect this has
> a lot to do with the fact that our grammar is huge, and so are the
> tables, and that causes lots of cache misses.  Maybe this could be
> addressed by trying to make the tables smaller and/or laid out in
> a way with better cache locality?

The improvement I have in mind is about LR techniques, not about
this.  But you are right that it might be interesting.

It's unlikely that the table can be made smaller though.  Bison
was designed when space was really scarce, and a lot of efforts
were invested to make the tables as small as possible.  The
current trend is actually to optionally consume more space in
exchange for better services (such as more accurate error messages).


> * Lack of run-time extensibility of the parser.  There are many PG
> extensions that wish they could add things into the grammar, and can't.

Making the grammars extensible definitely makes sense, and it's
in the wishlist.  But making this doable at runtime is a much bigger
problem...

However, maybe this can be achieved by calling the plugin parser
from the outer parser.  Provided, of course, that the grammar of
the plugin is really in a "separate world"; if it also wants to
get bits of the host grammar, it's certainly not so easy.

Are there documented examples of this?  What would that look like?


> * LALR(1) parsing can only barely cope with SQL, and the standards
> committee keeps making it harder.

But Bison does more: it also provides support for LR(1) and IELR(1),
which accept more (deterministic) grammars, and are not subject
to "mysterious s/r conflicts" as in LALR.  But maybe you refer to
even beyond LR(1):

> We've got some hacks that fake
> an additional token of lookahead in some places, but that's just an
> ugly (and performance-eating) hack.

More than k=1 is unlikely to happen.  Given that we have GLR, which
provides us with k=∞ :)

> Maybe Bison's GLR mode would already solve that,

No doubt about that.

Bison's grammar is not LR(1) either, because the rules are not mandated
to end with ';', so when reading a grammar, in a sequence such as
"<ID> <:> <ID>" the parser cannot know whether the second <ID> is the
RHS of the rule introduced by the first <ID>, or the beginning of another
rule if the sequence is actually "<ID> <:> <ID> <:>".  Because of that,
Bison is also playing dirty tricks to turn this LR(2) into LR(1).

But as a consequence the grammar is much harder to evolve, the locations
are less accurate (because two tokens are merged together into a mega
token), etc.

So it is considered to turn Bison's own parser to GLR.

> but no one here has really looked into whether
> it could improve matters or whether it'd come at a performance cost.

That should be very easy to check: just adding %glr to the grammar
should not change the API, should not change the visible behavior,
but will give you a hint of the intrinsic cost of the GLR backend.

> The Bison manual's description of GLR doesn't give me a warm feeling
> either about the performance impact

The GLR backend is efficient... for a GLR backend.  I have not
benched it against the deterministic backend, but now that you
mention it, it's an obvious need...

> or whether we'd still get
> compile-time warnings about bogus grammars.

Well, you can't fight nature here.  Ambiguity of a grammar is
undecidable.

That being said, there's a modified Bison which implements heuristics
to detect ambiguities in grammar [1].  Would that make you feel more
comfortable?

[1] http://www.lsv.fr/~schmitz/pub/expamb.pdf


Re: Remove useless associativity/precedence from parsers

From
Tom Lane
Date:
Akim Demaille <akim@lrde.epita.fr> writes:
> Honestly, I seriously doubt that you have contributors that don't
> have MacPorts or Brew installed, and both are pretty up to date on
> Bison.

Hm, well, I'm a counterexample ;-).  Right now you can develop PG
on a Mac just fine without any additional stuff, excepting maybe
OpenSSL if you want that.  If we have a strong reason to require
a newer Bison, I'd be willing to do so, but it needs to be a
strong reason.

>> * Speed of the generated parser could be better.

> Expect news this year about that.  I precisely came to look at
> PostgreSQL for this.

That's very cool news.

> Is there an easy way to bench pg and the various
> costs?  To be explicit: is there a way to see how long the parsing
> phase takes?  And some mighty inputs to bench against?

The easiest method is to fire up some client code that repeatedly
does whatever you want to test, and then look at perf or oprofile
or local equivalent to see where the time is going in the backend
process.

For the particular case of stressing the parser, probably the
best thing to look at is test cases that do a lot of low-overhead
DDL, such as creating views.  You could do worse than just repeatedly
sourcing our standard view files, like
    src/backend/catalog/system_views.sql
    src/backend/catalog/information_schema.sql
(In either case, I'd suggest adapting the file to create all
its objects in some transient schema that you can just drop.
Repointing information_schema.sql to some other schema is
trivial, just change a couple of commands at the top; and
you could tweak system_views.sql similarly.  Also consider
wrapping the whole thing in BEGIN; ... ROLLBACK; instead of
spending time on an explicit DROP.)

Somebody else might know of a better test case but I'd try
that first.

There would still be a fair amount of I/O and catalog lookup
overhead in a test run that way, but it would be an honest
approximation of useful real-world cases.  If you're willing to
put some blinders on and just micro-optimize the flex/bison
code, you could set up a custom function that just calls that
stuff.  I actually did that not too long ago; C code attached
for amusement's sake.

>> * Lack of run-time extensibility of the parser.  There are many PG
>> extensions that wish they could add things into the grammar, and can't.

> Are there documented examples of this?  What would that look like?

I'm just vaguely recalling occasional how-could-I-do-this complaints
on the pgsql-hackers mailing list.  Perhaps somebody else could say
something more concrete.

            regards, tom lane

/*

build this into a Postgres shared library, then

create function drive_parser(query text, count int) returns void
strict volatile language c as '.../drive_parser.so';

\timing

select drive_parser('some-query', 1000);

 */

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
#include "utils/memutils.h"

PG_MODULE_MAGIC;

/*
 * drive_parser(query text, count int) returns void
 */
PG_FUNCTION_INFO_V1(drive_parser);
Datum
drive_parser(PG_FUNCTION_ARGS)
{
    text       *txt = PG_GETARG_TEXT_PP(0);
    int32        count = PG_GETARG_INT32(1);
    char       *query_string = text_to_cstring(txt);
    MemoryContext mycontext;

    mycontext = AllocSetContextCreate(CurrentMemoryContext,
                                      "drive_parser work cxt",
                                      ALLOCSET_DEFAULT_SIZES);

    while (count-- > 0)
    {
        List       *parsetree_list;
        MemoryContext oldcontext;

        oldcontext = MemoryContextSwitchTo(mycontext);

        /* This times raw parsing only */
        parsetree_list = pg_parse_query(query_string);

        MemoryContextSwitchTo(oldcontext);

        MemoryContextReset(mycontext);

        CHECK_FOR_INTERRUPTS();
    }

    PG_RETURN_VOID();
}

Re: Remove useless associativity/precedence from parsers

From
Daniel Gustafsson
Date:
> On 22 May 2019, at 23:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> Honestly, I seriously doubt that you have contributors that don't
>> have MacPorts or Brew installed, and both are pretty up to date on
>> Bison.
>
> Hm, well, I'm a counterexample ;-)

And one more.  While I do have brew installed, I prefer to use it as little as
possible.

cheers ./daniel


Re: Remove useless associativity/precedence from parsers

From
Andrew Dunstan
Date:
On 5/21/19 11:49 AM, Akim Demaille wrote:
> Hi Tom!
>
>> Le 20 mai 2019 à 15:54, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>>
>> Akim Demaille <akim@lrde.epita.fr> writes:
>>> It is for the same reasons that I would recommend not using associativity directives (%left, %right, %nonassoc)
whereassociativity plays no role: %precedence is made for this.  But it was introduced in Bison 2.7.1 (2013-04-15), and
Idon't know if requiring it is acceptable to PostgreSQL.
 
>> 2013?  Certainly not.  We have a lot of buildfarm critters running
>> older platforms than that.
> This I fully understand.  However, Bison is a source generator,
> and it's quite customary to use modern tools on the maintainer
> side, and then deploy the result them on possibly much older
> architectures.
>
> Usually users of Bison build tarballs with the generated parsers
> in them, and ship/test from that.
>


The buildfarm client does not build from tarballs, it builds from git,
meaning it has to run bison. Thus Tom's objection is quite valid, and
your dismissal of it is not.


cheers


andrew


-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Remove useless associativity/precedence from parsers

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 5/21/19 11:49 AM, Akim Demaille wrote:
>> Usually users of Bison build tarballs with the generated parsers
>> in them, and ship/test from that.

> The buildfarm client does not build from tarballs, it builds from git,
> meaning it has to run bison. Thus Tom's objection is quite valid, and
> your dismissal of it is not.

Right, but that's a much narrower set of people who need to update
than "all PG users" or even "all PG developers".

I checked the buildfarm's configure results not too long ago, and noted
that the oldest bison versions are

 gaur          | configure: using bison (GNU Bison) 1.875
 prairiedog    | configure: using bison (GNU Bison) 1.875
 dromedary     | configure: using bison (GNU Bison) 2.3
 locust        | configure: using bison (GNU Bison) 2.3
 longfin       | configure: using bison (GNU Bison) 2.3
 nudibranch    | configure: using bison (GNU Bison) 2.3
 anole         | configure: using bison (GNU Bison) 2.4.1
 fulmar        | configure: using bison (GNU Bison) 2.4.1
 gharial       | configure: using bison (GNU Bison) 2.4.1
 grouse        | configure: using bison (GNU Bison) 2.4.1
 koreaceratops | configure: using bison (GNU Bison) 2.4.1
 leech         | configure: using bison (GNU Bison) 2.4.1
 magpie        | configure: using bison (GNU Bison) 2.4.1
 treepie       | configure: using bison (GNU Bison) 2.4.1
 coypu         | configure: using bison (GNU Bison) 2.4.3
 friarbird     | configure: using bison (GNU Bison) 2.4.3
 nightjar      | configure: using bison (GNU Bison) 2.4.3
 (then 2.5 and later)

(This doesn't cover the Windows members, unfortunately.)

gaur and prairiedog are my own pet dinosaurs, and updating them would
not be very painful.  (Neither of them are using the original vendor
Bison to begin with ... as I said, they're dinosaurs.)  Meanwhile,
three of the 2.3 members are Mac systems; nudibranch is SUSE 11.
Requiring anything newer than 2.4.1 would start to cause problems
for a fair number of people, I think.

Still, the bottom line here is that we could require a new(ish) Bison
if we could point to clear benefits that outweigh the pain.  Right
now there's not much argument for it.

            regards, tom lane



Re: Remove useless associativity/precedence from parsers

From
Robert Haas
Date:
On Tue, May 21, 2019 at 3:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Other PG hackers might have a different laundry list, but that's mine.

Good list.

Another thing is that it would be nice to have a better way of
resolving conflicts than attaching precedence declarations.  Some
problems can't be solved that way at all, and others can only be
solved that way at the risk of unforeseen side effects.  One possible
idea is a way to mark a rule %weak, meaning that it should only be
used if no non-%weak rule could apply.  I'm not sure if that would
really be the best way, but it's one idea.  A more general version
would be some kind of ability to give rules different strengths; in
the case of a grammar conflict, the "stronger" rule would win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Remove useless associativity/precedence from parsers

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Another thing is that it would be nice to have a better way of
> resolving conflicts than attaching precedence declarations.  Some
> problems can't be solved that way at all, and others can only be
> solved that way at the risk of unforeseen side effects.

Yeah, we've definitely found that resolving shift/reduce conflicts via
precedence declarations has more potential for surprising side-effects
than one would think.  It feels to me that there's something basically
wrong with that concept, or at least wrong with the way we've used it.
Some relevant commits: 670a6c7a2, 12b716457, 6fe27ca2f, and the
"x NOT-something y" hacks in commit c6b3c939b (that one has a whole bunch
of other cruft in it, so it might be hard to spot what I'm talking about).

> One possible
> idea is a way to mark a rule %weak, meaning that it should only be
> used if no non-%weak rule could apply.  I'm not sure if that would
> really be the best way, but it's one idea.  A more general version
> would be some kind of ability to give rules different strengths; in
> the case of a grammar conflict, the "stronger" rule would win.

Hmmm ... not apparent to me that that's really going to help.
Maybe it will, but it sounds like more likely it's just another
mechanism with not-as-obvious-as-you-thought side effects.

            regards, tom lane



Re: Remove useless associativity/precedence from parsers

From
Robert Haas
Date:
On Thu, May 23, 2019 at 12:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > One possible
> > idea is a way to mark a rule %weak, meaning that it should only be
> > used if no non-%weak rule could apply.  I'm not sure if that would
> > really be the best way, but it's one idea.  A more general version
> > would be some kind of ability to give rules different strengths; in
> > the case of a grammar conflict, the "stronger" rule would win.
>
> Hmmm ... not apparent to me that that's really going to help.
> Maybe it will, but it sounds like more likely it's just another
> mechanism with not-as-obvious-as-you-thought side effects.

That's possible; I'm open to other ideas.  If you wanted to be really
explicit about it, you could have a way to stick an optional name on a
grammar rule, and a way to say that the current rule should lose to a
list of named other rules.

It seems pretty clear, though, that our use of %prec proves that we
can't just write a grammar that is intrinsically conflict-free; we
sometimes need to have conflicts and then tell the parser generator
which option to prefer.  And I think it's also pretty clear that %prec
is, for anything other than operator precedence, a horrible way of
doing that.  A method that was merely mediocre could still be a big
improvement over what we have available today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Remove useless associativity/precedence from parsers

From
Andres Freund
Date:
Hi,

On 2019-05-22 17:25:31 -0400, Tom Lane wrote:
> The easiest method is to fire up some client code that repeatedly
> does whatever you want to test, and then look at perf or oprofile
> or local equivalent to see where the time is going in the backend
> process.
> 
> For the particular case of stressing the parser, probably the
> best thing to look at is test cases that do a lot of low-overhead
> DDL, such as creating views.  You could do worse than just repeatedly
> sourcing our standard view files, like
>     src/backend/catalog/system_views.sql
>     src/backend/catalog/information_schema.sql
> (In either case, I'd suggest adapting the file to create all
> its objects in some transient schema that you can just drop.
> Repointing information_schema.sql to some other schema is
> trivial, just change a couple of commands at the top; and
> you could tweak system_views.sql similarly.  Also consider
> wrapping the whole thing in BEGIN; ... ROLLBACK; instead of
> spending time on an explicit DROP.)
> 
> Somebody else might know of a better test case but I'd try
> that first.

> There would still be a fair amount of I/O and catalog lookup
> overhead in a test run that way, but it would be an honest
> approximation of useful real-world cases.  If you're willing to
> put some blinders on and just micro-optimize the flex/bison
> code, you could set up a custom function that just calls that
> stuff.  I actually did that not too long ago; C code attached
> for amusement's sake.

FWIW, this is why I'd suggested the hack of EXPLAIN (PARSE_ANALYZE OFF,
OPTIMIZE OFF) a few years back. Right now it's hard to measure the
parser in isolation.

Greetings,

Andres Freund



Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
hi Tom!

> Le 23 mai 2019 à 00:29, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 5/21/19 11:49 AM, Akim Demaille wrote:
>>> Usually users of Bison build tarballs with the generated parsers
>>> in them, and ship/test from that.
>
>> The buildfarm client does not build from tarballs, it builds from git,
>> meaning it has to run bison. Thus Tom's objection is quite valid, and
>> your dismissal of it is not.

I had not realized I had been rude to anybody.  I apologize to
Tom, I did not mean to dismiss anything.

> Right, but that's a much narrower set of people who need to update
> than "all PG users" or even "all PG developers".
>
> I checked the buildfarm's configure results not too long ago, and noted
> that the oldest bison versions are
>
> gaur          | configure: using bison (GNU Bison) 1.875
> prairiedog    | configure: using bison (GNU Bison) 1.875
> dromedary     | configure: using bison (GNU Bison) 2.3
> locust        | configure: using bison (GNU Bison) 2.3
> longfin       | configure: using bison (GNU Bison) 2.3
> nudibranch    | configure: using bison (GNU Bison) 2.3
> anole         | configure: using bison (GNU Bison) 2.4.1
> fulmar        | configure: using bison (GNU Bison) 2.4.1
> gharial       | configure: using bison (GNU Bison) 2.4.1
> grouse        | configure: using bison (GNU Bison) 2.4.1
> koreaceratops | configure: using bison (GNU Bison) 2.4.1
> leech         | configure: using bison (GNU Bison) 2.4.1
> magpie        | configure: using bison (GNU Bison) 2.4.1
> treepie       | configure: using bison (GNU Bison) 2.4.1
> coypu         | configure: using bison (GNU Bison) 2.4.3
> friarbird     | configure: using bison (GNU Bison) 2.4.3
> nightjar      | configure: using bison (GNU Bison) 2.4.3
> (then 2.5 and later)
>
> (This doesn't cover the Windows members, unfortunately.)
>
> gaur and prairiedog are my own pet dinosaurs, and updating them would
> not be very painful.

I don't want to be painful, but the fact that the buildfarm
starts from git is also a design choice.  In order to be
fully reproducible, I know projects that have a first step
that builds the end-user type of tarball, and run it on all
sorts of architectures.  Of course, it's more initial set up,
but you gain your independence on many tools which in turn
makes it easier to check on more architectures.

> Still, the bottom line here is that we could require a new(ish) Bison
> if we could point to clear benefits that outweigh the pain.  Right
> now there's not much argument for it.

I get that, thanks.

Cheers!


Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
Hey Tom,

> Le 22 mai 2019 à 23:25, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Akim Demaille <akim@lrde.epita.fr> writes:
>> Honestly, I seriously doubt that you have contributors that don't
>> have MacPorts or Brew installed, and both are pretty up to date on
>> Bison.
>
> Hm, well, I'm a counterexample ;-).

Wow :)  I have even more respect now :)  I'm soooo happy to use
the bleeding edge compilers to get all the possible warnings and
sanitizers...



Thanks for the tips on how to bench.  I'll see what I can do (I'm
not a (direct) user of pg myself, nor am I used to write SQL by
hand).

Cheers!


Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
> Le 22 mai 2019 à 23:44, Daniel Gustafsson <daniel@yesql.se> a écrit :
>
>> On 22 May 2019, at 23:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Akim Demaille <akim@lrde.epita.fr> writes:
>>> Honestly, I seriously doubt that you have contributors that don't
>>> have MacPorts or Brew installed, and both are pretty up to date on
>>> Bison.
>>
>> Hm, well, I'm a counterexample ;-)
>
> And one more.  While I do have brew installed, I prefer to use it as little as
> possible.

Err...  You're not exactly what I call a counterexample.




Re: Remove useless associativity/precedence from parsers

From
Akim Demaille
Date:
Tom,

> Le 23 mai 2019 à 06:00, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>
> Robert Haas <robertmhaas@gmail.com> writes:
>> Another thing is that it would be nice to have a better way of
>> resolving conflicts than attaching precedence declarations.  Some
>> problems can't be solved that way at all, and others can only be
>> solved that way at the risk of unforeseen side effects.
>
> Yeah, we've definitely found that resolving shift/reduce conflicts via
> precedence declarations has more potential for surprising side-effects
> than one would think.

That's why in recent versions of Bison we also provide a means
to pure %expect directives on the rules themselves, to be more
precise about what happens.

> It feels to me that there's something basically
> wrong with that concept, or at least wrong with the way we've used it.

I'm trying to find means to scope the prec/assoc directives, because
they are too powerful, and that's dangerous.  This is also why I try
to remove the useless ones.

Some people don't trust assoc/prec directives at all and use only
unambiguous grammars.  But this can be very verbose...

I agree something is not so cool about these directives.  GLR parsers
have a clear concept of in-between-rules precedence (%dprec).  Something
similar for LR (hence fully static) would be nice, but it remains to
be invented.


Re: Remove useless associativity/precedence from parsers

From
Andrew Gierth
Date:
>>>>> "Akim" == Akim Demaille <akim@lrde.epita.fr> writes:

 >> Yeah, we've definitely found that resolving shift/reduce conflicts
 >> via precedence declarations has more potential for surprising
 >> side-effects than one would think.

 Akim> That's why in recent versions of Bison we also provide a means to
 Akim> pure %expect directives on the rules themselves, to be more
 Akim> precise about what happens.

It's possibly worth looking at the details of each case where we've run
into problems to see whether there is a better solution.

The main cases I know of are:

1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the
standard SQL grammar, since UNBOUNDED is a non-reserved keyword and so
it can also appear as a legal <identifier>, and the construct
RANGE <unsigned value specification> PRECEDING allows <identifier> to
appear as a <SQL parameter reference>.

We solve this by giving UNBOUNDED a precedence below PRECEDING.

2. CUBE() - in the SQL spec, GROUP BY does not allow expressions, only
column references, but we allow expressions as an extension. The syntax
GROUP BY CUBE(a,b) is a shorthand for grouping sets, but this is
ambiguous with a function cube(...). (CUBE is also a reserved word in the
spec, but it's an unreserved keyword for us.)

We solve this by giving CUBE (and ROLLUP) precedence below '('.

3. General handling of postfix operator conflicts

The fact that we allow postfix operators means that any sequence which
looks like <expression> <identifier> is ambiguous. This affects the use
of aliases in the SELECT list, and also PRECEDING, FOLLOWING, GENERATED,
and NULL can all follow expressions.

4. Not reserving words that the spec says should be reserved

We avoid reserving PARTITION, RANGE, ROWS, GROUPS by using precedence
hacks.

-- 
Andrew (irc:RhodiumToad)



Re: Remove useless associativity/precedence from parsers

From
Alvaro Herrera
Date:
On 2019-May-28, Andrew Gierth wrote:

> The main cases I know of are:
> 
> 1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the
> standard SQL grammar, since UNBOUNDED is a non-reserved keyword and so
> it can also appear as a legal <identifier>, and the construct
> RANGE <unsigned value specification> PRECEDING allows <identifier> to
> appear as a <SQL parameter reference>.

Should we report this to the SQL committee?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove useless associativity/precedence from parsers

From
Bruce Momjian
Date:
On Tue, May 21, 2019 at 03:06:43PM -0400, Tom Lane wrote:
> * Speed of the generated parser could be better.  I suspect this has
> a lot to do with the fact that our grammar is huge, and so are the
> tables, and that causes lots of cache misses.  Maybe this could be
> addressed by trying to make the tables smaller and/or laid out in
> a way with better cache locality?

Agreed.  This was brought up in January, with a little more specificity:

    https://www.postgresql.org/message-id/20190125223859.GD13803@momjian.us

    With our scanner keywords list now more cache-aware, and with us
    planning to use Bison for years to come, has anyone ever looked at
    reordering the bison state machine array to be more cache aware, e.g.,
    having common states next to each other rather than scattered around the
    array?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +