Re: hint infrastructure setup (v3) - Mailing list pgsql-patches

From Tom Lane
Subject Re: hint infrastructure setup (v3)
Date
Msg-id 11136.1081175771@sss.pgh.pa.us
In response to Re: hint infrastructure setup (v3)  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: hint infrastructure setup (v3)
Re: hint infrastructure setup (v3)
List pgsql-patches
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Although I agree that it is "doable", I have stronger reservations than you.
> Also, I do not find it an appealing solution to change "gram.c" a lot.

I was not proposing hand-editing gram.c after bison generates it,
if that's what you meant ;-).  It seems however perfectly doable to
reference bison's state stack from yyerror.  We'd need some macro
hacking to make yystate and the stack available to yyerror, but nothing
worse than is already documented and recommended practice for other
bison tweaks.

> The automaton stack keeps states, which are not directly linked to rules
> and terminals. The terminals are not available, they must be kept
> separately if you want them. This can be done in yylex().

We don't need them.  Any already-shifted tokens are to the left of where
the error is, no?

> The internal state, stack, token... are local variables within yyparse().
> As a result, they are not accessible from yyerror. I haven't found any
> available hook, so you have to hack "gram.c" to get this information.

No, just redefine "yyerror" as a macro that passes additional
parameters.
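
As a rough illustration of the macro approach, here is a minimal sketch.
It assumes the parser function keeps its current state and state stack in
locals named yystate, yyss and yyssp; yyerror_ext(), toy_yyparse() and the
two last_error_* variables are hypothetical names invented for this
example, and toy_yyparse() only stands in for the generated yyparse().

```c
#include <stdio.h>

/* Recorded by the hypothetical reporter so callers can inspect them. */
static int last_error_state = -1;
static int last_error_depth = -1;

/*
 * Hypothetical extended error reporter: unlike a plain yyerror(msg),
 * it receives the automaton state and the bounds of the state stack.
 */
static void
yyerror_ext(const char *msg, int state,
            const short *stack_base, const short *stack_top)
{
    last_error_state = state;
    last_error_depth = (int) (stack_top - stack_base) + 1;
    fprintf(stderr, "%s (state %d, stack depth %d)\n",
            msg, state, last_error_depth);
}

/*
 * The macro is expanded inside the parser function, so yystate, yyss
 * and yyssp resolve to that function's local variables.
 */
#define yyerror(msg) yyerror_ext((msg), yystate, yyss, yyssp)

/* Stand-in for the generated parser; local names are an assumption. */
static int
toy_yyparse(void)
{
    short  yyss[8] = {0, 3, 17};   /* pretend three states were pushed */
    short *yyssp = &yyss[2];       /* top of the state stack */
    int    yystate = *yyssp;       /* current state */

    yyerror("syntax error");       /* expands to yyerror_ext(...) */
    return 1;
}
```

The point of the sketch is only that the call site in the parser stays
textually unchanged while the reporter gains access to the locals; no
edit to the generated file is involved.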

> - move backwards before doing the above, if some reductions were
>   performed because of the submitted token and finally resulted in the error,
>   the state that led to the error may not be the highest available one, so
>   maybe other allowed tokens may also be missed. We would need to have
>   the last state before any reduction.

Yeah, I had come to the same conclusion --- state moves made without
consuming input would need to be backed out if we wanted to report the
complete follow set.  I haven't yet looked to see how to do that.
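
One conceivable way to back out such moves, offered purely as a sketch
and not as something worked out in this thread, is to remember what the
state stack looked like right after the last shift and to consult that
copy when reporting the error. Everything below (ToyParser, toy_shift,
toy_reduce, state_for_error_report) is a hypothetical toy model, not
bison code.

```c
#define MAX_DEPTH 64

typedef struct
{
    short states[MAX_DEPTH];
    int   depth;
} StateStack;

typedef struct
{
    StateStack live;        /* stack the automaton actually works on */
    StateStack at_shift;    /* copy taken right after the last shift */
} ToyParser;

static void
push_state(StateStack *s, short st)
{
    s->states[s->depth++] = st;
}

static void
toy_init(ToyParser *p)
{
    p->live.depth = 0;
    push_state(&p->live, 0);    /* initial automaton state */
    p->at_shift = p->live;
}

/* A shift consumes an input token, so the snapshot is refreshed here. */
static void
toy_shift(ToyParser *p, short new_state)
{
    push_state(&p->live, new_state);
    p->at_shift = p->live;      /* plain struct copy */
}

/* A reduction consumes no input, so the snapshot is left untouched. */
static void
toy_reduce(ToyParser *p, int rhs_len, short goto_state)
{
    p->live.depth -= rhs_len;
    push_state(&p->live, goto_state);
}

/* State to compute the expected-token set from when an error is found. */
static short
state_for_error_report(const ToyParser *p)
{
    return p->at_shift.states[p->at_shift.depth - 1];
}
```

Copying the whole stack on every shift costs time proportional to its
depth; a real implementation would presumably want something cheaper,
and that trade-off is not explored here.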

> As you noted, for things like "SHOW 123", the follow set basically
> includes all keywords although you can have SHOW ALL or SHOW name.
> So, as you suggested, you can either say "ident" as a simplification, but
> you miss ALL which is meaningful, or you say all keywords, which is
> useless.

You're ignoring the distinction between classes of keywords.  I would
not recommend treating reserved_keywords as a subclass of ident.
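
To make the distinction concrete, here is a small sketch of how an
expected-token list might be summarized: identifier-like tokens
(including unreserved keywords) collapse into the single word
"identifier", while reserved keywords are still listed by name. The
TokClass enum, the ExpectedTok struct and summarize_expected() are
hypothetical, and the token list in the usage note is illustrative only.

```c
#include <stdio.h>
#include <string.h>

typedef enum
{
    TOK_IDENT,              /* a plain identifier */
    TOK_UNRESERVED_KW,      /* keyword that is also usable as a name */
    TOK_RESERVED_KW         /* keyword that can never be a name */
} TokClass;

typedef struct
{
    const char *name;
    TokClass    cls;
} ExpectedTok;

/* Append s to out, comma-separated, never writing past outsz bytes. */
static void
append_item(char *out, size_t outsz, const char *s)
{
    size_t len = strlen(out);

    if (len > 0)
        snprintf(out + len, outsz - len, ", %s", s);
    else
        snprintf(out, outsz, "%s", s);
}

/*
 * Build a readable summary of the expected tokens. Requires outsz >= 1.
 * Identifier-like tokens are reported once; reserved keywords by name.
 */
static void
summarize_expected(const ExpectedTok *toks, int n, char *out, size_t outsz)
{
    int i;
    int ident_like = 0;

    out[0] = '\0';
    for (i = 0; i < n; i++)
        if (toks[i].cls != TOK_RESERVED_KW)
            ident_like = 1;
    if (ident_like)
        append_item(out, outsz, "identifier");
    for (i = 0; i < n; i++)
        if (toks[i].cls == TOK_RESERVED_KW)
            append_item(out, outsz, toks[i].name);
}
```

Fed an illustrative list of IDENT, ABORT and BEGIN (identifier-like)
plus ALL (reserved), this produces "identifier, ALL", which keeps the
meaningful alternative visible without enumerating every keyword.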

> (5) anything that can be done would be hardwired to one version of bison.
> There are a lot of assumptions in the code and data structures, and any
> newer/older version with some different internal representation would
> basically break any code that would rely on that. So postgres would not be
> "bison" portable:-( I don't think it is a real option that old postgresql
> source would be broken against future bison releases.

I think this argument is completely without merit.  The technology of
LALR parsers has been stable for what, thirty years now?  The parts of
bison that we'd want to look at are inherited lock, stock, and barrel from
AT&T yacc, and are unlikely to change in the foreseeable future; even
more unlikely to change in a way that we couldn't easily adapt to.  You
might as well argue that we shouldn't use autoconf because the autoconf
authors sometimes make not-very-compatible changes.

> (b) write a new "recursive descent" parser, and drop "gram.y"

Been there, done that, not impressed with the idea.  There's a reason
why people invented parser generators...

> A side effect of my inspection is that the automaton generated by bison
> could be simplified if a different tradeoff between the lexer, the parser
> and the post-processing were chosen. Namely, all tokens that are
> just identifiers should be dropped and processed differently.

We are not going to reserve the keywords that are presently unreserved.
If you can think of a reasonable way to stop treating them as separate
tokens inside the grammar without altering the user-visible behavior,
I'm certainly interested.  I think that will be rather difficult,
however, considering for one thing that SQL specifies different
case-folding behavior for identifiers and keywords.

            regards, tom lane
