Thread: hint infrastructure setup (v3)

hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear patchers,

Well, as the discussion rages over my previous patch submissions, I've
time to improve the code;-)

I finally figured out that there is 2 errhint functions (elog.c vs
ipc_text.c), and the one I'm calling is the fist one, so I need to put a
format. The second appears to ignore it's arguments after the first.

Anyway, please consider the following patch for inclusion to current
head. It validates for me. I need it to be able to go on.

Have a nice day,

--
Fabien Coelho - coelho@cri.ensmp.fr

Attachment

Re: hint infrastructure setup (v3)

From
Bruce Momjian
Date:
Fabien COELHO wrote:
>
> Dear patchers,
>
> Well, as the discussion rages over my previous patch submissions, I've
> time to improve the code;-)
>
> I finally figured out that there is 2 errhint functions (elog.c vs
> ipc_text.c), and the one I'm calling is the fist one, so I need to put a
> format. The second appears to ignore it's arguments after the first.
>
> Anyway, please consider the following patch for inclusion to current
> head. It validates for me. I need it to be able to go on.

Why did all the tags have to be renamed:

    + cmdGRANT: GRANT {noH;};

Also, what is typical output for a hint?  Can you show one?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Bruce,

> Why did all the tags have to be renamed:
>
>     + cmdGRANT: GRANT {noH;};

that's a good question.

In order to add hints, I want to attach them to the states of the parser
automaton. So as to do that, I need I put a name on every state, so I need
to refactor the prefix that lead to a state, at give it a name.


Otherwise, I would have :

xxx: GRANT {noH;} ALL
   | GRANT {noH;} CREATE
   ;

(1) I would have to put the "after GRANT" hint twice, one after each
    GRANT occurence.

(2) this would result in shift/reduce conflicts, as the parser does not
    know which hint should be put. Well, it is the same code in both case,
    but yacc does not look at that.

Also, I need to stop hints, otherwise the last 'pushed' hint would be
shown on any error later.


> Also, what is typical output for a hint?  Can you show one?

Well, the current status of the infrastructure is that there is no hint;-)
The only hint with the patch is if you do not put an SQL command.

It may look like :

psql> creat table foo();
ERROR: syntax error at or near "creat" at character 1
creat table foo();
^
HINT: sql command such as SELECT or CREATE... ?


All other syntax errors result in no hint being displayed, thanks to
all added noH calls I put.


Also, As far as I remember, I put a new option to activate these hints.
Hints are disactivated by default. This is because some people do not
like advices and may want to turn them on or off.


--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Also, what is typical output for a hint?  Can you show one?

> Well, the current status of the infrastructure is that there is no hint;-)

Ah, now I remember why I was waiting to review that stuff: I was expecting
you to come out with a version that actually did some things :-(

What you've got at the moment is a patch that significantly uglifies the
grammar without actually doing anything useful, or even clearly pointing
the way to what else will need to happen before it does do something
useful.  So it's difficult to form any reasoned judgment about whether
we want to commit to following this path or not.

As I said back at the beginning, I'm unconvinced that this path leads
to anything useful --- it seems like an extremely painful way of
accomplishing less than what a simple change to auto-show the command's
syntax summary could do.  To change my mind, you've got to show me some
concrete results of hints that are more useful than "\h <command>" is.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> > Well, the current status of the infrastructure is that there is no hint;-)
>
> Ah, now I remember why I was waiting to review that stuff: I was expecting
> you to come out with a version that actually did some things :-(

Well, if you wait for something from me, it is better to tell me directly.
I was waiting for anything to happen to the patch (accept, discuss or
reject) before submitting anything else.


> What you've got at the moment is a patch that significantly uglifies the
> grammar without actually doing anything useful, or even clearly pointing
> the way to what else will need to happen before it does do something
> useful.  So it's difficult to form any reasoned judgment about whether
> we want to commit to following this path or not.

I can see your point.


The reasonnable way out of the deadlock that I can suggest is the
following:

I can resubmit a new patch that would provide the needed infrastructure
AND some hints on some commands as a proof of concept of what can be
achieved.

Then you can decide whether it is worth having this kind of thing on all
commands, or not.

If not, I won't have passed 3 week-ends in the parser instead if my
garden for nothing. If you are okay then you apply, and I'll submit some
new patches later on, one command at a time, and when I have time.

Does this sound reasonnable enough?

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> I can resubmit a new patch that would provide the needed infrastructure
> AND some hints on some commands as a proof of concept of what can be
> achieved.

I quite agree that you shouldn't do a complete implementation when it's
not clear if we'll accept it or not.  What I'd like to see is a demo,
basically.  How about one example of each of several different kinds
of hints?  We need to get a feeling for what can be accomplished and how
messy the grammar would get.

BTW, it seems like you are putting in a lot of infrastructure to
recreate externally the parse-stack state that bison has internally.
Maybe an interesting alternative would be to look at touching that
state directly.  I realize that bison doesn't consider that part of
their official API, but bison is a stable tool and seems unlikely to
change much in the future.  We could probably get away with it...

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> I quite agree that you shouldn't do a complete implementation when it's
> not clear if we'll accept it or not.  What I'd like to see is a demo,
> basically.  How about one example of each of several different kinds
> of hints?  We need to get a feeling for what can be accomplished and how
> messy the grammar would get.

I've already tried some implementation but it was not showable, and
because the infrastructure was not in place, inappropriate hints could
be shown.

It is not "that" messy. Just prefix are given a name to attach a rule
for hint updates.

> BTW, it seems like you are putting in a lot of infrastructure to
> recreate externally the parse-stack state that bison has internally.

Yes.

> Maybe an interesting alternative would be to look at touching that
> state directly.  I realize that bison doesn't consider that part of
> their official API, but bison is a stable tool and seems unlikely to
> change much in the future.  We could probably get away with it...

Interesting, however:

I would not like to break postgresql portability with other parser
generators. It would be enough to reject the patch from my point of
view;-)

Using some non standard undocumented API would not help other people who
would like to touch the parser.

It would not help me either, because I would have to learn the
undocumented API;-)

I really need to give a name to the state so as to be able to attach a
hint. I cannot set how I could guess the state number given after "GRANT"
just from the source, and without generating lots of conflicts.

So I'll do some more programming maybe over the week-end a resubmit
something soon.

--
Fabien.

Re: hint infrastructure setup (v3)

From
Bruce Momjian
Date:
Fabien COELHO wrote:
> I would not like to break postgresql portability with other parser
> generators. It would be enough to reject the patch from my point of
> view;-)

We require bison to build, period. I am sure we use enough
bison-specific stuff now.  No one has suggested another parser generator
to support either.

>
> Using some non standard undocumented API would not help other people who
> would like to touch the parser.
>
> It would not help me either, because I would have to learn the
> undocumented API;-)
>
> I really need to give a name to the state so as to be able to attach a
> hint. I cannot set how I could guess the state number given after "GRANT"
> just from the source, and without generating lots of conflicts.
>
> So I'll do some more programming maybe over the week-end a resubmit
> something soon.

Please show us an example of what a typical hint would look like to the
user.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
> Please show us an example of what a typical hint would look like to the
> user.

As an illustration, here is the kind of thing I had with an early
quick and dirty proof of concept implementation I made for myself
one month ago.

As a result of this first try, I've convinced myself that I should do
that cleanly and incrementally, one command at a time, with a proper clean
infrastructure, and intermediate validations...

My idea of the target audience for those hints is my students while
they're learning SQL.


--
-- test HINTs provided on parser errors.
--
-- it should:
-- (1) trigger all possible hints
-- (2) include ACTUAL syntax errors by REAL people
--
-- Note that these test assume a psql like interface which
-- can only send "lexable" and "even" commands to the backend.
-- Thus I cannot test badly closed quotes or parentheses.
--
--
-- (1) all hints
--
--
-- sql command expected

foo;
ERROR:  syntax error at or near "foo" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

123;
ERROR:  syntax error at or near "123" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

+;
ERROR:  syntax error at or near "+" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

'hello world';
ERROR:  syntax error at or near "'hello world'" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

NULL;
ERROR:  syntax error at or near "NULL" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

COLUMN;
ERROR:  syntax error at or near "COLUMN" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

CREAT TABLE foo(id INT4);
ERROR:  syntax error at or near "CREAT" at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

--
-- CREATE xxx

CREATE 123;
ERROR:  syntax error at or near "123" at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE foo;
ERROR:  syntax error at or near "foo" at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE + CREATE ();
ERROR:  syntax error at or near "+" at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE INT4;
ERROR:  syntax error at or near "INT4" at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE OR foo;
ERROR:  syntax error at or near "foo" at character 11
HINT:  REPLACE expected

--
-- DROP xxx

DROP 123;
ERROR:  syntax error at or near "123" at character 6
HINT:  object such as TABLE, DATABASE, USER... expected

DROP foo;
ERROR:  syntax error at or near "foo" at character 6
HINT:  object such as TABLE, DATABASE, USER... expected

DROP USER 123;
ERROR:  syntax error at or near "123" at character 11
HINT:  user id such as calvin expected

DROP SYSID 123;
ERROR:  syntax error at or near "SYSID" at character 6
HINT:  object such as TABLE, DATABASE, USER... expected

DROP USER SYSID 123;
ERROR:  syntax error at or near "123" at character 17

DROP GROUP 321;
ERROR:  syntax error at or near "321" at character 12
HINT:  group id such as admin expected

DROP GROUP calvin, foo;
ERROR:  syntax error at or near "," at character 18
HINT:  ; expected

DROP TABLE (foo);
ERROR:  syntax error at or near "(" at character 12
HINT:  table name... expected

--
-- CREATE USER

CREATE USER COLUMN;
ERROR:  syntax error at or near "COLUMN" at character 13
HINT:  user id such as calvin expected

CREATE USER 123;
ERROR:  syntax error at or near "123" at character 13
HINT:  user id such as calvin expected

CREATE USER 'hello';
ERROR:  syntax error at or near "'hello'" at character 13
HINT:  user id such as calvin expected

CREATE USER calvin foo;
ERROR:  syntax error at or near "foo" at character 20
HINT:  user options or SET or RESET expected

CREATE USER calvin WITH foo;
ERROR:  syntax error at or near "foo" at character 25
HINT:  user options expected

CREATE USER calvin WITH CREATEDB foo;
ERROR:  syntax error at or near "foo" at character 34
HINT:  other user options or ';' expected

CREATE USER calvin PASSWORD 123;
ERROR:  syntax error at or near "123" at character 29
HINT:  password string such as 'secret' expected

CREATE USER calvin WITH PASSWORD 123;
ERROR:  syntax error at or near "123" at character 34
HINT:  password string such as 'secret' expected

CREATE USER calvin WITH ENCRYPTED 'pass';
ERROR:  syntax error at or near "'pass'" at character 35
HINT:  PASSWORD expected

CREATE USER calvin WITH UNENCRYPTED 'pass';
ERROR:  syntax error at or near "'pass'" at character 37
HINT:  PASSWORD expected

CREATE USER calvin WITH ENCRYPTED PASS 123;
ERROR:  syntax error at or near "PASS" at character 35
HINT:  PASSWORD expected

CREATE USER calvin WITH ENCRYPTED PASSWORD 123;
ERROR:  syntax error at or near "123" at character 44
HINT:  password string such as 'secret' expected

CREATE USER calvin WITH UNENCRYPTED PASSWORD 'secret' CREADB;
ERROR:  syntax error at or near "CREADB" at character 55
HINT:  other user options or ';' expected

CREATE USER calvin NOCREATEDB foo;
ERROR:  syntax error at or near "foo" at character 31
HINT:  other user options or ';' expected

CREATE USER calvin WITH IN foo;
ERROR:  syntax error at or near "foo" at character 28
HINT:  GROUP expected

CREATE USER calvin WITH IN GROUP 'foo';
ERROR:  syntax error at or near "'foo'" at character 34
HINT:  group id list expected

CREATE USER calvin WITH IN GROUP 123;
ERROR:  syntax error at or near "123" at character 34
HINT:  group id list expected

CREATE USER calvin WITH IN GROUP admin, column;
ERROR:  syntax error at or near "column" at character 41
HINT:  group id such as admin expected

CREATE USER calvin WITH IN GROUP admin, 123;
ERROR:  syntax error at or near "123" at character 41
HINT:  group id such as admin expected

CREATE USER calvin WITH IN GROUP admin, CREATEDB;
ERROR:  group "admin" does not exist

CREATE USER calvin WITH IN GROUP admin CREATEDB foo;
ERROR:  syntax error at or near "foo" at character 49
HINT:  other user options or ';' expected

CREATE USER calvin NOCREATEUSER CREATEDB foo;
ERROR:  syntax error at or near "foo" at character 42
HINT:  other user options or ';' expected

CREATE USER calvin CREATEUSER foo;
ERROR:  syntax error at or near "foo" at character 31
HINT:  other user options or ';' expected

CREATE USER calvin SYSID foo;
ERROR:  syntax error at or near "foo" at character 26
HINT:  integer constant expected

CREATE USER calvin SYSID 12 foo;
ERROR:  syntax error at or near "foo" at character 29
HINT:  other user options or ';' expected

CREATE USER calvin VALID UNTIL 12;
ERROR:  syntax error at or near "12" at character 32
HINT:  date string such as '01-02-2000' expected

CREATE USER calvin VALID foo;
ERROR:  syntax error at or near "foo" at character 26
HINT:  UNTIL expected

--
-- ALTER USER

ALTER USER calvin foo;
ERROR:  syntax error at or near "foo" at character 19
HINT:  user options or SET or RESET or ; expected

ALTER USER calvin WITH foo;
ERROR:  syntax error at or near "foo" at character 24
HINT:  user options expected

ALTER USER calvin SET foo;
ERROR:  syntax error at or near ";" at character 26
HINT:  SET stuff expected

ALTER USER calvin RESET 123;
ERROR:  syntax error at or near "123" at character 25
HINT:  user id such as calvin expected

--
-- CREATE GROUP

CREATE GROUP 123;
ERROR:  syntax error at or near "123" at character 14
HINT:  group id such as admin expected

CREATE GROUP grp foo;
ERROR:  syntax error at or near "foo" at character 18
HINT:  user list such as "USER calvin" or "SYSID 123" expected

CREATE GROUP grp WITH foo;
ERROR:  syntax error at or near "foo" at character 23
HINT:  user list such as "USER calvin" or "SYSID 123" expected

CREATE GROUP grp WITH USER ;
ERROR:  syntax error at or near ";" at character 28
HINT:  user id such as calvin expected

CREATE GROUP grp WITH USER 123;
ERROR:  syntax error at or near "123" at character 28
HINT:  user id such as calvin expected

CREATE GROUP grp WITH USER calvin, 123;
ERROR:  syntax error at or near "123" at character 36
HINT:  user id such as calvin expected

CREATE GROUP grp WITH SYSID calvin;
ERROR:  syntax error at or near "calvin" at character 29
HINT:  integer constant expected

CREATE GROUP grp WITH SYSID 'hello';
ERROR:  syntax error at or near "'hello'" at character 29
HINT:  integer constant expected

CREATE GROUP grp WITH USER calvin, hobbes SYSID 'hello';
ERROR:  syntax error at or near "'hello'" at character 49
HINT:  integer constant expected

CREATE GROUP grtp USER 123;
ERROR:  syntax error at or near "123" at character 24
HINT:  user id such as calvin expected

CREATE GROUP grtp SYSID calvin;
ERROR:  syntax error at or near "calvin" at character 25
HINT:  integer constant expected

CREATE GROUP grp WITH USER calvin, hobbes SYSID 123 foo;
ERROR:  syntax error at or near "foo" at character 53
HINT:  user list such as "USER calvin" or "SYSID 123" expected

--
-- ALTER GROUP

ALTER GROUP grp;
ERROR:  syntax error at or near ";" at character 16
HINT:  ADD or DROP expected

ALTER GROUP grp foo;
ERROR:  syntax error at or near "foo" at character 17
HINT:  ADD or DROP expected

ALTER GROUP grp ADD calvin;
ERROR:  syntax error at or near "calvin" at character 21
HINT:  USER user id list expected

ALTER GROUP grp DROP calvin;
ERROR:  syntax error at or near "calvin" at character 22
HINT:  USER user id list expected

ALTER GROUP grp ADD USER 123;
ERROR:  syntax error at or near "123" at character 26
HINT:  user id such as calvin expected

ALTER GROUP grp DROP USER 123;
ERROR:  syntax error at or near "123" at character 27
HINT:  user id such as calvin expected

ALTER GROUP grp ADD USER calvin, USER 123;
ERROR:  syntax error at or near "USER" at character 34
HINT:  user id such as calvin expected

ALTER GROUP grp DROP USER calvin, 123;
ERROR:  syntax error at or near "123" at character 35
HINT:  user id such as calvin expected

ALTER GROUP grp ADD SYSID calvin;
ERROR:  syntax error at or near "SYSID" at character 21
HINT:  USER user id list expected

ALTER GROUP grp DROP SYSID calvin;
ERROR:  syntax error at or near "SYSID" at character 22
HINT:  USER user id list expected

--
-- CREATE SCHEMA...

CREATE SCHEMA 123;
ERROR:  syntax error at or near "123" at character 15
HINT:  schema name or AUTHORIZATION expected

CREATE SCHEMA sch foo;
ERROR:  syntax error at or near "foo" at character 19
HINT:  AUTHORIZATION or ... expected

CREATE SCHEMA sch AUTHORIZATION 123;
ERROR:  syntax error at or near "123" at character 33
HINT:  user id such as calvin expected

CREATE SCHEMA AUTHORIZATION 123;
ERROR:  syntax error at or near "123" at character 29
HINT:  user id such as calvin expected

CREATE SCHEMA sch AUTHORIZATION calvin CREATE foo;
ERROR:  syntax error at or near "foo" at character 47
HINT:  TABLE, INDEX, SEQUENCE, TRIGGER or VIEW expected

CREATE SCHEMA sch AUTHORIZATION calvin CREATE TABLE +;
ERROR:  syntax error at or near "+" at character 53
HINT:  table name... expected

CREATE SCHEMA sch AUTHORIZATION calvin CREATE TABLE foo() xxx;
ERROR:  syntax error at or near "xxx" at character 59
HINT:  schema statement list such as CREATE TABLE/INDEX/SEQUENCE/TRIGGER/VIEW or GRANT or ; expected

CREATE SCHEMA sch AUTHORIZATION calvin CREATE TABLE foo() CREATE TABLE bla() x;
ERROR:  syntax error at or near "x" at character 78
HINT:  schema statement list such as CREATE TABLE/INDEX/SEQUENCE/TRIGGER/VIEW or GRANT or ; expected


--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> I would not like to break postgresql portability with other parser
> generators.

I agree with Bruce's opinion that this is a nonissue.  In particular I'd
point out that the current grammar is already too big for any known yacc
clone besides bison (even with bison you need a very recent version),
and your proposed changes are going to make the grammar a whole lot
larger yet, at least in terms of states and actions.  Even if there was
any usefulness to supporting other yacc clones, the odds of making one
work seem minuscule.

> Using some non standard undocumented API would not help other people who
> would like to touch the parser.

It would help them a *lot* if it meant that they didn't have to be aware
of this stuff in order to do anything at all with the grammar.  My
fundamental objection to this proposal continues to be that it will make
the grammar unreadable and unmaintainable.  If we could push the error
message generation issues into a subroutine of yyerror() and not touch
the grammar rules at all, the chances of getting this accepted would
rise about a thousand percent.

The sample hints you show in another message still aren't impressing me
much, but in any case they look like they only depend on being able to
see what grammar symbols can follow the current point.  The last time
I studied this stuff (which was quite a while back) the follow set was
something an LR parser generator would have to compute anyway.  I don't
know whether bison's internal tables expose that set in any useful
fashion, but it surely seems worth a look.

            regards, tom lane

PS: another reason for doing it that way is that I suspect your current
approach is a dead end anyhow.  You've already hit one blocker problem
where you got reduce/reduce errors when you tried to insert
state-recording actions, and you've only gone as far as hinting the very
first symbol, which is hardly one of the more complex parts of the
grammar.  There are large parts of the grammar that only manage to avoid
ambiguity by the skin of their teeth --- I don't believe you'll be able
to do anything of this sort in those areas without breaking it.  Take a
look at SelectStmt and subsidiary productions as an example ... that's
a fragile bit of work that took a long time to get right.

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> > I would not like to break postgresql portability with other parser
> > generators.
>
> I agree with Bruce's opinion that this is a nonissue.  In particular I'd
> point out that the current grammar is already too big for any known yacc
> clone besides bison (even with bison you need a very recent version),
> and your proposed changes are going to make the grammar a whole lot
> larger yet, at least in terms of states and actions.

Not really in terms of state. The state should basically be the same.
However yes in terms of "explicit" state that are given explicit names.
And definitely in terms of actions, as you say.


> > Using some non standard undocumented API would not help other people who
> > would like to touch the parser.
>
> It would help them a *lot* if it meant that they didn't have to be aware
> of this stuff in order to do anything at all with the grammar.  My
> fundamental objection to this proposal continues to be that it will make
> the grammar unreadable and unmaintainable.  If we could push the error
> message generation issues into a subroutine of yyerror() and not touch
> the grammar rules at all, the chances of getting this accepted would
> rise about a thousand percent.

Hmmm. I'm so much below zero! So maybe I think I won't resubmit a
"infrastructure patch with some hints for some commands", as it seems
just a lose of time.


> The sample hints you show in another message still aren't impressing me
> much,

I don't mean them to be impressive! It is not an expert system or a seer
I'm planing. I just want them to be useful to my students!


> but in any case they look like they only depend on being able to
> see what grammar symbols can follow the current point.

Yes. That's basically where the parser is when it generates an error, it
is expecting something and cannot find it. I could do more, but with more
efforts.


> The last time I studied this stuff (which was quite a while back) the
> follow set was something an LR parser generator would have to compute
> anyway.

Yes.


> I don't know whether bison's internal tables expose that set in any
> useful fashion, but it surely seems worth a look.

Having a look is something I can do;-)

I'm afraid it looks like "internal state 1232, 43425 and 42523", but there
may be some support enough in the generated code to get something more
interesting. It would require to be able to get textual meta informations
out of the state number, which is possible if bison/flex people did
something about it.

I can remembre something like that, somewhere deep in my memory...


> PS: another reason for doing it that way is that I suspect your current
> approach is a dead end anyhow.  You've already hit one blocker problem
> where you got reduce/reduce errors when you tried to insert
> state-recording actions, and you've only gone as far as hinting the very
> first symbol, which is hardly one of the more complex parts of the
> grammar.

These states exists anyway. Once the first keyword(s) are passed, there is
much less troubles as the SQL syntax is really verbose, so it is usually
hard to get lost. The only real problem I had passed the first symbol, for
the part I played with, is with SET [SESSION|LOCAL] SESSION AUTHORIZATION
which needed a real bit of refactoring.

Also, it is better to avoid some factorizations if the semantics
is different. For instance, user_list is used both for groups and users,
so I switched during my tests to a user_list and a group_list.

> There are large parts of the grammar that only manage to avoid
> ambiguity by the skin of their teeth --- I don't believe you'll be able
> to do anything of this sort in those areas without breaking it.  Take a
> look at SelectStmt and subsidiary productions as an example ... that's
> a fragile bit of work that took a long time to get right.

I noticed that great care has been taken to avoid any ambiguity.
I have no intention to change that.

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Andrew Dunstan
Date:
Fabien COELHO wrote:

>>The last time I studied this stuff (which was quite a while back) the
>>follow set was something an LR parser generator would have to compute
>>anyway.
>>
>>
>
>Yes.
>
>
>
>
>>I don't know whether bison's internal tables expose that set in any
>>useful fashion, but it surely seems worth a look.
>>
>>
>
>Having a look is something I can do;-)
>
>I'm afraid it looks like "internal state 1232, 43425 and 42523", but there
>may be some support enough in the generated code to get something more
>interesting. It would require to be able to get textual meta informations
>out of the state number, which is possible if bison/flex people did
>something about it.
>
>

You *really* don't want to go there. If you want to see what the parser
is doing you can run "bison -r all" over the grammar and examine the
.output file. But please, let's not examine the internal states. Talk
about unmaintainability! Also, I suspect that bison does a good bit of
optimisation by way of combining states that removes some of the
information you might need, but I haven't looked into it closely.

If we really wanted to go down this route, a predictive (i.e. LL(n)
type) parser rather than a bottom up parser would probably be more
suitable, but I doubt anyone has any stomach for doing that work, even
if there was agreement on the need for it.

cheers

andrew



Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> ... your proposed changes are going to make the grammar a whole lot
>> larger yet, at least in terms of states and actions.

> Not really in terms of state. The state should basically be the same.
> However yes in terms of "explicit" state that are given explicit names.
> And definitely in terms of actions, as you say.

But mid-rule actions are implemented by inventing additional internal
productions (see the bison manual's discussion of them; the O'Reilly
lex & yacc book says that all yaccs do it like that).  That's not only
more states, but more symbols, which is going to impose an O(N^2) cost
on the raw tabular representation of the parsing rules.  Maybe much of
this will be bought back when bison compresses the tables, and maybe
not.  Have you checked how much the size of gram.o grows with the stuff
you've installed so far?

(I'm also slightly worried about what this will do to parsing speed,
but evidence about that would have to wait for further development of
the patch.  In any case it'd be more attractive if the cost of this
could be paid only when we've already detected an error...)

> I'm afraid it looks like "internal state 1232, 43425 and 42523", but there
> may be some support enough in the generated code to get something more
> interesting. It would require to be able to get textual meta informations
> out of the state number, which is possible if bison/flex people did
> something about it.

The string names of the grammar symbols are all embedded in gram.c
anyway, so if you can figure out the symbols that are expected next,
their names could be printed directly.  We could alter the symbol names
to be more useful to novices, or alternatively install an additional
lookup table to substitute reader-friendly phrases.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> You *really* don't want to go there. If you want to see what the parser
> is doing you can run "bison -r all" over the grammar and examine the
> .output file. But please, let's not examine the internal states. Talk
> about unmaintainability!

What I was suggesting was that we might be able to extract the "follow
set" from bison's tables, ie, the set of grammar symbols that are legal
next inputs given the current parse state stack.  I surely agree that
we don't want code that goes like "if state is N then print message X"
... but the follow set should be stable.  One way of describing Fabien's
existing patch is that it's essentially keeping track of the follow set
by hand :-(

> Also, I suspect that bison does a good bit of
> optimisation by way of combining states that removes some of the
> information you might need, but I haven't looked into it closely.

That could be a showstopper if true, but it's all speculation at this
point.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Bruce Momjian
Date:
Fabien COELHO wrote:
> CREATE USER calvin WITH IN GROUP admin, CREATEDB;
> ERROR:  group "admin" does not exist
>
> CREATE USER calvin WITH IN GROUP admin CREATEDB foo;
> ERROR:  syntax error at or near "foo" at character 49
> HINT:  other user options or ';' expected
>
> CREATE USER calvin NOCREATEUSER CREATEDB foo;
> ERROR:  syntax error at or near "foo" at character 42
> HINT:  other user options or ';' expected
>
> CREATE USER calvin CREATEUSER foo;
> ERROR:  syntax error at or near "foo" at character 31
> HINT:  other user options or ';' expected

I hate to say it but I don't see these hints as being very helpful.

Basically, if you look at the developers FAQ, you should discuss what
you want to do, how you want to do it, then code.  I realize you spent a
lot of time on a patch, but I don't see a lot of value in adding such
hints.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: hint infrastructure setup (v3)

From
Stephan Szabo
Date:
On Fri, 2 Apr 2004, Bruce Momjian wrote:

> Fabien COELHO wrote:
> > CREATE USER calvin WITH IN GROUP admin, CREATEDB;
> > ERROR:  group "admin" does not exist
> >
> > CREATE USER calvin WITH IN GROUP admin CREATEDB foo;
> > ERROR:  syntax error at or near "foo" at character 49
> > HINT:  other user options or ';' expected
> >
> > CREATE USER calvin NOCREATEUSER CREATEDB foo;
> > ERROR:  syntax error at or near "foo" at character 42
> > HINT:  other user options or ';' expected
> >
> > CREATE USER calvin CREATEUSER foo;
> > ERROR:  syntax error at or near "foo" at character 31
> > HINT:  other user options or ';' expected
>
> I hate to say it but I don't see these hints as being very helpful.

I can see them as potentially being useful for people who don't have alot
of knowledge of SQL or our dialect thereof.  I think some of the ones
shown may need better wording (for example the ones above basically seem
to mean go look at the help for create user which is pretty much the same
as the error on its own, but perhaps with a longer hint, it might be less
so) but I think it illustrates the point for these kinds of queries.  I'm
not sure that the HINT strings would be as meaningful in the middle of
complicated select/update/etc queries, but that would be something to see.

I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
to teach our own commands, but if it were possible to do without much of a
performance, readability or maintenance cost, it'd probably be worth
doing.  I can't really say much specifically about the patch either way on
any of those grounds (having not actually looked).


Re: hint infrastructure setup (v3)

From
Bruce Momjian
Date:
Stephan Szabo wrote:
> > I hate to say it but I don't see these hints as being very helpful.
>
> I can see them as potentially being useful for people who don't have alot
> of knowledge of SQL or our dialect thereof.  I think some of the ones
> shown may need better wording (for example the ones above basically seem
> to mean go look at the help for create user which is pretty much the same
> as the error on its own, but perhaps with a longer hint, it might be less
> so) but I think it illustrates the point for these kinds of queries.  I'm
> not sure that the HINT strings would be as meaningful in the middle of
> complicated select/update/etc queries, but that would be something to see.
>
> I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
> to teach our own commands, but if it were possible to do without much of a
> performance, readability or maintenance cost, it'd probably be worth
> doing.  I can't really say much specifically about the patch either way on
> any of those grounds (having not actually looked).

I wonder if we could just do a \h command as a hint.  In some cases, it
might be clearer than printing some text.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I'm not sure that the HINT strings would be as meaningful in the middle of
> complicated select/update/etc queries, but that would be something to see.

That's pretty much my stumbling block too.  The examples Fabien has
shown so far don't seem to me to be all that much more helpful than a
simple facility to show the "\h command" syntax summary.  Now "\h"
doesn't know anything about syntax errors within expressions, and so
I'm prepared to believe that there's some gold to be mined here if we
can get the parser to offer useful hints in the context of complex
expressions.  What I've not seen yet is any proof that that can be done
at all, much less at a reasonable cost in terms of grammar complexity...

> I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
> to teach our own commands, but if it were possible to do without much of a
> performance, readability or maintenance cost, it'd probably be worth
> doing.

Right; we have to consider those costs and trade them off against the
benefit of better error reporting.

One thought here is that the extra costs might be better spent in a
client program that's in a "training wheels" mode.  Would it be
reasonable to invent a "psql-student" client that has its own version of
the parser with better error detection?  Then production clients
wouldn't have to pay any cycles towards support of novices.  I'm not
sure right now how hard it would be to keep such a client in sync with
the production backend, but it's an idea to consider...

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Stephan,

> I can see them as potentially being useful for people who don't have alot
> of knowledge of SQL or our dialect thereof.

That is my audience.

> I think some of the ones shown may need better wording [...]

Sure. I'm not the man for writing very clear English sentences.
I may be more fluent in C;-)

> I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
> to teach our own commands,

I don't know about postgreSQL responsability, but I am a teacher using
postgreSQL as a tool for teaching database. I don't need hints, but my
students may find sql a little bit easier to grasp if they have a little
bit more help from the error reports.

I can understand that this is not the main agenda of postgresql, which
is more about new features, better performance and sustained stability.
But it is my agenda as a teacher.

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
> I wonder if we could just do a \h command as a hint.  In some cases, it
> might be clearer than printing some text.

\h SELECT or \h CREATE TABLE are 2 pages long.

For the beginner, it is a lot of help... a little bit too much.

So I think that a better job can be done. If I'm wrong, you don't apply
a patch and that's all! Basically, this is the current status;-)

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> > Not really in terms of state. The state should basically be the same.
> > However yes in terms of "explicit" state that are given explicit names.
> > And definitely in terms of actions, as you say.
>
> But mid-rule actions are implemented by inventing additional internal
> productions

Sure.

> That's not only more states, but more symbols, which is going to impose
> an O(N^2) cost on the raw tabular representation of the parsing rules.
> Maybe much of this will be bought back when bison compresses the tables,
> and maybe not.

Mmh. Maybe. I don't know at the time.

> Have you checked how much the size of gram.o grows with the stuff
> you've installed so far?

I have not looked at that. It was just a quick and dirty implementation,
just to convince myself of what can be achieved.

> (I'm also slightly worried about what this will do to parsing speed,

Well, more reductions are performed, and I'm not sure that the switch()
implementation is really intelligent. Having a hash table could help big
grammars, but that is bison problems, and I will not rewrite that.

However, I'm not sure that parsing overhead is a big issue wrt other costs
in the backend, but this is not a reason to make it explode.

> > I'm afraid it looks like "internal state 1232, 43425 and [...],
>
> The string names of the grammar symbols are all embedded in gram.c

Yes, I've noticed yytname[].

> anyway, so if you can figure out the symbols that are expected next,
> their names could be printed directly.

That is done with YYERROR_VERBOSE, but the result is really poor
most of the time, because it does not look for all possible terminals,
just the ones easilly accessible. So you have something like:

DROP TABL foo;
ERROR: syntax error at unexpected Ident "TABL", LANGUAGE expected.

no comment. A better job could be done, but you need to "touch" a little
bit "gram.c" for that.

> We could alter the symbol names to be more useful to novices, or
> alternatively install an additional lookup table to substitute
> reader-friendly phrases.

Yep, I thought about that also. Some symbols are appended "_P" to avoid
clashes.


I'm investigating the "internal" way. Not really optimistic because
of the details, but I may find workarounds. I'll post later when I'll
have a definite opinion.

--
Fabien COELHO _ http://www.cri.ensmp.fr/~coelho _ Fabien.Coelho@ensmp.fr
   CRI-ENSMP, 35, rue Saint-Honoré, 77305 Fontainebleau cedex, France
   phone: (+33|0) 1 64 69 {voice: 48 52, fax: 47 09, standard: 47 08}
       ________  All opinions expressed here are mine  _________

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
> > You *really* don't want to go there. If you want to see what the parser
> > is doing you can run "bison -r all" over the grammar and examine the
> > .output file. But please, let's not examine the internal states. Talk
> > about unmaintainability!
>
> What I was suggesting was that we might be able to extract the "follow
> set" from bison's tables, ie, the set of grammar symbols that are legal
> next inputs given the current parse state stack.

> I surely agree that we don't want code that goes like "if state is N
> then print message X" ... but the follow set should be stable

These are 2 different issues.

(1) computing the set of expected following tokens.

It is possible to do that, with some processing of bison tables.


(2) give advices based on guesses:
    for what I have looked so far, it could look like:

 - I'm here for rule "user_list", the previous token was ','
   OR I'm here for "select_expressions_list", last token was ',' and
      current token is "FROM"

   => "HINT: remove previous comma"

I don't think that (2) would be a bad idea, especially for my students.
The good point is that the cost would only be paid at error time.


> One way of describing Fabien's existing patch is that it's essentially
> keeping track of the follow set by hand :-(

Yes. I give name to existing states, and states leads to expected
follow tokens.

> > Also, I suspect that bison does a good bit of
> > optimisation by way of combining states that removes some of the
> > information you might need, but I haven't looked into it closely.
>
> That could be a showstopper if true, but it's all speculation at this
> point.

I think the information is there.

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> That is done with YYERROR_VERBOSE, but the result is really poor
> most of the time, because it does not look for all possible terminals,
> just the ones easilly accessible.

I wasn't aware that bison had a built-in facility for better messages
--- this is kind of cool actually.  I played with it a little and found
that:

1. There's an arbitrary restriction in the bison code to show no more
than four alternatives; if there's more than four then it shows none
of them, rather than doing something helpful like "etc."

2. The problem you exhibit with DROP seems to be due to our use of
empty productions:

> DROP TABL foo;
> ERROR: syntax error at unexpected Ident "TABL", LANGUAGE expected.

It looks like the parser makes one additional state move, reducing
"opt_procedural" to "empty", before raising the error.  We might be
able to suppress that; alternatively we could look at the stacked
state(s) and add their follow sets to the printout.  In any case it
is clear that we can extract the follow set from the tables.

A bigger problem with this, though, is the verbosity of the results
in some cases.  I diked out the limitation to four outputs and soon
found examples like

regression=# grant;
ERROR:  syntax error, unexpected ';', expecting ALL or CREATE or DELETE_P or EXECUTE or INSERT or REFERENCES or RULE or
SELECTor TEMP or TEMPORARY or TRIGGER or UPDATE or USAGE at or near ";" at character 6 
LINE 1: grant;
             ^

which is useful, and

regression=# grant select  on t to;
ERROR:  syntax error, unexpected ';', expecting ABORT_P or ABSOLUTE_P or ACCESS or ACTION or ADD or AFTER or AGGREGATE
orALTER or ASSERTION or ASSIGNMENT or AT or BACKWARD or BEFORE or BEGIN_P or BIGINT or BIT or BOOLEAN_P or BY or CACHE
orCALLED or CASCADE or CHAIN or CHAR_P or CHARACTER or CHARACTERISTICS or CHECKPOINT or CLASS or CLOSE or CLUSTER or
COALESCEor COMMENT or COMMIT or COMMITTED or CONSTRAINTS or CONVERSION_P or CONVERT or COPY or CREATEDB or CREATEUSER
orCURSOR or CYCLE or DATABASE or DAY_P or DEALLOCATE or DEC or DECIMAL_P or DECLARE or DEFAULTS or DEFERRED or DEFINER
orDELETE_P or DELIMITER or DELIMITERS or DOMAIN_P or DOUBLE_P or DROP or EACH or ENCODING or ENCRYPTED or ESCAPE or
EXCLUDINGor EXCLUSIVE or EXECUTE or EXISTS or EXPLAIN or EXTERNAL or EXTRACT or FETCH or FIRST_P or FLOAT_P or FORCE or
FORWARDor FUNCTION or GLOBAL or GROUP_P or HANDLER or HOLD or HOUR_P or IMMEDIATE or IMMUTABLE or IMPLICIT_P or
INCLUDINGor INCREMENT or INDEX or INHERIT! 
 S or INOUT or INPUT_P or INSENSITIVE or INSERT or INSTEAD or INT_P or INTEGER or INTERVAL or INVOKER or ISOLATION or
KEYor LANCOMPILER or LANGUAGE or LARGE_P or LAST_P or LEVEL or LISTEN or LOAD or LOCAL or LOCATION or LOCK_P or MATCH
orMAXVALUE or MINUTE_P or MINVALUE or MODE or MONTH_P or MOVE or NAMES or NATIONAL or NCHAR or NEXT or NO or NOCREATEDB
orNOCREATEUSER or NONE or NOTHING or NOTIFY or NULLIF or NUMERIC or OBJECT_P or OF or OIDS or OPERATOR or OPTION or
OUT_Por OVERLAY or OWNER or PARTIAL or PASSWORD or PATH_P or PENDANT or POSITION or PRECISION or PRESERVE or PREPARE or
PRIORor PRIVILEGES or PROCEDURAL or PROCEDURE or READ or REAL or RECHECK or REINDEX or RELATIVE_P or RENAME or
REPEATABLEor REPLACE or RESET or RESTART or RESTRICT or RETURNS or REVOKE or ROLLBACK or ROW or ROWS or RULE or SCHEMA
orSCROLL or SECOND_P or SECURITY or SEQUENCE or SERIALIZABLE or SESSION or SET or SETOF or SHARE or SHOW or SIMPLE or
SMALLINTor STABLE or START or STATEMENT o! 
 r STATISTICS or STDIN or STDOUT or STORAGE or STRICT_P or SUBSTRING or
 SYSID or TEMP or TEMPLATE or TEMPORARY or TIME or TIMESTAMP or TOAST or TRANSACTION or TREAT or TRIGGER or TRIM or
TRUNCATEor TRUSTED or TYPE_P or UNCOMMITTED or UNENCRYPTED or UNKNOWN or UNLISTEN or UNTIL or UPDATE or USAGE or VACUUM
orVALID or VALIDATOR or VALUES or VARCHAR or VARYING or VERSION or VIEW or VOLATILE or WITH or WITHOUT or WORK or WRITE
orYEAR_P or ZONE or IDENT at or near ";" at character 22 
LINE 1: grant select  on t to;
                             ^

which is not real useful at all :-(.  You really want to see just
"expecting IDENT" in such a case.  Still we might be able to do some
postprocessing on the collected set of valid follow symbols, such as
removing all the unreserved_keywords when they are present along with
IDENT.  It'd be fairly reasonable to embed knowledge about this in
keywords.c and/or scan.l.

We'd have to write our own version of bison's verbose-error code anyway,
because the canned code doesn't support localization --- it uses
hardwired strings for "expecting" and so on.  But it looks possibly
doable to me.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

First, thanks for the discussion about my "hint" infrastructure patch
submission. Whatever the opinion, it helps.


> We'd have to write our own version of bison's verbose-error code anyway,
> because the canned code doesn't support localization --- it uses
> hardwired strings for "expecting" and so on.  But it looks possibly
> doable to me.

I also played a little bit with the parser over the weekend. I found the
same issues that you noted in your mail, tried more options, and found
other problems. I sent here the full mail I wrote.

Although I agree that it is "doable", I have stronger reserve than yours.
Also, I do not find it an appealing solution to change "gram.c" a lot.

The problems I found are:


(1) not available raw information.

The automaton stack keeps states, which are not directly linked to rules
and terminals. The terminals are not available, they must be kept
separatly if you want them. This can be done in yylex().

The internal state, stack, token... are local variables within yyparse().
As a result, they are not accessible from yyerror. I haven't found any
available hook, so you have to hack "gram.c" to get this information. It
is a 1 line hack, but it modifies the generated code. Or you have to
produce a new hacked "gram.c", which is what you suggest.


(2) hard to compute follow set

The information about the syntax is there, obviously! However, from a
given automaton state, the actually allowed tokens are not directly
available. Only some are directly available, so the result is:

DROP ? ;
ERROR:  syntax error at or near "?" at character 6
HINT:  hint status is state=422 n=796 char=575 type=320=Op stack= 0 16 422
       tokens= 88:DROP 320:Op
       expected= 150:LANGUAGE?

Other tokens may be found, but you have either:

- to *simulate* the automaton forward through reductions to check whether
  it is finally accepted. Not really beautiful, but that can be done.
  It is just the bison code without the reduction production rules;-)

- move backwards before doing the above, if some reductions where
  performed because of the submitted token and finally resulted in the error,
  the state that lead to the error may not be the highest available one, so
  maybe other allowed tokens may also be missed. We would need to have
  the last state before any reduction.
  Another small hack in generated "gram.c" would be needed to get it.

However,


(3) the computed follow set is often useless, as you noted.

The reason for that is that keywords are accepted in place of identifiers.
The use of rules: reserved_keyword, func_name_keyword, col_name_keyword,
unreserved_keyword as possible column/function/... identifiers is IMHO the
main reason for the size of the generated automaton.

If you drop that, everything would be much simpler, because it would
reduce a lot the number of arcs in the automaton. In particular, anything
which is just an identifier should not be given a name (say, BOOLEAN_P or
SMALLINT should not be tokens, but could rather be recognized as a special
case of identifier from within the parser, maybe with some simple
post-filtering).

As you noted, for things like "SHOW 123", the follow set basically
includes all keywords although you can have SHOW ALL or SHOW name.

So, as you suggested, you can either say "ident" as a simplification, but
you miss ALL which is meaningful, or you say all keywords, which is
useless.

An alternative is to know that you're within a "SHOW something", that is
you somehow reparse the code from within the syntax error:-( It is
necessary if you want to say something more interesting than "ident", say
"option name". You may also get this information by simulating the
automaton forward, or noticing that some of the current on stack states
can only lead to the reduction of a ShowStmt...

Well, on the positive side, it is sometimes right, say:

REVOKE ? ;
ERROR:  syntax error, unexpected Op at or near "?" at character 8
HINT:  hint status is state=481 n=1297 char=575 type=320=Op
  stack= 0 31 481
  tokens= 234:REVOKE 320:Op
  expected= 10:ALL 59:CREATE 80:DELETE_P 98:EXECUTE 136:INSERT 224:REFERENCES
            239:RULE 244:SELECT 268:TEMP 270:TEMPORARY 279:TRIGGER 292:UPDATE
            293:USAGE?


(4) the available automaton data are inverted with respect to what would
be needed... It would be nice to know whether we're heading for a create
statement or something like that, but the information is hidden down the
automaton. basically, the automaton would need to be "inverted" to compute
this information. We would need the initial information in "gram.y", which
was "inverted" to build the automaton. So the code somehow needs to undo
what bison compilation has done.


(5) anything that can be done would be hardwired to one version of bison.
There is a lot of asumptions in the code and data structures, and any
newer/older version with some different internal representation would
basically break any code that would rely on that. So postgres would not be
"bison" portable:-( I don't think it is an real option that old postgresql
source would be broken against future bison releases.


So, as far as I'm concerned, I don't find the internal way really
convincing as the way to go. Thus I can see three options:

(a) put hints in "gram.y" as I already suggested
    + I can do it
    + it can be incremental once the infrastructure is in place.
    + hints are simple to add or fix
    = hints are not necessarily marvellous
    - it would take time to develop
    - it may slow down the parser, because of added production rules.
    - it would change "gram.y" a lot and could make it harder to maintain
    - patchers don't like it...

(b) write a new "recursive descendant" parser, and drop "gram.y"
    + I could do it
    + hints could be more intelligent
    + I think the parser would be faster, at least not slower
    = I'm not sure it would be really easier to maintain, but maybe not harder
    - it cannot be incremental at all
    - hints are at the price of some programming
    - it would take time to develop, but basically the structure would
      look like "gram.y" a lot, and existing data structures can be kept.
    - I'm not sure patchers would like it, and if it is thrown down the drain,
      I would not be happy for the one who spent the time (say, me;-)

(c) do nothing.
    + I can do that quite easily;-)
    + patchers are ok with that
    + it does not take time
    = the maintainability of "gram.y" is not changed.
    - my students won't have hints.

A funny technical detail is that the refactoring which is needed for the
(a) option would be also needed for (b). (b) would require more time than
(a), but the results could be better for the hint/parser error handling.

I can have some time for this, especially over the next few months.
Incremental options looked a better way to me, but (a) seems stuck and (b)
is risky, and the internal way looks ugly.


As a side effect of my inspection is that the automaton generated by bison
could be simplified if a different tradeoff between the lexer, the parser
and the post-processing would be chosen. Namelly, all tokens that are
just identifiers should be dropped and processed differently.

Have a nice day,

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Although I agree that it is "doable", I have stronger reserve than yours.
> Also, I do not find it an appealing solution to change "gram.c" a lot.

I was not proposing hand-editing gram.c after bison generates it,
if that's what you meant ;-).  It seems however perfectly doable to
reference bison's state stack from yyerror.  We'd need some macro
hacking to make yystate and the stack available to yyerror, but nothing
worse than is already documented and recommended practice for other
bison tweaks.

> The automaton stack keeps states, which are not directly linked to rules
> and terminals. The terminals are not available, they must be kept
> separatly if you want them. This can be done in yylex().

We don't need them.  Any already-shifted tokens are to the left of where
the error is, no?

> The internal state, stack, token... are local variables within yyparse().
> As a result, they are not accessible from yyerror. I haven't found any
> available hook, so you have to hack "gram.c" to get this information.

No, just redefine "yyerror" as a macro that passes additional
parameters.

> - move backwards before doing the above, if some reductions where
>   performed because of the submitted token and finally resulted in the error,
>   the state that lead to the error may not be the highest available one, so
>   maybe other allowed tokens may also be missed. We would need to have
>   the last state before any reduction.

Yeah, I had come to the same conclusion --- state moves made without
consuming input would need to be backed out if we wanted to report the
complete follow set.  I haven't yet looked to see how to do that.

> As you noted, for things like "SHOW 123", the follow set basically
> includes all keywords although you can have SHOW ALL or SHOW name.
> So, as you suggested, you can either say "ident" as a simplification, but
> you miss ALL which is meaningful, or you say all keywords, which is
> useless.

You're ignoring the distinction between classes of keywords.  I would
not recommend treating reserved_keywords as a subclass of ident.

> (5) anything that can be done would be hardwired to one version of bison.
> There is a lot of asumptions in the code and data structures, and any
> newer/older version with some different internal representation would
> basically break any code that would rely on that. So postgres would not be
> "bison" portable:-( I don't think it is an real option that old postgresql
> source would be broken against future bison releases.

I think this argument is completely without merit.  The technology of
LALR parsers has been stable for what, thirty years now?  The parts of
bison that we'd want to look at are inherited lock stock and barrel from
AT&T yacc, and are unlikely to change in the foreseeable future; even
more unlikely to change in a way that we couldn't easily adapt to.  You
might as well argue that we shouldn't use autoconf because the autoconf
authors sometimes make not-very-compatible changes.

> (b) write a new "recursive descendant" parser, and drop "gram.y"

Been there, done that, not impressed with the idea.  There's a reason
why people invented parser generators...

> As a side effect of my inspection is that the automaton generated by bison
> could be simplified if a different tradeoff between the lexer, the parser
> and the post-processing would be chosen. Namelly, all tokens that are
> just identifiers should be dropped and processed differently.

We are not going to reserve the keywords that are presently unreserved.
If you can think of a reasonable way to stop treating them as separate
tokens inside the grammar without altering the user-visible behavior,
I'm certainly interested.  I think that will be rather difficult,
however, considering for one thing that SQL specifies different
case-folding behavior for identifiers and keywords.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:


Dear Tom,

> No, just redefine "yyerror" as a macro that passes additional parameters.

Ok!  That's just the kind of the hack-hook I was looking for!!


> > The terminals are not available, they must be kept separatly if you
> > want them. This can be done in yylex().
>
> We don't need them. Any already-shifted tokens are to the left of where
> the error is, no?

Yes and no. I agree that you don't need them for computing the follow set.
However it would help a lot to generate nicer hints. I'm looking for help
the user, and the follow set is just part of the help.

Think of typical "SELECT foo, bla, FROM xxx" (it looks stupid on a one
line query, but not so on a very large query) which is quite easier to
detect and hint about because of FROM is just after a comma.

The rule part is also a real issue. There is no easy way to translate
states into rules, to know whether we're somewhere in a "ShowStmt" or
"AlterTableStmt"...

If you're after a comma in state 1232, should you just say IDENT... I'd
rather say "user name" or "table name" if I can... Otherwise the hint
stays at a low parser level. Good hints requires to know about the
current context, and it is not easy to get that deep in the automaton.


> Yeah, I had come to the same conclusion --- state moves made without
> consuming input would need to be backed out if we wanted to report the
> complete follow set.  I haven't yet looked to see how to do that.

Well, following you're previous suggestion, one can redefine the YYLEX
macro to save the current state each time a token is required.


> > As you noted, for things like "SHOW 123", the follow set basically
> > includes all keywords although you can have SHOW ALL or SHOW name.
> > So, as you suggested, you can either say "ident" as a simplification, but
>
> You're ignoring the distinction between classes of keywords.  I would
> not recommend treating reserved_keywords as a subclass of ident.

Ok, I agree that it can help in this very case a little bit here because
ALL is reserved. I did not make this distinction when I played with bison.


> > (5) anything that can be done would be hardwired to one version of bison.
>
> I think this argument is completely without merit.

Possible;-)

However I don't like writing hacks that depends on other people future
behavior.


> > (b) write a new "recursive descendant" parser, and drop "gram.y"
>
> Been there, done that, not impressed with the idea.  There's a reason
> why people invented parser generators...

Sure. It was just a list of options;-)


> > As a side effect of my inspection is that the automaton generated by bison
> > could be simplified if a different tradeoff between the lexer, the parser
> > and the post-processing would be chosen. Namelly, all tokens that are
> > just identifiers should be dropped and processed differently.
>
> We are not going to reserve the keywords that are presently unreserved.

Reserving keywords would help, but I would not think it could be accepted,
because it is not SQL philosophy. SQL is about 30 years also, as old
as yacc ideas... but they did not care at the time;-) When you look at
the syntax, it was designed with a recursive parser in mind.

Part of my comment was to explain "why" the generated automaton is large.
SQL is a small language, but it has a big automaton in postgresql, and
this is IMHO the explanation.


> If you can think of a reasonable way to stop treating them as separate
> tokens inside the grammar without altering the user-visible behavior,
> I'm certainly interested.

I was thinking about type names especially, and maybe others.

I join a small proof-of-concept patch to drop some tokens out of the
parser. As a results, 6 tokens, 6 rules and 9 states are removed,
and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
I think others could be dealt with similarily, or with more work.


Thanks for the discussion,

--
Fabien.

Attachment

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
> As another teaching aid idea, is there something that would auto-format
> queries to they are more readable?

Not that I know of. But I guess it must exist in some interface,
maybe with Oracle or DB2.

My emacs does seem to do that. It just put colors on keywords.

If I had to put it somewhere, I would try to enhance emacs sql-mode.
But I don't like much lisp programming.

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Bruce Momjian
Date:
As another teaching aid idea, is there something that would auto-format
queries to they are more readable?

---------------------------------------------------------------------------

Fabien COELHO wrote:
>
>
>
> Dear Tom,
>
> > No, just redefine "yyerror" as a macro that passes additional parameters.
>
> Ok!  That's just the kind of the hack-hook I was looking for!!
>
>
> > > The terminals are not available, they must be kept separatly if you
> > > want them. This can be done in yylex().
> >
> > We don't need them. Any already-shifted tokens are to the left of where
> > the error is, no?
>
> Yes and no. I agree that you don't need them for computing the follow set.
> However it would help a lot to generate nicer hints. I'm looking for help
> the user, and the follow set is just part of the help.
>
> Think of typical "SELECT foo, bla, FROM xxx" (it looks stupid on a one
> line query, but not so on a very large query) which is quite easier to
> detect and hint about because of FROM is just after a comma.
>
> The rule part is also a real issue. There is no easy way to translate
> states into rules, to know whether we're somewhere in a "ShowStmt" or
> "AlterTableStmt"...
>
> If you're after a comma in state 1232, should you just say IDENT... I'd
> rather say "user name" or "table name" if I can... Otherwise the hint
> stays at a low parser level. Good hints requires to know about the
> current context, and it is not easy to get that deep in the automaton.
>
>
> > Yeah, I had come to the same conclusion --- state moves made without
> > consuming input would need to be backed out if we wanted to report the
> > complete follow set.  I haven't yet looked to see how to do that.
>
> Well, following you're previous suggestion, one can redefine the YYLEX
> macro to save the current state each time a token is required.
>
>
> > > As you noted, for things like "SHOW 123", the follow set basically
> > > includes all keywords although you can have SHOW ALL or SHOW name.
> > > So, as you suggested, you can either say "ident" as a simplification, but
> >
> > You're ignoring the distinction between classes of keywords.  I would
> > not recommend treating reserved_keywords as a subclass of ident.
>
> Ok, I agree that it can help in this very case a little bit here because
> ALL is reserved. I did not make this distinction when I played with bison.
>
>
> > > (5) anything that can be done would be hardwired to one version of bison.
> >
> > I think this argument is completely without merit.
>
> Possible;-)
>
> However I don't like writing hacks that depends on other people future
> behavior.
>
>
> > > (b) write a new "recursive descendant" parser, and drop "gram.y"
> >
> > Been there, done that, not impressed with the idea.  There's a reason
> > why people invented parser generators...
>
> Sure. It was just a list of options;-)
>
>
> > > As a side effect of my inspection is that the automaton generated by bison
> > > could be simplified if a different tradeoff between the lexer, the parser
> > > and the post-processing would be chosen. Namelly, all tokens that are
> > > just identifiers should be dropped and processed differently.
> >
> > We are not going to reserve the keywords that are presently unreserved.
>
> Reserving keywords would help, but I would not think it could be accepted,
> because it is not SQL philosophy. SQL is about 30 years also, as old
> as yacc ideas... but they did not care at the time;-) When you look at
> the syntax, it was designed with a recursive parser in mind.
>
> Part of my comment was to explain "why" the generated automaton is large.
> SQL is a small language, but it has a big automaton in postgresql, and
> this is IMHO the explanation.
>
>
> > If you can think of a reasonable way to stop treating them as separate
> > tokens inside the grammar without altering the user-visible behavior,
> > I'm certainly interested.
>
> I was thinking about type names especially, and maybe others.
>
> I join a small proof-of-concept patch to drop some tokens out of the
> parser. As a results, 6 tokens, 6 rules and 9 states are removed,
> and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
> I think others could be dealt with similarily, or with more work.
>
>
> Thanks for the discussion,
>
> --
> Fabien.

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> If you can think of a reasonable way to stop treating them as separate
>> tokens inside the grammar without altering the user-visible behavior,
>> I'm certainly interested.

> I join a small proof-of-concept patch to drop some tokens out of the
> parser.

I believe these were treated this way *specifically* because of the
keyword-is-not-an-identifier issue.  SQL99 calls out most of these
as being keywords:

         SQL defines predefined data types named by the following <key
         word>s: CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT,
         BINARY LARGE OBJECT, BIT, BIT VARYING, NUMERIC, DECIMAL, INTEGER,
         SMALLINT, FLOAT, REAL, DOUBLE PRECISION, BOOLEAN, DATE, TIME,
         TIMESTAMP, and INTERVAL. These names are used in the type

and if we don't treat them as keywords then we will have a couple of
problems.  One is case-conversion issues in locales where the standard
downcasing is not an extension of ASCII (Turkish is the only one I know
of offhand).  Another is that depending on where you put the renaming
that this patch removes without replacing :-(, it would be possible for
the renaming transformation to get applied to user-defined types with
similar names, or for user-defined types to unexpectedly shadow system
definitions.  The former would be surprising and the latter would
violate the spec.  Check the archives; I'm sure this was discussed in
the 7.3 development cycle and we concluded that treating these names
as keywords was the only practical solution.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Andrew Dunstan
Date:
Tom Lane wrote:

>Fabien COELHO <coelho@cri.ensmp.fr> writes:
>
>
>>(b) write a new "recursive descendant" parser, and drop "gram.y"
>>
>>

er, that's "recursive descent" :-)

>
>Been there, done that, not impressed with the idea.  There's a reason
>why people invented parser generators...
>
>


Well, unless you are a serious masochist, cutting a parser of the LR
family (including LALR(1), such as yacc/bison produce) by hand is very
difficult and error prone. That is not the case with LL type parsers.
Also, there are parser generators for this family, both table driven and
recursive descent. The table driven variety especially can have quite
well tuned error reporting and recovery. (Among the reasons I know this
is that I actually wrote such a beast around 13 years ago).

That is not to say that I think we should move from the present setup.
Familiarity of the developer community with the tool used suggests we
should not, quite apart from any other reasons. Changing to, say, an RD
parser, would be a massive and probably error prone change and the
benefit just doesn't seem worth it.

In fact, considering this thread I'm not sure any of the suggested steps
will achieve Fabien's goal. ISTM that a smarter "training wheels"
frontend, perhaps with some modest backend improvements, is likely to
have better results. ("Oh, you found an error near there - now what do I
suggest belongs there?")

cheers

andrew







Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> No, just redefine "yyerror" as a macro that passes additional parameters.

Ok!  That's just the kind of the hack-hook I was looking for!!


> > The terminals are not available, they must be kept separatly if you
> > want them. This can be done in yylex().
>
> We don't need them. Any already-shifted tokens are to the left of where
> the error is, no?

Yes and no. I agree that you don't need them for computing the follow set.
However it would help a lot to generate nicer hints. I'm looking for help
the user, and the follow set is just part of the help.

Think of typical "SELECT foo, bla, FROM xxx" (it looks stupid on a one
line query, but not so on a very large query) which is quite easier to
detect and hint about because of FROM is just after a comma.

The rule part is also a real issue. There is no easy way to translate
states into rules, to know whether we're somewhere in a "ShowStmt" or
"AlterTableStmt"...

If you're after a comma in state 1232, should you just say IDENT... I'd
rather say "user name" or "table name" if I can... Otherwise the hint
stays at a low parser level. Good hints requires to know about the
current context, and it is not easy to get that deep in the automaton.


> Yeah, I had come to the same conclusion --- state moves made without
> consuming input would need to be backed out if we wanted to report the
> complete follow set.  I haven't yet looked to see how to do that.

Well, following you're previous suggestion, one can redefine the YYLEX
macro to save the current state each time a token is required.


> > As you noted, for things like "SHOW 123", the follow set basically
> > includes all keywords although you can have SHOW ALL or SHOW name.
> > So, as you suggested, you can either say "ident" as a simplification, but
>
> You're ignoring the distinction between classes of keywords.  I would
> not recommend treating reserved_keywords as a subclass of ident.

Ok, I agree that it can help in this very case a little bit here because
ALL is reserved. I did not make this distinction when I played with bison.


> > (5) anything that can be done would be hardwired to one version of bison.
>
> I think this argument is completely without merit.

Possible;-)

However I don't like writing hacks that depends on other people future
behavior.


> > (b) write a new "recursive descendant" parser, and drop "gram.y"
>
> Been there, done that, not impressed with the idea.  There's a reason
> why people invented parser generators...

Sure. It was just a list of options;-)


> > As a side effect of my inspection is that the automaton generated by bison
> > could be simplified if a different tradeoff between the lexer, the parser
> > and the post-processing would be chosen. Namelly, all tokens that are
> > just identifiers should be dropped and processed differently.
>
> We are not going to reserve the keywords that are presently unreserved.

Reserving keywords would help, but I would not think it could be accepted,
because it is not SQL philosophy. SQL is about 30 years also, as old
as yacc ideas... but they did not care at the time;-) When you look at
the syntax, it was designed with a recursive parser in mind.

Part of my comment was to explain "why" the generated automaton is large.
SQL is a small language, but it has a big automaton in postgresql, and
this is IMHO the explanation.


> If you can think of a reasonable way to stop treating them as separate
> tokens inside the grammar without altering the user-visible behavior,
> I'm certainly interested.

I was thinking about type names especially, and maybe others.

I join a small proof-of-concept patch to drop some tokens out of the
parser. As a results, 6 tokens, 6 rules and 9 states are removed,
and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
I think others could be dealt with similarily, or with more work.


Thanks for the discussion,

--
Fabien.

Attachment

Re: hint infrastructure setup (v3)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Another is that depending on where you put the renaming that this patch
>> removes without replacing :-(,

> I do not understand your point. It seems to me that the renaming is
> performed when a type name is expected? The "boolean"  keyword (not token)
> is translated to system "bool" type in the GenericType rule?? ???

I mean that you removed functionality without putting it back; the
modified parser will fail to recognize BOOLEAN as a type name at all,
because it doesn't match "bool" which is in the catalogs.  (And changing
the entry to "boolean" is not a solution, it just moves the problem.)

I assume you intended to handle this by doing the substitutions in type
name lookup elsewhere in the parser, but I don't think that is a valid
solution, because there is no longer enough information.  In particular
you can't any longer tell the difference between BOOLEAN and "boolean"
(with quotes), which are not the same thing --- a quoted string is never
a keyword, per spec.

Possibly a better example than boolean is the REAL => pg_catalog.float4
transformation.  If a user has defined his own type named foo.real,
he ought to be able to refer to it as "real" (with quotes) and not get
messed up by the keyword transformation.  I think our original
motivation for converting all these things to keywords was the
realization that pg_dump would in fact screw up and fail to dump such
a type definition correctly if "real" wasn't recognized as conflicting
with a keyword (which is what prompts pg_dump to stick quotes on).

The basic point here is that eliminating tokens as you propose will
result in small changes in behavior, none of which are good or per spec.
Making the parser automaton smaller would be nice, but not at that
price.

> My point is that you can have the very same *semantical* result with a
> smaller automaton if you chose a different trade-off within the
> lexer/parser/post filtering. I don't want to change the language.

You have not proven that you can have the same result.

            regards, tom lane

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> > I join a small proof-of-concept patch to drop some tokens out of the
> > parser.
>
> I believe these were treated this way *specifically* because of the
> keyword-is-not-an-identifier issue.  SQL99 calls out most of these
> as being keywords:

Well, I think that the "reserved keywords" are fine as tokens in a
lexer/parser, but think that the "unreserved keywords" should be dropped
of the token status if possible.

> and if we don't treat them as keywords then we will have a couple of
> problems. One is case-conversion issues in locales where the standard
> downcasing is not an extension of ASCII (Turkish is the only one I know
> of offhand).

Do you mean it should use an ASCII-only strcasecmp, not a possibly
"localised" version? I agree, but this is just a "proof of concept"
patch to show that you don't need so many tokens in the parser.

> Another is that depending on where you put the renaming that this patch
> removes without replacing :-(,

I do not understand your point. It seems to me that the renaming is
performed when a type name is expected? The "boolean"  keyword (not token)
is translated to system "bool" type in the GenericType rule?? ???

> it would be possible for the renaming transformation to get applied to
> user-defined types with similar names, or for user-defined types to
> unexpectedly shadow system definitions.

I don't think that the patch changes the result of the parsing. It drops
*TOKENS* out of the lexer, but they are still *KEYWORDS*, although they
are not explicitly in the lexer list.

"keyword.c" deals with tokens, the file name was ill-chosen. If you think
that keywords can only be lexical tokens, then you end-up with an
automaton larger than necessary, IMVHO.

Note that the removed tokens are still "keywords" as they are treated
*especially* anyway. It is not a semantical transformation.

Also, if you don't want these names as candidate function names, they
could be filtered out at some other point in the parser. They really don't
need to be special tokens.

My point is that you can have the very same *semantical* result with a
smaller automaton if you chose a different trade-off within the
lexer/parser/post filtering. I don't want to change the language.

> The former would be surprising and the latter would violate the spec.

I'm really not sure this is the case with the patch I sent.

> Check the archives; I'm sure this was discussed in the 7.3 development
> cycle and we concluded that treating these names as keywords was the
> only practical solution.

Hmmm... I can check the archive, but I cannot see how different the
language is with the patch. Maybe there is a missing filter out, or
strcasecmp is not the right version, but no more.

I think it is a small technical issue in the parser internals, and has
nothing to do with great principles and whether this or that is a keyword.
It's about what keywords need to be tokens.

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
> >>(b) write a new "recursive descendant" parser, and drop "gram.y"
>
> er, that's "recursive descent" :-)

Sorry for my French version.

> Well, unless you are a serious masochist,

I'm not a masochist;-) I'm arguing about where hint should/could be put.

> In fact, considering this thread I'm not sure any of the suggested steps
> will achieve Fabien's goal. ISTM that a smarter "training wheels"
> frontend, perhaps with some modest backend improvements, is likely to
> have better results. ("Oh, you found an error near there - now what do I
> suggest belongs there?")

I cannot see what you're suggesting "practically" as a frontend.

I don't think having another parser next to the first one for better error
messages is a serious option? I would not like to put another parser that
need to be kept synchronized with the first one. So either it is
integrated or linked with the current parser, or it is not there?

Out of the parser, the only information is the offending token (embedded
in the error string) and the character number in the string, that is quite
small a clue to give a hint.

--
Fabien Coelho - coelho@cri.ensmp.fr

Re: hint infrastructure setup (v3)

From
Fabien COELHO
Date:
Dear Tom,

> In particular you can't any longer tell the difference between BOOLEAN
> and "boolean" (with quotes), which are not the same thing --- a quoted
> string is never a keyword, per spec. [...]

Ok, so you mean that on -boolean- the lexer returns a BOOLEAN_P token, but
with -"boolean"- it returns an Ident and -boolean- as a lval. Indeed, in
such a case I cannot recognize that simply boolean vs "boolean" if they
are both idents that look the same.

As a matter of fact, this can also be fixed with some post-filtering. Say,
all quoted idents could be returned with a leading " to show it was
dquoted, and the IDENT rules in the parser could remove when it is not
needed anymore to distinguish the case.

Not beautiful, I agree, but my point is that the current number of tokens
and number of states and automaton size are not inherent to SQL but to the
way the lexing/parsing is performed in postgresql.

> The basic point here is that eliminating tokens as you propose will
> result in small changes in behavior, none of which are good or per spec.
> Making the parser automaton smaller would be nice, but not at that
> price.

Ok. I don't want to change the spec. I still stand that it can be done,
although some more twicking is required. It was just a "proof of concept",
not a patch submission. Well, a "proof of concept" must still be a proof;-)

I attach a small patch that solve the boolean vs "boolean" issue, still as
a proof of concept that it is 'doable' to preserve semantics with a
different lexer/parser balance. I don't claim that it should be applied, I
just claim that the automaton size could be smaller, especially by
shortening the "unreserved_keyword" list.

> You have not proven that you can have the same result.

Well, I passed the regression tests, but that does not indeed prove
anything, because these issues are not tested at all.

Maybe you could consider to add the "regression" part of the attached
patcht, which creates a user "boolean" type.

Anyway, my motivation is about "hints" and "advises", and that does not
help a lot to solve these issues.

--
Fabien.

Attachment