Thread: Hook for extensible parsing.
Hi,

Being able to extend the core parser has been requested multiple times, and AFAICT all previous attempts were rejected not because this isn't wanted, but because the proposed implementations required plugins to reimplement all of the core grammar with their own changes, as bison-generated parsers aren't extensible.

I'd like to propose an alternative approach, which is to allow multiple parsers to coexist, and let third-party parsers optionally fall back on the core parser. I'm sending this now as a follow-up of [1] and to avoid duplicated effort, as multiple people are interested in that topic.

Obviously, since this is only about parsing, all modules can only implement some kind of syntactic sugar, as they have to produce valid parsetrees, but this could be a first step to later allow custom nodes and let plugins implement e.g. new UTILITY commands.

So, this approach should allow different custom parser implementations:

1) Implement only a few new commands on top of the core grammar. For instance, an extension could add support for CREATE [PHYSICAL | LOGICAL] REPLICATION SLOT and rewrite that to a SelectStmt on top of the existing function, or add a CREATE HYPOTHETICAL INDEX, which would internally add a new option in IndexStmt->options, to be intercepted in processUtility so the extension can bypass its execution and handle it itself.

2) Implement a totally different grammar for a different language. In case of error, just silently fall back to the core parser (or another hook) so both parsers can still be used. Any language could be parsed as long as you can produce a valid postgres parsetree.

3) Implement a superset of the core grammar and replace the core parser entirely. This could arguably be done like the 1st case, but the idea is to avoid possibly parsing the same input string twice, or to forbid the core parser if that's somehow wanted.

I'm attaching some POC patches that implement this approach to start a discussion. I split the infrastructure part into 2 patches to make it easier to review, and I'm also adding 2 other patches with a small parser implementation to be able to test the infrastructure. Here are some more details on the patches and implementation:

0001 simply adds a parser hook, which is called instead of raw_parser. This is enough to make multiple parsers coexist, with one exception: multi-statement query strings. If multiple statements are provided, then all of them will be parsed using the same grammar, which obviously won't work if they are written for different grammars.

0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the ability to produce "SELECT [col, ] col FROM table" parsetrees, for testing purposes. I chose it to ensure that everything works properly even with a totally different grammar that has different keywords and doesn't even end statements with a semicolon but with a plain keyword.

0003 is where the real modifications are done to allow a multi-statement string to be parsed using different grammars. It implements a new MODE_SINGLE_QUERY mode, which is used when a parser_hook is present. In that case, pg_parse_query() will only parse part of the query string and loop until everything is parsed (or some error happens). pg_parse_query() will instruct plugins to parse one query at a time. They're free to ignore that mode if they want to implement the 3rd case above; if so, they should either return multiple RawStmt, a single RawStmt with a stmt_len of 0 or strlen(query_string), or error out.

Otherwise, they will implement either case 1 or 2, and they should always return a List containing a single RawStmt with a properly set stmt_len, even if the underlying statement is NULL. This is required to properly skip valid strings that don't contain a statement, and pg_parse_query() will skip any RawStmt that doesn't contain an underlying statement. It also teaches the core parser to do the same, by optionally starting the parse somewhere in the input string and stopping once a valid statement is found. Note that the whole input string is provided to the parsers in order to report correct cursor positions, so all tokens can get a correct location. This means that raw_parser()'s signature needs an additional offset to indicate where parsing should start.

Finally, 0004 modifies the sqlol parser to implement the MODE_SINGLE_QUERY mode, adds grammar for creating views, and adds some regression tests to validate proper parsing and error location reporting with multi-statement input strings.

As far as I can tell it's all working as expected, but I may have missed some use cases. The regression tests still pass with the additional parser configured. The only difference is for pg_stat_statements, as in MODE_SINGLE_QUERY the trailing semicolon has to be included in the statement, since other grammars may treat semicolons differently.

The obvious drawback is that this can cause overhead, as the same input can be parsed multiple times. This could be avoided by plugins providing a GUC to enable/disable their parser, so it's only active by default for some users/databases, or has to be enabled interactively by the client application.

Also, the error messages can be unhelpful for cases 1 and 2. If the custom parser doesn't error out, syntax errors will be raised by the core parser based on the core grammar, which will likely point at an unrelated problem. Some of that can be avoided by letting custom parsers raise errors once they know for sure they're parsing what they're supposed to parse (there's an example of that in the sqlol parser for qualified_name parsing, as it can only be reached once some specific keywords have already matched). For the rest of the errors, the only option I can think of is another GUC to let custom parsers always raise an error (or a warning) to help people debug their queries.

I'll park this patch in the next commitfest so it can be discussed when pg15 development starts.

[1]: https://www.postgresql.org/message-id/20210315164336.ak32whndsxna5mjf@nol
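To make the proposal a bit more concrete, here is a minimal sketch of what installing such a hook from an extension could look like. The hook name comes from the patches, but the exact type and signature (raw_parser()-like, plus the start offset added by 0003) are my reading of the description above rather than a committed API, and my_try_parse() is a hypothetical helper:

#include "postgres.h"

#include "fmgr.h"
#include "nodes/pg_list.h"
#include "parser/parser.h"

PG_MODULE_MAGIC;

/* Assumed hook shape: like raw_parser(), plus the start offset added by 0003. */
typedef List *(*parser_hook_type) (const char *str, RawParseMode mode, int start_offset);

extern PGDLLIMPORT parser_hook_type parser_hook;	/* provided by 0001 (assumed) */

static parser_hook_type prev_parser_hook = NULL;

/*
 * Hypothetical placeholder for the extension's own grammar; a real module
 * would run its bison/flex (or hand-written) parser here and return NIL
 * when the input doesn't match.
 */
static List *
my_try_parse(const char *str, int start_offset)
{
	return NIL;
}

static List *
my_parser_hook(const char *str, RawParseMode mode, int start_offset)
{
	/* Try the extension's grammar first. */
	List	   *result = my_try_parse(str, start_offset);

	if (result != NIL)
		return result;

	/* Fall back to the previous hook, or to the core grammar. */
	if (prev_parser_hook)
		return prev_parser_hook(str, mode, start_offset);
	return raw_parser(str, mode);		/* offset handling omitted in this sketch */
}

void
_PG_init(void)
{
	prev_parser_hook = parser_hook;
	parser_hook = my_parser_hook;
}

The important part is the fallback: an extension that only adds syntactic sugar (case 1 or 2) simply returns whatever the previous hook or the core parser produces when its own grammar doesn't match.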
Attachment
On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
>
> I'm attaching some POC patches that implement this approach to start a discussion.

I just noticed that the cfbot fails with the v1 patch. Attached v2 that should fix that.
Attachment
On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:
> On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
> >
> > I'm attaching some POC patches that implement this approach to start a discussion.
>
> I just noticed that the cfbot fails with the v1 patch. Attached v2 that should fix that.

The cfbot then revealed a missing dependency in the makefile used to generate the contrib parser, which makes "make check-world" fail without a previous "make -C contrib". Thanks a lot to Thomas Munro for getting me the logfile from the failed cfbot run and the fix!
Attachment
On Tue, Jun 08, 2021 at 12:16:48PM +0800, Julien Rouhaud wrote:
> On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:
> > On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
> > >
> > > I'm attaching some POC patches that implement this approach to start a discussion.

The regression tests weren't stable, v4 fixes that.
Attachment
On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> I'd like to propose an alternative approach, which is to allow multiple parsers to coexist, and let third-party parsers optionally fallback on the core parsers. I'm sending this now as a follow-up of [1] and to avoid duplicated efforts, as multiple people are interested in that topic.

The patches all build properly and pass all regression tests.

> pg_parse_query() will instruct plugins to parse a query at a time. They're free to ignore that mode if they want to implement the 3rd mode. If so, they should either return multiple RawStmt, a single RawStmt with a 0 or strlen(query_string) stmt_len, or error out. Otherwise, they will implement either mode 1 or 2, and they should always return a List containing a single RawStmt with properly set stmt_len, even if the underlying statement is NULL. This is required to properly skip valid strings that don't contain a statement, and pg_parse_query() will skip RawStmt that don't contain an underlying statement.

Wouldn't we want to only loop through the individual statements if parser_hook exists? The current patch seems to go through the new code path regardless of the hook being grabbed.
Thanks for the review Jim!

On Wed, Jul 7, 2021 at 3:26 AM Jim Mlodgenski <jimmy76@gmail.com> wrote:
>
> On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> The patches all build properly and pass all regression tests.

Note that the cfbot reports a compilation error on Windows. That's in the grammar extension part, so I'm not really interested in trying to fix it for now, as it's mostly a quick POC to demonstrate how one could implement a different grammar and validate that everything works as expected. Also, if this patch is eventually committed and having some code to exercise the hook is wanted, it would probably be better to have a very naive parser (based on a few strcmp() calls or something like that) to validate the behavior rather than a real parser.

> > pg_parse_query() will instruct plugins to parse a query at a time. They're free to ignore that mode if they want to implement the 3rd mode. If so, they should either return multiple RawStmt, a single RawStmt with a 0 or strlen(query_string) stmt_len, or error out. Otherwise, they will implement either mode 1 or 2, and they should always return a List containing a single RawStmt with properly set stmt_len, even if the underlying statement is NULL. This is required to properly skip valid strings that don't contain a statement, and pg_parse_query() will skip RawStmt that don't contain an underlying statement.
>
> Wouldn't we want to only loop through the individual statements if parser_hook exists? The current patch seems to go through the new code path regardless of the hook being grabbed.

I did think about it, but I eventually chose to write it this way. Having a different code path for the no-hook situation won't make the with-hook code any easier (it would only remove a check for the hook in a few places that have 2 or 3 other checks already). On the other hand, having a single code path avoids some (minimal) code duplication, and also ensures that the main loop is actively tested even without the hook being set. That's not 100% coverage, but it's better than nothing. Performance-wise, it shouldn't make any noticeable difference for the no-hook case.
On Wed, Jul 7, 2021 at 5:26 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Also, if this patch is eventually committed and having some code to exercise the hook is wanted, it would probably be better to have a very naive parser (based on a few strcmp() calls or something like that) to validate the behavior rather than a real parser.
>

The test module is very useful to show how to use the hook, but it isn't very useful to the general user like most other things in contrib. It probably fits better in src/test/modules.
On Wed, Jul 7, 2021 at 8:45 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:
>
> The test module is very useful to show how to use the hook, but it isn't very useful to the general user like most other things in contrib. It probably fits better in src/test/modules.

I agree that it's not useful at all to eventually have it as a contrib module, but it's somewhat convenient at this stage to be able to easily test the hook, possibly with different behavior. But as I said, if there's an agreement on the approach and the implementation, I don't think that it would make sense to keep it even in src/test/modules. A full bison parser, even with a limited grammar, will be about 99% noise when it comes to demonstrating how the hook is supposed to work, which basically is having a "single query" parser or a "full input string" parser. I'm not even convinced that flex/bison will be the preferred choice for someone who wants to implement a custom parser.

I tried to add really thorough comments in the various parts of the patch to make it clear how to do that and how the system will react depending on what a hook does. I also added some protection to catch inconsistent hook implementations. I think that's the best way to help external parser authors implement what they want, and I'll be happy to improve the comments if necessary. But if eventually people would like to have a real parser in the tree, for testing or guidance, I will of course take care of doing the required changes and moving the demo parser to src/test/modules.
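As a rough illustration of the kind of naive, strcmp()-based test parser mentioned above, something along these lines would be enough to exercise the hook without a flex/bison grammar. The hook type is the same assumed shape as in the earlier sketch, and the MEOW command is made up for this example; everything else uses the ordinary raw_parser() API:

#include "postgres.h"

#include "nodes/pg_list.h"
#include "parser/parser.h"

/* Assumed hook shape, as in the earlier sketch. */
typedef List *(*parser_hook_type) (const char *str, RawParseMode mode, int start_offset);

/*
 * Toy parser: recognize the made-up command "MEOW" and rewrite it to a
 * plain SELECT, otherwise hand the string back to the core grammar.
 * Multi-statement strings and offsets are ignored to keep the sketch short.
 */
static List *
naive_test_parser(const char *str, RawParseMode mode, int start_offset)
{
	if (pg_strcasecmp(str, "MEOW") == 0 || pg_strcasecmp(str, "MEOW;") == 0)
		return raw_parser("SELECT 'meow'", RAW_PARSE_DEFAULT);

	return raw_parser(str, mode);
}

Such a hook would make it easy to check the fallback and multi-statement behavior from a regression test without maintaining a second grammar.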
On Sat, Jun 12, 2021 at 1:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Jun 08, 2021 at 12:16:48PM +0800, Julien Rouhaud wrote:
> > On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:
> > > On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
> > > >
> > > > I'm attaching some POC patches that implement this approach to start a discussion.
>
> The regression tests weren't stable, v4 fixes that.

1) CFBOT showed the following compilation errors in windows:

"C:\projects\postgresql\pgsql.sln" (default target) (1) ->
"C:\projects\postgresql\sqlol.vcxproj" (default target) (69) ->
(ClCompile target) ->
  c1 : fatal error C1083: Cannot open source file: 'contrib/sqlol/sqlol_gram.c': No such file or directory [C:\projects\postgresql\sqlol.vcxproj]
  c:\projects\postgresql\contrib\sqlol\sqlol_gramparse.h(25): fatal error C1083: Cannot open include file: 'sqlol_gram.h': No such file or directory (contrib/sqlol/sqlol.c) [C:\projects\postgresql\sqlol.vcxproj]
  c:\projects\postgresql\contrib\sqlol\sqlol_gramparse.h(25): fatal error C1083: Cannot open include file: 'sqlol_gram.h': No such file or directory (contrib/sqlol/sqlol_keywords.c) [C:\projects\postgresql\sqlol.vcxproj]
  c1 : fatal error C1083: Cannot open source file: 'contrib/sqlol/sqlol_scan.c': No such file or directory [C:\projects\postgresql\sqlol.vcxproj]

    0 Warning(s)
    4 Error(s)

Time Elapsed 00:05:40.23

2) There was one small whitespace error with the patch:

git am v4-0002-Add-a-sqlol-parser.patch
Applying: Add a sqlol parser.
.git/rebase-apply/patch:818: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

Regards,
Vignesh
On Thu, Jul 22, 2021 at 12:01:34PM +0530, vignesh C wrote:
>
> 1) CFBOT showed the following compilation errors in windows:

Thanks for looking at it. I'm aware of this issue on windows, but as mentioned in the thread the new contrib is there to demonstrate how the new infrastructure works. If there were some interest in pushing the patch, I don't think that we would add a full bison parser, whether it's in contrib or test modules.

So unless there's a clear indication from a committer that we would want to integrate such a parser, or if someone is interested in reviewing the patch and only has a windows machine, I don't plan to spend time trying to fix a windows-only problem for something that will disappear anyway.

> 2) There was one small whitespace error with the patch:
> git am v4-0002-Add-a-sqlol-parser.patch
> Applying: Add a sqlol parser.
> .git/rebase-apply/patch:818: new blank line at EOF.
> +
> warning: 1 line adds whitespace errors.

Indeed, there's a trailing empty line in contrib/sqlol/sqlol_keywords.c. I fixed it locally, but as I said this module is most certainly going to disappear, so I'm not sending an updated patch right now.
On Thu, Jul 22, 2021 at 03:04:19PM +0800, Julien Rouhaud wrote:
> On Thu, Jul 22, 2021 at 12:01:34PM +0530, vignesh C wrote:
> >
> > 1) CFBOT showed the following compilation errors in windows:
>
> Thanks for looking at it. I'm aware of this issue on windows, but as mentioned in the thread the new contrib is there to demonstrate how the new infrastructure works. If there were some interest in pushing the patch, I don't think that we would add a full bison parser, whether it's in contrib or test modules.
>
> So unless there's a clear indication from a committer that we would want to integrate such a parser, or if someone is interested in reviewing the patch and only has a windows machine, I don't plan to spend time trying to fix a windows-only problem for something that will disappear anyway.

I'm not sure what changed in the Windows part of the cfbot, but somehow it's not hitting any compilation error anymore and all the tests are now green.
v5 attached, fixing conflict with 639a86e36a (Remove Value node struct)
Attachment
On Sat, 1 May 2021 at 08:24, Julien Rouhaud <rjuju123@gmail.com> wrote:
> Being able to extend core parser has been requested multiple times, and AFAICT all previous attempts were rejected not because this isn't wanted but because the proposed implementations required plugins to reimplement all of the core grammar with their own changes, as bison generated parsers aren't extensible.
>
> I'd like to propose an alternative approach, which is to allow multiple parsers to coexist, and let third-party parsers optionally fallback on the core parsers.

Yes, that approach has been discussed by many people, most recently around the idea to create a fast-path grammar to make the most frequently used SQL parse faster.

> 0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the ability to produce "select [col, ] col FROM table" parsetree, for testing purpose. I chose it to ensure that everything works properly even with a totally different grammar that has different keywords, which doesn't even ends statements with a semicolon but a plain keyword.

The general rule has always been that we don't just put hooks in, we always require an in-core use for those hooks. I was reminded of that myself recently.

What we need is something in core that actually makes use of this. The reason for that is not politics, but a simple test of whether the feature makes sense AND includes all required bells and whistles to be useful in the real world.

Core doesn't need a LOL parser and I don't think we should commit that.

If we do this, I think it should have CREATE LANGUAGE support, so that each plugin can be seen as an in-core object and allow security around which users can execute which language types, allow us to switch between languages and have default languages for specific users or databases.

-- 
Simon Riggs
http://www.EnterpriseDB.com/
On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> The general rule has always been that we don't just put hooks in, we always require an in-core use for those hooks. I was reminded of that myself recently.
>
That's not historically what has happened. There are several hooks with no in-core use, such as emit_log_hook and ExplainOneQuery_hook. The recent openssl_tls_init_hook only has a usage in src/test/modules.

> What we need is something in core that actually makes use of this. The reason for that is not politics, but a simple test of whether the feature makes sense AND includes all required bells and whistles to be useful in the real world.
>
Agreed. There should be something in src/test/modules to exercise this, but probably more to flesh things out. Maybe extending adminpack to use this so that, if enabled, it can use syntax like:

FILE READ 'foo.txt'

> Core doesn't need a LOL parser and I don't think we should commit that.
>
+1

> If we do this, I think it should have CREATE LANGUAGE support, so that each plugin can be seen as an in-core object and allow security around which users can execute which language types, allow us to switch between languages and have default languages for specific users or databases.
>
This hook allows extension developers to supplement syntax in addition to adding a whole new language, allowing the extension to appear more native to the end user. For example, pglogical could use this to add syntax to do a CREATE NODE instead of calling the function create_node. Adding CREATE LANGUAGE support around this would just be for a narrow set of use cases where a new language is added.
On Wed, Sep 15, 2021 at 10:14 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:
>
> On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > The general rule has always been that we don't just put hooks in, we always require an in-core use for those hooks. I was reminded of that myself recently.
> >
> That's not historically what has happened. There are several hooks with no in-core use, such as emit_log_hook and ExplainOneQuery_hook. The recent openssl_tls_init_hook only has a usage in src/test/modules.

Yes, I also think that it's not a strict requirement that all hooks have a caller in core, even if it's obviously better when that's the case.

> > What we need is something in core that actually makes use of this. The reason for that is not politics, but a simple test of whether the feature makes sense AND includes all required bells and whistles to be useful in the real world.
> >
> Agreed. There should be something in src/test/modules to exercise this, but probably more to flesh things out. Maybe extending adminpack to use this so that, if enabled, it can use syntax like:
> FILE READ 'foo.txt'

For this hook, maintaining a real alternative parser seems like way too much trouble to justify an in-core user. The fact that many people have asked for such a feature over the years should be enough to justify the use case. We could try to invent some artificial need like the one you suggest for adminpack, but it also feels like a waste of resources.

As far as I'm concerned, a naive strcmp-based parser in src/test/modules should be enough to validate that the hook is working; there's no need for more. In any case, if the only requirement for it to be committed is to write a real parser, whether in contrib or src/test/modules, I'll be happy to do it.

> > Core doesn't need a LOL parser and I don't think we should commit that.
> >
> +1

I entirely agree, and I repeatedly mentioned in this thread that I did *not* want to add this parser in core. The only purpose of patches 0002 and 0004 is to make the third-party bison-based parser requirements less abstract, and demonstrate that this approach can successfully make two *radically different* parsers cohabit.

> > If we do this, I think it should have CREATE LANGUAGE support, so that each plugin can be seen as an in-core object and allow security around which users can execute which language types, allow us to switch between languages and have default languages for specific users or databases.
> >
> This hook allows extension developers to supplement syntax in addition to adding a whole new language, allowing the extension to appear more native to the end user. For example, pglogical could use this to add syntax to do a CREATE NODE instead of calling the function create_node. Adding CREATE LANGUAGE support around this would just be for a narrow set of use cases where a new language is added.

Yes, this hook can be used to implement multiple things, as I mentioned in my initial email. Additionally, if this is eventually committed, I'd like to add support for a CREATE HYPOTHETICAL INDEX grammar in hypopg. Such a parser would only support one command (that extends an existing one), so it can't really be called a language. Of course it would be better to have the core parser accept CREATE [ HYPOTHETICAL ] INDEX and set up a flag so that third-party modules can intercept this utility command, but until that happens I could provide that syntactic sugar for my users, as long as I'm motivated enough to write this parser.

Also, a hook-based approach is still compatible with per-database / per-role configuration. It can be done either via a specific session_preload_libraries setting, or via a custom GUC if for some reason the module requires being in shared_preload_libraries.
On Wed, Sep 15, 2021 at 02:25:17PM +0100, Simon Riggs wrote:
> On Sat, 1 May 2021 at 08:24, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Being able to extend core parser has been requested multiple times, and AFAICT all previous attempts were rejected not because this isn't wanted but because the proposed implementations required plugins to reimplement all of the core grammar with their own changes, as bison generated parsers aren't extensible.
> >
> > I'd like to propose an alternative approach, which is to allow multiple parsers to coexist, and let third-party parsers optionally fallback on the core parsers.
>
> Yes, that approach has been discussed by many people, most recently around the idea to create a fast-path grammar to make the most frequently used SQL parse faster.
>
> The general rule has always been that we don't just put hooks in, we always require an in-core use for those hooks. I was reminded of that myself recently.
>
> What we need is something in core that actually makes use of this. The reason for that is not politics, but a simple test of whether the feature makes sense AND includes all required bells and whistles to be useful in the real world.
>
> Core doesn't need a LOL parser and I don't think we should commit that.

It doesn't, but it very likely needs something people can use when they create a new table AM, and that we should use the hook in core to implement the heap* table AM to make sure the thing is working at DDL time.

> If we do this, I think it should have CREATE LANGUAGE support, so that each plugin can be seen as an in-core object and allow security around which users can execute which language types, allow us to switch between languages and have default languages for specific users or databases.

That's a great idea, but I must be missing something important as it relates to parser hooks. Could you connect those a little more explicitly?

Best,
David.

* It's not actually a heap in the sense that the term is normally used in computing. I'd love to find out how it got to have this name and document same so others aren't also left wondering.

-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Sep 15, 2021 at 16:55, Julien Rouhaud <rjuju123@gmail.com> wrote:
> [...]
>
> Yes, this hook can be used to implement multiple things, as I mentioned in my initial email. Additionally, if this is eventually committed, I'd like to add support for a CREATE HYPOTHETICAL INDEX grammar in hypopg. Such a parser would only support one command (that extends an existing one), so it can't really be called a language. Of course it would be better to have the core parser accept CREATE [ HYPOTHETICAL ] INDEX and set up a flag so that third-party modules can intercept this utility command, but until that happens I could provide that syntactic sugar for my users, as long as I'm motivated enough to write this parser.

There were nice stream databases, but they ended because maintaining a fork is too expensive. And without direct SQL (without the possibility of enhancing the parser), commands based on a function-call API were not as readable and flexible as SQL. Sometimes we really don't want to replace PostgreSQL, but just enhance the main interface for extensions.

> Also, a hook-based approach is still compatible with per-database / per-role configuration. It can be done either via a specific session_preload_libraries setting, or via a custom GUC if for some reason the module requires being in shared_preload_libraries.
Jim Mlodgenski <jimmy76@gmail.com> writes:
> On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>> The general rule has always been that we don't just put hooks in, we always require an in-core use for those hooks. I was reminded of that myself recently.

> That's not historically what has happened. There are several hooks with no in core use such as emit_log_hook and ExplainOneQuery_hook.

Yeah. I think the proper expectation is that there be a sufficiently worked-out example to convince us that the proposed hooks have real-world usefulness, and are not missing any basic requirements to make them do something useful. Whether the example ends up in our tree is a case-by-case decision.

In the case at hand, what's troubling me is that I don't see any particular use in merely substituting a new bison grammar, if it still has to produce parse trees that the rest of the system will understand. Yeah, you could make some very simple surface-syntax changes that way, but it doesn't seem like you could do anything interesting (like, say, support Oracle-style outer join syntax). AFAICS, to get to a useful feature, you'd then need to invent an extensible Node system (which'd be hugely invasive if it's feasible at all), and then probably more things on top of that. So I'm not convinced that you've demonstrated any real-world usefulness.

			regards, tom lane
On Wed, Sep 15, 2021 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> In the case at hand, what's troubling me is that I don't see any particular use in merely substituting a new bison grammar, if it still has to produce parse trees that the rest of the system will understand. Yeah, you could make some very simple surface-syntax changes that way, but it doesn't seem like you could do anything interesting (like, say, support Oracle-style outer join syntax). AFAICS, to get to a useful feature, you'd then need to invent an extensible Node system (which'd be hugely invasive if it's feasible at all), and then probably more things on top of that. So I'm not convinced that you've demonstrated any real-world usefulness.

I agree that this patchset can only implement syntactic sugar, nothing more (although for utility commands you can do a bit more than that). But that's already something people can use, mostly for migration-to-postgres use cases probably.

I'm not sure why you couldn't implement an Oracle-style outer join with such a hook? The requirement is that the parser can't leak any node that the rest of the system doesn't know about, but you can do what you want inside the parser. And as far as I can see we already have an extensible node since bcac23de73b, so it seems to me that there's enough infrastructure to handle this kind of use case.

The main downside is that you'll have to make a first pass to transform your "custom raw statement" into a valid RawStmt in your parser, and the system will do another one to transform it into a Query. But apart from that it should work. Am I missing something?
Julien Rouhaud <rjuju123@gmail.com> writes:
> I'm not sure why you couldn't implement an Oracle-style outer join with such a hook?

Try it.

> The requirement is that the parser can't leak any node that the rest of the system doesn't know about, but you can do what you want inside the parser.

That's not what the patch actually does, though. It only replaces the grammar, not semantic analysis. So you couldn't associate the (+)-decorated WHERE clause with the appropriate join. (And no, I will not accept that it's okay to perform catalog lookups in the grammar to get around that. See comment at the head of gram.y.)

In general, I'm having a hard time believing that anything very interesting can be done at only the grammar level without changing the parse analysis phase. That's not unrelated to the restriction that the grammar can't do catalog accesses. Maybe with some fundamental restructuring, we could get around that issue ... but this patch isn't doing any fundamental restructuring, it's just putting a hook where it's easy to do so. We've often found that such hooks aren't as useful as they initially seem.

			regards, tom lane
On Thu, Sep 16, 2021 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > The requirement is that the parser can't leak any node that the rest of the system doesn't know about, but you can do what you want inside the parser.
>
> That's not what the patch actually does, though. It only replaces the grammar, not semantic analysis. So you couldn't associate the (+)-decorated WHERE clause with the appropriate join. (And no, I will not accept that it's okay to perform catalog lookups in the grammar to get around that. See comment at the head of gram.y.)

I never said that one should do catalog lookups for that? What I said is that you can do a specific semantic analysis pass in the hook if you know that you can have extensible nodes in your parsetree, and you can do that with this hook unless I'm missing something?

Yes, that's not ideal, but I don't see how it can be worse than writing some middleware that parses the query and rewrites it to postgres-style SQL on the fly so that postgres can parse it again. I'm also not sure how the semantic analysis could be made generally extensible, if possible at all, so that's the best I can propose. If that approach is a deal breaker then fine, I can accept it.
On Thu, Sep 16, 2021 at 1:23 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Sep 16, 2021 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > That's not what the patch actually does, though. It only replaces the grammar, not semantic analysis. So you couldn't associate the (+)-decorated WHERE clause with the appropriate join. (And no, I will not accept that it's okay to perform catalog lookups in the grammar to get around that. See comment at the head of gram.y.)
>
> I never said that one should do catalog lookups for that? What I said is that you can do a specific semantic analysis pass in the hook if you know that you can have extensible nodes in your parsetree, and you can do that with this hook unless I'm missing something?

Ah, now that I think more about it, I think you're talking about unqualified fields? I was naively assuming that those wouldn't be allowed by Oracle, but I guess that was wishful thinking.
Hi,

On 2021-09-15 12:57:00 -0400, Tom Lane wrote:
> That's not what the patch actually does, though. It only replaces the grammar, not semantic analysis. So you couldn't associate the (+)-decorated WHERE clause with the appropriate join. (And no, I will not accept that it's okay to perform catalog lookups in the grammar to get around that. See comment at the head of gram.y.)
>
> In general, I'm having a hard time believing that anything very interesting can be done at only the grammar level without changing the parse analysis phase. That's not unrelated to the restriction that the grammar can't do catalog accesses. Maybe with some fundamental restructuring, we could get around that issue ... but this patch isn't doing any fundamental restructuring, it's just putting a hook where it's easy to do so. We've often found that such hooks aren't as useful as they initially seem.

Agreed - it doesn't make sense to me to have a hook that only replaces raw parsing, without also hooking into parse-analysis. ISTM that the least a patchset going for a parser hook would have to do is to do sufficient restructuring so that one could hook together into both raw parsing and analysis. It could still be two callbacks, but perhaps we'd ensure that they're both set.

Greetings,

Andres Freund
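For readers trying to picture the pairing Andres suggests, the two callbacks could be declared along the lines below. The names are invented here, and the analysis signature simply mirrors parse_analyze() as it exists in PostgreSQL 14, so treat this as a sketch of the idea rather than a proposed API:

#include "postgres.h"

#include "nodes/parsenodes.h"
#include "parser/parser.h"
#include "utils/queryenvironment.h"

/* Raw parsing: turn a query string into a list of RawStmt. */
typedef List *(*raw_parser_hook_type) (const char *str, RawParseMode mode);

/* Parse analysis: turn one RawStmt into a Query. */
typedef Query *(*parse_analyze_hook_type) (RawStmt *parseTree,
										   const char *sourceText,
										   Oid *paramTypes, int numParams,
										   QueryEnvironment *queryEnv);

/* The suggestion above: installing one of these without the other would be rejected. */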
On Wed, Sep 15, 2021 at 3:55 PM Andres Freund <andres@anarazel.de> wrote:
> On 2021-09-15 12:57:00 -0400, Tom Lane wrote:
> > That's not what the patch actually does, though. It only replaces the grammar, not semantic analysis. So you couldn't associate the (+)-decorated WHERE clause with the appropriate join. (And no, I will not accept that it's okay to perform catalog lookups in the grammar to get around that. See comment at the head of gram.y.)
> > In general, I'm having a hard time believing that anything very interesting can be done at only the grammar level without changing the parse analysis phase. That's not unrelated to the restriction that the grammar can't do catalog accesses. Maybe with some fundamental restructuring, we could get around that issue ... but this patch isn't doing any fundamental restructuring, it's just putting a hook where it's easy to do so. We've often found that such hooks aren't as useful as they initially seem.
>
> Agreed - it doesn't make sense to me to have a hook that only replaces raw parsing, without also hooking into parse-analysis. ISTM that the least a patchset going for a parser hook would have to do is to do sufficient restructuring so that one could hook together into both raw parsing and analysis. It could still be two callbacks, but perhaps we'd ensure that they're both set.
This is a bad example as it doesn't require semantic analysis from Postgres. While most of the tools out there tend to do simple replacement, this can be done within a custom parser by simply walking its own AST, evaluating join conditions against the expression, and rewriting the join accordingly. Or, do you have an example that couldn't be done this way within a custom parser?
Jonah H. Harris
Hi,

On 2021-09-15 16:35:53 -0400, Jonah H. Harris wrote:
> On Wed, Sep 15, 2021 at 3:55 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2021-09-15 12:57:00 -0400, Tom Lane wrote:
> > Agreed - it doesn't make sense to me to have a hook that only replaces raw parsing, without also hooking into parse-analysis. ISTM that the least a patchset going for a parser hook would have to do is to do sufficient restructuring so that one could hook together into both raw parsing and analysis. It could still be two callbacks, but perhaps we'd ensure that they're both set.
>
> This is a bad example as it doesn't require semantic analysis from Postgres.

"it"? I assume you mean a different type of join? If so, I'm highly doubtful - without semantic analysis you can't really handle column references.

> While most of the tools out there tend to do simple replacement, this can be done within a custom parser by simply walking its own AST, evaluating join conditions against the expression, and rewriting the join accordingly. Or, do you have an example that couldn't be done this way within a custom parser?

You cannot just "evaluate conditions" in a raw parse tree... You don't even know what things are functions, columns etc, nor to what relation a column belongs.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> Agreed - it doesn't make sense to me to have a hook that only replaces raw parsing, without also hooking into parse-analysis. ISTM that the least a patchset going for a parser hook would have to do is to do sufficient restructuring so that one could hook together into both raw parsing and analysis. It could still be two callbacks, but perhaps we'd ensure that they're both set.

The other problem here is that a simple call-this-instead-of-that top-level hook doesn't seem all that useful anyway, because it leaves you with the task of duplicating a huge amount of functionality that you're then going to make some tweaks within. That's already an issue when you're just thinking about the grammar, and if you have to buy into it for parse analysis too, I doubt that it's going to be very practical. If, say, you'd like to support some weird function that requires special parsing and analysis rules, I don't see how you get that out of this without first duplicating a very large fraction of src/backend/parser/.

(As a comparison point, we do have a top-level hook for replacing the planner; but I have never heard of anyone actually doing so. There are people using that hook to *wrap* the planner with some before-and-after processing, which is quite a different thing.)

I don't have any better ideas to offer :-( ... but I very much fear that the approach proposed here is a dead end.

			regards, tom lane
Hi,

On 2021-09-15 16:51:37 -0400, Tom Lane wrote:
> The other problem here is that a simple call-this-instead-of-that top-level hook doesn't seem all that useful anyway, because it leaves you with the task of duplicating a huge amount of functionality that you're then going to make some tweaks within. That's already an issue when you're just thinking about the grammar, and if you have to buy into it for parse analysis too, I doubt that it's going to be very practical. If, say, you'd like to support some weird function that requires special parsing and analysis rules, I don't see how you get that out of this without first duplicating a very large fraction of src/backend/parser/.

We do have a small amount of infrastructure around this - the hackery that plpgsql uses. That's not going to help you with everything, but I think it should be enough to recognize e.g. additional top-level statements. Obviously not enough to intercept parsing deeper into a statement, but at least something.

And parse analysis for some types of things will be doable with the current infrastructure, by e.g. handling the new top-level statement in the hook, and then passing the buck to the normal parse analysis for e.g. expressions in that. Obviously not going to get you that far...

> (As a comparison point, we do have a top-level hook for replacing the planner; but I have never heard of anyone actually doing so. There are people using that hook to *wrap* the planner with some before-and-after processing, which is quite a different thing.)

Citus IIRC has some paths that do not end up calling into the standard planner, but only for a few simplistic cases.

> I don't have any better ideas to offer :-( ... but I very much fear that the approach proposed here is a dead end.

I unfortunately don't see a good way forward without changing the way we do parsing on a more fundamental level :(.

Greetings,

Andres Freund
On Thu, Sep 16, 2021 at 5:40 AM Andres Freund <andres@anarazel.de> wrote:
>
> > I don't have any better ideas to offer :-( ... but I very much fear that the approach proposed here is a dead end.
>
> I unfortunately don't see a good way forward without changing the way we do parsing on a more fundamental level :(.

Using the ExtensibleNode infrastructure, I can see two ways to try to leverage that.

The first one would be to require modules to wrap their RawStmt->stmt in an ExtensibleNode if they want to do anything that requires semantic analysis, and handle such a node in transformStmt() with another hook. I think it would allow modules to do pretty much anything, at the cost of walking the stmt twice and duplicating possibly huge amounts of analyze.c and friends.

The other one would be to allow the parser to leak ExtensibleNode in the middle of the RawStmt and catch them in the transform* functions, with e.g. some generic transformExtensibleNode(pstate, node, some_identifier...) (the identifier giving both the general transform action and some secondary info, like ParseExprKind for expressions). This would avoid the downsides of the first approach, but would require calling this new hook in a bunch of places.

Or we could combine both approaches so that the most common use cases, like transformExprRecurse(), would be easily handled, while more exotic cases would have to go the hard way. Parser authors could still ask for adding a new call to this new hook to ease their work in the next major version.

Would any of that be a reasonable approach?
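For context, the ExtensibleNode machinery the first approach would rely on already exists in core (nodes/extensible.h, added by bcac23de73b). Below is a minimal sketch of what a custom statement node could look like; the PglCreateNodeStmt name is made up for illustration, the serialization callbacks are deliberately simplistic, and whether such a node would be accepted as a RawStmt->stmt is exactly the open question in this thread:

#include "postgres.h"

#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/extensible.h"

PG_MODULE_MAGIC;

#define PGL_CREATE_NODE_STMT_NAME "PglCreateNodeStmt"

/* Hypothetical custom statement, wrapped in an ExtensibleNode. */
typedef struct PglCreateNodeStmt
{
	ExtensibleNode xnode;
	char	   *node_name;
} PglCreateNodeStmt;

static void
pgl_node_copy(ExtensibleNode *newnode, const ExtensibleNode *oldnode)
{
	PglCreateNodeStmt *dst = (PglCreateNodeStmt *) newnode;
	const PglCreateNodeStmt *src = (const PglCreateNodeStmt *) oldnode;

	dst->node_name = src->node_name ? pstrdup(src->node_name) : NULL;
}

static bool
pgl_node_equal(const ExtensibleNode *a, const ExtensibleNode *b)
{
	const PglCreateNodeStmt *na = (const PglCreateNodeStmt *) a;
	const PglCreateNodeStmt *nb = (const PglCreateNodeStmt *) b;

	if (na->node_name == NULL || nb->node_name == NULL)
		return na->node_name == nb->node_name;
	return strcmp(na->node_name, nb->node_name) == 0;
}

static void
pgl_node_out(StringInfo str, const ExtensibleNode *node)
{
	const PglCreateNodeStmt *n = (const PglCreateNodeStmt *) node;

	/* A real implementation must quote/escape the value properly. */
	appendStringInfo(str, " :node_name %s",
					 n->node_name ? n->node_name : "<>");
}

static void
pgl_node_read(ExtensibleNode *node)
{
	elog(ERROR, "%s cannot be read back in this sketch",
		 PGL_CREATE_NODE_STMT_NAME);
}

static const ExtensibleNodeMethods pgl_node_methods = {
	.extnodename = PGL_CREATE_NODE_STMT_NAME,
	.node_size = sizeof(PglCreateNodeStmt),
	.nodeCopy = pgl_node_copy,
	.nodeEqual = pgl_node_equal,
	.nodeOut = pgl_node_out,
	.nodeRead = pgl_node_read,
};

void
_PG_init(void)
{
	RegisterExtensibleNodeMethods(&pgl_node_methods);
}

A custom parser could then allocate such a node (roughly, newNode(sizeof(PglCreateNodeStmt), T_ExtensibleNode) with extnodename set) and hand it back inside a RawStmt; the missing piece discussed here is what core should do with it afterwards.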
On Thu, 16 Sept 2021 at 05:33, Julien Rouhaud <rjuju123@gmail.com> wrote:
> Would any of that be a reasonable approach?

The way I summarize all of the above is that
1) nobody is fundamentally opposed to the idea
2) we just need to find real-world example(s) and show that any associated in-core patch provides all that is needed in a clean way, since that point is currently in-doubt by senior committers.

So what is needed is some actual prototypes that explore this. I guess that means they have to be open source, but those examples could be under a different licence, as long as the in-core patch is clearly a project submission to PostgreSQL.

I presume a few real-world examples could be:
* Grammar extensions to support additional syntax for Greenplum, Citus, XL
* A grammar that adds commands for an extension, such as pglogical (Jim's example)
* A strict SQL Standard grammar/parser
* GQL implementation

-- 
Simon Riggs
http://www.EnterpriseDB.com/
On Thu, Sep 23, 2021 at 07:37:27AM +0100, Simon Riggs wrote:
> On Thu, 16 Sept 2021 at 05:33, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > Would any of that be a reasonable approach?
>
> The way I summarize all of the above is that
> 1) nobody is fundamentally opposed to the idea
> 2) we just need to find real-world example(s) and show that any associated in-core patch provides all that is needed in a clean way, since that point is currently in-doubt by senior committers.
>
> So what is needed is some actual prototypes that explore this. I guess that means they have to be open source, but those examples could be under a different licence, as long as the in-core patch is clearly a project submission to PostgreSQL.
>
> I presume a few real-world examples could be:
> * Grammar extensions to support additional syntax for Greenplum, Citus, XL
> * A grammar that adds commands for an extension, such as pglogical (Jim's example)
> * A strict SQL Standard grammar/parser
> * GQL implementation

As I mentioned, there's at least one use case that would work with that approach that I will be happy to code in hypopg, which is an open source project. As a quick prototype, here's a basic overview of how I can use this hook to implement a CREATE HYPOTHETICAL INDEX command:

rjuju=# LOAD 'hypopg';
LOAD
rjuju=# create hypothetical index meh on t1(id);
CREATE INDEX
rjuju=# explain select * from t1 where id = 1;
                                    QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using "<13543>btree_t1_id" on t1  (cost=0.04..8.05 rows=1 width=13)
   Index Cond: (id = 1)
(2 rows)

rjuju=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 val    | text    |           |          |

My POC's grammar is only like:

CREATE HYPOTHETICAL INDEX opt_index_name ON relation_expr '(' index_params ')'
	{
		IndexStmt  *n = makeNode(IndexStmt);

		n->idxname = $4;
		n->relation = $6;
		n->accessMethod = DEFAULT_INDEX_TYPE;
		n->indexParams = $8;
		n->options = list_make1(makeDefElem("hypothetical", NULL, -1));
		$$ = (Node *) n;
	}

as I'm not willing to import the whole CREATE INDEX grammar for now for a patch that may be rejected. I can however publish this POC if that helps.

Note that once my parser returns this parse tree, all I need to do is intercept IndexStmt containing this option in a utility_hook and run my code rather than a plain DefineIndex(), which works as intended as I showed above.

One could easily imagine similar usage to extend existing commands, like implementing a new syntax on top of CREATE TABLE to implement an automatic partition creation grammar (which would return multiple CreateStmt), or even a partition manager. I'm not an expert in other RDBMS syntax, but maybe you could use such a hook to implement SQL Server or mysql syntax, which use at least different quoting rules. Maybe Amazon people could confirm that as it looks like they implemented an SQL Server parser using a similar hook?

So yes you can't create new commands or implement grammars that require additional semantic analysis with this hook, but I think that there are still real use cases that can be implemented using only a different parser.
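To make the "intercept it in a utility_hook" part concrete, here is a hedged sketch of what that interception could look like on the extension side. The hook signature follows PostgreSQL 14, but hypo_handle_index() is a hypothetical placeholder and none of this is taken from hypopg's actual code:

#include "postgres.h"

#include "fmgr.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "tcop/dest.h"
#include "tcop/utility.h"
#include "utils/queryenvironment.h"

PG_MODULE_MAGIC;

static ProcessUtility_hook_type prev_ProcessUtility = NULL;

/* Hypothetical placeholder: hypopg would register the index in its own structures. */
static void
hypo_handle_index(IndexStmt *stmt)
{
	elog(NOTICE, "would create hypothetical index \"%s\"",
		 stmt->idxname ? stmt->idxname : "(unnamed)");
}

/* Does this IndexStmt carry the "hypothetical" option added by the custom grammar? */
static bool
index_is_hypothetical(IndexStmt *stmt)
{
	ListCell   *lc;

	foreach(lc, stmt->options)
	{
		DefElem    *defel = lfirst_node(DefElem, lc);

		if (strcmp(defel->defname, "hypothetical") == 0)
			return true;
	}
	return false;
}

static void
hypo_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
					bool readOnlyTree, ProcessUtilityContext context,
					ParamListInfo params, QueryEnvironment *queryEnv,
					DestReceiver *dest, QueryCompletion *qc)
{
	Node	   *parsetree = pstmt->utilityStmt;

	if (IsA(parsetree, IndexStmt) &&
		index_is_hypothetical((IndexStmt *) parsetree))
	{
		/* Bypass DefineIndex() entirely. */
		hypo_handle_index((IndexStmt *) parsetree);
		return;
	}

	if (prev_ProcessUtility)
		prev_ProcessUtility(pstmt, queryString, readOnlyTree, context,
							params, queryEnv, dest, qc);
	else
		standard_ProcessUtility(pstmt, queryString, readOnlyTree, context,
								params, queryEnv, dest, qc);
}

void
_PG_init(void)
{
	prev_ProcessUtility = ProcessUtility_hook;
	ProcessUtility_hook = hypo_ProcessUtility;
}

The point of the sketch is only to show why returning a plain IndexStmt with one extra DefElem is enough for an existing utility hook to take over; the parser hook itself adds nothing beyond the grammar shown above.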
Julien Rouhaud <rjuju123@gmail.com> writes:
> My POC's grammar is only like:
> CREATE HYPOTHETICAL INDEX opt_index_name ON relation_expr '(' index_params ')'
> 	{
> 		IndexStmt  *n = makeNode(IndexStmt);
> 		n->idxname = $4;
> 		n->relation = $6;
> 		n->accessMethod = DEFAULT_INDEX_TYPE;
> 		n->indexParams = $8;
> 		n->options = list_make1(makeDefElem("hypothetical", NULL, -1));
> 		$$ = (Node *) n;
> 	}

I'm not too impressed by this example, because there seems little reason why you couldn't just define "hypothetical" as an index_param option, and not need to touch the grammar at all.

> as I'm not willing to import the whole CREATE INDEX grammar for now for a patch that may be rejected.

The fact that that's so daunting seems to me to be a perfect illustration of the problems with this concept. Doing anything interesting with a hook like this will create a maintenance nightmare, because you'll have to duplicate (and track every change in) large swaths of gram.y. To the extent that you fail to, say, match every detail of the core's expression grammar, you'll be creating a crappy user experience.

> that once my parser returns this parse tree, all I need to do is intercept IndexStmt containing this option in a utility_hook and run my code rather than a plain DefineIndex(), which works as intended as I showed above.

And I'm even less impressed by the idea of half a dozen extensions each adding its own overhead to the parser and also to ProcessUtility so that they can process statements in this klugy, highly-restricted way.

I do have sympathy for the idea that extensions would like to define their own statement types. I just don't see a practical way to do it in our existing parser infrastructure. This patch certainly doesn't offer that.

			regards, tom lane
On Thu, Sep 23, 2021 at 10:21:20AM -0400, Tom Lane wrote:
>
> I do have sympathy for the idea that extensions would like to define their own statement types. I just don't see a practical way to do it in our existing parser infrastructure. This patch certainly doesn't offer that.

Allowing extensions to define their own (utility) statement type is just a matter of allowing ExtensibleNode as a top-level statement. As far as I can see, the only change required for that is to give those a specific command tag in CreateCommandTag(), since transformStmt() defaults to emitting a utility command. You can then easily intercept such a statement in the utility hook and fetch your custom struct.

I could do that, but I'm assuming that you still wouldn't be satisfied, as a custom parser would still be needed, which may or may not require copy/pasting chunks of the core grammar.

If so, do you have any suggestion for an approach you would accept?
Hi,

On Fri, Sep 24, 2021 at 02:33:59PM +0800, Julien Rouhaud wrote:
> On Thu, Sep 23, 2021 at 10:21:20AM -0400, Tom Lane wrote:
> >
> > I do have sympathy for the idea that extensions would like to define their own statement types. I just don't see a practical way to do it in our existing parser infrastructure. This patch certainly doesn't offer that.
>
> Allowing extensions to define their own (utility) statement type is just a matter of allowing ExtensibleNode as a top-level statement. As far as I can see, the only change required for that is to give those a specific command tag in CreateCommandTag(), since transformStmt() defaults to emitting a utility command. You can then easily intercept such a statement in the utility hook and fetch your custom struct.
>
> I could do that, but I'm assuming that you still wouldn't be satisfied, as a custom parser would still be needed, which may or may not require copy/pasting chunks of the core grammar.
>
> If so, do you have any suggestion for an approach you would accept?

Given the total lack of answers on the various improvements I suggested, I'm assuming that no one is interested in that feature, so I'm marking it as Rejected.