Thread: Parser extensions (maybe for 10?)

Parser extensions (maybe for 10?)

From
Arcadiy Ivanov
Date:
I thought I'd float this idea and see if this gets any traction. Please forgive my ignorance if this has been discussed already.

Currently the parser and lexer are fully fixed at compile time and not amenable to extension - extensions are only capable of introducing functions etc.

There is, however, an advantage to being able, if not to add or alter complete statements (which would be nice), then at least to augment portions of the syntax of existing ones in some places.

For example, PGXL adds the following to the CREATE TABLE statement:

[ DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [ HASH | MODULO ] ( column_name ) } } |
  DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } } |
  DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name ) ]
[ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]

Compare:

http://www.postgresql.org/docs/9.5/static/sql-createtable.html
http://files.postgres-xl.org/documentation/sql-createtable.html

Is there any interest and/or tips to allow a pluggable parser or at least allow some syntactical pluggability by extensions?
I think this may allow some projects to move towards becoming an extension as opposed to forking the project entirely.

Cheers,

--
Arcadiy Ivanov
arcadiy@gmail.com | @arcivanov | https://ivanov.biz
https://github.com/arcivanov

Re: Parser extensions (maybe for 10?)

From
Craig Ringer
Date:
On 12 April 2016 at 12:36, Arcadiy Ivanov <arcadiy@gmail.com> wrote:
 

Is there any interest and/or tips to allow a pluggable parser or at least allow some syntactical pluggability by extensions?
I think this may allow some projects to move towards becoming an extension as opposed to forking the project entirely.

How would you go about it?

PostgreSQL uses a parser generator that produces C code as its output. Extensions can't just patch the grammar and regenerate the parser. So even if it were desirable, a fully extensible parser would require a total rewrite of the parser/lexer.

That doesn't mean there can't be extension points for SQL syntax, they just need to be planned carefully, located where they won't create parsing ambiguities, and somewhat limited. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parser extensions (maybe for 10?)

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Arcadiy Ivanov

Currently the parser and lexer are fully fixed at compile time and not amenable to extension - extensions are only capable of introducing functions etc.

There is, however, an advantage to being able, if not to add or alter complete statements (which would be nice), then at least to augment portions of the syntax of existing ones in some places.

I saw the following discussion in the past, but I haven't read it:

Pluggable Parser
http://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159878C2A@szxeml521-mbs.china.huawei.com

I'm interested in a pluggable, extensible parser for two purposes.  One is to add compatibility with other databases.

The other is for the ODBC (and possibly JDBC) driver.
The ODBC/JDBC specs require some unique syntax constructs, e.g. {? = call funcname(arguments)} to call stored procedures/functions.  Currently, the ODBC/JDBC drivers are forced to parse and convert SQL statements.  It would be ideal for PostgreSQL itself to understand the ODBC/JDBC syntax and eliminate the burden of parsing statements from the JDBC/ODBC drivers.

Regards
Takayuki Tsunakawa

Re: Parser extensions (maybe for 10?)

From
Tom Lane
Date:
Arcadiy Ivanov <arcadiy@gmail.com> writes:
> Is there any interest and/or tips to allow a pluggable parser or at 
> least allow some syntactical pluggability by extensions?

There is a fair amount of discussion of this in the archives.  The short
answer is that bison can't do it, and "let's replace bison" is a proposal
with a steep hill in front of it --- the pain-to-gain ratio is just not
very favorable.

Forty years ago, I worked on systems with extensible parsers at HP,
wherein plug-in extensions could add clauses very similarly to what
you suggest.  Those designs worked, more or less, but they had a lot
of deficiencies; the most fundamental problem being that any parsing
inconsistencies would only appear through misbehavior at runtime,
which you would only discover if you happened to test a case that behaved
oddly *and* notice that it was not giving the result you expected.
Bison is far from perfect on this angle, because %prec declarations can
produce results you weren't expecting ... but it's at least one order of
magnitude better than any extensible-parser solution I've ever seen.
Usually bison will give you a shift/reduce error if you write something
that has more than one possible interpretation.

I'm interested in possible solutions to this problem, but it's far
harder than it looks.
        regards, tom lane



Re: Parser extensions (maybe for 10?)

From
Craig Ringer
Date:
On 12 April 2016 at 13:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm interested in possible solutions to this problem, but it's far
harder than it looks.


Exactly.

Limited extension points, where we accept runtime errors and confine the extensible region, can be OK; e.g. SOME STATEMENT ... WITH (THINGY, OTHER_THINGY), where we allow any series of identifier|keyword|literal|bareword, accumulate it, and pass it as a List of Node to something else to deal with. Bison can cater to that and to similar structures where the boundaries of the generic/extensible region can be clearly defined and limited.

The other area where there's room for extension without throwing out the whole thing and rebuilding is handling of new top-level statements. We can probably dispatch the statement text to a sub-parser provided by an extension that registers interest in that statement name when we attempt to parse it and fail. Even then I'm pretty sure it won't be possible to do so while still allowing multi-statements. I wish we didn't support multi-statements, but we're fairly stuck with them.
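The dispatch idea can be sketched outside the server. The Python below is purely illustrative (the registry, `CORE_STATEMENTS`, and the return values are invented for the sketch, not PostgreSQL APIs): the core statement set is consulted first, and only an unrecognized leading keyword is handed to an extension that registered interest in it.

```python
# Illustrative sketch only: dispatching unknown top-level statements to
# extension-registered sub-parsers.  None of these names exist in PostgreSQL.

CORE_STATEMENTS = {"SELECT", "INSERT", "UPDATE", "DELETE", "CREATE"}

_sub_parsers = {}  # leading keyword -> callable returning a parse result

def register_sub_parser(keyword, fn):
    """An extension registers interest in one new statement name."""
    _sub_parsers[keyword.upper()] = fn

def parse_statement(sql):
    first = sql.strip().split()[0].upper()
    if first in CORE_STATEMENTS:
        return ("core", sql)              # stand-in for the bison parser
    if first in _sub_parsers:
        return _sub_parsers[first](sql)   # extension's sub-parser takes over
    raise SyntaxError("unrecognized statement: " + first)

# Example: an extension claims a hypothetical DISTRIBUTE statement.
register_sub_parser("DISTRIBUTE", lambda sql: ("ext", sql))
```

The sketch only works because whole statements are handed over; as the rest of the thread discusses, it breaks down once the sub-parser wants the core grammar for sub-clauses.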

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parser extensions (maybe for 10?)

From
"David G. Johnston"
Date:
On Mon, Apr 11, 2016 at 9:58 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Arcadiy Ivanov

Currently the parser and lexer are fully fixed at compile time and not amenable to extension - extensions are only capable of introducing functions etc.

There is, however, an advantage to being able, if not to add or alter complete statements (which would be nice), then at least to augment portions of the syntax of existing ones in some places.

 

I saw the following discussion in the past, but I haven’t read it:

 

Pluggable Parser

http://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159878C2A@szxeml521-mbs.china.huawei.com

 

I'm interested in a pluggable, extensible parser for two purposes.  One is to add compatibility with other databases.

 

The other is for the ODBC (and possibly JDBC) driver.

The ODBC/JDBC specs require some unique syntax constructs, e.g. {? = call funcname(arguments)} to call stored procedures/functions.  Currently, the ODBC/JDBC drivers are forced to parse and convert SQL statements.  It would be ideal for PostgreSQL itself to understand the ODBC/JDBC syntax and eliminate the burden of parsing statements from the JDBC/ODBC drivers.


As recently discovered, there is more than one reason why an intelligent driver (as the JDBC standard at least requires in a few instances) needs knowledge of at least some basic structure of the statements it sees before sending them off to the server.  Furthermore, and particularly in the JDBC example you provide, my first reaction is that it would be a massive encapsulation violation to try and get PostgreSQL to understand "{? = call funcname(args)}" and similar higher-level API specifications.

I think PostgreSQL can do quite well by saying, hey this is what we are and this is what we do.  Compatibility has merit but I'm sure at least some of those items can make it into the bison files - regardless of whether those changes end up being accepted into core.  Small-scale forking like this seems like it would be easier to accomplish if not preferable to making the entire thing modular.  We have that option to offer others since we are an open source project.

David J.

Re: Parser extensions (maybe for 10?)

From
Craig Ringer
Date:
On 12 April 2016 at 13:28, David G. Johnston <david.g.johnston@gmail.com> wrote:
 
As recently discovered, there is more than one reason why an intelligent driver (as the JDBC standard at least requires in a few instances) needs knowledge of at least some basic structure of the statements it sees before sending them off to the server.

Indeed.

For one thing, PgJDBC needs to be able to parse the passed SQL text to extract individual statements and split up multi-statements server-side so it can bind/parse/execute them separately.

It also has to be able to find placeholders in the query, and not be confused by things that might look like placeholders inside "identifier quoting", 'literal quoting' or $q$dollar quoting$q$.
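What the driver has to do can be sketched roughly as below. This is a simplified, hypothetical scanner, not PgJDBC's actual code; it ignores comments, doubled quotes, and other corners of real SQL lexing, but it shows why the driver must understand the three quoting forms just to find a `?`.

```python
import re

def find_placeholders(sql):
    """Return offsets of '?' placeholders, skipping quoted regions.

    Simplified sketch of what a client driver must do; it does not
    handle comments or doubled quotes ('it''s')."""
    positions = []
    i, n = 0, len(sql)
    while i < n:
        c = sql[i]
        if c == "'":                        # 'literal quoting'
            i += 1
            while i < n and sql[i] != "'":
                i += 1
            i += 1
        elif c == '"':                      # "identifier quoting"
            i += 1
            while i < n and sql[i] != '"':
                i += 1
            i += 1
        elif c == "$":                      # $tag$dollar quoting$tag$
            m = re.match(r"\$[A-Za-z_]*\$", sql[i:])
            if m:
                tag = m.group(0)
                end = sql.find(tag, i + len(tag))
                i = (end + len(tag)) if end != -1 else n
            else:
                i += 1                      # a lone '$' (e.g. $1) is not a quote
        elif c == "?":
            positions.append(i)
            i += 1
        else:
            i += 1
    return positions
```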

I see almost zero utility in teaching the server about client-side abstractions like {? = call } . Half the *point* of those is that the *driver* is supposed to understand them and turn them into *DBMS-specific* syntax. They're escapes.

Furthermore, and particularly in the JDBC example you provide, my first reaction is that it would be a massive encapsulation violation to try and get PostgreSQL to understand "{? = call funcname(args)}" and similar higher level API specifications.

+10

Even if it were easy, it'd be an awful idea. It'd also introduce huge ambiguities as umpteen different client parameter-specifier formats, procedure-call escape formats, etc. all clashed in a hideous and confused mess.

This is the client's job. If the client wants to use %(whatever)s, ?, $1, :paramname, or Σparam☑ as parameter placeholders, we shouldn't have to care. The same goes for call-escapes etc., so long as we provide a sensible, well-defined way for the client driver to do what it needs to implement what its clients expect.

Now, I do think we should one day have a proper CALL statement, but that's for top-level true stored procedures and unrelated to how the client talks to the client driver.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parser extensions (maybe for 10?)

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> The other area where there's room for extension without throwing out the
> whole thing and rebuilding is handling of new top-level statements. We can
> probably dispatch the statement text to a sub-parser provided by an
> extension that registers interest in that statement name when we attempt to
> parse it and fail. Even then I'm pretty sure it won't be possible to do so
> while still allowing multi-statements. I wish we didn't support
> multi-statements, but we're fairly stuck with them.

Well, as I said, I've been there and done that.  Things get sticky
when you notice that those "new top-level statements" would like to
contain sub-clauses (e.g. arithmetic expressions) that should be defined
by the core grammar.  And maybe the extension would also like to
define additions to the expression grammar, requiring a recursive
callback into the extension.  It gets very messy very fast.
        regards, tom lane



Re: Parser extensions (maybe for 10?)

From
Craig Ringer
Date:
On 12 April 2016 at 13:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> The other area where there's room for extension without throwing out the
> whole thing and rebuilding is handling of new top-level statements. We can
> probably dispatch the statement text to a sub-parser provided by an
> extension that registers interest in that statement name when we attempt to
> parse it and fail. Even then I'm pretty sure it won't be possible to do so
> while still allowing multi-statements. I wish we didn't support
> multi-statements, but we're fairly stuck with them.

Well, as I said, I've been there and done that.  Things get sticky
when you notice that those "new top-level statements" would like to
contain sub-clauses (e.g. arithmetic expressions) that should be defined
by the core grammar.  And maybe the extension would also like to
define additions to the expression grammar, requiring a recursive
callback into the extension.  It gets very messy very fast.

Yuck. You'd ping-pong between two parsers, and have to try to exchange sensible starting states. Point taken.

So even that seemingly not-that-bad restricted option turns out to be far from it, which just goes to show what a pit of snakes parser extensibility is...

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parser extensions (maybe for 10?)

From
José Luis Tallón
Date:
On 04/12/2016 06:45 AM, Craig Ringer wrote:
On 12 April 2016 at 12:36, Arcadiy Ivanov <arcadiy@gmail.com> wrote:
 

Is there any interest and/or tips to allow a pluggable parser or at least allow some syntactical pluggability by extensions?
I think this may allow some projects to move towards becoming an extension as opposed to forking the project entirely.

FWIW, I have previously sketched a "syntax rewriter" of sorts: a simple filter which is applied to input before the lexer even sees it.
Quite a lot of "syntax magic" can be achieved by allowing an extension to *hook* into this functionality in order to do some rewriting; if turning one statement into several (a multi-statement) is also allowed, some quite serious emulation could be achieved.
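A minimal sketch of such a pre-lexer filter, in illustrative Python (the hook names are invented; real backend plumbing would look quite different): an ordered list of regex rules applied to the raw statement text before the lexer sees it. The TOP-clause rule is just one example of the kind of dialect emulation meant here.

```python
import re

# Hypothetical pre-lexer rewrite hook: ordered (pattern, replacement)
# rules applied to the raw statement text before lexing.  Pure pattern
# matching on whole statements, as suggested above.
_rewrite_rules = []

def register_rewrite(pattern, replacement):
    _rewrite_rules.append((re.compile(pattern, re.IGNORECASE | re.DOTALL),
                           replacement))

def rewrite(sql):
    for pat, repl in _rewrite_rules:
        sql = pat.sub(repl, sql)
    return sql

# Example rule: emulate another dialect's "SELECT TOP n ..." syntax.
register_rewrite(r"^SELECT TOP (\d+) (.*)$", r"SELECT \2 LIMIT \1")
```

As the reply notes, doing this robustly on pre-lexer input is hard (the pattern has no idea about quoting or comments), which is why the sketch assumes whole-statement matches only.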

I can certainly prepare a small patch for the first commitfest of 9.7 if this sounds viable.



Thanks,

    / J.L.

Re: Parser extensions (maybe for 10?)

From
Craig Ringer
Date:
On 13 April 2016 at 22:11, José Luis Tallón <jltallon@adv-solutions.net> wrote:
 
FWIW, I have previously sketched a "syntax rewriter" of sorts: a simple filter which is applied to input before the lexer even sees it.
Quite a lot of "syntax magic" can be achieved by allowing an extension to *hook* into this functionality in order to do some rewriting; if turning one statement into several (a multi-statement) is also allowed, some quite serious emulation could be achieved.

Some DBMSes have hooks that let you match statements to patterns and/or by hash and replace them with a wholly different statement. It seems to be a feature to work around applications that've fossilized completely, with source code long lost or binaries provided by a vendor who went out of business years ago. Or cranked their prices for the new version.  Or, of course, where the apps team have built a fort with spiked pits, a portcullis, and a molten lead trap between their office and the DBA team's office, where the DBA team crouch behind desks covered in woad clutching their swords and knives. Treaty negotiations have entirely fallen through and they throw rocks instead of talking to each other.

The idea is that in such delightful situations where you can't/won't fix terrible queries in the application you instead match them in the DBMS and transparently rewrite them.

I hear "but we can't change the application" often enough to understand why such hooks exist, but not often enough to particularly want them in PostgreSQL.

I can certainly prepare a small patch for the first commitfest of 9.7 if this sounds viable.

I'd be surprised if it was popular. It's hard to imagine a way to do it robustly when dealing with pre-lexer input, unless you're doing simple pattern matching to identify and replace whole statements.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parser extensions (maybe for 10?)

From
José Luis Tallón
Date:
On 04/13/2016 04:43 PM, Craig Ringer wrote:
On 13 April 2016 at 22:11, José Luis Tallón <jltallon@adv-solutions.net> wrote:
[snip]

I can certainly prepare a small patch for the first commitfest of 9.7 if this sounds viable.

I'd be surprised if it was popular.

I am familiar with some cases where it would have been a lifesaver....

It's hard to imagine a way to do it robustly when dealing with pre-lexer input, unless you're doing simple pattern matching to identify and replace whole statements.

INDEED. No intention whatsoever to do much more than that  O:-)


    / J.L.

Re: Parser extensions (maybe for 10?)

From
Pavel Stehule
Date:
Hi

2016-04-12 7:10 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Arcadiy Ivanov <arcadiy@gmail.com> writes:
> Is there any interest and/or tips to allow a pluggable parser or at
> least allow some syntactical pluggability by extensions?

There is a fair amount of discussion of this in the archives.  The short
answer is that bison can't do it, and "let's replace bison" is a proposal
with a steep hill in front of it --- the pain-to-gain ratio is just not
very favorable.

Forty years ago, I worked on systems with extensible parsers at HP,
wherein plug-in extensions could add clauses very similarly to what
you suggest.  Those designs worked, more or less, but they had a lot
of deficiencies; the most fundamental problem being that any parsing
inconsistencies would only appear through misbehavior at runtime,
which you would only discover if you happened to test a case that behaved
oddly *and* notice that it was not giving the result you expected.
Bison is far from perfect on this angle, because %prec declarations can
produce results you weren't expecting ... but it's at least one order of
magnitude better than any extensible-parser solution I've ever seen.
Usually bison will give you a shift/reduce error if you write something
that has more than one possible interpretation.

I'm interested in possible solutions to this problem, but it's far
harder than it looks.

I can't imagine an extensible parser based on bison. But the parser could be replaced by a custom parser.

This is something like what pgpool or pgbouncer does. An extension could assign its own parser: the custom parser would be called first, and the integrated parser would be used from the extension or as a fallback. This could help with new statements for background workers, and theoretically it could help with extending PostgreSQL's SQL. A custom parser could translate from an SQL1 dialect to SQL2, or from SQL1 to internal calls. A custom parser usually would not implement full SQL - only a few statements.
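In outline, the chain might behave like this (illustrative Python only; the registration mechanism and parser signatures are invented for the sketch): each custom parser either handles the statement or declines, and the integrated parser is the final fallback.

```python
# Illustrative sketch: custom parsers tried first, built-in parser as
# fallback.  All names are invented; nothing here is a PostgreSQL API.

class Declined(Exception):
    """Raised by a custom parser that does not recognize the statement."""

def builtin_parser(sql):
    # Stand-in for the integrated bison parser.
    if sql.upper().startswith(("SELECT", "INSERT", "UPDATE", "DELETE")):
        return ("builtin", sql)
    raise SyntaxError(sql)

_custom_parsers = []

def parse(sql):
    for p in _custom_parsers:
        try:
            return p(sql)        # custom parser is called first
        except Declined:
            continue             # fall through to the next parser
    return builtin_parser(sql)   # integrated parser as fallback

def bgworker_parser(sql):
    # A custom parser implementing only a few statements, e.g. one new
    # statement aimed at a background worker.
    if sql.upper().startswith("PAUSE WORKER"):
        return ("bgworker", sql)
    raise Declined()

_custom_parsers.append(bgworker_parser)
```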

Is this idea more workable?

Regards

Pavel



 


Re: Parser extensions (maybe for 10?)

From
Aleksander Alekseev
Date:
> I can't imagine an extensible parser based on bison. But the parser
> could be replaced by a custom parser.
> 
> This is something like what pgpool or pgbouncer does. An extension
> could assign its own parser: the custom parser would be called first,
> and the integrated parser would be used from the extension or as a
> fallback. This could help with new statements for background workers,
> and theoretically it could help with extending PostgreSQL's SQL. A
> custom parser could translate from an SQL1 dialect to SQL2, or from
> SQL1 to internal calls. A custom parser usually would not implement
> full SQL - only a few statements.
> 
> Is this idea more workable?

What if there are two or more contribs that extend the parser? Can we
be sure that these contribs will not conflict?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/



Re: Parser extensions (maybe for 10?)

From
Pavel Stehule
Date:


2016-04-18 17:25 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> I can't imagine an extensible parser based on bison. But the parser
> could be replaced by a custom parser.
>
> This is something like what pgpool or pgbouncer does. An extension
> could assign its own parser: the custom parser would be called first,
> and the integrated parser would be used from the extension or as a
> fallback. This could help with new statements for background workers,
> and theoretically it could help with extending PostgreSQL's SQL. A
> custom parser could translate from an SQL1 dialect to SQL2, or from
> SQL1 to internal calls. A custom parser usually would not implement
> full SQL - only a few statements.
>
> Is this idea more workable?

What if there are two or more contribs that extend the parser? Can we
be sure that these contribs will not conflict?

It depends - can be allowed only one - like plpgsql extensions, or can be serialized like pg log extensions

Regards

Pavel
 


Re: Parser extensions (maybe for 10?)

From
Aleksander Alekseev
Date:
> It depends - can be allowed only one - like plpgsql extensions, or
> can be serialized like pg log extensions

OK, I understand "can be allowed only one". I doubt it would be a very
useful feature though.

I'm not sure what you mean by "serialized like pg log extensions".
Let's say extension A allows "SELECT ASAP ..." queries and extension B
--- "... ORDER BY RANK". What happens when the user executes a "SELECT
ASAP ... ORDER BY RANK" query?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/



Re: Parser extensions (maybe for 10?)

From
Pavel Stehule
Date:


2016-04-18 17:44 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> It depends - can be allowed only one - like plpgsql extensions, or
> can be serialized like pg log extensions

OK, I understand "can be allowed only one". I doubt it would be a very
useful feature though.

I'm not sure what you mean by "serialized like pg log extensions".
Let's say extension A allows "SELECT ASAP ..." queries and extension B
--- "... ORDER BY RANK". What happens when the user executes a "SELECT
ASAP ... ORDER BY RANK" query?

No - that is not possible. Not with a Bison parser - it cannot work with unknown syntax - so it isn't possible to implement one part with parser A and a second part with parser B.

But we can have parsers P1 and P2. P1 knows string XX, P2 knows YY. The built-in parser (BP) knows SQL.

We can have registered parsers P1, P2, BP.

For the string SELECT:

P1 fails,
P2 fails,
BP processes it.

For the string YY:

P1 fails,
P2 processes it,
BP is not called.

But transformations can be allowed too (though slower).

For the string ZZZZ:

P1 does a transformation to YYY,
P2 does a transformation to SELECT,
BP processes it.





Re: Parser extensions (maybe for 10?)

From
Aleksander Alekseev
Date:
> No - that is not possible. Not with a Bison parser - it cannot work
> with unknown syntax - so it isn't possible to implement one part with
> parser A and a second part with parser B.
> 
> But we can have parsers P1 and P2. P1 knows string XX, P2 knows YY.
> The built-in parser (BP) knows SQL.
> 
> We can have registered parsers P1, P2, BP.
> 
> For the string SELECT:
> 
> P1 fails,
> P2 fails,
> BP processes it.
> 
> For the string YY:
> 
> P1 fails,
> P2 processes it,
> BP is not called.
> 
> But transformations can be allowed too (though slower).
> 
> For the string ZZZZ:
> 
> P1 does a transformation to YYY,
> P2 does a transformation to SELECT,
> BP processes it.

I look at this a little bit differently.

The current pipeline is approximately like this:

```
query string -> LEX -> [lexemes] -> SYNT -> QueryAST -> PLANNER
```

Or in Haskell-like notation:

```
lex :: String -> [Lexeme]
synt :: [Lexeme] -> AST
```

It's reasonably simple to extend a lexer. Let's say the AST type doesn't
change, i.e. extensions provide only syntactic sugar. After desugaring,
the query transforms into good old SELECT, UPDATE, procedure calls, etc. In
this case what an extension does is actually:

```
type Parser = [Lexeme] -> AST
extendParser :: Parser -> Parser
```

Can we guarantee that extensions don't conflict? In fact we can, since
we already do: if all tests pass, there is no conflict.

The only tricky part I see is that sometimes we want:

```
extendParser1 ( extendParser2 ( default ))
```

... and sometimes:

```
extendParser2 ( extendParser1 ( default ))
```

I don't think the order of extensions will matter most of the time. But
we should still provide a mechanism to change this order. For instance,
contribs could provide a default priority for their parser extension.
Extensions with higher priority are applied first. The user can also
resolve conflicts by manually overriding these priorities:

```
select pg_parser_extension_priorities();
select pg_override_parser_extension_priority('some_extension', 100500);
```

I think it should work.
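Concretely, the priority scheme might behave like this (string-level rather than lexeme-level for brevity, and the registration functions are stand-ins for the hypothetical pg_* calls above, not anything that exists):

```python
# Sketch of priority-ordered parser extensions.  Each extension wraps
# the parser it is given; the extension with the highest priority is
# applied first (ends up outermost).  String-level desugaring stands in
# for the [Lexeme] -> AST transformations described above.

_extensions = {}  # name -> (priority, wrapper)

def register_extension(name, priority, wrapper):
    _extensions[name] = (priority, wrapper)

def override_priority(name, priority):
    _, wrapper = _extensions[name]
    _extensions[name] = (priority, wrapper)

def build_parser(default):
    parser = default
    # Wrap in ascending priority order, so the highest priority is outermost.
    for prio, wrapper in sorted(_extensions.values(), key=lambda e: e[0]):
        parser = wrapper(parser)
    return parser

# Extension A desugars "SELECT ASAP", extension B desugars "ORDER BY RANK".
register_extension("asap", 10,
                   lambda nxt: lambda s: nxt(s.replace("SELECT ASAP",
                                                       "SELECT")))
register_extension("rank", 20,
                   lambda nxt: lambda s: nxt(s.replace("ORDER BY RANK",
                                                       "ORDER BY rank")))
```

Whether the order matters is exactly the question raised earlier in the thread; here the two desugarings happen to commute, but they need not in general.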

Thoughts?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/



Re: Parser extensions (maybe for 10?)

From
Simon Riggs
Date:
On 12 April 2016 at 06:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> The other area where there's room for extension without throwing out the
> whole thing and rebuilding is handling of new top-level statements. We can
> probably dispatch the statement text to a sub-parser provided by an
> extension that registers interest in that statement name when we attempt to
> parse it and fail. Even then I'm pretty sure it won't be possible to do so
> while still allowing multi-statements. I wish we didn't support
> multi-statements, but we're fairly stuck with them.

Well, as I said, I've been there and done that.  Things get sticky
when you notice that those "new top-level statements" would like to
contain sub-clauses (e.g. arithmetic expressions) that should be defined
by the core grammar.  And maybe the extension would also like to
define additions to the expression grammar, requiring a recursive
callback into the extension.  It gets very messy very fast.

As Tom says, we can't easily break it down into multiple co-operating pieces, so let's forget that as unworkable.

What is possible is a whole new grammar... for example if we imagine

 SET client_language_path = 'foo, postgresql'

It works similarly to search_path, but is not userset. We try to parse incoming statements against the foo parser first; if that fails, we try postgresql.
The default setting would simply be 'postgresql', so no match -> syntax error.

We could make that easier by making the postgresql parser a plugin itself. So to produce a new one you just copy the files, modify them as needed then insert a new record into pg_language as an extension.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Parser extensions (maybe for 10?)

From
Oleg Bartunov
Date:


On Tue, Apr 19, 2016 at 1:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 12 April 2016 at 06:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> The other area where there's room for extension without throwing out the
> whole thing and rebuilding is handling of new top-level statements. We can
> probably dispatch the statement text to a sub-parser provided by an
> extension that registers interest in that statement name when we attempt to
> parse it and fail. Even then I'm pretty sure it won't be possible to do so
> while still allowing multi-statements. I wish we didn't support
> multi-statements, but we're fairly stuck with them.

Well, as I said, I've been there and done that.  Things get sticky
when you notice that those "new top-level statements" would like to
contain sub-clauses (e.g. arithmetic expressions) that should be defined
by the core grammar.  And maybe the extension would also like to
define additions to the expression grammar, requiring a recursive
callback into the extension.  It gets very messy very fast.

As Tom says, we can't easily break it down into multiple co-operating pieces, so let's forget that as unworkable.

What is possible is a whole new grammar... for example if we imagine

 SET client_language_path = 'foo, postgresql'

This works similarly to search_path, but is not USERSET. We try to parse incoming statements against the foo parser first; if that fails, we try the postgresql parser.
The default setting would simply be 'postgresql', so no match -> syntax error.


That's interesting. I'd add a parse_error_handler, which actually processes the syntax error.

SET client_language_path = 'foo, postgresql, parse_error_handler'
 
We could make that easier by making the postgresql parser a plugin itself. So to produce a new one you just copy the files, modify them as needed, then insert a new record into pg_language as an extension.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Parser extensions (maybe for 10?)

From
Aleksander Alekseev
Date:
> As Tom says, we can't easily break it down into multiple co-operating
> pieces, so lets forget that as unworkable.

I'm sorry, but didn't I just demonstrate the opposite? If so, it's very
easy to prove - give a counterexample. As I understand it, the approach
I described handles the cases Tom named just fine. In fact, the idea of
transforming ASTs (a.k.a. metaprogramming) has been used successfully by
programmers for about 50 years now.

(As a side note - I'm not a native English speaker, but I believe this
type of reasoning is known as an "argument from authority".)

> What is possible is a whole new grammar... for example if we imagine
> 
>  SET client_language_path = 'foo, postgresql'
> 
> Works similar to search_path, but not userset. We try to parse
> incoming statements against the foo parser first, if that fails we
> try postgresql. The default setting would be simply 'postgresql', so
> no match -> syntax error.
> 
> We could make that easier by making the postgresql parser a plugin
> itself. So to produce a new one you just copy the files, modify them
> as needed then insert a new record into pg_language as an extension.
> 

I think that's not an extension but a replacement of the grammar. This
approach implies that every extension implements a parser from scratch.
I'm not sure anyone will do that in practice just to change SQL syntax a
little.

I'm not saying that such a feature would be completely worthless. But
why not go a step further and implement pluggable protocols?
E.g. make PostgreSQL compatible with MySQL and/or MongoDB? Or maybe
implement an SQL dialect that forbids implicit type conversion. Or add a
built-in connection pooling mechanism. I wonder, though, whether all of
this could already be implemented as an extension without any changes to
the PostgreSQL core.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/



Re: Parser extensions (maybe for 10?)

From
Andres Freund
Date:
On 2016-04-19 15:32:07 +0300, Aleksander Alekseev wrote:
> > As Tom says, we can't easily break it down into multiple co-operating
> > pieces, so lets forget that as unworkable.
> 
> I'm sorry but didn't I just demonstrate the opposite?

I doubt it.

> If so it's very
> easy to prove - give a counterexample. As I understand approach I
> described handles cases named by Tom just fine. In fact the idea of
> transforming ASTs (a.k.a metaprogramming) is successfully used by
> programmers for about 50 years now.
> 
> (As a side note - I'm not a native English speaker but I believe such
> type of logic is known as "argument from authority".)

And the above is called an ad-hominem.



Re: Parser extensions (maybe for 10?)

From
Andres Freund
Date:
On 2016-04-19 12:20:03 +0300, Aleksander Alekseev wrote:
> Can we guarantee that extensions don't conflict? In fact we can since
> we already do it. If all tests pass there is no conflict.

How does that follow? Even if you were to test all possible extensions
together - obviously not possible - how do passing tests prove the
grammar to be conflict-free?

Andres



Re: Parser extensions (maybe for 10?)

From
Aleksander Alekseev
Date:
> On 2016-04-19 12:20:03 +0300, Aleksander Alekseev wrote:
> > Can we guarantee that extensions don't conflict? In fact we can
> > since we already do it. If all tests pass there is no conflict.
> 
> How does that follow? Even if you were to test all possible extensions
> together - obviously not possible - how do passing tests prove the
> grammar to be conflict free?

Do we currently test that all existing extensions work together? No.
And in fact it doesn't matter whether they all work together. What
matters is that the concrete subset of extensions chosen by a given user
works together. We don't guarantee that extensions are bug-free either.
In fact, I'm quite sure there are many bugs in PostgreSQL extensions and
in PostgreSQL itself. But if `make check` passes, the extension probably
doesn't have more bugs than usual. Why should syntax extensions suddenly
be an exception to these rules?

Also, I would like to point out that the suggested approach is only about
syntactic sugar. The resulting desugared query would be the same as usual.
If it's invalid, we just discard it.


For the record - I'm not saying that this SQL-extending feature should
necessarily be implemented. Frankly, I'm personally quite against it.
I can't think of any real cases where it would be very useful, and I
don't think the feature is worth the effort, not to mention the ongoing
support. All I'm saying is that it could be done using methods that have
been well known for decades.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/



Re: Parser extensions (maybe for 10?)

From
Stas Kelvich
Date:
> On 12 Apr 2016, at 07:36, Arcadiy Ivanov <arcadiy@gmail.com> wrote:
>
> [
>   DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } } |
>   DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } |
>   DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name )
> ]
> [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]

A less invasive way to achieve the same thing is to use the WITH
parameter list that already exists in CREATE TABLE, CREATE INDEX, etc.

Like this:

create table foo(id int) with (distributed_by='id', nodes='node1, node2');

It's easier to allow extensions to define custom parameters for WITH than
to extend the parser.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: Parser extensions (maybe for 10?)

From
Pavel Stehule
Date:


2016-04-19 12:49 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:
On 12 April 2016 at 06:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> The other area where there's room for extension without throwing out the
> whole thing and rebuilding is handling of new top-level statements. We can
> probably dispatch the statement text to a sub-parser provided by an
> extension that registers interest in that statement name when we attempt to
> parse it and fail. Even then I'm pretty sure it won't be possible to do so
> while still allowing multi-statements. I wish we didn't support
> multi-statements, but we're fairly stuck with them.

Well, as I said, I've been there and done that.  Things get sticky
when you notice that those "new top-level statements" would like to
contain sub-clauses (e.g. arithmetic expressions) that should be defined
by the core grammar.  And maybe the extension would also like to
define additions to the expression grammar, requiring a recursive
callback into the extension.  It gets very messy very fast.

As Tom says, we can't easily break it down into multiple co-operating pieces, so let's forget that as unworkable.

What is possible is a whole new grammar... for example if we imagine

 SET client_language_path = 'foo, postgresql'

This works similarly to search_path, but is not USERSET. We try to parse incoming statements against the foo parser first; if that fails, we try the postgresql parser.
The default setting would simply be 'postgresql', so no match -> syntax error.

The idea is good. I don't understand the name "client_language_path" - it is not clear: a) this is a server-side feature, b) we use the term "language" for PLs, so almost any other term would be better.
 

We could make that easier by making the postgresql parser a plugin itself. So to produce a new one you just copy the files, modify them as needed, then insert a new record into pg_language as an extension.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Parser extensions (maybe for 10?)

From
Robert Haas
Date:
On Tue, Apr 19, 2016 at 10:05 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-19 15:32:07 +0300, Aleksander Alekseev wrote:
>> > As Tom says, we can't easily break it down into multiple co-operating
>> > pieces, so lets forget that as unworkable.
>>
>> I'm sorry but didn't I just demonstrate the opposite?
>
> I doubt it.
>
>> If so it's very
>> easy to prove - give a counterexample. As I understand approach I
>> described handles cases named by Tom just fine. In fact the idea of
>> transforming ASTs (a.k.a metaprogramming) is successfully used by
>> programmers for about 50 years now.
>>
>> (As a side note - I'm not a native English speaker but I believe such
>> type of logic is known as "argument from authority".)
>
> And the above is called an ad-hominem.

An "ad hominem" attack means attacking the person rather than the
substance of the issue, but I don't think Aleksander did that.  I'm not
sure why you think what he wrote was out of line.  It reads OK to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parser extensions (maybe for 10?)

From
Aleksander Alekseev
Date:
> > And the above is called an ad-hominem.
> 
> An "ad hominem" attack means against the person rather than on the
> topic of the issue, but I don't think Aleksander did that.  I'm not
> sure why you think what he wrote was out of line.  It reads OK to me.

Frankly, when I re-read my own e-mails, I sometimes find them a little
dry, to say the least. But it's not intentional. I'm just having some
difficulty expressing myself in my second language. I should probably
use more words like "please" and "thank you" to soften this effect. My
sincere apologies to anyone who was offended in any way.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/