Thread: exporting raw parser

exporting raw parser

From

Tatsuo Ishii

Date:

26 May 2010, 22:02:58

I'm thinking about exporting the raw parser and related modules as a C
library. Though this will not be an immediate benefit to PostgreSQL
itself, it will be a huge benefit for any PostgreSQL
applications/middleware that need to parse SQL statements.

For example, pgpool-II parses queries to know whether a query is a read
query or not. In other cases, it needs to know whether a SELECT statement
includes any temporal construct such as CURRENT_TIMESTAMP. This is not a
trivial job since SQL grammar is complex. For this purpose pgpool-II
copies the PostgreSQL parser code and uses it. Of course maintaining that
part is a pain since PostgreSQL's parser changes from release to
release.
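
As a rough illustration of the kind of check described above, here is a
minimal sketch in Python over a hand-built toy tree. The node shapes, the
field names, and the helpers walk, is_read_query and
uses_temporal_function are all hypothetical; they do not mirror
PostgreSQL's real node structs or pgpool-II's code, and a real raw parse
tree may represent CURRENT_TIMESTAMP differently.

```python
# Toy model only: node shapes and names are hypothetical, not PostgreSQL's.
TEMPORAL_FUNCTIONS = {"now", "current_timestamp", "current_date", "current_time"}


def walk(node):
    """Yield every dict node in a nested dict/list tree, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def is_read_query(tree):
    """Treat a plain SELECT (no INTO, no row locking) as a read query."""
    return (tree.get("type") == "SelectStmt"
            and not tree.get("into")
            and not tree.get("locking"))


def uses_temporal_function(tree):
    """True if any function-call node names a known temporal function."""
    return any(n.get("type") == "FuncCall"
               and n.get("name", "").lower() in TEMPORAL_FUNCTIONS
               for n in walk(tree))


# Hand-built stand-in for: SELECT id, now() FROM t
tree = {
    "type": "SelectStmt",
    "targets": [
        {"type": "ColumnRef", "name": "id"},
        {"type": "FuncCall", "name": "now", "args": []},
    ],
    "from": [{"type": "RangeVar", "relname": "t"}],
}

print(is_read_query(tree), uses_temporal_function(tree))
```

Both checks need only the shape of the statement, which is why a tree
without catalog lookups can be enough for this particular decision.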

I believe that not only pgpool-II but also other connection pooling
middleware needs an SQL parser (pgbouncer?). Also any tool which accepts
SQL statements as its input would need an SQL parser (pgAdmin?). For them
an exported raw parser will be a huge benefit.

The implementation will not be very difficult since pgpool-II has
already done most of the necessary work for this:

- extract the raw parser part from the parser directory, which includes gram.y, scan.l and keywords.c

- extract utility functions needed to handle the raw parse tree: nodes/nodes.c, makefunc.c etc.

- create an exportable version of the memory manager

- create exportable exception handling routines (i.e. elog)

- wrap all of the above into a libXX*.so

I think this work is essentially a refactoring of the existing raw
parser, and will not add performance degradation or maintenance cost.

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: exporting raw parser

From

Josh Berkus

Date:

26 May 2010, 22:18:07

> I think this work is essentially a refactoring of the existing raw
> parser, and will not add performance degradation or maintenance cost.
> 
> Comments?

You should call it "libSQL"; who knows, other DB projects might want it.
They seem to borrow our parser enough as it is.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Re: exporting raw parser

From

Tom Lane

Date:

26 May 2010, 22:45:39

Tatsuo Ishii <ishii@postgresql.org> writes:
> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit to PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middleware that need to parse SQL statements.

As was already discussed, I don't believe that premise.  None of the
applications you cite would be able to make use of the raw parser
output, because it doesn't contain the semantic information they need.
If what you actually meant was the analyzed parse tree, that *might*
serve the need depending on just what is wanted (in particular,
properties that could be affected by the expansion of views or
inlineable functions could still not be determined reliably).
But you can't have that without access to the current system catalog
contents.

In any case there's the serious problem that we simply are not going
to promise that the parser output representation is stable.  We've
changed it many times in the past and will do so in the future.

> I think this work is essentially a refactoring of the existing raw
> parser, and will not add performance degradation or maintenance cost.

Quite aside from whether the result would be of any use or not, that
opinion is obviously wrong.  This would be at least as difficult to
maintain as ecpg ... which has been an enormous time sink.
        regards, tom lane

Re: exporting raw parser

From

Takahiro Itagaki

Date:

26 May 2010, 23:00:15

Tatsuo Ishii <ishii@postgresql.org> wrote:

> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit to PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middleware that need to parse SQL statements.

As I read it, your proposal says "postgres.exe" will link to "libSQL.dll",
and "pgpool.exe" will also link to the DLL, right?

I think it is reasonable, but I'm not sure what part of postgres
should be in the DLL. Obviously we should avoid code duplication
between the DLL and "postgres.exe".

> - create an exportable version of memory manager
> - create an exportable exception handling routines(i.e. elog)

Are there any other issues? For example,
  - How to split headers for raw parser nodes?
  - Which module do we define T_xxx enumerations and support functions?
    (outfuncs, readfuncs, copyfuncs, and equalfuncs)

The proposal will be acceptable only when all of the technical issues
are solved. The libSQL should also be available in stand-alone.
It should not be a collection of half-baked functions.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: exporting raw parser

From

Tatsuo Ishii

Date:

26 May 2010, 23:17:07

> As was already discussed, I don't believe that premise.  None of the
> applications you cite would be able to make use of the raw parser
> output, because it doesn't contain the semantic information they need.
> If what you actually meant was the analyzed parse tree, that *might*
> serve the need depending on just what is wanted (in particular,
> properties that could be affected by the expansion of views or
> inlineable functions could still not be determined reliably).
> But you can't have that without access to the current system catalog
> contents.

No, what pgpool-II needs is a raw parse tree. When it needs info in the
system catalog, it sends SELECT to PostgreSQL. So that would be no
problem.

> In any case there's the serious problem that we simply are not going
> to promise that the parser output representation is stable.  We've
> changed it many times in the past and will do so in the future.

That's acceptable at least for pgpool-II. Basically what I need is
a) SQL statement type, b) target tables, c) target columns (functions)
etc., which seem pretty stable among versions. Even if PostgreSQL
changes the representation of the parser output, pgpool-II could ask the
PostgreSQL version and could understand the different
representations. Pgpool-II has already done this with the system
catalog changes.

Another good thing is that the parser provides nice APIs to process the
parse tree: raw_expression_tree_walker, outfuncs and macros. Those will
absorb the version differences.
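
To make the "ask the version and understand the different
representations" idea concrete, here is a minimal sketch in Python. Both
tree layouts are invented for illustration and are not taken from any
real PostgreSQL release; tables_v1, tables_v2, ACCESSORS and
target_tables are hypothetical names. The version argument is assumed to
be an integer in the style of the server_version_num setting.

```python
# Hypothetical layouts: neither shape is taken from a real PostgreSQL release.
def tables_v1(stmt):
    # Assumed older layout: FROM items carry the table name directly.
    return [item["relname"] for item in stmt["from"]]


def tables_v2(stmt):
    # Assumed newer layout: the name moved into a nested "relation" node.
    return [item["relation"]["name"] for item in stmt["from"]]


# (minimum version number, accessor), newest first.
ACCESSORS = [(90000, tables_v2), (0, tables_v1)]


def target_tables(stmt, server_version_num):
    """Pick the accessor that matches the server version, then apply it."""
    for minimum, accessor in ACCESSORS:
        if server_version_num >= minimum:
            return accessor(stmt)
    raise ValueError("unsupported server version")


old_tree = {"from": [{"relname": "t"}]}
new_tree = {"from": [{"relation": {"name": "t"}}]}
print(target_tables(old_tree, 80400), target_tables(new_tree, 90000))
```

Callers only ever see target_tables, so a layout change costs one new
accessor and one new table entry rather than changes throughout the
caller.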

> Quite aside from whether the result would be of any use or not, that
> opinion is obviously wrong.  This would be at least as difficult to
> maintain as ecpg ... which has been an enormous time sink.

From reading README.parser of ecpg, the maintenance problem with ecpg
seems to come from the fact that it needs to modify the grammar. My
proposal does not require grammar changes. So I don't understand why you
think this would be as difficult as ecpg.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: exporting raw parser

From

Tatsuo Ishii

Date:

26 May 2010, 23:28:34

> As I read it, your proposal says "postgres.exe" will link to "libSQL.dll",
> and "pgpool.exe" will also link to the DLL, right?

Perhaps.

> I think it is reasonable, but I'm not sure what part of postgres
> should be in the DLL. Obviously we should avoid code duplication
> between the DLL and "postgres.exe".
>
> > - create an exportable version of memory manager
> > - create an exportable exception handling routines(i.e. elog)
> 
> Are there any other issues? For example,
>   - How to split headers for raw parser nodes?
>   - Which module do we define T_xxx enumerations and support functions?
>     (outfuncs, readfuncs, copyfuncs, and equalfuncs)
> 
> The proposal will be acceptable only when all of the technical issues
> are solved. The libSQL should also be available in stand-alone.
> It should not be a collection of half-baked functions.

What do you mean by "should also be available in stand-alone"? If you
want a more abstract API than "libSQL", you could invent such a thing
based on it as much as you like. IMO anything needed to parse/operate
on the raw parse tree should be in libSQL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: exporting raw parser

From

Takahiro Itagaki

Date:

26 May 2010, 23:49:09

Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > The proposal will be acceptable only when all of the technical issues
> > are solved. The libSQL should also be available in stand-alone.
> > It should not be a collection of half-baked functions.
> 
> What do you mean by "should also be available in stand-alone"? If you
> want a more abstract API than "libSQL", you could invent such a thing
> based on it as much as you like. IMO anything needed to parse/operate
> on the raw parse tree should be in libSQL.

My "stand-alone" means libSQL can be used from many modules
without duplicated code. For example, copy routines for raw
parse trees should be in the DLL rather than in postgres.exe.

Then, we need to consider products other than pgpool. Who will
use the DLL? If pgpool is the only user, we might not allow
modifying core code for only one use case. More research beyond
pgpool is required to decide the interface routines for libSQL.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: exporting raw parser

From

Tatsuo Ishii

Date:

27 May 2010, 00:02:10

> My "stand-alone" means libSQL can be used from many modules
> without duplicated code. For example, copy routines for raw
> parse trees should be in the DLL rather than in postgres.exe.
> 
> Then, we need to consider products other than pgpool. Who will
> use the DLL? If pgpool is the only user, we might not allow
> modifying core code for only one use case. More research beyond
> pgpool is required to decide the interface routines for libSQL.

If the only user of the new API were pgpool-II, I would not have made the
proposal in the first place. It would be a waste of time and I would rather
keep on borrowing the parser code. I thought there were several people
in the cluster meeting who needed the API as well. If somebody who
made such a vote in the meeting is on the list, please express your
opinion on the API.

I'm not in the position of speaking for other products.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: exporting raw parser

From

Jan Wieck

Date:

01 June 2010, 13:31:10

On 5/26/2010 10:16 PM, Tatsuo Ishii wrote:
>> As was already discussed, I don't believe that premise.  None of the
>> applications you cite would be able to make use of the raw parser
>> output, because it doesn't contain the semantic information they need.
>> If what you actually meant was the analyzed parse tree, that *might*
>> serve the need depending on just what is wanted (in particular,
>> properties that could be affected by the expansion of views or
>> inlineable functions could still not be determined reliably).
>> But you can't have that without access to the current system catalog
>> contents.
> 
> No, what pgpool-II needs is a raw parse tree. When it needs info in the
> system catalog, it sends SELECT to PostgreSQL. So that would be no
> problem.

But doesn't it need that parse tree BEFORE it makes the decision, which 
node to execute the query on?

The parser needs the system catalog in order to create a parse tree. 
Where would that stand-alone library version of the parser get the 
catalog information from? Don't you need to know which user defined 
function in the query is volatile?
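
The ordering at issue here can be sketched as follows, assuming (as the
proposal does) that a grammar-only pass can yield function names without
catalog access, and that volatility is then fetched separately from the
pg_proc catalog. The function safe_to_load_balance and the sample mapping
are hypothetical; the sketch also ignores overloading and schema
qualification, which a real lookup would have to handle.

```python
# Assumed inputs: function names already pulled from a raw parse tree, plus a
# mapping built from catalog rows such as
#   SELECT proname, provolatile FROM pg_proc
# where 'i' = immutable, 's' = stable, 'v' = volatile.
def safe_to_load_balance(function_names, volatility_by_name):
    """Return False as soon as any called function may be volatile."""
    for name in function_names:
        # An unknown function is treated as volatile, the conservative choice.
        if volatility_by_name.get(name, "v") == "v":
            return False
    return True


# Sample mapping; a real one would come from the catalog query above.
catalog = {"lower": "i", "now": "s", "nextval": "v"}

print(safe_to_load_balance(["lower", "now"], catalog))
print(safe_to_load_balance(["nextval"], catalog))
print(safe_to_load_balance(["my_udf"], catalog))
```

The tree supplies the names and the catalog supplies their properties, so
the routing decision can only be made after both steps have run.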


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

Re: exporting raw parser

From

Daniel Farina

Date:

07 June 2010, 04:17:20

On Wed, May 26, 2010 at 6:02 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit to PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middleware that need to parse SQL statements.

In the past I and people I have known/worked with have made strategic
use of UDFs running on a live server that return the parse tree,
semantically analyzed tree, and planned tree (I think) outNode textual
representation for various projects, and found them highly useful.
Syntactic, semantic, and operational meaning of a query was useful for
our projects.

Some of this code was linked with the server, and so reading the node
using Postgres' parser was easy. Otherwise, a small parser needed to be
written for external projects. Perhaps a slightly more ideal state of
affairs would be:

* These hooks to acquire the syntactic/semantic/planned trees would be
bundled "for free"
* When writing code not linked against the server, a more common
serialization format, ala JSON or whatnot, would be available
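
A minimal sketch of the second point, in Python: a tree made of plain
dicts and lists can be round-tripped through JSON by a client that is not
linked against the server. The node shape and the helpers to_json and
from_json are hypothetical; this does not reproduce the outNode text
format.

```python
import json

# Hand-built stand-in for a parse tree; the node shape is hypothetical.
tree = {
    "type": "SelectStmt",
    "targets": [{"type": "ColumnRef", "name": "id"}],
    "from": [{"type": "RangeVar", "relname": "t"}],
}


def to_json(node):
    """Serialize a dict/list tree; sort_keys keeps the text form stable."""
    return json.dumps(node, sort_keys=True)


def from_json(text):
    """Rebuild the tree in a client that is not linked against the server."""
    return json.loads(text)


wire = to_json(tree)
print(wire)
```

The client needs only a stock JSON library rather than a hand-written
parser for the node text format.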

A more ambitious project that I don't think is in the scope of any
initial implementation would be to allow for cross referencing of
these compilation passes, similar to how GNU Bison allows you to
interrogate for the position of a lexeme when reporting errors. In my
experience, code that mangles one layer (say, semantic, or
harder yet, plan) has a hard time producing the best error message because
getting from a node at the "bottom" to the right lexeme(s) at the "top" is
very cumbersome. One could imagine this being useful for other
purposes too, but that is how I felt it firsthand. Feels a lot harder,
though.

fdr

Re: exporting raw parser

From

Dimitri Fontaine

Date:

07 June 2010, 05:13:19

Daniel Farina <drfarina@acm.org> writes:
> Some of this code was linked with the server, and so reading the node
> using Postgres' parser was easy. Otherwise, a small parser needed to be
> written for external projects. Perhaps a slightly more ideal state of
> affairs would be:
>
> * These hooks to acquire the syntactic/semantic/planned trees would be
> bundled "for free"
> * When writing code not linked against the server, a more common
> serialization format, ala JSON or whatnot

Access to this data has been discussed with respect to DDL
triggers too. You want to be able to know what exactly is being
executed, and against what objects.

And you want to be able to abuse this information from either a C-coded
server function or a PLpgSQL trigger. I guess the WIP JSON datatype
would help a lot even when working from within the server, as that does
not mean working in C.

Regards,
-- 
dim