Thread: exporting raw parser
I'm thinking about exporting the raw parser and related modules as a C
library. Though this will not be an immediate benefit to PostgreSQL
itself, it will be a huge benefit for any PostgreSQL
applications/middleware that need to parse SQL statements. For example,
pgpool-II parses queries to know whether a query is a read query or not.
In other cases, it needs to know whether a SELECT statement includes any
temporal construct such as CURRENT_TIMESTAMP. This is not a trivial job,
since the SQL grammar is complex. For this purpose pgpool-II copies the
PostgreSQL parser code and uses it. Of course, maintaining that copy is a
pain, since PostgreSQL's parser changes from release to release.

I believe not only pgpool-II but other connection pooling middleware
needs an SQL parser as well (pgbouncer?). Any tool which accepts SQL
statements as its input would also need an SQL parser (pgAdmin?). For
them an exported raw parser will be a huge benefit.

The implementation will not be very difficult, since pgpool-II has
already done most of the necessary work for this:

- extract the raw parser part from the parser directory, which includes
  gram.y, scan.l and keywords.c
- extract the utility functions needed to handle the raw parse tree:
  nodes/nodes.c, makefuncs.c, etc.
- create an exportable version of the memory manager
- create exportable exception handling routines (i.e. elog)
- wrap all of the above into a libXX*.so

I think this work is essentially a refactoring of the existing raw
parser, and will add neither performance degradation nor maintenance
cost.

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> I think this work is essentially a refactoring of the existing raw
> parser, and will add neither performance degradation nor maintenance
> cost.
>
> Comments?

You should call it "libSQL"; who knows, other DB projects might want it.
They seem to borrow our parser enough as it is.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Tatsuo Ishii <ishii@postgresql.org> writes:
> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit to PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middleware that need to parse SQL statements.

As was already discussed, I don't believe that premise. None of the
applications you cite would be able to make use of the raw parser
output, because it doesn't contain the semantic information they need.
If what you actually meant was the analyzed parse tree, that *might*
serve the need depending on just what is wanted (in particular,
properties that could be affected by the expansion of views or
inlineable functions could still not be determined reliably).
But you can't have that without access to the current system catalog
contents.

In any case there's the serious problem that we simply are not going
to promise that the parser output representation is stable. We've
changed it many times in the past and will do so in the future.

> I think this work is essentially a refactoring of the existing raw
> parser, and will add neither performance degradation nor maintenance
> cost.

Quite aside from whether the result would be of any use or not, that
opinion is obviously wrong. This would be at least as difficult to
maintain as ecpg ... which has been an enormous time sink.

regards, tom lane
Tatsuo Ishii <ishii@postgresql.org> wrote:
> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit to PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middleware that need to parse SQL statements.

As I read your proposal, "postgres.exe" will link to "libSQL.dll",
and "pgpool.exe" will also link to the DLL, right?
I think it is reasonable, but I'm not sure what part of postgres
should be in the DLL. Obviously we should avoid code duplication
between the DLL and "postgres.exe".

> - create an exportable version of the memory manager
> - create exportable exception handling routines (i.e. elog)

Are there any other issues? For example,
- How do we split the headers for raw parser nodes?
- In which module do we define the T_xxx enumerations and support
  functions? (outfuncs, readfuncs, copyfuncs, and equalfuncs)

The proposal will be acceptable only when all of the technical issues
are solved. The libSQL should also be available in stand-alone.
It should not be a collection of half-baked functions.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
> As was already discussed, I don't believe that premise. None of the
> applications you cite would be able to make use of the raw parser
> output, because it doesn't contain the semantic information they need.
> If what you actually meant was the analyzed parse tree, that *might*
> serve the need depending on just what is wanted (in particular,
> properties that could be affected by the expansion of views or
> inlineable functions could still not be determined reliably).
> But you can't have that without access to the current system catalog
> contents.

No, what pgpool-II needs is a raw parse tree. When it needs info from
the system catalog, it sends a SELECT to PostgreSQL. So that would be no
problem.

> In any case there's the serious problem that we simply are not going
> to promise that the parser output representation is stable. We've
> changed it many times in the past and will do so in the future.

That's acceptable, at least for pgpool-II. Basically what I need is
a) the SQL statement type, b) the target tables, c) the target columns
(functions), etc., which seem pretty stable across versions. Even if
PostgreSQL changes the representation of the parser output, pgpool-II
could ask for the PostgreSQL version and could understand the different
representations. Pgpool-II has already done this for the system catalog
changes. Another good thing is that the parser provides nice APIs for
processing the parse tree: raw_expression_tree_walker, outfuncs and
macros. Those will absorb the version differences.

> Quite aside from whether the result would be of any use or not, that
> opinion is obviously wrong. This would be at least as difficult to
> maintain as ecpg ... which has been an enormous time sink.

From reading README.parser of ecpg, the maintenance problem with ecpg
seems to come from its need to modify the grammar. My proposal does not
require grammar changes. So I don't understand why you think this
would be as difficult as ecpg.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
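To make the kind of tree inspection described above concrete, here is a minimal sketch of a walker over a toy parse tree. Everything in it is hypothetical: `ToyNode`, `ToyTag`, `toy_tree_walker` and the function names checked are illustrative stand-ins, not PostgreSQL's real node types or the actual `raw_expression_tree_walker` signature, and the "read query" test is deliberately naive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified node tags (not PostgreSQL's NodeTag). */
typedef enum ToyTag { TOY_SELECT, TOY_INSERT, TOY_FUNC_CALL, TOY_COLUMN_REF } ToyTag;

typedef struct ToyNode {
    ToyTag tag;
    const char *name;              /* function or column name; NULL for statements */
    const struct ToyNode *child;   /* first child, or NULL */
    const struct ToyNode *sibling; /* next sibling, or NULL */
} ToyNode;

typedef bool (*toy_walker_fn)(const ToyNode *node, void *context);

/* Depth-first walk over a node and its siblings; stops as soon as the
 * callback returns true and reports whether that happened. */
bool toy_tree_walker(const ToyNode *node, toy_walker_fn fn, void *context)
{
    for (; node != NULL; node = node->sibling) {
        if (fn(node, context))
            return true;
        if (toy_tree_walker(node->child, fn, context))
            return true;
    }
    return false;
}

/* Callback: is this node a call to a time-dependent function?
 * The two names below are assumptions chosen for illustration. */
static bool is_temporal_call(const ToyNode *node, void *context)
{
    (void) context;
    return node->tag == TOY_FUNC_CALL && node->name != NULL &&
           (strcmp(node->name, "now") == 0 ||
            strcmp(node->name, "current_timestamp") == 0);
}

/* Naive classification: only the top-level statement type is checked. */
bool toy_is_read_query(const ToyNode *stmt)
{
    assert(stmt != NULL);
    return stmt->tag == TOY_SELECT;
}

bool toy_has_temporal_call(const ToyNode *stmt)
{
    assert(stmt != NULL);
    return toy_tree_walker(stmt, is_temporal_call, NULL);
}
```

A caller would build or receive the tree, then ask `toy_is_read_query(stmt)` and `toy_has_temporal_call(stmt)` before choosing where to send the query. The statement-type test alone is not a safe read-only check in practice, since a SELECT can call functions that write.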
> As I read your proposal, "postgres.exe" will link to "libSQL.dll",
> and "pgpool.exe" will also link to the DLL, right?

Perhaps.

> I think it is reasonable, but I'm not sure what part of postgres
> should be in the DLL. Obviously we should avoid code duplication
> between the DLL and "postgres.exe".
>
> > - create an exportable version of the memory manager
> > - create exportable exception handling routines (i.e. elog)
>
> Are there any other issues? For example,
> - How do we split the headers for raw parser nodes?
> - In which module do we define the T_xxx enumerations and support
>   functions? (outfuncs, readfuncs, copyfuncs, and equalfuncs)
>
> The proposal will be acceptable only when all of the technical issues
> are solved. The libSQL should also be available in stand-alone.
> It should not be a collection of half-baked functions.

What do you mean by "should also be available in stand-alone"? If you
want a more abstract API than "libSQL", you could invent such a thing
based on it as much as you like. IMO anything needed to parse or operate
on the raw parse tree should be in libSQL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> > The proposal will be acceptable only when all of the technical issues
> > are solved. The libSQL should also be available in stand-alone.
> > It should not be a collection of half-baked functions.
>
> What do you mean by "should also be available in stand-alone"? If you
> want a more abstract API than "libSQL", you could invent such a thing
> based on it as much as you like. IMO anything needed to parse or operate
> on the raw parse tree should be in libSQL.

By "stand-alone" I mean that libSQL can be used from many modules
without duplicated code. For example, copy routines for raw
parse trees should be in the DLL rather than in postgres.exe.

Then, we need to consider products other than pgpool. Who will
use the DLL? If pgpool is the only user, we might not be allowed to
modify core code for only one use case. More research beyond
pgpool is required to decide the interface routines for libSQL.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
> By "stand-alone" I mean that libSQL can be used from many modules
> without duplicated code. For example, copy routines for raw
> parse trees should be in the DLL rather than in postgres.exe.
>
> Then, we need to consider products other than pgpool. Who will
> use the DLL? If pgpool is the only user, we might not be allowed to
> modify core code for only one use case. More research beyond
> pgpool is required to decide the interface routines for libSQL.

If pgpool-II were the only user of the new API, I would not have made
the proposal in the first place. It would be a waste of time and I would
rather keep on borrowing the parser code. I thought there were several
people in the cluster meeting who needed the API as well. If somebody
who voted that way in the meeting is on the list, please express your
opinion on the API. I'm not in a position to speak for other products.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On 5/26/2010 10:16 PM, Tatsuo Ishii wrote:
>> As was already discussed, I don't believe that premise. None of the
>> applications you cite would be able to make use of the raw parser
>> output, because it doesn't contain the semantic information they need.
>> If what you actually meant was the analyzed parse tree, that *might*
>> serve the need depending on just what is wanted (in particular,
>> properties that could be affected by the expansion of views or
>> inlineable functions could still not be determined reliably).
>> But you can't have that without access to the current system catalog
>> contents.
>
> No, what pgpool-II needs is a raw parse tree. When it needs info from
> the system catalog, it sends a SELECT to PostgreSQL. So that would be no
> problem.

But doesn't it need that parse tree BEFORE it decides which node to
execute the query on? The parser needs the system catalog in order to
create a parse tree. Where would that stand-alone library version of the
parser get the catalog information from? Don't you need to know which
user-defined function in the query is volatile?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
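The volatility question can be illustrated with a small sketch. It assumes the middleware has already extracted function names from a query and can resolve each one through some catalog lookup; here that lookup is a hypothetical in-memory table (`ToyCatalog`) standing in for a query against `pg_proc`, whose `provolatile` column uses the codes `i`, `s` and `v`. The names `toy_catalog_lookup` and `can_route_to_replica` are invented for illustration, and matching by name alone ignores overloading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same one-letter codes as pg_proc.provolatile. */
typedef enum Volatility {
    VOL_IMMUTABLE = 'i',
    VOL_STABLE = 's',
    VOL_VOLATILE = 'v'
} Volatility;

/* Hypothetical lookup hook: in a real deployment this would have to
 * consult the server's catalog; the parser alone cannot answer it. */
typedef Volatility (*volatility_lookup_fn)(const char *func_name, void *catalog);

typedef struct CatalogEntry {
    const char *name;
    Volatility vol;
} CatalogEntry;

typedef struct ToyCatalog {
    const CatalogEntry *entries;
    size_t count;
} ToyCatalog;

/* In-memory stand-in for a catalog query. Unknown functions are treated
 * as volatile, the conservative choice. */
Volatility toy_catalog_lookup(const char *func_name, void *catalog)
{
    const ToyCatalog *cat = catalog;
    for (size_t i = 0; i < cat->count; i++) {
        if (strcmp(cat->entries[i].name, func_name) == 0)
            return cat->entries[i].vol;
    }
    return VOL_VOLATILE;
}

/* A query is considered safe for a read replica only if none of the
 * functions it calls is volatile. */
bool can_route_to_replica(const char *const *func_names, size_t n,
                          volatility_lookup_fn lookup, void *catalog)
{
    assert(lookup != NULL);
    for (size_t i = 0; i < n; i++) {
        if (lookup(func_names[i], catalog) == VOL_VOLATILE)
            return false;
    }
    return true;
}
```

The point of the callback is that the routing decision depends on information the raw parse tree does not carry: the tree supplies only the function names, and the catalog supplies their properties.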
On Wed, May 26, 2010 at 6:02 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> I'm thinking about exporting the raw parser and related modules as a C
> library. Though this will not be an immediate benefit to PostgreSQL
> itself, it will be a huge benefit for any PostgreSQL
> applications/middleware that need to parse SQL statements.

In the past I and people I have known/worked with have made strategic
use of UDFs running on a live server that return the outNode textual
representation of the parse tree, the semantically analyzed tree, and
(I think) the planned tree for various projects, and found them highly
useful. The syntactic, semantic, and operational meaning of a query was
useful for our projects.

Some of this code was linked with the server, and so reading the node
using Postgres' parser was easy. Otherwise, a small parser needed to be
written for external projects. Perhaps a slightly more ideal state of
affairs would be:

* These hooks to acquire the syntactic/semantic/planned trees would be
  bundled "for free"
* When writing code not linked against the server, a more common
  serialization format, a la JSON or whatnot

A more ambitious project that I don't think is in the scope of any
initial implementation would be to allow for cross-referencing of
these compilation passes, similar to how GNU Bison allows you to
interrogate for the position of a lexeme when reporting errors. In my
experience, code that mangles one layer (say, semantic, or harder yet,
plan) has a hard time producing the best error report, because getting
from a node at the "bottom" to the right lexeme(s) at the "top" is very
cumbersome. One could imagine this being useful for other purposes too,
but that is how I felt it firsthand. Feels a lot harder, though.

fdr
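As a rough illustration of the "more common serialization format" idea, the sketch below writes a toy tree out as JSON. The `JNode` type, the output layout (`type` plus `children`) and the function names are all assumptions made for this example; this is not the format produced by PostgreSQL's outfuncs, and the type strings are assumed to need no JSON escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node: a type label plus child/sibling links. */
typedef struct JNode {
    const char *type;            /* e.g. "SelectStmt"; assumed JSON-safe */
    const struct JNode *child;   /* first child, or NULL */
    const struct JNode *sibling; /* next sibling, or NULL */
} JNode;

/* Append s to buf at *off, always leaving room for the terminating NUL.
 * Returns false if the buffer is too small. */
static bool emit(char *buf, size_t size, size_t *off, const char *s)
{
    size_t len = strlen(s);
    if (len >= size - *off)
        return false;
    memcpy(buf + *off, s, len + 1);
    *off += len;
    return true;
}

/* Recursively write one node as {"type":"...","children":[...]}. */
static bool node_to_json(const JNode *node, char *buf, size_t size, size_t *off)
{
    if (!emit(buf, size, off, "{\"type\":\"") ||
        !emit(buf, size, off, node->type) ||
        !emit(buf, size, off, "\",\"children\":["))
        return false;

    for (const JNode *kid = node->child; kid != NULL; kid = kid->sibling) {
        if (kid != node->child && !emit(buf, size, off, ","))
            return false;
        if (!node_to_json(kid, buf, size, off))
            return false;
    }
    return emit(buf, size, off, "]}");
}

/* Serialize the tree rooted at root into buf; false means buf was too small. */
bool toy_tree_to_json(const JNode *root, char *buf, size_t size)
{
    size_t off = 0;
    assert(root != NULL && buf != NULL && size > 0);
    return node_to_json(root, buf, size, &off);
}
```

A client not linked against the server could then read such output with any JSON library instead of a hand-written parser for the node text format. A fuller version would also need string escaping and per-node fields.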
Daniel Farina <drfarina@acm.org> writes:
> Some of this code was linked with the server, and so reading the node
> using Postgres' parser was easy. Otherwise, a small parser needed to be
> written for external projects. Perhaps a slightly more ideal state of
> affairs would be:
>
> * These hooks to acquire the syntactic/semantic/planned trees would be
>   bundled "for free"
> * When writing code not linked against the server, a more common
>   serialization format, a la JSON or whatnot

Access to that data has been talked about with respect to DDL
triggers too. You want to be able to know what exactly is being
executed, and against what objects. And you want to be able to abuse
this information from either a C-coded server function or a PL/pgSQL
trigger.

I guess the WIP JSON datatype would help a lot even when working from
within the server, as that does not mean working in C.

Regards,
--
dim