Thread: Standalone Parser for PL/pgSQL

Standalone Parser for PL/pgSQL

From
Matt Miller
Date:
I'd like (to find or make) a utility that inputs the code of a Pl/pgSQL
function (e.g. from a text file or from STDIN, and then parses the
function definition, building a complete symbol table.  I would then
write C code that walks that symbol table and does stuff.  As a starting
point I'd be happy if I could just visit each node in the symbol table
and dump that node out to another file.

I'm thinking that the code in src/pl/plpgsql/src is where to start, but
I need some guidance sorting through that stuff.  I understand (in
theory, anyway) how flex and bison work, so maybe I just need a bit of
hand-holding to get a simple standalone PL/pgSQL parser up and running.

I want to analyze/transform some Oracle PL/SQL procs, and PL/pgSQL is so
close to PL/SQL that I figured that a PL/pgSQL parser would be a good
starting point.

Re: Standalone Parser for PL/pgSQL

From
Alvaro Herrera
Date:
On Wed, Jul 13, 2005 at 08:33:59PM +0000, Matt Miller wrote:
> I'd like (to find or make) a utility that inputs the code of a Pl/pgSQL
> function (e.g. from a text file or from STDIN, and then parses the
> function definition, building a complete symbol table.  I would then
> write C code that walks that symbol table and does stuff.  As a starting
> point I'd be happy if I could just visit each node in the symbol table
> and dump that node out to another file.
>
> I'm thinking that the code in src/pl/plpgsql/src is where to start, but
> I need some guidance sorting through that stuff.  I understand (in
> theory, anyway) how flex and bison work, so maybe I just need a bit of
> hand-holding to get a simple standalone PL/pgSQL parser up and running.

I don't think you can use just plpgsql's parser.  The problem is that it
relies on the main backend parser to figure out anything it doesn't
understand.  So you'd have to mix both parsers somehow.  The only
thought that comes to my mind is start with the backend's parser and add
productions for PL/pgSQL's constructs.

The main parser depends (at least) on the List handling and memory
handling.  So your "simple standalone parser" will have to contain both
things at least.

See src/backend/parser/README.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
A male gynecologist is like an auto mechanic who never owned a car.
(Carrie Snow)

Re: Standalone Parser for PL/pgSQL

From
Matt Miller
Date:
On Wed, 2005-07-13 at 17:04 -0400, Alvaro Herrera wrote:
> > a simple standalone PL/pgSQL parser
>
> it relies on the main backend parser ... you'd have to mix
> both parsers somehow. ... The main parser depends (at least)
> on the List handling and memory handling

Okay, you scared me off.

It looks like I'm back to Perl to script my conversion of PL/SQL to
PL/pgSQL.

Thanks for the advice.

Re: Standalone Parser for PL/pgSQL

From
Neil Conway
Date:
Alvaro Herrera wrote:
> I don't think you can use just plpgsql's parser.  The problem is that it
> relies on the main backend parser to figure out anything it doesn't
> understand.

I think it depends on what kind of information you want to extract from
a PL/PgSQL function definition. The PL/PgSQL parser handles the
structure of the PL/PgSQL function definition itself, but it does not
parse expressions or SQL queries. Those are essentially treated as
strings that are later handed to the main SQL machinery to be parsed and
evaluated. If you're content to treat expressions and SQL queries as
opaque strings, you shouldn't need to concern yourself with the main SQL
parser.

> The main parser depends (at least) on the List handling and memory
> handling.  So your "simple standalone parser" will have to contain both
> things at least.

The PL/PgSQL parser also depends on these, although to a lesser degree.

-Neil

Re: Standalone Parser for PL/pgSQL

From
Matt Miller
Date:
On Thu, 2005-07-14 at 14:32 +1000, Neil Conway wrote:
> Alvaro Herrera wrote:
> > I don't think you can use just plpgsql's parser.  ... it
> > relies on the main backend parser
>
> If you're content to treat expressions and SQL queries as
> opaque strings, you shouldn't need to concern yourself with the main SQL
> parser.

Treating queries as terminal nodes in the parse tree would be fine as a
starting point.

> > The main parser depends (at least) on the List handling and memory
> > handling.
>
> The PL/PgSQL parser also depends on these, although to a lesser degree.

I suppose these dependencies are okay as long as I can just link my
parser to a library (e.g. src/pl/plpgsql/src/libplpgsql.so) and
everything magically works.

My fantasy here is to perform some straightforward surgery and bring to
life an executable that accepts PL/pgSQL code (e.g. as text on STDIN)
and creates a parse tree in memory.  The stuff in src/pl/plpgsql/src/
looked like a good starting point.  gram.y and scan.l are there.
pl_funcs.c has some cool-looking "dump*" functions.  Now, can this stuff
be "straightforwardly" hacked into a program that gets its PL/pgSQL
source code from STDIN or from a text file, instead of from the "AS"
clause of a "CREATE FUNCTION" statement?

My goal is to automate (as much as possible) the conversion of Oracle
PL/SQL stored procedures into PL/pgSQL.  Up to this point I've been
exploring Perl as a tool for this, but things are getting a bit thick
for me to rely on Perl regexes as my most-powerful weapon.  Instead of
growing my Perl stuff to look more and more like a parser, I went
looking for a parser, and ended up in src/pl/plpgslq/src.

Re: Standalone Parser for PL/pgSQL

From
Richard_D_Levine@raytheon.com
Date:

pgsql-general-owner@postgresql.org wrote on 07/14/2005 10:38:43 AM:

> On Thu, 2005-07-14 at 14:32 +1000, Neil Conway wrote:
> > Alvaro Herrera wrote:
> > > I don't think you can use just plpgsql's parser.  ... it
> > > relies on the main backend parser
> >
> > If you're content to treat expressions and SQL queries as
> > opaque strings, you shouldn't need to concern yourself with the main
SQL
> > parser.
>
> Treating queries as terminal nodes in the parse tree would be fine as a
> starting point.
>
> > > The main parser depends (at least) on the List handling and memory
> > > handling.
> >
> > The PL/PgSQL parser also depends on these, although to a lesser degree.
>
> I suppose these dependencies are okay as long as I can just link my
> parser to a library (e.g. src/pl/plpgsql/src/libplpgsql.so) and
> everything magically works.
>
> My fantasy here is to perform some straightforward surgery and bring to
> life an executable that accepts PL/pgSQL code (e.g. as text on STDIN)
> and creates a parse tree in memory.  The stuff in src/pl/plpgsql/src/
> looked like a good starting point.  gram.y and scan.l are there.
> pl_funcs.c has some cool-looking "dump*" functions.  Now, can this stuff
> be "straightforwardly" hacked into a program that gets its PL/pgSQL
> source code from STDIN or from a text file, instead of from the "AS"
> clause of a "CREATE FUNCTION" statement?
>
> My goal is to automate (as much as possible) the conversion of Oracle
> PL/SQL stored procedures into PL/pgSQL.

Ya know, what would be *totally* cool is to implement Oracle PL/SQL syntax
transparently in PostgreSQL.  You just can't do packages in PL/pgSQL.  You
can mimic them with schemas, but there are things that must be done
differently, and probably more slowly, like package body global constants
and variables.  If I were doing it, which I'm not, I would implement
Oracle's quoting and not the PL/pgSQL quoting paradigm.  Man it's painful.
Understandable, due to the many languages supported, but painful
nonetheless.

> Up to this point I've been
> exploring Perl as a tool for this, but things are getting a bit thick
> for me to rely on Perl regexes as my most-powerful weapon.  Instead of
> growing my Perl stuff to look more and more like a parser, I went
> looking for a parser, and ended up in src/pl/plpgslq/src.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


Re: Standalone Parser for PL/pgSQL

From
Alvaro Herrera
Date:
On Thu, Jul 14, 2005 at 03:38:43PM +0000, Matt Miller wrote:

> > > The main parser depends (at least) on the List handling and memory
> > > handling.
> >
> > The PL/PgSQL parser also depends on these, although to a lesser degree.
>
> I suppose these dependencies are okay as long as I can just link my
> parser to a library (e.g. src/pl/plpgsql/src/libplpgsql.so) and
> everything magically works.

Hmm, those symbols are defined in the main Postgres executable, so
there's no shared library to rely on.  Anyway, for the memory allocation
stuff, you should be able to

#define palloc(a) malloc(a)
#define pfree(a) free(a)

and define AllocSetContextCreate() and other MemoryContext calls defined
as no-ops.  It will leak memory like crazy, but for a short-lived
program I don't think it's a problem.

list.c you'll have to take from src/backend/nodes/list.c, I don't think
it should be too difficult.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?"  (Mafalda)

Re: Standalone Parser for PL/pgSQL

From
Alvaro Herrera
Date:
On Thu, Jul 14, 2005 at 11:07:35AM -0500, Richard_D_Levine@raytheon.com wrote:

> Ya know, what would be *totally* cool is to implement Oracle PL/SQL syntax
> transparently in PostgreSQL.  You just can't do packages in PL/pgSQL.  You
> can mimic them with schemas, but there are things that must be done
> differently, and probably more slowly, like package body global constants
> and variables.

If that's what you want, you may be interested in checking out
EnterpriseDB's product, which does exactly this.  There's a free beta
you can try out if you like.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)

Re: Standalone Parser for PL/pgSQL

From
Tom Lane
Date:
Matt Miller <mattm@epx.com> writes:
> ... The stuff in src/pl/plpgsql/src/
> looked like a good starting point.  gram.y and scan.l are there.
> pl_funcs.c has some cool-looking "dump*" functions.  Now, can this stuff
> be "straightforwardly" hacked into a program that gets its PL/pgSQL
> source code from STDIN or from a text file, instead of from the "AS"
> clause of a "CREATE FUNCTION" statement?

Try to compile those as a standalone program.  Observe the undefined
symbols you get.  Add other backend modules to your compile (or build
stub versions).  Lather, rinse, repeat until it links.

I think 90% of what you'd need is replacements for palloc and elog.
palloc can be turned into a simple malloc-and-check-for-failure,
and elog can probably just print the message and exit.

            regards, tom lane

Re: Standalone Parser for PL/pgSQL

From
Tino Wildenhain
Date:
Am Donnerstag, den 14.07.2005, 15:38 +0000 schrieb Matt Miller:
> On Thu, 2005-07-14 at 14:32 +1000, Neil Conway wrote:
> > Alvaro Herrera wrote:
> > > I don't think you can use just plpgsql's parser.  ... it
> > > relies on the main backend parser
> >
> > If you're content to treat expressions and SQL queries as
> > opaque strings, you shouldn't need to concern yourself with the main SQL
> > parser.
>
> Treating queries as terminal nodes in the parse tree would be fine as a
> starting point.

*snip*

I'm also extraordinarely interested in this work. I'd like to
see the parser as a lib (for me to add some stubs to python).

I think from a stand-allone parser I could make the changes
I need.