Re: Proposal to add --single-row to psql - Mailing list pgsql-hackers

From Darren Duncan
Subject Re: Proposal to add --single-row to psql
Date
Msg-id 519AD968.3030209@darrenduncan.net
Whole thread Raw
In response to Re: Proposal to add --single-row to psql  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I have actually been working on the task discussed in this thread, most relevant 
parts quoted below, for awhile now, and hope to have something concrete that you 
can use by the end of this summer.

My in-development Muldis D language is homoiconic as a core feature, its source 
code is data to it, and the native syntax of the language resembles an abstract 
syntax tree.  This tree of nodes primarily defines behavior of the code, but it 
also supports arbitrary metadata per node, which for example can be used to 
preserve concrete syntax for any programming language that can be parsed into 
Muldis D nodes or conversely generated from said.

For example, you can have one type of node defining a function, and its details 
are defined by its attributes or child nodes, such as its result type, its 
parameters, whether it is declared associative/commutative/etc or not, and the 
expression(s) defining its body.  Another type of node defines a call to a 
function, another type defines a text literal, another a relation literal, and 
so on.  Conversely, a node can define a schema or package etc.  Example 
metadata, which is also structured, could include line numbers or code comments 
or whitespace or exact numeric literal formats or quoting formats or exact 
keyword choices, for example.

Using these node data types, we have a fairly normalized representation of any 
source code (or data) that is easy to introspect or transform with code.  A key 
design of Muldis D is the clear separation of syntax and behavior.

A parser is just a function that takes (typically) a character string as input 
and produces a node (tree) as output.  A compiler is just a function or 
operation that takes a node (tree) as input and produces machine code.  An 
optimizer can be a function that derives one node tree from another, either as 
its own operation or typically as part of the compiler stage.

A compiler or optimizer generally can trivially ignore the node metadata, but a 
code generator or debugger can use it; metadata can be stripped without 
affecting behavior.  The canonical tree form can also easily be mapped 
losslessly with relation forms, such as typical information schemas have.

Practically all behavior is defined in terms of generic type and routine 
definitions.  Practically all system-defined types and routines are defined in 
one or more libraries/modules/packages that have the same format as those users 
would write like extensions.  So, all the relational operators have the same 
syntax as say string or numeric or array operators.

I envision that the most effective way to use Muldis D to handle an arbitrary 
programming language, including the native SQL syntax+behavior of each SQL DBMS, 
is to have a Muldis D library+parser pair for it.

For example, to represent PostgreSQL 9.2 most directly, we have a library with 
an explicitly defined type and routine for every built-in that Pg 9.2 has, and 
we also have a parser function that takes the SQL syntax that Pg 9.2 understands 
and produces a Muldis D node tree consisting of calls to the routines of that 
library or value selectors of types in that library (things like SELECT and 
INSERT etc are each mapped to a routine too).  That way, even with a standard 
node format that isn't specific to a typical language or version, the code for 
parsing Pg 9.2 SQL has the minimal amount of indirection that it has to deal 
with, as each syntax element would have a direct library call counterpart. 
Similarly, the existing Pg 9.2 SQL compiler would have the least indirection to 
take said nodes and execute them.  (The library would be named eg "Postgres_9_2" 
for example, which makes it easier say to also have one side-by-side for other 
versions, shims, legacy code you want to more easily support compatibility with.)

Where one decides to do cross-compilation, say make Oracle SQL run on Pg, that 
could be done as simply as defining a library for Oracle SQL with the 
routines+types that has, and then mapping it to a Pg one just in terms of shim 
calls (which the compiler can optimize away as applicable), and so parsers or 
compilers never necessarily have to deal with behavior compatibility issues.

I am presently working out the exact set of such language nodes, and making a 
reference implementation which is mostly self-defined and would compile to Perl 
code, and hope to have the first Muldis D executable by the end of this summer.

I am confident that an adaption of this design into C or whatever would serve 
Postgres greatly in letting it effectively parse multiple languages, anywhere 
from application programming languages to multiple SQL dialects or versions.

Even if you don't decide to use my design specifically (which is open source and 
you can influence it), I think you should find some of the general design 
principles I stated above to be useful.  Representing behavior as libraries the 
AST being flexible enough for any concrete language without being too specific 
to details of one.  And of course, cross-invocation of code written in multiple 
languages is made much easier.

Note, just to be clear, my proposal does not necessitate that all of a node tree 
has to be kept in memory at once.  This design should be adaptable to a 
streaming approach, especially as it is expected to be able to handle database 
dumps or transfers of arbitrary size, same as SQL engines can today.  That is in 
contrast to probably what most application languages would assume, where 
everything would fit in memory at once.  But the ability to stream or not would 
largely be an implementation detail.  Realistically, all code should fit in 
memory at once, and anything that would have to be buffered out of memory would 
be say embedded data literals (whether large strings or simply large relations).

If you're not sure how my proposal would address any of the needs or wants 
raised in the thread, go ahead and ask, and I will try and answer as time permits.

-- Darren Duncan

On 2013.05.11 9:27 AM, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome.  It would need to tie
>>> into the rewriter also.
>>>
>>> No, I don't have a clue what the design looks like.
>
>> That's a direction several of the proprietary RDBMS vendors have
>> proposed.  I think it'd be great :)
>
>> Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
>> to pass expression trees in and out.  This could have other benefits
>> as to clustering space, shortening the planning cycle, etc., but let's
>> not go there for now.  My knowledge is very, very sketchy, but when I
>> squint, the expression trees we use look a lot like JSON.  Are they
>> isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.
>
> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist.  But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1).  Parsing spec-compatible SQL that way might be a challenge.
>
> A larger issue is that if you don't have the whole grammar available
> to check, it's difficult to be sure there are no parsing conflicts.
> I seem to remember that we hit some conflicts between different
> extension ROMs back at HP :-(
>
> Another point is that extensions that are actually interesting require
> a lot more than new syntax.  Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well.  It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...





pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Removal of pageinspect--1.0.sql
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCH] Correct release notes about DROP TABLE IF EXISTS and add, link.