Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database? - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
Date
Msg-id CAMsr+YGmqcDD6JhP0OO-vMtJ-EftnNn4d8EXTV1Dc1BOmM9apQ@mail.gmail.com
In response to Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?  ("MauMau" <maumau307@gmail.com>)
List pgsql-hackers
On 20 August 2017 at 10:10, MauMau <maumau307@gmail.com> wrote:
> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> > defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> > or cast_region?  Why not just have a graph_search() function which
> > takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.

While I generally agree with Tom on this, I think there are some useful ideas to examine.

Allow a UDF to emit multiple result sets that can then be incorporated into an outer query. IMO it'd be fine to support this by returning a wide row of REFCURSORs and then allowing FETCH to be used in a subquery.

The UDF would need to be invoked before the rest of the query was planned, so the planner could learn the structure of the cursor's result sets.

Or some higher-level concept could be introduced, as was done for aggregates and window functions, where one call is made to get the output structure and some stats estimates, and another call (or series of calls) to get the rows.
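To make the two-call idea concrete, here is a rough sketch in Python rather than anything resembling backend code. Every name in it (RelationShape, TwoPhaseSource, FriendsGraph, plan_then_execute) is made up for illustration, and the row-count threshold used to pick a join strategy is an arbitrary assumption. It only shows the shape of the contract: one call to describe the output and give an estimate, a later call to produce rows.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol, Tuple


@dataclass(frozen=True)
class RelationShape:
    """What a planner would want to know before execution (hypothetical)."""
    columns: Tuple[Tuple[str, str], ...]  # (column name, type name) pairs
    estimated_rows: float


class TwoPhaseSource(Protocol):
    """Hypothetical contract: describe first, produce rows later."""

    def describe(self, query: str) -> RelationShape: ...

    def rows(self, query: str) -> Iterator[tuple]: ...


class FriendsGraph:
    """Toy in-memory source; it ignores the query text entirely."""

    def __init__(self, edges: Iterable[Tuple[str, str]]) -> None:
        self._edges: List[Tuple[str, str]] = list(edges)

    def describe(self, query: str) -> RelationShape:
        # Phase 1: report structure and a row estimate without producing rows.
        return RelationShape(
            columns=(("src", "text"), ("dst", "text")),
            estimated_rows=float(len(self._edges)),
        )

    def rows(self, query: str) -> Iterator[tuple]:
        # Phase 2: actually produce the rows.
        yield from self._edges


def plan_then_execute(source: TwoPhaseSource, query: str):
    """Pick a join strategy from the estimate, then fetch the rows."""
    shape = source.describe(query)
    # Arbitrary illustrative threshold, not a real planner rule.
    strategy = "hash join" if shape.estimated_rows > 1000 else "nested loop"
    return strategy, list(source.rows(query))
```

The point is only that the outer planner never has to run the inner query to learn its column list or rough size.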

I guess you're going two steps further than that, seeking a more integrated model where the plugin can generate paths and participate more actively in planning, and where you can optionally make it the default so you don't need a SQL function call to access it.

If you want to pursue that, I suggest you start small and go step-by-step. Things like:

* Allow FETCH ... <refcursor> to be used in subqueries with explicitly listed output relation structure, like calling a function that returns record

* Allow pre-execution of parts of a query that produce refcursors used in subqueries, then finish planning the outer query once the cursor output types are known

* A construct that can inject arbitrary virtual relations into the namespace at parse-time, so you don't have to do the dance with refcursors. (Like WITH).

* A construct that can supply stats estimates for the virtual relations

So try to build it in stages.

You could also potentially use the FDW interface.

> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.

Why? What does this offer over the app or client tool wrapping its queries in "SELECT cypher_graph('....')" ?
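For what it's worth, the wrapping can be hidden entirely on the client side. A minimal sketch, assuming a cypher_graph() SQL function exists as in the example above and that the driver uses %s placeholders (psycopg does); wrap_cypher is a hypothetical helper name:

```python
from typing import Tuple


def wrap_cypher(cypher: str) -> Tuple[str, Tuple[str, ...]]:
    """Return (sql, params) that run a Cypher query via the assumed UDF.

    The Cypher text is passed as a bound parameter, so the caller does
    not have to worry about quoting it inside the SQL string.
    """
    return "SELECT cypher_graph(%s)", (cypher,)
```

A client tool would then pass the result straight to its driver, e.g. cursor.execute(*wrap_cypher(text)), and the user never sees any SQL.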

> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  To take the query example in my first mail, which joins a
> relational table and the result of a graph query.  The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.

Here, what you need is a way to define a set of virtual relations on a per-query basis, where you can get stats estimates for the relations during planning.
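As a toy illustration of that, again in Python and with entirely hypothetical names (VirtualRelation, QueryNamespace), a per-query namespace might hold relations that each carry a row estimate and a scan callback. The "smallest estimate first" ordering below is a deliberately naive stand-in for real join planning:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class VirtualRelation:
    """A relation that exists only for the duration of one query."""
    name: str
    columns: List[str]
    estimated_rows: float
    scan: Callable[[], Iterable[tuple]]  # invoked only at execution time


@dataclass
class QueryNamespace:
    """Per-query registry of virtual relations (hypothetical)."""
    relations: Dict[str, VirtualRelation] = field(default_factory=dict)

    def register(self, rel: VirtualRelation) -> None:
        if rel.name in self.relations:
            raise ValueError(f"relation {rel.name!r} already defined")
        self.relations[rel.name] = rel

    def join_order(self, names: List[str]) -> List[str]:
        # Naive heuristic: visit relations in order of estimated size.
        # Uses only the estimates; no relation is scanned here.
        return sorted(names, key=lambda n: self.relations[n].estimated_rows)
```

Planning consults only the estimates; the scan callbacks are not invoked until the plan is run.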

I guess what you're imagining is something more sophisticated where you're generating some kind of sub-plan candidates, like the path model. With some kind of interaction so the sub-planner for the other model could know to generate a different sub-plan based on the context of the outer plan. I have no idea how that could work. But I think you have about zero chance of achieving what you want by going straight there. Focus on small incremental steps, preferably ones you can find other uses for too.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
