Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database? - Mailing list pgsql-hackers

From Chris Travers
Subject Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
Date
Msg-id CAN-RpxBmMvAkKu0s3jTAnsD65daSGAjOXh+0ekBxYROcoSkmNA@mail.gmail.com
In response to Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?  ("MauMau" <maumau307@gmail.com>)
Responses Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
List pgsql-hackers


On Sun, Aug 20, 2017 at 4:10 AM, MauMau <maumau307@gmail.com> wrote:
From: Chris Travers
> Why cannot you do all this in a language handler and treat it as a
> user-defined function?
> ...
> If you have a language handler for cypher, why do you need in_region
> or cast_region?  Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?

The language handler is for *stored* functions.  A user-defined
function (UDF) doesn't participate in the planning of the outer
(top-level) query.  And both assume that they are executed within SQL
commands.

Sure, but stored functions can take arguments, such as a query string that gets handled by the language handler.  There's absolutely no reason you cannot declare a function in C that takes a Cypher query and returns a set of tuples.  And you can do a whole lot with preloaded shared libraries if you need to.
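
As a very rough sketch of the C side (graph_search is a made-up name and the Cypher execution itself is elided; the skeleton just hands the query string back as a single row so the set-returning-function plumbing is visible):

/* Hypothetical sketch: a C set-returning function that accepts a
 * Cypher query string.  A real implementation would hand the string
 * to a Cypher parser/executor; this toy version returns the string
 * back as one row. */
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(graph_search);

Datum
graph_search(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;

        funcctx = SRF_FIRSTCALL_INIT();
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        /* This is where the Cypher query would be parsed and executed;
         * the result set would be stashed in funcctx->user_fctx. */
        funcctx->user_fctx = text_to_cstring(PG_GETARG_TEXT_PP(0));
        funcctx->max_calls = 1;     /* one dummy row in this sketch */

        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
    {
        char       *query = (char *) funcctx->user_fctx;

        SRF_RETURN_NEXT(funcctx, CStringGetTextDatum(query));
    }

    SRF_RETURN_DONE(funcctx);
}

On the SQL side that would be declared as a LANGUAGE C function returning SETOF text (SETOF record, or a concrete row type, for a real result set) and called like any other set-returning function.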

The planning bit is more difficult, but see below for where I see the major limits.

I want the data models to meet these requirements:

1) The query language can be used as a top-level session language.
For example, if an app specifies "region=cypher_graph" at database
connection, it can use the database as a graph database and submit
Cypher queries without embedding them in SQL.

That sounds like a foot gun.  I would think of those cases as ideal for a custom background worker, similar to Mongress.  Expecting to be able to switch query languages on the fly strikes me as adding needless complexity everywhere, to be honest.  Having different listeners on different ports simplifies this a lot, and additionally offering query languages for ad-hoc mixing via language handlers might already get you most of what you want.
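
For reference, wiring up that kind of listener as a background worker is not much boilerplate.  A rough sketch, assuming the PostgreSQL 10 bgworker API (cypher_listener_main and the protocol handling are imaginary):

/* Hypothetical sketch: registering a background worker from a
 * preloaded shared library, e.g. one that opens its own port and
 * speaks a Cypher wire protocol, the way Mongress does for the
 * MongoDB protocol. */
#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"

PG_MODULE_MAGIC;

void        _PG_init(void);
void        cypher_listener_main(Datum main_arg);

void
cypher_listener_main(Datum main_arg)
{
    BackgroundWorkerUnblockSignals();

    /* A real listener would bind its port here and loop, accepting
     * Cypher connections and translating them into executor calls. */
}

void
_PG_init(void)
{
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "cypher listener");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "cypher_listener");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "cypher_listener_main");

    RegisterBackgroundWorker(&worker);
}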

2) When a query contains multiple query fragments of different data
models, all those fragments are parsed and planned before execution.
The planner comes up with the best plan, crossing the data model
boundary.  Take the query example in my first mail, which joins a
relational table and the result of a graph query: the relational
planner considers how to scan the table, the graph planner considers
how to search the graph, and the relational planner considers how to
join the two fragments.

It seems like all you really need is a planner hook for user-defined languages (i.e. "how many rows does this function return with these parameters?", right?).  Right now we allow such hints, but they are static.  I wonder how hard this would be to do with preloaded shared libraries.
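
To sketch where that could hang off today's infrastructure (hypothetical; this only chains planner_hook from a preloaded library, using the PostgreSQL 10 hook signature, and does not itself implement the per-function row estimate):

/* Hypothetical sketch: chaining into planner_hook from a shared
 * library loaded via shared_preload_libraries.  A spot like this is
 * where a language handler could feed its own row estimates back to
 * the planner; as written it just delegates to the standard planner. */
#include "postgres.h"
#include "fmgr.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

void        _PG_init(void);

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
graph_aware_planner(Query *parse, int cursorOptions,
                    ParamListInfo boundParams)
{
    /* Here the query tree could be inspected for calls to the
     * Cypher-executing function and their row estimates adjusted
     * before planning proceeds. */
    if (prev_planner_hook)
        return prev_planner_hook(parse, cursorOptions, boundParams);
    return standard_planner(parse, cursorOptions, boundParams);
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = graph_aware_planner;
}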
 

So in_region() and cast_region() are not functions to be executed
during the execution phase, but syntax constructs that are converted,
during the analysis phase, into calls to another region's
parser/analyzer and an inter-model cast routine.

So basically they work like immutable functions except that you cannot index the output?

1. The relational parser finds in_region('cypher_graph', 'graph
query') and produces a parse node InRegion(region_name, query) in the
parse tree.

2. The relational analyzer looks up the system catalog to check
whether the specified region exists, then calls its parser/analyzer to
produce the query tree for the graph query fragment.  The relational
analyzer attaches the graph query tree to the InRegion node.

3. When the relational planner finds the graph query tree, it passes
the graph query tree to the graph planner to produce the graph
execution plan.

4. The relational planner produces a join plan node, based on the
costs/statistics of the relational table scan and graph query.  The
graph execution plan is attached to the join plan node.

The parse/query/plan nodes have a label to denote a region, so that
the appropriate region's routines can be called.
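
Purely to illustrate the shape being described (nothing like this exists in the tree today; the field names are invented):

/* Hypothetical sketch of the proposed InRegion parse node, following
 * the usual PostgreSQL node layout. */
#include "postgres.h"
#include "nodes/nodes.h"

typedef struct InRegion
{
    NodeTag     type;           /* would need a new T_InRegion tag */
    char       *region_name;    /* e.g. "cypher_graph" */
    char       *query_string;   /* the foreign-model query fragment */
    Node       *region_query;   /* query tree attached by that region's
                                 * analyzer in step 2 */
} InRegion;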

It would be interesting to see how much of what you want you can get with what we currently have and what pieces are really missing. 

Am I right that if you wrote a function in C to take a Cypher query, plan and analyse it, and execute it, the only thing really missing would be feedback to the PostgreSQL planner regarding the number of rows expected?

Regards
MauMau




--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin
