Re: How hard would a "path" operator be to implement in PostgreSQL - Mailing list pgsql-general

From Chris Travers
Subject Re: How hard would a "path" operator be to implement in PostgreSQL
Date
Msg-id CAKt_ZfuWTipuZcwuOHP+mu=qQ3VoYy7VZNfH4ejnz6WAvOGwBQ@mail.gmail.com
Whole thread Raw
In response to Re: How hard would a "path" operator be to implement in PostgreSQL  (David Johnston <polobo@yahoo.com>)
List pgsql-general
Hi David;

On Sun, Aug 19, 2012 at 8:13 PM, David Johnston <polobo@yahoo.com> wrote:
On Aug 19, 2012, at 21:28, Chris Travers <chris.travers@gmail.com> wrote:

> Hi;
>
> I have been reading up on object-relational features of Oracle and DB2 and found that one of the big things they have that we don't is a path operator.  The idea is that you can use the path operator to follow some subset of foreign keys called refs.

Why do you feel this is a "big thing".  Sure, you can save a few keystrokes when writing multi-table queries but that doesn't seem all that great and now you are introducing ambiguity into the planner/query when trying to resolve these implicit joins.  I concur that introducing an explicit REF is a poor choice taken at face value since now you have to remember what references are present.  With FOREIGN KEYS you introduce a logical constraint but you are able to perform an join between two tables independent of the presence of an FK.

One of the major areas I am pushing right now in my own work is the idea of data encapsulation in the db and the idea that database interfaces might be contained in the db sufficient to be discoverable at run-time or through code-generating tools.  This allows easy interop with software written in all sorts of languages.  You can just port the code generation tools to a new language and suddenly you get your database access in that language.

So yes, I do think it's a significant step on one aspect of what can be done with O-R functionality in the database (what you might think of as query-wrapping O-R, as opposed to intra-query O-R which is more developed on Pg).  There are some ways to get around it but they require a fair bit of extra work.  Being able to centralize this would be nice.

Imagine being able to have the application specify address info to the db, and get related records automatically.

I could flesh out the use cases I am thinking about in a bit more detail, but suppose we have a reverse path operator (this is  *not* what's in DB2 or Oracle but woudl be similar to implement):

select eca.(*) from (select company(?, ?, ?, ?, ?)<-credit_accounts) eca.

This could be done entirely by the code generator without any knowledge of underlying data structures.


>
> Suppose we have a table (Oracle/DB2 styles here):
>
> CREATE TABLE country (
>     REF IS OID USER GENERATED,
>     id int serial not null unique,
>     name text primary key,
>     short_name text not null unique
> );
>
> CREATE TABLE address (
>     street_text text not null,
>     city text not null,
>     state_province text.
>     country REF(country)
> );
>
> In oracle this might allow you to do something like:
>
> SELECT * from address where address.country.short_name = 'US';
>
> In DB2 this might be done like:
>
> SELECT * FROM address WHERE address->country->short_name = 'US';
>
> I like DB2's approach better because there is no ambiguity between namespace resolution but I don't entirely like the way the refs work as separate from standard foreign keys.
>
> What I am thinking about is a function which accepts a row and a destination table name, looks up foreign keys in and retrieves the row from the other table, returning it, if the table name exists.   This could then be mapped to an operator which would avoid some problems.  This could then be:
>
> SELECT * FROM address where (address->'country').short_name = 'US'

short for:

SELECT address.*
FROM address
JOIN country ON (country_id = country.id)
WHERE short_name = 'US';

depending on naming scheme a NATURAL JOIN or USING can shorten this a bit.

I guess there is some value in not making the columns from country available to the SELECT LIST...

Another form equates to...

SELECT * FROM address WHERE EXISTS (SELECT 1 FROM country WHERE id = country_id AND short_name = 'US';

This second form seems a more direct translation of the query.  At first glance this would seem to be a bad query to write for the given goal but maybe the planner does the right thing and so the two pure forms perform equivalently?

>
> Or if we also have a continents table:
>
> SELECT * FROM address where (address->'country'->'continent').name = 'North America';
>
> Obviously these examples assume a very small number of address records and are largely contrived from the IBM examples.  However you could also do this on a small return set:

As implied above if this only is going to generate meaningful plans on small datasets its value seems even more limited...

At least for prototyping it might be useful to look at design patterns, and then see what can be done to make it more useful.  There isn't much sense in trying to figure everything out at once if the effort turns out to be largely wasted. 

>
> select (a->'country').short_name from address where ....;
>
> Or even:
>
> select (a->'country').* from address where......
>
> The next question is whether there is a way to pass country in as an identifier so there is no need to use single quotes.  This would make things a little more transparent if possible but I would be happy without this.  Eventually it might be kinda useful (for those porting O-R stuff from Oracle or DB2) to have a path operator built in with a concept of a default, implicit join.
>
> Any thoughts?  If it can be done in plain SQL and inlined that would be ideal but in the prototyping state, that isn't so important and I expect that it is not.

I am presuming this syntax and capability is not specified in the SQL standard anywhere?

I don't have a dog in the "make things easier to port" fight but this seems like something that would be fairly easy to identify in Oracle/DB2 SQL code and to rewrite using explicit joins without losing any actual functionality.

Given that you come across this via "reading" it doesn't sound like you have actually been bitten by its lack nor are having people complain about its lack so I suppose you are arguing for its inclusion based upon its own merits in making coding easier.

 I suspect not a lot is being done with PostgreSQL in terms of the sorts of O-R capabilities it has, let alone those that other db's have added.....

Also note that the current workarounds are both maintenance-heavy and limited.  One could currently build stored procedures such that you could:

SELECT * FROM address WHERE ((address.country).continent).name = 'North America';

But then that's two nested function calls, and those functions need to be separately maintained.  The functions may be able to be inlined or not.  I don't know.  However if there was a way to do something like this through an operation, it might actually be easier.  This would make it more useful for intra-query O-R.

Best Wishes,
Chris Travers

pgsql-general by date:

Previous
From: Craig Ringer
Date:
Subject: Re: How hard would a "path" operator be to implement in PostgreSQL
Next
From: fashouri
Date:
Subject: importing updated files into a database