Re: Idea: Avoid JOINs by using path expressions to follow FKs - Mailing list pgsql-hackers

From Pantelis Theodosiou
Subject Re: Idea: Avoid JOINs by using path expressions to follow FKs
Date
Msg-id CAE3TBxzGW5Qw_3CnnKZo3vRdH=p2EzxsEQpfkpAfpOTL47RbBg@mail.gmail.com
Whole thread Raw
In response to Idea: Avoid JOINs by using path expressions to follow FKs  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers

On Sat, Mar 27, 2021 at 8:28 PM Joel Jacobson <joel@compiler.org> wrote:
Hi,

The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join tables".


SQL:

SELECT DISTINCT c.company_name
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN order_details AS od ON o.order_id = od.order_id
JOIN products AS p ON od.product_id = p.product_id
WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

Since there is only a single foreign key on the order_details.order_id column,
we would know how to resolve it, i.e. to the orders table,
and from there we would follow the customer_id column to the customers table,
where we would finally get the company_name value.

In the where clause, we would follow the order_details's product_id column
to the products table, to filter on product_name.

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit joins instead.

I think this syntactic sugar could save a lot of unnecessary typing,
and as long as the column names are chosen wisely,
the path expression will be just as readable as the manual JOINs would be.

Thoughts?

/Joel

Just my 2c. The idea is nice but:

1. It is changing the FROM clause and the (size of the) intermediate result set. While in your example query there is no difference, you'd get different results if it was something like

SELECT p.product_name, COUNT(*)
FROM ...  (same joins)
GROUP BY p.product_name

2. If you want many columns in the SELECT list, possibly form many tables, you'll need to repeated the expressions. i.e. how you propose  to write this without repeating the link expressions?

SELECT p.product_name, p.price, p.category, c.company_name, c.address
...

3. SQL already provides methods to remove the join "noise", with JOIN USING (columns) when joining column have matching names and with NATURAL JOIN (with extreme care).

Finally, extending the specs in this novel way might put Postgres in a different path from the SQL specs in the future, especially if they have plans to add functionality for graph queries.

Best regards
Pantelis Theodosiou

pgsql-hackers by date:

Previous
From: "Joel Jacobson"
Date:
Subject: Re: [PATCH] pg_permissions
Next
From: Alvaro Herrera
Date:
Subject: Re: Idea: Avoid JOINs by using path expressions to follow FKs