Re: Idea: Avoid JOINs by using path expressions to follow FKs - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Idea: Avoid JOINs by using path expressions to follow FKs
Date
Msg-id CAFj8pRDhPUp-AAe=2e6WG_5P1Y-YaxN4yhPsnswdXfh7fjXb1g@mail.gmail.com
Whole thread Raw
In response to Re: Idea: Avoid JOINs by using path expressions to follow FKs  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Idea: Avoid JOINs by using path expressions to follow FKs
List pgsql-hackers


po 29. 3. 2021 v 12:01 odesílatel Joel Jacobson <joel@compiler.org> napsal:
On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:
I'm imagining a syntax in which
you give the constraint name instead of the column name.  Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be
unique per schema, but I realize we could just use the foreign table's name
as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name
FROM order_details
WHERE order_details.products.product_name = 'Chocolade';

This syntax is similar to Oracle's object references (this is example from thread from Czech Postgres list last week)

Select e.last_name employee,
       e.department_ref.department_name department,
       e.department_ref.manager_ref.last_name dept_manager
From employees_obj e
where e.initials() like 'K_';

I see few limitations: a) there is not support for outer join, b) there is not support for aliasing - and it probably doesn't too nice, when you want to returns more (but not all) columns

Regards

Pavel




Given this data model:

CREATE TABLE customers (
customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
company_name text,
PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
customer_id bigint NOT NULL,
PRIMARY KEY (order_id),
CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
);

CREATE TABLE products (
product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
product_name text NOT NULL,
PRIMARY KEY (product_id)
);

CREATE TABLE order_details (
order_id bigint NOT NULL,
product_id bigint NOT NULL,
PRIMARY KEY (order_id, product_id),
CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
);

> Not saying I think this suggestion is a good idea, though. We've seen
> many frameworks that hide joins, and the results are ... less than
> universally good.

Yeah, I'm pretty much not sold on this idea either.  I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

I think this concern is valid for the original syntax,
but I actually think the idea on using foreign key constraint names
effectively solves an entire class of query writing bugs.

Users writing queries using this syntax are guaranteed to be aware
of the existence of the foreign keys, otherwise they couldn't write
the query this way, since they must use the foreign key
constraint names in the path expression.

This ensures it's not possible to produce a nonsensical JOIN
on the wrong columns, a problem for which traditional JOINs
have no means to protect against.

Even with foreign keys, indexes could of course be missing,
causing an inefficient query anyway, but at least the classes
of potential problems is reduced by one.

I think what's neat is how this syntax works excellent in combination
with traditional JOINs, allowing the one which feels most natural for
each part of the query to be used.

Let's also remember foreign keys did first appear in SQL-89,
so they couldn't have been taken into account when SQL-86
was designed. Maybe they would have came up with the idea
of making more use of foreign key constraints,
if they would have been invented from the very beginning.

However, it's not too late to fix this, it seems doable without
breaking any backwards compatibility. I think there is a risk
our personal preferences are biased due to being experienced
SQL users. I think it's likely newcomers to SQL would really
fancy this proposed syntax, and cause them to prefer PostgreSQL
over some other NoSQL product.

If we can provide such newcomers with a built-in solution,
I think that better than telling them they should
use some ORM/tool/macro to simplify their query writing.

/Joel

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: unescape_text function
Next
From: Markus Wanner
Date:
Subject: Re: [PATCH] Provide more information to filter_prepare