Home > mailing lists

Thread: Idea: Avoid JOINs by using path expressions to follow FKs

Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

27 March 2021, 23:27:51

Hi,

The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join tables".

Here is an example from https://neo4j.com/developer/cypher/guide-sql-to-cypher/

SQL:

SELECT DISTINCT c.company_name

FROM customers AS c

JOIN orders AS o ON c.customer_id = o.customer_id

JOIN order_details AS od ON o.order_id = od.order_id

JOIN products AS p ON od.product_id = p.product_id

WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)

RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name

FROM order_details AS od

WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

Since there is only a single foreign key on the order_details.order_id column,

we would know how to resolve it, i.e. to the orders table,

and from there we would follow the customer_id column to the customers table,

where we would finally get the company_name value.

In the where clause, we would follow the order_details's product_id column

to the products table, to filter on product_name.

If there would be multiple foreign keys on a column we try to follow,

the query planner would throw an error forcing the user to use explicit joins instead.

I think this syntactic sugar could save a lot of unnecessary typing,

and as long as the column names are chosen wisely,

the path expression will be just as readable as the manual JOINs would be.

Thoughts?

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pantelis Theodosiou

Date:

28 March 2021, 00:01:06

On Sat, Mar 27, 2021 at 8:28 PM Joel Jacobson <joel@compiler.org> wrote:

Hi,

The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join tables".

Here is an example from https://neo4j.com/developer/cypher/guide-sql-to-cypher/

SQL:

SELECT DISTINCT c.company_name
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN order_details AS od ON o.order_id = od.order_id
JOIN products AS p ON od.product_id = p.product_id
WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

Since there is only a single foreign key on the order_details.order_id column,
we would know how to resolve it, i.e. to the orders table,
and from there we would follow the customer_id column to the customers table,
where we would finally get the company_name value.

In the where clause, we would follow the order_details's product_id column
to the products table, to filter on product_name.

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit joins instead.

I think this syntactic sugar could save a lot of unnecessary typing,
and as long as the column names are chosen wisely,
the path expression will be just as readable as the manual JOINs would be.

Thoughts?

/Joel

Just my 2c. The idea is nice but:

1. It is changing the FROM clause and the (size of the) intermediate result set. While in your example query there is no difference, you'd get different results if it was something like

SELECT p.product_name, COUNT(*)

FROM ... (same joins)

GROUP BY p.product_name

2. If you want many columns in the SELECT list, possibly form many tables, you'll need to repeated the expressions. i.e. how you propose to write this without repeating the link expressions?

SELECT p.product_name, p.price, p.category, c.company_name, c.address

...

3. SQL already provides methods to remove the join "noise", with JOIN USING (columns) when joining column have matching names and with NATURAL JOIN (with extreme care).

Finally, extending the specs in this novel way might put Postgres in a different path from the SQL specs in the future, especially if they have plans to add functionality for graph queries.

Best regards

Pantelis Theodosiou

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Alvaro Herrera

Date:

28 March 2021, 00:11:06

On 2021-Mar-27, Joel Jacobson wrote:

> If there would be multiple foreign keys on a column we try to follow,
> the query planner would throw an error forcing the user to use explicit joins instead.

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous.  Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
Si no sabes adonde vas, es muy probable que acabes en otra parte.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Vik Fearing

Date:

28 March 2021, 13:25:46

On 3/27/21 9:27 PM, Joel Jacobson wrote:
> Hi,
> 
> The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join
tables".
> 
> Here is an example from https://neo4j.com/developer/cypher/guide-sql-to-cypher/
> 
> SQL:
> 
> SELECT DISTINCT c.company_name
> FROM customers AS c
> JOIN orders AS o ON c.customer_id = o.customer_id
> JOIN order_details AS od ON o.order_id = od.order_id
> JOIN products AS p ON od.product_id = p.product_id
> WHERE p.product_name = 'Chocolade';
> 
> Neo4j's Cypher:
> 
> MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
> RETURN distinct c.company_name;
> 
> Imagine if we could simply write the SQL query like this:
> 
> SELECT DISTINCT od.order_id.customer_id.company_name
> FROM order_details AS od
> WHERE od.product_id.product_name = 'Chocolade';
> 
> I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins.  The nodes point directly to the other nodes.  Your
proposal is syntactic sugar over a real join.  The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ.  The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard.  The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.
-- 
Vik Fearing

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

28 March 2021, 14:26:54

On Sun, Mar 28, 2021, at 12:25, Vik Fearing wrote:

On 3/27/21 9:27 PM, Joel Jacobson wrote:
> Imagine if we could simply write the SQL query like this:
>
> SELECT DISTINCT od.order_id.customer_id.company_name
> FROM order_details AS od
> WHERE od.product_id.product_name = 'Chocolade';
>
> I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins. The nodes point directly to the other nodes. Your
proposal is syntactic sugar over a real join. The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ. The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard. The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.

I think you misunderstood my idea entirely.

It has absolutely nothing to do with graph query languages,

except the one similarity I mentioned, not having joins.

What I propose is a way to do implicit joins by following foreign keys,

thus improving the SQL query language, making it more concise for

the simple case when what you want to do is an INNER JOIN on a

single column on which there is a single FOREIGN KEY.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

28 March 2021, 14:51:19

ne 28. 3. 2021 v 13:27 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 12:25, Vik Fearing wrote:
On 3/27/21 9:27 PM, Joel Jacobson wrote:
> Imagine if we could simply write the SQL query like this:
>
> SELECT DISTINCT od.order_id.customer_id.company_name
> FROM order_details AS od
> WHERE od.product_id.product_name = 'Chocolade';
>
> I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins. The nodes point directly to the other nodes. Your
proposal is syntactic sugar over a real join. The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ. The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard. The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.

I think you misunderstood my idea entirely.

It has absolutely nothing to do with graph query languages,
except the one similarity I mentioned, not having joins.

What I propose is a way to do implicit joins by following foreign keys,
thus improving the SQL query language, making it more concise for
the simple case when what you want to do is an INNER JOIN on a
single column on which there is a single FOREIGN KEY.

There were some similar tools already. Personally I like the current state, where tables should be explicitly defined, and join should be explicitly defined. The joining of tables is not cheap - and I like the visibility of this. On the other hand, this is very frustrable for a lot of people, and I can understand. I don't want to see this feature inside Postgres, because it can reduce the possibility to detect badly written query. But it can be a great feature of some outer tool. I worked for GoodData and this tool knows the model, and it generates necessary joins implicitly, and people like it (this tool uses Postgres too).

https://www.gooddata.com/

Regards

Pavel

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

28 March 2021, 15:15:42

On Sat, Mar 27, 2021, at 22:11, Alvaro Herrera wrote:

On 2021-Mar-27, Joel Jacobson wrote:

> If there would be multiple foreign keys on a column we try to follow,
> the query planner would throw an error forcing the user to use explicit joins instead.

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous.

Creating one more FK referencing some other column,

would break queries in the same way USING breaks,

if a column is added which causes ambiguity.

In my experience, it's extremely rare to have multiple different FKs on the same set of columns.

Maybe I'm missing something here, can we think of a realistic use-case?

If such a FK if created, it would break in the same way as USING breaks

if a column is added which causes ambiguity, except this is much less likely to happen than the equivalent use case.

I think this problem is hypothetical compared to the actual problem with USING,

since adding a column with the same name as some existing column actually happens sometimes.

Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

The problem with NATURAL is due to matching based on column names.

My proposal doesn't match on column names at all.

It merely follows the foreign key for a column.

With NATURAL you can also suddenly get a different join,

whereas my proposal at worst will generate an error due to multiple FKs on the same column,

there can never be any ambiguity.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

28 March 2021, 15:38:44

On Sun, Mar 28, 2021, at 13:51, Pavel Stehule wrote:

There were some similar tools already. Personally I like the current state, where tables should be explicitly defined, and join should be explicitly defined. The joining of tables is not cheap - and I like the visibility of this. On the other hand, this is very frustrable for a lot of people, and I can understand. I don't want to see this feature inside Postgres, because it can reduce the possibility to detect badly written query. But it can be a great feature of some outer tool. I worked for GoodData and this tool knows the model, and it generates necessary joins implicitly, and people like it (this tool uses Postgres too).

https://www.gooddata.com/

Very good points.

As a counter-argument, I could argue that you don't need to use this feature.

But that would be a poor argument, since you might have to work with code

written by other developers.

I'm also fearful of newbies misusing features, not understanding what they are doing, producing inefficient code.

However, this is a general problem, complex queries are hard to reason about,

and I'm not sure making some INNER JOINs implicit would worsen the situation,

you could also make the counter-argument that the remaining explicit JOINs become more visible,

and will stand-out, exposing what is really complex in the query.

Also, this proposed syntax would surely appeal to the NoSQL-crowd,

and should reduce their cravings for MongoDB.

So ask yourself the following question: Ten years from now, would you rather be forced to

work with code using MongoDB or a more concise SQL?

Lastly, let me reiterate I think you made a very good point,

your argument is the heaviest weigh on the negative side of my own scale.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Andrew Dunstan

Date:

28 March 2021, 16:23:13

On 3/27/21 5:11 PM, Alvaro Herrera wrote:
> On 2021-Mar-27, Joel Jacobson wrote:
>
>> If there would be multiple foreign keys on a column we try to follow,
>> the query planner would throw an error forcing the user to use explicit joins instead.
> This seems pretty dangerous -- you just have to create one more FK, and
> suddenly a query that worked perfectly fine, now starts throwing errors
> because it's now ambiguous.  Feels a bit like JOIN NATURAL, which many
> people discourage because of this problem.
>

Maybe. I don't recall ever having seen a column with more than one FK.
Is that a common thing? In itself it seems like a bad idea.

Not saying I think this suggestion is a good idea, though. We've seen
many frameworks that hide joins, and the results are ... less than
universally good. If your application programmers don't like using
joins, then either a) you should have the DBA create some views for them
that contain the joins, or b) you have the wrong application programmers -:)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

28 March 2021, 16:36:28

ne 28. 3. 2021 v 14:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 13:51, Pavel Stehule wrote:
There were some similar tools already. Personally I like the current state, where tables should be explicitly defined, and join should be explicitly defined. The joining of tables is not cheap - and I like the visibility of this. On the other hand, this is very frustrable for a lot of people, and I can understand. I don't want to see this feature inside Postgres, because it can reduce the possibility to detect badly written query. But it can be a great feature of some outer tool. I worked for GoodData and this tool knows the model, and it generates necessary joins implicitly, and people like it (this tool uses Postgres too).

https://www.gooddata.com/

Very good points.

As a counter-argument, I could argue that you don't need to use this feature.
But that would be a poor argument, since you might have to work with code
written by other developers.

I'm also fearful of newbies misusing features, not understanding what they are doing, producing inefficient code.
However, this is a general problem, complex queries are hard to reason about,
and I'm not sure making some INNER JOINs implicit would worsen the situation,
you could also make the counter-argument that the remaining explicit JOINs become more visible,
and will stand-out, exposing what is really complex in the query.

It is not the problem only for newbies - yesterday a very experienced user (I know him personally) reported an issue related to misunderstanding some behaviour and just some typo, I like some mandatory redundancy in syntax, because it allows to detect some typo errors. SQL is not consistent in this - the query is relatively safe, but if you use subqueries, then are not safe because you can use an outer identifier without qualification, and what is worse, the identifiers are prioritized - there is not amobigonuous column check. So SQL has enough traps already, and I am afraid to introduce some new implicit features.

Theoretically you can introduce own procedural language

CREATE OR REPLACE FUNCTION foo(a int)

RETURNS TABLE (x int, y int) AS $$

SELECT t1.x, t2.y WHERE t3.a = a;

$$ LANGUAGE mylanguage.

It is well wrapped, and well isolated.

Also, this proposed syntax would surely appeal to the NoSQL-crowd,
and should reduce their cravings for MongoDB.

So ask yourself the following question: Ten years from now, would you rather be forced to
work with code using MongoDB or a more concise SQL?

I am a man who likes SQL - for me, it is a human readable language with a good level of verbosity and abstraction - all time, when I learned SQL. But I see that SQL is not a fully "safe" language. It allows bad joins, or doesn't detect all human errors. It can be a good reason for a new layer over SQL - some more abstract language. And it can work - I have really good experience with GoodData query language. This is a transpiler from domain language to SQL.

I think so every tool, every layer should have a similar level of abstraction to be well usable.

Lastly, let me reiterate I think you made a very good point,
your argument is the heaviest weigh on the negative side of my own scale.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Tom Lane

Date:

28 March 2021, 17:04:53

Andrew Dunstan <andrew@dunslane.net> writes:
> On 3/27/21 5:11 PM, Alvaro Herrera wrote:
>> This seems pretty dangerous -- you just have to create one more FK, and
>> suddenly a query that worked perfectly fine, now starts throwing errors
>> because it's now ambiguous.  Feels a bit like JOIN NATURAL, which many
>> people discourage because of this problem.

> Maybe. I don't recall ever having seen a column with more than one FK.
> Is that a common thing? In itself it seems like a bad idea.

Yeah, that aspect seems like a complete show-stopper.  We have a way
to enforce that you can't *drop* a constraint that some stored view
depends on for semantic validity.  We don't have a way to say that
you can't *add* a constraint-with-certain-properties.  And I don't
think it'd be very practical to do (consider race conditions, if
nothing more).

However, that stumbling block is just dependent on the assumption
that the foreign key constraint being used is implicit.  If the
syntax names it explicitly then you just have a normal constraint
dependency and all's well.

You might be able to have a shorthand notation in which the constraint
isn't named and the system will accept it as long as there's just one
candidate (but then, when dumping a stored view, the constraint name
would always be shown explicitly).  However I'm not sure that the
"shorthand" would be any shorter.  I'm imagining a syntax in which
you give the constraint name instead of the column name.  Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

> Not saying I think this suggestion is a good idea, though. We've seen
> many frameworks that hide joins, and the results are ... less than
> universally good.

Yeah, I'm pretty much not sold on this idea either.  I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

            regards, tom lane

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

28 March 2021, 17:39:39

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:
> Maybe. I don't recall ever having seen a column with more than one FK.
> Is that a common thing? In itself it seems like a bad idea.

Yeah, that aspect seems like a complete show-stopper. We have a way
to enforce that you can't *drop* a constraint that some stored view
depends on for semantic validity. We don't have a way to say that
you can't *add* a constraint-with-certain-properties. And I don't
think it'd be very practical to do (consider race conditions, if
nothing more).

Thanks for valuable insights, I didn't think about these things.

What if the path expressions are just syntactic sugar for an INNER JOIN on the referencing -> referenced column?

If a VIEW is created using this syntax, it would be stored as INNER JOINs, similar to how SELECT * is expanded.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Andrew Dunstan

Date:

28 March 2021, 17:40:21

On 3/28/21 10:04 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 3/27/21 5:11 PM, Alvaro Herrera wrote:
>>> This seems pretty dangerous -- you just have to create one more FK, and
>>> suddenly a query that worked perfectly fine, now starts throwing errors
>>> because it's now ambiguous.  Feels a bit like JOIN NATURAL, which many
>>> people discourage because of this problem.
>> Maybe. I don't recall ever having seen a column with more than one FK.
>> Is that a common thing? In itself it seems like a bad idea.
> Yeah, that aspect seems like a complete show-stopper.  We have a way
> to enforce that you can't *drop* a constraint that some stored view
> depends on for semantic validity.  We don't have a way to say that
> you can't *add* a constraint-with-certain-properties.  And I don't
> think it'd be very practical to do (consider race conditions, if
> nothing more).
>
> However, that stumbling block is just dependent on the assumption
> that the foreign key constraint being used is implicit.  If the
> syntax names it explicitly then you just have a normal constraint
> dependency and all's well.
>
> You might be able to have a shorthand notation in which the constraint
> isn't named and the system will accept it as long as there's just one
> candidate (but then, when dumping a stored view, the constraint name
> would always be shown explicitly).  However I'm not sure that the
> "shorthand" would be any shorter.  I'm imagining a syntax in which
> you give the constraint name instead of the column name.  Thought
> experiment: how could the original syntax proposal make any use of
> a multi-column foreign key?


I guess we could have a special operator, which allows the LHS to be
either a column (in which case it must have only one single-valued FK
constraint) or a constraint name in which case it would match the
corresponding columns on both sides.


It gets kinda tricky though, as there are FKs going both ways:


    customers <- orders <- order_details -> products


and in fact this could make composing the query LESS clear. The natural
place to start this query (show me the name of every customer who
ordered chocolate) is with orders ISTM, but the example given starts
with order_details which seems somewhat unnatural.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Vik Fearing

Date:

28 March 2021, 23:43:30

On 3/28/21 1:26 PM, Joel Jacobson wrote:
> On Sun, Mar 28, 2021, at 12:25, Vik Fearing wrote:
>> On 3/27/21 9:27 PM, Joel Jacobson wrote:
>>> Imagine if we could simply write the SQL query like this:
>>>
>>> SELECT DISTINCT od.order_id.customer_id.company_name
>>> FROM order_details AS od
>>> WHERE od.product_id.product_name = 'Chocolade';
>>>
>>> I took the inspiration for this syntax from SQL/JSON path expressions.
>>
>> This is a terrible idea, but let me explain why.
>>
>> First of all, neo4j claims they don't have joins because they actually
>> don't have joins.  The nodes point directly to the other nodes.  Your
>> proposal is syntactic sugar over a real join.  The equivalent to neo4j
>> would be somehow storing the foreign ctid in the column or something.
>>
>> Secondly, and more importantly IMO, graph queries are coming to SQL.
>> They are mostly based on Cypher but not entirely because they amalgamate
>> concepts from other graph query languages like Oracle's PGQ.  The
>> "common" syntax is called GQL (https://www.gqlstandards.org/) and it's
>> in the process of becoming Part 16 of the SQL standard.  The timeline on
>> that website says August 2022 (it started in September 2019).
>>
>> If that timeline holds and ambitious people work on it (I already know
>> one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
>> The earliest your proposal could get in is 15, so it's not that much of
>> a wait.
> 
> I think you misunderstood my idea entirely.
> 
> It has absolutely nothing to do with graph query languages,
> except the one similarity I mentioned, not having joins.

In that case, I oppose this suggestion.

> What I propose is a way to do implicit joins by following foreign keys,
> thus improving the SQL query language, making it more concise for
> the simple case when what you want to do is an INNER JOIN on a
> single column on which there is a single FOREIGN KEY.

SQL is not an implicit language, though.

I wouldn't mind something like

  FROM a JOIN b WITH a_b_fk

or something.  That would be really nice when the keys are multicolumn.
But I don't want an implicit join like you're suggesting.
-- 
Vik Fearing

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

29 March 2021, 12:59:48

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:

I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be

unique per schema, but I realize we could just use the foreign table's name

as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name

FROM order_details

WHERE order_details.products.product_name = 'Chocolade';

Given this data model:

CREATE TABLE customers (

customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,

company_name text,

PRIMARY KEY (customer_id)

);

CREATE TABLE orders (

order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,

customer_id bigint NOT NULL,

PRIMARY KEY (order_id),

CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers

);

CREATE TABLE products (

product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,

product_name text NOT NULL,

PRIMARY KEY (product_id)

);

CREATE TABLE order_details (

order_id bigint NOT NULL,

product_id bigint NOT NULL,

PRIMARY KEY (order_id, product_id),

CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,

CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products

);

> Not saying I think this suggestion is a good idea, though. We've seen
> many frameworks that hide joins, and the results are ... less than
> universally good.

Yeah, I'm pretty much not sold on this idea either. I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

I think this concern is valid for the original syntax,

but I actually think the idea on using foreign key constraint names

effectively solves an entire class of query writing bugs.

Users writing queries using this syntax are guaranteed to be aware

of the existence of the foreign keys, otherwise they couldn't write

the query this way, since they must use the foreign key

constraint names in the path expression.

This ensures it's not possible to produce a nonsensical JOIN

on the wrong columns, a problem for which traditional JOINs

have no means to protect against.

Even with foreign keys, indexes could of course be missing,

causing an inefficient query anyway, but at least the classes

of potential problems is reduced by one.

I think what's neat is how this syntax works excellent in combination

with traditional JOINs, allowing the one which feels most natural for

each part of the query to be used.

Let's also remember foreign keys did first appear in SQL-89,

so they couldn't have been taken into account when SQL-86

was designed. Maybe they would have came up with the idea

of making more use of foreign key constraints,

if they would have been invented from the very beginning.

However, it's not too late to fix this, it seems doable without

breaking any backwards compatibility. I think there is a risk

our personal preferences are biased due to being experienced

SQL users. I think it's likely newcomers to SQL would really

fancy this proposed syntax, and cause them to prefer PostgreSQL

over some other NoSQL product.

If we can provide such newcomers with a built-in solution,

I think that better than telling them they should

use some ORM/tool/macro to simplify their query writing.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

29 March 2021, 13:48:05

po 29. 3. 2021 v 12:01 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:
I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be
unique per schema, but I realize we could just use the foreign table's name
as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name
FROM order_details
WHERE order_details.products.product_name = 'Chocolade';

This syntax is similar to Oracle's object references (this is example from thread from Czech Postgres list last week)

Select e.last_name employee,
e.department_ref.department_name department,
e.department_ref.manager_ref.last_name dept_manager
From employees_obj e
where e.initials() like 'K_';

I see few limitations: a) there is not support for outer join, b) there is not support for aliasing - and it probably doesn't too nice, when you want to returns more (but not all) columns

Regards

Pavel

Given this data model:

CREATE TABLE customers (
customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
company_name text,
PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
customer_id bigint NOT NULL,
PRIMARY KEY (order_id),
CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
);

CREATE TABLE products (
product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
product_name text NOT NULL,
PRIMARY KEY (product_id)
);

CREATE TABLE order_details (
order_id bigint NOT NULL,
product_id bigint NOT NULL,
PRIMARY KEY (order_id, product_id),
CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
);

> Not saying I think this suggestion is a good idea, though. We've seen
> many frameworks that hide joins, and the results are ... less than
> universally good.

Yeah, I'm pretty much not sold on this idea either. I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

I think this concern is valid for the original syntax,
but I actually think the idea on using foreign key constraint names
effectively solves an entire class of query writing bugs.

Users writing queries using this syntax are guaranteed to be aware
of the existence of the foreign keys, otherwise they couldn't write
the query this way, since they must use the foreign key
constraint names in the path expression.

This ensures it's not possible to produce a nonsensical JOIN
on the wrong columns, a problem for which traditional JOINs
have no means to protect against.

Even with foreign keys, indexes could of course be missing,
causing an inefficient query anyway, but at least the classes
of potential problems is reduced by one.

I think what's neat is how this syntax works excellent in combination
with traditional JOINs, allowing the one which feels most natural for
each part of the query to be used.

Let's also remember foreign keys did first appear in SQL-89,
so they couldn't have been taken into account when SQL-86
was designed. Maybe they would have came up with the idea
of making more use of foreign key constraints,
if they would have been invented from the very beginning.

However, it's not too late to fix this, it seems doable without
breaking any backwards compatibility. I think there is a risk
our personal preferences are biased due to being experienced
SQL users. I think it's likely newcomers to SQL would really
fancy this proposed syntax, and cause them to prefer PostgreSQL
over some other NoSQL product.

If we can provide such newcomers with a built-in solution,
I think that better than telling them they should
use some ORM/tool/macro to simplify their query writing.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Vik Fearing

Date:

29 March 2021, 17:17:50

On 3/29/21 11:59 AM, Joel Jacobson wrote:
> On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:
>> I'm imagining a syntax in which
>> you give the constraint name instead of the column name.  Thought
>> experiment: how could the original syntax proposal make any use of
>> a multi-column foreign key?
> 
> Thanks for coming up with this genius idea.
> 
> At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be
> unique per schema, but I realize we could just use the foreign table's name
> as the constraint name, which will allow a nice syntax:
> 
> SELECT DISTINCT order_details.orders.customers.company_name
> FROM order_details
> WHERE order_details.products.product_name = 'Chocolade';
> 
> Given this data model:
> 
> CREATE TABLE customers (
> customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
> company_name text,
> PRIMARY KEY (customer_id)
> );
> 
> CREATE TABLE orders (
> order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
> customer_id bigint NOT NULL,
> PRIMARY KEY (order_id),
> CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
> );
> 
> CREATE TABLE products (
> product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
> product_name text NOT NULL,
> PRIMARY KEY (product_id)
> );
> 
> CREATE TABLE order_details (
> order_id bigint NOT NULL,
> product_id bigint NOT NULL,
> PRIMARY KEY (order_id, product_id),
> CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
> CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
> );


If you write your schema like this, then it becomes standards compliant:

CREATE TYPE customers AS (
    company_name text
);
CREATE TABLE customers OF customers (
    REF IS customer_id SYSTEM GENERATED
);

CREATE TYPE orders AS (
    customer REF(customers) NOT NULL
);
CREATE TABLE orders OF orders (
    REF IS order_id SYSTEM GENERATED
);

CREATE TYPE products AS (
    product_name text
);
CREATE TABLE products OF products (
    REF IS product_id SYSTEM GENERATED
);

CREATE TABLE order_details (
    "order" REF(orders),
    product REF(products),
    quantity integer,
    PRIMARY KEY ("order", product)
);


And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';


Postgres already supports most of that, but not all of it.
-- 
Vik Fearing

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

29 March 2021, 18:50:20

On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:

If you write your schema like this, then it becomes standards compliant:
...
CREATE TABLE order_details (
    "order" REF(orders),
    product REF(products),
    quantity integer,
    PRIMARY KEY ("order", product)
);

And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

Postgres already supports most of that, but not all of it.

Thanks for making me aware of this.

I can see this is "4.9 Reference types" in ISO/IEC 9075-2:2016(E).

This looks awesome.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

29 March 2021, 20:55:49

On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:

CREATE TABLE order_details (
    "order" REF(orders),
    product REF(products),
    quantity integer,
    PRIMARY KEY ("order", product)
);

And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

Postgres already supports most of that, but not all of it.

Do you know if REF is meant to be a replacement for foreign keys?

Are they a different thing meant to co-exist with foreign keys,

or are they actually foreign keys "under the hood"

or something else entirely?

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Vik Fearing

Date:

29 March 2021, 21:53:55

On 3/29/21 7:55 PM, Joel Jacobson wrote:
> On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:
>> CREATE TABLE order_details (
>>     "order" REF(orders),
>>     product REF(products),
>>     quantity integer,
>>     PRIMARY KEY ("order", product)
>> );
>>
>>
>> And the query would be:
>>
>> SELECT DISTINCT order_details."order"->customer->company_name
>> FROM order_details
>> WHERE order_details.product->product_name = 'Chocolade';
>>
>>
>> Postgres already supports most of that, but not all of it.
> 
> Do you know if REF is meant to be a replacement for foreign keys?
> 
> Are they a different thing meant to co-exist with foreign keys,
> or are they actually foreign keys "under the hood"
> or something else entirely?

They're supposed to be OOP where each row in the typed table is an
instance of the object.  Types can also have methods associated with
them, and the instance tables can have subtables similar to our table
inheritance.  The dereference operator is replaced by a subquery.

There is a whole slew of things in this area of the standard that
apparently never caught on.
-- 
Vik Fearing

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

29 March 2021, 23:59:38

On Mon, Mar 29, 2021, at 20:53, Vik Fearing wrote:

On 3/29/21 7:55 PM, Joel Jacobson wrote:
> Do you know if REF is meant to be a replacement for foreign keys?
>
> Are they a different thing meant to co-exist with foreign keys,
> or are they actually foreign keys "under the hood"
> or something else entirely?

They're supposed to be OOP where each row in the typed table is an
instance of the object. Types can also have methods associated with
them, and the instance tables can have subtables similar to our table
inheritance. The dereference operator is replaced by a subquery.

There is a whole slew of things in this area of the standard that
apparently never caught on.

Hmm. Since it never caught on, maybe it was partly due to too much complexity, and maybe can invent a simpler solution?

I would also be against this idea if the complexity cost would be too high,

but I think Tom's foreign key constraint name idea looks fruitful since it's simple and non-invasive.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

30 March 2021, 09:03:09

po 29. 3. 2021 v 23:00 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Mon, Mar 29, 2021, at 20:53, Vik Fearing wrote:
On 3/29/21 7:55 PM, Joel Jacobson wrote:
> Do you know if REF is meant to be a replacement for foreign keys?
>
> Are they a different thing meant to co-exist with foreign keys,
> or are they actually foreign keys "under the hood"
> or something else entirely?

They're supposed to be OOP where each row in the typed table is an
instance of the object. Types can also have methods associated with
them, and the instance tables can have subtables similar to our table
inheritance. The dereference operator is replaced by a subquery.

There is a whole slew of things in this area of the standard that
apparently never caught on.

Hmm. Since it never caught on, maybe it was partly due to too much complexity, and maybe can invent a simpler solution?

I would also be against this idea if the complexity cost would be too high,
but I think Tom's foreign key constraint name idea looks fruitful since it's simple and non-invasive.

Maybe there were no technical problems. Just this technology was coming at a bad time. The people who needed (wanted) OOP access to data got the Hibernate, and there was no necessity to do this work on SQL level. In this time, there was possibility to use GUI for databases, and in this time there were a lot of graphic query designers. I don't like the idea of foreign key constraint names - it doesn't look comfortable to me. I don't say it is a bad idea, but it is not SQL, and I am not sure if it needs more or less work than explicitly to write PK=FK.

On second hand, it can be very nice to have some special strict mode in Postgres - maybe slower, not compatible, that disallow some dangerous or unsafe queries. But it is possible to solve in extensions, but nobody did it. Something like plpgsql_check for SQL - who will write sql_check?

Regards

Pavel

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Julien Rouhaud

Date:

30 March 2021, 09:52:47

On Tue, Mar 30, 2021 at 08:03:09AM +0200, Pavel Stehule wrote:
> 
> On second hand, it can be very nice to have some special strict mode in
> Postgres - maybe slower, not compatible, that disallow some dangerous or
> unsafe queries. But it is possible to solve in extensions, but nobody did
> it. Something like plpgsql_check for SQL - who will write sql_check?

The #1 cause of problems is probably unqualified outer references, and
unfortunately I don't think it's really possible to detect that in an
extension, as the required information is only available in the raw parsetree.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

30 March 2021, 10:02:39

út 30. 3. 2021 v 8:52 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:

On Tue, Mar 30, 2021 at 08:03:09AM +0200, Pavel Stehule wrote:
>
> On second hand, it can be very nice to have some special strict mode in
> Postgres - maybe slower, not compatible, that disallow some dangerous or
> unsafe queries. But it is possible to solve in extensions, but nobody did
> it. Something like plpgsql_check for SQL - who will write sql_check?

The #1 cause of problems is probably unqualified outer references, and
unfortunately I don't think it's really possible to detect that in an
extension, as the required information is only available in the raw parsetree.

the raw parsetree is available I think. I didn't check it. But it can be easy to attach or attach a copy to Query structure. Maybe there is no necessary hook. But it can be a good reason for implementing a post parsing hook.

It should be easy to check if all joins are related to foreign key constraints.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Julien Rouhaud

Date:

30 March 2021, 10:28:44

On Tue, Mar 30, 2021 at 09:02:39AM +0200, Pavel Stehule wrote:
> út 30. 3. 2021 v 8:52 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
> 
> > On Tue, Mar 30, 2021 at 08:03:09AM +0200, Pavel Stehule wrote:
> > >
> > > On second hand, it can be very nice to have some special strict mode in
> > > Postgres - maybe slower, not compatible, that disallow some dangerous or
> > > unsafe queries. But it is possible to solve in extensions, but nobody did
> > > it. Something like plpgsql_check for SQL - who will write sql_check?
> >
> > The #1 cause of problems is probably unqualified outer references, and
> > unfortunately I don't think it's really possible to detect that in an
> > extension, as the required information is only available in the raw
> > parsetree.
> >
> 
> the raw parsetree is available  I think. I didn't check it. But it can be
> easy to attach or attach a copy to Query structure. Maybe there is no
> necessary hook. But it can be a good reason for implementing a post parsing
> hook.

It's not available in any existing hook.  And even if it was you would have to
import most of transformTopLevelStmt() and all underlying functions to be able
to detect this case I think.  This should be best done in core postgres.

> It should be easy to check if all joins are related to foreign key
> constraints.

Yes, and also if the referenced columns are covered by indexes for instance.
My concern is mostly that you won't be able to cover the unqualified outer
references, which can lead to wrong query results rather than just slow
queries.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

30 March 2021, 10:32:14

út 30. 3. 2021 v 9:28 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:

On Tue, Mar 30, 2021 at 09:02:39AM +0200, Pavel Stehule wrote:
> út 30. 3. 2021 v 8:52 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
>
> > On Tue, Mar 30, 2021 at 08:03:09AM +0200, Pavel Stehule wrote:
> > >
> > > On second hand, it can be very nice to have some special strict mode in
> > > Postgres - maybe slower, not compatible, that disallow some dangerous or
> > > unsafe queries. But it is possible to solve in extensions, but nobody did
> > > it. Something like plpgsql_check for SQL - who will write sql_check?
> >
> > The #1 cause of problems is probably unqualified outer references, and
> > unfortunately I don't think it's really possible to detect that in an
> > extension, as the required information is only available in the raw
> > parsetree.
> >
>
> the raw parsetree is available I think. I didn't check it. But it can be
> easy to attach or attach a copy to Query structure. Maybe there is no
> necessary hook. But it can be a good reason for implementing a post parsing
> hook.

It's not available in any existing hook. And even if it was you would have to
import most of transformTopLevelStmt() and all underlying functions to be able
to detect this case I think. This should be best done in core postgres.

> It should be easy to check if all joins are related to foreign key
> constraints.

Yes, and also if the referenced columns are covered by indexes for instance.
My concern is mostly that you won't be able to cover the unqualified outer
references, which can lead to wrong query results rather than just slow
queries.

it can be fixed

Pavel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 10:47:07

On Tue, Mar 30, 2021, at 08:03, Pavel Stehule wrote:

Maybe there were no technical problems. Just this technology was coming at a bad time. The people who needed (wanted) OOP access to data got the Hibernate, and there was no necessity to do this work on SQL level. In this time, there was possibility to use GUI for databases, and in this time there were a lot of graphic query designers.

Thanks for giving this perspective. It seems like a likely explanation. In the ORM camp, SQL is merely a low-level language compilation target, not a language humans primarily write code in.

I don't like the idea of foreign key constraint names - it doesn't look comfortable to me. I don't say it is a bad idea, but it is not SQL, and I am not sure if it needs more or less work than explicitly to write PK=FK.

I agree, it's not very comfortable. Maybe we can think of ways to improve the comfort?

Here are two such ideas:

Idea #1

=======

Initial semi-automated script-assisted renaming of existing foreign keys.

In my experiences, multiple foreign keys per primary table is quite common,

but not multiple foreign keys referencing the same foreign table from the same primary table.

If so, then a script can be written to rename most existing foreign keys:

-- Script to rename foreign keys to the name of the foreign table.

-- Tables with multiple foreign keys referencing the same foreign table are skipped.

$_$

DECLARE

sql_cmd text;

BEGIN

FOR sql_cmd IN

SELECT * FROM

(

SELECT

format

(

'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',

conrel_nsp.nspname,

conrel.relname,

pg_constraint.conname,

confrel.relname

) AS sql_cmd,

COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)

AS count_foreign_keys_to_same_table

FROM pg_constraint

JOIN pg_class AS conrel

ON conrel.oid = pg_constraint.conrelid

JOIN pg_class AS confrel

ON confrel.oid = pg_constraint.confrelid

JOIN pg_namespace AS conrel_nsp

ON conrel_nsp.oid = conrel.relnamespace

WHERE pg_constraint.contype = 'f'

) AS x

WHERE count_foreign_keys_to_same_table = 1

LOOP

RAISE NOTICE '%', sql_cmd;

EXECUTE sql_cmd;

END LOOP;

END

$_$;

For our example data model, this would produce:

ALTER TABLE public.orders RENAME CONSTRAINT orders_customer_id_fkey TO customers;

ALTER TABLE public.order_details RENAME CONSTRAINT order_details_order_id_fkey TO orders;

ALTER TABLE public.order_details RENAME CONSTRAINT order_details_product_id_fkey TO products;

To clarify what I mean with multiple foreign keys to the same table, here is an example:

CREATE TABLE p (

a int,

b int,

PRIMARY KEY (a),

UNIQUE (a,b)

);

CREATE TABLE f1 (

a int,

b int,

FOREIGN KEY (a) REFERENCES p

);

CREATE TABLE f2 (

a int,

b int,

FOREIGN KEY (a) REFERENCES p,

FOREIGN KEY (a,b) REFERENCES p(a,b)

);

For this example, only f1's foreign key constraint would be renamed:

ALTER TABLE public.f1 RENAME CONSTRAINT f1_a_fkey TO p;

Idea #2

=======

Allow user to define the default format for new foreign key constraint name.

The format could use template patterns similar to how e.g. to_char() works.

If a conflict is found, it would do the same as today, try appending an increasing integer.

Users could then decide on a company-wide consistent naming convention

on how foreign keys are usually named, which would reduce the need to manually name them

using the CONSTRAINT keyword.

Finally, just for fun, here is an example of how we could write the query above,

if we would have real foreign keys on the catalogs:

SELECT

format

(

'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',

pg_constraint.conrel.pg_namespace.nspname,

pg_constraint.conrel.relname,

pg_constraint.conname,

pg_constraint.confrel.relname,

) AS sql_cmd,

COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)

AS count_foreign_keys_to_same_table

FROM pg_constraint

WHERE pg_constraint.contype = 'f'

In this example the foreign key constraint names have been

derived from the column names since both conrelid and confrelid,

reference pg_class.

I think this is a good example of where this improves the situation the most,

when you have multiple joins of the same table, forcing you to come up with multiple aliases

for the same table, keeping them all in memory while writing and reading such queries.

On second hand, it can be very nice to have some special strict mode in Postgres - maybe slower, not compatible, that disallow some dangerous or unsafe queries. But it is possible to solve in extensions, but nobody did it. Something like plpgsql_check for SQL - who will write sql_check?

Not a bad idea, this is a real problem, such a tool would be useful even with this proposed new syntax, as normal JOINs would continue to co-exist, for which nonsensical joins would still be possible.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

30 March 2021, 11:24:40

I agree, it's not very comfortable. Maybe we can think of ways to improve the comfort?

Here are two such ideas:

Idea #1
=======

Initial semi-automated script-assisted renaming of existing foreign keys.

In my experiences, multiple foreign keys per primary table is quite common,
but not multiple foreign keys referencing the same foreign table from the same primary table.

If so, then a script can be written to rename most existing foreign keys:

--
-- Script to rename foreign keys to the name of the foreign table.
-- Tables with multiple foreign keys referencing the same foreign table are skipped.
--
DO
$_$
DECLARE
sql_cmd text;
BEGIN
FOR sql_cmd IN
SELECT * FROM
(
    SELECT
      format
      (
        'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',
        conrel_nsp.nspname,
        conrel.relname,
        pg_constraint.conname,
        confrel.relname
      ) AS sql_cmd,
      COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)
      AS count_foreign_keys_to_same_table
    FROM pg_constraint
    JOIN pg_class AS conrel
      ON conrel.oid = pg_constraint.conrelid
    JOIN pg_class AS confrel
      ON confrel.oid = pg_constraint.confrelid
    JOIN pg_namespace AS conrel_nsp
      ON conrel_nsp.oid = conrel.relnamespace
    WHERE pg_constraint.contype = 'f'
) AS x
WHERE count_foreign_keys_to_same_table = 1
LOOP
RAISE NOTICE '%', sql_cmd;
EXECUTE sql_cmd;
END LOOP;
END
$_$;

For our example data model, this would produce:

ALTER TABLE public.orders RENAME CONSTRAINT orders_customer_id_fkey TO customers;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_order_id_fkey TO orders;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_product_id_fkey TO products;

you fix one issue, but you lost interesting informations

To clarify what I mean with multiple foreign keys to the same table, here is an example:

CREATE TABLE p (
a int,
b int,
PRIMARY KEY (a),
UNIQUE (a,b)
);

CREATE TABLE f1 (
a int,
b int,
FOREIGN KEY (a) REFERENCES p
);

CREATE TABLE f2 (
a int,
b int,
FOREIGN KEY (a) REFERENCES p,
FOREIGN KEY (a,b) REFERENCES p(a,b)
);

For this example, only f1's foreign key constraint would be renamed:

ALTER TABLE public.f1 RENAME CONSTRAINT f1_a_fkey TO p;

Idea #2
=======

Allow user to define the default format for new foreign key constraint name.

The format could use template patterns similar to how e.g. to_char() works.
If a conflict is found, it would do the same as today, try appending an increasing integer.

Users could then decide on a company-wide consistent naming convention
on how foreign keys are usually named, which would reduce the need to manually name them
using the CONSTRAINT keyword.

Finally, just for fun, here is an example of how we could write the query above,
if we would have real foreign keys on the catalogs:

SELECT
    format
    (
      'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',
      pg_constraint.conrel.pg_namespace.nspname,
      pg_constraint.conrel.relname,
      pg_constraint.conname,
      pg_constraint.confrel.relname,
    ) AS sql_cmd,
    COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)
    AS count_foreign_keys_to_same_table
FROM pg_constraint
WHERE pg_constraint.contype = 'f'

In this example the foreign key constraint names have been
derived from the column names since both conrelid and confrelid,
reference pg_class.

I think this is a good example of where this improves the situation the most,
when you have multiple joins of the same table, forcing you to come up with multiple aliases
for the same table, keeping them all in memory while writing and reading such queries.

I do not have an opinion about this, I am sorry. I cannot imagine so this can work. In some complex cases, the graphic query designer can work better. The invention of new syntax, or new tool should be better just than checking correct usage of foreign constraints. I have worked with SQL for over 25 years, and there were a lot of tools, and people don't use it too much. So I am not good at dialog in this area, because I am a little bit too sceptical :).

I remember multiple self joins only when developers used an EAV model. This is an antipattern, and today we have better tools, and we don't need it. It is scary, because it is completely against the relational model. If I want to fix it, then I will invent a new different syntax type that can be used for optimization of this case. But I have no idea how to do it well. Maybe:

SELECT * FROM EAVTOENTITY( FROM data GROUP BY objid COLUMN name varchar WHEN attrname = 'name', surname varchar WHEN attrname = 'surname', ...)

On second hand, it can be very nice to have some special strict mode in Postgres - maybe slower, not compatible, that disallow some dangerous or unsafe queries. But it is possible to solve in extensions, but nobody did it. Something like plpgsql_check for SQL - who will write sql_check?

Not a bad idea, this is a real problem, such a tool would be useful even with this proposed new syntax, as normal JOINs would continue to co-exist, for which nonsensical joins would still be possible.

Maybe some similar what we have in plpgsql - extra checks - with three levels, off, warnings, errors.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 11:49:24

On Tue, Mar 30, 2021, at 10:24, Pavel Stehule wrote:

I think this is a good example of where this improves the situation the most,
when you have multiple joins of the same table, forcing you to come up with multiple aliases
for the same table, keeping them all in memory while writing and reading such queries.

...
I remember multiple self joins only when developers used an EAV model. This is an antipattern, and today we have better tools, and we don't need it. It is scary, because it is completely against the relational model.

No, you are mistaken. There are no self-joins in any of the examples I presented.

I merely joined in the same table multiple times, but not with itself, so it's not a self join.

Here is the query again, it doesn't contain any self-joins:

SELECT

format

(

'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',

conrel_nsp.nspname,

conrel.relname,

pg_constraint.conname,

confrel.relname

) AS sql_cmd,

COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)

AS count_foreign_keys_to_same_table

FROM pg_constraint

JOIN pg_class AS conrel

ON conrel.oid = pg_constraint.conrelid

JOIN pg_class AS confrel

ON confrel.oid = pg_constraint.confrelid

JOIN pg_namespace AS conrel_nsp

ON conrel_nsp.oid = conrel.relnamespace

WHERE pg_constraint.contype = 'f'

Where would the antipattern be here?

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

30 March 2021, 11:51:52

út 30. 3. 2021 v 10:49 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Tue, Mar 30, 2021, at 10:24, Pavel Stehule wrote:

I think this is a good example of where this improves the situation the most,
when you have multiple joins of the same table, forcing you to come up with multiple aliases
for the same table, keeping them all in memory while writing and reading such queries.

...
I remember multiple self joins only when developers used an EAV model. This is an antipattern, and today we have better tools, and we don't need it. It is scary, because it is completely against the relational model.

No, you are mistaken. There are no self-joins in any of the examples I presented.
I merely joined in the same table multiple times, but not with itself, so it's not a self join.

Here is the query again, it doesn't contain any self-joins:

    SELECT
      format
      (
        'ALTER TABLE %I.%I RENAME CONSTRAINT %I TO %I;',
        conrel_nsp.nspname,
        conrel.relname,
        pg_constraint.conname,
        confrel.relname
      ) AS sql_cmd,
      COUNT(*) OVER (PARTITION BY pg_constraint.conrelid, pg_constraint.confrelid)
      AS count_foreign_keys_to_same_table
    FROM pg_constraint
    JOIN pg_class AS conrel
      ON conrel.oid = pg_constraint.conrelid
    JOIN pg_class AS confrel
      ON confrel.oid = pg_constraint.confrelid
    JOIN pg_namespace AS conrel_nsp
      ON conrel_nsp.oid = conrel.relnamespace
    WHERE pg_constraint.contype = 'f'

Where would the antipattern be here?

ok, this is not EAV.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 12:21:57

On Tue, Mar 30, 2021, at 10:24, Pavel Stehule wrote:

For our example data model, this would produce:

ALTER TABLE public.orders RENAME CONSTRAINT orders_customer_id_fkey TO customers;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_order_id_fkey TO orders;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_product_id_fkey TO products;

you fix one issue, but you lost interesting informations

No, it's not lost. It's still there:

# \d order_details

Foreign-key constraints:

"orders" FOREIGN KEY (order_id) REFERENCES orders(order_id)

"products" FOREIGN KEY (product_id) REFERENCES products(product_id)

You can still easily find out what tables/columns are referencing/referenced,

by using \d or look in the information_schema.

The primarily reason why this information is duplicated in the default name,

is AFAIK due to avoid hypothetical name conflicts,

which is only a real problem for users who would need to export the schema

to some other SQL database, or use apps that depend on the names to be

unique within the namespace, and not just within the table.

The comment in pg_constraint.c explains this:

/* Select a nonconflicting name for a new constraint.

* The objective here is to choose a name that is unique within the

* specified namespace. Postgres does not require this, but the SQL

* spec does, and some apps depend on it. Therefore we avoid choosing

* default names that so conflict.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 12:24:10

On Tue, Mar 30, 2021, at 11:21, Joel Jacobson wrote:

On Tue, Mar 30, 2021, at 10:24, Pavel Stehule wrote:
For our example data model, this would produce:

ALTER TABLE public.orders RENAME CONSTRAINT orders_customer_id_fkey TO customers;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_order_id_fkey TO orders;
ALTER TABLE public.order_details RENAME CONSTRAINT order_details_product_id_fkey TO products;

you fix one issue, but you lost interesting informations

No, it's not lost. It's still there:

# \d order_details
Foreign-key constraints:
"orders" FOREIGN KEY (order_id) REFERENCES orders(order_id)
"products" FOREIGN KEY (product_id) REFERENCES products(product_id)

You can still easily find out what tables/columns are referencing/referenced,
by using \d or look in the information_schema.

The primarily reason why this information is duplicated in the default name,
is AFAIK due to avoid hypothetical name conflicts,
which is only a real problem for users who would need to export the schema
to some other SQL database, or use apps that depend on the names to be
unique within the namespace, and not just within the table.

The comment in pg_constraint.c explains this:

/* Select a nonconflicting name for a new constraint.
*
* The objective here is to choose a name that is unique within the
* specified namespace. Postgres does not require this, but the SQL
* spec does, and some apps depend on it. Therefore we avoid choosing
* default names that so conflict.

/Joel

Users who have decided to stick to PostgreSQL for ever,

and don't have any apps that depend on (the IMHO stupid) decision by the SQL standard

to require constraints to be unique per namespace, can and should happily ignore this restriction.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Rod Taylor

Date:

30 March 2021, 17:25:23

On Sat, 27 Mar 2021 at 16:28, Joel Jacobson <joel@compiler.org> wrote:

Hi,

The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join tables".

Here is an example from https://neo4j.com/developer/cypher/guide-sql-to-cypher/

SQL:

SELECT DISTINCT c.company_name
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN order_details AS od ON o.order_id = od.order_id
JOIN products AS p ON od.product_id = p.product_id
WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I regularly do this type of thing via views. It's a bit confusing as writes go to one set of tables while selects often go through the view with all the details readily available.

I think I'd want these shortcuts to be well defined and obvious to someone exploring via psql. I can also see uses where a foreign key might not be available (left join rather than join).

I wonder if GENERATED ... VIRTUAL might be a way of defining this type of added record.

ALTER TABLE order ADD customer record GENERATED JOIN customer USING (customer_id) VIRTUAL;

ALTER TABLE order_detail ADD order record GENERATED JOIN order USING (order_id) VIRTUAL;

SELECT order.customer.company_name FROM order_detail;

Of course, if they don't reference the GENERATED column then the join isn't added to the query.

Rod Taylor

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 21:15:00

On Tue, Mar 30, 2021, at 16:25, Rod Taylor wrote:

On Sat, 27 Mar 2021 at 16:28, Joel Jacobson <joel@compiler.org> wrote:
Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I regularly do this type of thing via views. It's a bit confusing as writes go to one set of tables while selects often go through the view with all the details readily available.

I think I'd want these shortcuts to be well defined and obvious to someone exploring via psql. I can also see uses where a foreign key might not be available (left join rather than join).

I wonder if GENERATED ... VIRTUAL might be a way of defining this type of added record.

ALTER TABLE order ADD customer record GENERATED JOIN customer USING (customer_id) VIRTUAL;
ALTER TABLE order_detail ADD order record GENERATED JOIN order USING (order_id) VIRTUAL;

SELECT order.customer.company_name FROM order_detail;

Of course, if they don't reference the GENERATED column then the join isn't added to the query.

Interesting idea, but not sure I like it, since you would need twice as many columns,

and you would still need the foreign keys, right?

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 21:30:20

On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

I like the idea of using -> instead of . (dot),

since name resolution is already complicated,

so overloading the dot operator feels like a bad idea.

I therefore propose the following syntax:

{ table_name | alias } -> constraint_name [ [ -> constraint_name ... ] -> column_name ]

It's necessary to start with the table name or its alias,

since two tables/aliases used in the same query

might have different constraints with the same name.

If the expression ends with a column_name,

you get the value for the column.

If the expression ends with a constraint_name,

you get the referenced table as a record.

I also have a new idea on how we can use

the nullability of the foreign key's column(s),

as a rule to determine if you would get

a LEFT JOIN or an INNER JOIN:

If ALL of the foreign key column(s) are declared as NOT NULL,

then you would get an INNER JOIN.

If ANY of the foreign key column(s) are declared as NULL,

then you would get a LEFT JOIN.

Thoughts?

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Isaac Morland

Date:

30 March 2021, 22:02:17

On Tue, 30 Mar 2021 at 14:30, Joel Jacobson <joel@compiler.org> wrote:

If the expression ends with a column_name,
you get the value for the column.

If the expression ends with a constraint_name,
you get the referenced table as a record.

Can’t you just leave off the “ends with a column_name” part? If you want one of its columns, just put .column_name:

table -> constraint -> ... -> constraint . column_name

Then you know that -> expects a constraint_name and only that to its right.

Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).

For the record I would find something like this quite useful. I constantly find myself joining in code lookup tables and the like, and while from a mathematical view it’s just another join, explicitly listing the table in the FROM clause of a large query does not assist with readability to say the least.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

30 March 2021, 22:32:59

On Tue, Mar 30, 2021, at 21:02, Isaac Morland wrote:

On Tue, 30 Mar 2021 at 14:30, Joel Jacobson <joel@compiler.org> wrote:

If the expression ends with a column_name,
you get the value for the column.

If the expression ends with a constraint_name,
you get the referenced table as a record.

Can’t you just leave off the “ends with a column_name” part? If you want one of its columns, just put .column_name:

table -> constraint -> ... -> constraint . column_name

Then you know that -> expects a constraint_name and only that to its right.

Of course! Much simpler. Thanks.

Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).

Interesting idea, but I think we can keep it simple, and still support the case you mention:

If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),

I think we should just use the WHERE clause and filter on such condition.

For the record I would find something like this quite useful. I constantly find myself joining in code lookup tables and the like, and while from a mathematical view it’s just another join, explicitly listing the table in the FROM clause of a large query does not assist with readability to say the least.

Thanks for the encouraging words. I have exactly the same experience myself and share your view.

I look forward to continued discussion on this matter.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Isaac Morland

Date:

30 March 2021, 23:01:16

On Tue, 30 Mar 2021 at 15:33, Joel Jacobson <joel@compiler.org> wrote:

Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).

Interesting idea, but I think we can keep it simple, and still support the case you mention:

If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),
I think we should just use the WHERE clause and filter on such condition.

Just to be clear, it will always be a left join? Agreed that getting the inner join behaviour can be done in the WHERE clause. I think this is a case where simple is good. As long as the left join case is supported I'm happy.

Thanks for the encouraging words. I have exactly the same experience myself and share your view.

I look forward to continued discussion on this matter.

I had another idea: maybe the default name of a foreign key constraint to a primary key should simply be the name of the target table? That is, if I say:

FOREIGN KEY (...) REFERENCES t

... then unless the table name t is already in use as a constraint name, it will be used as the constraint name. It would be nice not to have to keep repeating, like this:

CONSTRAINT t FOREIGN KEY (...) REFERENCES t

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

31 March 2021, 01:50:19

On Tue, Mar 30, 2021, at 22:01, Isaac Morland wrote:

On Tue, 30 Mar 2021 at 15:33, Joel Jacobson <joel@compiler.org> wrote:
Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).

Interesting idea, but I think we can keep it simple, and still support the case you mention:

If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),
I think we should just use the WHERE clause and filter on such condition.

Just to be clear, it will always be a left join? Agreed that getting the inner join behaviour can be done in the WHERE clause. I think this is a case where simple is good. As long as the left join case is supported I'm happy.

Hmm, I guess, since technically, if all foreign key column(s) are declared as NOT NULL, we would know for sure such values exist, so a LEFT JOIN and INNER JOIN would always produce the same result.

I'm not sure if the query planner could produce different plans though, and if an INNER JOIN could be more efficient. If it matters, then I think we should generate an INNER JOIN for the "all column(s) NOT NULL" case.

Thanks for the encouraging words. I have exactly the same experience myself and share your view.

I look forward to continued discussion on this matter.

I had another idea: maybe the default name of a foreign key constraint to a primary key should simply be the name of the target table? That is, if I say:

FOREIGN KEY (...) REFERENCES t

... then unless the table name t is already in use as a constraint name, it will be used as the constraint name. It would be nice not to have to keep repeating, like this:

CONSTRAINT t FOREIGN KEY (...) REFERENCES t

I suggested earlier in the thread to allow making the default name format user-definable,

since some users according to the comment in pg_constraint.c might depend on apps that rely on the name

being unique within the namespace and not just the table.

Here is the commit that implemented this:

commit 45616f5bbbb87745e0e82b00e77562d6502aa042

Author: Tom Lane <tgl@sss.pgh.pa.us>

Date: Thu Jun 10 17:56:03 2004 +0000

Clean up generation of default names for constraints, indexes, and serial

sequences, as per recent discussion. All these names are now of the

form table_column_type, with digits added if needed to make them unique.

Default constraint names are chosen to be unique across their whole schema,

not just within the parent object, so as to be more SQL-spec-compatible

and make the information schema views more useful.

So if nothing has changed since then, I don't think we should change the default name for all users.

But like I said earlier, I think it would be good if users who know what they are doing could override the default name format.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Julien Rouhaud

Date:

31 March 2021, 09:18:24

On Wed, Mar 31, 2021 at 12:50:19AM +0200, Joel Jacobson wrote:
> On Tue, Mar 30, 2021, at 22:01, Isaac Morland wrote:
> > On Tue, 30 Mar 2021 at 15:33, Joel Jacobson <joel@compiler.org> wrote:
> >>> Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or
couldwe have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join
(recordis not included in result when no matching record).

> >> 
> >> Interesting idea, but I think we can keep it simple, and still support the case you mention:
> >> 
> >> If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),
> >> I think we should just use the WHERE clause and filter on such condition.
> >> 
> > 
> > Just to be clear, it will always be a left join? Agreed that getting the inner join behaviour can be done in the
WHEREclause. I think this is a case where simple is good. As long as the left join case is supported I'm happy.

> 
> Hmm, I guess, since technically, if all foreign key column(s) are declared as NOT NULL, we would know for sure such
valuesexist, so a LEFT JOIN and INNER JOIN would always produce the same result.

> I'm not sure if the query planner could produce different plans though, and if an INNER JOIN could be more efficient.
Ifit matters, then I think we should generate an INNER JOIN for the "all column(s) NOT NULL" case.

I'm not sure who is supposed to be the target for this proposal.

As far as I understand this won't change the fact that users will still have to
understand the "relational" part of RDBMS, understand what is a JOIN
cardinality and everything that comes with it.  So you think that people who
are too lazy to learn the proper JOIN syntax will still bother to learn about
relational algebra and understand what they're doing, and I'm very doubtful
about that.

You also think that writing a proper JOIN is complex, but somehow writing a
proper WHERE clause to subtly change the query behavior is not a problem, or
that if users want to use aggregate or anything more complex then they'll
happily open the documentation and learn how to do that.  In my experience what
will happen is that instead users will keep using that limited subset of SQL
features and build creative and incredibly inefficient systems to avoid using
anything else and will then complain that postgres is too slow.

As an example just yesterday some user complained that it's not possible to
write a trigger on a table that could intercept inserting a textual value on an
integer field and replace it with the referenced value.  And he rejected our
suggested solution to replace the "INSERT INTO sometable VALUES..." with
"INSERT INTO sometable SELECT ...".  And no this proposal would not have
changed anything because changing the python script doing the import to add
some minimal SQL knowledge was apparently too problematic.  Instead he will
insert the data in a temporary table and dispatch everything on a per-row
basis, using triggers.  So here again the problem wasn't the syntax but having
to deal with a relational rather than an imperative approach.

Even if I'm totally wrong about that, I still think your proposal will lead to
problematic or ambiguous situation unless you come up with a syntax that can
fully handle the JOIN grammar.  For instance, what should happen if the query
also contains an explicit JOIN for the same relation?  I can see many reason
why this would happen with this proposal given the set of features it can
handle.  For instance:

- you want multiple JOIN.  Like one OUTER JOIN and one INNER JOIN for the same
  relation
- you want to push predicates on an OUTER JOIN

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

31 March 2021, 12:18:52

On Wed, Mar 31, 2021, at 08:18, Julien Rouhaud wrote:

On Wed, Mar 31, 2021 at 12:50:19AM +0200, Joel Jacobson wrote:
> On Tue, Mar 30, 2021, at 22:01, Isaac Morland wrote:
> > On Tue, 30 Mar 2021 at 15:33, Joel Jacobson <joel@compiler.org> wrote:
> >>> Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).
> >>
> >> Interesting idea, but I think we can keep it simple, and still support the case you mention:
> >>
> >> If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),
> >> I think we should just use the WHERE clause and filter on such condition.
> >>
> >
> > Just to be clear, it will always be a left join? Agreed that getting the inner join behaviour can be done in the WHERE clause. I think this is a case where simple is good. As long as the left join case is supported I'm happy.
>
> Hmm, I guess, since technically, if all foreign key column(s) are declared as NOT NULL, we would know for sure such values exist, so a LEFT JOIN and INNER JOIN would always produce the same result.
> I'm not sure if the query planner could produce different plans though, and if an INNER JOIN could be more efficient. If it matters, then I think we should generate an INNER JOIN for the "all column(s) NOT NULL" case.

I'm not sure who is supposed to be the target for this proposal.

As far as I understand this won't change the fact that users will still have to
understand the "relational" part of RDBMS, understand what is a JOIN
cardinality and everything that comes with it. So you think that people who
are too lazy to learn the proper JOIN syntax will still bother to learn about
relational algebra and understand what they're doing, and I'm very doubtful
about that.

You also think that writing a proper JOIN is complex, but somehow writing a
proper WHERE clause to subtly change the query behavior is not a problem, or
that if users want to use aggregate or anything more complex then they'll
happily open the documentation and learn how to do that. In my experience what
will happen is that instead users will keep using that limited subset of SQL
features and build creative and incredibly inefficient systems to avoid using
anything else and will then complain that postgres is too slow.

Thanks for interesting new insights and questions.

Traditional SQL JOINs reveals less information about the data model,

compared to this new proposed foreign key based syntax.

Traditional SQL JOINs => undirected graph can be inferred

Foreign key joins => directed graph can be inferred

When looking at a traditional join, you might be able to guess the direction,

based on the name of tables and columns, but you cannot know for sure without

looking at the table definitions.

I'm thinking the target is both expert as well as beginner users,

who prefer a more concise syntax and reduced cognitive load:

Imagine a company with two types of SQL users:

1) Tech core team, responsible for schema changes (DDL), such as adding new tables/columns

and adding proper foreign keys.

2) Normal users, responsible for writing SQL queries using the existing schema.

In such a scenario, (2) would use the foreign keys added by (1),

letting them focus on *what* to join and less on *how* to join,

all in line with the objectives of the declarative paradigm.

By using the foreign keys, it is guaranteed you cannot get an

accidental one-to-many join that would multiply the result set.

How many rows a certain big query with lots of joins returns

can be difficult to reason about, you need to carefully inspect each

table to understand what column(s) there are unique constraints on,

that cannot multiply the result set.

If using the -> notation, you would only need to manually

inspect the tables involved in the remaining JOINs;

since you could be confident all uses of -> cannot affect cardinality.

I think this would be a win also for an expert SQL consultant working

with a new complex data model never seen before.

As an example just yesterday some user complained that it's not possible to
write a trigger on a table that could intercept inserting a textual value on an
integer field and replace it with the referenced value. And he rejected our
suggested solution to replace the "INSERT INTO sometable VALUES..." with
"INSERT INTO sometable SELECT ...". And no this proposal would not have
changed anything because changing the python script doing the import to add
some minimal SQL knowledge was apparently too problematic. Instead he will
insert the data in a temporary table and dispatch everything on a per-row
basis, using triggers. So here again the problem wasn't the syntax but having
to deal with a relational rather than an imperative approach.

Sad but a bit funny story. I guess some people cannot learn from others mistake,

but insist on shooting themselves in the foot first.

I understand it must feel wasteful and hopeless trying to educate such users.

Maybe we could recycle the invested energy into such conversations,

by creating a wiki-page for each such anti-pattern, so that each new attempt

at explaining hopefully eventually leads to sufficient information for anyone

to understand why X is a bad idea.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Pavel Stehule

Date:

31 March 2021, 19:54:00

If using the -> notation, you would only need to manually
inspect the tables involved in the remaining JOINs;
since you could be confident all uses of -> cannot affect cardinality.

I think this would be a win also for an expert SQL consultant working
with a new complex data model never seen before.

I did not feel comfortable when I read about this proprietary extension of SQL. I can accept and it can be nice to support ANSI/SQL object's referentions, but implementing own syntax for JOIN looks too strange. I don't see too strong benefit in inventing new syntax and increasing the complexity and possible disorientation of users about correct syntax. Some users didn't adopt a difference between old joins and modern joins, and you are inventing a third syntax.

Pavel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Julien Rouhaud

Date:

31 March 2021, 20:16:44

On Wed, Mar 31, 2021 at 5:19 PM Joel Jacobson <joel@compiler.org> wrote:
>
> If using the -> notation, you would only need to manually
> inspect the tables involved in the remaining JOINs;
> since you could be confident all uses of -> cannot affect cardinality.

Talking about that, do you have some answers to the points raised in
my previous mail, which is how it's supposed to behave when a table is
both join using your "->" syntax and a plain JOIN, how to join the
same table multiple time using this new syntax, and how to add
predicates to the join clause using  this new syntax.

> I think this would be a win also for an expert SQL consultant working
> with a new complex data model never seen before.

By experience if the queries are written with ANSI JOIN it's not
really a problem.  And if it's a new complex data model that was never
seen, I would need to inspect the data model first anyway to
understand what the query is (or should be) doing.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Martin Jonsson

Date:

31 March 2021, 22:24:23

SAP has implemented something similar all across their stack. In their HANA database, application platform ABAP and also their cloud. So clearly they find it very popular:-) It is called CDS (Core Data Services) views. Here is a quick overview:

- Superset of SQL to declare views and associations between views. They are views with sort of named joins. The code is parsed and stored as a normal SQL view as well as metadata. Note this metadata is not technically part of the database SQL layer but rather the database application layer. The user normally sees no difference.

- Superset of SQL to query in a very similar way as described above with paths. This is parsed to normal SQL with joins, taking into consideration above metadata. Can only work on the above views.

This has obvious limitations, most mentioned earlier in this thread. Specifically, join types are limited. Still it eases the pain considerably of writing queries. The SAP system I work on now has 400K tables and over 1 million fields. Most keys are composite.. One needs to be a super hero to keep that data model in memory....

This might be an extreme case but I'm sure there are other use cases. SAP technical users are actually quite happy to work with it since, in my humble opinion, it is in a way SQL light. The nice data model parts without the pesky complicated stuff.

It is not really an ORM but it makes ORM work significantly simpler by keeping the metadata on the database. The hierarchical CRUD stuff of ORM legend is squarely out of scope.

I've been seeing this type of question appearing regularly in this forum and maybe this SAP way holds water as a solution? In that case, the work should probably be in user land but close to the database. Maybe as as extension with a SQL preprocessor? Much of the grammar and parsing work is already available, at least as inspiration.

Martin

Le lundi 29 mars 2021, 12:48:58 UTC+2, Pavel Stehule <pavel.stehule@gmail.com> a écrit :

po 29. 3. 2021 v 12:01 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:
I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be
unique per schema, but I realize we could just use the foreign table's name
as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name
FROM order_details
WHERE order_details.products.product_name = 'Chocolade';

This syntax is similar to Oracle's object references (this is example from thread from Czech Postgres list last week)

Select e.last_name employee,
e.department_ref.department_name department,
e.department_ref.manager_ref.last_name dept_manager
From employees_obj e
where e.initials() like 'K_';

I see few limitations: a) there is not support for outer join, b) there is not support for aliasing - and it probably doesn't too nice, when you want to returns more (but not all) columns

Regards

Pavel

Given this data model:

CREATE TABLE customers (
customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
company_name text,
PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
customer_id bigint NOT NULL,
PRIMARY KEY (order_id),
CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
);

CREATE TABLE products (
product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
product_name text NOT NULL,
PRIMARY KEY (product_id)
);

CREATE TABLE order_details (
order_id bigint NOT NULL,
product_id bigint NOT NULL,
PRIMARY KEY (order_id, product_id),
CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
);

> Not saying I think this suggestion is a good idea, though. We've seen
> many frameworks that hide joins, and the results are ... less than
> universally good.

Yeah, I'm pretty much not sold on this idea either. I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

I think this concern is valid for the original syntax,
but I actually think the idea on using foreign key constraint names
effectively solves an entire class of query writing bugs.

Users writing queries using this syntax are guaranteed to be aware
of the existence of the foreign keys, otherwise they couldn't write
the query this way, since they must use the foreign key
constraint names in the path expression.

This ensures it's not possible to produce a nonsensical JOIN
on the wrong columns, a problem for which traditional JOINs
have no means to protect against.

Even with foreign keys, indexes could of course be missing,
causing an inefficient query anyway, but at least the classes
of potential problems is reduced by one.

I think what's neat is how this syntax works excellent in combination
with traditional JOINs, allowing the one which feels most natural for
each part of the query to be used.

Let's also remember foreign keys did first appear in SQL-89,
so they couldn't have been taken into account when SQL-86
was designed. Maybe they would have came up with the idea
of making more use of foreign key constraints,
if they would have been invented from the very beginning.

However, it's not too late to fix this, it seems doable without
breaking any backwards compatibility. I think there is a risk
our personal preferences are biased due to being experienced
SQL users. I think it's likely newcomers to SQL would really
fancy this proposed syntax, and cause them to prefer PostgreSQL
over some other NoSQL product.

If we can provide such newcomers with a built-in solution,
I think that better than telling them they should
use some ORM/tool/macro to simplify their query writing.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

31 March 2021, 22:32:09

On Wed, Mar 31, 2021, at 19:16, Julien Rouhaud wrote:

On Wed, Mar 31, 2021 at 5:19 PM Joel Jacobson <joel@compiler.org> wrote:
>
> If using the -> notation, you would only need to manually
> inspect the tables involved in the remaining JOINs;
> since you could be confident all uses of -> cannot affect cardinality.

Talking about that, do you have some answers to the points raised in
my previous mail, which is how it's supposed to behave when a table is
both join using your "->" syntax and a plain JOIN, how to join the
same table multiple time using this new syntax, and how to add
predicates to the join clause using this new syntax.

It's tricky, I don't see a good solution.

My original proposal aimed to improve syntax conciseness.

While this would be nice, I see much more potential value in Tom's idea

of somehow making use of foreign key constrain names.

Instead of trying to hack it into the <select list> part of a query,

maybe it's more fruitful to see if we can find a way to integrate it into the <from clause>.

Perhaps something along the lines of what Vik suggested earlier:

> FROM a JOIN b WITH a_b_fk

The problem I have with the above is "b" is redundant information,

since the foreign key is always between two specific tables,

and given "a" and "a_b_fk" we know we are joining "b".

I would prefer a new chainable binary operator.

Pavel raised some concerns about using "->" since used by the standard already,

but perhaps it is less of a problem when only used in the <from clause>?

Otherwise we could use something else entirely.

Here comes some ideas on <from clause> syntax.

With default foreign key constraint names:

SELECT DISTINCT customers.company_name

FROM order_details->order_details_product_id_fkey AS products

JOIN order_details->order_details_order_id_fkey->orders_customer_id_fkey AS customers

WHERE products.product_name = 'Chocolade';

In a PostgreSQL-only environment, foreign keys could be renamed:

ALTER TABLE orders RENAME CONSTRAINT orders_customer_id_fkey TO customers;

ALTER TABLE order_details RENAME CONSTRAINT order_details_order_id_fkey TO orders;

ALTER TABLE order_details RENAME CONSTRAINT order_details_product_id_fkey TO products;

Then we would get:

SELECT DISTINCT customers.company_name

FROM order_details->products

JOIN order_details->orders->customers

WHERE products.product_name = 'Chocolade';

Which would be the same thing as:

SELECT DISTINCT customers.company_name

FROM order_details

JOIN order_details->products

JOIN order_details->orders

JOIN orders->customers

WHERE products.product_name = 'Chocolade';

Type of join can be specified as well as aliases, just like normal:

SELECT DISTINCT c.company_name

FROM order_details AS od

JOIN od->products AS p

FULL JOIN od->orders AS o

LEFT JOIN o->customers AS c

WHERE p.product_name = 'Chocolade';

(FULL and LEFT join makes no sense in this example, but just to illustrate join types works just like normal)

I don't know how challenging this would be to integrate into the grammar though.

Here are some other ideas which might be easier to parse:

SELECT DISTINCT customers.company_name

FROM order_details->products

JOIN ON order_details->orders->customers

WHERE products.product_name = 'Chocolade';

SELECT DISTINCT customers.company_name

FROM order_details->products

JOIN USING order_details->orders->customers

WHERE products.product_name = 'Chocolade';

SELECT DISTINCT customers.company_name

FROM order_details->products

JOIN WITH order_details->orders->customers

WHERE products.product_name = 'Chocolade';

More syntax ideas?

Semantic ideas:

* When chaining, all joins on the chain would be made of the same type.

* To use different join types, you would write a separate join.

* All tables joined in the chain, would be accessible in the <select list>, via the names of the foreign key constraints.

* Only the last link on the chain can be given an alias. If you want to alias something in the middle, split the chain into two separate joins (, so that the one in the middle becomes the last one, which can then be given an alias.)

Thoughts?

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

31 March 2021, 22:49:52

On Wed, Mar 31, 2021, at 21:32, Joel Jacobson wrote:

SELECT DISTINCT customers.company_name
FROM order_details->products
JOIN order_details->orders->customers
WHERE products.product_name = 'Chocolade';

Hm, maybe the operator shouldn't be allowed directly after FROM, but only used with a join:

SELECT DISTINCT customers.company_name

FROM order_details

JOIN order_details->orders->customers

JOIN order_details->products

WHERE products.product_name = 'Chocolade';

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Isaac Morland

Date:

31 March 2021, 23:25:15

On Wed, 31 Mar 2021 at 15:32, Joel Jacobson <joel@compiler.org> wrote:

On Wed, Mar 31, 2021, at 19:16, Julien Rouhaud wrote:
On Wed, Mar 31, 2021 at 5:19 PM Joel Jacobson <joel@compiler.org> wrote:
>
> If using the -> notation, you would only need to manually
> inspect the tables involved in the remaining JOINs;
> since you could be confident all uses of -> cannot affect cardinality.

Talking about that, do you have some answers to the points raised in
my previous mail, which is how it's supposed to behave when a table is
both join using your "->" syntax and a plain JOIN, how to join the
same table multiple time using this new syntax, and how to add
predicates to the join clause using this new syntax.

It's tricky, I don't see a good solution.

My original proposal aimed to improve syntax conciseness.
While this would be nice, I see much more potential value in Tom's idea
of somehow making use of foreign key constrain names.

Maybe I have a different proposal in mind than anybody else, but I don't think there is a problem with multiple joins to the same table. If the joins are via the same constraint, then a single join is enough, and if they are via different constraints, the constraints have unique names.

I think if TA is a table with a foreign key constraint CB to another table TB, then the hypothetical expression:

TA -> CB

really just means:

(select TB from TB where (TB.[primary key columns) = (TA.[source columns of constraint CB]))

You can then add .fieldname to get the required fieldname. The issue is that writing it this way is hopelessly verbose, but the short form is fine. The query planner also needs to be guaranteed to collapse multiple references through the same constraint to a single actual join (and then take all the multiple fields requested).

If TA is a table with a foreign key constraint CB to TB, which has a foreign key constraint CC to TC, then this expression:

TA -> CB -> CC

just means, by the same definition (except I won't expand it fully, only one level):

(select TC from TC where (TC.[primary key columns) = ((TA -> CB).[source columns of constraint CC]))

Which reminds me, I often find myself wanting to write something like a.(f1, f2, f3) = b.(f1, f2, f3) rather than (a.f1, a.f2, a.f3) = (b.f1, b.f2, b.f3). But that's another story.

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

01 April 2021, 00:16:29

On Wed, Mar 31, 2021, at 22:25, Isaac Morland wrote:

Maybe I have a different proposal in mind than anybody else, but I don't think there is a problem with multiple joins to the same table. If the joins are via the same constraint, then a single join is enough, and if they are via different constraints, the constraints have unique names.

I think if TA is a table with a foreign key constraint CB to another table TB, then the hypothetical expression:

TA -> CB

really just means:

(select TB from TB where (TB.[primary key columns) = (TA.[source columns of constraint CB]))

You can then add .fieldname to get the required fieldname. The issue is that writing it this way is hopelessly verbose, but the short form is fine. The query planner also needs to be guaranteed to collapse multiple references through the same constraint to a single actual join (and then take all the multiple fields requested).

If TA is a table with a foreign key constraint CB to TB, which has a foreign key constraint CC to TC, then this expression:

TA -> CB -> CC

just means, by the same definition (except I won't expand it fully, only one level):

(select TC from TC where (TC.[primary key columns) = ((TA -> CB).[source columns of constraint CC]))

Which reminds me, I often find myself wanting to write something like a.(f1, f2, f3) = b.(f1, f2, f3) rather than (a.f1, a.f2, a.f3) = (b.f1, b.f2, b.f3). But that's another story

Maybe “anonymous join” would be a good name for this, similar to anonymous functions. The joined table(s) would not pollute the namespace.

/Joel

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

Vik Fearing

Date:

01 April 2021, 00:18:14

On 3/31/21 6:54 PM, Pavel Stehule wrote:
>>
>>
>>
>> If using the -> notation, you would only need to manually
>> inspect the tables involved in the remaining JOINs;
>> since you could be confident all uses of -> cannot affect cardinality.
>>
>> I think this would be a win also for an expert SQL consultant working
>> with a new complex data model never seen before.
>>
>>
> I did not feel comfortable when I read about this proprietary extension of
> SQL.  I can accept and it can be nice to support ANSI/SQL object's
> referentions, but implementing own syntax for JOIN looks too strange. I
> don't see too strong benefit in inventing new syntax and increasing the
> complexity and possible disorientation of users about correct syntax. Some
> users didn't adopt a difference between old joins and modern joins, and you
> are inventing a third syntax.

I'm with you on this: let's do it the Standard way, or not do it at all.
-- 
Vik Fearing

Re: Idea: Avoid JOINs by using path expressions to follow FKs

From

"Joel Jacobson"

Date:

06 April 2021, 11:11:03

In a Hacker News discussion [2] on using foreign keys for joins,

the author of PostgREST, Steve Chavez, mentioned they are actually already

using this idea in PostgREST:

Steve Chavez wrote:

>The idea about using FK as a JOIN target is interesting.

>While developing a syntax for PostgREST resource embedding[1],

>I also reached the conclusion that FKs would be a convenient way to join tables

>(also suggested renaming them as you do here).

>IIRC, self joins are still an issue with FK joining.

>[1]: https://postgrest.org/en/v7.0.0/api.html#embedding-disambiguation

I think this idea looks very promising and fruitful.

Maybe we can think of some other existing/new operator which would be acceptable,

instead of using "->" (which is potentially in conflict with the SQL standard's "REF" thing)?

Not saying we should move forward on our own with this idea,

but if we can come up with a complete proposal,

maybe it can be presented as an idea to the SQL committee?

[2] https://news.ycombinator.com/item?id=26693705

/Joel