Re: Idea: Avoid JOINs by using path expressions to follow FKs - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Idea: Avoid JOINs by using path expressions to follow FKs
Date
Msg-id 0a559871-12b8-4e3a-9226-ab2a0aa94805@www.fastmail.com
Whole thread Raw
In response to Re: Idea: Avoid JOINs by using path expressions to follow FKs  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Idea: Avoid JOINs by using path expressions to follow FKs  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Idea: Avoid JOINs by using path expressions to follow FKs  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Wed, Mar 31, 2021, at 08:18, Julien Rouhaud wrote:
On Wed, Mar 31, 2021 at 12:50:19AM +0200, Joel Jacobson wrote:
> On Tue, Mar 30, 2021, at 22:01, Isaac Morland wrote:
> > On Tue, 30 Mar 2021 at 15:33, Joel Jacobson <joel@compiler.org> wrote:
> >>> Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).
> >> 
> >> Interesting idea, but I think we can keep it simple, and still support the case you mention:
> >> 
> >> If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),
> >> I think we should just use the WHERE clause and filter on such condition.
> >> 
> > 
> > Just to be clear, it will always be a left join? Agreed that getting the inner join behaviour can be done in the WHERE clause. I think this is a case where simple is good. As long as the left join case is supported I'm happy.

> Hmm, I guess, since technically, if all foreign key column(s) are declared as NOT NULL, we would know for sure such values exist, so a LEFT JOIN and INNER JOIN would always produce the same result.
> I'm not sure if the query planner could produce different plans though, and if an INNER JOIN could be more efficient. If it matters, then I think we should generate an INNER JOIN for the "all column(s) NOT NULL" case.

I'm not sure who is supposed to be the target for this proposal.

As far as I understand this won't change the fact that users will still have to
understand the "relational" part of RDBMS, understand what is a JOIN
cardinality and everything that comes with it.  So you think that people who
are too lazy to learn the proper JOIN syntax will still bother to learn about
relational algebra and understand what they're doing, and I'm very doubtful
about that.

You also think that writing a proper JOIN is complex, but somehow writing a
proper WHERE clause to subtly change the query behavior is not a problem, or
that if users want to use aggregate or anything more complex then they'll
happily open the documentation and learn how to do that.  In my experience what
will happen is that instead users will keep using that limited subset of SQL
features and build creative and incredibly inefficient systems to avoid using
anything else and will then complain that postgres is too slow.

Thanks for interesting new insights and questions.

Traditional SQL JOINs reveals less information about the data model,
compared to this new proposed foreign key based syntax.

Traditional SQL JOINs => undirected graph can be inferred
Foreign key joins => directed graph can be inferred

When looking at a traditional join, you might be able to guess the direction,
based on the name of tables and columns, but you cannot know for sure without
looking at the table definitions.

I'm thinking the target is both expert as well as beginner users,
who prefer a more concise syntax and reduced cognitive load:
 
Imagine a company with two types of SQL users:
1) Tech core team, responsible for schema changes (DDL), such as adding new tables/columns
and adding proper foreign keys.
2) Normal users, responsible for writing SQL queries using the existing schema.

In such a scenario, (2) would use the foreign keys added by (1),
letting them focus on *what* to join and less on *how* to join,
all in line with the objectives of the declarative paradigm.

By using the foreign keys, it is guaranteed you cannot get an
accidental one-to-many join that would multiply the result set.

How many rows a certain big query with lots of joins returns
can be difficult to reason about, you need to carefully inspect each
table to understand what column(s) there are unique constraints on,
that cannot multiply the result set.

If using the -> notation, you would only need to manually
inspect the tables involved in the remaining JOINs;
since you could be confident all uses of -> cannot affect cardinality.

I think this would be a win also for an expert SQL consultant working
with a new complex data model never seen before.


As an example just yesterday some user complained that it's not possible to
write a trigger on a table that could intercept inserting a textual value on an
integer field and replace it with the referenced value.  And he rejected our
suggested solution to replace the "INSERT INTO sometable VALUES..." with
"INSERT INTO sometable SELECT ...".  And no this proposal would not have
changed anything because changing the python script doing the import to add
some minimal SQL knowledge was apparently too problematic.  Instead he will
insert the data in a temporary table and dispatch everything on a per-row
basis, using triggers.  So here again the problem wasn't the syntax but having
to deal with a relational rather than an imperative approach.

Sad but a bit funny story. I guess some people cannot learn from others mistake,
but insist on shooting themselves in the foot first.

I understand it must feel wasteful and hopeless trying to educate such users.
Maybe we could recycle the invested energy into such conversations,
by creating a wiki-page for each such anti-pattern, so that each new attempt
at explaining hopefully eventually leads to sufficient information for anyone
to understand why X is a bad idea.

/Joel

pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: [PATCH] add concurrent_abort callback for output plugin
Next
From: talk to ben
Date:
Subject: Shared buffers advice for windows in the wiki