Re: Idea: Avoid JOINs by using path expressions to follow FKs - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Idea: Avoid JOINs by using path expressions to follow FKs |
Date | |
Msg-id | 0a559871-12b8-4e3a-9226-ab2a0aa94805@www.fastmail.com |
In response to | Re: Idea: Avoid JOINs by using path expressions to follow FKs (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Idea: Avoid JOINs by using path expressions to follow FKs
Re: Idea: Avoid JOINs by using path expressions to follow FKs |
List | pgsql-hackers |
On Wed, Mar 31, 2021, at 08:18, Julien Rouhaud wrote:
> On Wed, Mar 31, 2021 at 12:50:19AM +0200, Joel Jacobson wrote:
> > On Tue, Mar 30, 2021, at 22:01, Isaac Morland wrote:
> > > On Tue, 30 Mar 2021 at 15:33, Joel Jacobson <joel@compiler.org> wrote:
> > >>> Also, should the join be a left join, which would therefore return a NULL when there is no matching record? Or could we have a variation such as ->? to give a left join (NULL when no matching record) with -> using an inner join (record is not included in result when no matching record).
> > >>
> > >> Interesting idea, but I think we can keep it simple, and still support the case you mention:
> > >>
> > >> If we only have -> and you want to exclude records where the column is NULL (i.e. INNER JOIN),
> > >> I think we should just use the WHERE clause and filter on such condition.
> > >>
> > > Just to be clear, it will always be a left join? Agreed that getting the inner join behaviour can be done in the WHERE clause. I think this is a case where simple is good. As long as the left join case is supported I'm happy.
> >
> > Hmm, I guess, since technically, if all foreign key column(s) are declared as NOT NULL, we would know for sure such values exist, so a LEFT JOIN and INNER JOIN would always produce the same result.
> > I'm not sure if the query planner could produce different plans though, and if an INNER JOIN could be more efficient. If it matters, then I think we should generate an INNER JOIN for the "all column(s) NOT NULL" case.
>
> I'm not sure who is supposed to be the target for this proposal.
>
> As far as I understand this won't change the fact that users will still have to
> understand the "relational" part of RDBMS, understand what is a JOIN
> cardinality and everything that comes with it. So you think that people who
> are too lazy to learn the proper JOIN syntax will still bother to learn about
> relational algebra and understand what they're doing, and I'm very doubtful
> about that.
>
> You also think that writing a proper JOIN is complex, but somehow writing a
> proper WHERE clause to subtly change the query behavior is not a problem, or
> that if users want to use aggregate or anything more complex then they'll
> happily open the documentation and learn how to do that. In my experience what
> will happen is that instead users will keep using that limited subset of SQL
> features and build creative and incredibly inefficient systems to avoid using
> anything else and will then complain that postgres is too slow.
Thanks for the interesting new insights and questions.
Traditional SQL JOINs reveal less information about the data model
than the proposed foreign-key-based syntax does.
Traditional SQL JOINs => undirected graph can be inferred
Foreign key joins => directed graph can be inferred
When looking at a traditional join, you might be able to guess the direction
from the names of the tables and columns, but you cannot know for sure without
looking at the table definitions.
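To illustrate the directed-graph point, here is a minimal sketch that reads the foreign keys out of a catalog and lists them as directed edges. It uses SQLite through Python's sqlite3 module purely as a runnable stand-in for PostgreSQL, and the customers/orders/order_items schema and the foreign_key_edges helper are hypothetical names made up for this example.

```python
import sqlite3

# Hypothetical three-table schema; every foreign key is declared explicitly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id)
    );
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders (id)
    );
""")


def foreign_key_edges(conn):
    """Return (referencing_table, column, referenced_table, column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # PRAGMA foreign_key_list columns:
        # id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, fk[3], fk[2], fk[4]))
    return sorted(edges)


edges = foreign_key_edges(conn)
for src_table, src_col, dst_table, dst_col in edges:
    # Each edge points from the referencing side to the referenced side.
    print(f"{src_table}.{src_col} -> {dst_table}.{dst_col}")
```

The direction of each edge comes from the constraint itself, which is exactly the information an ON clause in a traditional join does not carry.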
I'm thinking the target is both expert and beginner users
who prefer a more concise syntax and reduced cognitive load:
Imagine a company with two types of SQL users:
1) Tech core team, responsible for schema changes (DDL), such as adding new tables/columns
and adding proper foreign keys.
2) Normal users, responsible for writing SQL queries using the existing schema.
In such a scenario, (2) would use the foreign keys added by (1),
letting them focus on *what* to join and less on *how* to join,
all in line with the objectives of the declarative paradigm.
By using the foreign keys, it is guaranteed you cannot get an
accidental one-to-many join that would multiply the result set.
How many rows a certain big query with lots of joins returns
can be difficult to reason about. You need to carefully inspect each
table to understand which column(s) have unique constraints,
since joins on those columns cannot multiply the result set.
If using the -> notation, you would only need to manually
inspect the tables involved in the remaining JOINs,
since you could be confident that no use of -> can affect cardinality.
I think this would also be a win for an expert SQL consultant working
with a complex data model they have never seen before.
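A minimal sketch of the cardinality argument, again using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, with a hypothetical customers/orders schema. The proposed -> syntax does not exist, so the foreign key is followed with an ordinary LEFT JOIN from the referencing table to the referenced one, and compared with a join in the opposite direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id)
    );
    INSERT INTO customers VALUES (1, 'SE'), (2, 'SE'), (3, 'FR');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")


def count(sql):
    """Run a query that returns a single count and return it as an int."""
    return conn.execute(sql).fetchone()[0]


order_rows = count("SELECT count(*) FROM orders")
customer_rows = count("SELECT count(*) FROM customers")

# Following the foreign key: orders.customer_id references the unique
# customers.id, so each order matches at most one customer.
fk_direction_rows = count("""
    SELECT count(*)
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
""")

# Opposite direction: one customer can have many orders, so the
# customer rows can be multiplied.
reverse_direction_rows = count("""
    SELECT count(*)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
""")

print(order_rows, fk_direction_rows)          # same row count
print(customer_rows, reverse_direction_rows)  # row count grows
```

In this toy data set the join that follows the foreign key keeps the number of order rows unchanged, while the reverse join returns more rows than there are customers.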
> As an example just yesterday some user complained that it's not possible to
> write a trigger on a table that could intercept inserting a textual value on an
> integer field and replace it with the referenced value. And he rejected our
> suggested solution to replace the "INSERT INTO sometable VALUES..." with
> "INSERT INTO sometable SELECT ...". And no this proposal would not have
> changed anything because changing the python script doing the import to add
> some minimal SQL knowledge was apparently too problematic. Instead he will
> insert the data in a temporary table and dispatch everything on a per-row
> basis, using triggers. So here again the problem wasn't the syntax but having
> to deal with a relational rather than an imperative approach.
A sad but slightly funny story. I guess some people cannot learn from others' mistakes,
but insist on shooting themselves in the foot first.
I understand it must feel wasteful and hopeless trying to educate such users.
Maybe we could recycle the invested energy into such conversations,
by creating a wiki-page for each such anti-pattern, so that each new attempt
at explaining hopefully eventually leads to sufficient information for anyone
to understand why X is a bad idea.
/Joel