Join order optimization - Mailing list pgsql-performance

From Christian Beikov
Subject Join order optimization
Date
Msg-id 7b36ee6b-495c-4a8e-a131-70150762430e@gmail.com
Whole thread Raw
List pgsql-performance
Hi,

I'm from the Hibernate team (Java ORM) and a user recently reported that 
a change in our SQL rendering affected his query plans in a bad way.

In short, we decided to model certain constructs in our ORM with "nested 
joins" i.e. using parenthesis to express the join order. This is what we 
want to model semantically, though there are cases when we could detect 
that we don't need the explicit join ordering to get the same semantics. 
I expected that the PostgreSQL optimizer can do the same reasoning and 
do further join re-ordering to produce an optimal plan, but to my 
surprise it seems it isn't capable to do that.

The query we generate right now is of the following structure:

from tbl1 t1
join (tbl2 t2
     left join tbl3 t3 on t3.pkAndFk = t2.pk
     left join tbl4 t4 on t4.pkAndFk = t2.pk
     ...
     left join tbl9 t9 on t9.pkAndFk = t2.pk
) on t2.fk = t1.pk
where t1.indexedColumn = ...

whereas the query we generated before, which is semantically equivalent, 
is the following:

from tbl1 t1
join tbl2 t2 on t2.fk = t1.pk
left join tbl3 t3 on t3.pkAndFk = t2.pk
left join tbl4 t4 on t4.pkAndFk = t2.pk
...
left join tbl9 t9 on t9.pkAndFk = t2.pk
where t1.indexedColumn = ...


You can find the full queries in the attachments section of the issue 
report from the user: https://hibernate.atlassian.net/browse/HHH-16595

Query_Hibernate5.txt shows the old style query without parenthesis and 
Query_Hibernate6.txt shows the new style. You will also find the query 
plans for the two queries attached as CSV files.

It almost seems like the PostgreSQL optimizer sees the parenthesis for 
join ordering as an optimization fence!?

The user reported that the behavior is reproducible in PostgreSQL 
versions 11 and 15. He promised to provide a full reproducer for this 
which I am still waiting for, but I'll share it with you as soon as that 
was provided if needed.


I think that we can detect that the parenthesis is unnecessary in this 
particular case, but ideally PostgreSQL would be able to detect this as 
well to plan the optimal join order. Any ideas what is going on here? Is 
this a bug or missed optimization in the query optimizer?

I'm a bit worried about what PostgreSQL will produce for queries that 
really need the parenthesis for join ordering e.g.

from tbl1 t1
left join (tbl2 t2
     join tbl3 t3 on t3.pkAndFk = t2.pk
     join tbl4 t4 on t4.pkAndFk = t2.pk
     ...
     join tbl9 t9 on t9.pkAndFk = t2.pk
) on t2.fk = t1.pk
where t1.indexedColumn = ...

Thanks for any help.

Christian




pgsql-performance by date:

Previous
From: jayaprabhakar k
Date:
Subject: Re: Index bloat and REINDEX/VACUUM optimization for partial index
Next
From: Maxim Boguk
Date:
Subject: Re: Index bloat and REINDEX/VACUUM optimization for partial index